Research

My research combines statistical modeling, genomic data analysis, and computer programming to answer fundamental questions about the nature and consequences of duplication in biology. Below are three of my main research foci.

Population Genomics of Polyploidy

For my postdoc, the primary question I’m interested in addressing is what are the demographic consequences of WGD? We are currently working to develop models for inferring past demographic events (eg, migration, divergence, bottlenecks) in polyploids using genome-wide SNP data. We have also incorporated inbreeding into these inferences, as well as building models that can handle differences in the rate of homoeolgous chromosome exchange between polyploid subgenomes. All of the new methods that we develop are being added to the Python package dadi.

With our newly developed models in hand, the second phase of my postdoc project aims to gain a better understanding of the process of domestication in polyploid crops. Using whole-genome resequencing data, we plan to estimate the demographic history of several species, focusing primarily in the genus Brassica, and then to use this demographic knowledge to do a better job of modeling selection.

Modeling High-Throughput Sequencing Data

During my Ph.D., I spent most of my time developing methods for SNP and haplotype inference in polyploids. As part of this, I developed several software packages to distribute these models for other people to use. The two main software packages I developed are EBG and Fluidigm2PURC. EBG has has several models for inferring biallelic SNPs from high-throughput sequencing data in both auto- and allopolyploids. Fluidigm2PURC was developed to process paired-end sequencing data generated from double-barcoded PCR amplicons (Uribe-Convers et al. 2016) and to infer haplotypes from these data for diploids, polyploids, and even individuals with unknown ploidy.

Statistical Phylogenomics

In addition to building tools for HTS data, I have been interested in building models for phylogenetic inference in hybrid and polyploid taxa as well. One of the main methods I contributed to is the hybridization detection software HyDe, where I modified the original method of Kubatko and Chifman to accommodate multiple individuals sampled per species and added individual-level hypothesis testing. I also developed QCF as part of my dissertation, which is a method that estimates concordance factors for quartets of species. Quartet concordance factors quantify the number of genes that support a given species-level quartet topology (AB|CD, AC|BD, and AD|BC). QCF allows these values to be calculated for multiple individuals/haplotypes sampled per gene, and can also conduct gene-level and quartet-level bootstrapping to account for gene tree uncertainty and to estimate confidence itervals for the condordance factors. Because both of these methods can handle multiple haplotypes sampled per taxon, they can be used for both diploid and polyploid species.