Bayesian population genomics in autopolyploids

Usage

polyfreqs(tM, rM, ploidy, iter = 1e+05, thin = 100, burnin = 20, print = 1000, error = 0.01, genotypes = FALSE, geno_dir = "genotypes", col_header = "", outfile = "polyfreqs-mcmc.out", quiet = FALSE)

Arguments

tM
Total reads matrix: matrix containing the total number of reads mapping to each locus for each individual.
rM
Reference reads marix: matrix containing the number of reference reads mapping to each locus for each individual.
ploidy
The ploidy level of individuals in the population (must be >= 2).
iter
The number of MCMC generations to run (default=100,000).
thin
Thins the MCMC output by sampling everything thin generations (default=100).
burnin
Percent of the posterior samples to discard as burn-in (default=20).
print
Frequency of printing the current MCMC generation to stdout (default=1000).
error
The level of sequencing error. A fixed constant (default=0.01).
genotypes
Logical variable indicating whether or not to print the values of the genotypes sampled during the MCMC (default=FALSE).
geno_dir
File path to directory containing the posterior samples of genotypes output by polyfreqs (default = "genotypes").
col_header
Optional column header tag for use in running loci in parallel (default="").
outfile
The name of the ouput file that samples from the posterior distribution of allele frequencies are written to (default="polyfreqs-mcmc.out").
quiet
Suppress the printing of the current MCMC generation to stdout (default=FALSE).

Value

Returns a list of 3 (4 if genotypes=TRUE) items:
posterior_freqs
A matrix of the posterior samples of allele frequencies. These are also printed to the file with the name given by the outfile argument.

map_genotypes
If genotypes=TRUE, then a fourth item will be returned as a matrix containing the maximum a posteriori genotype estimates accounting for burn-in.

het_obs
Matrix of posterior samples of observed heterozygosity.

het_exp
Matrix of posterior samples of expected heterozygosity.

Description

polyfreqs implements a Gibbs sampling algorithm to perform Bayesian inference on the allele frequencies (and other quantities) in a population of autopolyploids. It is the main function for conducting inference with the polyfreqs package.

Details

Data sets run through polyfreqs must be of class "matrix" with row names representing the names of the individuals sampled. The simplest way to get data into R for running an analysis is to format the total read matrix and reference read matrix as tab delimited text files with the first column containing the individual names and one column after that with the read counts for each locus. These data can then be read in using the read.table function with the row.names argument set equal to 1. An optional tab delimited list of locus names can be included as the first row and are treated as column headers for each locus (set header=T in the read.table function). When running the polyfreqs, there are a number of options that control what the function returns. To estimate genotypes and print posterior genotype samples to file, set the genotypes argument to TRUE and select a name for the output directory geno_dir (defaults to "genotypes"). polyfreqs also prints the current MCMC generation (with a frequency set by the print_freqs argument) to the R console so that users can track run times. This print can be turned off by setting quiet=TRUE. More details on using polyfreqs can be found in the introductory vignette.

References

Blischak PD, LS Kubatko and AD Wolfe. Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids. In revision.

Examples

data(total_reads) data(ref_reads) polyfreqs(total_reads,ref_reads,4,iter=100,thin=10)
Starting MCMC...
$posterior_freqs [,1] [,2] [1,] 0.3905059 0.4236661 [2,] 0.4762795 0.4194708 [3,] 0.6663735 0.4693287 [4,] 0.5252888 0.4155101 [5,] 0.4254997 0.2642039 [6,] 0.3952163 0.3830535 [7,] 0.4731001 0.3573495 [8,] 0.4098106 0.4160133 [9,] 0.4497170 0.4514914 [10,] 0.4588010 0.5026862 $het_obs [,1] [,2] [1,] 0.6333333 0.6333333 [2,] 0.6000000 0.6333333 [3,] 0.6166667 0.6166667 [4,] 0.6166667 0.6333333 [5,] 0.6500000 0.6000000 [6,] 0.6333333 0.6166667 [7,] 0.6000000 0.6000000 [8,] 0.6333333 0.6333333 [9,] 0.6166667 0.5833333 [10,] 0.5833333 0.6166667 $het_exp [,1] [,2] [1,] 0.4761357 0.4898292 [2,] 0.5043052 0.4883668 [3,] 0.4426553 0.5020762 [4,] 0.5027455 0.4869143 [5,] 0.4890549 0.3820004 [6,] 0.4783786 0.4737745 [7,] 0.5039475 0.4603352 [8,] 0.4847019 0.4871028 [9,] 0.4985480 0.5017154 [10,] 0.5031725 0.5041506

Author

Paul Blischak