Simulation of sequencing read counts and genotypes

Usage

sim_reads(pVec, N_ind, coverage, ploidy, error)

Arguments

pVec
A numeric vector of allele frequencies, one per locus, built with the concatenate function c().
N_ind
The number of individuals to simulate.
coverage
The mean number of reads simulated per individual per locus. Total read counts are Poisson distributed with this mean.
ploidy
The ploidy level of individuals in the population.
error
The sequencing error rate, given as a single fixed constant.

Value

A list of 3 matrices:
genos
A matrix of the simulated genotypes.

tot_read_mat
A matrix of the simulated number of total reads.

ref_read_mat
A matrix of the simulated number of reference reads.

Description

Simulates genotypes and sequencing read counts under the model of Blischak et al. (2015).

Details

Total read counts are drawn from a Poisson distribution with mean equal to the coverage set by the user. Genotypes are then simulated for N_ind individuals from the allele frequencies in pVec; the number of loci simulated equals the length of pVec. Finally, the number of reference reads is simulated using Eq. 1 of Blischak et al. (2015), given the total reads, the genotypes, and the sequencing error.
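
The three steps described above can be sketched in base R as follows. This is a minimal illustration, not the package's source: the function name sim_reads_sketch is hypothetical, and two modelling choices are assumptions that this page does not spell out. First, genotypes (counts of the reference allele) are drawn as Binomial(ploidy, p) at each locus. Second, Eq. 1 is taken to be a binomial draw of reference reads with success probability (g/ploidy)(1 - error) + (1 - g/ploidy) error. Check both against the paper before relying on the output.

```r
# Hypothetical re-implementation for illustration only.
sim_reads_sketch <- function(pVec, N_ind, coverage, ploidy, error) {
  n_loci <- length(pVec)  # one locus per allele frequency
  n <- N_ind * n_loci

  # Step 1: total reads per individual (rows) and locus (columns).
  tot_read_mat <- matrix(rpois(n, lambda = coverage),
                         nrow = N_ind, ncol = n_loci)

  # Step 2: genotypes as reference allele counts.
  # ASSUMPTION: Binomial(ploidy, p_j) for locus j.
  # matrix() fills column-wise, so rep(..., each = N_ind) gives
  # every row of column j the frequency pVec[j].
  genos <- matrix(rbinom(n, size = ploidy,
                         prob = rep(pVec, each = N_ind)),
                  nrow = N_ind, ncol = n_loci)

  # Step 3: reference reads.
  # ASSUMPTION: Eq. 1 is binomial with an error-adjusted probability
  # of observing a reference read.
  ref_prob <- (genos / ploidy) * (1 - error) +
    (1 - genos / ploidy) * error
  ref_read_mat <- matrix(rbinom(n, size = as.vector(tot_read_mat),
                                prob = as.vector(ref_prob)),
                         nrow = N_ind, ncol = n_loci)

  list(genos = genos,
       tot_read_mat = tot_read_mat,
       ref_read_mat = ref_read_mat)
}

# Example: 20 tetraploid individuals at 3 loci.
set.seed(1)
sim <- sim_reads_sketch(pVec = c(0.1, 0.5, 0.9), N_ind = 20,
                        coverage = 10, ploidy = 4, error = 0.01)
dim(sim$genos)  # 20 rows (individuals) by 3 columns (loci)
```

In this sketch every matrix has one row per individual and one column per locus, and a cell with zero total reads always yields zero reference reads.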

References

Blischak PD, Kubatko LS, Wolfe AD. 2015. Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids. In review. bioRxiv, doi:####.