

Small Likelihood of Identity by State: Small MAF vs. Many Individuals

Step 1 of IBD segment extraction from FABIA models, described in subsection ``Extraction of IBD Segments from FABIA Models'' of the main manuscript, computes the likelihood of observing identity by state (IBS) without IBD and, therefore, without recombination. If $q$ is the minor allele frequency (MAF) of one SNV, the probability of observing the minor allele of this SNV in all $t$ individuals is $q^t$. We assumed that all SNVs have the same MAF $q$; in the experiments we used the average MAF. According to the main manuscript, the probability of observing $k$ or more counts of model SNVs by chance in an interval of $n$ SNVs is

  $\displaystyle \binom{l}{t} \ \sum_{i=k}^{n} \ \binom{n}{i} \ q^{it} \ \left(1-q^t \right)^{n-i} \ ,$ (2)

where $l$ is the number of individuals and $\binom{l}{t}$ is the number of ways to choose $t$ individuals from the $l$ individuals of the study. The count $i$ runs from $k$ to the number of SNVs $n$.
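Eq. (2) can be evaluated directly for small inputs. The following is a minimal Python sketch, not the implementation used for the experiments; the function name `ibs_probability` and its argument order are assumptions made for illustration.

```python
from math import comb


def ibs_probability(l, t, n, k, q):
    """Evaluate Eq. (2) as written (hypothetical helper, not from the manuscript).

    l: number of individuals in the study
    t: number of individuals sharing the minor alleles
    n: number of SNVs in the interval
    k: minimal number of model SNVs observed
    q: (average) minor allele frequency
    """
    # Probability that all t individuals carry the minor allele at one SNV.
    p = q ** t
    # Binomial tail: k or more of the n SNVs show the minor allele in all t.
    # Direct summation is adequate for moderate n only.
    tail = sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))
    # Factor for the number of ways to choose t of the l individuals.
    return comb(l, t) * tail


print(ibs_probability(l=10, t=2, n=5, k=5, q=0.5))  # 45/1024 = 0.0439453125
```

In the usage line, $q^t = 0.25$, only the term $i = n = 5$ remains in the sum, and the result is $45 \cdot 0.25^5$.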

If we try to minimize this probability, we observe a trade-off between a small average minor allele frequency (MAF) $q$ and a large number $t$ of individuals that share an IBD segment. For a small IBS likelihood, the average MAF $q$ should be small and $t$ large. However, increasing $t$ also increases the lower bound $\frac{t}{l}$ on $q$, because a minor allele observed in $t$ of the $l$ individuals has a frequency of at least $\frac{t}{l}$. If we assume the minimal $q=\frac{t}{l}$, then the probability in Eq. (2) is governed by the term

$\displaystyle \left(\frac{t}{l}\right)^{it} \ \left(1-\left(\frac{t}{l}\right)^t \right)^{n-i} \ .$ (3)

For small $q=\frac{t}{l}$ we have $\left(1-\left(\frac{t}{l}\right)^t \right)^{n-i} \approx 1$. The function $\left(\frac{t}{l}\right)^{it}$ has its minimum at $t=l/e$, independent of $i$ ($e$ is Euler's number). Thus, if the factor $\binom{l}{t}$ is ignored and $t=l/e$ individuals share the minor allele, the evidence for IBD is maximal.
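
The location of this minimum can be checked with a short derivation, sketched here in the notation of Eq. (3) and treating $t$ as a continuous variable (an assumption made only for this check). Taking the logarithm and differentiating with respect to $t$ gives

$\displaystyle \ln \left(\frac{t}{l}\right)^{it} \ = \ i \, t \, \ln \frac{t}{l} \ , \qquad \frac{d}{dt} \left( i \, t \, \ln \frac{t}{l} \right) \ = \ i \left( \ln \frac{t}{l} + 1 \right) \ .$

The derivative vanishes for $\ln \frac{t}{l} = -1$, that is, for $t = l/e$. It is negative for $t < l/e$ and positive for $t > l/e$, so this stationary point is a minimum, and its location does not depend on $i$.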

To avoid IBS without IBD by recombination, we focused on rare SNVs (see main manuscript). Rare alleles allow IBD to be distinguished from IBS without IBD: for them two independent origins are unlikely, so IBS generally implies IBD, which is not true for common alleles (3, chapter 15.3, p. 441). Rare SNVs are shared by far fewer than $l/e \approx 0.37 \, l$ individuals, hence the minimum at $t=l/e$ will not be attained with rare SNVs. For $t$ below $l/e$, the function $\left(\frac{t}{l}\right)^{it}$ decreases as $t$ grows. This justifies the claim that more individuals give more evidence for IBD, because random minor allele sharing (IBS without IBD) becomes less likely.
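
The behaviour of $\left(\frac{t}{l}\right)^{it}$ below $l/e$ can also be checked numerically. The sketch below is illustrative only: the helper name `log_sharing_term` and the values $l=100$, $i=1$ are assumptions, and the logarithm of the term is used to avoid floating-point underflow.

```python
from math import e, log


def log_sharing_term(t, l, i=1):
    """Logarithm of (t/l)^(i*t), i.e. i * t * ln(t/l) (hypothetical helper)."""
    return i * t * log(t / l)


l = 100  # illustrative number of individuals
values = {t: log_sharing_term(t, l) for t in range(1, l + 1)}

# Integer t with the smallest value of the term; l/e is about 36.8 here.
t_min = min(values, key=values.get)
print(t_min, l / e)

# Below the minimizer, every additional sharing individual lowers the term.
decreasing = all(values[t + 1] < values[t] for t in range(1, t_min))
print(decreasing)
```

For rare SNVs, $t$ stays far below $l/e$, so only the decreasing branch of the function is relevant.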


Sepp Hochreiter 2013-11-13