YOUR MOST FREQUENT QUESTIONS ANSWERED ABOUT 23ANDME, ANCESTRYDNA, & OTHER GENEALOGICAL SERVICES

I receive quite a few questions via email. Most concern the results users get from genealogical services such as 23andMe, AncestryDNA, MyHeritage DNA, and so on. Here I answer the questions I get most often.

I’m an ethnic Turk or Kurd. Why don’t I get substantial East or Central Asian ancestry in my 23andMe or AncestryDNA results?

The main reason is that these genealogical companies compare your DNA with present-day reference samples covering pretty much all the ethnic groups in your region. For example, if you are from Turkey, they will compare your DNA with that of other people from Turkey and surrounding areas to see if you cluster with them. So you should expect to match references from Turkey and surrounding areas, and therefore to see a result such as 95% or 100% Turkish, Kurdish, Persian, or West Asian.

Since they are not comparing your DNA with ancient East or Central Asian references, why would you expect to receive any significant East or Central Asian percentages? You should also not expect any significant present-day East or Central Asian percentages, since they already have other present-day Turks or Kurds to compare your DNA with.

Therefore, those services are not useful for figuring out how much ancient East or Central Asian admixture you have. They compare your DNA with the DNA of many other Turkish or Kurdish individuals, so that is what you should expect to score. Think of them as services that determine whether you cluster with other Turks and Kurds, not how much ancient East or Central Asian admixture you have.

If you are a Kurd or Turk and are interested in how much ancient East or Central Asian admixture you have, you should contact a professional geneticist and tell them you are interested in running qpAdm on your raw DNA file. It’s important that the geneticist performing the analysis has substantial experience in bioinformatics and in the statistical tools used in population genetics, since, as with most tools, “garbage in” gives “garbage out”. Based on our extensive experience in bioinformatics, genotyping, and statistical tools in population genetics, it’s important to pick the proper left and right populations (popleft and popright) and to be able to interpret the qpAdm results properly. Unfortunately, we don’t have time to entertain requests for qpAdm calculations for individuals.

How reliable are PCA-coordinate-based tools such as G25 for determining ancestry?

In a nutshell, G25 should never be used to determine admixture percentages. There are a few reasons why population genetics professionals in the field don’t use G25 in scientific publications, and there have been papers published detailing why PCA coordinates should not be used for ancestry computations. Two main reasons come to mind:

  • PCA or G25 coordinates are highly dependent on which individuals or populations were used to compute them. For example, when computing the G25 coordinates, one can easily exaggerate the West Eurasian ancestry of Kurds or Turks by including more individuals who are related to Kurds or Turks but have greater West Eurasian ancestry than they do, and fewer individuals who are related to them but have greater East Eurasian ancestry. These extra individuals give Kurds or Turks G25 coordinates that are artificially more West Eurasian shifted than they should be.
  • There is no built-in safeguard in the G25 calculations to reject overfitted results. In contrast, with qpAdm, standard errors become unacceptably high when highly related popleft sources are used (in other words, when the source populations are overfitted). As a simplified example, let’s assume individual A’s mom is C (Chinese) and dad is T (Turkish). If we use C and T as references, then A’s ancestry would correctly be 50% Chinese & 50% Turkish.
    To illustrate overfitting, let’s now use C, T, & T1 as references, where T1 is T’s brother (A’s uncle). If we do the calculation, A’s ancestry proportions now become 33.3% Chinese, 33.3% Turkish, and another 33.3% Turkish, in other words 33.3% Chinese and 66.6% Turkish, which is wrong since A is in fact 50/50 Chinese/Turkish. We were able to artificially increase A’s Turkish ancestry by using 2 related Turkish sources (overfitting). qpAdm would have rejected this result with high standard errors. In contrast, G25 would have made this look like a desirable result by showing a very small distance.

The biggest problem with placing easy-to-use tools such as G25 in the hands of hobbyists who don’t understand these concepts is that they facilitate the spread and normalization of misleading results and population histories online.

To summarize, don’t use an Admixture calculator with more than 4 components to get an idea of East vs West Eurasian ancestry, and never use G25 for this purpose. Seek out a professional who is proficient with qpAdm and qpWave.
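
As a rough illustration of why a small distance is not a safeguard, here is a minimal Python sketch. It assumes made-up 2-D coordinates for C, T, T1, and A (not real G25 data), and the helper names mix and distance are hypothetical. It does not model how G25 or qpAdm actually work internally; it only shows that once two nearly identical sources are included, very different splits between them all give a tiny distance, so the distance alone cannot tell you that redundant sources were used.

```python
import math

# Hypothetical 2-D coordinates standing in for PCA positions (NOT real G25 data).
C = (1.0, 0.0)     # Chinese source
T = (0.0, 1.0)     # Turkish source
T1 = (0.02, 0.98)  # T's brother: almost the same position as T
A = (0.5, 0.5)     # target: the midpoint of C and T

def mix(sources, weights):
    """Weighted average of the source coordinates (hypothetical helper)."""
    return tuple(
        sum(w * s[i] for s, w in zip(sources, weights)) for i in range(2)
    )

def distance(target, sources, weights):
    """Euclidean distance between the target and the modelled mixture."""
    return math.dist(target, mix(sources, weights))

# Three ways of dividing the Turkish share between T and T1.
splits = [(0.5, 0.5, 0.0), (0.5, 0.25, 0.25), (0.5, 0.0, 0.5)]
distances = [distance(A, (C, T, T1), w) for w in splits]

for weights, d in zip(splits, distances):
    print(weights, round(d, 4))
```

Every split prints a distance well under 0.02 on this toy scale, so a tool that only reports distance would present all of them as good fits. A method that reports standard errors on the individual proportions makes this kind of ambiguity visible.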
