Impact of the Iron Age Saka and Scythians on the demography of Kurds

Analysis, East Asian Gene Flow, Genotyping, Kurd Genetics, Uncategorized / By Dilawer

ABSTRACT

The flood of recently sequenced ancient DNA has tremendously increased our understanding of the demography of various modern Eurasian populations. Thanks to the genomes recently published in Damgaard et al, Narasimhan et al, Olalde et al, Lazaridis et al, and Mathieson et al, and the suite of tools within ADMIXTOOLS detailed in Patterson et al, we are finally able to lift the shroud surrounding the ethnogenesis of a relatively poorly understood population: the Kurds.

Using multiple lines of evidence, we were able to determine with a high degree of certainty the ancestral populations which hybridized to form present-day Kurds. The analysis was performed via carefully designed tests using formal statistical methods, such as dstats and qpAdm, from the ADMIXTOOLS software package developed by the Reich Lab.

Although it has been relatively easy to determine that the 5000 year old Chalcolithic Zagrosian populations from Iran (Seh Gabi and Haji Firuz) formed the majority genetic input of present-day Kurds, the remaining genetic input, which relates to admixture from various Eurasian steppe and Central Asian populations, was not as easy to ascertain until now.

The confounding issue has been that methods such as ADMIXTURE, PCA, and IBS measure allele sharing and total shared genetic drift. Since Eurasian steppe populations and Kurds are both substantially West Asian derived, agreement in alleles or shared genetic drift may simply reflect distant ancestry shared by Kurds and various steppe populations, and not necessarily introgression from the Eurasian steppe.

Here we are able to determine the character of this Eurasian steppe and Central Asian admixture layer on top of the 8000 year old Zagrosian farmer core of Kurds. Somewhat surprisingly, this introgression is not primarily from 3500-4000 year old Middle or Late Bronze Age (MLBA) steppe cultures related to Sintashta, Srubna, or Andronovo, but rather from 2000-2500 year old Iron Age (IA) populations related to Scythians and Saka, and from medieval (~800 year old) Turkic populations related to Kipchaks and Karakhanids.

**Fig 1 – Eurasian tribes of the Iron Age**

METHODOLOGY

The 8000 year old Iranian Chalcolithic samples used here were featured in previously published papers. They consist of 5 samples from the Seh Gabi area, on the outskirts of Kermanshah in the Kurdistan region of Iran (West-Central Iran), and 5 samples from the Haji Firuz area, also in the Kurdistan region of Iran (NW Iran).

The Saka and Hun Iron Age samples are also from previously published papers and come from the Tien Shan region, near the borders of Kyrgyzstan, Kazakhstan, and China.

The software qpAdm, which is included in ADMIXTOOLS from the Reich Lab, is extensively used to estimate admixture proportions. Since there is some confusion regarding the code, some details are provided here based on prior experience with the software. qpAdm is a formal statistical test which uses a likelihood ratio test (LRT) to test null hypotheses against alternate hypotheses.

The null hypothesis models a target as a combination of source populations, analyzed against outgroups which are phylogenetically more distant from the target. Thus the test is more informative about introgression than about genomic similarity due to generally shared drift.

For example, if a target is modeled with 3 ancestry streams as T = α·PopA + β·PopB + γ·PopC (with α + β + γ = 1), the program estimates α, β and γ and calculates the p-value associated with the model. The p-value is the probability of seeing a discrepancy between the observed statistics and those expected under the mixture model at least as large as the one actually observed, assuming the mixture model (the null hypothesis) is true; it is not the probability that the model is true. A p-value of 0.5 means that a discrepancy this large would arise by chance half the time if the model were correct, so the data give no reason to doubt it. By contrast, a p-value of 0.01 means that such a discrepancy would arise only 1% of the time under the model, so we reject it at the conventional threshold of p < 0.05.

Therefore, higher p-values in qpAdm indicate less discrepancy between the expected and observed values, and thus greater confidence that the mixture model is compatible with the data. This is also referred to as a better fit.
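qpAdm prints this p-value as "tail prob". Assuming it is the upper tail of a chi-square distribution with the "dof" printed next to "chisq" (our reading of the output, not something stated in this article), it can be reproduced from those two columns. A minimal standard-library sketch follows; `chi2_sf` and `judge_model` are hypothetical helper names, not ADMIXTOOLS functions.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer dof.

    Closed forms for integer degrees of freedom keep this standard-library only.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = total = 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * (chi + chi^3/3 + chi^5/15 + ...)
    chi = math.sqrt(x)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * series


def judge_model(chisq: float, dof: int, threshold: float = 0.05) -> tuple[float, bool]:
    """Return (p-value, True if the mixture model is NOT rejected at `threshold`)."""
    p = chi2_sf(chisq, dof)
    return p, p >= threshold


# chisq and dof from the Kurd_C3 + Hajji_Firuz_C + Sintashta_MLBA + XiongNu_WE run
p, kept = judge_model(3.190, 7)
print(f"tail prob = {p:.6f}; model not rejected: {kept}")
```

The printed value should agree with the reported tail prob of 0.866861 to about three decimals; the small residual comes from chisq being printed to only three decimals.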

Generally, the analysis is performed as follows:

We start with knowns and solve for unknowns. Here the known core ancestral population for Kurds is Chalcolithic Iranian, represented by genomes such as those from Haji Firuz;
We then use dstats of the form D [ Mbuti/Chimp, steppe ; Iran-Chl, Kurds ] to determine with which steppe populations Kurds share the most drift to the exclusion of Iran-Chl;
We test f3 [ Kurds; Iran-Chl, steppe ] to determine which steppe – Iran-Chl combinations produce the strongest signals of admixture;
Finally, using qpAdm, we identify high-confidence mixture models for Kurds, as judged by p-values, combining a core Zagrosian Chalcolithic population with a steppe population such as Sintashta-MLBA or Iron Age Saka, and testing against various outgroups. qpAdm also provides a formal test of whether modeling the test population as a mixture of 2 or 3 source streams, which form clades with particular West Eurasian and East Eurasian populations, fits the data relative to a set of phylogenetically more distant outgroups.
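The dstat and f3 steps above are built from products of allele-frequency differences. As a rough illustration, assuming per-SNP allele frequencies are already available for each population, the textbook estimators look like this. ADMIXTOOLS additionally applies a sample-size bias correction to f3 and computes block-jackknife standard errors, both omitted here; the function names and the four-SNP frequencies are invented for illustration and are not data from this study.

```python
def d_stat(w, x, y, z):
    """Patterson's D(W, X; Y, Z) from per-SNP allele frequencies (no standard error).

    D > 0: W shares excess alleles with Y, or X with Z.
    D < 0: W shares excess alleles with Z, or X with Y.
    """
    rows = list(zip(w, x, y, z))
    num = sum((a - b) * (c - d) for a, b, c, d in rows)
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in rows)
    return num / den


def f3_stat(c, a, b):
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    A clearly negative value suggests C is a mixture of groups related to A and B.
    """
    rows = list(zip(c, a, b))
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in rows) / len(rows)


# Hypothetical frequencies at four SNPs; "kurd" is built as 80% hf + 20% steppe.
mbuti = [0.5, 0.5, 0.5, 0.5]
steppe = [0.9, 0.1, 0.8, 0.2]
hf = [0.4, 0.6, 0.3, 0.7]
kurd = [0.8 * h + 0.2 * s for h, s in zip(hf, steppe)]

print("D(Mbuti, steppe; HF, kurd) =", round(d_stat(mbuti, steppe, hf, kurd), 3))
print("f3(kurd; HF, steppe) =", round(f3_stat(kurd, hf, steppe), 3))
```

Because the toy "kurd" is an exact mixture, f3 comes out negative and D comes out positive (steppe shares excess alleles with the mixed population rather than with HF). Real analyses use hundreds of thousands of SNPs and judge significance with Z-scores.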

The qpAdm test gives a p-value between 0 and 1. The threshold is usually set at 0.05: p-values > 0.05 indicate that we can't reject the mixture model, with higher p-values indicating a better fit.

qpAdm gives us a p-value for whether the matrix of f4-statistics has rank 1 (consistent with 2 streams of ancestry) or rank 2 (consistent with 3 streams of ancestry), together with an estimate of the proportion α of ancestry that forms a clade with each source, such as various Steppe-MLBA or Steppe-IA populations.
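To make the rank statements concrete: the f4 matrix has one row per left population after the first and one column per outgroup after the first. Assuming the standard parameter count for a rank-r matrix (our assumption, not stated in this article), the hypothetical helpers below, which are not part of ADMIXTOOLS, map a rank to the number of ancestry streams and give the degrees of freedom of the rank test.

```python
def streams_for_rank(rank: int) -> int:
    """A rank-r f4 matrix is consistent with r + 1 streams of ancestry."""
    return rank + 1


def rank_test_dof(n_left: int, n_right: int, rank: int) -> int:
    """Degrees of freedom when testing that the (L-1) x (R-1) f4 matrix has rank r.

    n_left counts the target plus the sources (pleft); n_right counts the outgroups
    (pright). A rank-r m x n matrix has r * (m + n - r) free parameters, which
    leaves (m - r) * (n - r) degrees of freedom.
    """
    m, n = n_left - 1, n_right - 1
    if not 0 <= rank <= min(m, n):
        raise ValueError("rank must lie between 0 and min(L-1, R-1)")
    return (m - rank) * (n - rank)


# Target + 3 sources against the 10 outgroups listed in this article (rank 2).
print(streams_for_rank(2), rank_test_dof(n_left=4, n_right=10, rank=2))
```

This prints 3 and 7; a dof of 7 is what the three-source Kurd_C3 runs in the comment thread report, which is consistent with this reading.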

With qpAdm, very low p-values could also indicate that the relationship of the test population to the outgroups is more complex, for example that some outgroups interact with the test population outside the source populations. In this case outgroups can be dropped or changed one at a time to see whether a passing p-value can be obtained. The disadvantage of dropping an outgroup is that we remove potentially useful phylogenetic branching points, which may increase the standard errors.

Larger standard errors can indicate that we are still missing aDNA which carries important phylogenetic information not present in the current outgroups used.
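The standard errors reported by ADMIXTOOLS come from a block jackknife: the genome is split into contiguous blocks, the statistic is recomputed with each block left out in turn, and the spread of those recomputed values gives the error. The sketch below is simplified under a stated assumption, namely equally weighted blocks (as we understand it, ADMIXTOOLS uses a weighted version); the function name and per-block numbers are hypothetical.

```python
import math


def jackknife_ratio(nums, dens):
    """Delete-one block jackknife for a ratio statistic such as D = sum(num) / sum(den).

    nums[i] and dens[i] are the numerator and denominator summed over the SNPs in
    genomic block i. Returns (estimate, standard error). Blocks get equal weight.
    """
    g = len(nums)
    if g < 2 or g != len(dens):
        raise ValueError("need at least two blocks and matching list lengths")
    total_num, total_den = sum(nums), sum(dens)
    estimate = total_num / total_den
    # Recompute the statistic with each block left out in turn.
    leave_out = [(total_num - n) / (total_den - d) for n, d in zip(nums, dens)]
    mean_lo = sum(leave_out) / g
    variance = (g - 1) / g * sum((v - mean_lo) ** 2 for v in leave_out)
    return estimate, math.sqrt(variance)


# Hypothetical per-block sums: the more the blocks disagree, the larger the standard error.
est, se = jackknife_ratio([1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0])
print(f"estimate = {est:.4f}, SE = {se:.4f}, Z = {est / se:.2f}")
```

If every block gave the same ratio, the standard error would be zero; heterogeneous blocks, or fewer informative outgroup contrasts, inflate it.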

Steppe-IA versus Steppe-MLBA geneflow into West Asia

Here we show that for Kurds, there is substantially greater introgression from Iron Age Central Asian populations such as Scythians, Saka, and Huns, than from MLBA steppe populations such as Andronovo, Srubna, and Sintashta.

Two big hurdles in academic work on whether Steppe-MLBA or Steppe-IA had the bigger impact on the demography of Kurds have been:

(1) A shortage of appropriate Central Asian aDNA sources to date;
(2) The wrong questions have been addressed in analyses.

With regard to (1) above, we now have a decent amount of aDNA from Central Asia to make more accurate calculations, thanks to the aDNA published in Damgaard et al, 2018, and Narasimhan et al, 2018.

With regard to (2) above, much of the research seeks to assess steppe ancestry in Kurds via ADMIXTURE, PCA, and IBS. The problem is that these methods only address the total accumulated mutations and total shared drift between West Asian and steppe populations since the Out-of-Africa (OOA) dispersal. Since Steppe-MLBA is more West Eurasian than Steppe-IA, these types of analyses will overestimate Steppe-MLBA, not necessarily because of greater introgression into West Asians such as Kurds, but because Steppe-MLBA is substantially more derived from the same ancestral core populations that contributed to West Asians, ie Iran-N, CHG, and Anatolia-N related populations.

The proper question to answer, then, is whether modern West Asians such as Kurds are differentially shifted toward Steppe-IA or toward Steppe-MLBA since Chalcolithic Iran.

Modern Kurds compared with their Zagrosian Chalcolithic Ancestors

One of the most important questions is which demographic events in the Kurdistan area differentiate modern Kurds from their Zagrosian Chalcolithic ancestors, represented here by the genomes from the Seh Gabi and Haji Firuz areas within the Kurdistan region of Iran.

Here dstats of the form D [ Kurds, Haji-Firuz; Steppe, Mbuti ] are used to compare various Kurdish samples from the Iraq/Iran area with Haji-Firuz for drift shared with steppe samples since Haji-Firuz-Chl. Kurds C1-C3 are Kurmanji Kurd samples from northern Iraq, and Kurds F1-F7 are Feyli Kurd samples from around the Iraq/Iran border.

Here we show that the most important demographic events affecting Kurds in the Kurdistan region since the Chalcolithic involve introgression of smaller amounts of DNA from Middle to Late Bronze Age cultures related to Sintashta and Andronovo, followed by introgression of larger amounts of DNA related to Saka and Hun steppe nomads during the Iron Age, and later of DNA related to medieval Turkic populations such as Kipchaks and Karakhanids. It is these later introgression events that explain the significant increase in East Eurasian admixture in present-day Kurds when compared to their Iranian Chalcolithic forefathers.

This increase in East Eurasian admixture for present day Kurds is easily observed with dstats of the form D [Kurds, Haji-Firuz; Steppe, Mbuti ]. The results are shown in figures 2 – 4.

With regard to Steppe-MLBA, dstats indicate a dilution of this type of ancestry since Haji-Firuz, especially in some of the Feyli Kurd samples.

**Fig 2- Dstats of the form D [Kurds, Haji-Firuz; Steppe, Mbuti ] showing the biggest change since Haji-Firuz-Chl is E Asian geneflow into modern Kurds**

**Fig 3 – Dstats of the form D [Kurds, Haji-Firuz; Steppe, Mbuti ] showing the biggest change since Haji-Firuz-Chl is E Asian geneflow into modern Kurds**

**Fig 4 – Dstats of the form D [Kurds, Haji-Firuz; Steppe, Mbuti ] showing the biggest change since Haji-Firuz-Chl is E Asian geneflow into modern Kurds**

We next attempt to identify possible vectors for this E Asian geneflow into modern Kurds. Based on history and geography, Scythians and Turkic tribes are the best candidates for this E Asian geneflow. The aforementioned dstats point to Central Asian Saka, Hun and Medieval Turkics. These are further investigated using the qpAdm method.

The dstats shown in figures 2-4 don't support introgression of Steppe-MLBA populations such as Sintashta and Andronovo into the Kurdistan region after Haji-Firuz-Chl; however, this is also investigated further with qpAdm.

To determine whether modern Kurds have a larger Steppe-IA layer or a larger Steppe-MLBA layer on top of their Iranian Chalcolithic core, we compare dstats of the form D [ Mbuti, Steppe-IA; Haji-Firuz-Chl, Kurds ] with D [ Mbuti, Steppe-MLBA; Haji-Firuz-Chl, Kurds ]. These show whether Kurds differentially share more genetic drift with Steppe-MLBA populations or with Iron Age Saka and Huns, to the exclusion of their Haji-Firuz-Chl core. It is thus helpful to compare the drift path lengths shared by Kurds and IA Tien Shan Huns with those shared by Kurds and MLBA steppe populations since Chalcolithic Iran. In figure 5 we see that every Kurdish test sample is differentially more shifted toward Tien Shan Huns than toward Sintashta-MLBA, to the exclusion of Haji-Firuz-Chl.
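Reading these comparisons depends on the sign convention of D(W, X; Y, Z) and on a Z-score cutoff. The hypothetical helper below spells out one common convention (D > 0 means X shares excess drift with Z, or W with Y) and the frequently used |Z| ≥ 3 threshold; both are assumptions of this sketch. With W an outgroup such as Mbuti, the reading reduces to whether the steppe group X is closer to Kurds or to Haji-Firuz-Chl.

```python
def interpret_d(d: float, se: float, w: str, x: str, y: str, z: str, z_cut: float = 3.0) -> str:
    """Plain-language reading of D(W, X; Y, Z), assuming W is an outgroup.

    Sign convention: D > 0 means X shares excess drift with Z (or W with Y);
    D < 0 means X shares excess drift with Y (or W with Z).
    """
    zscore = d / se
    if abs(zscore) < z_cut:
        return f"|Z| = {abs(zscore):.2f}: no significant difference between {y} and {z}"
    if zscore > 0:
        return f"Z = {zscore:.2f}: {x} shares more drift with {z} than with {y}"
    return f"Z = {zscore:.2f}: {x} shares more drift with {y} than with {z}"


# Toy numbers chosen only to exercise each branch; they are not results from this study.
for d, se in [(0.010, 0.002), (-0.008, 0.002), (0.001, 0.002)]:
    print(interpret_d(d, se, "Mbuti", "Steppe_X", "Haji-Firuz-Chl", "Kurd"))
```

Comparing the Steppe-IA and Steppe-MLBA versions of the statistic for the same Kurdish sample then amounts to asking which one gives the larger positive Z.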

**Fig 5 – Differential shared genetic drift comparison of Iron Age Tien Shan Huns vs MLBA Sintashta to the exclusion of Haji-Firuz-Chl for Kurds**

The strategy we use with qpAdm is to let the program estimate admixture proportions for 3 source populations (pleft), using outgroups (pright) which are phylogenetically more distant from the Kurdish test subjects than the 3 source populations are. This also enables us to reject models which do not fit the data. The combinations of sources we use are:

Haji-Firuz-Chl, Saka-TienShan, Sintashta-MLBA
Haji-Firuz-Chl, proto-Turkic XiongNu_WE, Sintashta-MLBA
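To illustrate how admixture proportions can be picked from outgroup statistics at all, here is the simplest two-source case. It is a sketch under stated assumptions: it uses plain least squares, whereas qpAdm uses a covariance-weighted fit over all sources at once, and the function name and f4 values are hypothetical. Each population is summarized by f4-statistics of the form f4(X, O1; O2, Oj), computed with the same outgroups in the same order; because f4 is linear in X's allele frequencies, a true mixture satisfies the mixture equation exactly.

```python
def fit_two_way(target, source1, source2):
    """Unweighted least-squares weight a in  target ~ a * source1 + (1 - a) * source2.

    Each argument is a list of f4-statistics against the same outgroups, in the
    same order. Returns (a, residual sum of squares).
    """
    diff = [s1 - s2 for s1, s2 in zip(source1, source2)]
    shifted = [t - s2 for t, s2 in zip(target, source2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources look identical to these outgroups")
    a = sum(s * d for s, d in zip(shifted, diff)) / denom
    rss = sum((s - a * d) ** 2 for s, d in zip(shifted, diff))
    return a, rss


# Hypothetical f4 vectors; the "target" is constructed as 70% source1 + 30% source2.
s1 = [0.01, -0.02, 0.03]
s2 = [-0.01, 0.0, 0.02]
target = [0.7 * u + 0.3 * v for u, v in zip(s1, s2)]
weight, rss = fit_two_way(target, s1, s2)
print(f"a = {weight:.3f}, 1 - a = {1 - weight:.3f}, RSS = {rss:.2e}")
```

A fitted weight below 0 or above 1 corresponds to what qpAdm flags as an infeasible model, and sources whose f4 vectors are nearly identical make the denominator small, which is one way large standard errors arise.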

We highlight higher-confidence models based on p-values, first checking the fit of 3-stream models consisting of Chalcolithic Iranians, Saka, and Sintashta-MLBA.

The following outgroups (pright) were used throughout; they produced the highest-confidence qpAdm models for Kurds:

Mbuti
ShamankaEN
Karitiana
Ust_Ishim
WHG
West_Siberia_N
Anatolia_N
Ganj_Dareh_N
Onge
EHG

Table 1 shows that for 5 of the 9 tested Kurd samples, the 3-way models including Sintashta-MLBA were infeasible. However, the models become feasible for all Kurdish samples when Sintashta is dropped and Kurds are modeled as a 2-way combination of [Haji-Firuz-Chl – Saka] or [Haji-Firuz-Chl – proto-Turkic Iron Age XiongNu].

**Table 1 – QpAdm 3-ancestry model for Kurds. Models with Sintashta-MLBA as a source are feasible for only about half of the Kurdish samples.**

The p-values for the 2-ancestry-stream models consisting of Haji-Firuz-Chl – Saka or Haji-Firuz-Chl – XiongNu-IA ranged between 0.19 and 0.97 for the Kurdish samples. Thus, unlike the aforementioned models that included Sintashta-MLBA, we were not able to reject any of these models for the Kurdish samples. Standard errors were relatively low, in the single-digit percentages (Tables 2 and 3).
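The two criteria used in this section, feasibility and the p > 0.05 threshold, can be written as a small decision rule. The sketch below is hypothetical and assumes that "infeasible" means at least one negative admixture weight once weights are rounded to three decimals; with that assumption it reproduces the "infeasible" flags in the Kurd_C3 + Levant_BA qpAdm output quoted in the comment thread.

```python
def classify(p_value: float, weights: list[float], threshold: float = 0.05) -> str:
    """Label a qpAdm fit by feasibility first, then by the p-value threshold.

    Assumption: a model is infeasible when any weight, rounded to three decimals,
    is negative; printed values such as -0.000 are treated as zero.
    """
    if any(round(w, 3) < 0 for w in weights):
        return "infeasible"
    if p_value < threshold:
        return "rejected"
    return "not rejected"


# Rows copied from the Kurd_C3 + Levant_BA qpAdm output in the comment thread.
rows = [
    (0.604043, [0.775, 0.004, 0.078, 0.144]),
    (0.214111, [1.294, -0.488, 0.193, 0.001]),
    (0.0258616, [-0.000, 0.741, -0.004, 0.262]),
    (4.065e-09, [0.000, 0.737, 0.263, 0.000]),
]
for p, w in rows:
    print(p, classify(p, w))
```

Checking feasibility before the p-value mirrors how the text treats the models: an infeasible model is set aside regardless of how high its p-value is.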

**Table 2- QpAdm 2-ancestry model consisting of Haji-Firuz + Saka-TienShan**

**Fig 6 – QpAdm 2-ancestry model for Kurds consisting of Haji-Firuz + Saka-TienShan**

**Table 3 – QpAdm 2-ancestry models for Kurds consisting of Haji-Firuz + XiongNu**

**Fig 7 – QpAdm 2-ancestry model consisting of Haji-Firuz + XiongNu**

Unlike the previous 2-ancestry models consisting of Haji-Firuz and Saka, or Haji-Firuz and XiongNu, which we could not reject, we can reject a 2-ancestry model consisting of Haji-Firuz and Sintashta-MLBA, as p-values are significantly below our threshold of 0.05 for all but one of the Kurdish samples (Table 4).

**Table 4 – qpAdm 2-ancestry model consisting of Haji-Firuz-Chl & Sintashta-MLBA can be rejected for all the Kurdish samples**

Although the 2-ancestry-stream models consisting of Haji-Firuz-Chl and Sintashta-MLBA are infeasible and can be rejected, the models with Sintashta-MLBA become feasible with the addition of medieval Turkics such as Kipchaks (Table 5).

**Table 5 – 3-ancestry source models consisting of Haji-Firuz-Chl + Sintashta-MLBA + Kipchak Medieval Turkics produce good fits**

A summary of some of the highlights of this study is as follows:

There is a very high degree of genetic continuity in the Kurdistan region over the past 8000 years, where most Kurdish samples in this study can be modeled as over 70% Haji-Firuz-Chalcolithic (HF-Chl) with a high degree of confidence.
In 3-ancestry stream models consisting of HF-Chl, TienShan-Saka, and Sintashta-MLBA, there is evidence of less than 10% Sintashta genetic input in about half of the Kurdish samples.
Simple 2-ancestry stream models consisting of only HF-Chl and Sintashta-MLBA can be rejected for all the Kurdish samples, as p-values are significantly below the p=0.05 threshold.
There has been substantial introgression of East Eurasian DNA into present-day Kurds since their 8000 year old Iranian farmer forefathers.
There is evidence of substantial hybridization between the Chalcolithic Iranian ancestors of Kurds and Iron Age Central Asian Saka, and subsequently Turkic populations, which had a substantial impact on present-day Kurdish ethnogenesis. Simple 2-ancestry-stream models consisting of either HF-Chl and TienShan-Saka, or HF-Chl and proto-Turkic XiongNu-IA, cannot be rejected for any of the Kurdish samples, as their p-values are well above the p=0.05 threshold: the average p-value across all Kurds is 0.66 for the HF-Chl – Saka models and 0.53 for the HF-Chl – XiongNu models.
The models with Sintashta-MLBA as a source are feasible only when Turkics are added as an additional source.

UPDATE

It appears likely that early Indo-Iranians of the Bronze Age Eurasian steppe, such as the pastoralist Sintashta culture, came into contact with the agriculturalists of the Bactria Margiana Archaeological Complex (BMAC), who were themselves derived from a mix of earlier agriculturalists who had moved east from Iran, West Siberian Neolithic hunter-gatherers, and Ancestral South Asians (Narasimhan et al, 2018).

On the linguistic side, this interaction is supported by traces of prehistoric non-Indo-European loanwords in Indo-Iranian (Damgaard et al, 2018, Linguistic Supplement). Some of these loanwords were retained by the Iranian branch after Indo-Iranian split into Iranian and Indo-Aryan. Such words include Hushtra (camel) and kara (donkey).

We investigated whether any evidence of this interaction between the early Indo-Iranians and the BMAC could be detected in the genetic substructure of various West Asian populations using formal stats (qpAdm), by attempting to model West Asian populations as a 4-way mix of:

Haji-Firuz-Chl
BMAC (Gonur-BA, Turkmenistan)
Sintashta-MLBA
Turkic (Medieval Kipchaks)

The following outgroups were used:

Mbuti
ShamankaEN
Karitiana
Ust_Ishim
WHG
West_Siberia_N
Anatolia_N
Ganj_Dareh_N
Onge
EHG

Of the West Asian populations tested, only the Kurds showed evidence of BMAC ancestry. This indicates greater Central Asian geneflow into Kurds than into the other populations studied, and likely explains the differential Ancestral South Asian (ASI) shift of Kurds compared to the other West Asian populations studied. The Kurdish samples consisted of 2 Kurmanji Kurd samples from the Kurdistan region of Iraq and 4 Feyli Kurd samples.

Four populations (Georgian Jews, Bedouin, Jordanians, and Karakalpaks) produced infeasible results for both the full-rank model and all the nested models.

**Table 6 – qpAdm 4 ancestry models for various West Asian populations.**

**Fig 8 – qpAdm 4-ancestry models for various West Asian populations. Only Kurds produced evidence of BMAC admixture, indicating greater Central Asian geneflow. This likely also explains the differential Ancestral South Asian (ASI) shift of Kurds compared to the other West Asian populations studied.**

The following are the individual results for the Kurdish samples studied:

**Table 8 – Individual Kurdish sample results**

**Fig 9 – Individual Kurdish sample results**

REFERENCES

The first horse herders and the impact of early Bronze Age steppe expansions into Asia, Damgaard et al, 2018.
The Genomic Formation of South and Central Asia, Narasimhan et al, 2018.
The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, Martiniano et al, 2017.
The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe, Olalde et al, 2017.
Genome-wide patterns of selection in 230 ancient Eurasians, Mathieson et al, 2015.
ADMIXTOOLS, Reich Lab, https://reich.hms.harvard.edu/software.
Ancient Admixture in Human History, Patterson et al, 2012.

31 thoughts on “Impact of the Iron Age Saka and Scythians on the demography of Kurds”

Rudy
August 12, 2018 at 2:02 am

Dilawer,

Excellent article. Makes sense with Kurds picking up Saka related ancestry from central Asia, since your dstats in fig 2-4 clearly show that Kurds picked up significant east Asian related over the past 8000 years since Haji Firuz, and that E Asian type ancestry could not have come with MLBA such as Sintashta.

Central Asian Saka and Turkics would be the logical source for this east Asian ancestry, so it would be those same populations that would be the source for steppe related ancestry which is confused with MLBA Sintashta related by some.

Have you by chance looked at a parallel situation for south Asians?

Rudy
1. Dilawer Khan
  August 12, 2018 at 2:57 am
  
  Thanks Rudy.
  
  I am actually in the midst of an in-depth analysis of S Asians. Yes, I do see parallels, where many NW S Asians are best modeled using qpAdm as Iron-Age Swat + Saka, with some needing additional AASI.
  
  Some S Asians have received the majority of their E Eurasian from AASI, whereas others have received it from Saka related. QpAdm, which uses outgroups, is able to distinguish the character of this E Eurasian in S Asians, and we clearly see better fits for Swat-IA + Saka than Swat-IA + Steppe-MLBA for the majority of NW S Asians.
  
  However, there are amateurs and lay people on various forums and blogs that attempt to model S Asians using much less suitable methods, for example ADMIXTURE, PCA, and PCA based oracles. The problem here is that these methods don’t discount very old common alleles as they don’t use outgroups. Additionally, those methods don’t output standard errors or adequately grade the fits. Thus MLBA gets overplayed quite a bit.
  
  Thus a model showing a NW S Asian as Swat-IA + Sintashta-MLBA is not as good as Swat-IA + Saka when analyzed at depth using formal statistics.
  
  I expect to have an article on this in the next couple of months…
Rudy
August 12, 2018 at 4:01 am

Thanks, looking forward to it. Btw I found the detailed explanation on qpAdm invaluable.
REAL
August 13, 2018 at 8:37 pm

Those ancestry models (ie. 20% XiongnuWE) do not look accurate at all. Kurds have 0-2% East Eurasian ancestry on average while the Xiongnu sample has ~40% East Eurasian ancestry.
1. Dilawer Khan
  August 13, 2018 at 10:12 pm
  
  This is not an amateur or layman forum where anything flies. Only posts with substance from scientists are welcome here.
  
  Your post lacks substance for the following reasons:
  
  1- You don’t specify a baseline population from which E Eurasian is measured. Are you measuring 2% from a present-day Bedouin population, 2% E Eurasian above a Georgian population, or 2% above some ancient population? You need to specify the baseline population, otherwise your post makes no sense.
  
  It’s analogous to saying the temperature today is 40 deg C without specifying what 0 deg C is defined as. If, for example, 0 deg C were defined as the freezing/melting point of iodine instead of that of water, then the temperature would not be 40 deg. It may be more like -73 deg C.
  
  As mentioned in the article, the 20% XiongNu for Kurds is measured relative to their 8000 year old Chalcolithic Iranian ancestors using qpAdm. If a different baseline other than Iran-Chl is used then this will change.
  
  2- You don’t specify what software is used to arrive at your number and what the methodology was.
  
  Although I don’t advocate using ADMIXTURE for introgression calculations, for various reasons, here is what ADMIXTURE in supervised mode shows for one of the Kurdish samples. I have listed how many samples were used for each population, because with ADMIXTURE it is extremely important to keep the number of samples for each population similar. For example, when I used 15 Sintashta samples, the Sintashta % jumped to 40 or 50%. This is one of a couple of reasons why some ADMIXTURE or PCA runs out there show inflated Sintashta shared drift: the Sintashta and MLBA samples outnumber the other population samples in the run.
  
  POP Kurd-C3
  Iran-Chl & Haji-Firuz (9 samples) 47.44%
  Mongol (6 samples) 26.49%
  Onge (7 samples ) 7.36%
  Sintashta_MLBA (6 samples) 18.71%
Dilawer Khan
August 13, 2018 at 10:20 pm

@ REAL

You have to be prepared to answer 1 & 2 above and not keep spamming your original post; otherwise your posts will be removed.
Dilawer Khan
August 15, 2018 at 3:52 am

Although REAL’s question lacked substance, he did raise something that I’m sure confuses the millions of people who have tested with ancestry or genealogy service companies, namely how to interpret ancestry percentages, such as 2% E Asian from an admixture test.

Since there do not appear to be any online resources that address the detailed interpretation of individual results, I have decided to dedicate an article with case examples to this matter to clear up the confusion.

Stay tuned for “How to correctly interpret your admixture results”, coming soon.
REAL
August 16, 2018 at 3:02 pm

How can I start a counter-argument if you keep removing my comments?

“Only posts with substance from scientists are welcome here.”

I think this is a fancy way of saying “I remove any comment that questions my post.”

So be it. 🙂
1. Dilawer Khan
  August 16, 2018 at 3:07 pm
  
  Easy, don’t keep repeating your original post, and answer questions 1 and 2 from my reply and you’ll be fine.
  
  I’ll help you out a little: “baseline” is equivalent to “the exclusion of”. If 2% is based on some admixture test that has all sorts of present-day west Asian references, then the 2% E Asian represents a Kurdish subject score to the exclusion of all the modern Caucasian and Middle Eastern references. Think of the 2% as above and beyond what the Caucasus and other W Asian references have.
  
  In reality those Caucasus and other W Asian references themselves have E Asian derived alleles which they have accumulated over the past 2500 years due to various admixture pulses originating in C Asia.
  
  To get TOTAL accumulated alleles derived in E Asians shared with Kurds, W Asians or Europeans, one has to use references such as Neolithic W Asian/European farmers who predated all those admixture events from populations originating in C Asia. They represent a better baseline than modern Caucasian and W Asian references, who are themselves E Eurasian admixed.
  
  I’ll have some examples to illustrate what I just discussed in the near future
Kaye
September 5, 2018 at 2:17 pm

Really very interesting!

Is there a way to determine whether Kurds are more likely a 3-way mix of Haji-Firuz + Saka + Xiongnu Proto-Turkic rather than a two-way mix of either Haji Firuz + Saka, or Haji Firuz + Xiongnu Proto-Turkic? If a 3-way mix is preferred, is there a way to determine how much each of Iron Age Scythian and Proto-Turkic contributed, or at least whether one contributed more than the other? If 2-way mixes are preferred to the 3-way, does the higher p-value for Saka necessarily imply they are a more likely source than Xiongnu Proto-Turkic, or can there be more complex explanations?
1. Dilawer Khan
  September 6, 2018 at 4:30 am
  
  Hi Kaye,
  
  It’s hard to get good fits with qpAdm using 2 source populations, such as XiongNu-W and Saka TienShan, that share a lot of genetic drift with each other. The standard errors start blowing up.
  
  The large standard errors indicate we are still missing aDNA which carries important phylogenetic information not present in the current outgroups used.
  
  If we keep in mind that the p-value is the probability of seeing a discrepancy between observed and expected values at least as large as the one observed, assuming the mixture model (the null hypothesis) is true, then higher p-values confer greater confidence in the mixture model. However, the number of samples in each source population also affects p-values. Thus smaller differences in p-values are not significant.
  
  Substituting Sintashta-MLBA for Saka-TienShan, and using a Turkic population as another source, does yield decent mixture models. For example, here are some results for a northern Iraqi Kurd:
  
  left pops:
  .Kurd_C3
  Hajji_Firuz_C
  Sintashta_MLBA
  XiongNu_WE
  
  best coefficients: 0.768 0.074 0.158
  Jackknife mean: 0.766252249 0.075813266 0.157934485
  std. errors: 0.056 0.066 0.033
  
  fixed pat wt dof chisq tail prob
  000 0 7 3.190 0.866861 0.768 0.074 0.158
  
  left pops:
  .Kurd_C3
  Hajji_Firuz_C
  Sintashta_MLBA
  Karakhanid
  
  best coefficients: 0.734 0.118 0.148
  Jackknife mean: 0.731962090 0.120123734 0.147914176
  std. errors: 0.068 0.073 0.038
  
  fixed pat wt dof chisq tail prob
  000 0 7 3.494 0.835854 0.734 0.118 0.148
  
  left pops:
  .Kurd_C3
  Hajji_Firuz_C
  Sintashta_MLBA
  Mongola
  
  best coefficients: 0.732 0.182 0.086
  Jackknife mean: 0.730851390 0.183284876 0.085863735
  std. errors: 0.053 0.055 0.016
  
  fixed pat wt dof chisq tail prob
  000 0 7 4.513 0.719194 0.732 0.182 0.086
  
  left pops:
  .Kurd_C3
  Hajji_Firuz_C
  Sintashta_MLBA
  Kiptjak
  
  best coefficients: 0.699 0.135 0.166
  Jackknife mean: 0.697829856 0.136188917 0.165981227
  std. errors: 0.055 0.059 0.033
  
  fixed pat wt dof chisq tail prob
  000 0 7 4.308 0.743693 0.699 0.135 0.166
Luke
September 6, 2018 at 1:16 am

That’s pretty interesting. It might imply that the spread of Iranian to Western Iranic groups was overall much later (Iron Age) OR that the genetics associated with the initial spread of Iranian to West Asia were very diluted or represented a very small core group and that most of the steppe ancestry Kurds have is actually from later groups that didn’t change the linguistics of the area.

What happens when you add some other references too that can conceivably have contributed to modern Kurds via intermediate populations, like Bronze Age Levant? Does it change anything about the proportions of those other two?
1. Dilawer Khan
  September 6, 2018 at 4:31 am
  
  Hi Luke,
  
  Adding Levant-BA as an additional source to the 3 sources shown in my reply to Kaye above yielded infeasible results for all the combinations except this one:
  
  left pops:
  .Kurd_C3
  Hajji_Firuz_C
  Levant_BA
  Sintashta_MLBA
  Karakhanid
  
  best coefficients: 0.775 0.004 0.078 0.144
  Jackknife mean: 0.748957678 0.027387809 0.076529009 0.147125504
  std. errors: 0.254 0.248 0.089 0.055
  
  fixed pat  wt  dof     chisq   tail prob
       0000   0    7     5.460   0.604043      0.775  0.004  0.078  0.144
       0001   1    8    10.786   0.214111      1.294 -0.488  0.193  0.001  infeasible
       0010   1    8     6.350   0.608144      0.739  0.092  0.000  0.169
       0100   1    8     5.495   0.703556      0.774  0.000  0.083  0.143
       1000   1    8    17.438   0.0258616    -0.000  0.741 -0.004  0.262  infeasible
       0011   2    9    14.242   0.113981      1.621 -0.621 -0.000  0.000  infeasible
       0101   2    9    16.477   0.0575645     0.839 -0.000  0.161  0.000
       0110   2    9     6.953   0.641975      0.841 -0.000  0.000  0.159
       1001   2    9    57.493   4.065e-09     0.000  0.737  0.263  0.000
       1010   2    9    18.230   0.0325927     0.000  0.740  0.000  0.260
       1100   2    9   140.093   9.8713e-26   -0.000 -0.000  0.772  0.228
       0111   3   10    20.328   0.0262969     1.000 -0.000  0.000  0.000
       1011   3   10    67.788   1.18328e-10   0.000  1.000  0.000  0.000
       1101   3   10   165.706   2.1456e-30    0.000 -0.000  1.000  0.000
       1110   3   10   236.447   0             0.000 -0.000  0.000  1.000
  best pat: 0000  0.604043   -  -
  best pat: 0100  0.703556   chi(nested): 0.036   p-value for nested model: 0.850364
  best pat: 0110  0.641975   chi(nested): 1.458   p-value for nested model: 0.227245
  best pat: 0111  0.0262969  chi(nested): 13.375  p-value for nested model: 0.000255022
  
  which essentially says that 0% Levant-BA is needed when Haji-Firuz-Chl is already as source. This in combination with the increased standard errors of 25% associated with both Levant-BA and Haji-Firuz indicates to me that Levant-BA is itself substantially admixed with something related to Haji-Firuz.
  
  You’ll also note that by adding Levant-BA as a source, Karakhanid remains unchanged at about 14%, with Sintashta-MLBA decreasing from 11% to about 8%.
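The "chi(nested)" and "p-value for nested model" lines in the output above can be read as a chi-square difference test between successive best patterns (0000, then 0100, 0110, 0111), each step fixing one more source weight at zero. Assuming that reading, which matches the printed numbers up to the three-decimal rounding of chisq, the hypothetical helper `nested_p` reproduces them with the standard library only.

```python
import math


def nested_p(chisq_full: float, dof_full: int, chisq_nested: float, dof_nested: int):
    """Chi-square difference test between a qpAdm model and a nested one.

    The nested model fixes one more source weight at zero, gaining one dof. If that
    source is not needed, the increase in chisq follows a chi-square distribution
    with 1 dof, whose upper tail is erfc(sqrt(x / 2)).
    """
    if dof_nested - dof_full != 1:
        raise ValueError("this sketch only handles models that differ by one dof")
    diff = chisq_nested - chisq_full
    return diff, math.erfc(math.sqrt(max(diff, 0.0) / 2.0))


# (chisq, dof) of the successive best patterns 0000 -> 0100 -> 0110 -> 0111.
steps = [(5.460, 7), (5.495, 8), (6.953, 9), (20.328, 10)]
for (c0, d0), (c1, d1) in zip(steps, steps[1:]):
    diff, p = nested_p(c0, d0, c1, d1)
    print(f"chi(nested) = {diff:.3f}, p-value for nested model = {p:.6f}")
```

The first step, which drops Levant_BA, barely changes the fit (p about 0.85), which is the formal counterpart of the statement that 0% Levant-BA is needed; the last step, which leaves Hajji_Firuz_C as the only source, is clearly rejected.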
  1. Luke
    September 8, 2018 at 10:57 am
    
    Thanks for running this. Levant_BA is definitely mixed with something related to Haji-Firuz, probably the kind of Iran_N (well, Iran_CA) ancestry that entered the Levant post Levant_N.
    
    But man, this experiment of yours is really interesting… Along with the recent South Asia paper, it opens up some questions about the Indo-Europeanization of West and South Asia and makes it out to be an interesting and complex process. I probably shouldn’t be surprised, though; so far we keep seeing relatively small (smaller than present-day) amounts of steppe ancestry in ancient populations, with the exception of the North European Plain and directly adjacent regions, where the effects of Corded Ware and Bell Beaker were rather impactful, and even there complex, dynamic processes seem to emerge after those large horizons. It’s good some qualified bloggers are focusing on regions other than mostly just Europe lol.
    1. Dilawer Khan
      September 8, 2018 at 5:21 pm
      
      Yes Luke, qpAdm indeed indicates Levant-BA to be mixed with something related to HF.
      
      The Indo-Europeanization of West and South Asia is indeed a complex process. At this point in time genetics alone can’t resolve all the questions. Archeological context becomes important…
      1. Rob
        September 29, 2018 at 4:05 am
        
        I agree, Kurd. Context is everything. Many amateur bloggers, like Eurogenes and those on Anthrogenica, have little knowledge of the subject matter, but have taught themselves qpAdm and now deem themselves proficiently educated in Eurasian prehistory, without taking the time to actually become so.
        The fact is- these people, and some of the academics, don’t even have a proper grasp of Europe, let alone something as complex as Asia.
      2. Dilawer Khan
        September 29, 2018 at 9:58 am
        
        Indeed, Rob, sad but true… Generally speaking, I would say probably 95 to 100% of the users at forums are neither scientists nor serious researchers, judging by the material posted and responses given, depending on the forum. Anthrogenica may be a little better than some of the other forums out there where genetics and anthropology related material is discussed, but it is still a forum nonetheless.
        
        I do communicate and exchange ideas with leading scientists in the field all the time. They have pretty much all conveyed to me that they don’t like posting comments on public forums, and understandably so. I don’t either, except at Anthrogenica sometimes. If I had to pull some numbers out of a hat based on the responses to my posts, I would say that about 5%-10% of the users at AG truly comprehend SOME of my substantial posts, and just about 0% of the users there truly grasp EVERYTHING I post…
Kaye
September 7, 2018 at 5:05 pm

Thanks for explaining. Because you’ve now found a 3-way mix of Sintashta-MLBA + Turkic feasible for Kurd_C3, is it worth repeating Table 1 for all your Kurd samples, but using Turkic sources in place of Saka-TienShan this time? If you don’t have time for that, is it worth checking the Sintashta-MLBA + Turkic combination just for the rows in Table 1 that are currently marked infeasible (rows 2-4, 6 and 9)? If all Kurd samples of Table 1, or at the very least the rows currently infeasible with Sintashta + Saka, come out as feasible using Sintashta + Turkic instead, maybe that makes Sintashta-MLBA relevant again as one of the fixed admixture sources for Kurds besides Haji-Firuz-Chl? Then the variable factor could be whether the 3rd admixture source might be Turkic for some Kurds, versus Saka for others, or a combination for some.
1. Dilawer Khan
  September 8, 2018 at 2:38 am
  
  Good points, Kaye. I repeated table 1 with Medieval Turkics replacing Saka-TienShan. Although the straight 2-ancestry models consisting of Haji-Firuz-Chl & Sintashta-MLBA were infeasible, the models produced good fits for all Kurds with the addition of the Turkic admixture layer (table 5).
  
  It is thus difficult to determine with any certainty what the contribution of Saka vs Turkics is to Kurd ethnogenesis.
  
  In a straight 2-ancestry model, there is a preference for Haji-Firuz & Saka over Haji-Firuz & Turkics likely due to the higher steppe MLBA admixture in Saka.
Goga
September 8, 2018 at 10:28 pm

Thank you for your work.

Kurds, as Aryan people, are mostly native to their homeland, Kurdistan. Most of the non-Kurdish gene flow came from South-Central Asia with the Parthians & Central Asian Saka Scythians. Other gene flow was from the Northern Caucasus and came with the Cimmerians and European Scythians.

Of course there is some gene flow from ancient (Chaldeans, Babylonians) and modern Semitic people in Kurdistan, but it is not significant enough to mention.
1. Dilawer Khan
  September 9, 2018 at 2:21 am
  
  Fair enough, Goga. Let’s also not forget Turkic influence, as the dstats in fig 4 show that modern Kurds are differentially much more Mongol-shifted than their Chalcolithic Iranian ancestors. The Seljuk and Ottoman empires probably also affected Kurds genetically.
Goga
September 9, 2018 at 10:34 am

I believe that a lot of the so-called Turkic ancestry was actually brought by the late Iranian tribes who lived outside Kurdistan. The rest could be from the Golden Horde Mongols and the Seljuks. Those people somehow influenced Kurdish DNA a little bit, but they were never able to change the Kurdish Iranian bloodline significantly.
Goga
September 9, 2018 at 10:41 am

Kurdistan is the land of the mountains. It is difficult to change DNA of people in the mountains if they don’t want to mix.

When Kurds felt some threat they simply retreated into the mountains and isolated themselves from the rest.

The Zagros Mountains are the Kurds’ best friends, since the Zagros saved Kurdish identity and are still protecting Kurdish ethnicity.
Dilawer Khan
September 18, 2018 at 12:41 am

@ Goga

Your comments were removed because parts were inflammatory and accusatory and devoid of science.

Also, let me make this clear to you. The results shown here are not from my calculators. They are from qpAdm which is part of ADMIXTOOLS. If you have a gripe with qpAdm then I suggest you take it up with Nick Patterson.

If you need the full output for any of the samples, let me know and I will provide it.

It seems that much of what you post is based on guesswork or intuition. Sometimes genetic analysis results are not in line with our intuition. When this happens, try to resist making wild attacks against methods such as qpAdm or throwing accusations, because this is not scientific. Remember that there is no written record for 4000 years documenting every detail of Kurdish ethnogenesis, and paper trails only go back a couple of hundred years.

The results you see are qpAdm outputs, and can be duplicated by other researchers.

BTW, I’ve been personally acquainted with the Kurdish samples for many years. I have visited their families in Dohuk. All four of their grandparents are Kurds and they are from Dohuk.
1. Goga
  September 18, 2018 at 2:02 am
  
  My posts are based on my own DNA. I trust my own samples the most, and since I’m an Ezdi Kurd (Kurmanji) and I’m not mixed at all, I do consider myself very close to ancient proto-Kurds.
  
  All I’m sure about is that Kurds are just native to their own homeland, Kurdistan. Nobody on this planet is more native to Kurdistan than Kurds themselves. From my own observation, Kurds who are the least mixed are also the most native to Kurdistan. My own DNA (very pure Kurmanji Kurdish) doesn’t follow your conclusions. I don’t have nearly as much non-Kurdish East Asian DNA as you are claiming here.
  
  Nobody on this planet is more Northwest Iranic than Kurds, and nobody on this planet is more native to Kurdistan than my people, including myself.
  
  I’m out.
Goga
September 18, 2018 at 11:46 am

According to this academic paper from less than a year ago, Ezdi Kurds (the purest Kurmanji) are the most native people of Northern Mesopotamia.

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0187408
Goga
September 21, 2018 at 5:52 pm

Kurds are Aryan people. Kurds are the uncontested and direct descendants of the Aryan Medes. Our Aryan history is more than 10000 years old and it is native to our Aryan homeland, Kurdistan. Kurds and their ancient native Aryan homeland are inseparable. Nobody is more native to the Taurus-Zagros Mountains than Kurds themselves. Kurds are the most native people of the Aryan lands of Northern Mesopotamia, the Garden of Eden. The prehistoric ancestors of the Kurds are arguably the most influential people in regard to the civilisation of humankind.

“Oldest village in al-Jazeera region, its association with Kurdish history”

http://www.hawarnews.com/en/haber/oldest-village-in-al-jazeera-region-its-association-with-kurdish-history-h3886.html

No matter how genetic calculators are tweaked, they will never change the Kurdish DNA. Kurdish Aryan DNA is a reality and the fact is that it is just native to Kurdistan.

Of course there is a continuous gene flow from all parts of the world into Kurdistan, but they are minor and can’t change the race of the Kurds. Kurds are still Kurds and still very native to their Aryan homeland.

Like I said earlier, most of the East Asian aDNA in Kurdistan was brought by the Parthians and Saka. Some of it was brought by the Turkic tribes, just as some of the Afro-Asiatic (Semitic) aDNA was brought by the Semites. But Kurds are mostly evolved from the Medes.

Kurds = 70% Medes + 20% Parthians

Medes = native to Kurdistan
Parthians = 80% Medes + 20% Saka

Parthians lived in Kurdistan and that is a well-known fact. The Parthians fought even against the Roman Empire.

People who come up with wild and fake fantasies don’t have any knowledge of Aryan Kurdish history. Or they do it deliberately to make Kurds less native to Kurdistan.

No matter what they claim, Kurds are still the most native people of Kurdistan. It is in our DNA.

Sintashta-MLBA in Kurds is big nonsense. Kurds as Aryan people have nothing to do with Sintashta at all. Sintashta-MLBA were NOT Aryan people at all; they have nothing to do with the Aryan civilizations in Northern Mesopotamia.
Goga
September 29, 2018 at 9:25 pm

Maybe some methodology is somehow difficult to grasp, but conclusions are very easy to consume.

And some conclusions don’t make any sense, and I will tell you why. Until 1000 years ago, Northern Mesopotamia (Kurdistan), the Southern Caucasus and Anatolia were Turk-free. Turks started to arrive in West Asia only 1000 years ago.

Let’s compare Azeri to Kurds. Before the Turks, Azeri were very similar to Kurds. Later on, Azeri were Turkified. What I don’t understand from your conclusion is why Azeri don’t have any BMAC auDNA while Kurds do have some BMAC ancestry. The ancient Azeri of West Asia, as people of Media Atropatene, were Iranian people and shared Median ancestry with the Kurds. Nowadays Kurds are the modern Medes who were native to Kurdistan. According to your stats, Azeri don’t have any BMAC ancestry. So that means that the Northwest Iranian (Aryan) Medes didn’t have any BMAC ancestry.
So the Medes, as pure Aryan people, the people from whom Kurds got their Northwest Iranian language, were not from South-Central Asia at all, because Azeri don’t have BMAC auDNA and Azeri should have some Median ancestry.

That means that BMAC ancestry in Kurdistan came AFTER the Medes. The most likely explanation is the Parthians, who themselves were largely of Median ancestry. Those Parthians mixed a lot in South-Central Asia with the Saka (Scythians). And those Scythians were heavily mixed with the Mongoloid people in Central Asia.

So, the conclusion is that most of the Turkic/Mongoloid auDNA in Kurdistan was brought by the Parthians, who assimilated the Saka Scythians. But it was not that much. And Kurdish language, ethnicity, religion, culture etc. were already largely contributed by the Medes.
Also, that Sintashta-MLBA-type ancestry has to be from after the Median (Aryan) era. That Sintashta-MLBA-type ancestry can be explained by the European Scythians, Cimmerians etc. And it is a very artificial component, since the Sintashta-MLBA component was already very mixed with a lot of Iran_Neo ancestry.
1. Dilawer Khan
  September 30, 2018 at 1:34 am
  
  Perhaps because those Azeri samples are from Azerbaijan and not Iran. No doubt Iranian Azeris are genetically one of Kurds’ closest relatives.
  
  The analysis seems to indicate that Kurds have a layer of IA Saka/Scythian ancestry and another layer of more recent Turkic mediated by perhaps Seljuks, Ottomans, and Turkmen. The Turkic evidence shows up in the dstats in fig 4 where Kurds are very Mongolian shifted to the exclusion of their Chalcolithic Iranian ancestors.
  
  With regards to the older Sintashta type, Saka/Scythians would have mediated some of it to Kurds.
Goga
September 30, 2018 at 11:12 pm

I don’t understand what we are arguing about. It is less important and smaller than it has been made out to be.

We are not inventing any wheel here.

Of course there is some gene flow from Mongoloid people into Kurdistan during recent times. But it is not much more important than the Afro-Asiatic (Assyrian etc.) gene flow. A very, very thin layer of 3-5% of additional Turkic auDNA from recent times. The contribution of those people is practically invisible in Kurdistan.
They are not really relevant to 9000 years of Kurdish history; they contributed nothing to Kurdish history, and therefore they are irrelevant.

What matters is that the Northwest Iranian proto-Kurdic ancestors of the Kurds (Medes) had nothing to do with South-Central Asia in the first place and were just native to Kurdistan. Azeri auDNA doesn’t show any BMAC ancestry, while there should be some Median ancestry left in Azeri after the Turkification of that people. Therefore we can almost certainly assume that the Kurds’ direct Aryan Median ancestors had nothing to do with the BMAC etc. at all.

Conclusion:

a) recent Turco-Mongoloid gene flow into Kurdistan is minor and not really important. 3-5% of that auDNA is not really relevant. It is as irrelevant as the Afro-Asiatic (Semitic/Assyrian) gene flow.

b) Aryan Medes were just native to Kurdistan, without a (South) Central Asian auDNA component.
Aryan Medes were most likely like the modern Kurds: a mixture of Anatolian and Iranian Plateau auDNA.
Goga
October 4, 2018 at 11:44 am

Aha, as you can see, Eastern Scythians had a lot of ‘Mongoloid’ auDNA.

So this is our evidence that most of the East Asian auDNA in Kurdistan was brought by the Saka (Scythians).

Also, the so-called ‘Cimmerians’ were full of Northeast Asian auDNA.

http://advances.sciencemag.org/content/advances/4/10/eaat4457.full.pdf