Post Iron Age Introduction of Y-DNA R1a R-Z94 and East Asian Ancestry into Kurdistan, North Iran, and Turkey with the Parthians and Scythians

Abstract

The recent publication of numerous DNA sequences from ancient humans spanning the Neolithic to the Iron-Age and Medieval time frames from the Anatolia, Caucasus, and Iran regions has greatly furthered our understanding of the formation of modern ethnic groups such as Armenians, Kurds, Persians, and Turks, which we define as our study populations.

Our detailed downstream analysis of whole genome sequencing (WGS) data along with published DNA genotype data from ancient and extant individuals, including genotype data from the recent 2022 extensive multi-national collaborative studies on the origin and dispersal of Indo-European languages by Lazaridis et al1 shows that the regions encompassed by Anatolia, the Caucasus, Iran, and Iraq underwent major demographic changes from the Bronze-Age to present. These changes include the introduction of hitherto absent paternal haplogroup lineages such as R1a-Z94 from Central Asia during the Parthian era, especially among the Kurds, as well as the introduction of Indo-Iranian languages from the ancient region of Ariana (present-day Turkmenistan and Afghanistan) and surrounds.

We also utilize a robust technique which we call “Differential Cladality Analysis” using the formal statistical software qpWave6, where we perform pairwise cladality tests between one of our study populations and various ancient and extant Eurasian populations. This allows us to determine which ancient populations have genetically contributed to one or more of our four study populations to the exclusion of our other study populations on a relatively more recent time scale.

INTRODUCTION

The recent 2022 extensive multi-national collaborative study and multiple publications on the origin and dispersal of Indo-European languages by Lazaridis et al 1 showed that proto Indo-European languages and Yamnaya’s paternal ancestors originated in the “Southern Arc” region of Armenia and Northern Iran. Subsequently the modified Indo-European languages dispersed from a secondary staging area on the Eurasian Steppe to Europe and Asia. Here we show that the Indo-European languages spoken in the Kurdistan region and northern Iran were further modified post Iron Age by Parthians from Central Asia (Fig 1) who hybridized with the local Iranian/proto-Kurd Iron Age populations at Hasanlu and Tepe Dinkha and introduced a new Y-DNA lineage hitherto missing in the region; R1a Haplogroup R-Z94. R-Z94 became the dominant Y-DNA lineage in Kurds from certain regions2,3,4 , however prior to the Parthian period this lineage was non-existent with a frequency of 0% in the 230 good coverage ancient Bronze and Iron Age samples from Iraq, Iran, and Turkey released by Lazaridis et al1 . Utilizing formal statistics (qpWave and qpAdm and a solid set of pright references) we identify the genetic profile of these Central Asian Iron Age populations who hybridized with the local Iranian Bronze Age and Late Iron Age to form present day Kurds and Persians.

The Lazaridis et al Southern Arc study published 230 new ancient male samples covering Armenia, Iran, Iraq, and Turkey from the Chalcolithic to around 600 years ago. Yet out of the 230 male samples only two recent 1300 year old samples from Turkey were positive for Y-DNA haplogroup R-M417, however according to Grugni et al2 Y-DNA R1a1a R-M17 is a major haplogroup with a frequency of 20% among Kurds (Fig2 ). In fact, a glance at Fig 2 shows that Kurds have the highest frequency of R1a1a amongst all West Asian populations including western Iran and this frequency rivals that of Persians on the border of Afghanistan in Khorasan province. Additional downstream tests have shown that R-Z94 is by far the major R1a1a subclade for Kurds. Since the 230 ancient samples from Kurdistan, Iran, and Turkey, published with the Lazaridis et al “Southern Arc” international study were negative for R1a, this is evidence that R-M417 and R-Z94 was introduced into the Kurdistan regions of Iran, Iraq and Turkey and Northern Iran from the east (Central Asia) post Iron-Age with the most likely vectors being Parthian and Scythian militaristic waves (Fig 1).

This is consistent with the historical, archeological and linguistic connections between certain Iranics such as Kurds and Parthians. This also means that since R1a R-Z94 is a major haplogroup among present day Kurds and was absent in the Kurdistan and Turkey regions from the Chalcolithic to the end of the Iron-Age, that the present day Kurd genetic identity was born post Iron-Age when Central Age Parthians hybridized with pre-R1a populations in the area to form present-day Kurds, and the Hasanlu-IA and Dinkha Tepe populations can be considered proto-Kurds or Persians at best. Our qpAdm models for present day Kurds as DinkhaTepe-A/Hasanlu-IA + Central Asian R-Z94 were extremely robust and could not be rejected even with increasing pright references to 15 Neolithic through Iron-Age diverse Eurasian references. This of course makes sense and is consistent with the aforementioned.

QpAdm models and qpWave 2 source cladeliness analysis also unequivocally show that East Asian admixture in Kurdistan, Iran and Turkey significantly increased with the arrival of R1a and Turkics post Iron-Age.

We identify several Iron Age good coverage samples from Central Asia which proxy this eastern Parthian genetic input into present day Kurds and are able to produce very robust qpAdm 2 source models for Kurds as IRN-DinkaTepe-BA/Hasanlu-IA + Central-Asian-IA. Surprisingly, we are even not able to reject Kurds as 100% Tajikistan-Kushan using the same set of robust pright references. We suspect that this sample is genetically Parthian.

Here we also identify a 2000 year old good coverage DNA sample recovered from Tajikistan from the Parthian Era that forms a perfect clade with Kurds and to a lesser degree Persians using qpWave rank=0 and 17 diverse East and West Eurasian Neolithic, Chalcolithic, and Bronze Age references. Although the sample is labeled as Kushan, we believe it is likely Parthian since the 2 empires overlap in space and time

Due to the established relatively lengthy historical presence of Armenians in this region, we use them as a geographical reference population to show how Iranics such as Kurds and Persians are genetically shifted to the north and east of Armenia, to wit, Central Asia and the Caucasus, and thus the homeland of a substantial proportion of their ancestors is the Parthia Central Asia region which includes Aryana. The Caucasus region includes Sarmatia.

We find that there is greater genetic continuity since the Iron Age with Armenians and Kurds, since using qpAdm they are relatively easy to model with 2 source populations whereas Persians and Turks appear to have a more complex demography and are thus more difficult to model with only 2 source populations.

Using the recently published samples from the “Southern Arc” we also dentify good Late Bronze Age and later source populations for Armenians, Kurds, Persians and Turks, which when combined with source populations from Uzbekistan, Tajikistan, Xinjiang and the Caucasus produce very robust 2 source models especially for Kurds (Fig 7).

Fig 1 – Out of the 230 good coverage male ancient samples from Iran, Iraq and Turkey published in the Lazaridis et al “Southern Arc” paper covering the Chalcolithic to around 1500 years ago only 2 were positive for Y-DNA R1a (M417); 1300 year old sample I4475 TUR-SE-Mardin-RomByz and 1800 year old sample I4531 TUR-BlackSea-Samsun-Anc, both from Turkey. Thus 0% of the Iron Age samples were positive for R1a. By contrast, R-Z94 (R1a) is the predominant Y-DNA haplogroup in present day Kurds from certain regions2,3,4 . This is strong evidence that the Parthian and Scythian and possibly Sarmatian and Cimmerian ancestors of present day Kurds and Persians introduced this haplogroup to Western Iran within the last 2000 years from Central Asia and the North Caucasus. A significant increase in East Asian ancestry also accompanied the introduction of Y-DNA R-Z94 in the Kurdistan and North Iran regions according to our qpAdm models. In fact we see evidence that East Asian ancestry accompanied the introduction of R-Z94 as both of the aforementioned samples have East Asian ancestry.
Fig 2 – The ancient samples published in Lazaridis et al1 showed a lack of Y-DNA R1a in Iraq, Iran and Turkey from the Chalcolithic through the Iron-Age. However, R1a’s Z94 subclade is a major haplogroup among present day Kurds2,3,4. This is strong evidence that the Kurd ancestors, to wit, Parthians and Scythians and others from Central Asia introduced R-Z94 to Kurdistan, Iran, and Turkey post Iron-Age. This was accompanied by an increase in East Asian ancestry in Kurdistan, Iran and Turkey post Iron-Age. This can be attributed to the Parthians, Scythian and Turkic waves from Central Asia post Iron-Age.

A couple of years ago we were excited to discover a high degree of genetic continuity in Kurdistan from 2700 years ago with the release of the Mede era Hasanlu-IA sample where present day Kurds could be modeled as 80% with this sample. Amazingly, however, we now detail a 2000 year old good coverage DNA sample from Tajikistan from the Parthian period that forms a significantly tighter clade with Kurds than Hasanlu-IA with Kurds. This sample is labeled a Kushan sample, however, we believe it is likely Parthian since the 2 empires overlap in space and time. We name this Parthian-LIA-Tajikistan. Kurds can be modeled as 100% Parthian-LIA-Tajikistan using qpAdm even while increasing the number of pright reference populations to 17 diverse Neolithic, Chalcolithic, and Bronze Age West and East Eurasians.

METHODOLOGY & QUALITY CONTROL

We employ the following multi-fold strategy in our analysis here:

  1. We use the qpWave 2 source framework, and introduce a new method which we call normalized differential shift, to determine relatively more recent introgression into a population to the exclusion of its neighbors as shown in figures 15 – 20. Normalization of the results for each population mitigates pre-existing biases affecting the study samples from using different sequencing and genotyping techniques.
  2. QpWave and qpAdm are included in the Admixtools package introduced by Patterson et al6 in 2 source population (pleft) cladeliness mode, and a large number of Neolithic, Bronze-Age, and Iron-Age population references in pright to determine how much the 2 source populations form a clade with each other with regards to the diverse reference populations (Fig 3). We use the Chisq output to compare for example the cladeliness betwen DinkaTepe-B or Kazakhastan-Wusun or present day Mongols and Armenians vs Kurds vs Persians vs Turks;
  3. We tabulate the Chisq values obtained in (1) and perform differential cladeliness analysis where we calculate the cladeliness ratio of say Kurds/Armenians, Kurds/Persians and Kurds/Turks to determine which populations Kurds are closer to to the exclusion of Armenians, Persians, and Turks. This is informative to recent admixture events and migrations because in more distant past admixture scenarios these ratios should approach 1 for neighboring populations such as Armenians, Kurds, Persians, and Turks;
  4. We use qpAdm which is also a software included in the Admixtools package to determine the “Southern Arc” region present-day populations ancestry proportions using a large number of Neolithic and younger reference populations (pright) along with highest quality published samples in an effort to make all but the most robust ancestry models fail.
Fig 3 – Distance calculations from Armenians, Iranians, Kurds and Turks to ancients published in Lazaridis et al1 using the QpWave 2 source method with 16 pright Neolithic to Iron-Age reference populations. Chisq values have been normalized for each target to mitigate any bias that may exist due to different genotyping platforms for the target populations

We define our study populations as follows:

Armenians = SA

Iranians = SI

Kurds = SK

Turks =ST

We test the strength of the clade formed by our study populations, S with various ancient and extant target populations, T, with respect to right populations. Using qpWave we set pleft to a study population, S, and one target ancient or extant population, T.

We use the following robust set of 16 pright reference populations in qpWave and qpAdm to differentiate subtle differences in shared drift between the targets and references;

Mbuti

CHG

IRN_Ganj_Dareh_N

ISR_PPNB

MAR_Taforalt_EpiP

SRB_Iron_Gates_HG

MA1

TUR_Marmara_Barcın_N

WHG

RUS_Yana_RHS

MNG_Khovsgol_LBA

RUS_DevilsCave_N

RUS_Shamanka_EN

Loschbour_WHG

UZB_Sappali_Tepe_BA

RUS_Siberia_Lena_EBA

Thus we obtain a m x 4 matrix, A, of qpWave Chisq ( χ2) outputs, where each columns is one of our 4 study populations, and each row is either an ancient or contemporary test population.

A = ( χ2ij)1≤i≤m, 1≤j≤4

Thus, each of our study populations which are represented by a column in A can be shown as the following vectors:

Where each vector, S, is composed of Chisq value entries; S = [ χ12  χ22  χ32 . . . . . χm2 ] for m target populations.

We can alternatively state that each study population, S, is an m-dimensional vector. From extensive prior experience processing DNA sequences, mapping sequences to human reference genomes, and genotyping, we have determined that the following affect the analysis ready genotyped DNA sequence:

  1. SNP ascertainment bias. The SNPs that are used significantly affect analysis results
  2. Sequencing depth, for example, 1X vs 30X
  3. Paired-End vs Single-End sequencing
  4. DNA fragment read lengths
  5. Software and strategy used for mapping to reference genome
  6. Downstream sequence processing, for example, GATK vs Samtools
  7. Single vs joint genome genotyping
  8. Various quality filter thresholds used at various stages of the pipeline

Unfortunately most scientists are not aware of the impact of the aforementioned on the downstream analysis results since most scientists use analysis ready genotyped datasets, and are not themselves involved in processing the DNA sequences. After devoting many years, considerable effort and resources to understand these issues, we have become quite familiar with them.

The Armenian, Iranian, and Kayseri Turk samples were sequenced and processed as part of the older Simons Genome Diversity Project, whereas the Kurd samples were WGS sequenced to 30X using the latest NGS sequencers. The ancient DNA data was for the most part pseudo-haploid data and genotyped on the 1240K Affymetrix microarray. We performed additional quality by filtering out samples with read depths less than 1X.

the Parthians & Scythians

Around 2300 years ago a series of momentous events occurred on the steppes of the Trans-Caspian area and Turkmenistan. A Central Asian nomadic Scythian tribe called the Parni subsequently to be known as the Parthians took over the Seleucid Empire established one of the most enduring states in the history of Asia for 500 years from 248 BCE to CE 226. At its peak the Parthian Empire stretched from the Mediterranean all the way to China.

Fig 4a – Tribes such as Yuezhi, Wusun, and Xiongnu which genetically contributed to present day Iranics such as Kurds and Persians as well as Turks

The essential human resources at the roots of the Arsacid state were the Dahae, a community of Central Asian nomads. Nomadic clans overpowered the settled Iranian populace of Khorasan, imposing a monarchical structure on a nascent social amalgam of nomadic peoples of the steppes and settled Parthians.

With the rise of the Arsacid Parthians in Iran and in the Trans-Caspian steppes for the first time in recorded history nomads from Central Asia had subdued parts of Iran and imposed their dynasty on it. Henceforth dynasties with nomadic roots such as Seljuks, Ilkhanids and dynasties of Turkmen origin would come to power in Iran. In Eastern Iran and Afganistan waves of nomadic peoples established several states beginning with the Yuezhi and Saka through the White Hun Hephtalites. The growth of the Arsacid state is in many respects comparable with that of the Seljuks, who originally were nomads in Central Asia.

Fig 4b – The Wusun, an Indo-Iranian speaking peoples whose descendants maybe Sogdians and overlapped territories with Saka and Yuezhi around the Tian Shan mountains of China, Kazakhstan, Kyrgistan and may have around the 5th century migrated into the Pamir Mountain range produced excellent qpAdm models (p-value 0.55) consisting of 32% Kazakh Wusun + 68% DinkhaTepe A (Fig 9a). The Wusun sample had Y-DNA R1a M-417 which is ancestral to R1a-Z94 and Mt-DNA J1c. They very likely resembled present day Tajiks, Kurds, Persians and Turks in phenotype and possibly overlapped genetically with some Parthians.

Under Mithradates I (165–132 BCE), Parthia became the most significant state power in Iran and southwestern Central Asia. He subjugated Margiana, most districts of Bactria, western Iran and Mesopotamia. In the meantime, the attacks of the Xiongnu against the Yuezhi set in motion nomadic tribes of Central Asia. As a result of these movements, various steppe peoples appeared n the region of Transoxiana and were pushed into the borders of Graeco-Bactria and Parthia.
Several steppe tribes including Yuezhi, Saka, and Asioi subdued Bactria. Sogdiana had fallen prey to nomads already in the 3rd century BCE. The Yuezhi mi-
gration forced some tribes to move into the territory held by the Arsacids.

Fig 4c – The Kushans an Indo-Iranian people are a Yuezhi tribe who some speculate have Saka or Tocharian origins, ruled in Bactria and Sogdiana which are a part of Afghanistan and Tajikistan. Their territories overlapped those of the Parthians in space and time. Amazingly, a qpAdm model of Kurds and to a lesser extent Persians as 100% Kushan sample “TJK_Ksirov_H_Kushan I12292” can not be rejected (p-value > 0.05). A robust qpAdm model for Kurds consisted of 83% Kushan + 17% DinkhaTepe as shown in fig 9a. Thus it is reasonable to believe this sample which had Y-DNA R2 and Mt-DNA U2e is very Parthian like. Their religions included Buddhisim and Zoroastrianism.
Fig 4d – Bronze statute of a Parthian from Khuzestan Province, Iran. The dress resembles the Kurdi Sharwal (Shalwar in Urdu) worn from Kurdistan to Tajikistan and Pakistan.
Fig 4e – The Parthian Empire was one of the longest lasting and covered a large territory stretching from the Mediterranean to China.

As a result, two Parthian kings, Phraates II and Artabanos I, lost their lives in encounters against the Saka and the Tocharoi. Consequently, the Saka overran and plundered the eastern Iranian provinces of the Parthian Empire.

Fig 5 – Parthia and Scythia. Parthians orginated east of the Caspian sea and are said to be the Parni Scythian tribe by origin.

Just as present day Kurds are quite genetically diverse and span a large geographic areas and speak multiple Kurdi languages such as Badini, Sorani, Gorani, Zazaki and Feyli which are quite mutually unintelligible, it is also expected that Parthians would also have had a moderate genetic diversity with individuals overlapping with Scythians, Kushans, Yuezhi, and Wusun because of their overlapping territories.

Linguistically, Parthian seems to have affected the various Kurdi languages more than Persian (fig 5). This is consistent with Kurdi sharing dozens of cognates with Pashto to the exclusion of Persian (fig 6).

Fig 6- Parthian having a greater impact on Kurdi and Balochi to the exclusion of other Iranic languages. Indo-European language tree. Kurdish and Balochi are under the Parthian branch of the north-western Indo-Iranian branch. authors, M. (2013, January 19). Indo-European language family tree. Ancient History Encyclopedia. Retrieved from https://www.ancient.eu/image/1028/
Fig 7 – A few examples of Kurdi and Pashto sharing cognates to the exclusion of Persian (Farsi). Athough it’s still a mystery, we believe the reason is Parthians and the Parthian language having a larger impact on Kurdi and Pashto to the exclusion of Farsi.
Fig 8- qpWave 1/Chisq plot for Armenians vs Kurds vs Turks. We clearly see that Iranian DinkaTepe-B and Hasanlu-IA, and also Tajikistan Kushan (we suspect maybe Parthian) have a much stronger genetic relationship with Kurds than with Armenians or Turks, whereas Medieval Armenian Agarak and Kura-Araxes-EBA have a stronger genetic affinity to Armenians. The other Central Asian populations shown have a stronger relationship with Kurds, then Turks and followed by Armenians.
Fig 9a – Very robust QpAdm models for Kurds result when Iranian Zagrosian DinkaTepe-BA or Hasanlu-IA are used as once ancestral source, and an Iron-Age or later Central Asian population is used as the other source. We are surprised to see the robustness of these models such that even 16 Neolithic and younger diverse Eurasian reference populations as references (pright) does not cause the models to fail.
Fig 9b – Robust qpAdm models using 700K intersecting SNPs showing that Kurds can be modeled as Armenians + Central/South-Central Asians, and thus pointing to the ancestral homeland of Iranics such as Kurds to the east of Armenia. Admixture percentages are shown in brackets with models sorted by p-value.
Fig 9c – Robust qpAdm models using 600K intersecting SNPs showing that Kurds are a mix of local Bronze/Iron Age Iranian + Iron-Age Central Asians, and thus pointing to the ancestral homeland of Iranics such as Kurds to the east of Armenia. Admixture percentages are shown in brackets with models sorted by p-value. Populations similar to the above Central Asians introduced Y-DNA haplogroup R1a-Z94 to Kurdistan post Late Iron-Age to make it one of the major haplogroups among present day Kurds and to a lesser extent a few other West Asian populations.
Fig 10- Genetic similarity of Kurds-IRQ with various Chalcolithic to Iron-Age populations used in our qpAdm models. These results were obtained using the QpWave 2 source cladeliness methodology with a very robust set of 16 Chalcolithic to Iron-Age pright references. Some of these are similar to Parthians and Scythian Kurd ancestors who introduced R-Z94 to Kurdistan, Turkey, and Iran post Iron-Age. Although DinkhaTepe-B forms a stronger clade with Kurds, DinkhaTepe-A results in more robust qpAdm models for Kurds as a local Iranian source due to the additional Mesopotamian admixture which it has. This additional Mesopotamian admixture is likely due to iintrogression of Assyrian admixture which is probably due to the lengthy geographical overlap of the Assyrian and Persian Empires.
Fig 11 – Present day Kurd facial features are consistent with qpAdm ancestry results showing them a mix of approximately 60-70% local Persian Zagrosian Bronze/Iron-Age DinkhaTepe-A + 30-40% Post Iron-Age Central Asian Parthian/Scythian/Turkic which introduced the now common haplogroup R1a R-Z94 among Kurds
Fig 12- Present day Kurd facial features are consistent with qpAdm ancestry results showing them a mix of approximately 60-70% local Persian Zagrosian Bronze/Iron-Age DinkhaTepe-A + 30-40% Post Iron-Age Central Asian Parthian/Scythian/Turkic which introduced the now common haplogroup R1a R-Z94 among Kurds
Fig 13- Present day Kurd facial morphology is consistent with qpAdm ancestry results showing them a mix of approximately 60-70% local Persian Zagrosian Bronze/Iron-Age DinkhaTepe-A + 30-40% Post Iron-Age Central Asian Parthian/Scythian/Turkic who appear to have introduced the now common haplogroup R1a R-Z94 among Kurds. The hundreds of Chalcolithic to Iron-Age ancient DNA samples from Iraq, Iran and Turkey published thus far indicate that R-Z94 was missing back then.

COMPARISONS WITH PRESENT DAY POPULATION

To improve our understanding of the population histories of Armenians, Kurds, Persians, and Turks, and to corroborate findings based on ancient DNA, it is helpful to see in which directions these populations are shifted relative to each other. To do so we tabulate QpWave cladeliness for each of these populations with present day Eurasians. To mitigate any biases that may exist due to different sequencing technologies and genotyping pipelines between our targeted study samples, to wit, 2016 Simons WGS Armenians, Turks, and Iranians vs 2022 WGS 30X Kurd-IRQ samples, we normalize the QpWave Chisq results for each of these targeted populations. The normalized results based on QpWave chisq are shown in fig 14.

Although the aforementioned results give a very accurate picture of shared genetic drift between Armenians, Kurds, Turks and Iranians and various Eurasian populations, what if just as important if not more so is what historical admixture events are set a population apart from its surrounding populations. In other words what part of its population history is significantly different from its neighbors, or alternatively, how much more genetic drift does a population share with other Eurasian populations to the exclusion of its neighbors.

For example, to determine how much more genetic drift Kurds share with Brahui to the exclusion of Armenians, we would calculate [(1/chisqN)Kurd-Brahui ] / [(1/chisqN)]Armenian-Brahui ], where N are the normalized values. When these results are tabulated and sorted we obtain information on which Eurasian populations Armenians, Kurds, Persians and Turks share the MOST genetic drift with to the exclusion of each other. In other words, which geographical direction these populations are shifted to with respect to each other. We tabulated and sorted these results for each of the 4 study populations and show the results in Fig 15 – 19.

With a quick glance at figures 15 & 16 it is immediately apparent that Europeans, Siberians, Central and South Asians share significantly more ancestry with Kurds than Armenians. This reinforces our hypothesis that Armenians have a higher genetic continuity in West Asia than Kurds and are more thus derive more of their ancestry locally. Alternatively, we can say Kurds have more ancestry from Europe and Central Asia than Armenians since figure 15 clearly shows Europeans and Central Asians share greater ancestry with Kurds than with Armenians. Figure 16 which is the reverse of figure 15 shows that some local West Asians and Southern Europeans such as Georgians, Iraqi Jews and Greeks share more ancestry with Armenians than with Kurds. Some of this is probably an artifact of greater shared Neolithic Anatolian Farmer ancestry with Armenians to the exclusion of Kurds.

Fig 14 – A comparison of how much of a clade various Eurasians form with Armenians, Kurds, Persians, and Turks using qpWave and 16 robust Neolithic to Iron-Age Eurasian references (pright). Results are as expected and help corroborate findings based on ancient DNA. Kurds form the tighest clades with Iranians followed by Turks, Lezgins, Abkhazians, and Ossetians. Armenians form the tightest clades with Georgians followed by Abkhazians, Iraqi Jews and Kurds. Turks form the tightest clades with Kurds, followed by Adygei, Iranians, and Armenians. Iranians form the tightest clades with Kurds, followed by Turks and Lezgins.
Fig 15 – These results show how much excess of a clade various Eurasians form with KURDS than with ARMENIANS, and were obtained using qpWave using a Eurasian population as one source and Kurds or Armenians as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs. The results are calculated as [(1/chisqN)Kurd-S2 ] / [(1/chisqN)Armenian-S2]. This is informative to which Eurasian populations KURDS are genetically shifted to to the exclusion of ARMENIANS. This in turn allows us to determine from which geographical regions Kurds received more recent admixture to the exclusion of Armenians, which in turn allows us to estimate in which direction from Armenia the average ancestry of Kurds is located, which appears to be to the north and east of the Caspian Sea. It also corroborates our results obtained using ancient DNA. For example, we can see that Kurds share substantially more ancestry with Iranians, Brahui, Pashtuns, other SC Asians, Turks, Mongols, NE Europeans and Siberians than Armenians. This also explains why R1a Z-94 is highest amongst Kurds in West Asia and forms one of their major haplogroups.
Fig 16 – These results show how much excess of a clade various Eurasians form with ARMENIANS than with KURDS, and were obtained using qpWave using a Eurasian population as one source and Kurds or Armenians as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs.
Fig 17 – These results show how much excess of a clade various Eurasians form with IRANIANS than with ARMENIANS, and were obtained using qpWave using a Eurasian population as one source and Iranians or Armenians as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs. This is informative to which Eurasian populations IRANIANS are genetically shifted to to the exclusion of ARMENIANS. This in turn allows us to determine from which geographical regions Iranians received more recent admixture to the exclusion of Armenians, which in turn allows us to estimate in which direction from Armenia the average ancestry of Iranians is located, which appears to be to the north and east of the Caspian Sea
Fig 18- Here we see far fewer populations than in figure 15 because genetically Kurds are closer to Turks than Armenians as shown with our qpWave cladeliness results. Since Turks also have substantial Central Asian admixture like Kurds, we see fewer populations which are more “Kurd than Turk like” than previously with Kurds versus Armenians. These results show how much excess of a clade various Eurasians form with KURDS to the exclusion of TURKS, and were obtained using qpWave using a Eurasian population as one source and Kurds or Armenians as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs. This is informative to which Eurasian populations Kurds are genetically shifted to to the exclusion of Turks, whereas below in Fig 17 we show the reverse, the populations Turks are shifted to to the exclusion of Kurds
Fig 19These results show how much excess of a clade various Eurasians form with TURKS to the exclusion of KURDS, and were obtained using qpWave using a Eurasian population as one source and Kurds or Turks as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs. This is informative to which Eurasian populations Turks are genetically shifted to to the exclusion of Kurds, whereas above in Fig 16 we show the reverse, the populations Kurds are shifted to to the exclusion of Turks
Fig 20 – These results show how much excess of a clade various Eurasians form with KURDS to the exclusion of PERSIANS, and were obtained using qpWave using a Eurasian population as one source and Kurds or Persians as the other, and 16 robust Neolithic to Iron-Age Eurasian populations as references (pright) and approximately 700K intersecting SNPs. This is informative to which Eurasian populations Kurds are genetically shifted to to the exclusion of Persians.

EHG Admixture as a Measure of Central Asian Steppe Admixture

Eastern Hunter Gatherer (EHG) admixture was recently used in the Southern Arc paper by Lazaridis et al1 as a signal of Eurasian steppe admixture in West Asia. We agree this is sound practice since the use of Steppe EMBA and MLBA populations such as Yamnaya, Sintashta and Andronovo as Steppe proxies will yield ambigous results since any shared haplotypes between West Asians and the aforementioned does not necessarily indicate Steppe admixture, since it could simply due to shared ancestral ancestors with Steppe EMBA or MLBA.

Thus we also adopt the use of EHG admixture as a signal of Steppe admixture in our study groups. We use the qpAdm framework with the following robust set of pright references; Khomani_San, MAR_Taforalt_EpiP, JOR_Feldman_PPNB, TKM_Parkhai_EN, ANE, TUR_C_Boncuklu_PPN, WHG, SRB_Iron_Gates_HG, CHN_YR_LN, XINJ_Chemurcheck.N.BA3, Tarim_EMBA1, RUS_Shamanka_EN.

Lazaridis et al1 determined that Eastern Anatolia received negligible Eurasian Steppe admixture. We expected that this does not apply to extant Iranic populations in Eastern Anatolia such as Kurds since Y-DNA haplogroup R1a-Z94 is a major haplogroup among Kurds where they appear to have the highest frequency of this haplogroup in West Asia ( figure 2). Our analysis results reflect this by showing EHG admixture highest among Kurds from our study group, are certainly consistent with the relative high frequency of Y-DNA R1a-Z94/Z93 among Kurds. Lazaridis et al1 also showed via the hundreds of ancient samples published from the Kurdistan, Anatolia, Armenia, and Iran region that haplogroup R1a-Z94 was missing from this region through the Iron Age. This clearly shows that the R1a-Z94 ancestors of Iranics such as Kurds must have arrived in the region during the late Iron-Age to the historical time frame. Thus the Parthian, Scythian, Sarmatian, and Turkic ancestors of Iranics such as Kurds must have introduced this haplogroup to the region.

Our qpAdm results using EHG as a Eurasian Steppe Indo-Iranic signal do indeed show that present day Armenians, Kurds, Persians and Turks have EHG admixture (figure 21) with the highest EHG percentage being in Kurds which is consistent with Kurds having the highest Y-DNA R1a-Z94 frequency.

We also find that although Armenians and Persians can be modeled in qpAdm without additional East Asian sources albeit with lower passable p-values, the same can not be said of Kurds and Turks which require some East Asian source to produce passable p-values (figures 21-23).

Fig 21 – Admixture results using the qpWave and qpAdm framework. Turkmenistan Neolithic consist of the highest quality Parkhai-EN samples and is similar to the Iranian Zagrosian Neolithic samples. TKM-Parkhai-EN was used in lieu of Iran-N because it produced more robust p-values for the studied populations. Kurd models consistently required the highest EHG percentage which is consistent with Kurds having the highest Y-DNA R1a-Z94 frequency in West Asia. Although Armenians and Iranians could be modeled without additional East Asian but with lower passing p-values, Turk and Kurd models did not produce models with passing p-values without additional East Asian input.
Fig 22 – Admixture results using the qpWave and qpAdm framework. IRAN-NE-Chalcolithic consist of the highest quality IRN-Teppe Hissar samples and are somewhat similar to their Zagrosian Chalcolithic counterparts. Kurds have the highest EHG percentage which is consistent with Kurds having the highest Y-DNA R1a-Z94 frequency in West Asia. Although Armenians and Iranians could be modeled without additional East Asian but with lower passing p-values, Turk and Kurd models did not produce models with passing p-values without additional East Asian input.
Fig 23 – QpAdm models shown in figs 21-22 with standard errors and p-values

It is important to remember that models utilizing EHG and Iron-Age East Asians as genetic sources are only informative to the relative degree of Eurasian Steppe introgression among West Asians, since these populations themselves did not actually participate in the demography of West Asians, and did not themselves hybridize with local West Asian populations such as Iran-Chl/BA or Iran-IA to produce present day Armenians, Kurds, Turks and Persians. Thus their combined EHG and Mongol percentages shown in figures 21-23 are not the actual percentages of non-local West Asian introgression into the present day West Asian populations for this study. The reader is referred to figures 9(a)(b)(c) for a more realistic representation of the actual percentage contribution of Central Asians to the demography of West Asians.

For example, the combined 15% EHG/East Asian in Kurd models, and the combined 4% combined EHG/East Asian in Armenian models would translate to a much a higher percentage in Central Asian Indo-Iranian required input since the relevant Scythians or Parthians were not 100% EHG or East Asian themselves. This is clearly evident when we model Armenians, Kurds, Turks and Persians using the relevant Iran-Chl/BA + relevant Scythians or similar Central Asian populations that potentially introduced Y-DNA R1a-Z93 or Z94 into West Asia post Early Iron Age as indicated in figure 9.

Methods & Quality Control

Our analysis used published data 1&2 genotyped on the 1240K Axiom Affymatrix array. The present-day populations consist of the Simons WGS set. All the available Iranian, Armenian and Turkish samples in the Reichlab dataset were used in this study. As an additional quality control measure we removed samples with less than 600,000 intersecting SNPs with the 1240K set. Additionally, for Kurds-IRQ we used two whole genome sequenced males from northern Iraq sequenced to a depth of 30X. We processed the fastq sequences using our own samtools pipeline which consists of the following:

  1. AdapterRemoval to remove remnant adapter sequences from High-Throughput Sequencing (HTS) data and trim low quality bases from the 3′ end of reads following adapter removal. Additionally, we removed sequences shorter than 24 bases which due to their short length may map to several regions of the reference genome
  2. bwa mem to quickly align the sequences to the GRCh37 Human Reference and piped the output to samtools view to produce BAM files. The reads were merged post-alignment instead of with AdapterRemoval
  3. samtools sort to coordinate sort BAM file
  4. Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads
  5. Picard to remove duplicate sequences
  6. bcftools mpileup to convert BAM file into genomic positions we first use mpileup to produce a BCF file that contains all of the locations in the genome. We use this information to call genotypes and reduce our list of sites to those found to be variant by passing this file into bcftools call. Thresholds used were –min-MQ 15, –min-BQ 20
  7. bcftools view to remove positions supported by less than 4 reads
  8. This yielded VCF files with 1478 million bases
  9. Extraction of Affymetrix 1240K variants yielded a final VCF of 1160K variants

Data Availability

The 1240K genotype data used in this study is readily available at Reich Lab at https://reich.hms.harvard.edu/datasets

REFERENCES

  1. The genetic history of the Southern Arc: A bridge between West Asia and Europe, Lazaridis et al, American Association for the Advancement of Science, doi: 10.1126/science.abm4247, https://doi.org/10.1126/science.abm4247, 2022/09/28
  2. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N, Achilli A, Olivieri A, Gandini F, Houshmand M, Sanati MH, Torroni A, Semino O. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7(7):e41252. doi: 10.1371/journal.pone.0041252. Epub 2012 Jul 18. PMID: 22815981; PMCID: PMC3399854.
  3. Hennerbichler, F. (2012). The Origin of Kurds. Advances in Anthropology, 2, 64-79. doi: 10.4236/aa.2012.22008.
  4. http://corduene.blogspot.com/2015/07/northern-kurds-y-dna-haplogroups.html
  5. Bronze and Iron Age population movements underlie Xinjiang population history, Kumar et al, Science 376, 62 (2022), DOI: 10.1126/science.abk1534
  6. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012 Nov;192(3):1065-93. doi: 10.1534/genetics.112.145037. Epub 2012 Sep 7. PMID: 22960212; PMCID: PMC3522152.
Scroll to Top
Scroll to Top