ABSTRACT
DNA from hundreds of ancient human remains spanning the last 7000 years has been sequenced over the past few years in the “Southern Arc” region of West Asia and its surroundings. Using this DNA, we investigate the demographic changes that occurred over the past 2000 years in Anatolia as they pertain to the peopling of Anatolia by Armenians and Kurds, and to the ethnogenesis of the Turks in Turkey. Using the qpWave [3] framework, we investigate whether DNA supports claims by Armenian, Kurdish, and Turkish nationalists that their respective ethnic groups have inhabited Anatolia for thousands of years. We also shed some light on the ethnogeneses of present-day Armenian, Kurd, and Turk populations.
Using the formal statistical software qpWave [3] with 17 diverse ancient Eurasian reference populations (pright), we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings (TARGET populations) to DNA data from present-day Armenians, Kurds, and Turks (STUDY populations). We run qpWave with 2 source populations (pleft) to test a null hypothesis, namely that one extant STUDY population forms a very tight clade with one ancient TARGET population from Anatolia, Armenia, Iraq, Iran, Central Asia, or Russia. This helps establish whether there is evidence of a genetic signature of present-day Armenians, Kurds, and Turks in Anatolia or nearby regions at a particular point in time. We use the conventional significance threshold of 0.05: when p < 0.05 we reject the null hypothesis that the extant STUDY and ancient TARGET populations form a tight clade, and when p ≥ 0.05 we treat the pair as consistent with a clade. Here p is the probability of observing data at least as extreme as ours if the null hypothesis were true.
Our results indicate that from about 2500 years ago to around 400 to 1000 years ago, Anatolia was inhabited by people who significantly resembled Armenians, to the exclusion of Kurds and Turks, on a DNA basis. Our genetic analysis is thus consistent with an en masse migration of Kurds into Anatolia from Iran over the last 500 years, during the Ottoman occupation of Anatolia. This study also indicates that no single ancient TARGET population in Central or West Asia genetically resembled present-day Turks from Turkey. This result is consistent with the relatively recent ethnogenesis of Turks in Turkey from multiple gene pools.
INTRODUCTION
Using the framework of the formal statistical software qpWave [3], along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry, we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings to DNA data from present-day Armenians, Kurds, and Turks. We run qpWave with 2 source populations (pleft) to test the null hypothesis that one STUDY population forms a very tight clade with an ancient TARGET population. We use the conventional significance threshold of 0.05: we reject the null hypothesis when p < 0.05, where p is the probability of observing data at least as extreme as ours if the STUDY and ancient TARGET populations truly formed a tight clade.
With regard to Turks, our genetic analysis reveals that none of the ancient samples from the region dated between 7000 and 500 years before present (BP) forms a tight clade with extant Turks. This is somewhat expected and is consistent with historical accounts of a mass migration of the ancestors of today’s ethnic Turks from Central Asia into Anatolia with the Seljuks and Ottomans over the last 1000 years. Once in Anatolia, the relatively recent migrants mixed with a larger local Anatolian population, resulting in dilution of the “Turkic” admixture profile.
Our genetic analysis points to a similar situation with Kurds, who entered Anatolia in large numbers relatively recently. This is again consistent with historical accounts of Sunni Kurds entering Anatolia from Iran and allying with Turks to fight the Safavids in Iran and the Europeans in Anatolia. Whereas the present-day Turkish admixture profile was mostly a result of invading Oghuz Turkics hybridizing with native Anatolians, genetic evidence indicates that the admixture profile of present-day Kurds had already formed inside Iran some 1000 years ago, as a result of various waves of Indo-Iranian invaders from in and around the ancient Ariana region of Central Asia hybridizing with the native Iranian population. Some of this mixing undoubtedly occurred prior to the Indo-Iranization of Persia around the time of the Medes, about 2800 years ago. Subsequent hybridization with invading Indo-Iranians such as Parthians and Scythians, and with Oghuz Turkics, completed the genetic profile of present-day Kurds in Iran before they entered Anatolia.
The aforementioned genetic evidence is consistent with the Central Asian Y-DNA haplogroups R1a-Z93, R1a-Z94, R1a-Z95, and their subclades such as R1a-Z2123 being significant in present-day Kurds, whereas these haplogroups are absent in ancient samples from Iran older than 1500 years. This genetic evidence of Central Asian Indo-Iranian ancestry in Kurds is also consistent with Kurds speaking an Indo-Iranian language and, prior to Islam, practicing Zoroastrianism, both of which have origins in Central Asia.
Of particular interest are two ancient samples: a 2100-year-old Kushan-era sample from Ksirov, Tajikistan, and a 390-year-old Iranian sample from the Ganj-Dareh area of Kermanshah. Surprisingly, Kurds form a genetic clade with both, and can be modeled as either 100% 2100-year-old Tajikistan Kushan (p-value 0.54) or 100% 390-year-old Ganj-Dareh (p-value 0.48), as shown in figure 1 and table 1. By contrast, our results indicate that Kurds were not present in any significant numbers in Anatolia before 400 years ago.
With regard to Armenians, our genetic analysis reveals that ancient Anatolians from 2500 to 400 years ago were genetically more similar to Armenians than to either Kurds or Turks. We also find genetic evidence that 2000-year-old samples from Armenia and 2800-year-old Urartian samples from Van, Turkey form clades with Armenians (p=0.10 and p=0.06, respectively). In addition, we find evidence of “Armenian-like” populations in Iran from 400 years ago, in Gaziantep, Turkey from 1000 years ago, and in Armenia from 900 years ago.
We summarize our findings in tables 1 through 6 below, with further details available in figures 3, 4, and 5.
We visualize the genetic similarity between our extant STUDY and TARGET populations using circle sizes as shown in figures 1 and 2.
We also used qpWave [3] with the 17 diverse ancient Eurasian reference populations (pright) listed in the Methods to analyze genetic relatedness among present-day populations. Here too we used 2 source populations (pleft) and tested the null hypothesis that one present-day population forms a tight clade with another present-day population, rejecting it when the p-value is below 0.05. The results are shown in tables 7-10.
Due to their long and extensive shared population history, unsurprisingly, Kurds and Southern Iranians form a clade (p-value 0.17) where they can be modeled as 100% of each other.
RESULTS & DISCUSSION
Using qpWave, we performed pairwise analysis of our extant STUDY populations against our ancient TARGET populations in cladality mode, and present the results in figures 3, 4, and 5. The results shown are for the 43 ancient populations genetically most similar to each STUDY population. Our findings can be summarized as follows:
ARMENIANS
- KEY STATISTICS:
  - Number of ancient Anatolian populations in top 25 list of ancients for Armenians (figure 3): 8
  - Number of “Armenian-like” ancient Anatolian populations with p-values > 0.0099: 2
  - Number of Central Asian ancient populations in top 25 list for Armenians: 1
- Populations forming a tight clade with Armenians:
  - 2000 year old samples from Aghitu, Armenia – p=0.10
  - 2800 year old Urartu culture samples from Van, Turkey – p=0.06
- “Armenian-like” ancient populations:
  - 390 year old samples from Ganj-Dareh, Iran – p=0.02
  - 1000 year old samples from Gaziantep, Turkey – p=0.01
  - 900 year old samples from Agarak, Armenia – p=0.01
Our qpWave results for the 43 ancient populations most genetically similar to Armenians are summarized in figure 3.
Our DNA analysis indicates that Armenians or Armenian-like populations have inhabited Anatolia over the past 3000 years (figures 1 & 2), and thus corroborates claims by Armenian nationalists that Armenians have inhabited Anatolia for a long time. A glance at figure 3 shows 8 ancient Anatolian populations in the top 25 list for Armenians. However, the same clearly cannot be said for Kurds and Turks.
KURDS
- KEY STATISTICS:
  - Number of ancient Anatolian populations in top 25 list of ancients for Kurds (figure 4): 3
  - Number of “Kurd-like” ancient Anatolian populations with p-values > 0.0099: 0
  - Number of Central Asian ancient populations in top 25 list for Kurds: 5
- Populations forming a tight clade with Kurds:
  - 2100 year old Kushan sample from Ksirov, Tajikistan – p=0.53
  - 390 year old samples from Ganj-Dareh, Iran – p=0.48
- “Kurd-like” ancient populations:
  - 2800 year old samples from Hasanlu, Iran – p=0.02
  - 2900 year old samples from Dinkha Tepe, Iran – p=0.01
Our qpWave results for the 43 ancient populations most genetically similar to Kurds are summarized in figure 4.
Figures 1, 2, and 4 suggest that the Kurdish ethnic identity began taking shape east of Armenia, in the Iranic geographic sphere, some 2900 years ago during the Mede era. Figure 4 shows that the 2100-year-old Kushan sample from Tajikistan and the 390-year-old sample from Ganj-Dareh, Iran have an almost 100% Kurd genetic composition.
The most Kurd-like ancient samples from Anatolia are 2800-year-old Urartian samples from Van, Turkey, 1000-year-old samples from Gaziantep, and 380-year-old samples from Mardin. However, the genetic similarity indices for these samples are only 1.8, 1.2, and 1.1, respectively. This lack of a Kurdish genetic signature in Anatolia from 2000 to 400 years ago supports an en masse migration of Kurds to Anatolia from Iran over the past 400 years, during the Ottoman occupation of Anatolia. These results support the ethnogenesis of the present-day Kurdish identity in the Iranic sphere some 2000 years ago during the Parthian era, followed by a relatively recent massive migration of Kurds westward from Iran into present-day Turkey, Iraq, and Syria.
TURKS OF TURKEY
- KEY STATISTICS:
  - Number of ancient Anatolian populations in top 25 list of ancients for Turks (figure 5): 10
  - Number of “Turk-like” ancient Anatolian populations with p-values > 0.0099: 0
  - Number of Central Asian ancient populations in top 25 list for Turks: 3
- Populations forming a tight clade with Turks:
  - None
- “Turk-like” ancient populations:
  - None
Our qpWave results for the 43 ancient populations most genetically similar to Turks are summarized in figure 5.
Our results also indicate that the genetic signature of present-day Turks from Turkey was absent in Eurasia prior to approximately 500 BP, since qpWave rejected a clade for every pairing of Turks with an ancient Eurasian population. However, figure 5 shows that 10 ancient Anatolian populations made the top 25 list in terms of genetic similarity to Turks, compared with only 3 ancient Anatolian populations in the top 25 list for Kurds in figure 4. This is consistent with an ethnogenesis of Turks in which Central Asian Turkics hybridized with a larger number of local Anatolian populations, to the exclusion of Kurds, on a more recent time scale.
METHODS & QUALITY CONTROL
We make extensive use of the formal statistical program qpWave, which is part of the AdmixTools package available from https://github.com/DReichLab/AdmixTools. We use the following 17 diverse ancient east and west Eurasian reference populations (pright) to robustly differentiate fine streams of ancestry between our STUDY and TARGET populations (pleft). The DNA samples used in this study, except for the Kurdish 30X WGS samples, are publicly available from the Reich Lab at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3AR0CD. The Xinjiang samples [2] were published in https://www.science.org/doi/10.1126/science.abk1534, and the Iron Age Uzbekistan samples [4] were published in https://academic.oup.com/mbe/article/38/11/4908/6329832
Pright Reference Populations [1]:
Mbuti
CHG
IRN_Ganj_Dareh_N
ISR_PPNB
MAR_Taforalt_EpiP
SRB_Iron_Gates_HG
ANE
TUR_Marmara_Barcın_N
WHG
RUS_Yana_RHS
MNG_Khovsgol_LBA
RUS_DevilsCave_N
RUS_Shamanka_EN
Loschbour_WHG
UZB_Sappali_Tepe_BA
RUS_Siberia_Lena_EBA
We can independently test each present-day population against one ancient population to determine whether they form a clade. For example, we can test whether Armenians form a clade with an ancient population from Van, Turkey by setting pleft to Armenian and Van_Ancient, and using the 17 diverse ancient east and west Eurasian populations as pright references to robustly differentiate fine streams of ancestry between Armenians and Van_Ancient.
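As an illustration of the setup just described, the following is a minimal sketch of how the inputs for one such pairwise test might be generated. The parameter keys (genotypename, snpname, indivname, popleft, popright, details) are those documented for qpWave in AdmixTools; the file names, the "dataset" prefix, and the helper function itself are hypothetical and are not taken from our actual run scripts.

```python
from typing import Dict, List

# Right (reference) populations, as listed above under "Pright Reference Populations".
RIGHT_POPS: List[str] = [
    "Mbuti", "CHG", "IRN_Ganj_Dareh_N", "ISR_PPNB", "MAR_Taforalt_EpiP",
    "SRB_Iron_Gates_HG", "ANE", "TUR_Marmara_Barcın_N", "WHG", "RUS_Yana_RHS",
    "MNG_Khovsgol_LBA", "RUS_DevilsCave_N", "RUS_Shamanka_EN", "Loschbour_WHG",
    "UZB_Sappali_Tepe_BA", "RUS_Siberia_Lena_EBA",
]


def qpwave_inputs(study: str, target: str, right_pops: List[str],
                  prefix: str = "dataset") -> Dict[str, str]:
    """Build the text of the three input files for one pairwise clade test.

    `prefix` is a hypothetical basename for EIGENSTRAT-format genotype files.
    """
    # popleft file: the two populations being tested as a clade, one per line.
    left_txt = f"{study}\n{target}\n"
    # popright file: the reference populations, one per line.
    right_txt = "".join(f"{pop}\n" for pop in right_pops)
    # Parameter file pointing qpWave at the data and the two population lists.
    par_txt = (
        f"genotypename: {prefix}.geno\n"
        f"snpname: {prefix}.snp\n"
        f"indivname: {prefix}.ind\n"
        "popleft: left.txt\n"
        "popright: right.txt\n"
        "details: YES\n"
    )
    return {"left.txt": left_txt, "right.txt": right_txt, "qpwave.par": par_txt}


files = qpwave_inputs("Armenian", "Van_Ancient", RIGHT_POPS)
print(files["qpwave.par"])
```

Writing each entry of `files` to disk and running `qpWave -p qpwave.par` would then perform the test for that single pair; repeating this for every STUDY and TARGET pairing gives the full set of comparisons.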
In this manner qpWave outputs a p-value associated with the null hypothesis, which states that there is no significant genetic difference between Armenians and Van_Ancient relative to the pright references, and that any observed difference is due to sampling or experimental error. We use the conventional cutoff of 0.05, where the p-value is the probability of observing a difference at least as large as the one measured if the null hypothesis were true. In other words, we reject the null hypothesis when p is less than 0.05, in which case we conclude that the Van_Ancient samples are not genetically the same as the present-day Armenian samples.
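The decision rule can be sketched as follows. The 0.05 cutoff is the one stated above; the second tier, which labels a population as "like" the STUDY population when 0.0099 < p < 0.05, is our reading of how the “Armenian-like” and “Kurd-like” categories in the key statistics are defined, and the function name is hypothetical.

```python
def classify_pair(p_value: float,
                  clade_cutoff: float = 0.05,
                  like_cutoff: float = 0.0099) -> str:
    """Label one STUDY/TARGET pair from its qpWave p-value.

    "clade":    the null hypothesis of a tight clade is not rejected.
    "like":     rejected at 0.05, but the p-value is still above like_cutoff
                (assumed definition of the "-like" category).
    "rejected": p-value at or below like_cutoff.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value >= clade_cutoff:
        return "clade"
    if p_value > like_cutoff:
        return "like"
    return "rejected"


# p-values reported for Armenians in the Results section.
armenian_results = {
    "Aghitu, Armenia (2000 BP)": 0.10,
    "Van, Turkey (2800 BP)": 0.06,
    "Ganj-Dareh, Iran (390 BP)": 0.02,
    "Gaziantep, Turkey (1000 BP)": 0.01,
    "Agarak, Armenia (900 BP)": 0.01,
}
labels = {name: classify_pair(p) for name, p in armenian_results.items()}
print(labels)
```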
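The “Genetic Similarity” values shown in figures 3, 4, and 5 were obtained from the qpWave chi-square statistic as follows: (1/ChiSq) x 100. A smaller chi-square (a better fit to the clade model) therefore gives a larger similarity value.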
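For concreteness, the calculation can be written as a small helper. The function name is hypothetical, and the chi-square values in the example are illustrative inputs rather than values from our results.

```python
def genetic_similarity(chisq: float) -> float:
    """Return the similarity index (1 / ChiSq) x 100 for a qpWave chi-square."""
    if chisq <= 0:
        raise ValueError("chi-square statistic must be positive")
    return 100.0 / chisq


# Illustrative chi-square values only: a lower chi-square gives a higher index.
for chisq in (20.0, 50.0, 100.0):
    print(chisq, round(genetic_similarity(chisq), 2))
```

Under this definition, a similarity index of about 1.8 corresponds to a chi-square of roughly 55.6.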
Our analysis used published data [1] genotyped on the 1240K Axiom Affymetrix array. The present-day populations come from the Simons WGS set. All available Iranian, Armenian, and Turkish samples in the Reich Lab dataset were used in this study. As an additional quality control measure, we removed samples with fewer than 600,000 SNPs intersecting the 1240K set. For Kurds-IRQ, we used two males from northern Iraq whole-genome sequenced to a depth of 30X. We processed the FASTQ sequences using our own samtools-based pipeline, which consists of the following:
- AdapterRemoval to remove remnant adapter sequences from the high-throughput sequencing (HTS) data and to trim low-quality bases from the 3′ end of reads following adapter removal. We also removed sequences shorter than 24 bases, which because of their short length may map to several regions of the reference genome
- bwa mem to align the sequences to the GRCh37 human reference, with the output piped to samtools view to produce BAM files. The reads were merged post-alignment rather than with AdapterRemoval
- samtools sort to coordinate-sort the BAM file
- Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads
- Picard to remove duplicate sequences
- bcftools mpileup and bcftools call to genotype the BAM file: mpileup first produces a BCF file covering all positions in the genome, which is then passed to bcftools call to call genotypes and reduce the list of sites to those found to be variant. Thresholds used were --min-MQ 15 and --min-BQ 20
- bcftools view to remove positions supported by fewer than 4 reads
- This yielded VCF files with 1478 million bases
- Extraction of the Affymetrix 1240K variants yielded a final VCF of 1160K variants
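The steps above, up to the read-depth filter, can be sketched as a list of shell commands assembled in Python. This is an illustrative reconstruction under stated assumptions, not our exact scripts: paired-end input, the intermediate file names, the `picard` wrapper command, the use of Picard CleanSam and MarkDuplicates for the two Picard steps, the `-mv` options to bcftools call, and the `INFO/DP>=4` filter expression are all assumptions. Only the 24-base minimum length, the GRCh37 reference, and the --min-MQ 15 and --min-BQ 20 thresholds come from the description above.

```python
from typing import List


def build_pipeline(sample: str, fq1: str, fq2: str,
                   ref: str = "GRCh37.fa") -> List[str]:
    """Return the shell commands for one sample; nothing is executed here."""
    s = sample
    return [
        # 1. Trim adapters and low-quality bases; drop reads shorter than 24 bases.
        f"AdapterRemoval --file1 {fq1} --file2 {fq2} --basename {s} "
        "--trimns --trimqualities --minlength 24",
        # 2. Align to GRCh37 and convert the SAM stream to BAM.
        f"bwa mem {ref} {s}.pair1.truncated {s}.pair2.truncated "
        f"| samtools view -b -o {s}.bam -",
        # 3. Coordinate-sort the BAM file.
        f"samtools sort -o {s}.sorted.bam {s}.bam",
        # 4. Soft-clip beyond-end-of-reference alignments, zero MAPQ of unmapped reads.
        f"picard CleanSam I={s}.sorted.bam O={s}.clean.bam",
        # 5. Remove duplicate sequences.
        f"picard MarkDuplicates I={s}.clean.bam O={s}.dedup.bam "
        f"M={s}.dup_metrics.txt REMOVE_DUPLICATES=true",
        # 6. Pile up with the stated quality thresholds, then call variant sites.
        f"bcftools mpileup -f {ref} --min-MQ 15 --min-BQ 20 -Ou {s}.dedup.bam "
        f"| bcftools call -mv -Oz -o {s}.calls.vcf.gz",
        # 7. Keep only positions supported by at least 4 reads.
        f"bcftools view -i 'INFO/DP>=4' -Oz -o {s}.dp4.vcf.gz {s}.calls.vcf.gz",
    ]


commands = build_pipeline("Kurd_IRQ_1", "Kurd_IRQ_1_R1.fastq.gz", "Kurd_IRQ_1_R2.fastq.gz")
for cmd in commands:
    print(cmd)
```

Keeping the commands as plain strings makes it easy to review them, or to write them to a shell script, before running anything on the 30X data. The final extraction of the 1240K positions is not covered by this sketch.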
Data Availability
The 1240K genotype data [1,2,4] used in this study are readily available from the Reich Lab at https://reich.hms.harvard.edu/datasets.
REFERENCES
1. Lazaridis et al., The genetic history of the Southern Arc: A bridge between West Asia and Europe, Science (American Association for the Advancement of Science), 2022, https://doi.org/10.1126/science.abm4247
2. Kumar et al., Bronze and Iron Age population movements underlie Xinjiang population history, Science 376, 62 (2022), https://doi.org/10.1126/science.abk1534
3. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D, Ancient admixture in human history, Genetics 192(3):1065-1093 (2012), https://doi.org/10.1534/genetics.112.145037, PMID: 22960212, PMCID: PMC3522152
4. Kumar et al., Genetic Continuity of Bronze Age Ancestry with Increased Steppe-Related Ancestry in Late Iron Age Uzbekistan, Molecular Biology and Evolution 38(11):4908–4917 (2021), https://doi.org/10.1093/molbev/msab216