THE PEOPLING OF ANATOLIA OVER THE PAST 2000 YEARS

ABSTRACT

DNA from hundreds of ancient human remains spanning the last 7000 years has been sequenced over the past few years in the “Southern Arc” region of West Asia and its surroundings. Using this DNA, we investigate the demographic changes that occurred in Anatolia over the past 2000 years as they pertain to the peopling of Anatolia by Armenians and Kurds and to the ethnogenesis of the Turks of Turkey. Using the qpWave [3] framework, we investigate whether DNA supports claims by Armenian, Kurd, and Turk nationalists that their respective ethnic groups have inhabited Anatolia for thousands of years. We also shed some light on the ethnogenesis of the present-day Armenian, Kurd, and Turk populations.

Using the formal statistical software qpWave [3] with 17 diverse ancient Eurasian reference populations (pright), we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings (TARGET populations) to DNA data from present-day Armenians, Kurds, and Turks (STUDY populations). We use qpWave with 2 source populations (pleft) to either accept or reject a null hypothesis, the hypothesis being that one extant STUDY population forms a very tight clade with one ancient TARGET population from Anatolia, Armenia, Iraq, Iran, Central Asia, or Russia. We can thereby help establish whether there is evidence of a genetic signature of present-day Armenians, Kurds, and Turks in Anatolia or nearby regions at a particular point in time. We use the widely accepted p-value threshold of 0.05: a p-value below 0.05 means that a fit this poor would be expected less than 5% of the time if the extant STUDY and ancient TARGET populations truly formed a tight clade, and in that case we reject the null hypothesis.

Our results indicate that from about 2500 years ago until 400 to 1000 years ago, Anatolia was inhabited by people who significantly resembled Armenians, to the exclusion of Kurds and Turks, on a DNA basis. Our genetic analysis is thus consistent with an en masse migration of Kurds into Anatolia from Iran over the last 500 years, during the Ottoman occupation of Anatolia. Additionally, this study indicates that no single ancient TARGET population in Central or West Asia genetically resembled present-day Turks from Turkey. This result is consistent with the relatively recent ethnogenesis of Turks in Turkey from multiple gene pools.

INTRODUCTION

Using the framework of the formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry, we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings to DNA data from present-day Armenians, Kurds, and Turks. We use qpWave with 2 source populations (pleft) to either accept or reject the null hypothesis, the hypothesis being that one STUDY population forms a very tight clade with an ancient TARGET population. We use the widely used p-value threshold of 0.05: a p-value below 0.05 means that a fit this poor would be expected less than 5% of the time if the STUDY and ancient TARGET populations truly formed a tight clade, and in that case we reject the null hypothesis.

With regard to Turks, our genetic analysis reveals that none of the ancient samples from 7000 years to 500 years before present (BP) from the region forms a tight clade with extant Turks. This is somewhat expected and is consistent with historical accounts of a mass migration of the ancestors of today’s ethnic Turks from Central Asia into Anatolia with the Seljuks and Ottomans over the last 1000 years. Once in Anatolia, the relatively recent migrants mixed with larger local Anatolian populations, resulting in dilution of the “Turkic” admixture profile.

Our genetic analysis points to a similar situation with Kurds, who entered Anatolia in large numbers relatively recently. This is again consistent with historical accounts of Sunni Kurds entering Anatolia from Iran and allying with Turks to fight the Safavids in Iran and the Europeans in Anatolia. Whereas the present-day Turkish admixture profile was mostly a result of invading Oghuz Turkics hybridizing with native Anatolians, genetic evidence indicates that the admixture profile of present-day Kurds had already formed inside Iran some 1000 years ago, as a result of various waves of Indo-Iranian invaders from in and around the ancient Ariana region of Central Asia hybridizing with the native Iranian population. Some of this mixing undoubtedly occurred prior to the Indo-Iranization of Persia around the time of the Medes, about 2800 years ago. Subsequent hybridization with invading Indo-Iranians such as Parthians and Scythians, and with Oghuz Turkics, completed the genetic profile of present-day Kurds in Iran prior to their entering Anatolia.

Map 1 – A 700 year old map based on the traveler Marco Polo outlining “Curdistan” in western Iran. This is consistent with other historical accounts and aligns with our genetic analysis, which indicates that large-scale migrations/invasions of Kurds westward into Iraq and Anatolia had not yet commenced. “Turcomania” and “L Armenia” are shown in Anatolia. This again aligns with our results showing higher genetic relatedness between the ancient Anatolian samples and Armenians than with either Kurds or Turks.

The aforementioned genetic evidence is consistent with the Central Asian Y-DNA haplogroups R1a-Z93, R1a-Z94, R1a-Z95, and their subclades such as R1a-Z2123 being significant in present-day Kurds, whereas these haplogroups are absent in ancient samples from Iran older than 1500 years. This genetic evidence of Central Asian Indo-Iranian ancestry in Kurds is also consistent with Kurds speaking an Indo-Iranian language and, prior to Islam, practicing Zoroastrianism, both of which have origins in Central Asia.

Of particular interest are two ancient samples: a 2100 year old Kushan era sample from Ksirov, Tajikistan, and a 390 year old Iranian sample from the Ganjdareh, Kermanshah area. Surprisingly, Kurds form a genetic clade with both and can be modeled as either 100% 2100 year old Tajikistan Kushan (p-value 0.54) or 100% 390 year old Ganjdareh (p-value 0.48), as shown in figure 1 and table 1. By contrast, we find no genetic evidence that Kurds were present in significant numbers in Anatolia before 400 years ago.

Table 1 – Genetic relatedness (cladality) of various whole genome Eurasian populations to the 2100 year old Kushan era sample from Tajikistan, using qpWave and the diverse set of 16 pright reference populations shown under “Methods” below to distinguish between related populations. For the top 3 populations the p-values are >0.05, indicating that the Kushan sample forms a clade with them. Higher p-values mean weaker evidence against the null hypothesis, which in this case is that there is no statistical difference between Tajikistan Kushan and the Eurasian population tested.

With regard to Armenians, our genetic analysis reveals that ancient Anatolians from 2500 years to 400 years ago were genetically more similar to Armenians than to either Kurds or Turks. Additionally, we find genetic evidence that 2000 year old samples from Armenia and 2800 year old Urartian samples from Van, Turkey form clades with Armenians (p=0.10 and p=0.06, respectively). We also find evidence of “Armenian-like” populations in Iran from 400 years ago, in Gaziantep, Turkey from 1000 years ago, and in Armenia from 900 years ago.

We summarize our findings in tables 1 through 6 below, with further details available in figures 3, 4, and 5.

Table 2 – Genetic relatedness of ancient samples from Turkey to Kurds, Armenians, and Turks using whole genomes and our framework of formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry. It’s readily apparent that the ancient samples from Turkey genetically resemble Armenians to the exclusion of Kurds and Turks. These results indicate that Kurds and Turks are relative newcomers to Anatolia, consistent with historical accounts showing Kurds and Turks arriving en masse with the Seljuks and Ottomans during the last 1000 years, and Armenian-like populations having an older presence in Anatolia.

Table 3 – Although it is not surprising to see ancient Central Asian Indo-Iranians being more closely related to Kurds than to either Armenians or Turks, it is surprising to see the very high degree of genetic relatedness of Kurds to the 2100 year old Iron-Age Indo-Iranian Kushan sample from Tajikistan, as compared to the other ancient Indo-Iranians. This relatedness is so high that present-day Kurds can be modelled as 100% Tajikistan Kushan (p-value 0.53; see figure 4). This is very much to the exclusion of Armenians and Turks, and indicates a differential genetic relationship of Kurds with certain ancient Central Asian Indo-Iranian populations.

Table 4 – Genetic relatedness of ancient samples from Iran to Kurds, Armenians, and Turks using whole genomes and our framework of formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry. It’s interesting to see that pre-Indo-Iranian ancient Western Iranians until 3500 years ago show an equal genetic relatedness to both Armenians and Kurds, whereas the early Mede era 2800 and 2900 year old samples clearly favor Kurds, indicating that the Indo-Iranianization of western Iran had commenced around 2900 years ago. By 390 years ago, present-day Kurds can be modelled as 100% GanjDareh (Kermanshah area, p-value 0.48, figure 4).

Table 5 – Genetic relatedness of ancient samples from Iraq to Kurds, Armenians, and Turks using whole genomes and our framework of formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry. The northern Iraq Nemrik sample shows the highest relatedness to present-day Armenians and an equally lower relatedness to both Kurds and Turks. This relatedness to all 3 present-day ethnic groups is not high enough to conclude that any of these 3 groups were present in northern Iraq 3300 years ago. Unfortunately, we did not possess any whole genome Assyrian samples with which to perform the calculation.

Table 6 – Genetic relatedness of ancient samples from Armenia to contemporary Kurds, Armenians, and Turks using whole genomes and our framework of formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry. The ancient populations of Armenia are genetically clearly more Armenian-like than either Kurd-like or Turk-like.

We visualize the genetic similarity between our extant STUDY and TARGET populations using circle sizes as shown in figures 1 and 2.

Figure 1 – Each ancient TARGET population is individually compared with one of our 3 STUDY populations. The size of the circle indicates the degree of genetic similarity between the TARGET and STUDY population. For perspective, if we were to compare 2 individuals from the same population, for example a Kurd with another Kurd, the expected genetic similarity would be between 7 and 8. Anatolians from 400 to 1000 years ago are genetically significantly more similar to Armenians than to either Kurds or Turks. The genetic signature of present-day Kurds was not found in Anatolia from 2400 to 400 years BP, in spite of the availability of 230 ancient DNA samples spanning space and time in Anatolia.

Figure 2 – For the period 7000 to 2600 years BP, the genetic similarity of all 3 of our STUDY populations with ancient populations is overall lower than in figure 1, which depicts a more recent time scale. This is to be expected, since the regional ancient populations from the Chalcolithic to the early Iron Age form only a fraction of the genetic composition of present-day Armenians, Kurds, and Turks.

We also used the formal statistical software qpWave [3] with the 17 diverse ancient Eurasian reference populations (pright) shown below to perform genetic relatedness analysis using 2 source populations (pleft), rejecting the null hypothesis when the p-value is <0.05, the hypothesis being that one present-day population forms a tight clade with another contemporary population. The results are shown in tables 7-10.

Due to their long and extensive shared population history, unsurprisingly, Kurds and Southern Iranians form a clade (p-value 0.17) where they can be modeled as 100% of each other.

Table 7 – Genetic relatedness between Kurds and other Eurasian populations. Unsurprisingly, Southern Iranians top the list. However, it’s readily apparent that Turks as well as other populations of the Caucasus also share high relatedness with Kurds, indicating substantial shared demography between them.

Table 8 – Genetic relatedness between Turks and other Eurasian populations. Somewhat unsurprisingly, Kurds top the list due to the proximity of their Oghuz and Indo-Iranian ancestors to each other for the last 1500 years, starting in Central Asia all the way to West Asia, followed by various ethnic groups from the Caucasus and Iranians.

Table 9 – Genetic relatedness between Southern Iranians and other Eurasian populations. Unsurprisingly, Kurds top the list by a wide margin, followed by Turks and Caucasians. The main difference between the demography of Persians and that of Kurds shown in Table 7 is that, in addition to showing very high genetic relatedness to Persians, Kurds also show high relatedness to Turks and Caucasians, indicating a greater history of admixture between the ancestors of Kurds and Turks/Caucasians than between the ancestors of Persians and Turks/Caucasians.

RESULTS & DISCUSSION

Using qpWave, we performed pairwise analysis of our extant STUDY populations with our ancient TARGET populations in cladality mode and report the results in figures 3, 4, and 5. The results shown are for the 43 ancient populations genetically most similar to our STUDY populations. Our findings can be summarized as follows:

ARMENIANS

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Armenians (figure 3): 8
    • Number of “Armenian like” ancient Anatolian populations with p-values > 0.0099: 2
    • Number of Central Asian ancient populations in top 25 list for Armenians: 1
    • Populations forming a tight clade with Armenians:
      • 2000 year old samples from Aghitu, Armenia – p=0.10
      • 2800 year old Urartu culture samples from Van, Turkey – p=0.06
    • “Armenian-like” ancient populations:
      • 390 year old samples from Ganj-Dareh, Iran – p=0.02
      • 1000 year old samples from Gaziantep, Turkey – p=0.01
      • 900 year old samples from Agarak, Armenia – p=0.01

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Armenians are summarized in figure 3.

Figure 3 – Top 43 populations with genetic similarity to Armenians based on qpWave outputs. Populations that form clades with Armenians are highlighted in bold. Although populations 3 through 5 are below our threshold p-value of 0.05, they are highlighted because they fall only slightly below it and can thus be considered “Armenian like”.

Our DNA analysis indicates that Armenians or Armenian-like populations have inhabited Anatolia over the past 3000 years (figures 1 & 2), and thus corroborates claims by Armenian nationalists that Armenians have inhabited Anatolia for a long time. A glance at figure 3 shows 8 ancient Anatolian populations in the top 25 list for Armenians. The same clearly cannot be said for Kurds and Turks.

KURDS

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Kurds (figure 4): 3
    • Number of “Kurd like” ancient Anatolian populations with p-values > 0.0099: 0
    • Number of Central Asian ancient populations in top 25 list for Kurds: 5
    • Populations forming a tight clade with Kurds:
      • 2100 year old Kushan sample from Ksirov, Tajikistan – p=0.53
      • 390 year old samples from Ganj-Dareh, Iran – p=0.48
    • “Kurd-like” ancient populations:
      • 2800 year old samples from Hasanlu, Iran – p=0.02
      • 2900 year old samples from Dinkha Tepe, Iran – p=0.01

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Kurds are summarized in figure 4.

Figure 4 – Top 43 ancient populations genetically similar to Kurds. It is quickly apparent that Kurds form clades with ancient Iranians and Tajiks. With genetic similarity indices of 7.1 and 6.8, the 2100 year old Kushan sample from Tajikistan and the 390 year old sample from Ganj-Dareh, Iran are practically Kurd. It appears that the Kurdish genetic composition began taking shape in the Mede time-frame, as evidenced by the 2800 year old Iranian Hasanlu population and the 2900 year old Iranian Dinkha Tepe population. Although they don’t form tight clades with Kurds and cannot be considered Kurds, they are quite Kurd-like. In contrast with Armenians, who have 1 Central Asian population in the top 25, Kurds have 5 Central Asian populations in the top 25.

A glance at figures 1, 2, and 4 shows that the Kurdish ethnic identity began taking shape east of Armenia in the Iranic geographic sphere some 2900 years ago during the Mede era. Figure 4 shows the 2100 year old Kushan sample from Tajikistan, and the 390 year old sample from Ganj Dareh, Iran have an almost 100% Kurd genetic composition.

The most Kurd-like ancient samples from Anatolia are 2800 year old Urartian samples from Van, Turkey, 1000 year old samples from Gaziantep, and 380 year old samples from Mardin. However, the genetic similarity indices for these samples are only 1.8, 1.2, and 1.1, respectively. This lack of the genetic signature of Kurds in Anatolia from 2000 to 400 years ago supports an en masse migration of Kurds to Anatolia from Iran over the past 400 years, during the Ottoman occupation of Anatolia. The aforementioned results support ethnogenesis of the present-day Kurdish identity in the Iranic sphere some 2000 years ago during the Parthian era, and a relatively recent massive migration of Kurds westward from Iran into present-day Turkey, Iraq, and Syria.

TURKS OF TURKEY

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Turks (figure 5): 10
    • Number of “Turk like” ancient Anatolian populations with p-values > 0.0099: 0
    • Number of Central Asian ancient populations in top 25 list for Turks: 3
    • Populations forming a tight clade with Turks:
      • None
    • “Turk-like” ancient populations:
      • None

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Turks are summarized in figure 5.

Figure 5 – Top 43 ancient populations genetically similar to Turks. It is readily apparent that no single ancient TARGET population forms a clade with present-day Turks indicating ethnogenesis of the identity of Turkey Turks is a relatively recent event. Interestingly, although various ancient Anatolians form weaker clades with Turks than with Kurds individually, there are 10 ancient Anatolian populations in the top 25 list for Turks versus 3 for Kurds. This indicates that collectively more ancient Anatolian populations contributed to the Turk gene pool than the Kurd gene pool. This of course is consistent with Central Asian Turkic populations hybridizing with various local Anatolian populations to form present-day Turks.

Our results also indicate that the genetic signature of present-day Turks from Turkey was absent in Eurasia prior to approximately 500 BP, since we were able to reject as a clade all qpWave combinations of Turk-Ancient Eurasians. However, figure 5 shows 10 Anatolian ancient populations made the top 25 list in terms of genetic similarity to Turks, in contrast to 3 ancient Anatolian populations making the top 25 list for Kurds in figure 4. This is consistent with the ethnogenesis of Turks where Central Asian Turkics hybridized with a greater number of local Anatolian populations to the exclusion of Kurds on a more recent time scale.

METHODS & QUALITY CONTROL

We make extensive use of the formal statistical program qpWave, which is part of the AdmixTools package available from https://github.com/DReichLab/AdmixTools. We use the following 17 diverse ancient east and west Eurasian reference populations (pright) to robustly differentiate fine streams of ancestry between our STUDY and TARGET populations (pleft). The DNA samples used in this study, except for the Kurdish 30X WGS samples, are publicly available from the Reich Lab at the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3AR0CD. The Xinjiang samples [2] were published in https://www.science.org/doi/10.1126/science.abk1534, and the Iron-Age Uzbekistan samples [4] were published in https://academic.oup.com/mbe/article/38/11/4908/6329832.

Pright Reference Populations [1]:

Mbuti
CHG
IRN_Ganj_Dareh_N
ISR_PPNB
MAR_Taforalt_EpiP
SRB_Iron_Gates_HG
ANE
TUR_Marmara_Barcın_N
WHG
RUS_Yana_RHS
MNG_Khovsgol_LBA
RUS_DevilsCave_N
RUS_Shamanka_EN
Loschbour_WHG
UZB_Sappali_Tepe_BA
RUS_Siberia_Lena_EBA

We independently test each present-day population against one ancient population to determine whether they form a clade. For example, we can test whether Armenians form a clade with an ancient population from Van, Turkey by setting pleft to Armenian and Van_Ancient, using 17 diverse ancient east and west Eurasian populations as pright references to robustly differentiate fine streams of ancestry between Armenians and Van_Ancient.
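The setup for one such pairwise test can be sketched as follows. This is a minimal illustration rather than the scripts used in the study: it assumes the standard AdmixTools parameter-file keys (genotypename, snpname, indivname, popleft, popright), and the dataset prefix, output directory, and file names are hypothetical placeholders.

```python
from pathlib import Path

# Right-hand reference labels copied from the "Pright Reference Populations"
# list in this section (16 labels appear in that list as printed).
PRIGHT = [
    "Mbuti", "CHG", "IRN_Ganj_Dareh_N", "ISR_PPNB", "MAR_Taforalt_EpiP",
    "SRB_Iron_Gates_HG", "ANE", "TUR_Marmara_Barcın_N", "WHG", "RUS_Yana_RHS",
    "MNG_Khovsgol_LBA", "RUS_DevilsCave_N", "RUS_Shamanka_EN", "Loschbour_WHG",
    "UZB_Sappali_Tepe_BA", "RUS_Siberia_Lena_EBA",
]


def write_qpwave_inputs(study, target, prefix, outdir):
    """Write popleft, popright and parameter files for one STUDY/TARGET pair.

    `prefix` is a hypothetical path prefix of an EIGENSTRAT-format dataset
    (prefix.geno / prefix.snp / prefix.ind). Returns the parameter file path.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stem = f"{study}__{target}"
    left = outdir / f"{stem}.left"
    right = outdir / f"{stem}.right"
    par = outdir / f"{stem}.par"

    # pleft: the present-day STUDY population and the ancient TARGET population
    left.write_text(f"{study}\n{target}\n", encoding="utf-8")
    # pright: one reference population label per line
    right.write_text("\n".join(PRIGHT) + "\n", encoding="utf-8")
    par.write_text(
        f"genotypename: {prefix}.geno\n"
        f"snpname: {prefix}.snp\n"
        f"indivname: {prefix}.ind\n"
        f"popleft: {left}\n"
        f"popright: {right}\n",
        encoding="utf-8",
    )
    return par


if __name__ == "__main__":
    par_file = write_qpwave_inputs("Armenian", "Van_Ancient", "dataset", "qpwave_runs")
    print(f"qpWave -p {par_file}")
```

The printed command line is the usual way to launch qpWave on a parameter file; repeating the call for every STUDY/TARGET pair gives the full set of pairwise tests.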

In this manner qpWave outputs a p-value associated with the null hypothesis, the hypothesis being that there is no significant genetic difference between Armenians and Van_Ancient and that any observed difference is due to sampling or experimental error. We use the commonly used p-value of 0.05 as the cutoff: a p-value of 0.05 means that, if there were truly no genetic difference between Armenians and Van_Ancient, a difference at least as large as the one observed would arise by chance about 5% of the time. In other words, we reject the null hypothesis when p is less than 0.05, in which case we conclude that the Van_Ancient samples are not genetically the same as the present-day Armenian samples.
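The decision rule can be written out as a short sketch. The 0.05 cutoff comes from the paragraph above and the 0.0099 lower bound from the key statistics in the Results; the function name and the three labels are illustrative choices, and the example p-values are those reported for Armenians.

```python
CLADE_P = 0.05    # below this the clade (null) hypothesis is rejected
LIKE_P = 0.0099   # lower bound used in the key statistics for "-like" populations


def classify(p_value):
    """Label one STUDY/TARGET comparison from its qpWave p-value."""
    if p_value >= CLADE_P:
        return "clade"       # null hypothesis not rejected
    if p_value > LIKE_P:
        return "like"        # rejected, but only slightly below the cutoff
    return "rejected"


# p-values reported for Armenians in the Results section
armenian_p_values = {
    "Aghitu, Armenia (2000 years old)": 0.10,
    "Van, Turkey (2800 years old)": 0.06,
    "Ganj-Dareh, Iran (390 years old)": 0.02,
    "Gaziantep, Turkey (1000 years old)": 0.01,
    "Agarak, Armenia (900 years old)": 0.01,
}

if __name__ == "__main__":
    for name, p in armenian_p_values.items():
        print(f"{name}: p={p:.2f} -> {classify(p)}")
```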

The “Genetic Similarity” values shown in figures 3, 4, and 5 were simply obtained as follows: (1/ChiSq) x 100.
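As a small worked example of this formula, the sketch below converts between a chi-square value and the similarity index. It assumes ChiSq is the chi-square statistic of the corresponding qpWave test; the function names are illustrative.

```python
def genetic_similarity(chisq):
    """Similarity index as defined in the text: (1 / ChiSq) x 100."""
    if chisq <= 0:
        raise ValueError("ChiSq must be positive")
    return 100.0 / chisq


def chisq_from_similarity(similarity):
    """Inverse of genetic_similarity: the ChiSq that yields a given index."""
    if similarity <= 0:
        raise ValueError("similarity must be positive")
    return 100.0 / similarity


if __name__ == "__main__":
    # A smaller chi-square (better fit) gives a larger similarity index.
    print(genetic_similarity(50.0))                # 2.0
    # The index of 7.1 quoted for the Kushan sample corresponds to ChiSq of about 14.08.
    print(round(chisq_from_similarity(7.1), 2))    # 14.08
```

Because the index is a reciprocal, it grows quickly as ChiSq approaches small values, so differences among the top-ranked populations appear larger than differences further down the list.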

Our analysis used published data [1] genotyped on the 1240K Axiom Affymetrix array. The present-day populations consist of the Simons WGS set. All the available Iranian, Armenian, and Turkish samples in the Reich Lab dataset were used in this study. As an additional quality control measure, we removed samples with fewer than 600,000 SNPs intersecting the 1240K set. Additionally, for Kurds-IRQ we used two whole-genome-sequenced males from northern Iraq, sequenced to a depth of 30X. We processed the FASTQ sequences using our own samtools pipeline, which consists of the following:

  1. AdapterRemoval to remove remnant adapter sequences from high-throughput sequencing (HTS) data and to trim low-quality bases from the 3′ end of reads after adapter removal. We also removed sequences shorter than 24 bases, which because of their short length may map to several regions of the reference genome.
  2. bwa mem to align the sequences to the GRCh37 human reference, with the output piped to samtools view to produce BAM files. The reads were merged post-alignment rather than with AdapterRemoval.
  3. samtools sort to coordinate-sort the BAM file.
  4. Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads.
  5. Picard to remove duplicate sequences.
  6. bcftools mpileup to convert the BAM file into genomic positions. We first use mpileup to produce a BCF file that contains all of the locations in the genome, then pass this file to bcftools call to call genotypes and reduce the list of sites to those found to be variant. Thresholds used were --min-MQ 15 and --min-BQ 20.
  7. bcftools view to remove positions supported by fewer than 4 reads.
  8. This yielded VCF files with 1478 million bases.
  9. Extraction of the Affymetrix 1240K variants yielded a final VCF of 1160K variants.
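Steps 1 to 7 above can be sketched as a list of command lines. This is a hedged reconstruction, not the authors' script: only the tool names, the 24-base minimum length, --min-MQ 15, --min-BQ 20, and the 4-read depth cutoff come from the text. The single-FASTQ input, the file names, the choice of Picard CleanSam and MarkDuplicates for steps 4 and 5, and the remaining options are assumptions. The function only builds the strings and executes nothing.

```python
import shlex


def build_pipeline(sample, fastq, ref="GRCh37.fa", picard_jar="picard.jar"):
    """Return shell command strings for steps 1-7, in order. Nothing is run.

    All file names are hypothetical and derived from `sample`.
    """
    q = shlex.quote
    trimmed = f"{sample}.trimmed.fastq.gz"
    raw_bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    clean_bam = f"{sample}.clean.bam"
    dedup_bam = f"{sample}.dedup.bam"
    raw_vcf = f"{sample}.raw.vcf.gz"
    final_vcf = f"{sample}.dp4.vcf.gz"
    picard = f"java -jar {q(picard_jar)}"
    return [
        # 1. trim adapters and low-quality bases, drop reads shorter than 24 bases
        f"AdapterRemoval --file1 {q(fastq)} --basename {q(sample)} "
        f"--output1 {q(trimmed)} --gzip --trimns --trimqualities --minlength 24",
        # 2. align to the reference and convert the SAM stream to BAM
        f"bwa mem {q(ref)} {q(trimmed)} | samtools view -b -o {q(raw_bam)} -",
        # 3. coordinate-sort the BAM file
        f"samtools sort -o {q(sorted_bam)} {q(raw_bam)}",
        # 4. soft-clip beyond-end-of-reference alignments, MAPQ 0 for unmapped reads
        f"{picard} CleanSam I={q(sorted_bam)} O={q(clean_bam)}",
        # 5. remove duplicate sequences
        f"{picard} MarkDuplicates I={q(clean_bam)} O={q(dedup_bam)} "
        f"M={q(sample + '.dup_metrics.txt')} REMOVE_DUPLICATES=true",
        # 6. pileup with the stated thresholds, then call variant sites only
        f"bcftools mpileup -f {q(ref)} --min-MQ 15 --min-BQ 20 -Ou {q(dedup_bam)} "
        f"| bcftools call -mv -Oz -o {q(raw_vcf)}",
        # 7. keep only positions supported by at least 4 reads
        f"bcftools view -i 'INFO/DP>=4' -Oz -o {q(final_vcf)} {q(raw_vcf)}",
    ]


if __name__ == "__main__":
    for step, command in enumerate(build_pipeline("KRD1", "KRD1.fastq.gz"), start=1):
        print(f"{step}. {command}")
```

Building the commands as data first makes it easy to review or log them before running anything; they could then be handed to a shell or a workflow manager one at a time.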

Data Availability

The 1240K genotype data [1,2,4] used in this study are readily available from the Reich Lab at https://reich.hms.harvard.edu/datasets.

REFERENCES

  1. The genetic history of the Southern Arc: A bridge between West Asia and Europe, Lazaridis et al, American Association for the Advancement of Science, 2022/09/28, https://doi.org/10.1126/science.abm4247
  2. Bronze and Iron Age population movements underlie Xinjiang population history, Kumar et al, Science 376, 62 (2022), DOI: 10.1126/science.abk1534
  3. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012 Nov;192(3):1065-93. doi: 10.1534/genetics.112.145037. Epub 2012 Sep 7. PMID: 22960212; PMCID: PMC3522152.
  4. Genetic Continuity of Bronze Age Ancestry with Increased Steppe-Related Ancestry in Late Iron Age Uzbekistan, Kumar et al, Molecular Biology and Evolution, Volume 38, Issue 11, November 2021, Pages 4908–4917, https://doi.org/10.1093/molbev/msab216