THE PEOPLING OF ANATOLIA OVER THE PAST 2000 YEARS

ABSTRACT

DNA from hundreds of ancient human remains spanning the last 7000 years has been sequenced over the past few years in the “Southern Arc” region of West Asia and its surroundings. Using this DNA, we investigate the demographic changes that occurred in Anatolia over the past 2000 years as they pertain to the peopling of Anatolia by Armenians and Kurds and to the ethnogenesis of the Turks in Turkey. Using the qpWave [3] framework, we investigate whether DNA supports claims by Armenian, Kurdish, and Turkish nationalists that their respective ethnic groups have inhabited Anatolia for thousands of years. We also shed some light on the ethnogenesis of the present-day Armenian, Kurdish, and Turkish populations.

Using the formal statistical software qpWave [3] with 17 diverse ancient Eurasian reference populations (pright), we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings (TARGET populations) to DNA data from present-day Armenians, Kurds, and Turks (STUDY populations). We run qpWave with 2 source populations (pleft) to test a null hypothesis, namely that one extant STUDY population forms a very tight clade with one ancient TARGET population from Anatolia, Armenia, Iraq, Iran, Central Asia, or Russia. This helps establish whether there is evidence of a genetic signature of present-day Armenians, Kurds, and Turks in Anatolia or nearby regions at a particular point in time. We use the widely accepted threshold of p = 0.05: we reject the null hypothesis when p < 0.05, where the p-value is the probability of obtaining a fit at least as poor as the observed one if the extant STUDY and ancient TARGET populations really did form a tight clade.

Our results indicate that from about 2500 years ago to 400 years ago, Anatolia was inhabited by people who genetically resembled Armenians significantly more than they resembled Kurds or Turks. Our genetic analysis is thus consistent with an en masse migration of Kurds into Anatolia from Iran over the last 400 years, during the Ottoman occupation of Anatolia. This study also indicates that no single ancient TARGET population in Central or West Asia genetically resembled present-day Turks from Turkey. This result is consistent with the relatively recent ethnogenesis of Turks in Turkey from multiple gene pools.

INTRODUCTION

Using the framework of the formal statistical software qpWave [3] along with a robust set of 17 diverse ancient Eurasian reference populations (pright) to distinguish between slightly differing streams of ancestry, we formally compare hundreds of ancient DNA samples from the “Southern Arc” and its surroundings to DNA data from present-day Armenians, Kurds, and Turks. We run qpWave with 2 source populations (pleft) to test the null hypothesis that one STUDY population forms a very tight clade with an ancient TARGET population. We use the widely used threshold of p = 0.05: we reject the null hypothesis when p < 0.05, where the p-value is the probability of obtaining a fit at least as poor as the observed one if the STUDY and ancient TARGET populations really did form a tight clade.

With regard to Turks, our genetic analysis reveals that none of the ancient samples from 7000 years to 500 years before present (BP) from the region forms a tight clade with extant Turks. This is somewhat expected and implies that the present-day genetic composition of Turks from Turkey did not exist prior to 500 years ago. It is also consistent with Turks having formed relatively recently, when Turkic tribes from Central Asia hybridized with various local Anatolian populations.

With regard to Kurds, our genetic analysis reveals that although the present-day genetic composition of Kurds existed 400 years ago in Ganj Dareh, Iran (p=0.48), and in Ksirov, Tajikistan 2100 years ago (p=0.54), Kurds did not exist in any significant numbers in Anatolia before 400 years ago. This implies that en masse migrations of Kurds westward from Iran to Anatolia occurred under the Ottomans. However, we also find genetic evidence of “Kurd-like” populations in Hasanlu, Iran going back 2800 years and, to a lesser extent and surprisingly, in the 2500 year old Urartian culture of Armenia.

With regard to Armenians, our genetic analysis reveals that ancient Anatolians from 2500 years to 400 years ago were genetically more similar to Armenians than to either Kurds or Turks. Additionally, we find genetic evidence that 2000 year old samples from Armenia and 2800 year old Urartian samples from Van, Turkey form clades with Armenians (p=0.10 and p=0.06, respectively). We also find evidence of “Armenian-like” populations in Iran from 400 years ago, in Gaziantep, Turkey from 1000 years ago, and in Armenia from 900 years ago.

We visualize the genetic similarity between our extant STUDY and TARGET populations using circle sizes as shown in figures 1 and 2.

Figure 1 – Each ancient TARGET population is individually compared with one of our 3 STUDY populations. The size of the circle indicates the degree of genetic similarity between the TARGET and STUDY population. For perspective, if we were to compare 2 individuals from the same population, for example a Kurd with another Kurd, the expected genetic similarity would be between 7 and 8. Anatolians from 400 to 1000 years ago are genetically significantly more similar to Armenians than to either Kurds or Turks. The genetic signature of present-day Kurds was not found in Anatolia from 2400 to 400 years BP in spite of the availability of 230 ancient DNA samples spanning space and time in Anatolia.

Figure 2 – For the period 7000 to 2600 years BP the genetic similarity of all 3 of our STUDY populations with ancient populations is lower overall than in figure 1, which depicts a more recent time scale. This is to be expected, since the regional ancient populations from the Chalcolithic to the early Iron Age form only a fraction of the genetic composition of present-day Armenians, Kurds and Turks.

RESULTS & DISCUSSION

Using qpWave we performed pairwise analyses of our extant STUDY populations with our ancient TARGET populations in cladality mode and present the results in figures 3, 4 and 5. The results shown are for the 43 ancient populations genetically most similar to each STUDY population. Our findings can be summarized as follows:

ARMENIANS

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Armenians (figure 3): 8
    • Number of “Armenian-like” ancient Anatolian populations with p-values > 0.0099: 2
    • Number of Central Asian ancient populations in top 25 list for Armenians: 1
    • Populations forming a tight clade with Armenians:
      • 2000 year old samples from Aghitu, Armenia – p=0.10
      • 2800 year old Urartu culture samples from Van, Turkey – p=0.06
    • “Armenian-like” ancient populations:
      • 390 year old samples from Ganj-Dareh, Iran – p=0.02
      • 1000 year old samples from Gaziantep, Turkey – p=0.01
      • 900 year old samples from Agarak, Armenia – p=0.01

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Armenians are summarized in figure 3.

Figure 3 – Top 43 populations with genetic similarity to Armenians based on qpWave outputs. Populations that form clades with Armenians are highlighted in bold. Although populations 3 through 5 fall below our threshold p-value of 0.05, they are highlighted because they fall only slightly below it and can thus be considered “Armenian-like”.

Our DNA analysis indicates that Armenians or Armenian-like populations have inhabited Anatolia over the past 3000 years (figures 1 & 2), and thus corroborates claims by Armenian nationalists that Armenians have inhabited Anatolia for a long time. A glance at figure 3 shows 8 ancient Anatolian populations in the top 25 list for Armenians. However, the same clearly cannot be said for Kurds and Turks.

KURDS

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Kurds (figure 4): 3
    • Number of “Kurd-like” ancient Anatolian populations with p-values > 0.0099: 0
    • Number of Central Asian ancient populations in top 25 list for Kurds: 5
    • Populations forming a tight clade with Kurds:
      • 2100 year old Kushan sample from Ksirov, Tajikistan – p=0.53
      • 390 year old samples from Ganj-Dareh, Iran – p=0.48
    • “Kurd-like” ancient populations:
      • 2800 year old samples from Hasanlu, Iran – p=0.02
      • 2900 year old samples from Dinkha Tepe, Iran – p=0.01

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Kurds are summarized in figure 4.

Figure 4 – Top 43 ancient populations genetically similar to Kurds. It is quickly apparent that Kurds form clades with ancient Iranians and Tajiks. With genetic similarity indices of 7.1 and 6.8, the 2100 year old Kushan sample from Tajikistan and the 390 year old samples from Ganj-Dareh, Iran are genetically almost indistinguishable from present-day Kurds. It appears that the Kurdish genetic composition began taking shape in the Mede time-frame, as evidenced by the 2800 year old Iranian Hasanlu population and the 2900 year old Iranian Dinkha Tepe population. Although these do not form tight clades with Kurds and cannot be considered Kurds, they are quite Kurd-like. In contrast with Armenians, who have 1 Central Asian population in the top 25, Kurds have 5 Central Asian populations in the top 25.

A glance at figures 1, 2, and 4 shows that the Kurdish ethnic identity began taking shape east of Armenia in the Iranic geographic sphere some 2900 years ago during the Mede era. Figure 4 shows the 2100 year old Kushan sample from Tajikistan, and the 390 year old sample from Ganj Dareh, Iran have an almost 100% Kurd genetic composition.

The most Kurd-like ancient samples from Anatolia are 2800 year old Urartian samples from Van, Turkey, 1000 year old samples from Gaziantep, and 380 year old samples from Mardin. However, the genetic similarity indices for these samples are only 1.8, 1.2, and 1.1, respectively. This lack of the genetic signature of Kurds in Anatolia from 2000 to 400 years ago supports an en masse migration of Kurds to Anatolia from Iran over the past 400 years, during the Ottoman occupation of Anatolia. The aforementioned results support ethnogenesis of the present-day Kurdish identity in the Iranic sphere some 2000 years ago during the Parthian era, and a relatively recent massive migration of Kurds westward from Iran into present-day Turkey, Iraq, and Syria.

TURKS OF TURKEY

  • KEY STATISTICS:
    • Number of ancient Anatolian populations in top 25 list of ancients for Turks (figure 5): 10
    • Number of “Turk-like” ancient Anatolian populations with p-values > 0.0099: 0
    • Number of Central Asian ancient populations in top 25 list for Turks: 3
    • Populations forming a tight clade with Turks:
      • None
    • “Turk-like” ancient populations:
      • None

Our qpWave analysis results for the top 43 ancient populations in terms of genetic similarity with Turks are summarized in figure 5.

Figure 5 – Top 43 ancient populations genetically similar to Turks. It is readily apparent that no single ancient TARGET population forms a clade with present-day Turks, indicating that the ethnogenesis of the Turks of Turkey is a relatively recent event. Interestingly, although individual ancient Anatolian populations form weaker clades with Turks than with Kurds, there are 10 ancient Anatolian populations in the top 25 list for Turks versus 3 for Kurds. This indicates that collectively more ancient Anatolian populations contributed to the Turk gene pool than to the Kurd gene pool. This is consistent with Central Asian Turkic populations hybridizing with various local Anatolian populations to form present-day Turks.

Our results also indicate that the genetic signature of present-day Turks from Turkey was absent in Eurasia prior to approximately 500 BP, since we were able to reject as a clade all qpWave combinations of Turk-Ancient Eurasians. However, figure 5 shows 10 Anatolian ancient populations made the top 25 list in terms of genetic similarity to Turks, in contrast to 3 ancient Anatolian populations making the top 25 list for Kurds in figure 4. This is consistent with the ethnogenesis of Turks where Central Asian Turkics hybridized with a greater number of local Anatolian populations to the exclusion of Kurds on a more recent time scale.

METHODS & QUALITY CONTROL

We make extensive use of the formal statistical program qpWave, which is part of the AdmixTools package available from https://github.com/DReichLab/AdmixTools. We use the following 17 diverse ancient East and West Eurasian reference populations (pright) to robustly differentiate fine streams of ancestry between our STUDY and TARGET populations (pleft). The DNA samples used in this study, except for the Kurdish 30X WGS samples, are publicly available from the Reich Lab at the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3AR0CD. The Xinjiang samples [2] were published in https://www.science.org/doi/10.1126/science.abk1534, and the Iron Age Uzbekistan samples [4] were published in https://academic.oup.com/mbe/article/38/11/4908/6329832

Pright Reference Populations [1]:

Mbuti
CHG
IRN_Ganj_Dareh_N
ISR_PPNB
MAR_Taforalt_EpiP
SRB_Iron_Gates_HG
ANE
TUR_Marmara_Barcın_N
WHG
RUS_Yana_RHS
MNG_Khovsgol_LBA
RUS_DevilsCave_N
RUS_Shamanka_EN
Loschbour_WHG
UZB_Sappali_Tepe_BA
RUS_Siberia_Lena_EBA

We can independently test each present-day population against one ancient population to determine whether they form a clade. For example, we can test whether Armenians form a clade with an ancient population from Van, Turkey by setting pleft to Armenian and Van_Ancient, using the 17 diverse ancient East and West Eurasian populations as pright references to robustly differentiate fine streams of ancestry between Armenians and Van_Ancient.
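As a rough illustration of the pairwise test described above, the Python sketch below writes the three small text files that a qpWave run of this kind would typically read: a pleft list, a pright list, and a parameter file. The file names (`left.txt`, `right.txt`, `qpwave.par`) and the `data` prefix are hypothetical, the parameter keys follow the AdmixTools documentation as we understand it and should be checked against the installed version, and the pright labels are copied from the list above.

```python
from pathlib import Path

# Left populations for one pairwise test: one extant STUDY population and
# one ancient TARGET population (labels taken from the example in the text).
PLEFT = ["Armenian", "Van_Ancient"]

# Reference populations exactly as listed in this section.
PRIGHT = [
    "Mbuti", "CHG", "IRN_Ganj_Dareh_N", "ISR_PPNB", "MAR_Taforalt_EpiP",
    "SRB_Iron_Gates_HG", "ANE", "TUR_Marmara_Barcın_N", "WHG", "RUS_Yana_RHS",
    "MNG_Khovsgol_LBA", "RUS_DevilsCave_N", "RUS_Shamanka_EN", "Loschbour_WHG",
    "UZB_Sappali_Tepe_BA", "RUS_Siberia_Lena_EBA",
]


def write_qpwave_inputs(outdir, pleft, pright, prefix="data"):
    """Write the pleft list, pright list and parameter file; return the parfile path."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    (outdir / "left.txt").write_text("\n".join(pleft) + "\n", encoding="utf-8")
    (outdir / "right.txt").write_text("\n".join(pright) + "\n", encoding="utf-8")
    par_lines = [
        f"genotypename: {prefix}.geno",  # hypothetical genotype file name
        f"snpname: {prefix}.snp",        # hypothetical SNP file name
        f"indivname: {prefix}.ind",      # hypothetical individual file name
        "popleft: left.txt",
        "popright: right.txt",
        "details: YES",
    ]
    parfile = outdir / "qpwave.par"
    parfile.write_text("\n".join(par_lines) + "\n", encoding="utf-8")
    return parfile


if __name__ == "__main__":
    par = write_qpwave_inputs("qpwave_run", PLEFT, PRIGHT)
    print(par.read_text(encoding="utf-8"))
```

With these files in place, the test itself would be launched from inside that directory with something like `qpWave -p qpwave.par`. One such run is needed for every STUDY and TARGET pair.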

In this manner qpWave outputs a p-value for the null hypothesis that there is no significant genetic difference between Armenians and Van_Ancient relative to the pright references, so that any observed difference is due to sampling or experimental error. We use the commonly used cutoff of p = 0.05. The p-value is the probability of observing a difference at least as large as the one measured if the null hypothesis were true; it is not the probability that the two populations are genetically identical. We reject the null hypothesis when p is less than 0.05, in which case we conclude that the Van_Ancient samples are not genetically the same as the present-day Armenian samples.
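The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the label strings are ours, the 0.05 cutoff is the one stated above, the 0.0099 lower bound is the one used for the “-like” lists in the key statistics, and the example p-values are those reported for Armenians in the results.

```python
CLADE_P = 0.05   # cutoff stated in the text
LIKE_P = 0.0099  # lower bound used for the "-like" lists in the key statistics


def classify(p_value):
    """Map a qpWave p-value to one of the three labels used in this study."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p-value out of range: {p_value}")
    if p_value >= CLADE_P:
        return "clade"     # null hypothesis not rejected
    if p_value > LIKE_P:
        return "like"      # rejected, but only slightly below the cutoff
    return "rejected"


# p-values reported for Armenians in the key statistics of this study.
armenian_results = {
    "Aghitu, Armenia": 0.10,
    "Van, Turkey": 0.06,
    "Ganj-Dareh, Iran": 0.02,
    "Gaziantep, Turkey": 0.01,
    "Agarak, Armenia": 0.01,
}

if __name__ == "__main__":
    for population, p in armenian_results.items():
        print(f"{population}: p={p:.2f} -> {classify(p)}")
```

Note that “clade” here means only that the test failed to reject the null hypothesis at the chosen cutoff.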

The “Genetic Similarity” values shown in figures 3, 4, and 5 were obtained as follows: (1/ChiSq) × 100, where ChiSq is the chi-square statistic of the corresponding pairwise qpWave test.
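A minimal Python sketch of this index and of the ranking used for the “top N” lists is shown below. The population names and chi-square values in the example are hypothetical placeholders, not results from this study. Because the index is the reciprocal of the chi-square statistic, a smaller chi-square gives a larger similarity; an index of 7.1, for instance, corresponds to a chi-square of roughly 14.1.

```python
def genetic_similarity(chisq):
    """Genetic similarity index as defined in the text: (1 / ChiSq) x 100."""
    if chisq <= 0:
        raise ValueError("chi-square statistic must be positive")
    return 100.0 / chisq


def rank_by_similarity(chisq_by_population, top_n):
    """Return the top_n (population, similarity) pairs, most similar first."""
    scored = {
        population: genetic_similarity(chisq)
        for population, chisq in chisq_by_population.items()
    }
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical chi-square values, for illustration only.
    example = {"Ancient_A": 90.0, "Ancient_B": 14.0, "Ancient_C": 50.0}
    for population, similarity in rank_by_similarity(example, top_n=2):
        print(f"{population}: {similarity:.1f}")
```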

Our analysis used published data [1] genotyped on the 1240K Axiom Affymetrix array. The present-day populations consist of the Simons WGS set. All the available Iranian, Armenian and Turkish samples in the Reich Lab dataset were used in this study. As an additional quality control measure we removed samples with fewer than 600,000 SNPs intersecting the 1240K set. Additionally, for Kurds-IRQ we used two whole-genome-sequenced males from northern Iraq, sequenced to a depth of 30X. We processed the fastq sequences using our own samtools pipeline, which consists of the following:

  1. AdapterRemoval to remove remnant adapter sequences from High-Throughput Sequencing (HTS) data and trim low quality bases from the 3′ end of reads following adapter removal. Additionally, we removed sequences shorter than 24 bases, which due to their short length may map to several regions of the reference genome.
  2. bwa mem to quickly align the sequences to the GRCh37 Human Reference, with the output piped to samtools view to produce BAM files. The reads were merged post-alignment instead of with AdapterRemoval.
  3. samtools sort to coordinate-sort the BAM file.
  4. Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads.
  5. Picard to remove duplicate sequences.
  6. bcftools mpileup to convert the BAM file into genomic positions. We first use mpileup to produce a BCF file that contains all of the locations in the genome, then pass this file to bcftools call to call genotypes and reduce the list of sites to those found to be variant. Thresholds used were --min-MQ 15 and --min-BQ 20.
  7. bcftools view to remove positions supported by fewer than 4 reads.
  8. This yielded VCF files with 1478 million bases.
  9. Extraction of Affymetrix 1240K variants yielded a final VCF of 1160K variants.
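Steps 1 through 7 above can be sketched as a sequence of shell commands, assembled here in Python without executing anything. Only the minimum read length of 24, the GRCh37 reference, and the `--min-MQ 15` and `--min-BQ 20` thresholds come from the text. Everything else is an assumption made for illustration: paired-end input, all file names, the choice of Picard `CleanSam` and `MarkDuplicates` for steps 4 and 5, the `-mv` options to `bcftools call`, and the use of `INFO/DP` as the read-support filter.

```python
def build_pipeline(sample, ref="GRCh37.fa", picard_jar="picard.jar"):
    """Return one shell command string per pipeline step (nothing is run)."""
    return [
        # 1. Trim adapters and low-quality bases; drop reads shorter than 24 bases.
        f"AdapterRemoval --file1 {sample}_R1.fastq --file2 {sample}_R2.fastq "
        f"--basename {sample} --trimqualities --minlength 24",
        # 2. Align to GRCh37 and pipe into samtools view to get a BAM file.
        f"bwa mem {ref} {sample}.pair1.truncated {sample}.pair2.truncated "
        f"| samtools view -b -o {sample}.bam -",
        # 3. Coordinate-sort the BAM file.
        f"samtools sort -o {sample}.sorted.bam {sample}.bam",
        # 4. Soft-clip beyond-end-of-reference alignments, MAPQ 0 for unmapped reads.
        f"java -jar {picard_jar} CleanSam I={sample}.sorted.bam O={sample}.clean.bam",
        # 5. Remove duplicate sequences.
        f"java -jar {picard_jar} MarkDuplicates I={sample}.clean.bam "
        f"O={sample}.dedup.bam M={sample}.dup_metrics.txt REMOVE_DUPLICATES=true",
        # 6. Pile up with the stated thresholds, then call variant sites only.
        f"bcftools mpileup -f {ref} --min-MQ 15 --min-BQ 20 -Ou {sample}.dedup.bam "
        f"| bcftools call -mv -Oz -o {sample}.calls.vcf.gz",
        # 7. Keep positions supported by at least 4 reads (assumed to mean INFO/DP).
        f"bcftools view -i 'INFO/DP>=4' -Oz -o {sample}.dp4.vcf.gz {sample}.calls.vcf.gz",
    ]


if __name__ == "__main__":
    # "KurdIRQ_1" is a hypothetical sample name.
    for number, command in enumerate(build_pipeline("KurdIRQ_1"), start=1):
        print(f"# step {number}\n{command}\n")
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without the tools installed; each string could be passed to a shell once the paths have been adapted.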

DATA AVAILABILITY

The 1240K genotype data [1, 2, 4] used in this study are readily available from the Reich Lab at https://reich.hms.harvard.edu/datasets.

REFERENCES

  1. The genetic history of the Southern Arc: A bridge between West Asia and Europe, Lazaridis et al, American Association for the Advancement of Science, doi: 10.1126/science.abm4247, https://doi.org/10.1126/science.abm4247, 2022/09/28
  2. Bronze and Iron Age population movements underlie Xinjiang population history, Kumar et al, Science 376, 62 (2022), DOI: 10.1126/science.abk1534
  3. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012 Nov;192(3):1065-93. doi: 10.1534/genetics.112.145037. Epub 2012 Sep 7. PMID: 22960212; PMCID: PMC3522152.
  4. Genetic Continuity of Bronze Age Ancestry with Increased Steppe-Related Ancestry in Late Iron Age Uzbekistan, Kumar et al, Molecular Biology and Evolution, Volume 38, Issue 11, November 2021, Pages 4908–4917, https://doi.org/10.1093/molbev/msab216