Population replacements in Armenia and Iran over the past 7000 years

ABSTRACT

We present genetic evidence of multiple demographic changes in Armenia and western Iran over the past 7000 years. By about 3000 years ago, the descendants of the 7000-year-old Chalcolithic Zagrosian herders (hereinafter Iran-CHL), along with their likely Elamo-Dravidian languages, were gradually replaced by invading “Aryan” warriors from ancient Ariana (Airyanem Vaejah) and the surrounding regions of Central Asia. These invaders were likely associated with the Yaz culture of Central Asia and likely spoke Avestan. Genetically, these early Iranians resembled the sample named Turkmenistan-IA, which is a mixture of BMAC and neighboring “Andronovo Horizon” ancestry.

Genetically, the 2800-year-old early Mede-era Iron Age samples excavated at Hasanlu in NW Iran (hereinafter Hasanlu-IA) lie midway on a genetic cline (Figures 1 & 2) between the 7000-year-old Iran-CHL herders and the Turkmenistan-IA sample associated with the Central Asian Iranic Yaz culture. This indicates that the Aryanization of western Iran had already commenced by 2800 years ago, and it implies that the Medes themselves were a genetic mix of Iran-CHL and Turkmenistan-IA. Thus, the first waves of Yaz Aryan invaders from Central Asia hybridized with the descendants of Iran-CHL, shifting the genetics of western Iran in the direction of Central Asia and Andronovo. This was accompanied by a language shift from the earlier Elamo-Dravidian to Avestan-derived Iranian languages and by the introduction of Zoroastrianism from Central Asia to the Iranian plateau.

We hypothesize that sometime between 2000 and 3000 years ago the Avestan languages spoken by the Yaz Central Asian Indo-Iranians gave rise to the western and eastern Iranian languages. We also hypothesize that this is when the process of genetic differentiation between Persians, Kurds, and Baloch commenced.

Subsequently, additional waves of warriors from Central Asia, including Parthians and Scythians, moved into western Iran and hybridized with the Hasanlu-IA population. It is at this point that the ethnic identities of present-day Kurds and Persians formed. We see this in Figures 1 & 2, where present-day Kurds and Persians lie midway on a genetic cline between Hasanlu-IA and Central Asian Scythians and Sarmatians. These later invaders also introduced paternal DNA lineages of R1a-Z94 into western Iran. These lineages peak among Kurds in western Iran.

Thus, over the past 7000 years, we observe in Figures 1 & 2 a gradual shift in the genetics of Armenia and western Iran away from Iran-CHL and towards Central Asia. This implies that present-day Armenians also harbor DNA from the various invading Aryan waves from Central Asia. The position of Armenians on the PCA plots in Figures 1 & 2, slightly to the left of and below Kurds and Persians, indicates that Armenians were less genetically affected by the subsequent Parthian, Scythian, and Turkic waves than Kurds and Persians were. Nonetheless, present-day Armenians are also shifted in the direction of Central Asia compared with the 3000-year-old Trialeti-Vanadzor culture samples from ancient Armenia (Figures 1 & 2). Some of this shift in present-day Armenians is likely caused by demic diffusion from neighboring populations.

Finally, the PCA plots in Figures 1 & 2 also clearly show that none of our study populations lie on an ancient NW Iranian – Trialeti Armenian genetic cline, meaning that present-day Armenians, Kurds, and Persians cannot be modeled as a two-way mixture of ancient NW Iranians and ancient Armenians.

Figure 1 – PCA plot showing population replacements in NW Iran and Armenia over the past 7000 years. Observations are scaled population averages. PC1 and PC2 together capture 87% of the variation in the data. PC1 corresponds to variation in East Eurasian ancestry, and PC2 corresponds to variation in WHG and, to a lesser extent, ANE ancestry. Immediately discernible is a gradual shift over time in the genetics of Armenia, NW Iran, and E Turkey towards Central Asian Steppe-MLBA, Scythian, and Sarmatian populations. We also observe that towards the end of BMAC, around 3500 years ago, populations on its periphery, such as the Dashti-Kozy samples in Tajikistan, genetically resembled Andronovo LBA.
Figure 2 – Observations represent population averages. The early Mede-era 2800-year-old samples from Hasanlu in NW Iran lie on a genetic cline between the 7000-year-old NW Iranian Iran-CHL herders and the 2800-year-old Yaz-region sample TKM-IA.
Figure 2a – Scree plot for the PCAs in Figures 1 & 2, showing that 87% of the genetic variation in the samples is captured by PC1 & PC2 [2].
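
As a minimal sketch of how the percentages on a scree plot are derived, the snippet below converts a list of PCA eigenvalues into per-component and cumulative shares of the total variance. The eigenvalues are hypothetical, chosen only so that the first two components sum to the 87% reported above; they are not the eigenvalues of this study.

```python
def explained_variance(eigenvalues):
    """Return (shares, cumulative): each component's share of the total
    variance and the running total of those shares."""
    total = sum(eigenvalues)
    shares = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


# Hypothetical eigenvalues, for illustration only (not taken from the study).
eigenvalues = [6.2, 2.5, 0.6, 0.4, 0.3]
shares, cumulative = explained_variance(eigenvalues)

# Cumulative share of variance captured by PC1 and PC2 together.
pc1_pc2_share = cumulative[1]
print(round(pc1_pc2_share, 2))
```

A scree plot simply draws `shares` against the component index; the "elbow" after the second bar is what justifies restricting attention to PC1 and PC2.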

INTRODUCTION

By the Chalcolithic, around 7000 years ago, the nomadic lifestyle of sheep and goat herding was well established in the Zagros mountains of Iran. This proved to be a successful survival strategy: those nomadic Iranian Chalcolithic herders would become ancestors of many present-day populations across a large swath of land from West Asia to South-Central Asia. We refer to the DNA samples extracted from human remains around Haji-Firoz as Iran-CHL. The language of Iran-CHL was likely some form of Elamo-Dravidian.

Genetic analysis indicates that some Iran-CHL migrated east and became ancestral to the 4000-year-old Bactria–Margiana Archaeological Complex (BMAC) in Central Asia [1]. Our analysis using qpWave indicates that the 7000-year-old Iran-CHL were genetically equidistant from present-day Armenians and Kurds. However, the ancestors of Armenians and Kurds had already diverged by the Mede era, since the 2800-year-old Hasanlu-IA samples are genetically closer to Kurds than to Armenians. Similarly, the Mede-era Hasanlu-IA samples are more closely related to Kurds than to Persians, indicating that Kurds have more Mede-related ancestry than Persians.

RESULTS & DISCUSSION

By 2800 years ago, the genetic, linguistic, and cultural landscape of NW Iran had significantly changed from the Chalcolithic. The Indo-Iranians from the Ariana and Yaz regions of Central Asia had established themselves in the NW Iran region by hybridizing with the descendants of the more ancient Iran-CHL people. They also brought their Indo-Iranian languages with them from Central Asia, and around this time the Zoroastrian religion from Central Asia also became dominant in NW Iran.

Genetically, we see evidence of this in figures 1 and 2, where the Mede era Hasanlu-IA samples labeled Mede-IA lie midway on a genetic cline between Iran-CHL and Turkmenistan-IA.
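
To make "midway on a cline" concrete, the following sketch projects a population's PC coordinates onto the line joining two endpoint populations. It returns the fractional position along the cline (0 at the first endpoint, 1 at the second) and the perpendicular distance from the line. The function name and the coordinates are hypothetical and are not read from Figures 1 or 2; position in PC space is only a descriptive summary and does not replace formal tests such as qpAdm.

```python
import math


def cline_position(point, end_a, end_b):
    """Project `point` onto the line through `end_a` and `end_b`.

    Returns (t, offset): t is 0 at end_a and 1 at end_b; offset is the
    perpendicular distance of `point` from the line.
    """
    ab = [b - a for a, b in zip(end_a, end_b)]
    ap = [p - a for a, p in zip(end_a, point)]
    ab_len_sq = sum(x * x for x in ab)
    if ab_len_sq == 0:
        raise ValueError("cline endpoints must differ")
    t = sum(x * y for x, y in zip(ab, ap)) / ab_len_sq
    foot = [a + t * x for a, x in zip(end_a, ab)]
    offset = math.sqrt(sum((p - f) ** 2 for p, f in zip(point, foot)))
    return t, offset


# Hypothetical (PC1, PC2) coordinates, for illustration only.
older_source = (0.0, 0.0)    # stands in for an Iran-CHL-like endpoint
incoming_source = (4.0, 2.0)  # stands in for a Turkmenistan-IA-like endpoint
admixed = (2.1, 0.9)          # stands in for a Hasanlu-IA-like population

t, offset = cline_position(admixed, older_source, incoming_source)
print(round(t, 2), round(offset, 3))
```

A value of `t` near 0.5 with a small `offset` corresponds to a population lying midway on the cline; a large `offset` corresponds to a population that does not lie on that cline at all.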

This decrease in genetic distance to Steppe-MLBA populations over time is not only evident in Iran. Similar changes were occurring simultaneously in Armenia and the Caucasus with the Kura-Araxes and Trialeti-Vanadzor cultures. We see these changes in Figures 1 and 2, as well as in the hierarchical cluster diagram in Figure 3, where Iran-Mede-IA and Armenia-Trialeti-BA are closer to Steppe-MLBA and to present-day Armenian, Kurd, and Persian populations than to Iran-CHL.

As we move from the Iron Age to the present, Figures 1 and 2 show present-day Armenians, Kurds, and Persians moving yet closer to Central Asian Sarmatian and Scythian populations, lying midway on a genetic cline between Iran-Mede-IA and Central Asian Sarmatians and Scythians. This is consistent with our qpAdm models, which show present-day Kurds and Persians as a combination of Iron Age Mede-era Iranian populations and Central Asian Sarmatians and Scythians. It is also consistent with the post-Mede introduction of paternal haplogroup R1a-Z94 into NW Iran, particularly among the Kurds.

The PCAs in Figures 1 and 2 show Armenians also on a cline between Iran-Mede-IA/Armenia-Trialeti-BA and Central Asian Sarmatians, indicating that present-day Armenians also have Steppe-MLBA-related ancestry. However, they are located to the left of Kurds and Persians on the PC1 East Eurasian axis. This indicates less gene flow from post-Early Iron Age Sarmatians, Scythians, and Turkic groups into Armenians than into Kurds and Persians.

Figure 4 shows clustering based on distances between population cluster averages. Here we see that the Central Asian Steppe-MLBA and Scythian/Sarmatian clusters are sister clusters to the West Asian cluster, which consists primarily of populations with heavy Indo-Iranian admixture.

Figure 3 – Hierarchical clustering with the Ward method on 1240K SNPs. Observations are population averages.
Figure 4 – Hierarchical clustering with the average-linkage method on 1240K SNPs. Observations are population averages.
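
The sketch below illustrates average-linkage agglomerative clustering of the kind summarized in Figure 4, written in plain Python so that the merge rule is explicit. It is an illustration under stated assumptions rather than the code used for the figures: the population labels and coordinates are hypothetical, and the Ward method of Figure 3 uses a different merge criterion (minimum increase in within-cluster variance) that is not implemented here.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    `points` maps a population label to its coordinate tuple.
    Returns the merges in order as (members_a, members_b, distance).
    """
    def cluster_distance(ca, cb):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(euclidean(points[a], points[b]) for a in ca for b in cb)
        return total / (len(ca) * len(cb))

    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        ca, cb = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((ca, cb, cluster_distance(ca, cb)))
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
    return merges


# Hypothetical population averages, for illustration only.
population_means = {
    "pop_A": (0.0, 0.0),
    "pop_B": (0.0, 1.0),
    "pop_C": (5.0, 0.0),
    "pop_D": (5.0, 1.5),
}
merges = average_linkage(population_means)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

Reading the merges from first to last reproduces the dendrogram from its tips to its root: populations merged early are the closest, and two clusters joined at the same node are "sister clusters" in the sense used above.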

These results indicate that by 2800 years ago NW Iran had already been “Aryanized” through hybridization of ancient Zagrosian herders with invading “Aryans” from the Ariana and Yaz regions of Central Asia.

Genetic models built with qpAdm from the Admixtools software [3] reinforce the aforementioned PCA results by showing that present-day Kurds formed when western Iranian Late Bronze/Early Iron Age farmers and herders hybridized with Central Asian Turco-Aryans, as shown in Figures 5 & 6.

Fig 5 – Very robust qpAdm models for Kurds result when Iranian Zagrosian DinkaTepe-BA or Hasanlu-IA is used as one ancestral source and an Iron Age or later Central Asian population is used as the other. We were surprised by the robustness of these models: even using 16 diverse Eurasian populations of Neolithic age or younger as references (pright) does not cause the models to fail.
Fig 6 – Robust qpAdm models using 600K intersecting SNPs, showing that Kurds are a mix of local Bronze/Iron Age Iranians and Iron Age Central Asians, and thus pointing to an ancestral homeland of Iranics such as Kurds to the east of Armenia. Admixture percentages are shown in brackets, with models sorted by p-value. Populations similar to the above Central Asians introduced Y-DNA haplogroup R1a-Z94 to Kurdistan after the Late Iron Age, making it one of the major haplogroups among present-day Kurds and, to a lesser extent, a few other West Asian populations.

Methods & Quality Control

Our analysis used published data [4] genotyped on the 1240K Axiom Affymetrix array. The present-day populations are drawn from the Simons WGS set. All available Iranian, Armenian, and Turkish samples in the Reich Lab dataset were used in this study. As an additional quality control measure, we removed samples with fewer than 600,000 SNPs intersecting the 1240K set. Additionally, for Kurds-IRQ we used two whole-genome-sequenced males from northern Iraq, sequenced to a depth of 30X. We processed the FASTQ sequences using our own samtools pipeline, which consists of the following:

  1. AdapterRemoval to remove remnant adapter sequences from the High-Throughput Sequencing (HTS) data and to trim low-quality bases from the 3′ end of reads after adapter removal. We also removed sequences shorter than 24 bases, which, because of their short length, may map to several regions of the reference genome
  2. bwa mem to align the sequences to the GRCh37 human reference, with the output piped to samtools view to produce BAM files. The reads were merged post-alignment rather than with AdapterRemoval
  3. samtools sort to coordinate-sort the BAM file
  4. Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads
  5. Picard to remove duplicate sequences
  6. bcftools mpileup to convert the BAM file into genomic positions. We first use mpileup to produce a BCF file that contains all of the locations in the genome, then pass this file to bcftools call to call genotypes and reduce the list of sites to those found to be variant. Thresholds used were --min-MQ 15 and --min-BQ 20
  7. bcftools view to remove positions supported by fewer than 4 reads
  8. This yielded VCF files with 1478 million bases
  9. Extraction of Affymetrix 1240K variants yielded a final VCF of 1160K variants
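
The steps above can be sketched as a sequence of command lines. The Python snippet below only builds and prints the commands; it does not run them. Several details are assumptions that the text does not confirm: paired-end input files, all file names, the use of Picard CleanSam for step 4 and Picard MarkDuplicates with REMOVE_DUPLICATES=true for step 5, the `INFO/DP>=4` expression for the read-depth filter in step 7, and a `-T` sites file for the 1240K extraction in step 9. Only the 24-base minimum length and the `--min-MQ 15` and `--min-BQ 20` thresholds come from the list itself.

```python
import shlex


def build_pipeline(sample, ref, fq1, fq2, sites_1240k, picard_jar="picard.jar"):
    """Return the pipeline as a list of stages. Each stage is a list of
    argv lists; commands within one stage are connected with pipes."""
    trim1, trim2 = f"{sample}.trim1.fq", f"{sample}.trim2.fq"
    raw, srt = f"{sample}.bam", f"{sample}.sorted.bam"
    clean, dedup = f"{sample}.clean.bam", f"{sample}.dedup.bam"
    calls, dp4 = f"{sample}.calls.vcf.gz", f"{sample}.dp4.vcf.gz"
    final = f"{sample}.1240k.vcf.gz"
    return [
        # 1. Trim adapters and low-quality bases; drop reads under 24 bases.
        [["AdapterRemoval", "--file1", fq1, "--file2", fq2,
          "--output1", trim1, "--output2", trim2,
          "--trimqualities", "--minlength", "24"]],
        # 2. Align to the reference and convert the SAM stream to BAM.
        [["bwa", "mem", ref, trim1, trim2],
         ["samtools", "view", "-b", "-o", raw, "-"]],
        # 3. Coordinate-sort the BAM file.
        [["samtools", "sort", "-o", srt, raw]],
        # 4. Assumed tool: Picard CleanSam.
        [["java", "-jar", picard_jar, "CleanSam", f"I={srt}", f"O={clean}"]],
        # 5. Assumed tool: Picard MarkDuplicates, removing duplicates.
        [["java", "-jar", picard_jar, "MarkDuplicates", f"I={clean}",
          f"O={dedup}", f"M={sample}.dup_metrics.txt",
          "REMOVE_DUPLICATES=true"]],
        # 6. Pile up all sites, then keep variant calls only.
        [["bcftools", "mpileup", "-f", ref, "--min-MQ", "15",
          "--min-BQ", "20", "-Ou", dedup],
         ["bcftools", "call", "-mv", "-Oz", "-o", calls]],
        # 7. Assumed filter: keep sites with total depth of at least 4.
        [["bcftools", "view", "-i", "INFO/DP>=4", "-Oz", "-o", dp4, calls]],
        # 9. Assumed: restrict to a CHROM/POS list of the 1240K sites.
        [["bcftools", "view", "-T", sites_1240k, "-Oz", "-o", final, dp4]],
    ]


def render(stage):
    """Format one stage as a single shell command line."""
    return " | ".join(shlex.join(cmd) for cmd in stage)


# Hypothetical sample and file names, for illustration only.
stages = build_pipeline("kurd_irq_1", "GRCh37.fa", "reads_1.fq",
                        "reads_2.fq", "sites_1240k.tsv")
for stage in stages:
    print(render(stage))
```

Keeping each command as an argument list rather than a single string avoids quoting mistakes, and the same lists can be handed to `subprocess` if the pipeline is to be executed rather than printed.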

Data Availability

The 1240K genotype data used in this study are available from the Reich Lab at https://reich.hms.harvard.edu/datasets

REFERENCES

  1. Narasimhan, Vagheesh M.; et al. (2019). “The formation of human populations in South and Central Asia”. Science. 365 (6457): eaat7487. doi:10.1126/science.aat7487. PMC 6822619. PMID 31488661.
  2. Kumar et al. (2022). “Bronze and Iron Age population movements underlie Xinjiang population history”. Science. 376: 62. doi:10.1126/science.abk1534.
  3. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D (2012). “Ancient admixture in human history”. Genetics. 192 (3): 1065-93. doi:10.1534/genetics.112.145037. PMC 3522152. PMID 22960212.
  4. Lazaridis et al. (2022). “The genetic history of the Southern Arc: A bridge between West Asia and Europe”. Science. doi:10.1126/science.abm4247.