ABSTRACT
We present genetic evidence of multiple demographic changes in Armenia and western Iran over the past 7000 years. By about 3000 years ago, descendants of the 7000-year-old Chalcolithic Zagrosian herders, hereinafter referred to as Iran-CHL, along with their likely Elamo-Dravidian languages, were gradually replaced by invading “Aryan” warriors from ancient Ariana (Airyanem Vaejah) and the surrounding region of Central Asia. These invaders were likely associated with the Central Asian Yaz culture and likely spoke Avestan. These early Iranians genetically resembled the sample named Turkmenistan-IA, which is a mixture of BMAC and neighboring “Andronovo Horizon” ancestry.
Genetically, the 2800-year-old early Mede-era Iron-Age samples excavated at Hasanlu in NW Iran, hereinafter referred to as Hasanlu-IA, lie on a genetic cline (Figures 1 & 2) between the 7000-year-old Iran-CHL herders and the Turkmenistan-IA sample associated with the Central Asian Iranic Yaz culture. This indicates that by 2800 years ago the Aryanization of western Iran had already commenced, and it implies that the Medes themselves were a genetic mix of Iran-CHL and Turkmenistan-IA, consistent with the Hasanlu-IA samples lying midway on that cline. Thus, the first waves of Yaz Aryan invaders from Central Asia, hybridizing with the descendants of Iran-CHL, shifted the genetics of western Iran in the direction of Central Asia and Andronovo. This was accompanied by a language shift from the earlier Elamo-Dravidian to Avestan-derived Iranian languages and by the introduction of Zoroastrianism from Central Asia to the Iranian plateau.
We hypothesize that sometime between 2000 and 3000 years ago the Avestan languages spoken by the Yaz Central Asian Indo-Iranians gave rise to the western and eastern Iranian languages. We also hypothesize that this is when the process of genetic differentiation between Persians, Kurds, and Baloch commenced.
Subsequently, additional waves of warriors from Central Asia, including Parthians and Scythians, moved into western Iran and hybridized with Hasanlu-IA. It is at this point that the ethnic identity of present-day Kurds and Persians formed. We see this in Figures 1 & 2, where present-day Kurds and Persians lie midway on a genetic cline between Hasanlu-IA and Central Asian Scythians and Sarmatians. These later invaders also introduced paternal DNA lineages of R1a-Z94 into western Iran. These lineages peak among Kurds in western Iran.
Thus, over the past 7000 years, we observe in figures 1 & 2 a gradual shift in the genetics of Armenia and western Iran towards Central Asia from Iran-CHL. This implies that present-day Armenians also harbor DNA from the various invading Aryan waves from Central Asia. The positioning of Armenians on the PCA plots in figures 1 & 2 slightly to the left and below Kurds and Persians simply indicates that Armenians were less genetically affected by the subsequent Parthian, Scythian, and Turkic waves than Kurds and Persians. Nonetheless, present-day Armenians are also shifted in the direction of Central Asia when compared with the 3000-year-old Trialeti-Vanadzor culture samples from ancient Armenia (Figures 1 & 2). Some of this shift in present-day Armenians is likely caused by demic diffusion from neighboring populations.
Finally, the PCA plots in figures 1 & 2 also clearly show that none of our study populations lie on an ancient NW Iranian – Trialeti Armenian genetic cline, meaning that present-day Armenians, Kurds, and Persians cannot be modeled as an ancient NW Iranian – Armenian two-way mixture.
INTRODUCTION
By the Chalcolithic, around 7000 years ago, the nomadic lifestyle of sheep and goat herding was well established in the Zagros mountains of Iran. This proved to be a good survival method for ancient humans, as those nomadic Iranian Chalcolithic herders would end up becoming ancestors of many present-day populations across a large swath of land from West Asia to South-Central Asia. We refer to the DNA samples extracted from human remains around Haji-Firoz as Iran-CHL. The language of Iran-CHL was likely some form of Elamo-Dravidian.
Genetic analysis indicates that some Iran-CHL migrated east and became ancestral to the 4000-year-old Bactria–Margiana Archaeological Complex (BMAC) in Central Asia [1]. Our analysis using qpWave indicates that the 7000-year-old Iran-CHL were genetically equidistant from present-day Armenians and Kurds. However, the ancestors of Armenians and Kurds had already diverged by the Mede era, since the 2800-year-old Hasanlu-IA samples are genetically closer to Kurds than to Armenians. Similarly, the Mede-era Hasanlu-IA samples are more closely related to Kurds than to Persians, indicating that Kurds have more Mede-related ancestry than Persians.
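The notion of one population being "genetically equidistant" from two others can be illustrated with the f4 statistic, the building block on which qpWave operates. The sketch below is not the qpWave rank test itself and is not the analysis reported here. It only shows the simpler symmetry check that f4(Outgroup, X; A, B) is expected to be close to zero when X is equally related to A and B. The function name and the allele frequencies are hypothetical, chosen purely for illustration.

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Estimate f4(A, B; C, D) as the mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one value per SNP,
    with all four lists covering the same SNPs in the same order.
    """
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations need frequencies at the same SNPs")
    return mean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


# Hypothetical allele frequencies at four SNPs (illustrative values only,
# not taken from any dataset).
outgroup = [0.0, 1.0, 0.0, 1.0]
ancient = [0.5, 0.5, 0.25, 0.75]
modern_1 = [0.5, 0.5, 0.5, 0.5]
modern_2 = [0.5, 0.5, 0.5, 0.5]  # identical to modern_1 by construction

# With identical frequencies in the two test populations the statistic is
# exactly zero; with real data one would assess the deviation from zero
# using a block-jackknife standard error, which is omitted here.
symmetry = f4(outgroup, ancient, modern_1, modern_2)
```

A value far from zero relative to its standard error would instead suggest that the ancient population shares more drift with one of the two test populations.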
RESULTS & DISCUSSION
By 2800 years ago, the genetic, linguistic, and cultural landscape of NW Iran had significantly changed from the Chalcolithic. The Indo-Iranians from the Ariana and Yaz regions of Central Asia had established themselves in the NW Iran region by hybridizing with the descendants of the more ancient Iran-CHL people. They also brought their Indo-Iranian languages with them from Central Asia, and around this time the Zoroastrian religion from Central Asia also became dominant in NW Iran.
Genetically, we see evidence of this in figures 1 and 2, where the Mede era Hasanlu-IA samples labeled Mede-IA lie midway on a genetic cline between Iran-CHL and Turkmenistan-IA.
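The statement that a group lies "midway on a cline" can be made concrete by projecting its PCA coordinates onto the line joining the two endpoint populations. The following is a minimal sketch under that assumption; the function name and the coordinates are hypothetical and are not values from figures 1 and 2.

```python
import math


def cline_position(sample, end_a, end_b):
    """Locate a sample relative to the line from end_a to end_b.

    Returns (t, offset): t is the fractional position of the sample's
    projection along the line (0 at end_a, 1 at end_b), and offset is the
    perpendicular distance of the sample from that line.
    """
    ab = [b - a for a, b in zip(end_a, end_b)]
    a_to_s = [s - a for a, s in zip(end_a, sample)]
    length_sq = sum(x * x for x in ab)
    if length_sq == 0:
        raise ValueError("the two endpoints must differ")
    t = sum(x * y for x, y in zip(ab, a_to_s)) / length_sq
    residual = [s - t * x for s, x in zip(a_to_s, ab)]
    offset = math.sqrt(sum(r * r for r in residual))
    return t, offset


# Hypothetical (PC1, PC2) coordinates for two endpoint centroids and one
# test sample; illustrative only.
endpoint_a = (0.0, 0.0)
endpoint_b = (2.0, 0.0)
test_sample = (1.0, 1.0)

t, offset = cline_position(test_sample, endpoint_a, endpoint_b)
```

A value of t near 0.5 together with a small offset corresponds to a sample lying midway on the cline, while a large offset indicates that the sample sits off the line and a simple two-endpoint reading is less appropriate.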
This decrease in genetic distance to Steppe-MLBA populations over time is not only evident in Iran. Similar changes were simultaneously occurring in Armenia and the Caucasus with the Kura-Araxes and Trialeti-Vanadzor cultures. We see these changes in figures 1 and 2, as well as in the hierarchical cluster diagram in figure 3, where Iran-Mede-IA and Armenia-Trialeti-BA are closer to Steppe-MLBA and to present-day Armenian, Kurd, and Persian populations than to Iran-CHL.
As we move from the Iron-Age to the present, figures 1 and 2 show present-day Armenians, Kurds, and Persians moving yet closer to Central Asian Sarmatian and Scythian populations, lying midway on a genetic cline between Iran-Mede-IA and Central Asian Sarmatians and Scythians. This is consistent with our qpAdm models, which show present-day Kurds and Persians as a combination of Iron-Age Mede-era Iranian populations and Central Asian Sarmatians and Scythians. It is also consistent with the post-Mede introduction of paternal haplogroup R1a-Z94 into NW Iran, particularly among the Kurds.
The PCAs in figures 1 and 2 show Armenians also on a cline between Iran-Mede-IA/Armenia-Trialeti-BA and Central Asian Sarmatians, indicating that present-day Armenians also have Steppe-MLBA-related ancestry. However, they are located to the left of Kurds and Persians on the PC1 East Eurasian axis. This indicates less gene flow from post-Early Iron Age Sarmatians, Scythians, and Turkic groups than for Kurds and Persians.
Figure 4 shows clustering based on distances between population cluster averages. Here we see that the Central Asian Steppe-MLBA and Scythian/Sarmatian clusters are sister clusters to the West Asian cluster, which consists primarily of populations with heavy Indo-Iranian admixture, indicating that NW Iran had already been “Aryanized” by hybridization of ancient Zagrosian herders with invading “Aryans” from the Ariana and Yaz regions in Central Asia.
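Clustering on population averages of the kind described for figure 4 can be sketched as follows: compute one centroid per population, then repeatedly merge the two closest clusters. The linkage rule behind figure 4 is not stated, so average linkage is assumed here; the function names, population labels, and coordinates are all hypothetical.

```python
import math
from itertools import combinations


def centroids(samples):
    """Map each population name to the mean of its sample coordinates."""
    result = {}
    for pop, points in samples.items():
        dim = len(points[0])
        result[pop] = tuple(sum(p[i] for p in points) / len(points)
                            for i in range(dim))
    return result


def average_linkage(cents):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merges in order as (members_a, members_b, distance).
    """
    def dist(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs of centroids.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(math.dist(cents[a], cents[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in sorted(cents)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical PCA coordinates per population (illustrative only).
example = {
    "pop_A": [(0.0, 0.0), (0.0, 2.0)],
    "pop_B": [(1.0, 1.0)],
    "pop_C": [(10.0, 1.0)],
}
merges = average_linkage(centroids(example))
```

Populations that merge early, at small distances, correspond to sister clusters in the resulting tree; a different linkage rule could change the merge order for populations that are nearly equidistant.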
Genetic models using the qpAdm program of the Admixtools [3] software reinforce the aforementioned PCA results by showing that present-day Kurds formed when western Iranian late Bronze/early Iron-Age farmers/herders hybridized with Central Asian Turco-Aryans, as shown in figures 5 & 6.
METHODS & QUALITY CONTROL
Our analysis used published data [4] genotyped on the 1240K Axiom Affymetrix array. The present-day populations come from the Simons WGS set. All available Iranian, Armenian, and Turkish samples in the Reich Lab dataset were used in this study. As an additional quality control measure, we removed samples with fewer than 600,000 SNPs intersecting the 1240K set. Additionally, for Kurds-IRQ we used two males from northern Iraq whose whole genomes were sequenced to a depth of 30X. We processed the FASTQ sequences using our own samtools pipeline, which consists of the following:
- AdapterRemoval to remove remnant adapter sequences from High-Throughput Sequencing (HTS) data and trim low quality bases from the 3′ end of reads following adapter removal. Additionally, we removed sequences shorter than 24 bases which due to their short length may map to several regions of the reference genome
- bwa mem to quickly align the sequences to the GRCh37 Human Reference and piped the output to samtools view to produce BAM files. The reads were merged post-alignment instead of with AdapterRemoval
- samtools sort to coordinate sort BAM file
- Picard to soft-clip beyond-end-of-reference alignments and set MAPQ to 0 for unmapped reads
- Picard to remove duplicate sequences
- bcftools mpileup to convert the BAM file into genomic positions: we first use mpileup to produce a BCF file that contains all of the locations in the genome, then pass this file to bcftools call to call genotypes and reduce the list of sites to those found to be variant. Thresholds used were --min-MQ 15 and --min-BQ 20
- bcftools view to remove positions supported by fewer than 4 reads
- This yielded VCF files with 1478 million bases
- Extraction of Affymetrix 1240K variants yielded a final VCF of 1160K variants
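The last steps of the list above (the read-depth filter, the extraction of target sites, and the SNP-count quality control) can be sketched in plain Python. This is only an illustration of the logic, not the bcftools commands actually run. It assumes that depth is read from the INFO/DP field, and the function names, toy records, and target sites are hypothetical.

```python
def read_depth(info_field):
    """Return the DP value from a VCF INFO string, or 0 if DP is absent."""
    for entry in info_field.split(";"):
        if entry.startswith("DP="):
            return int(entry[3:])
    return 0


def filter_records(vcf_lines, target_sites, min_depth=4):
    """Yield header lines unchanged, plus data lines that sit on a target
    site and are supported by at least min_depth reads."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        if (chrom, pos) in target_sites and read_depth(info) >= min_depth:
            yield line


def passes_snp_count(n_sites, min_sites=600_000):
    """Quality control: keep a sample only if enough target SNPs remain."""
    return n_sites >= min_sites


# Toy VCF records (hypothetical positions and depths, illustration only).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\t.\tDP=12;MQ=60",
    "1\t2000\t.\tC\tT\t50\t.\tDP=3;MQ=60",   # dropped: fewer than 4 reads
    "1\t3000\t.\tG\tA\t50\t.\tDP=9;MQ=60",   # dropped: not a target site
]
targets = {("1", 1000), ("1", 2000)}

kept = [l for l in filter_records(toy_vcf, targets) if not l.startswith("#")]
```

Holding the target sites in a set keeps each lookup constant-time, which matters when the list contains on the order of a million positions.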
DATA AVAILABILITY
The 1240K genotype data used in this study are available from the Reich Lab at https://reich.hms.harvard.edu/datasets
REFERENCES
- Narasimhan, Vagheesh M.; et al. (2019). “The formation of human populations in South and Central Asia”. Science. 365 (6457): eaat7487. doi:10.1126/science.aat7487. PMC 6822619. PMID 31488661
- Kumar et al. (2022). “Bronze and Iron Age population movements underlie Xinjiang population history”. Science. 376: 62. doi:10.1126/science.abk1534
- Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012 Nov;192(3):1065-93. doi: 10.1534/genetics.112.145037. Epub 2012 Sep 7. PMID: 22960212; PMCID: PMC3522152.
- Lazaridis et al. (2022). “The genetic history of the Southern Arc: A bridge between West Asia and Europe”. Science. doi:10.1126/science.abm4247