Genetic Diversity and Light Skin Adaptation in Cameroon, Congo, and Bantu Populations

Africa is the continent where anatomically modern humans originated within the past 300 ky and the source of migration of anatomically modern humans out of Africa within the past 80 ky. Africa is also a continent of tremendous cultural, linguistic, phenotypic, and genetic diversity. More than 2,000 ethnolinguistic groups have been identified in Africa, representing around one-third of the world’s languages.

These languages are classified into four major phyla: Afroasiatic, Nilo-Saharan, Niger-Congo, and Khoesan. The Niger-Congo phylum, consisting of ~1,500 languages, is the largest language phylum in Africa. Its largest subfamily is the Bantu languages, which originated near the border of Cameroon and Nigeria.

Recent studies have focused on understanding the genetic diversity within African populations, including variation related to skin color. One such study conducted high-coverage (>30X) whole-genome sequencing of 180 individuals from 12 indigenous African populations. This research identified millions of previously unreported variants, many predicted to be functionally important.

Map of the Bantu Expansion

Study Populations and Methods

The study included populations from Ethiopia (Amhara, Dizi, Chabu, and Mursi), Tanzania (Hadza and Sandawe), Cameroon (rainforest hunter-gatherers, or RHG, comprising the Baka and Bagyeli; Fulani; and Tikari), and Botswana (Herero, Ju|’hoansi, and !Xoo, the latter two collectively referred to as “San”). These populations speak languages encompassing all four African language phyla. The Hadza and San still practice traditional hunter-gatherer subsistence, whereas the Sandawe have adopted agriculture and herding within the past few hundred years.

The RHG, who have been referred to as “Pygmies” because of their short stature, have lost their traditional language and now speak Bantu languages. A similar language replacement occurred among the Fulani, who are traditionally nomadic pastoralists living across a broad range of Africa encompassing the Sudan, Central, and Western Africa. The Fulani now speak a Niger-Congo language most similar to languages spoken on the west coast of Africa.

The Chabu have a census population size of only 1,000-2,000 individuals, live in a mountainous region in southwestern Ethiopia, and practice a foraging lifestyle. Their language is considered a ‘language isolate’ and one of the ‘severely endangered languages’ of the world.

Identification of Genomic Variants

Across these populations, millions of genomic variants were characterized, many of which were predicted to be functional and of potential biomedical relevance. High-coverage (>30X) whole-genome sequencing (WGS) data were generated from 15 individuals per population from 12 African populations (180 individuals total), representing the most diverse genetic ancestries in sub-Saharan Africa based on prior admixture analyses. After quality control, a total of 35,201,568 variants were identified: 32,438,935 single nucleotide polymorphisms (SNPs) and 2,762,633 small insertions and deletions. Further analyses were restricted to 32,044,896 biallelic SNPs. The average number of SNPs varies greatly among populations.

A total of 5,344,342 SNPs were identified that are reported in neither dbSNP version 155 nor gnomAD version 2.1. Around 78% of the unreported SNPs are population-specific, 15% are shared by populations in the same country, and 7% are shared by populations residing in different countries. Variants at unreported SNPs are significantly rarer than those at previously reported SNPs. The Dizi, Ju|’hoansi, and !Xoo have the greatest numbers of population-specific unreported variants, and the Ju|’hoansi and !Xoo share the greatest number of unreported SNPs among populations in the same country.
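
The three sharing categories above can be made concrete with a small sketch. The Python below is illustrative only: the country mapping follows the population list given earlier (treating the RHG as a single population, which is an assumption), while the function names and the toy carrier sets are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Country of each study population, following the article's population list.
# Treating the RHG (Baka and Bagyeli) as one population is an assumption.
COUNTRY = {
    "Amhara": "Ethiopia", "Dizi": "Ethiopia", "Chabu": "Ethiopia", "Mursi": "Ethiopia",
    "Hadza": "Tanzania", "Sandawe": "Tanzania",
    "RHG": "Cameroon", "Fulani": "Cameroon", "Tikari": "Cameroon",
    "Herero": "Botswana", "Ju|’hoansi": "Botswana", "!Xoo": "Botswana",
}


def sharing_category(carrier_populations):
    """Classify a SNP by which populations carry the variant allele."""
    populations = set(carrier_populations)
    if not populations:
        raise ValueError("a variant must be carried by at least one population")
    if len(populations) == 1:
        return "population-specific"
    countries = {COUNTRY[pop] for pop in populations}
    return "same-country" if len(countries) == 1 else "cross-country"


def summarize_sharing(snp_to_populations):
    """Return the fraction of SNPs falling in each sharing category."""
    counts = Counter(sharing_category(pops) for pops in snp_to_populations.values())
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical toy input: SNP identifier -> populations carrying the variant.
toy_snps = {
    "snp1": {"Dizi"},
    "snp2": {"Dizi"},
    "snp3": {"Hadza", "Sandawe"},
    "snp4": {"RHG", "Mursi"},
}
print(summarize_sharing(toy_snps))
```

Applied to real per-population genotype calls, a tally of this kind would yield percentages comparable to the 78%, 15%, and 7% reported above.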

Among the unreported SNPs, 28,901 cause amino acid changes and 499 cause stop codon gain or loss. A further 95,844, 253,334, and 47,777 are located in transcription factor binding site regions, enhancers, and active promoter regions, respectively, based on functional annotation with ANNOVAR.

Further, 154 SNPs in the dataset were reported as “Pathogenic” or “Likely Pathogenic” in the ClinVar database. Of these, 44 are at frequencies higher than 0.05 in at least one of the populations from this study but are either absent or at frequencies lower than 0.01 in non-African populations in gnomAD.

For example, rs74853476-C is a splice donor variant at dopamine beta-hydroxylase (DBH) associated with orthostatic hypotension 1 in non-African samples. While rs74853476-C is rare in all super-populations in gnomAD, it reaches 13% in the Fulani. Another example consists of three missense mutations, rs139426141-G, rs140482516-T, and rs34097903-A, in Peptidyl Arginine Deiminase 3 (PADI3), reported to be associated with central centrifugal cicatricial alopecia in patients of African ancestry. Each of these variants is at a high frequency in at least one of the studied populations but is rare in the non-African super-populations in gnomAD. Thus, a number of variants that are labeled by ClinVar as putatively pathogenic are seen at high frequencies in one or more of the studied populations and may, in fact, be benign.
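
The frequency criterion used to flag these ClinVar variants (above 0.05 in at least one study population, and absent or below 0.01 in non-African populations) can be sketched as a simple filter. The function name, the dictionary layout, and all frequencies other than the 13% reported for the Fulani are hypothetical values chosen for illustration.

```python
def is_enriched_in_study(study_freqs, non_african_freqs, high=0.05, low=0.01):
    """Apply the article's frequency criterion to one variant.

    study_freqs: population name -> allele frequency in the study populations.
    non_african_freqs: population name -> allele frequency outside Africa;
        a variant that is absent can be given as 0.0 or simply left out.
    """
    common_in_study = any(freq > high for freq in study_freqs.values())
    rare_elsewhere = all(freq < low for freq in non_african_freqs.values())
    return common_in_study and rare_elsewhere


# rs74853476-C: 13% in the Fulani is from the article; the rest is hypothetical.
study = {"Fulani": 0.13, "Hadza": 0.0}
non_african = {"European": 0.001, "East Asian": 0.0}
print(is_enriched_in_study(study, non_african))  # True
```

A variant passing this filter is a candidate for re-evaluation, not proof of benignity; the article's own wording ("may be benign") reflects that caution.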

Phylogenetic Analysis and Population Structure

After merging the African WGS data with WGS data for Papuans from the Simons Genome Diversity Project (SGDP) and for Utah residents with Northern and Western European ancestry, Tuscans, and Han Chinese in Beijing from the 1000 Genomes Project (1KGP), a neighbor-joining phylogenetic tree was constructed using MEGA. The Ju|’hoansi and !Xoo were observed to have the most basal lineages of all modern humans, followed by the RHG. The remaining populations largely clustered by their current geographical locations, with a few exceptions.

Further, the Chabu clustered with the Nilo-Saharan-speaking Mursi, consistent with the linguistic classification of the Chabu language. The Hadza and Sandawe clustered near each other, though they did not form a monophyletic group, possibly due to strong admixture between the Sandawe and other East African populations. Consistent with previous studies, the Fulani and two Ethiopian Afroasiatic-speaking populations, the Amhara and Dizi, are genetically closest to non-African populations.

Principal component analysis (PCA) of the current dataset merged with a global WGS dataset from the SGDP reveals both continental and population-specific patterns of genetic variation. PC1 separates Africans and non-Africans, with the exception of populations in North Africa and the Middle East, consistent with prior studies. PC2 distinguishes the San from other Africans. Subsequent principal components differentiate the Hadza, Chabu, Dizi, and Mursi from other populations along PC3, and RHG populations are distinguished along PC4.
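
For readers unfamiliar with how PCA separates populations, the following is a minimal sketch using NumPy on a toy genotype matrix (individuals in rows, SNPs in columns, entries counting copies of one allele). The matrix and the function name are hypothetical, and the sketch only mean-centers each SNP; dedicated population-genetics tools typically also scale each SNP by its allele frequency, so this is a simplification and not the study's procedure.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: array-like of shape (individuals, SNPs) with 0/1/2 allele counts.
    Returns (scores, explained), where scores has one row per individual and
    explained holds the fraction of variance captured by each component.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center each SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical toy data: the first three individuals come from one group,
# the last three from another with largely opposite genotypes.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
]
scores, explained = genotype_pca(toy_genotypes)
print(scores[:, 0])   # PC1 coordinates: the two groups take opposite signs
print(explained)
```

In this toy case PC1 plays the role that the Africa versus non-Africa axis plays in the real data: it captures the largest single contrast among the samples, and later components capture progressively finer structure.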

When 55 ancient Eastern and Southern African samples dated to 10,000 - 160 BP were included in the PCA, a wide geographic distribution of Khoesan-related individuals in Africa was observed, as previously noted; 15 ancient samples either overlap with or fall on a geographic cline between the present-day Eastern and Southern African Khoesan-speaking hunter-gatherer populations. For example, Mota from Ethiopia (4524 - 4418 BP) and ancient foragers from Tanzania and Kenya (4080 - 160 BP) overlap in the PCA with the Sandawe and Hadza.

ADMIXTURE analysis of the merged dataset separated African and non-African populations at K = 2. At K = 4, San ancestry became distinct; this ancestry is also common in the RHG, Sandawe, and Hadza. At K = 7, East African populations (e.g., Hadza, Sandawe, Chabu, Dizi, Amhara, and Mursi) emerged as a cluster. The Fulani formed a distinct cluster at K = 8. The Hadza emerged as a cluster at K = 10, and the RHG and Chabu became distinct clusters at K = 12.

At K = 16, the Ju|’hoansi, who speak a northern Khoesan language, became distinguished from the !Xoo and Khomani San, who speak a southern Khoesan language. Additionally, Nilo-Saharan-speaking populations (e.g., the Dinka, Mursi, and Sengwer) became a single cluster at K = 16. Niger-Congo-related ancestry was inferred to be widely spread across sub-Saharan Africa, but was more common in west and central African Niger-Congo-speaking populations than in eastern and southern Niger-Congo-speaking populations, which have admixed to varying degrees with neighboring populations.

More complex demographic histories were modeled using TreeMix and qpgraph. When no admixture is allowed, the topologies based on qpgraph and TreeMix are consistent with the topology of the neighbor-joining tree, with the San as an outgroup to all other populations. However, the topologies of qpgraph and TreeMix vary tremendously when allowing admixture among populations.

When modeling 10 admixture events, qpgraph estimated that the East African Khoesan populations, the Hadza and Sandawe, respectively derive 71% and 38% ancestry from a population ancestral to the Southern African Khoesan population (consistent with migration events between the Hadza, Sandawe and San inferred from TreeMix with 9 migration events). These populations, particularly the Sandawe, also derive ancestries from an Afroasiatic-like population, likely reflecting recent Afroasiatic gene flow, consistent with TreeMix with 4 migration events. It was estimated that the Ethiopian populations (Amhara, Dizi, Mursi, and Chabu) derived 98% and 2% of their ancestries from a population ancestral to the Hadza and a population ancestral to all modern human populations, respectively.

Furthermore, 80% of the Omotic-speaking Dizi ancestry can be traced back to a Chabu-related population and 20% to an Amhara-related population (consistent with TreeMix results with 7 migration events). In addition, qpgraph indicates that the RHG derive 37% of their ancestry from a population ancestral to the San and 63% of their ancestry from a Niger-Congo-speaking population, consistent with high levels of Bantu gene flow into the RHG. The relationship of the Tikari and Herero with other populations is complex. They could be modeled as having 23% ancestry related to an archaic population that diverged prior to the divergence of all modern human populations and 77% ancestry from a population related to the Nilo-Saharan-speaking Mursi. A similar pattern was observed in the ADMIXTURE analyses at K = 7 to 11, but with much lower inferred Nilo-Saharan-related ancestries in the Tikari and Herero.
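
One way to read ancestry proportions like these is through the simple linear mixing model that underlies admixture analyses in general: the expected allele frequency in an admixed population is the proportion-weighted average of the frequencies in its source populations. The sketch below illustrates that idea only. It is not how qpgraph fits its graphs, the function name is hypothetical, and the two source allele frequencies are invented for the example; only the 37% and 63% proportions come from the text.

```python
def expected_mixture_frequency(source_freqs, proportions):
    """Expected allele frequency under a simple linear admixture model.

    source_freqs: allele frequency in each source population.
    proportions: ancestry fraction contributed by each source (must sum to 1).
    """
    if len(source_freqs) != len(proportions):
        raise ValueError("need one proportion per source population")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(a * p for a, p in zip(proportions, source_freqs))


# Proportions from the article (RHG: 37% San-related, 63% Niger-Congo-related).
# The source allele frequencies 0.10 and 0.50 are hypothetical.
freq = expected_mixture_frequency([0.10, 0.50], [0.37, 0.63])
print(f"{freq:.3f}")  # 0.352
```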

The TreeMix analyses showed evidence of gene flow between the Mursi and the ancestors of the Tikari and Herero starting at 5 migration events. Consistent with the ADMIXTURE results, TreeMix and qpgraph analyses detected extensive recent gene flow among African populations.

Light Skin Alleles and Adaptation

The derived APBA2 (OCA2) allele is present at low frequencies in most populations of African ancestry, and at high frequencies in most populations of Asian and European ancestry. These results suggest that an APBA2 (OCA2) mutation conferring light skin arose before the spread of humans out of Africa. However, the presence of this allele in Sub-Saharan Africa may also be attributed to migrations into the region and gene flow from other populations.

It is important to note that genetic studies in Africa are complex and require careful consideration of population history and admixture events. Further research is needed to fully understand the genetic basis of skin color variation and other adaptive traits in African populations.

Evidence of introgression from a deeply diverged population into the ancestor of all modern human populations was observed. In addition, the Bantu-speaking and RHG populations show some very old ancestry, possibly reflecting subsequent introgression from a deeply diverged population.

The ancestors of southern African San and central African rainforest hunter-gatherers (RHG) diverged from other populations >200 kya and maintained a large effective population size. Evidence for ancient population structure in Africa and for multiple introgression events from “ghost” populations with highly diverged genetic lineages was observed. Although the eastern and southern Khoesan-speaking hunter-gatherer populations are currently geographically isolated, evidence was observed for gene flow between them lasting until ~12 kya. Signatures of local adaptation for traits related to skin color, immune response, height, and metabolic processes were identified.
