Dating the origin of language using phonemic diversity.
Perreault C, Mathew S.
I could write an introduction that explains how fundamentally important language is, but I have a feeling I’d be telling you nothing new. Our spoken language forms the basis of co-operation and is one of the most obvious differences between us and chimps (along with bipedality and a lack of fur).
The obvious importance of language naturally makes most people curious about it. Who first spoke? Why did they decide to? How did they figure it out? When did all this happen? The study of language is a vibrant field that attracts all sorts of people who want to learn about this crucial trait.
Luckily for them language behaves in a manner akin to evolution. It is passed through generations and occasionally changes in the process. This means some evolutionary techniques could be applied to language. For example, if you could measure the rate of this evolution you might be able to work backwards and figure out how long it has been around.
However, this potential for evolutionary study is often little more than a scientist-tease. Language is also influenced by a range of social factors that often mask or otherwise alter its evolution. As such there are many cases where researchers think they’re onto something, only for it to turn out to be worthless (or at least not as profound as they thought).
Figure: Majority-rule consensus tree of the Indo-European languages, based on an MCMC sample of 1,000 trees, with inferred node ages in years BP. Panels b–e show the distribution of divergence-time estimates at the root of the tree under different cognate codings and topological constraints. Shaded bars mark the age ranges implied by the two competing theories of Indo-European origin: blue, Kurgan hypothesis; green, Anatolian farming hypothesis.
However, two researchers think they’ve gotten around such problems and believe they may have figured out when languages first arose. Charles Perreault and Sarah Mathew looked into phonemic diversity, which seems to change at a set rate.
A phoneme is essentially a sound, so phonemic diversity is the number of distinct sounds a language uses. English, for example, includes “th” as in “they” and “uh” as in “cup.” In case you still don’t get it (or are just curious), complete lists of English phonemes are easy to find online.
Unlike other linguistic elements, like words, phonemes aren’t very strongly influenced by culture. Whilst the invention of the computer has introduced many new words into the English language, it hasn’t added any new sounds (except maybe this
Phonemes have been used before, notably to study where language originated. When a small group moves to a new region they take with them a limited sample of the original population leading to reduced diversity in that pioneering group. This is known as “bottlenecking.”
As such, the most diversity will be found in older populations, whilst groups which split off from them will have reduced diversity. Those who split off from the migratory group will have even less diversity still. Since phonemic diversity is highest in Africa, this suggests that is where language started.
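If you'd like to see the bottleneck idea in action, here is a toy sketch. It is not the model used in any of the studies mentioned; it simply assumes that each founding group keeps every one of its parent's sounds with some fixed probability. The function names, the `keep_prob` value and the starting inventory of 100 sounds are all made up for illustration.

```python
import random


def found_colony(parent_phonemes, keep_prob, rng):
    """Return the sounds a small founder group carries away from its parent.

    Toy assumption: each sound is kept independently with probability keep_prob.
    """
    return {p for p in sorted(parent_phonemes) if rng.random() < keep_prob}


def migration_chain(n_start, n_steps, keep_prob=0.9, seed=0):
    """Track inventory size along a chain of successive founding events."""
    rng = random.Random(seed)        # seeded so the run is repeatable
    group = set(range(n_start))      # hypothetical ancestral inventory
    sizes = [len(group)]
    for _ in range(n_steps):
        group = found_colony(group, keep_prob, rng)
        sizes.append(len(group))
    return sizes


sizes = migration_chain(n_start=100, n_steps=4)
print(sizes)  # inventory size after each successive split
```

Because every new group only carries a subset of what its parent had, the numbers can only stay level or fall along the chain, which is the pattern described above.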
This study also concluded that phonemes accumulate at a faster rate in larger populations and the new research builds on that. If two groups migrate from an ancestral population and one moves to a large area whilst the other goes to a small, isolated island, then the latter’s phonemes will not change as much.
As such the island population acts as an effective “control” population with phoneme diversity similar to the ancestral population. You can then compare this to the other, larger group to see how many new phonemes have arisen.
Then it’s simply a case of dividing the number of new phonemes by the time since the two groups diverged and boom! You have calculated the rate at which phonemes change, allowing you to calculate how long it would’ve taken to accumulate all the phonemes in language and thus how long language has been around.
Figure: A model of change in phonemic diversity through drift and recovery. At the time of the split, two small populations, B and C, emigrate from population A and colonize two different regions. Population B settles on a large landmass, and subsequently grows and diversifies linguistically. As a result, the average phonemic diversity of population B increases with time. Conversely, the phonemic diversity of population C remains stable through time because it occupies a small, isolated island. Therefore, the phonemic diversity of population C can be used to approximate what the phonemic diversity of population B would have been at the time of the split. Large dots denote high phonemic diversity and small dots denote low phonemic diversity.
Therefore all you need to calculate how long language has been around is two related languages, one from a small isolated area and one from a larger area. Also, you need to know how long they have been separate.
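The arithmetic described above can be sketched in a few lines. This is only a rough illustration that assumes phonemes pile up at a constant rate; the function names are hypothetical and every phoneme count below is invented for the example rather than taken from the paper. Only the 70,000-year figure matches the separation time discussed in this post.

```python
def phoneme_gain_rate(mainland_count, island_count, years_apart):
    """Phonemes gained per year on the mainland, using the island as the baseline."""
    if years_apart <= 0:
        raise ValueError("years_apart must be positive")
    return (mainland_count - island_count) / years_apart


def years_to_accumulate(target_count, starting_count, rate_per_year):
    """Years needed to grow from starting_count to target_count at a constant rate."""
    if rate_per_year <= 0:
        raise ValueError("rate_per_year must be positive")
    return (target_count - starting_count) / rate_per_year


# Hypothetical phoneme counts, chosen only to keep the arithmetic easy to follow.
rate = phoneme_gain_rate(mainland_count=30, island_count=23, years_apart=70_000)
age = years_to_accumulate(target_count=60, starting_count=23, rate_per_year=rate)

print(f"rate: {rate * 1000:.2f} phonemes per thousand years")
print(f"time to reach the target inventory: about {age:,.0f} years")
```

With these made-up inputs the mainland gains 7 phonemes in 70,000 years, so reaching an inventory 37 phonemes larger than the starting point takes ten times as long as one such step per 3.7 phonemes would suggest: roughly 370,000 years.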
The researchers found a situation that provided this information in Southeast Asia. There, genetics suggests that the Andaman Islands and mainland Southeast Asia were colonised at roughly the same time by the same group of people, ~70,000 years ago.
So they plugged this data into their equation and got the rate at which phonemes accrue. Then they looked at how many phonemes are found in the most phonemically diverse languages (which are apparently the click languages from Africa) and worked out how long it would’ve taken them to get that number of phonemes.
Their results varied depending on how many phonemes they assumed the first language had, with estimates ranging from 150,000 to 600,000 years ago.
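To see why the assumed starting inventory matters so much, here is a small sketch that holds the rate fixed and varies only the number of phonemes the first language is assumed to have had. The rate, the target count and the starting counts are all hypothetical values picked for illustration, not the figures the researchers used, and the constant-rate assumption is the same simplification as before.

```python
def age_estimates(rate_per_year, target_count, starting_counts):
    """Map each assumed starting inventory to the years needed to reach target_count."""
    if rate_per_year <= 0:
        raise ValueError("rate_per_year must be positive")
    return {
        start: (target_count - start) / rate_per_year
        for start in starting_counts
    }


# Hypothetical inputs: a fixed rate and three guesses at the first language's inventory.
estimates = age_estimates(
    rate_per_year=0.0002,
    target_count=100,
    starting_counts=[10, 40, 70],
)

for start, years in estimates.items():
    print(f"starting with {start} phonemes: about {years:,.0f} years")
```

The smaller the inventory you assume language started with, the longer it takes to reach the modern maximum, so the estimate stretches further back in time.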
Sources: http://evoanth.wordpress.com/2012/05/10/dating-the-origin-of-language/ http://www.ncbi.nlm.nih.gov/pubmed/22558135