Speak for yourself

Arguably the most distinctive human ability is communicating highly complex concepts, for which we need language. Language can take many forms, from sign language to writing, but the most efficient form we have is speech – a skill that today is unique to modern humans. The apes have a remarkable capacity for language and communication: Kanzi the bonobo (fig. 1) has a working knowledge of perhaps thousands of words and Koko the gorilla understands and uses American Sign Language. But they are unable to speak. Why is this? What are the differences between human and ape anatomy that allow us to produce these sounds, and what selection pressures may have driven the evolution of our highly specialised anatomy?

Figure 1. Kanzi the bonobo with his lexigram sheets that he uses to communicate on a day-to-day basis. Credit: Great Ape Trust.

The Anatomy of Speech

In human babies the entrance to the windpipe (trachea) is very high up in the throat, allowing them to swallow and breathe through their nose at the same time – milk simply flows around the entrance to the trachea as it passes through the throat (diagram). This also means that the voice box (larynx), which sits at the top of the trachea, is very high up in the throat. Starting at around 3 months of age, the larynx begins to descend in the throat, and it stops its migration by the time we reach 4 years old.
The final position of the larynx is possibly the most important difference in the large-scale anatomy of our vocal apparatus. Its low position increases the overall length of our vocal tract, the cavity made up of the pharynx, oral cavity and nasal cavity (fig. 2). The human nasal cavity is also significantly longer than in apes and the tongue is far more compact, allowing air to resonate more freely.

Figure 2. The location of the pharynx in the human vocal tract. Credit: Arcadian / Wikimedia Commons.

I know a song that’ll get on your nerves…

It is this resonance of the air that allows us to produce noises – the vibration of the vocal cords inside the larynx produces a tone that is bounced around inside our vocal tract before leaving our mouth. But obviously we can produce more than one type of sound, which means we must be able to change the shape of our vocal tract. Our two most distinct vowel sounds are ‘ee’ and ‘oo’, but to produce all the sounds in between, such as ‘ah’, ‘uh’ and ‘oh’, we need much finer control over that shape.
This is where we really have the upper hand over our ape-y cousins. We can independently control the size and shape of the pharynx and the oral cavity, which greatly increases the range of sounds we can make, even down to the difference between ‘aahhh’ (relaxation) and ‘awww’ (cuteness). Our ability here is aided by the great flexibility and accuracy of our tongue – there’s even research being done into using it to help paralysed people control electric wheelchairs. But what could have caused us to evolve these abilities in the first place?

Stand up, speak up.

A few possible explanations exist for why we evolved a larynx which descends over time, including breathing during heavy exercise, our upright posture, or sounding as if we are bigger than we actually are. However, plenty of animals breathe just fine during exercise and even manage to pant at the same time to regulate heat, all without a low larynx. The upright posture hypothesis doesn’t really hold up either, as both gibbons and kangaroos are upright creatures with a high larynx. Pretending to be bigger than we are may be a more promising explanation. Birds, deer and lions all possess longer vocal tracts than you would expect for their size, and also settle disputes using vocal cues.
Given that we also see a secondary descent of the larynx in male humans as their voice breaks, it looks likely that a lower larynx may aid in communication of your place in the social hierarchy, or to put off predators by sounding bigger. These two roles may have been particularly useful early on in hominin evolution before we evolved complex speech or the ability to use weapons to protect ourselves. Either way, speech did evolve and it must have been incredibly important – the very low position of our larynx and trachea in our throat puts us in great danger of choking on our food if we breathe at just the wrong time.

Aping around – mimicry and speech

So the descent of the larynx may have played an early role in communication, whether between ourselves or to other animals, but this is a long way from the intricate vocal patterns and linguistic meanings which comprise true speech. The closest that we find in the animal kingdom is bird song, refined through a mixture of instinct and imitation, leading to the development of recognisable songbird ‘dialects’. This imitation is taken to the extreme in species such as parrots and lyrebirds (the subject of some classic Attenborough footage).


Such mimicry is rare in the animal kingdom but has evolved separately in birds, in seals, and in whales and dolphins. Dolphins use it to learn their mother’s song before they leave their pod. When they come across other dolphins, they sing their mother’s song, which can be recognised by siblings they have never known.
Vocal mimicry could have followed a similar path in our own ancestors, with dispersing females (New Scientist) needing to mimic the ‘dialect’ of a neighbouring population in order to be accepted into the social structure. With competition for mating, food and water running high, dialects may have become ever more complex to keep outsiders out, which would have required a better ear to recognise and distinguish between different dialects. It would also have been ever more important to accurately mimic more complex dialects in order to be accepted. This may eventually have led to runaway selection for vocal complexity, understanding and mimicry, the three vital ingredients needed to assign meaning to sounds and, crucially, learn and remember that meaning for the future.

Speak now, or forever hold your peace…

But can we tell when these abilities evolved just from looking at the fossil record? The bone most likely to tell us anything about the state of our larynx is the hyoid bone, which sits at the front of our throat below the tongue and is the only bone in the body which doesn’t touch another bone (fig. 3). Instead, it is held in place by muscles, tendons, ligaments and cartilage.

Figure 3. The position of the hyoid bone in the neck, above the larynx and below the tongue. Credit: The Worlds of David Darling.

Sadly, the hyoid only contains indirect clues about the anatomy of our vocal tract. It is also a very fragile bone which does not preserve well in the fossil record. Its shape remains the same as our larynx descends, so it is not very useful for finding out how complex Neanderthal speech was, although it is widely agreed that they had spoken language. The main difference in hyoid shape between humans and apes is the presence of a hollow in apes to accommodate vocal sacs, which were lost at some point along the hominin line. The size and shape of the bony canal for the hypoglossal nerve, which controls muscle movements below the tongue, is also uninformative, as there is a large range of overlap between humans and apes. Changes in the diameter of the spinal cord are similarly inconclusive, as they are more likely to have first allowed greater control of breath for endurance running and only later been co-opted to allow fine control of breath for speech.

In conclusion, while we can explain how we are able to speak, the answers to precisely why we speak and when we evolved this ability remain elusive.
If you’ve got any more questions, just grab me in the comments or on Tw*tter!

– James.

This blog post is based on Fitch, 2000.


Arcadian / Wikimedia Commons, 2007. Illu01 head neck. Available from: http://en.wikipedia.org/wiki/Pharynx#mediaviewer/File:Illu01_head_neck.jpg (Accessed: 01/09/14).

Fitch, W. (2000). The evolution of speech: a comparative review. Trends in Cognitive Sciences, 4, 258–267.

Great Ape Trust, 2013. Available from: greatapetrust.org (Accessed: 30/08/14).

The Worlds of David Darling, 1999-2014. Hyoid. Available from: www.daviddarling.info (Accessed: 01/09/14).


1 Comment

  1. I don’t know about these other chimps with their symbols, but it takes a tremendous effort of wishful thinking to believe that Koko “speaks sign language” when you read the transcripts. She probably doesn’t fare much better than a dog that was trained to do something analogous. It has been likened to “facilitated communication” with children with severe autism, that is, what they supposedly communicate is virtually made up by humans or non-autists making an effort to make some sense of what really has none. (Except when it’s something like “food!”)

    That may in part be caused not only by their lack of some language/speech-related neuronal/genetic traits (FOXP2, which, curiously, has evolved convergently in parrots), but also by their lack of fine-grained motor coordination. Chimps and gorillas are not only strong by sheer muscular advantage, but they are also lacking in the nervous control to tone down their strength. So, even if they were more “exapted” for speech by other means than vocalizations, the extra effort in subtler sign-language movements would likely delay any possible development of a real ability.

    I think that even a spoken language developed for gorilla/chimp-friendly vocalization would possibly yield much better results than sign language. I guess researchers could try to study their natural communication and expand their “vocabulary” by training infants.

    But the most interesting experiment, albeit also ethically questionable, would be to have a genetically engineered chimpanzee or gorilla with a human FOXP2. I guess that it’s likely they would come scarily close to talking, if not really, unequivocally talking, despite how theoretically anatomically inapt they are. Their anatomical condition possibly approaches that of some human malformations that don’t totally prevent speech.
