


If anything in the first part of this guide is unfamiliar to you, you should probably take a little time to refresh your memory concerning the essential concepts in phonology.  You can open that guide in a new tab by clicking here.

Two questions:

  1. Can you define 'consonant'?
  2. What are the consonant sounds of English?

Click here when you have an answer.

Only 7 of the 24 sounds need a special symbol to represent them.  It is quite a simple matter to learn how to read and write the phonemic script for consonant sounds.  The only ones which differ from the letters of the Latin alphabet are:
    /θ/, /ð/, /ʃ/, /ʒ/, /tʃ/, /dʒ/ and /ŋ/

You will need to learn these seven to understand what follows.

There are some other things to note:

Now we can go on to /'klæ.sɪ.faɪ.ɪŋ 'kɒn.sə.nənts/.


Classifying consonants

There are three areas to consider when classifying consonant sounds:

  1. Voice
  2. Place of articulation
  3. Manner of articulation



Voice

Voicing describes how phonemes may be different depending on whether the vocal cords vibrate or not at the time of pronunciation.  It is sometimes referred to as sonorisation.
For example, the /k/ sound is made without voicing but the /ɡ/ sound is made with the mouth parts in the same place but with voice added.  If you put your hand on your throat and say the words sue and zoo, you will see what is meant and feel a slight vibration on the second word (/s/ is unvoiced but /z/ is voiced).
The same phenomenon is noticeable when saying log vs. lock (although the voicing of the /ɡ/ in the first is less obvious).

Sixteen of the consonant phonemes form voiced / unvoiced pairings:

Unvoiced Voiced Minimal pairs
/p/ /b/ pat vs. bat
/tʃ/ /dʒ/ chin vs. gin
/f/ /v/ fan vs. van
/s/ /z/ sip vs. zip
/k/ /ɡ/ cut vs. gut
/t/ /d/ tab vs. dab
/θ/ /ð/ loath vs. loathe
/ʃ/ /ʒ/ leash on vs. lesion
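The pairings in the table lend themselves to a simple lookup.  What follows is only a minimal sketch in Python: the names VOICING_PAIRS, DEVOICING_PAIRS, voiced_counterpart and unvoiced_counterpart are invented here for illustration and the data are copied from the table above.

```python
# Unvoiced -> voiced pairings, copied from the table above.
VOICING_PAIRS = {
    "p": "b",
    "tʃ": "dʒ",
    "f": "v",
    "s": "z",
    "k": "ɡ",
    "t": "d",
    "θ": "ð",
    "ʃ": "ʒ",
}

# Reverse lookup: voiced -> unvoiced.
DEVOICING_PAIRS = {voiced: unvoiced for unvoiced, voiced in VOICING_PAIRS.items()}


def voiced_counterpart(phoneme):
    """Return the voiced partner of an unvoiced phoneme, or None if it has none."""
    return VOICING_PAIRS.get(phoneme)


def unvoiced_counterpart(phoneme):
    """Return the unvoiced partner of a voiced phoneme, or None if it has none."""
    return DEVOICING_PAIRS.get(phoneme)


print(voiced_counterpart("s"))    # z
print(unvoiced_counterpart("ɡ"))  # k
print(voiced_counterpart("m"))    # None, because /m/ is not in one of the eight pairs
```

A lookup like this treats voicing as a simple either-or matter, which is convenient for teaching materials even though real speech is more variable.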

Voicing is not a digital, on-off phenomenon; it exists on a cline from fully voiced to fully unvoiced.  In some circumstances, the consonants normally considered voiced are only partially voiced and, more rarely and in very rapid speech, not voiced at all.
In initial and final positions, as in words like had, sob, dig, do, be and go, the consonants /d/, /b/ and /ɡ/ are only partially voiced but in the mid-position, as in words like ladder, rubber and bigger, voicing is more pronounced.
This variation in the level of voicing has led some to use two different terms for the phenomenon:

  1. fortis (meaning strong), which alludes to the fact that unvoiced consonants are, allegedly, pronounced with more energy.  The consonants /p/, /t/ and /k/ are described as fortis consonants.
  2. lenis (meaning weak), which alludes to the opposite phenomenon: the consonants /b/, /d/ and /ɡ/ are variable in the amount of voicing they take and are often produced with little force.

What this implies is that phonemes are a way of digitalising the information.  Although a sound may, in fact, be very variably pronounced (shouted, whispered, mumbled etc.) and may be affected by its environment vis-à-vis other sounds, it will still be instantly recognisable by a native speaker of a language.  Phonemes are, in other words, sets of allophones, not simple sounds.

There are two other things to know about any consonant:

  1. Where is it pronounced?  This is called place of articulation
  2. How is it pronounced?  This is called manner of articulation


Place of articulation

To figure this out, we need to do a bit of physiology to get the terms right.  As you read the following guide, move your tongue around to identify the parts we are talking about.  Technically, the various parts you identify are called articulators.

  1. Start at the front of your mouth, where it meets the outside world and you have found your lips.  Sounds which require the use of your lips are called labial.  Sounds which require both lips are called bilabial.  An example is the /m/ sound in member.
  2. Behind your lips are your teeth and sounds produced here are, unsurprisingly, called dental.  An example is the th (/ð/) sound in that.
  3. Behind your top front teeth, there is a bony ridge called the alveolar ridge and sounds produced here are called alveolar.  An example is the /t/ sound in teeth.
  4. Behind that, the roof of the mouth has two sections:
    1. the hard palate (where palatal sounds are made) to the front.  An example is the sh (/ʃ/) sound in ship.
    2. the soft palate or velum to the rear (where we make velar sounds).  An example is the /k/ sound in cake.
  5. Your tongue can reach no further but pause to note that the tongue has three areas: the tip, the front and the back.
  6. At the back of your mouth is a teardrop-shaped fleshy part called the uvula.  It is, unsurprisingly, where uvular sounds are made but there are no uvular consonants in standard varieties of English.
  7. Right at the back of the mouth is the glottis where we make glottal sounds.  The only true glottal in English is the /h/ in, for example, horse.  In rapid speech and some varieties of English, however, there is also the glottal stop (/ʔ/), which appears when a consonant is dropped as in, e.g., the Scots and Southern British pronunciation of better as be'er (/ˈbe.ʔə/ rather than /ˈbe.tə/).
  8. The nasal cavity is connected to the mouth and is involved in nasal sounds.  An example is the sound of ng (/ŋ/) in swing.

Here's a picture:

[Figure: diagram of the vocal tract]

Now pronounce some consonants and see if you can identify which parts of the mouth are involved in making the sounds.  Can you put the following sounds in the table?
/s/ as in seem
/t/ as in tent
/f/ as in fine
/ɡ/ as in gone
/θ/ as in think
/l/ as in link
/ŋ/ as in sing
/w/ as in went
/h/ as in happy
/p/ as in pin
/ʃ/ as in shine
In the third column, put in your best guess at the adjective for the type of sound.
You can download a printable version of this and the next activity here.

[Table: place of articulation]


Manner of articulation

There is, unfortunately, no universally recognised system to describe how sounds are produced.  However, English sounds are all produced pulmonically (i.e., by expelling air) and by restricting the airflow in some way.

Stops or plosives
These sounds are produced by completely blocking the air flow and then releasing the blockage.  For example, to produce a /p/ sound, we close both lips, let a little breath build up and then release it by opening the lips.  These sounds can't be made continuously.
There are four phases to their production:
  1. the articulators are closed (e.g., the lips are pressed together for /p/)
  2. the air behind the articulators is compressed
  3. the articulators are moved apart to allow the air to be released
  4. the air, once released, often makes an audible sound or aspiration.  That is the difference between the sound of the 't' and 'p' in top [/tʰɒp/] and pat [/pʰæt/].
English has seven plosive consonants: /p/, /b/, /t/, /d/, /k/, /ɡ/ and /ʔ/.  The last of these is called the glottal plosive and is often an alternative to /p/, /t/ and /k/.
  1. /p/ and /b/ are bilabial, formed by both lips, and the second is voiced.  For example:
        paper bill /ˈpeɪ.pə.bɪl/
  2. /t/ and /d/ are alveolar, formed by the tongue pressing against the alveolar ridge (not the teeth), and the second is voiced.  For example:
        train delay /treɪn.dɪ.ˈleɪ/
  3. /k/ and /ɡ/ are velar, formed by the back of the tongue pressing against the juncture of the hard and soft palate, and the second is voiced.  For example:
        cream gateau /kriːm.ˈɡæ.təʊ/
  4. The glottal plosive (/ʔ/) is also known as a glottal stop (because the airflow is entirely blocked) and is voiceless.  It has to be voiceless because it is formed by compressing the vocal tract entirely and holding the vocal folds rigid.  It occurs in many words, often replacing a plosive as in the London and Scots pronunciation of butter which may be transcribed as /ˈbʌʔ.ə/ with the /t/ plosive replaced by the glottal.
    It is not always a signal of non-standard speech patterns as, in rapid speech, the stop is commonly used.
Nasals
To make these sounds, we close off the airflow (as we do for plosives) but allow the air to enter the oral cavity and flow out through the nasal cavity.
There are three nasal consonants in English:
  1. /m/ – as in
    map /mæp/
    ham /hæm/
    lamb /læm/
    milk /mɪlk/
    This consonant causes few problems for most learners.
  2. /n/ – as in
    nut /nʌt/
    bun /bʌn/
    nil /nɪl/
    can /kæn/
    In these, the oral passage is blocked by pressing the tip of the tongue against the alveolar ridge (just behind the teeth) so the place of articulation is described as alveolar.
    This consonant causes few problems for most learners.
  3. /ŋ/ as in:
    doing /ˈduːɪŋ/
    bringing /ˈbrɪŋɪŋ/
    sing /sɪŋ/
    ping /pɪŋ/
    In these, the oral passage is blocked by raising the tongue to contact the velum at the back of the throat, forcing the air through the nose so the place of articulation is described as velar.
    This consonant can cause difficulties because it is quite unusual.
    It never occurs initially in English.
    It occurs frequently in mid-position but is only pronounced as /ŋ/ alone when the morphology of the word allows it.  It is pronounced /ŋ/ in bringer [/ˈbrɪŋ.ə/] because the word is formed from bring + er but it is not pronounced that way in finger [/ˈfɪŋ.ɡə/] because the word is morphologically different, and not formed from fing + er.
    In other words, when it occurs at the end of a morpheme 'ng' is pronounced as /ŋ/ but in other circumstances, 'ng' is pronounced /ŋɡ/.
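The morpheme-boundary rule for 'ng' can be sketched in a few lines of Python.  This is an illustration only: the function name pronounce_ng is invented, the word has to be supplied already divided into morphemes, and the medial pronunciation is written /ŋɡ/ as in most dictionaries.  The sketch also ignores known exceptions such as the comparatives longer and stronger.

```python
def pronounce_ng(morphemes, index):
    """Return the expected pronunciation of the 'ng' starting at `index`.

    `morphemes` is the spelling of the word split into morphemes, e.g.
    ["bring", "er"], and `index` is the position of the letter 'n' in the
    joined word.  Rule sketched: 'ng' at the end of a morpheme is /ŋ/,
    otherwise it is /ŋɡ/.
    """
    word = "".join(morphemes)
    if word[index:index + 2] != "ng":
        raise ValueError("no 'ng' at that position")
    end_of_ng = index + 2
    boundary = 0
    for morpheme in morphemes:
        boundary += len(morpheme)
        if boundary == end_of_ng:
            return "ŋ"   # the 'ng' closes a morpheme
    return "ŋɡ"          # the 'ng' sits inside a morpheme


print(pronounce_ng(["bring", "er"], 3))  # ŋ
print(pronounce_ng(["finger"], 2))       # ŋɡ
print(pronounce_ng(["sing"], 2))         # ŋ
```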
Fricatives
To make these sounds, the air flow is not completely cut off but is restricted, with air flowing continuously and turbulently between two mouth parts.  What you hear is the result of friction, hence the name.  The term sibilant is used to refer to sounds such as /s/ and /z/ which are produced by allowing the air to flow across the tip of the tongue between it and the alveolar ridge.
To demonstrate to yourself, make a /t/ sound by completely blocking and then releasing the air and then make the /s/ sound by allowing air to seep out between articulators.
The nine fricatives in English are:
  1. labiodental fricatives
    /f/ and /v/ formed by the lips and top teeth.  The second is voiced.  For example:
        fine /faɪn/
  2. dental fricatives
    /θ/ and /ð/ formed by the tongue touching the teeth.  The second is voiced.  For example:
        breath /breθ/
        breathe /briːð/
  3. alveolar fricatives
    /s/ and /z/ formed as sibilants with the air compressed between the tongue and the alveolar ridge.  The second is voiced.  For example:
    bus /bʌs/
    buzz /bʌz/
  4. palatal or post-alveolar fricatives
    /ʃ/ and /ʒ/ formed by the tongue compressing the air slightly further back on the palate or just behind the alveolar ridge.  The second is voiced.  For example:
        mesh /meʃ/
        measure /ˈme.ʒə/
  5. glottal fricative
    /h/ formed by air compressed in the glottis at the back of the throat.  For example:
        household /ˈhaʊs.həʊld/
        hope /həʊp/
    It is not voiced in English but a voiced equivalent exists in some languages, including Basque, Chinese, Czech, Finnish, Korean, Polish, Portuguese, Romanian, Slovak and Slovene.  The sound is usually transcribed as [ɦ] and occurs, incidentally, in some South African speakers' production.  Speakers of those languages may be tempted to insert it into English and, although this rarely causes comprehension issues, it contributes to a foreign accent.
    In English, the airflow is only very minimally disrupted when forming this sound which leads some to assert that it isn't really a consonant at all.
  6. velar and uvular fricatives:
    The fricatives /x/ and /ɣ/ are not usually included in a list of standard English consonants but appear in words of Scots and Welsh origin and in many German words.  They also occur in South African varieties where the words are borrowed from Afrikaans or Xhosa.
    The /x/ is voiceless and occurs in the Scots word:
        loch /lɒx/
    English speakers will often substitute /k/.
    Other languages (such as Dutch) have a voiced version of the /x/ which is transcribed as /ɣ/ and is often spelt as 'g', for example, in:
        's-Hertogenbosch /ˌsɛrtoʊɣə(m)ˈbɔs/
    There are no uvular consonants in standard varieties of English.
Affricates
These are formed as a combination of a plosive and a fricative.  First there is closure of the airflow but release is allowed in a restricted way, extending the sounds.  There are two affricative sounds in English, /tʃ/ and /dʒ/, and both are described as palatal or post-alveolar, being formed with the tongue obstructing the air flow further back than the alveolar ridge (where /t/ and /d/ are formed).  The first of these is unvoiced and the second voiced:
  1. /tʃ/ in, for example
    chop /tʃɒp/
  2. /dʒ/ in, for example
    bridge /brɪdʒ/
Approximants
These sounds are all voiced and are produced by small obstructions of the airflow.  They are formed by bringing certain mouth parts quite close together without letting them touch, hence the name.
There are four of these in English and the first two are often referred to as glides or semi-vowels while the second two are referred to as liquid sounds:
  1. velar /w/ with the back of the tongue slightly raised towards the velum as in
        would wait /wʊd.weɪt/
  2. palatal /j/ with the tongue raised towards (but not very close to) the palate as in
        yellow yacht /ˈje.ləʊ.jɒt/
  3. /l/ which is sometimes placed in a class of its own as the only lateral in English.  In this, the sound is formed by using the tongue to stop air moving directly forward and out and forcing it to run along the side of the tongue.  For example
        lullaby /ˈlʌ.lə.baɪ/
  4. /r/ which is the only rhotic sound in English formed with a palatal airflow rather than a lateral flow of air.  For example
        real rarity /rɪəl.ˈreə.rɪ.ti/
Two more distinctions
One rather simple way to divide consonant sounds is to refer to two overarching categories:
  1. Obstruents
    are sounds made by obstructing the airflow completely or partially and include
    • stops or plosives (such as /b/ and /p/)
    • fricatives (such as /f/, /v/, /ʃ/ and /ʒ/)
    • affricatives such as /tʃ/
  2. Sonorants
    are sounds made with continuous, non-turbulent airflow (and include all vowels by some definitions) and include
    • nasals such as /m/ and /n/
    • lateral (/l/) (a liquid sound)
    • rhotic (/r/) (another liquid sound)
    • glides (/w/ and /j/)
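The two overarching categories follow directly from the manner of articulation, which can be sketched as a pair of lookups.  This is a minimal Python illustration under stated assumptions: the names CATEGORY_BY_MANNER, MANNER_BY_PHONEME and category are invented here, and only the example phonemes named in the two lists above are included.

```python
# Manner of articulation -> overarching category, following the lists above.
CATEGORY_BY_MANNER = {
    "plosive": "obstruent",
    "fricative": "obstruent",
    "affricate": "obstruent",
    "nasal": "sonorant",
    "lateral": "sonorant",
    "rhotic": "sonorant",
    "glide": "sonorant",
}

# Only the example phonemes named in the lists above, with their manner.
MANNER_BY_PHONEME = {
    "b": "plosive", "p": "plosive",
    "f": "fricative", "v": "fricative", "ʃ": "fricative", "ʒ": "fricative",
    "tʃ": "affricate",
    "m": "nasal", "n": "nasal",
    "l": "lateral",
    "r": "rhotic",
    "w": "glide", "j": "glide",
}


def category(phoneme):
    """Return 'obstruent' or 'sonorant' for one of the example phonemes."""
    return CATEGORY_BY_MANNER[MANNER_BY_PHONEME[phoneme]]


print(category("tʃ"))  # obstruent
print(category("l"))   # sonorant
```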

There are some other ways to make sounds and languages are quite inventive.  These include trills (the Spanish rolled /r/), in which the tongue vibrates, and flaps (for example, the 'dd' sound in madder in US English), in which the airflow is momentarily interrupted.  Some African languages make extensive use of click sounds, which occur in English in expressions such as tsk tsk and also when people try to imitate the sound of horses' hooves (clip clop).  Transcription varies because there are at least five ways to make the sounds.

Retroflex sounds

Retroflex sounds are formed in many languages with the tongue concave and/or curled back on itself to block the air flow.
[Figure: tongue position for a retroflex sound (image adapted from Wikipedia)]

For example:
    Russian and Polish have a retroflex /z/, transcribed as [ʐ].
    Hindi and other Indian languages have a retroflex /t/ transcribed as [ʈ].
    Swedish has both a retroflex /n/ transcribed as [ɳ] and a retroflex /d/ transcribed as [ɖ].
    Chinese languages have a retroflex /s/ transcribed as [ʂ].
If speakers of these languages import the retroflex sounds into English it contributes greatly to a foreign accent.
It is usually helpful to make learners aware of the differences.


Markedness and phonemic substitution

Markedness in this sense refers to how widely consonants are represented in the world's languages.  That, it is sometimes averred, is a measure of how hard they are to acquire.  The common sounds will give few problems but consonants which are not represented in the learners' first language(s) will, understandably, cause significant problems.

There is evidence to suggest that the unvoiced consonant sounds, especially, /t/, /s/, /p/, and /k/ are common to nearly all languages and are, therefore, considered unmarked.  They should cause few learners any trouble at all except in terms of their allophonic varieties (with and without aspiration, retroflex or not).
The consonant /n/ is also an unmarked form which appears in many languages.
On the other hand, the equivalent voiced sounds, /d/, /z/, /b/ and /ɡ/ are marked in that they do not universally occur with anything like the same frequency so they require more attention as does the nasalised /ŋ/ which is also less common and causes some learners a good deal of difficulty.

Where a sound may occur also plays a role.  Final voiced consonants are rare in many languages, including German and Dutch, for example and this may tempt learners of those backgrounds to pronounce dog as dock, cab as cap, cadge as catch and so on.

There is a guide on this site to teaching troublesome sounds (new tab) which considers many of the more marked, i.e., less common, vowel sounds.


Allophones, reductions and regional variations

individuals vary
No two speakers pronounce all consonants in exactly the same way.  Individual speakers will also pronounce some consonants slightly differently depending on how they feel, how carefully they wish to speak and how quickly.  So, for example, we might pronounce take in:
    I want to take it home
as /teɪk/
with no aspiration on the /t/ sound and on another occasion, we might pronounce the word in
    Take it!
as /tʰeɪk/
with the aspiration on the /t/ prominent.
/t/, /p/ and /k/ are all variously aspirated depending on the phonological environment in which they occur and the speaker's attitude.
Voicing, too, is variable with some individuals using more (because there is a cline from unvoiced to voiced, not an either-or distinction).  So, for example, some speakers may pronounce pub as /pʌb/ with a clearly voiced final consonant but others may reduce the amount of voicing until the word approximates to /pʌp/.  Some may even remove the final consonant and substitute a glottal stop as in /pʌʔ/.  No-one will mistake the word, however it is pronounced, so we are dealing with allophonic variation.
the positions of consonants vary
Where a consonant occurs in a word may also affect how it is pronounced.  For example:
  1. /b/, /d/, /dʒ/ and /ɡ/ which are all voiced in most transcriptions may become wholly or partially de-voiced when they fall at the end of a word or phrase so, for example
        It's my job
    may be transcribed as
        /ɪts.maɪ.dʒɒp/ or /ɪts.maɪ.dʒɒb/
        I'll be the judge
    may be transcribed as
        /aɪl.bi.ðə.dʒʌtʃ/ or /aɪl.bi.ðə.dʒʌdʒ/
    The /ɡ/ sound is clearly voiced in, e.g.
    but much less so in
    so the first may be transcribed as
    and the second is nearer to (but not identical with)
    and so on.
  2. The /l/ sound also exhibits variations in what is called velarization (the amount it is pronounced by partial closure of the velum at the back of the mouth).  So for, e.g.:
        Let me go
    the transcription would be
    but the transcription of
        Let me fall
    will be:
    with a velarized final consonant (the so-called dark [ɫ]).
    In standard BrE, the sound is light (/l/) before a vowel and dark elsewhere but that disguises changes in connected speech because the sound will be light in pull it (/pʊl.ɪt/) but dark in pull that (/pʊɫ.ðæt/).  The transcription may safely be left as /l/ in all cases because it is simpler and we have a rule for the pronunciation of the allophones.
    Most native speakers of English are unaware of the two pronunciations of /l/ because they make no phonemic difference.  A Turkish speaker, in whose language the sounds are phonemes, will be much more aware of the distinction having been trained since childhood to recognise it.
  3. The /t/ sound often becomes glottalized when it occurs finally.  In other words, it is replaced by the stop /ʔ/.  So the transcription of
        I got it
    is either
    or, even
  4. The amount of aspiration is also dependent on the position of the consonant vis-à-vis other sounds.  We saw above that this aspiration affects /t/, /k/ and /p/ in particular.  When these sounds are the first in a word or the first in a stressed syllable, they are aspirated so the sounds followed by the elevated /h/ in the following will be aspirated:
        Peter /ˈpʰiː.tə/
        tap /tʰæp/
        kill /kʰɪl/
    but will remain unaspirated in these:
        couple /ˈkʌp.l̩/
        hate /heɪt/
        sicken /ˈsɪkən/
    Because the sounds are not full phonemes in English, most speakers are unaware of the differences in pronunciation and may be surprised to discover it but to speakers of languages (such as Mandarin) where aspiration is a phonemic characteristic, the change in pronunciation will be very obvious because they have been brought up to recognise it.
    (In fact the phoneme /t/ has six possible pronunciations in English:
    At the end of a hat it is called an unreleased /t/ and transcribed phonetically as [t̚].
    At the beginning of task it is aspirated [tʰ].
    It may be glottalised in, e.g., butter and got [ʔ].
    It may be flapped as in the AmE later [ɾ].
    It may be nasalised and flapped as in the AmE counter [ɾ̃] (because it is following a nasalised consonant /n/).
    It may just be a plain [t] sound as in stitch.)
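The positional rule for aspiration (first sound in the word, or first sound in a stressed syllable) can be sketched as follows.  This is an illustration under assumptions, not a full account of English aspiration: the function name mark_aspiration is invented, the input is a single word transcribed without slashes, syllables are separated by a full stop and primary stress is marked with ˈ at the start of the syllable.

```python
ASPIRATED = {"p", "t", "k"}
STRESS = "ˈ"


def mark_aspiration(transcription):
    """Add ʰ after /p/, /t/ or /k/ when it begins the word or a stressed syllable."""
    result = []
    for position, syllable in enumerate(transcription.split(".")):
        stressed = syllable.startswith(STRESS)
        mark = STRESS if stressed else ""
        body = syllable[1:] if stressed else syllable
        qualifies = position == 0 or stressed
        # The affricate /tʃ/ is a different phoneme, so it is left alone.
        is_affricate = body.startswith("tʃ")
        if qualifies and body[:1] in ASPIRATED and not is_affricate:
            body = body[0] + "ʰ" + body[1:]
        result.append(mark + body)
    return ".".join(result)


print(mark_aspiration("ˈpiː.tə"))  # ˈpʰiː.tə
print(mark_aspiration("tæp"))      # tʰæp
print(mark_aspiration("heɪt"))     # heɪt
```

Only the position of the sound is checked, so the final /t/ of hate and the medial /k/ of sicken are left unmarked, as in the examples above.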
reductions and elisions of consonants and clusters
When consonants occur in clusters such as at the end of a word like clothes (/kləʊðz/) there is a tendency in English to elide one of the consonants so the pronunciation is often as /kləʊz/ with the elision of the /ð/.  (If learners always say it that way, they will never be misunderstood and it's a good deal easier for them.)
Some clusters such as the one at the end of sixths, are simply difficult to pronounce.  The result is usually something like /sɪkθs/ or even /sɪkfs/.  Learners whose languages do not allow the same clusters as English are often tempted to use cluster reduction inappropriately, for example, pronouncing crisps as /krɪps/ rather than /krɪsps/.
It is usually /t/, /d/, /p/ and /k/ which are elided in this respect, so, for example:
    text message becomes /teks.ˈme.sɪdʒ/
    midst becomes /mɪst/
    glimpse becomes /ɡlɪms/
    and asked can be pronounced /ˈɑːst/.
The same phenomenon is observable with the unvoiced /θ/ sound so asthma is pronounced as /ˈæ.smə/.
Occasionally, elision can become fixed in the language so, for example, the confection now known as ice cream was originally iced cream but the /t/ sound of the letter 'd' was routinely elided and the phrase took on its current spelling.
accents vary
Where people come from may also have a significant effect.  In some parts of Britain, for example, a final letter 'r' will be pronounced quite obviously so, e.g.
    My father is
will be pronounced as
by lots of people because the /r/ precedes the vowel, but many people will pronounce it as
without the /r/ sound.  However, even those who do pronounce the /r/ would not pronounce it in
    He is my father
because there is no following vowel.
In Standard AmE, the /r/ is usually produced so the transcription is
with a syllabic /r/ as the final consonant and no preceding schwa.
Alternatively, the transcription appends a tiny /r/ to the vowel so we have, e.g., nurse transcribed not as /nɜːs/ but as /nɝːs/.
Another significant difference between Standard American and British is the pronunciation of the letter 't' when it occurs in the middle of words so, for example, we find:
Word British American
butter /ˈbʌt.ə/ /ˈbʌd.r̩/
Peter /ˈpiː.tə/ /ˈpiː.dər/
There are a few other significant (and some not very significant) variations in how consonants are pronounced between BrE and AmE.  For a list of the differences, see either the guide to teaching yourself to transcribe or download the PDF document for this area.  (Both those links open in new tabs.)
A regional difference in parts of Britain is that the central /t/ sound may be replaced by a glottal stop (/ˈbʌʔ.ə/ and /ˈpiː.ʔə/, respectively).
/hw/ vs. /w/
Now almost extinct except in some varieties of English spoken in Scotland, parts of Ireland and the southern United States, is a variant of /w/ usually transcribed as /hw/ (or you may see it as [ʍ]).  It appears at the beginning of words spelled wh- but has for almost all speakers of English now merged with /w/.  The result is that apart from a small minority of speakers, there is no distinction in pronunciation between weather and whether, wine and whine etc.  The merger is generally called the whine-wine merger.


A summary and test

Now we have all three ways to classify the consonants and can describe them properly.  These three ways are:

  1. Voicing
  2. Place of articulation
  3. Manner of articulation

Can you complete this chart?  If you have your downloaded and printed activity sheet to hand, do it there.  If you would like to download that now, click here.  When you have filled in all the consonant sounds, click on the chart to reveal the answer.


The voiced consonants are in bold.
Notice, too, that /t/ and /d/ are alveolar stops in English, not dental sounds as they are in a range of other languages.  Making them dental sounds contributes to a foreign accent in English.
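The three-way classification can also be written down as data.  The sketch below is one possible Python representation, not the chart from this guide: the names Consonant and CONSONANTS are invented, the glottal stop is left out so that the total is 24, and 'post-alveolar' is used where the text says 'palatal or post-alveolar'.  The voicing of the nasals and the place labels for /l/ (alveolar) and /r/ (post-alveolar) are assumptions taken from standard descriptions rather than from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    voiced: bool
    place: str
    manner: str

    def describe(self):
        voicing = "voiced" if self.voiced else "unvoiced"
        return f"{voicing} {self.place} {self.manner}"


# (symbol, voiced, place, manner)
_ROWS = [
    ("p", False, "bilabial", "plosive"),
    ("b", True, "bilabial", "plosive"),
    ("t", False, "alveolar", "plosive"),
    ("d", True, "alveolar", "plosive"),
    ("k", False, "velar", "plosive"),
    ("ɡ", True, "velar", "plosive"),
    ("m", True, "bilabial", "nasal"),
    ("n", True, "alveolar", "nasal"),
    ("ŋ", True, "velar", "nasal"),
    ("f", False, "labiodental", "fricative"),
    ("v", True, "labiodental", "fricative"),
    ("θ", False, "dental", "fricative"),
    ("ð", True, "dental", "fricative"),
    ("s", False, "alveolar", "fricative"),
    ("z", True, "alveolar", "fricative"),
    ("ʃ", False, "post-alveolar", "fricative"),
    ("ʒ", True, "post-alveolar", "fricative"),
    ("h", False, "glottal", "fricative"),
    ("tʃ", False, "post-alveolar", "affricate"),
    ("dʒ", True, "post-alveolar", "affricate"),
    ("w", True, "velar", "approximant"),
    ("j", True, "palatal", "approximant"),
    ("l", True, "alveolar", "approximant"),          # place is an assumption
    ("r", True, "post-alveolar", "approximant"),     # place is an assumption
]

CONSONANTS = {row[0]: Consonant(*row) for row in _ROWS}

print(CONSONANTS["z"].describe())  # voiced alveolar fricative
print(CONSONANTS["k"].describe())  # unvoiced velar plosive
```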

If you would like to hear these sounds, the ideal place to go has been kindly provided by the British Council.

Of course there's a test (two, to be honest) on what has been covered up to now.


Consonant clusters and phonotactic rules

English allows a range of consonants to occur together.  In this guide, we will call them clusters although you may hear talk of consonant sequences, consonant compounds and consonant blends.
Clusters can occur initially (as in spray [/spreɪ/]), medially (as in hopscotch [/ˈhɒp.skɒtʃ/]) or finally (as in cups [/kʌps/]) but there are restrictions concerning which clusters can occur where.  The rules are referred to as phonotactic, signalling that they concern the contact points of consonants.
The clusters which are allowed in the initial position of a syllable (not necessarily a word) in English can be listed:

Cluster Example Cluster Example Cluster Example Cluster Example Cluster Example
/s/ + /p/ speak /sp/ + /r/ spray /b/ + /l/ blow /f/ + /r/ frog /k/ + /j/ cute
/s/ + /t/ stop /st/ + /r/ street /ɡ/ + /l/ glow /θ/ + /r/ throw /b/ + /j/ beauty
/s/ + /k/ scope /sk/ + /l/ sclerosis /f/ + /l/ flow /ʃ/ + /r/ shrink /d/ + /j/ duty
/s/ + /f/ sphere /sk/ + /r/ screech /s/ + /l/ slow /t/ + /w/ twin /f/ + /j/ future
/s/ + /m/ smile /sk/ + /w/ squeal /p/ + /r/ pray /k/ + /w/ quick /h/ + /j/ huge
/s/ + /n/ snip /sk/ + /j/ skew /t/ + /r/ tray /d/ + /w/ dwell /v/ + /j/ view
/s/ + /l/ slip /st/ + /j/ stew /k/ + /r/ cry /θ/ + /w/ thwack /m/ + /j/ mew
/s/ + /w/ swim /sp/ + /j/ spurious /b/ + /r/ brow /s/ + /w/ swell /n/ + /j/ new
/s/ + /j/ suit /p/ + /l/ play /d/ + /r/ drag /p/ + /j/ pew /l/ + /j/ lewd
/sp/ + /l/ splay /k/ + /l/ clay /ɡ/ + /r/ grow /t/ + /j/ tube    

Three consonants is the maximum that is allowable in English in the initial position.
Some of the above (e.g., /sk/ + /l/, /θ/ + /w/, /sp/ + /j/ and /p/ + /j/) are very rare and some, such as /l/ + /j/ only occur in the dialects of some English speakers.
Others, such as /s/ + /f/ occur only in words derived from other languages (Greek in this case).
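Because the permitted initial clusters form a closed list, checking a candidate cluster is just a membership test.  The following Python sketch copies the clusters from the table above (the two duplicated entries collapse into one each); the names ALLOWED_INITIAL_CLUSTERS and is_allowed_initial_cluster are invented for illustration.

```python
# Each cluster is written as space-separated phonemes and stored as a tuple.
ALLOWED_INITIAL_CLUSTERS = {
    tuple(cluster.split())
    for cluster in [
        # /s/ + one consonant
        "s p", "s t", "s k", "s f", "s m", "s n", "s l", "s w", "s j",
        # three-consonant clusters, all beginning with /s/
        "s p l", "s p r", "s t r", "s k l", "s k r", "s k w", "s k j",
        "s t j", "s p j",
        # consonant + /l/
        "p l", "k l", "b l", "ɡ l", "f l",
        # consonant + /r/
        "p r", "t r", "k r", "b r", "d r", "ɡ r", "f r", "θ r", "ʃ r",
        # consonant + /w/
        "t w", "k w", "d w", "θ w",
        # consonant + /j/
        "p j", "t j", "k j", "b j", "d j", "f j", "h j", "v j", "m j",
        "n j", "l j",
    ]
}


def is_allowed_initial_cluster(phonemes):
    """Return True if the sequence of phonemes is in the list of permitted onsets."""
    return tuple(phonemes) in ALLOWED_INITIAL_CLUSTERS


print(is_allowed_initial_cluster(["s", "p", "r"]))  # True, as in spray
print(is_allowed_initial_cluster(["ɡ", "w"]))       # False
```

The check says nothing about how common a cluster is, so rare or dialectal entries such as /l/ + /j/ are treated the same as frequent ones.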

Equally, we can identify clusters which are permitted in the final position and see that there are phonotactic rules for final consonants in English.
Here's another list:

  1. In forming plurals and verb inflexions such as past tenses and other structures, English has the final consonant followed by /s/ (as in lots [/lɒts/]), /z/ (as in lads [/lædz/]), /t/ (as in sacked [/sækt/]) or /θ/ (as in seventh [/ˈsevn̩θ/]).  In these cases, the /s/, /z/, /t/ and /θ/ are the only four allowable post-final consonants.
  2. There are five pre-final consonants appearing in clusters.
    /m/, /n/, /ŋ/, /l/ and /s/ are the only ones which can precede the final consonant.  For example:
    lumps, banks, ringed, belt, last (/lʌmps/, /bæŋks/, /rɪŋd/, /belt/, /lɑːst/)
  3. A few words in English end in clusters of four consonants and these cause many learners real trouble.  Examples are glimpsed (/ɡlɪmpst/) and texts (/teksts/).
  4. In BrE, the 'r' in words like marks (/mɑːks/), carts (/kɑːts/) and lords (/lɔːdz/) is not sounded so these are, in fact, two-, not three-consonant clusters.
  5. Only one word in English, and a few derivatives of it, ends in /mt/: dreamt (/dremt/).
  6. No syllables can end with more than four consonants (and more than three is vanishingly rare).  We can allow sevenths (/ˈsevnθs/) with four final consonants in a cluster but that is the limit.

When we consider the medial position, life is slightly more complicated because some will only allow a cluster to appear in a single syllable so, for example, mixture will be said to contain only /ks/ and /tj/ but others will allow it to contain /kstj/ as a cluster.  The first analysis is more consistent with the phonotactic rules of English.

It is clear from the above that certain combinations of consonants are not allowed in English at all.  Here's a short list:

  1. /sb/, /sd/, /sɡ/, /sθ/, /ss/, /sʃ/, /sh/, /sv/, /sð/, /sz/, /sʒ/ and /sŋ/ cannot occur initially as a cluster in an English word.
  2. This is the situation before /l/ in the initial position:
    Allowed Forbidden
  3. This is the situation before /r/ in the initial position:
    Allowed Forbidden
  4. This is the situation before /w/ in the initial position:
    Allowed Forbidden
  5. This is the situation before /j/ in the initial position:
    Allowed Forbidden

There is no obvious reason for this and it is not to do with certain clusters being unpronounceable.  English speakers, for example, have little or no difficulty pronouncing Gwen but /ɡ/ + /w/ is not allowed in English words.  Equally, there is no obvious reason why English forbids an initial cluster of /ðr/ instead of /θr/ but it does.

This matters because English is at the forgiving end of the spectrum in allowing a wide range of clusters to occur (although not all of the possibilities, as we have seen).  Other languages do things differently and here's a short list of the commonest problems caused by clusters:

  1. Standard Arabic forbids initial consonant clusters altogether and never allows more than two consecutive consonants anywhere.
  2. Japanese allows a very limited range of clusters and forbids any unvoiced consonant following a nasal so /nd/ is allowed but /nt/ is forbidden and /mb/ is allowable but not /mp/.
  3. Spanish allows no cluster beginning with /s/ in initial position so speakers may insert an intrusive /ə/ or /e/ sound before the cluster in English producing, e.g., eschool for school (/eskuːl/ or /əskuːl/ not /skuːl/).
  4. French allows /vr/ as an initial cluster and French speakers may carry this over into English words beginning with /v/.
  5. In Chinese the clusters /kl/, /st/ and /rs/ are forbidden and speakers may insert a /ə/ between the consonants.
    Additionally, and the language has this in common with, e.g., Thai, there are no final consonants barring /ŋ/ in most dialects.  The result is often that speakers of these languages will simply fail to produce final consonants at all.
    Final consonant clusters, which may, in English, be made up of up to four consonants are even more problematic.
  6. In Italian the consonant clusters /pl/ and /kl/ are not allowed.
  7. In German more initial clusters are allowed (/ʃl/ is very common) and /pf/ occurs both initially and in other positions but is not allowed at all in English.
  8. Greek allows no fewer than 32 two-consonant clusters at the beginnings of words which are forbidden in English.
  9. In Russian and other Slavic languages, many initial clusters are permitted which in English are forbidden.  These include /pt/, /bd/, /tk/, /kt/ and /ɡd/, for example.
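
The Spanish pattern in item 3 can be sketched as a simple rule.  This is a minimal illustration only: the function name and the consonant set are assumptions made for the example, not part of any published description, and real learner speech is far more variable.

```python
# Illustrative set of single-symbol English consonants (an assumption for this
# sketch; the affricates /tʃ/ and /dʒ/ are not treated as units here).
CONSONANTS = set("pbtdkɡfvszmnlrwjhθðʃʒŋ")


def add_initial_vowel(transcription: str, vowel: str = "e") -> str:
    """Prefix a vowel when a word begins with /s/ followed by a consonant."""
    starts_with_s_cluster = (
        len(transcription) >= 2
        and transcription[0] == "s"
        and transcription[1] in CONSONANTS
    )
    return vowel + transcription if starts_with_s_cluster else transcription


print(add_initial_vowel("skuːl"))       # school, with /e/ inserted
print(add_initial_vowel("skuːl", "ə"))  # school, with /ə/ inserted
print(add_initial_vowel("suːn"))        # soon, no cluster so unchanged
```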

Phonotactic rules are not easily discernible to learners of the language so the temptation is often to use native clusters.  French speakers and Russian speakers, for example, may insert clusters which English forbids.
Speakers of languages which have no or a very limited range of clusters may break up clusters which are unfamiliar and produce, e.g., screw as sekeru (/skruː/ pronounced as /sekəruː/) and that is evident in the production of speakers of Japanese, Chinese languages and Arabic.
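
The breaking up of clusters described above can be sketched roughly as follows.  The sketch assumes one fixed vowel is inserted between any two adjacent consonants, so it gives /səkəruː/ for screw rather than the /sekəruː/ cited above; which vowel a speaker actually uses depends on the first language.  The function name and consonant set are illustrative assumptions.

```python
# Illustrative set of single-symbol consonants (an assumption for this sketch;
# the affricates /tʃ/ and /dʒ/ would wrongly be split by this simple approach).
CONSONANTS = set("pbtdkɡfvszmnlrwjhθðʃʒŋ")


def break_up_clusters(transcription: str, vowel: str = "ə") -> str:
    """Insert a vowel after every consonant that is directly followed by another."""
    pieces = []
    for index, symbol in enumerate(transcription):
        pieces.append(symbol)
        following = transcription[index + 1:index + 2]  # empty at the end of the word
        if symbol in CONSONANTS and following in CONSONANTS:
            pieces.append(vowel)
    return "".join(pieces)


print(break_up_clusters("skruː"))  # screw
print(break_up_clusters("klæs"))   # class
```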

To help a little, we need to recall (or suddenly become aware) that native speakers routinely simplify final consonant clusters, especially in rapid speech.  It is therefore unnecessary to trouble learners with the full pronunciation of words like products or camped because the /t/ and the /p/ are not usually sounded by native speakers (so we have /ˈprɒ.dʌks/ not /ˈprɒ.dʌkts/ and /kæmt/ not /kæmpt/).
The middle consonant in clusters such as /kts/, /mps/, /mpt/, /nts/, /ndz/ and /skt/ is usually left out or sounded very weakly.  Examples are:
    impacts which can be pronounced as /ɪm.ˈpækts/ or /ɪm.ˈpæks/
    dumps which can be pronounced as /dʌmps/ or /dʌms/
    dumped which can be pronounced as /dʌmpt/ or /dʌmt/
    pints which can be pronounced as /paɪnts/ or /paɪns/
    funds which can be pronounced as /fʌndz/ or /fʌnz/
    tasked which can be pronounced as /tɑːskt/ or /tɑːst/
That is helpful for teaching purposes, especially for learners whose first languages do not allow or allow a more limited range of final consonant clusters.
The troublesome /ð/ in clothes is also often ignored by native speakers and learners can take the same route, saying /kləʊz/ not /kləʊðz/.  Nobody will misunderstand and few would notice.
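
The dropping of the middle consonant in the six final clusters listed above can be expressed as a small lookup.  This is only a sketch: the function name is invented for the illustration and the rule is applied mechanically, whereas real speakers vary with speed and style.

```python
import re

# The three-consonant final clusters named in the text, each mapped to the
# form left when the middle consonant is dropped.
REDUCTIONS = {
    "kts": "ks",
    "mps": "ms",
    "mpt": "mt",
    "nts": "ns",
    "ndz": "nz",
    "skt": "st",
}

# Match any of the listed clusters, but only at the very end of the word.
FINAL_CLUSTER = re.compile("(" + "|".join(REDUCTIONS) + ")$")


def simplify_final_cluster(transcription: str) -> str:
    """Return the transcription with a listed final cluster reduced, if present."""
    return FINAL_CLUSTER.sub(lambda match: REDUCTIONS[match.group(1)], transcription)


for word in ["ɪm.ˈpækts", "dʌmps", "dʌmpt", "paɪnts", "fʌndz", "tɑːskt"]:
    print(word, "->", simplify_final_cluster(word))
```

Words which do not end in one of the six clusters are returned unchanged.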

If you yearn for more help in this area, try the guide to syllables and phonotactics accessible from the pronunciation index linked below.


Spelling consonant sounds

What follows is a guide to how the consonant sounds of English are realised in its orthography.  If you have followed the general guide to spelling in English, you will be aware that English is often described, sometimes despairingly, as a wholly inconsistently spelled language with no discernible connections between sound and spelling.  You will also be aware that that is only very partially true.
In the case of consonant sounds, there are clear consistencies and these are teachable.

The following takes each consonant in turn and suggests the commonest ways that the sounds are realised in spelling as well as noting some rarities, often loan words from other languages, which have to be learned individually.
A silent final 'e' has been ignored in this list and the ordering is as for the list of consonants in the table above.

Sound   Common spellings               Rarities and varieties
/p/     p or pp
/z/     z, zz or s                     also tsarina
/d/     d, dd or ed                    AmE: tt
*/h/    h or wh
/tʃ/    ch, tch or t                   tch is never initial
/ŋ/     ng, n or ngue
/v/     v, vv or f
/j/     y or i
/s/     s, ss or c
/t/     t, tt, bt, ght or ed
/ʒ/     g, j or s
/ɡ/     g, gue or gh
/n/     n, nn or kn                    kn is only initial; nn is only medial
/f/     f, ff, gh or ph                lieutenant (BrE)
/r/     r, rr or wr
/ð/     th
/b/     b or bb
/ʃ/     sh, s, ss, c, ce, ch or ti
*/k/    c, k, kk or cc                 ck is never initial
/m/     m, mm or mb
/dʒ/    g, j, dg or dj
*/l/    l or ll
/θ/     th
/w/     w, wh or u
* The /h/, /k/ or /l/ sounds are often the ways in which the /x/ sound in loch, chutzpah, llyn and other loan words is rendered in Standard English.  Many speakers of Standard English do, however, make the effort to produce /x/ in these cases.

This is the index of other guides in the in-service pronunciation section.
the overview of pronunciation
connected speech
consonants
intonation
minimal pairs (PDF)
minimal pairs transcription test
sentence stress
syllables and phonotactics
teach yourself transcription
teaching pronunciation IP
teaching troublesome sounds
verb and noun inflexions IP
vowels
word stress
identifying word-stress IP
Guides marked IP are in the initial plus section.