


Why does one writer get a whole guide to himself?

Quite simply because he is so influential.  His ideas penetrate nearly all areas of the study of language.  Henk van Riemsdijk, himself an influential figure and Professor emeritus of Linguistics at Tilburg University, The Netherlands, writes:

I had already decided I wanted to be a linguist when I discovered this book. But it is unlikely that I would have stayed in the field without it. It has been the single most inspiring book on linguistics in my whole career.
In Chomsky (2002)

Frank Palmer (1971:135) has said of the book that it was

the book that first introduced to the world the most influential of all modern linguistic theories, 'transformational-generative grammar'.

The book in question was Syntactic Structures, first published in 1957.

Word of warning: what follows is an overview of the classic version of Chomsky's theories concerning Transformational-Generative Grammar (TGG).  As such, it can only do some injury, minor, it is hoped, to the complexities and subtleties of the approach.  For more, consult the literature.  There are some references at the end but there is a wealth of literature in the field.
In addition, there is Chomsky's own website at www.chomsky.info and many more or less reliable websites covering the area.  The briefest of overviews concerning some theories of Chomsky in cartoon format can be viewed at https://m.youtube.com/watch?v=7Cgpfw4z8cw.


Part 1: Transformational-Generative Grammar (TGG)

Background: Structural Linguistics

Attached to the name of Bloomfield and his followers, Structural Linguistics was the attempt to apply a truly scientific approach to grammar.  Although all linguists are, of course, concerned to study structure, or patterns and regularities, in language, Bloomfield's central concern was with a mechanistic and purely empirical approach to language study, looking for structure, not meaning.
Structural linguistics defined the essential building blocks of language as phonemes and morphemes, the latter consisting of combinations of the former, and the approach to analysing grammar was to divide the language up into its 'Immediate Constituents'.  Hence, we have IC analysis.
The details of IC analysis need not concern us here but the result was to analyse language according to a branching tree diagram.  IC analysis can be quite illuminating.  Here's an example of what is meant.  If we analyse the sentence
    The large dog with huge teeth chased the man from the room
into its immediate constituents we can do so like this:

[Tree diagram: the sentence analysed into its immediate constituents]

This kind of analysis can be fruitful because it is simple to see that we can substitute the sequences of morphemes in the bottom row with others and, while maintaining the same analysis, apply it to any number of sentences.  We can analyse, e.g., a sentence such as
    The old man with grey hair lost his keys in the street
which follows the same pattern.

problem 1: embedding

This all works very well when sentences behave themselves and come along quietly with all the constituents in order.  However, the first problem strikes when we consider sentences such as:

  1. She messed the whole thing up.
  2. Is John going home?

The problem is that while it's easy to see that the verb in sentence 1. is
    mess + -ed + up
and in sentence 2. it is
    primary auxiliary + stem + -ing
no simple tree diagram will allow this separation of verbs by objects (sentence 1.) or verbs by subjects (sentence 2.).

It has been suggested that one could represent these sorts of embedded ideas with stacked block diagrams, or with tree diagrams in which the branching lines cross,
but that defeats the purpose because IC analysis depends on the possibility of substituting sequences with other sequences and discontinuous phenomena like these are not sequences at all.

problem 2: ambiguity

The problem here is knowing how to divide up the constituents and which word class to assign to each.  Try analysing Eating apples can be pleasing this way and you'll see what is meant.  What's the difference between the following?

[Two alternative tree diagrams for Eating apples]


Transformational-generative grammar to the rescue

Clearly, Chomsky's ideas have two parts so we'll take them individually.

transformational rules

Instead of relying on tree diagrams to represent how sentences were constructed from individual morphemes, Chomsky was concerned to discover the patterns by which one sentence could be transformed into another.  The classic example is the passive in English.
How is
    Mary allowed Peter to go home
transformed into
    Peter was allowed to go home by Mary?
Here's how (Chomsky, 2002:43):

If S1 is a grammatical sentence of the form
            NP1 — Aux — V — NP2,
then the corresponding string of the form
            NP2 — Aux + be + en — V — by + NP1
is also a grammatical sentence.

To explain:
S1 is the first or kernel sentence.
NP1 and NP2 are the noun phrases (Mary and Peter in our examples).
V is the verb (allow in our examples).
Aux stands for the tense marker, not necessarily an auxiliary verb (the past tense of allow and be in our examples).
en is the marker for the past participle.
So the rule for transforming the active, kernel sentence into the passive is:

  1. put NP2 first
  2. add the tense of the verb be
  3. add the past participle of the main verb
  4. add by
  5. insert NP1

Easy, and the last two steps are optional, of course.
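The five steps can be sketched in code.  This is an illustration only, not Chomsky's formalism: the toy lexicon, the function name and the restriction to past-tense, singular kernel sentences are all assumptions made for the example.

```python
# A toy sketch of the passive transformation:
# NP1 — Aux — V — NP2  becomes  NP2 — Aux + be + en — V — by + NP1.
# The tiny lexicon and the fixed past-tense 'was' are simplifications.

PAST_PARTICIPLES = {"eat": "eaten", "allow": "allowed"}

def passivise(np1, verb, np2, keep_agent=True):
    """Transform the kernel sentence NP1 + past(verb) + NP2 into the passive."""
    result = [np2, "was", PAST_PARTICIPLES[verb]]  # steps 1 to 3
    if keep_agent:                                  # steps 4 and 5 are optional
        result += ["by", np1]
    return " ".join(result)

print(passivise("the woman", "eat", "a pear"))
# a pear was eaten by the woman
print(passivise("the woman", "eat", "a pear", keep_agent=False))
# a pear was eaten
```

Dropping the last two, optional, steps simply means leaving out the by-phrase, as the second call shows.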


So what?

So quite a lot.
Instead of the cumbersome tree diagrams for every sentence, we now have a set of rules to work from to transform kernel sentences (i.e., the original from which we make the transformations) into others and we have a way of dealing with both the embedding problem and the ambiguity problem we met above.



For example, if we take a sentence such as
    The man who was in the bar spoke to me.
the structural linguistics analysis would either have lines crossing each other or result in the kind of block diagram we saw above, like this:

the man who was in the bar

However, we can look for the kernel sentences which make it up and then work out the transformation rules.  Like this:

  1. The kernel sentences are:
    1. The man spoke to me.
    2. The man was in the bar.
  2. The transformation rules are:
    1. Place the second sentence after the first NP in the first sentence.
      That gives us:
          The man the man was in the bar spoke to me
    2. Replace the second NP with who.
      That then gives us:
          The man who was in the bar spoke to me
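The two rules can be expressed as a short function.  It is only a sketch under strong simplifying assumptions, invented for this page: both kernel sentences must begin with the shared noun phrase, and flat strings stand in for real syntactic structure.

```python
def relativise(first, second, np, pronoun="who"):
    """Embed `second` inside `first` as a relative clause.
    Assumes both sentences begin with the shared noun phrase `np`."""
    rest_of_first = first[len(np):].strip()    # e.g. "spoke to me"
    rest_of_second = second[len(np):].strip()  # e.g. "was in the bar"
    # Rule 1 places `second` after the NP; rule 2 replaces the
    # repeated NP with the relative pronoun:
    return f"{np} {pronoun} {rest_of_second} {rest_of_first}"

print(relativise("The man spoke to me", "The man was in the bar", "The man"))
# The man who was in the bar spoke to me
```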

We can do the same thing with other forms of relative pronoun sentences, transforming, for example:

  1. I cut down the tree
  2. The tree was hanging over the gate

by using similar rules to generate:

  1. I cut down the tree the tree was hanging over the gate
    which involves placing the second sentence after the noun phrase in the first sentence
  2. I cut down the tree which / that was hanging over the gate
    which involves replacing the second noun phrase with which or that

Other relative pronoun sentences can be generated slightly differently so from:

  1. I cut down the tree
  2. The tree was hanging over the gate

we apply other rules:

  1. Move the first sentence to the front of the second and produce:
    I cut down the tree the tree was hanging over the gate
  2. Delete the second noun phrase to produce:
    I cut down the tree was hanging over the gate
  3. Delete the auxiliary verb and produce:
    I cut down the tree hanging over the gate

This is called the relative transformation, by the way.
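One common reduced form deletes the repeated noun phrase and the auxiliary, leaving a participle clause.  The sketch below is this page's own illustration, under the same simplifying assumptions as before: the second kernel sentence must begin with the shared noun phrase followed by the auxiliary was.

```python
def reduce_relative(first, second, np):
    """Attach `second` to `first` as a reduced (participle) clause.
    Assumes `second` begins with the shared noun phrase `np`
    followed by the auxiliary 'was'."""
    rest = second[len(np):].strip()   # "was hanging over the gate"
    rest = rest[len("was"):].strip()  # delete the auxiliary: "hanging over the gate"
    return f"{first} {rest}"

print(reduce_relative("I cut down the tree",
                      "the tree was hanging over the gate",
                      "the tree"))
# I cut down the tree hanging over the gate
```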



The often-cited example of an ambiguous statement which transformational-generative grammar can deal with is:
    The shooting of the hunters was terrible
The ambiguity, of course, is that we don't know whether the hunters were bad shots or whether they were shot.
Here's how TGG can unravel the problem:

  1. The key noun phrase is the shooting of the hunters
  2. This noun phrase can come from two possible kernel sentences:
    1. The hunters shot (something)
    2. The hunters were shot
  3. Once we know which of the kernel sentences produced the noun phrase we have disambiguated the sentences.

This is an example of the workings of deep structure, i.e., the structure of the sentence which derives from the nature of the kernel sentences and lies below the surface structure traditionally analysed in structural linguistics.  There are two possible deep structures here, identifiable from the two possible kernel sentences.

As another example, we can take our earlier sentence
    Eating apples can be pleasing.
Can you apply the transformational approach to disambiguating the two meanings?



The idea of a grammar being generative is the second strand of the theory.

Simply put, this means that the grammar must be capable of generating 'all and only' the grammatical sentences of the language.  This does not mean that it must do so, only that it must be capable of doing so.  For example, a generative grammar must be able to produce She likes apples but not Like apples she and so on.

There is a fundamental difference of approach here from that taken by structural linguists.

  1. Structural linguists were concerned to identify the sentences of the language and then analyse them.  In other words, they collected a corpus of data and then analysed the data working from the phonemes making up each morpheme upwards to make the tree diagram we are familiar with.
  2. A generative approach is not concerned with what has been observed but with what is possible.  A corpus of data (even a huge, modern computer-based one), however large, cannot contain all the sentences of a language and will inevitably miss some out.  This is because any language contains an infinite number of possible sentences.
    For example, I can say,
        The man who was in the room was reading
    I can then add another relative clause to make
        The man who was in the room which was on the second floor was reading
    and I can continue to add clauses to make, e.g.,
        The man who was in the room which was on the second floor which was part of the building which was in the High street was reading
    and so on ad infinitum.  There is, theoretically, no limit, although the sentence will, of course, become unmanageable and harder and harder to understand as it grows.  That's not the point; the point is that it is theoretically possible never to come to the end of the sentence.
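The point about infinitude can be made concrete: one finite rule, applied recursively, yields an unbounded set of grammatical sentences.  The generator below is an illustration invented for this page; the repeated clause is fixed purely for simplicity.

```python
import itertools

def sentences():
    """Yield ever longer grammatical sentences by applying the rule
    'add a relative clause to the noun phrase' one more time each step."""
    clause = "which was in the building"
    n = 0
    while True:
        yield "The man who was in the room" + (" " + clause) * n + " was reading"
        n += 1

for s in itertools.islice(sentences(), 3):
    print(s)
# The man who was in the room was reading
# The man who was in the room which was in the building was reading
# The man who was in the room which was in the building which was in the building was reading
```

The generator never terminates; a corpus, however large, can only ever sample such a set.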

In Syntactic Structures, Chomsky demonstrates the generative nature of this sort of grammar by considering the form of the negative in English.  This is not the place to repeat all the steps but the conclusion is (op. cit.:62):

The rules (37) and (40) now enable us to derive all and only the grammatical forms of sentence negation

As you may imagine, rules 37 and 40 are somewhat complex but that they can be used to produce all and only the grammatical forms is not in dispute.


competence and performance

The key here is that TGG is not concerned with analysing that which is actually said but with establishing the rules concerning what can be said.  For many, involved as they are in analysing what people actually say and write, this is a central weakness of Chomsky's position.

According to the theory, then, what happens is that speakers of a language work to an internalized set of rules from which they generate grammatically accurate language.  The proper concern of grammar studies, then, is to find out what these rules are and that cannot be done solely by looking at what is said but by considering what can be said.

There are, however, obvious problems with this simple distinction.


re-write rules

Re-write rules look similar to the structuralist tree diagrams we have seen above but they are different insofar as they are intended to generate grammatical sentences rather than simply analyse them.  Here's an example to generate (not simply analyse) the sentence
    The woman ate a pear.

Breaking this down, we can get:

  1. S → NP + VP i.e., the Sentence is re-written as Noun Phrase + Verb Phrase
  2. VP → V + NP i.e., a Verb Phrase consists of a Verb + a Noun Phrase
  3. NP → Det + N i.e., a Noun Phrase consists of a Determiner + a Noun
  4. V → ate i.e., the verb in this case is the past form of eat
  5. Det → the, a i.e., there are two distinct determiners (both articles)
  6. N → woman, pear i.e., there are two nouns

We can, of course, represent this sort of phrase structure analysis in the same kind of branching tree diagram we saw above.  So we get:

[Tree diagram: generating The woman ate a pear]

What is exemplified here, by the way, are PS-rules (phrase structure rules).
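The six PS-rules can be run as a tiny program.  The dictionary encoding and the function name are this page's invention, but the rules themselves are exactly those above; note that the program generates all and only the sixteen word strings the rules permit, including the unacceptable ones discussed below.

```python
import itertools

# The six re-write rules above; a symbol absent from RULES is a word.
RULES = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["Det", "N"]],
    "V":   [["ate"]],
    "Det": [["the"], ["a"]],
    "N":   [["woman"], ["pear"]],
}

def generate(symbol="S"):
    """Return every string of words the rules allow for `symbol`."""
    if symbol not in RULES:                   # a terminal: an actual word
        return [[symbol]]
    results = []
    for alternative in RULES[symbol]:
        parts = [generate(sym) for sym in alternative]
        for combo in itertools.product(*parts):
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in generate()]
print(len(sentences))                          # 16
print("the woman ate a pear" in sentences)     # True
print("the pear ate the woman" in sentences)   # True: over-generation
```

Two determiners and two nouns give four noun phrases, and NP × V × NP gives the sixteen sentences, acceptable and unacceptable alike.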

There are two things to note:

  1. Rules like these will generate a number of different sentences, e.g.,
        The woman ate the pear
        A woman ate the pear
        A woman ate a pear.

    The rules will also generate unacceptable sentences, however, such as
        The pear ate the woman
        A pear ate a woman
        The pear ate a woman.
    It will also generate the truly ungrammatical:
        A man stood the forest
        A person arrived a hotel
  2. We can combine re-write rules with transformational rules and that, as we saw above, can transform
        The woman ate a pear
    into
        A pear was eaten by the woman
    by moving the second noun phrase to the front of the sentence and making changes to the verb as above.
    In other words:
        active:  NP1 — Aux — V — NP2 (The woman ate a pear)
        passive: NP2 — Aux + be + en — V — by + NP1 (A pear was eaten by the woman)

However, getting around the problem of generating statements such as
     A pear ate a woman
    A person arrived a hotel
is not an easy matter.
This is done by stating upfront what kind of main verb is permitted with what kind of noun.  We forbid certain classes of noun as subjects (here, ones which are not animate) and certain classes of verb with objects (here, ones which are not transitive).  Then the restrictions only have to be stated once in either the phrase-structure or transformation rules which we apply.
So we get

[Tree diagram: The woman ate a pear, with the subject noun marked as animate and the verb as transitive]

Now this form of phrase-structure analysis cannot generate
    The pear ate the woman
    A person arrived a hotel
because The pear is not animate and arrive is not transitive.
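Stating the restrictions once can be sketched by adding features to a toy lexicon.  The feature names, the lexicon and the checking function are invented for this illustration; they are not part of any standard formalism.

```python
# Selectional restrictions as features on lexical entries: a noun may be
# marked animate, a verb transitive.  The check is stated once, instead
# of listing every forbidden sentence.

LEXICON = {
    "woman":   {"cat": "N", "animate": True},
    "person":  {"cat": "N", "animate": True},
    "pear":    {"cat": "N", "animate": False},
    "ate":     {"cat": "V", "transitive": True},
    "arrived": {"cat": "V", "transitive": False},
}

def licensed(subject, verb, obj=None):
    """True only if the subject is animate and, where there is an
    object, the verb is transitive."""
    if not LEXICON[subject]["animate"]:
        return False          # blocks: The pear ate the woman
    if obj is not None and not LEXICON[verb]["transitive"]:
        return False          # blocks: A person arrived a hotel
    return True

print(licensed("woman", "ate", "pear"))        # True
print(licensed("pear", "ate", "woman"))        # False
print(licensed("person", "arrived", "hotel"))  # False
print(licensed("person", "arrived"))           # True
```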

What's more, applying these restrictions to the kernel sentence means that our transformational rules cannot produce
    The woman was eaten by the pear
but will generate
    The pear was eaten by the woman.

The rules will also produce odd-sounding sentences such as
    The bear ate the house
although we could get around that by specifying the kind of noun which slots into the NP2 position.  Recall, however, that we are concerned with possible sentences, not language that is actually produced.
Such an approach, combining the phrase-structure rules with transformational rules, can generate any number of further sentences, and each and every sentence produced in this way will be grammatical.


Part 2: The Language Acquisition Device (LAD)

We have seen above that the rules governing the generation of grammatically correct sentences are subtle and at times very complex.  The question naturally follows:
    How do we learn the rules?

A behaviourist view of learning would state that we simply hear and repeat correct utterances in our first language and attend to the feedback (positive or negative) that we get from, e.g., parents and other adults.
We adjust what we say according to the type of feedback we get.

The fundamental problems Chomsky (and others) see with this are:

  1. There aren't enough data:
    Children acquire language very quickly and are making complex, grammatically correct sentences at a very early age.  By this stage in their development, they simply haven't been exposed to adequate information about the language to be able to do so.
    This is a debatable point because, in fact, normally brought-up children are exposed to enormous amounts of data, certainly enough to base a linguistic corpus on, before they are five years old.
  2. The data are not always grammatical:
    Studies show that carer-speak is focused on meaning not structure and that very young children in particular are exposed to a lot of language which is ungrammatical.  If that were the sole source of their production, much of it would remain at the 'Get choo-choo' level of speech.
    This, too, has been challenged and, for example, one study found that of 1500 utterances analysed, only one was ungrammatical (or rather a disfluency, in the jargon).  The study found that the speech of carers directed towards children was 'unswervingly well formed'
    (Newport, Gleitman and Gleitman, 1977:121, cited in Moerk, 2000:96).
  3. Reinforcement is unreliable and inconsistent:
    • adults do not consistently respond positively to grammatically correct utterances.  They respond to meaning (and sometimes cuteness) more often than not, however malformed the child's language output is.
    • adults do not always provide loud and enthusiastic reinforcement (as the behaviourist theory would require). They often speak quietly, or even not at all, in response to whatever the child produces.

Chomsky states it this way:

Language is not a habit structure.  Ordinary linguistic behaviour characteristically involves innovation, formation of new sentences and patterns in accordance with rules of great abstractness and intricacy.
(Chomsky, 2003:349)

There's an allied problem that, although many animals communicate with each other (sometimes sending quite sophisticated signals), only humans have developed such a complex and subtle system of communication: language.
The conclusion is that something else is going on.

What is going on according to Chomsky is that the child is using a genetically inherited Language Acquisition Device which is hard-wired in the structure of our brains.  We are, therefore, inherently prepared to analyse the structure of whatever first language(s) we are exposed to as infants.

What this means is that, before we even leave the womb, our brains are prepared for the kinds of phrase structures and transformational rules we will need to process the language we hear.
Some have compared this to a kind of internal switchboard with which we can categorise input making guesses and assumptions such as
    "Aha! This language uses a Subject - Object - Verb ordering but seems to place adjectives after nouns"
and so on.
How this works lies in the field of first-language acquisition theories, to which there is a guide on this site.

An allied concept is known as the Critical Period Hypothesis, which is the source of a good deal of debate concerning both its existence and, if it exists, its definition.


the evolution of the ability to process language

There is, unsurprisingly, some debate among evolutionary biologists concerning how such a mechanism may have evolved.  Recent genetic research points to a set of genes including one called FOXP2.
It would be a gross oversimplification to dub this 'the language gene' as much else, including the interactions between this gene and a range of others and with the environment, is involved in the ability to process language.  However, the gene appears to be central to our ability to process and produce language, and people in whom it is mutated or inactive have severe difficulties with both.  Incidentally, this gene has been identified in the DNA extracted from Neanderthal bones (and versions of it also occur in echolocating bats and songbirds).


The LAD can be visualised as operating in combination with phrase-structure and transformational rules.



Part 3: Universal Grammar (UG)


An allied theory is that there is, therefore, something called Universal Grammar.  This is supposed to be a set of categories and rules common to all languages, no matter what their individual grammatical structures are like and no matter what sorts of languages they are (isolating, agglutinative, synthetic and so on).
The basis for this reasoning is that without such a UG, children would have nothing on which to use the LAD.

This has obvious implications for teaching.

Not to take advantage of learners' inherent knowledge would seem, therefore, to be somewhat perverse, wouldn't it?

As teachers, however, we will do well to bear in mind that, in Hymes' words (1971:278):

There are rules of use without which the rules of grammar would be useless.

So, before we get too enthusiastic about using Chomskyan theories to inform our teaching, we should remember that he was addressing the ways in which:

  1. the language is structured at an abstract level (i.e., at the level of competence not performance).
  2. our first (not subsequent) languages are acquired.

Chomsky is not fundamentally concerned with second language teaching and learning.

Related guides
Krashen and the Natural Approach for the guide to this set of hypotheses
how learning happens for a general and simple overview
first-language acquisition for a guide to some current theories and how they may be relevant to teaching languages
second-language acquisition for a guide to some current theories
input for a related guide concerning what we do with the language we hear and read
types of languages for a guide relevant to Universal Grammar
communicative language teaching for more on a non-structural view of teaching and learning

Chomsky, N, 2002, Syntactic Structures (2nd Edition), New York: Mouton de Gruyter
Chomsky, N and Otero, CP, 2003, Chomsky on Democracy & Education, Psychology Press
Hymes, D, 1971, On communicative competence, in Pride, J & Holmes J (eds.), Sociolinguistics, London: Penguin
Lyons, J, 1970, Chomsky, New York: Viking Press
Moerk, EL, 2000, The Guided Acquisition of First Language Skills, Stamford: Greenwood Publishing Group
Palmer, F, 1971, Grammar, Harmondsworth: Penguin Books
(For more on FOXP2 and its role in language development, try the eminently accessible ScienceDirect article at http://www.sciencedirect.com/science/article/pii/S0002929707629024)