A hypothesis about speech
It is commonly assumed that speech has roughly the same kind of structure as a printed text. Of course, everyone knows about spelling irregularities like 'cough' having five letters where it has only three sounds (phonemes). But when we listen to speech, we have the impression of a string of phonemes, grouped together into words.
If that impression were true, we would expect that if we turned speech backwards, it should behave in the same kind of way as a printed text turned backwards. Consider the following printed text:
The quick brown fox jumped over the lazy dog.
If we turn that text backwards, it looks like this:
.god yzal eht revo depmuj xof nworb kciuq ehT
Notice that the individual letters are still recognisable and, while the words look rather odd, their boundaries and structure remain clearly visible. With a bit of effort, we can even decipher the meaning of the sentence.
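The text reversal above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name reverse_text is an assumption introduced here, not part of the original demonstration. It also checks that the spaces, and so the word boundaries, survive the reversal.

```python
def reverse_text(text: str) -> str:
    """Return the characters of text in the opposite order."""
    return text[::-1]


sentence = "The quick brown fox jumped over the lazy dog."
backwards = reverse_text(sentence)
print(backwards)

# Spaces survive reversal, so the number of words is unchanged.
print(len(sentence.split()), len(backwards.split()))

# Reversing a second time restores the original sentence.
print(reverse_text(backwards) == sentence)
```

Each letter is an intact unit that is simply moved to a new position, which is why the reversed text remains decipherable.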
Testing the hypothesis
Consider now what happens to speech when it is turned backwards. Click the arrow to hear a short English phrase which has been reversed. The whole structure of the words and phonemes is destroyed. It is barely recognisable as English, and you are unlikely to be able to work out what the original phrase was.
If you would like to hear the original, right-way-round phrase, click the second arrow (below). You may need to click back and forth a few times to really believe they are exactly the same recording, just played in different directions.
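For readers who want to try this on a recording of their own, one way to reverse a sound file is sketched below using Python's standard wave module. It assumes an uncompressed PCM WAV file; the file names and the function names reverse_frames and reverse_wav are hypothetical, and this is not necessarily how the recordings on this page were made. The sketch reverses the order of whole sample frames while leaving the bytes inside each frame untouched, so the result is the same recording played in the opposite direction.

```python
import wave


def reverse_frames(raw: bytes, frame_size: int) -> bytes:
    """Reverse the order of whole frames, keeping each frame's bytes in order."""
    if frame_size <= 0 or len(raw) % frame_size != 0:
        raise ValueError("raw must contain a whole number of frames")
    frames = [raw[i:i + frame_size] for i in range(0, len(raw), frame_size)]
    return b"".join(reversed(frames))


def reverse_wav(src_path: str, dst_path: str) -> None:
    """Write a copy of the PCM WAV file at src_path with its frames reversed."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        raw = src.readframes(src.getnframes())
    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(raw, frame_size))


# Hypothetical usage, assuming a file called phrase.wav exists:
# reverse_wav("phrase.wav", "phrase_reversed.wav")
```

Reversing the reversed file gives back the original samples, which is one way to confirm that the two versions really are the same recording.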
What does it mean?
Even the clearest of speech is more like very messy handwriting, with no gaps between words or sentences - perhaps something like this:

To see the similarity, consider what happens when we reverse such handwriting. It becomes indecipherable, much as speech does:

Why does it matter?
Keeping in mind that speech is more like very messy handwriting with no gaps (and most speech is in fact far 'messier' than the handwriting shown here) gives a more realistic sense of what is involved in perceiving and understanding speech. It is certainly not a simple matter of 'picking up' the acoustic structure of each phoneme in turn and stringing the phonemes together to form meaningful words.