Lexical Density and Spoken Word
Spoken language presents its own set of patterns and considerations when it comes to text analysis.
One well-documented phenomenon is that the lexical density of spoken language tends to be lower than that of written language (Johansson 2008; Ure 1971).
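Lexical density is usually computed as the share of lexical (content) words, such as nouns, main verbs, adjectives and most adverbs, among all the words in a text. The post does not say which tool produced its figures, so the following is only a minimal Python sketch under a simplifying assumption: any word not on a small, hand-picked list of function words (the hypothetical `FUNCTION_WORDS` set) counts as lexical. A part-of-speech tagger would give more faithful results.

```python
import re

# Hypothetical, deliberately small list of English function words.
# A real analysis would use a fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "so", "of", "in", "on",
    "at", "to", "for", "with", "from", "by", "as", "is", "are", "was",
    "were", "be", "been", "am", "do", "does", "did", "have", "has", "had",
    "i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us",
    "them", "my", "your", "his", "its", "our", "their", "this", "that",
    "these", "those", "not", "no", "can", "will", "would", "could", "should",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens, keeping inner apostrophes (don't)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def lexical_density(text: str) -> float:
    """Percentage of tokens NOT in FUNCTION_WORDS, i.e. tokens treated as lexical words."""
    words = tokenize(text)
    if not words:
        return 0.0
    lexical = [w for w in words if w not in FUNCTION_WORDS]
    return 100.0 * len(lexical) / len(words)

print(f"{lexical_density('The cat sat on the mat.'):.1f}%")  # 50.0%
```

Here "cat", "sat" and "mat" count as lexical while "the", "on" and "the" do not, giving 3 of 6 words.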
We analyzed six interview transcripts to test this observation: three interviews with celebrities and three with political figures.
In addition, we looked at the complexity (readability) of each interview transcript.
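The post reports readability as a grade level but does not name the formula behind it. One widely used grade-level measure is the Flesch-Kincaid grade level; the sketch below implements it with a crude vowel-group syllable counter and punctuation-based sentence splitting, so its scores are approximate and may not match the tool used for the figures in this post.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (treating y as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent "e" (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Approximate sentences as non-empty runs of text ending in '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

sample = "Thanks for having me. It's great to be here, and I really mean that."
print(round(flesch_kincaid_grade(sample), 1))
```

Because transcribers decide where sentences end, punctuation choices in a transcript can shift this score noticeably.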
We also looked at the most common words in each interview.
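Finding the most common words amounts to a frequency count over word tokens. A minimal sketch using Python's `collections.Counter` follows; `most_common_words` and its `exclude` parameter are illustrative names, not part of any particular tool. Passing a set of function words as `exclude` keeps words like "the" and "and" from dominating the list.

```python
import re
from collections import Counter

def most_common_words(text, n=10, exclude=frozenset()):
    """Return the n most frequent word tokens as (word, count) pairs,
    skipping any word in `exclude`. Ties keep first-seen order."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in exclude)
    return counts.most_common(n)

sample = "I just, like, really like it. It was just fun."
print(most_common_words(sample, n=3))  # [('just', 2), ('like', 2), ('it', 2)]
```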
Three Celebrities. Three Politicians.
Interview 1: Celebrity. Lexical Density: 45.36%. Grade Level: 9.
Interview 2: Celebrity. Lexical Density: 43.7%. Grade Level: 5.
Interview 3: Celebrity. Lexical Density: 44.23%. Grade Level: 6.
Interview 4: Political Figure. Lexical Density: 45.3%. Grade Level: 6.
Interview 5: Political Figure. Lexical Density: 47.5%. Grade Level: 10.
Interview 6: Political Figure. Lexical Density: 45.14%. Grade Level: 9.
Methodology and Analysis
For every transcript we analyzed, we removed all text that was not actually spoken, such as the names of the speakers and notations of non-verbal events like laughter and applause.
We did not remove the interviewer's speech, since it is also natural speech, so the interview questions are included in each transcript.
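The cleaning step described above can be approximated with regular expressions. This sketch assumes a hypothetical transcript layout in which every turn starts with a speaker label followed by a colon (e.g. "HOST:") and non-verbal events appear in square brackets or parentheses; real transcripts vary, so the patterns would need adjusting. As in the methodology, the interviewer's words are kept.

```python
import re

# Hypothetical transcript conventions (real transcripts vary):
#   - each turn starts at the beginning of a line with a label like "HOST:" or "Jane Doe:"
#   - non-verbal events are bracketed, e.g. "[applause]" or "(laughs)"
SPEAKER_LABEL = re.compile(r"^\s*[A-Z][A-Za-z .'-]{0,40}:\s*", re.MULTILINE)
# Note: this also drops any spoken words a transcriber put in parentheses.
ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def clean_transcript(raw: str) -> str:
    """Remove speaker labels and bracketed annotations, keeping the spoken
    words of both interviewer and interviewee."""
    text = SPEAKER_LABEL.sub("", raw)
    text = ANNOTATION.sub("", text)
    # Collapse the extra whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

raw = """HOST: Welcome back to the show. [applause]
Jane Doe: Thanks for having me. (laughs) It's great to be here."""
print(clean_transcript(raw))
# Welcome back to the show. Thanks for having me. It's great to be here.
```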
Our observations are consistent with the finding of Johansson (2008) and Ure (1971) that the lexical density of spoken language tends to be lower than that of written text.
The average lexical density of the sample was 45.21%, which is lower than the values observed for expository texts, fiction, and newspaper articles.
Perhaps not surprisingly, both the complexity (readability) and the lexical density of the language were lower in our celebrity interviews.
With only three transcripts per group, however, the results are far from conclusive.
The results are summarized below.
| Group | Average Lexical Density | Average Complexity (Readability) |
|---|---|---|
| Celebrities | 44.43% | 6th Grade |
| Political Figures | 45.98% | 8th Grade |
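As a quick arithmetic check, the group averages can be recomputed from the six per-interview values listed earlier. The sketch uses Python's `statistics.mean`; the overall mean of all six lexical densities comes out at about 45.2%, and the mean grade levels are about 6.7 for the celebrity interviews and 8.3 for the political figures.

```python
from statistics import mean

# Per-interview values as listed in the post: (lexical density in %, grade level)
celebrities = [(45.36, 9), (43.7, 5), (44.23, 6)]
politicians = [(45.3, 6), (47.5, 10), (45.14, 9)]

def group_means(rows):
    """Return (mean lexical density, mean grade level) for a list of interviews."""
    return mean(d for d, _ in rows), mean(g for _, g in rows)

celeb_density, celeb_grade = group_means(celebrities)
pol_density, pol_grade = group_means(politicians)
overall_density, _ = group_means(celebrities + politicians)

print(f"Celebrities:       {celeb_density:.2f}%  grade {celeb_grade:.1f}")
print(f"Political figures: {pol_density:.2f}%  grade {pol_grade:.1f}")
print(f"All six:           {overall_density:.2f}%")
```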
Concluding Remarks
We note that our choice of interview transcripts was mostly limited to what we could find in an online search, so we cannot claim that our sample is representative of modern speech.
In fairness, the differences observed between celebrities and political figures are likely not a reflection of the interviewees' intelligence, but rather of the complexity of the subject matter being discussed and the amount of exposition required to adequately address either difficult or not-so-difficult questions.
We also noticed that the celebrity interviews contained many instances of the words "like" and "just." This observation is perhaps again not very surprising, but it is interesting nonetheless as a reflection of modern speech and the constant evolution of language.
What results do you get when you try something similar?
Links and References
Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers 53, 61-79.
Ure, J. (1971). Lexical density and register differentiation. In G. Perren and J. L. M. Trim (eds.), Applications of Linguistics (pp. 443-452). London: Cambridge University Press.
© Holt.Blue