Holt.Blue

What is a Readability Index?

A readability index is an estimate of how difficult a text is to read. The estimate is made by measuring a text's complexity: measurable attributes such as word lengths, sentence lengths, and syllable counts give us ways to quantify it. These measurements are then compared with how well readers actually comprehend the texts, and from this data a formula is derived that predicts a text's reading difficulty from its complexity.
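To make "measurable attributes" concrete, here is a minimal sketch that counts a few surface features of a text. It assumes a deliberately simple notion of sentences (split on runs of ".", "!" or "?") and words (runs of letters, allowing internal apostrophes); the function name is hypothetical and this is not how this website tokenizes text.

```python
import re

def text_statistics(text: str) -> dict:
    """Count simple surface features of an English text (rough sketch)."""
    # Sentences: pieces between runs of terminal punctuation, ignoring empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    letters = sum(len(w.replace("'", "")) for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": letters,
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "avg_letters_per_word": letters / len(words) if words else 0.0,
    }

stats = text_statistics("I do not like them. I do not like green eggs and ham!")
print(stats)
```

For this two-sentence sample the sketch finds 13 words and 39 letters, i.e. 6.5 words per sentence and 3.0 letters per word. Real analyzers handle abbreviations, numbers, and quotations more carefully.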

Every index does this a little bit differently and emphasizes particular aspects of text complexity. Some emphasize syllable counts while others look only at word and sentence lengths. This is why there are more than a few readability indices available to us. However, the core idea of each is the same; readability is essentially a measure of text complexity.
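As an illustration of how indices weigh different attributes, here are the commonly published formulas for two of the indices used on this site: Flesch-Kincaid relies on syllables per word, while the Automated Readability Index uses characters per word instead. This is a sketch using the standard coefficients; it assumes the counts are already available and does not claim to reproduce this website's exact implementation.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: sentence length plus syllables per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """Automated Readability Index: sentence length plus characters per word."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# Hypothetical sample: 100 words, 5 sentences, 150 syllables, 450 characters.
fk = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
ari = automated_readability_index(characters=450, words=100, sentences=5)
print(round(fk, 2), round(ari, 2))
```

Both formulas agree that longer sentences raise the grade level; they differ only in how they measure word difficulty.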

We will discuss how to interpret and use the indices available on this website and what we must assume when using them.

Interpreting the Numbers

Each readability index (or score) on this website gives an estimated grade level (United States) required to be able to read and comprehend a text without difficulty. For those outside of the United States, this grade level can be considered the number of years of formal education (conducted in English) needed in order to read and understand a text. Thus, the lower the index, the easier the text is to read, and conversely, the higher the index, the more difficult the text is to read.

As 12th grade marks the end of high school in the U.S., any value greater than 12 can be considered a thoroughly adult level. Moreover, anyone writing for a general audience in the U.S. should keep in mind that the average adult reader in the U.S. is commonly estimated to read at about an 8th grade level.

Technical and professional literature found in scholarly journals generally scores above 12. Accordingly, texts which are meant to be accessible to a wide adult audience should fall somewhere between 8 and 12. Advertising copy is written anywhere from a 5th to 8th grade level depending on the market.

We also note that good writing isn't necessarily more complex. As many readers will see by using this site, many classic 20th century novels by authors such as Hemingway, Steinbeck, and others score around 8th or 9th grade level. Lower text complexity may even be an indicator of clear and concise writing while higher scores may indicate bloated, overwrought prose.

We underscore that a readability score is a lower bound on the level required to read the text. Thus, if a text is rated at an 8th grade level, it does not mean that the text is necessarily intended for 8th graders; it simply means that roughly an 8th grade education is the minimum needed to comprehend the text easily.

Note: some indices can be negative. A negative value may be interpreted either as a text written at a preschool level (some preschoolers can read before starting school) or simply as a text written at the most basic level.
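To see how a negative value can arise, consider a worked example with the standard Flesch-Kincaid grade-level formula and a hypothetical text averaging 5 words per sentence and exactly 1 syllable per word. The constant term is large enough that very short, monosyllabic sentences push the result below zero:

```latex
\text{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
            = 0.39(5) + 11.8(1) - 15.59
            = 1.95 + 11.8 - 15.59
            = -1.84
```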

The table below summarizes the above while providing some additional examples:

Readability Index	Level	Examples
3 and below	Emergent and Early Readers	Picture books and early reader books.
3-5	Children's	Chapter books.
5-8	Young Adult	Advertising copy, young adult literature, some news articles.
8-12	General Adult	Novels, news articles, blog posts, political speeches.
12-16	Undergraduate	College textbooks.
16 and above	Graduate, Post-Graduate, Professional	Scholarly journal and technical articles.
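The table can be read as a simple lookup from score to band. A minimal sketch follows; the function name is hypothetical, and because the table's ranges share their edges, it assumes that a score falling exactly on a boundary (3, 5, 8, 12, or 16) belongs to the lower band.

```python
def readability_band(score: float) -> str:
    """Map a grade-level score to a band from the table (boundary scores go to the lower band)."""
    if score <= 3:
        return "Emergent and Early Readers"  # includes negative scores
    if score <= 5:
        return "Children's"
    if score <= 8:
        return "Young Adult"
    if score <= 12:
        return "General Adult"
    if score <= 16:
        return "Undergraduate"
    return "Graduate, Post-Graduate, Professional"

print(readability_band(1.01))
print(readability_band(17.58))
```

Treat the output as a rough label rather than a verdict, since the bands themselves are approximate.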

An Example

The best way to illustrate how to interpret the readability indices on this website is with an example that compares two texts. We shall compare "Green Eggs and Ham" with Federalist Paper No. 10. But before we do, ask yourself: which of these well-known writings would you estimate is more difficult? Good. Now let's look at the results:

The scores for "Green Eggs and Ham":

Index	Value
Gunning fog	3.24
Flesch-Kincaid	-1.03
SMOG	5.29
Coleman-Liau	-0.4
Automated	-2.04

Average Grade Level:	1.01
Median Grade Level:	-0.4

The scores for Federalist Paper No. 10:

Index	Value
Gunning fog	20.95
Flesch-Kincaid	16.96
SMOG	17.31
Coleman-Liau	13.05
Automated	19.64

Average Grade Level:	17.58
Median Grade Level:	17.31
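The summary rows are the arithmetic mean and the median of the five index values. A short check using Python's standard `statistics` module reproduces them from the tables above (the dictionary names are our own):

```python
from statistics import mean, median

# Index values copied from the two tables above.
green_eggs = {"Gunning fog": 3.24, "Flesch-Kincaid": -1.03, "SMOG": 5.29,
              "Coleman-Liau": -0.4, "Automated": -2.04}
federalist_10 = {"Gunning fog": 20.95, "Flesch-Kincaid": 16.96, "SMOG": 17.31,
                 "Coleman-Liau": 13.05, "Automated": 19.64}

for title, scores in [("Green Eggs and Ham", green_eggs),
                      ("Federalist Paper No. 10", federalist_10)]:
    values = list(scores.values())
    print(f"{title}: average {mean(values):.2f}, median {median(values):.2f}")
```

The median is less sensitive to a single outlying index, which is why the two summaries can differ noticeably for the same text.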

We see that the scores for the first text are distinctly lower than the second. Is this generally what you expected? We believe the answer is yes.

Now one might quibble over the fact that there's quite a range of values for a text like "Green Eggs and Ham" and conclude that these numbers don't measure anything. Perhaps. But the reader is sure to agree that it is impossible to pinpoint with military accuracy how difficult a text is to read. Indeed, every person is different and will struggle with different aspects of reading. Moreover, if you ask parents and teachers to estimate the level of "Green Eggs and Ham," their answers will vary too.

In similar fashion, the overall results given by this website offer a range of grade levels, with each index giving its own "opinion." In the case of "Green Eggs and Ham," the indices point to a grade level anywhere from preschool to 5th grade, with the "middle" lying somewhere around kindergarten to 1st grade. Interpreted as a spectrum, this is certainly reasonable: some of us have known precocious preschoolers who could read more difficult texts than Dr. Seuss, and we all know that there are 5th graders (and sadly even some adults) for whom "Green Eggs and Ham" would be a perfect fit for their reading level.

On the opposite end of the spectrum we have Federalist Paper No. 10. Again, we are given a range of values, indicating anything from early undergraduate to graduate level, with the "middle" lying around grade 17 (first-year graduate). The same reasoning applies: a well-prepared college freshman might have no trouble reading and understanding this document, yet there are scholarly journals devoted to the sole aim of analyzing and interpreting the Federalist Papers.

Assumptions and Limitations

First and foremost, the readability indices used by this website were developed with English in mind. Thus, any readability results for a non-English text will not be valid. However, Holt.Blue does support characters from several non-English alphabets, so character counts, word frequencies, and so forth should work well.

Secondly, these indices assume that a text is written in grammatically correct, properly punctuated English. If this requirement is not met and you feed a nonsense text into the program, the famous computing adage applies: "garbage in, garbage out." To put it plainly, if your text is a random collection of symbols, gibberish, or otherwise, the readability results will not be valid.

Additionally, it has been noted that intentionally "writing to the formula" (artificially changing word and sentence lengths or other parameters readability formulas use) to make a text "simpler" can in practice have the opposite effect. So if you're writing copy, be sure that it looks and sounds good first before feeding it into a readability analyzer.

It is also necessary to point out that some readability scores require a fairly substantial sample of text in order to be considered a valid measure of difficulty. In particular, the SMOG index requires at least 30 sentences to be considered valid (that is, it actually measures what it claims to measure).
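For reference, the commonly published SMOG formula (McLaughlin's) counts polysyllabic words (three or more syllables) and scales that count to a 30-sentence sample. The sketch below assumes both counts are already known; refusing samples shorter than 30 sentences is our own design choice mirroring the requirement above, not necessarily what this website does.

```python
import math

def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the number of 3+ syllable words and the number of sentences."""
    if sentences < 30:
        # The index is only considered valid on samples of at least 30 sentences.
        raise ValueError("SMOG needs a sample of at least 30 sentences")
    # Scale the polysyllable count to a 30-sentence sample, then apply the formula.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

print(round(smog_grade(polysyllables=36, sentences=30), 4))
```

With 36 polysyllabic words in 30 sentences, the square root is 6 and the grade is 1.0430 × 6 + 3.1291 = 9.3871.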

Finally, we must disclose that in some cases, particularly the Gunning fog, Flesch-Kincaid, and SMOG indices, we can only estimate the value of the index (this is true of most online readability analyzers). These indices depend on syllable counts, and counting the syllables of an English word from its spelling is not easy to do perfectly with a computer. Therefore, there may be some discrepancy between the value given here and the true value of the index as it is defined on paper. Nevertheless, our implementation gives a good estimate of syllable counts and, consequently, values quite close to the true values of these indices, and hence a good estimate of a text's level of difficulty.
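To illustrate why syllable counts can only be estimated, here is a simple vowel-group heuristic of the kind often used for this purpose. It is not the algorithm this website uses (that is not described here); the function name and the silent-"e" rule are our own assumptions.

```python
import re

def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, then adjust for a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Each run of consecutive vowels (treating 'y' as a vowel) counts as one syllable.
    count = len(re.findall(r"[aeiouy]+", word))
    # A final 'e' is usually silent ("make"), except in endings like "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

for w in ["cat", "make", "table", "being"]:
    print(w, estimate_syllables(w))
```

The heuristic handles "make" and "table" correctly but counts "being" as one syllable instead of two, because "ei" is read as a single vowel group. Errors of this kind are the source of the small discrepancies described above.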

Other Indices Used On This Site

This website also includes the Fry and Raygor readability indices. These indices determine a text's grade level graphically.

All of the formulas mentioned above are measures of text complexity. The reader may also be interested in a Cloze Test, which measures readability in terms of how comprehensible a text is to its target audience.
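A cloze test is typically built by deleting words at a fixed interval (every fifth word is a common convention) and asking readers from the target audience to fill in the blanks; the share of blanks filled in correctly indicates how comprehensible the text is for them. A minimal sketch of the passage-generation step follows; the function name `make_cloze` is hypothetical, and punctuation attached to a deleted word is removed along with it, which is a simplification.

```python
def make_cloze(text: str, interval: int = 5, blank: str = "_____"):
    """Replace every `interval`-th word with a blank; return the passage and answer key."""
    words = text.split()
    answers = []
    # Indices interval-1, 2*interval-1, ... are the 5th, 10th, ... words for interval=5.
    for i in range(interval - 1, len(words), interval):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

passage, key = make_cloze("one two three four five six seven eight nine ten")
print(passage)
print(key)
```

Scoring then compares each reader's answers against the key.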

Dynamic Readability

This website also offers its users dynamic readability tools. That is, in addition to a raw readability score for your entire text, sufficiently long texts can be tracked from beginning to end, so you can see how readability changes over the course of the text. These tools let you determine not only how readable a text is, but also how consistently it is written.
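One way to track readability through a text is to score overlapping windows of consecutive sentences. The sketch below is an assumption about the general idea, not this website's implementation: the window and step sizes are hypothetical, and it uses the Automated Readability Index only because that index needs no syllable counting.

```python
import re

def ari(characters: int, words: int, sentences: int) -> float:
    """Automated Readability Index (standard published coefficients)."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

def rolling_ari(text: str, window: int = 10, step: int = 5) -> list[float]:
    """Score overlapping windows of `window` sentences, advancing `step` sentences at a time."""
    # Split after terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = []
    # If the text is shorter than one window, score the whole text once.
    for start in range(0, max(len(sentences) - window, 0) + 1, step):
        chunk = sentences[start:start + window]
        words = re.findall(r"[A-Za-z0-9']+", " ".join(chunk))
        if not words:
            continue
        characters = sum(len(w) for w in words)
        scores.append(ari(characters, len(words), len(chunk)))
    return scores

simple = "I am. " * 10
dense = "Notwithstanding considerable ambiguity, comprehensive documentation facilitates understanding. " * 10
scores = rolling_ari(simple + dense)
print([round(s, 2) for s in scores])
```

For this contrived text the scores rise steadily from window to window, showing the shift from very simple to very dense prose. A consistently written text would instead produce a flatter sequence.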

Conclusions

The readability indices used by this website essentially offer a range of "honest opinions" which enable you, the discerning reader, to draw your own conclusions about the reading level of a text. Although the numbers aren't a perfect description of the difficulty of a text, they do at least give us an objective sense of a text's difficulty.


Links and References

DuBay, W. H. 2006. Smart Language: Readers, Readability, and the Grading of Text. Costa Mesa: Impact Information.

Article on this Website: The Fry Readability Formula.

Article on this Website: The Raygor Readability Estimate.

Wikipedia Article on Readability surveying various indices.