How reliable are GCSE and A level grades?

Over the last several years, Ofqual, the regulator of school examinations in England and Wales, has been conducting an extensive research programme on the reliability of GCSE and A Level grades. Some key results were published in November 2016 in a report entitled Marking Consistency Metrics; further results were presented at a symposium held in June 2017, including this chart from a presentation entitled Quality of marking - confidence and consistency:

This chart merits attention: for each of the subjects shown, the chart answers the question “If a GCSE or A level script is re-marked by a senior examiner, for example, on appeal, what is the probability that the originally-awarded grade will be confirmed?” This is an important question, for if a grade is not confirmed on appeal but changed, then the originally-awarded grade must have been wrong. As is clearly shown, over each of the last four years, the probability that the originally-awarded grade is confirmed is about 85% for physics, about 70% for English Language, and about 60% for history. Let me note that these percentages are averages over all scripts across the entire marking range: for a script marked close to a grade boundary, Figures 12 and 13 of Marking Consistency Metrics imply that the probability that the originally-awarded grade is confirmed is much lower - perhaps 50% or less.

When we consider the average percentages the other way around, we can appreciate their true significance: in each of the last four years, for every 100 candidates that took physics, 15 candidates were awarded the wrong grade; for English Language, 30; and for history, 40. To make that real: about 500,000 students take GCSE English Language annually, and so this data implies that, in each of 2013, 2014, 2015 and 2016, some 150,000 candidates have been awarded the wrong grade. 150,000. I’ll say that again. Over each of the last four years, some 150,000 candidates taking GCSE English Language have been awarded the wrong grade.
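The arithmetic behind these figures can be checked with a minimal Python sketch. It uses only the approximate confirmation percentages and the cohort size of about 500,000 quoted above; the function and variable names are illustrative, and treating every unconfirmed grade as a wrong grade is this post's reading of the chart rather than a calculation published by Ofqual.

```python
# Approximate percentages, as quoted in the text, of originally-awarded
# grades that a senior examiner would confirm on re-marking.
CONFIRMED_PERCENT = {
    "physics": 85,
    "English Language": 70,
    "history": 60,
}

# Approximate annual GCSE English Language entry, as quoted in the text.
ENGLISH_LANGUAGE_COHORT = 500_000


def wrong_per_100(confirmed_percent):
    """Candidates in every 100 whose grade would not be confirmed."""
    return 100 - confirmed_percent


def wrong_in_cohort(confirmed_percent, cohort_size):
    """Estimated number of candidates in a cohort awarded the wrong grade."""
    return cohort_size * wrong_per_100(confirmed_percent) // 100


for subject, percent in CONFIRMED_PERCENT.items():
    print(f"{subject}: about {wrong_per_100(percent)} in every 100")

english_wrong = wrong_in_cohort(
    CONFIRMED_PERCENT["English Language"], ENGLISH_LANGUAGE_COHORT
)
print(f"GCSE English Language: about {english_wrong:,} candidates per year")
```

Because the inputs are rounded averages over the whole mark range, the outputs are order-of-magnitude estimates, not exact counts.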

The wrong grade is 'wrong' both ways, so, of the 30 candidates in every 100 awarded the wrong grade in GCSE English Language, 15 candidates are awarded a grade higher than they merit ('lucky' candidates), whilst 15 candidates are awarded a lower grade ('disadvantaged' candidates) - and possibly, as a result, are unfairly denied important life chances. You might think that ‘disadvantaged’ candidates would all appeal, and that the errors would be corrected. Not so. No candidates know that an error has occurred, and many just shrug their shoulders, thinking “Oh dear, I didn’t do as well as I had hoped”. The error remains uncorrected, the injustice remains, and perhaps that life chance has gone for ever.

This is bad.

But even worse is what has quite likely happened in summer 2017, when three GCSE subjects - mathematics, English Language and English Literature - have been graded on a new scale. Hitherto, the top grade has been A*, followed by A, B, C, D, ..., with the C/D grade boundary being particularly important. From summer 2017, the new grades are numeric: the top grade is designated 9, and then 8, 7, ..., such that the 'new' 4/3 boundary is equivalent to the 'old' C/D boundary. As a consequence, the 'space' previously occupied by the four grades A*, A, B and C is now occupied by the six grades 9, 8, 7, 6, 5, and 4, and, on average, grade widths are narrower by a factor of 4/6.

Here is a quotation from page 21 of the November 2016 report Marking Consistency Metrics:

"...the wider the grade boundary locations, the greater the probability of candidates receiving the definitive grade..."

What this does not say, but what it implies, is that the narrower the grade widths, the greater the probability that candidates will receive the wrong grade.

So here's the really bad news. For GCSE English Language, under the 'old' A*, A, B... grading system, 30 candidates in every 100 have been awarded the wrong grade. Under the new 9, 8, 7... grading system, because the grade widths are likely to be narrower, this number rises: perhaps as many as 45 candidates in every 100 will be awarded the wrong grade (45 being the original 30, scaled by 6/4 to account for the narrowing of the grade widths).
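The scaling in that estimate can be written out as a short Python sketch. The key assumption, which is this post's rough rule of thumb and not a result from Marking Consistency Metrics, is that the number of wrong grades grows in inverse proportion to the average grade width; the variable names are illustrative.

```python
from fractions import Fraction

# Grades occupying the same range of marks under each system (from the text).
OLD_GRADES = ["A*", "A", "B", "C"]
NEW_GRADES = [9, 8, 7, 6, 5, 4]

# Average new grade width relative to the old one: 4/6, i.e. 2/3.
width_ratio = Fraction(len(OLD_GRADES), len(NEW_GRADES))

# Wrong grades per 100 candidates in GCSE English Language under the old
# system, as estimated earlier in the text.
old_wrong_per_100 = 30

# ASSUMPTION: wrong grades scale inversely with average grade width.
new_wrong_per_100 = old_wrong_per_100 / width_ratio

print(f"Grade widths narrower by a factor of {width_ratio}")
print(f"Estimated wrong grades per 100: {new_wrong_per_100}")
```

If the true relationship between grade width and reliability is not inversely proportional, the figure of 45 will be off in one direction or the other, which is why publication of the actual data matters.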

This is not just bad, it's outrageous. In - arguably - the single most important subject in the curriculum, how is it possible that almost one-half of the candidates might be awarded the wrong grade? But that is exactly what might have happened.

I might, of course, be wrong - and I hope I am, for our young people deserve better. So, if you have a view, please be in touch. And to prove me wrong, may I suggest that Ofqual do two things: for all the GCSE and A level examinations in summer 2017 (and thereafter), Ofqual should publish

■ data, for all subjects, clearly showing the percentage of candidates who are awarded the right grade, and the percentage awarded the wrong grade...

■ ...and, for those subjects changing from A*, A, B... to 9, 8, 7..., an analysis of the results under both grading structures, so that the percentages of candidates being awarded the right - and the wrong - grades can be compared.

Much more constructive, however, is to fix the problem of grading errors, and of grade unreliability. For an idea as to how to do this - an idea that minimises the likelihood that any candidate is ‘disadvantaged’ - take a look at my blog How to make GCSE and A level grades reliable.