How reliable are GCSE and A level grades?

Over the last several years, Ofqual, the regulator of school examinations in England, has been conducting an extensive research programme on the reliability of GCSE and A level grades. Some important results were published in November 2016 in the report Marking Consistency Metrics, and further results were presented at a symposium held in June 2017. Here is slide 7 from a presentation entitled Quality of marking - confidence and consistency:

The vertical axis reads "Median probability of definitive grade". The "definitive grade" is the grade that would be awarded if a senior examiner were to mark a script, but since - in practice - most scripts are marked by markers who are not senior examiners, it is possible that the mark actually given to a script is rather different, with the possibility that the grade is different too. This difference is known as "tolerance", as discussed further in my blog How to make grades reliable.


The figure is startling. As is clearly shown, over each of the last four years, the probability of being awarded the right grade is about 85% for physics, about 70% for English Language, and about 60% for history.

Even more startling is to consider these numbers the other way around. For every 100 candidates who take GCSE or A level physics, 15 are awarded the wrong grade; for English Language, 30; and for history, a staggering 40 candidates in every 100 are awarded the wrong grade! To make that real: every year, about 500,000 students take GCSE English Language, so these figures imply that, each year, some 150,000 candidates are given the wrong grade. 150,000.
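The arithmetic behind these figures can be sketched in a few lines of Python. The probabilities are the approximate values quoted above and the cohort size is the round figure of 500,000; the function names are illustrative only. The sketch follows the reading used in this post, in which the median probability from the slide is applied to every candidate in the cohort.

```python
# Approximate probabilities of being awarded the definitive grade,
# as quoted in the text above (not exact values from the slide).
P_DEFINITIVE = {"Physics": 0.85, "English Language": 0.70, "History": 0.60}


def wrong_per_100(p_definitive: float) -> int:
    """Candidates per 100 expected to receive a grade other than the definitive one."""
    return round(100 * (1 - p_definitive))


def wrong_in_cohort(p_definitive: float, cohort: int) -> int:
    """Expected number of wrong grades in a cohort of the given size."""
    return round(cohort * (1 - p_definitive))


for subject, p in P_DEFINITIVE.items():
    print(f"{subject}: about {wrong_per_100(p)} wrong grades in every 100")

# Round-figure cohort for GCSE English Language, as used in the text.
print(wrong_in_cohort(P_DEFINITIVE["English Language"], 500_000))
```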

'The wrong grade' is 'wrong' both ways - so, for English Language, 30 candidates in every 100 are awarded the wrong grade, with 15 candidates being awarded a grade higher than they merit (so they are 'lucky'), and 15 candidates being awarded a grade lower than they merit (and so are 'disadvantaged'), and possibly, as a result, being unfairly denied important life chances.

This is bad.

But even worse is what happened in summer 2017. In summer 2017, three GCSEs - Mathematics, English Language and English Literature - were graded on a new scale. Hitherto, the top grade had been A*, followed by A, B, C, D..., with the C/D grade boundary being particularly important. From summer 2017, the new grades are numeric, with the top grade designated 9, and then 8, 7, and so on. Importantly, the 'space' previously occupied by grades A*, A, B and C is now occupied by grades 9, 8, 7, 6, 5 and 4 - and so where in the past there were 4 grade boundaries (including C/D), there are now 6 (including 4/3).


Here's the really bad news. Under the 'old' A*, A, B... grading system, for GCSE English Language, 30 candidates in every 100 have been awarded the wrong grade. Under the new 9, 8, 7... grading system, because the grade widths are likely to become narrower, this number will rise: perhaps as many as 45 candidates in every 100 will be awarded the wrong grade.
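To see why narrower grades are likely to mean more wrong grades, here is a rough simulation. It is a toy model, not anything Ofqual has published, and every number in it is an assumption made purely for illustration: definitive marks are spread evenly over a hypothetical 0-100 range, the mark actually awarded differs from the definitive mark by normally distributed noise with a standard deviation of 5 marks, and all grades are of equal width. The same range is then cut into 4 grades and into 6 grades.

```python
import random


def wrong_grade_rate(n_grades: int, noise_sd: float,
                     n_candidates: int = 50_000, seed: int = 0) -> float:
    """Share of simulated candidates whose awarded grade differs from the definitive grade.

    All modelling choices here are illustrative assumptions, not real exam data.
    """
    rng = random.Random(seed)
    width = 100.0 / n_grades  # equal-width grades over a hypothetical 0-100 mark range

    def grade(mark: float) -> int:
        # Index of the grade band containing the mark, clamped to the valid range.
        return min(max(int(mark // width), 0), n_grades - 1)

    wrong = 0
    for _ in range(n_candidates):
        definitive_mark = rng.uniform(0, 100)                     # senior examiner's mark
        awarded_mark = definitive_mark + rng.gauss(0, noise_sd)   # ordinary marker's mark
        if grade(awarded_mark) != grade(definitive_mark):
            wrong += 1
    return wrong / n_candidates


for n in (4, 6):
    print(f"{n} grades: {wrong_grade_rate(n, noise_sd=5.0):.1%} awarded the wrong grade")
```

In this toy model the share of wrong grades is higher with 6 grades than with 4, simply because more candidates sit close to a grade boundary. The percentages themselves depend entirely on the assumed noise and mark distribution, so they should not be read as estimates for any real examination.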

This is not just bad, it's outrageous. In - arguably - the single most important subject in the curriculum, how is it possible that almost one-half of the candidates might be awarded the wrong grade? But that is exactly what has happened.

I might, of course, be wrong - and I hope I am, for our young people deserve better. So, if you have a view, please be in touch. And to prove me wrong, may I suggest that, for all the GCSE and A level examinations in summer 2017 (and thereafter), Ofqual publish two things:

 ■  an analysis, of the type shown in the above diagram, for all subjects, clearly showing the percentage of candidates who are awarded the right grade, and the percentage awarded the wrong grade, and

 ■  for those subjects changing to 9, 8, 7... grades from A*, A, B... grades, an analysis of the results under both the new grading structure and the previous year's old grading structure, so that the percentages of candidates being awarded the right, and the wrong, grades can be compared.