Biting the poisoned cherry - why the appeals process for school exams is so unfair
In principle, the exam appeals process should right inadvertent wrongs; in practice, the process is deeply unfair, as is the process by which the original grades were awarded.
To explain why, imagine that you take a bite from one side of a cherry. A few moments later, you don’t feel so well. You fear that you might have been poisoned, but you’re not sure. You also think, but once again you’re not sure, that the other side of the cherry contains an antidote. What should you do? I think many people – myself included – would take that second bite. And quite possibly many people – myself included – would argue that I should not be denied access to the antidote, simply because I had taken the first bite from the poisoned side. “It was just the luck of the draw, too bad.”
Why cherries? Because of this statement, from an Ofqual blog:

“It is not fair to allow some students to have a second bite of the cherry by giving them a higher mark on review, when the first mark was perfectly appropriate.”
Ah! Biting cherries!
And this is from another Ofqual blog, dated 3rd June 2016:
“There is often no single, correct mark for a question. In long, extended or essay-type questions it is possible for two examiners to give different but appropriate marks to the same answer. There is nothing wrong or unusual about that.”
To me, these two blogs, from the same source, and just a week apart, are in total conflict.
Suppose, for example, that the 4/3 boundary for an examination is 40. A candidate given 39 marks is awarded grade 3; as a result, many doors are slammed shut. But since Ofqual explicitly acknowledge that “it is possible for two examiners to give different but appropriate marks”, a different examiner might have given that candidate 40 marks, grade 4. And those life-chance doors are open.
How, then, can it be “unfair...to have a second bite of the cherry” when the first bite, the mark of 39, was the lottery of which examiner happened to mark the script? When the first bite was poisoned?
A single mark can, and does, make all the difference. At every grade boundary. And there are some interesting official statistics on this. In November 2016, Ofqual published Marking Consistency Metrics, which contains evidence that, for scripts marked at grade boundaries, there is a probability of at best 50% that the grade awarded is what Ofqual refer to as “the definitive grade” – and what an ordinary person would call the “right” grade. So, for scripts marked at any grade boundary, in any subject, at both GCSE and A level, there is about a 50:50 chance that the awarded grade is right – or indeed wrong. You might as well flip a coin. Is that “fair”? For the grade boundary cherry is surely poisoned, and bitterly so.
The fundamental problem underlying all this is not “marking error”. It is the acknowledged fact that marks are “fuzzy”. The script’s mark is not precisely “39”; rather, it is any number between, say, 37 and 41, where each of these marks is, in Ofqual’s own words, “different but appropriate”. This range straddles the grade boundary, causing great unfairness: the grade as published in August depends on which specific mark happens to be given; furthermore, the possibility that an original mark of 39 might have been 40 is an incentive to appeal. But an upgrade is possible only if an appeal is made – and only if the barriers of the fee, the requirement to demonstrate that the original mark was “unreasonable”, and the implied threat of the whole-centre review are all successfully hurdled. Fair??
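The arithmetic of that lottery can be sketched directly. The following is a minimal illustration, assuming (as in the example above) a 4/3 boundary of 40 and an “appropriate marks” range of 37 to 41; these numbers are the article’s illustrative ones, not Ofqual’s actual data:

```python
# Illustrative sketch, not Ofqual's model: a script whose "different but
# appropriate" marks span 37..41, with the 4/3 grade boundary at 40.
# The awarded grade then depends on which examiner happens to mark the script.

BOUNDARY = 40                      # 4/3 grade boundary (example from the text)
appropriate_marks = range(37, 42)  # the fuzzy range: 37, 38, 39, 40, 41

grades = ["4" if mark >= BOUNDARY else "3" for mark in appropriate_marks]
share_grade_4 = grades.count("4") / len(grades)

for mark, grade in zip(appropriate_marks, grades):
    print(f"mark {mark} -> grade {grade}")
print(f"share of appropriate marks giving grade 4: {share_grade_4:.0%}")
# -> share of appropriate marks giving grade 4: 40%
```

Under these illustrative numbers, two examiners in five would award grade 4 and three in five grade 3, for one and the same script: the published grade is decided by the draw of examiner, not by the candidate.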
In July 2017, the Supreme Court unanimously judged that the fee for an appeal to the employment tribunal was unlawful because “it has the effect of preventing access to justice”. Might something similar be said of the fee for an exam grade appeal? And, as regards “reasonableness”, do the inferences drawn from Marking Consistency Metrics - that there’s about a 50% chance that grades at grade boundaries are wrong - provide sufficient grounds?
But there is a solution to this muddle, as highlighted by senior examiner Neil Sheldon in an article in The Sunday Telegraph. Throw grades away, and award each candidate the given mark (say, 39), associated with a measure of the exam’s “fuzziness” (say, 2 marks either way). And since the recent ruling by the Information Commissioner makes all marks available in principle, why not make them available in practice from the outset?
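As a sketch of what such reporting might look like (a mark plus a fuzziness band, rather than a bare grade), here is a minimal illustration; the function name and output format are my assumptions, not an official scheme:

```python
# Sketch of the mark-plus-fuzziness idea described above: report the raw mark
# together with its margin, instead of collapsing it into a single grade.
# The name "report" and the exact format are illustrative assumptions.

def report(mark: int, fuzziness: int = 2) -> str:
    """Render a result as 'mark ± fuzziness', with the implied range."""
    return f"{mark} ± {fuzziness} (range {mark - fuzziness} to {mark + fuzziness})"

print(report(39))  # -> 39 ± 2 (range 37 to 41)
```

A result presented this way makes the uncertainty visible at a glance: a “39 ± 2” and a “41 ± 2” plainly overlap, whereas a bare “grade 3” and “grade 4” suggest a distinction the marking cannot support.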
Neil Sheldon’s suggestion mentions measuring “fuzziness”, the intrinsic variation or uncertainty in marking – an uncertainty that causes grades to be unreliable. Here is a quote from a document published in 2005:
“However, to not routinely report the levels of unreliability associated with examinations leaves awarding bodies open to suspicion and criticism. For example, Satterly (1994) suggests that the dependability of scores and grades in many external forms of assessment will continue to be unknown to users and candidates because reporting low reliabilities and large margins of error attached to marks or grades would be a source of embarrassment to awarding bodies. Indeed it is unlikely that an awarding body would unilaterally begin reporting reliability estimates or that any individual awarding body would be willing to accept the burden of educating test users in the meanings of those reliability estimates.”
Indeed. Nobody likes to wash their dirty linen in public.
The lead author of this document, Dr Michelle Meadows, was, in 2005, at the exam board AQA, and is now Ofqual's Executive Director of Strategy, Risk and Research. Yes, as the document most explicitly states, “...it is unlikely that an awarding body would unilaterally begin reporting reliability estimates...”. What, then, is the role of the regulator?