
The Pangram Problem: When AI Detection Punishes the Polished

A Commonwealth Prize-winning story was flagged as AI-generated. The case is contested, but the parallel for schools is clear. AI detectors deliver binary verdicts on prose they were never designed to judge – and the pupils most likely to be wrongly flagged are the ones who write best.

The 60-second Briefing

The Caribbean regional category of the 2026 Commonwealth Short Story Prize was awarded to Trinidadian writer Jamir Nazir, then flagged by AI detector Pangram as 100% machine-written.
Nazir denies using AI. The Commonwealth Foundation has stood by him while reviewing its policies. Granta has kept the story online.
The detection science is genuinely contested. False positives on polished human prose are well-documented.
For schools, the parallel is unmissable: the same class of tools is being used in classrooms to police pupil work, with the same unreliability.
The pupils most likely to be wrongly flagged are the ones who write well. The right response is to deepen our judgement, not delegate it to a probability score.
This is happening during the UK's National Year of Reading. The timing is, to put it gently, awkward.

Last month, the Commonwealth Foundation announced the regional winners of its 2026 Short Story Prize. The Caribbean's winner, chosen from over 7,800 entries, was the Trinidadian writer Jamir Nazir, for a story called "The Serpent in the Grove." It was published on Granta, the literary magazine that traditionally hosts the regional winners, and the judge praised its prose as "sublime – precise yet richly evocative." This should have been a small triumph for Caribbean writing.

Within days, the story had been flagged on social media as reading like the work of a large language model. Wharton professor Ethan Mollick ran it through Pangram, one of the more respected AI detection tools, and posted the result: 100% machine confidence. Nazir denied the allegations, arguing the story drew on his childhood memories of rural Trinidad and that AI detectors are known to generate false positives on polished prose. The Commonwealth Foundation has stood by him while promising a fuller review of its policies. Granta has kept the piece online, saying it will only take it down if "definite evidence comes to light."

The literary world has just walked into a problem schools have been wrestling with for two years, only at a higher profile and with sharper public consequences for the writer at the centre of it.

A few months ago, out of professional curiosity, I ran several pages of a Tom Clancy novel through a detection tool of the same class that schools are increasingly being sold. Clancy, whatever else you might say about his prose, was unmistakably human, prolific, and emphatically pre-AI – he died in 2013, well before any of this technology existed. The detector came back over 75% confident the text was machine-generated.

Make of that what you will. Here is what I read into it: that these tools cannot be trusted as binary verdict-generators on long-form prose, however confident their interface looks.

The JCQ guidance to centres on AI misuse is reasonably balanced. It warns against relying on a single detection score and emphasises teacher judgement. But the operational reality in many schools – particularly those with stretched English departments managing hundreds of essays a term – is that the score quickly becomes the conversation. The pupil submits work. The platform runs a check. The teacher sees the flag, then reads the work with the flag in mind. By the time anyone is asking whether the score is reliable, the meeting with the parents is already in the diary.

This case has also landed in a year that the UK has officially declared the National Year of Reading, a Department for Education initiative delivered with the National Literacy Trust. The campaign exists because reading for pleasure among children and young people has fallen to its lowest level in a generation. Just one in three eight-to-eighteen-year-olds reported enjoying reading in their free time in 2025. The campaign's slogan asks the country to "Go All In" on books. Schools, libraries and parents have spent the first half of the year being told, quite rightly, that reading and writing well are skills we cannot afford to lose. The Caribbean prize controversy is a glimpse of what could happen to a pupil who accepts that invitation.

Because here is the part that has been bothering me since I heard about the Nazir story. The Caribbean judge praised his prose as "precise yet richly evocative", which is also a fair description of what we hope every pupil with a literary instinct learns to do. Precision and richness are not red flags. They are the explicit goal of good English teaching. They are the thing every prize day, every UCAS reference and every parents' evening at every school in the country is set up to reward.

If polish is now the trigger that brings the detector down on a pupil's head, the pupils most damaged by AI detection are not the ones cutting corners with a chatbot at two in the morning. They are the ones who learned to write well. The ones who turned up to creative writing club. The ones whose parents read to them at bedtime. The ones whose teachers spent years showing them what an extended metaphor can do. Those pupils will produce prose that confuses the detector. The detector will return a verdict. The pupil will then be asked to defend themselves.

Writing about the Nazir case on her Substack, Victoria Livingstone made a point that struck me as the sharpest in the whole controversy. Many of the readers who pointed to "AI tells" in the prose had already seen the Pangram label before they started reading. The label primed the reading. Once you have been told a piece is AI, the slightly decorative simile looks like a giveaway instead of a stylistic choice. Livingstone calls this outsourcing our taste to the machine.

I would call it outsourcing our pedagogical judgement to the machine. Once a teacher reads pupil work after the detector has spoken, the teacher is no longer marking the work. They are marking the gap between the work and the score. That is a different exercise, and it is much worse at producing fair outcomes.

There is an asymmetry hiding in all of this that we do not talk about enough. When a literary prize winner is accused of AI use, they have a publisher, a foundation, sometimes a literary agent, occasionally a lawyer. They have a public platform to mount a defence. Nazir wrote a statement and got it published in the Trinidad Express. A Year 11 pupil flagged by their VLE has none of that. They have a Head of English with thirty other essays to mark, a slightly worried parent, and a probability score they cannot meaningfully challenge. The burden of proof is reversed, and it falls on the least-resourced person in the room.

Asking a pupil to prove they did not use AI is asking them to prove a negative. It cannot be done. What they can do is produce drafts, version history, source notebooks, and evidence of their working. Any of those, if collected systematically and respected by the school, is a far better foundation for a judgement than a single probability score. But none of that is what the detector offers, and none of it is what most schools are currently set up to do.

In The Metacognitive Trap, I argued that if a piece of software removes the cognitive friction required for learning, IT departments should refuse to buy it. The same principle, in a slightly different form, applies to detection. If a piece of software cannot reliably distinguish a Tom Clancy thriller from machine output, schools should refuse to buy it as a verdict-generator. As a flag prompting a conversation, fine. As a binary judge, never. The day a school delegates a pupil's academic fate to a probability score is the day the school stops doing the job it exists to do.

The more interesting question is what the alternative looks like. The answer, as I set out in Repricing Human Judgement, is older and less glamorous than the detection industry would like. The things that have always demonstrated authorship will continue to do so. A pupil who can talk about the choices in their own essay – why this paragraph comes before that one, what they cut, what they considered and rejected – is a pupil who wrote it. A pupil whose drafts show the visible evolution of an argument is a pupil who wrote it. A pupil whose vocabulary in conversation matches their vocabulary on the page is a pupil who wrote it. This is slower than running a check. It is also enormously more reliable, and it has the bonus of being good teaching.

Two things are true at once here. The first is that AI misuse in pupil work is real, growing, and a genuine challenge for schools. Anyone who has read my posts The Metacognitive Trap or The Illusion of Perfection will know that I am not minimising it. The second is that the current generation of detection tools is part of the problem rather than the solution. They produce confident binary verdicts on a task they cannot reliably perform, and they invite institutions to delegate judgements that belong to teachers.

The Commonwealth Foundation is right not to strip a prize on a probability score. Schools should be just as cautious. The right response to AI in pupil work is not to find a tool that will tell us what we cannot tell ourselves. It is to deepen the methods we have always used to know whether a piece of writing is genuinely the pupil's own. That work is unglamorous, time-consuming, and humanly demanding. It is also, as far as I can see, the only thing that actually works.

The DfE has declared 2026 the National Year of Reading. Schools, libraries, parents and pupils are being asked to Go All In on books. The least we can do, while asking that, is to not punish the pupils who do.

See you in the digital staffroom.
