Fix. Your. Paper.

May 2, 2025

Generative AI possesses qualities that human examiners in our field cannot match: logical objectivity and linguistic proficiency. It is therefore natural to hand it a share of the grave responsibility of proofreading exam papers: spotting awkward phrasing that non-native speakers are not sensitive enough to notice, and catching logical inconsistencies that ESL examiners, already biased from having read through the passages, easily overlook.

A proofreader that, unlike its human counterparts, is truly independent. Impartial. Instant. Incomparable in proficiency.

Methodology

The AI model is first asked to complete the exam on its own. Then, the official answer key is provided for comparison. For any discrepancy between the AI's answers and the key, a secondary check is initiated: the AI attempts to infer the examiner's logic, then weighs it against its own reasoning to determine which holds up better. If the AI's judgment still favours its own original answer, the item is flagged as a controversial question.
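The pipeline above can be sketched in a few lines. This is a minimal illustration, not the actual Fix. Your. Paper. implementation: the function names, data shapes, and the `rejudge` callback (which stands in for the secondary LLM check) are all hypothetical.

```python
def flag_controversial(ai_answers, answer_key, rejudge):
    """Compare the AI's answers with the official key, then re-judge
    each discrepancy. An item is flagged as controversial only when,
    after weighing the examiner's intended answer, the AI still
    prefers its own.

    `rejudge` is a hypothetical callback standing in for the
    secondary LLM check; it returns whichever answer holds up better.
    """
    flagged = []
    for item, ai_choice in ai_answers.items():
        official = answer_key[item]
        if ai_choice == official:
            continue  # agreement with the key: nothing to review
        # Secondary check: infer the examiner's logic and weigh it
        # against the AI's own reasoning.
        if rejudge(item, ai_choice, official) == ai_choice:
            flagged.append(item)
    return flagged


# Toy usage with a stub judge that sides with the AI only on item 7.
ai = {3: "B", 7: "C", 12: "A"}
key = {3: "B", 7: "D", 12: "D"}
stub = lambda item, ai_choice, official: ai_choice if item == 7 else official
print(flag_controversial(ai, key, stub))  # → [7]
```

Item 3 matches the key and is skipped; item 12 disagrees but the judge upholds the key, so only item 7, where the AI's answer survives the secondary check, is flagged.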

Efficacy

Take, for instance, the mock exams given to final-year senior middle schoolers across Shanghai districts in 2020, which are typically high-quality and minimally controversial. The number of ambiguous or problematic questions flagged averaged fewer than one per paper. Some district-level exams, such as those from Yangpu, were often cleared without issue.

Manual review confirms that both false positives and false negatives are minimal under this method.

There are occasional issues with this approach. For example, the AI sometimes fails to extract information from the PDFs correctly, probably because the context is split across two files (one for the paper, one for the key), and it has difficulty determining how many blanks a Grammar question contains (presumably also a PDF parsing issue).

This experiment is also limited to objective questions, excluding listening comprehension.

Applying to Our Exams

Despite the presence of an independent reviewer, problems in our own school's exams are more common and more serious. In monthly exams, including midterms and finals, an average of three controversial items were detected per paper. Many of these involved an alternative answer that was not merely an equally valid choice but arguably a better one.

Conclusion

Given AI’s unique strengths in objectivity and linguistic mastery, it shows great promise not only to assist but, in some cases — especially school-level exams — to replace human proofreading altogether. Not only does AI streamline this process (Fix. Your. Paper. takes about two minutes per English paper of Shanghai Gaokao specifications), but it also tends to enhance paper quality beyond what purely human review can achieve.

So, dear exam setters: next time before you fling that English test into the hands of a thousand innocent students, how about giving it a run through Fix. Your. Paper. first?

Fix. Your. Paper. is proudly open-source as part of the Leximory project. Visit it at leximory.com/fix-your-paper.