
Marking Exam Done by A.I. - Sixty Symbols

Below is a short summary and detailed review of this video written by FutureFactual:

Can ChatGPT Pass a Physics Degree? Sixty Symbols Tests an Exam in Quantum Mechanics

Sixty Symbols revisits AI in physics education by feeding a year-2 quantum mechanics exam into ChatGPT and evaluating its performance as if it were a student. The video contrasts the current model's results with earlier findings, discusses how scores translate into UK degree classifications, and explores the implications for exams, cheating, and assessment reform. The hosts annotate the AI's reasoning, highlight where it succeeds on calculations but struggles with sketches and sign handling, and debate whether online exams can withstand AI assistance. Read on for insights into how AI might shape future physics assessment.

Overview

Sixty Symbols revisits the topic of artificial intelligence in physics education by evaluating how the latest version of ChatGPT performs on a real physics exam. Building on an earlier (2021/2024) study that fed an entire degree's worth of exams to an AI, the video now tests a year-2 quantum mechanics paper from 2024 and records the model's score and reasoning. The discussion centers on the practical implications for degree programs, assessment design, and the integrity of online examinations as AI capabilities advance.

Experiment Setup

The hosts begin with a clean ChatGPT account and upload a genuine year-2 undergraduate quantum mechanics paper from a UK university. They prompt the model to answer “in the manner of a year 2 student” and then compare the AI’s responses against human marking, using the same marking scheme and rubric. They emphasize that the in-person examination context differs from online formats, and note that some institutions already lean toward online assessments, which raises concerns about cheating and fairness.

Results and Scoring

Across multiple questions, ChatGPT demonstrates a strong grasp of the required physics concepts and can produce correct results, sometimes arriving at the right final answer for the wrong reasons. The hosts critique the model for not sketching plots or drawing Gaussian curves, and they scrutinize sign errors and minor calculation slips. In one instance the model resolves a problem correctly after a sign correction, and the marker even awards a bonus mark for the model's self-correction. In the end, ChatGPT scores highly on several parts of the paper but is not perfect, and the hosts discuss how these outcomes compare with the class average and the distribution of marks in the cohort.
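
For readers unfamiliar with how such a mark reads in context, the sketch below illustrates the typical UK score-to-classification banding the hosts allude to. The boundaries shown are the commonly used defaults, not figures taken from the video, so treat them as an assumption:

```python
def uk_degree_classification(percentage: float) -> str:
    """Map an overall mark (0-100) to a typical UK degree classification band.

    Assumption: these are the standard boundaries used by most UK universities;
    the exact scheme applied in the video is not specified in this article.
    """
    if percentage >= 70:
        return "First-class honours"
    elif percentage >= 60:
        return "Upper second-class honours (2:1)"
    elif percentage >= 50:
        return "Lower second-class honours (2:2)"
    elif percentage >= 40:
        return "Third-class honours"
    else:
        return "Fail"


# Example: a paper marked at 72% would fall in the first-class band.
print(uk_degree_classification(72))  # First-class honours
```

On this kind of scale, comparing the AI's mark with the class average indicates not just whether it "passed" but which classification band its performance would sit in.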

Analysis of Specific Questions

The video delves into the logic of the AI’s solutions, showing pattern recognition alongside genuine understanding and occasional missteps. The hosts emphasize the importance of not just getting the right answer but also following a coherent physical argument. They highlight a common challenge for AI in physics: translating symbolic reasoning into clear, testable steps, and the limitations that arise when tasks require visual sketches or intuitive diagrammatic reasoning. The discussion also touches on the risk of “pattern matching” rather than deep understanding, and the possibility of using AI to detect cheating rather than to improve learning outcomes.

Ethical and Educational Implications

Beyond the technical results, the conversation shifts to broader questions about assessment design, integrity, and the future of physics education. The hosts argue for robust in-person assessments to preserve the ability to judge competence in real situations, while acknowledging that AI will increasingly influence higher education. They discuss AI detectors as fallible tools and the risk of outsourcing marking or relying on automated scoring. The video closes with calls to reform how exams are structured and assessed in the age of advanced AI, prioritizing data interpretation, problem-solving, and demonstration of understanding over rote pattern matching.

To find out more about the video and Sixty Symbols, go to: Marking Exam Done by A.I. - Sixty Symbols.