I wanted to share a recent study demonstrating that ChatGPT's accuracy has diminished on some key tasks, contrary to the prevalent assumption that continued training over time should increase accuracy.
This reminds us of the vital need for human oversight in any use of AI technologies.
Researchers from Stanford University and the University of California, Berkeley compared ChatGPT results from March and June 2023 and found that its accuracy on specific tasks changed drastically over that very short span.
The researchers focused on two versions of the ChatGPT software, 3.5 and 4, and considered four specific tasks: (1) solving math problems, (2) answering sensitive/dangerous questions, (3) generating software code, and (4) visual reasoning.
While ChatGPT-4's ability to recognize prime numbers plummeted, ChatGPT-3.5's improved drastically.
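The primality task lends itself to automatic scoring, since every answer can be verified with a short deterministic check. A minimal sketch of such a grading harness (this is an illustrative assumption, not the study's actual evaluation code; `score_answers` and its "yes"/"no" answer format are hypothetical):

```python
def is_prime(n: int) -> bool:
    """Deterministic primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def score_answers(answers: dict[int, str]) -> float:
    """Fraction of model answers ("yes" = prime) that match ground truth."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n)
        for n, ans in answers.items()
    )
    return correct / len(answers)
```

For example, `score_answers({17: "yes", 18: "no", 19: "no"})` scores the third answer wrong (19 is prime) and returns 2/3.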
Over time, ChatGPT-4 answered fewer sensitive/dangerous questions while ChatGPT-3.5 answered more, and when refusing to answer, both gave terser explanations for the refusal. ChatGPT-3.5 remained almost always susceptible to bypass attacks ("jailbreaking"), while ChatGPT-4's defenses improved significantly over time.
Both versions' ability to write code deteriorated significantly: more of the generated code contained extraneous characters, rendering it non-executable as delivered.
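In practice, "extraneous characters" in generated code often means Markdown code-fence markers wrapping the program text, so any pipeline that runs model output directly needs a cleanup step first. A minimal sketch of that step (an assumption about the failure mode, not code from the study):

```python
def strip_markdown_fences(text: str) -> str:
    """Remove surrounding Markdown code fences (``` or ```lang), if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]           # drop opening fence, e.g. ```python
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]          # drop closing fence
    return "\n".join(lines)
```

Given `"```python\nprint('hi')\n```"` this returns `"print('hi')"`, while text with no fences passes through unchanged.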
Visual reasoning results technically showed marginal improvements for both ChatGPT-3.5 and ChatGPT-4, but these should be viewed with skepticism: some answers ChatGPT-4 got right in March it got wrong in June, so the overall number of correct answers is not the whole story.
Given that large language models can degrade on some tasks even as they improve on others, it is imperative that performance be monitored constantly and that AI platforms "show their work" through an open and transparent reasoning process.
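One lightweight way to act on this advice is to re-run a fixed evaluation set on a schedule and flag any drop from a recorded baseline. A minimal sketch of such a drift check (the tolerance value, the eval-set format, and the model-calling function are all assumptions for illustration):

```python
from typing import Callable

def check_for_drift(
    eval_set: list[tuple[str, str]],      # (prompt, expected answer) pairs
    ask_model: Callable[[str], str],      # hypothetical model-calling function
    baseline_accuracy: float,
    tolerance: float = 0.05,              # assumed acceptable accuracy drop
) -> tuple[float, bool]:
    """Re-run the eval set; flag drift if accuracy falls below baseline - tolerance."""
    correct = sum(
        ask_model(prompt).strip().lower() == expected.lower()
        for prompt, expected in eval_set
    )
    accuracy = correct / len(eval_set)
    drifted = accuracy < baseline_accuracy - tolerance
    return accuracy, drifted
```

Run against each new model snapshot, this turns anecdotal impressions of degradation into a concrete, repeatable measurement.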
OpenAI's open-source origins quickly gave way to closed source under the control of a profit-driven corporation. Given its commitment to a "black box" mentality, refusing to share how its product operates, users should apply "buyer beware" to any ChatGPT-generated results.