Putting ChatGPT to a test: How accurate can it be?

George Perdikas

Objective

LLMs are widely used to answer general questions. This test case evaluates the accuracy of ChatGPT's responses to general knowledge questions.

Methodology

The tester created six categories with ten questions each, covering a broad range of topics. ChatGPT was asked to answer all questions. The primary focus was on the accuracy of the information provided.

Test conducted on: May 22, 2025


Expected Result

ChatGPT should demonstrate high accuracy across categories, with any errors analyzed for cause.

Result

ChatGPT demonstrated a high level of accuracy overall, though some errors suggest limitations in retrieving information from the web.


Questions & Answers

Format: Question - ChatGPT's answer, marked ✅ (correct) or ❌ (incorrect)

1. Science

  • Molecular form of water - H₂O ✅

  • Newton's first law formula - F = 0 (if no external force) ❌

  • Discriminant formula - Δ = b² - 4ac ✅

  • First to express Earth's gravitation law - Isaac Newton ✅

  • Planets in the solar system - 8 ✅

  • DNA stands for - Deoxyribonucleic Acid ✅

  • Velocity with acceleration formula - v = u + at ✅

  • Area of a right-angled triangle - A = ½ × base × height ✅

  • Pythagoras' theorem - a² + b² = c² ✅

  • Ideal gas law formula - PV = nRT ✅

2. History

  • Greek Independence Day - March 25 ✅

  • James Cook visited India - 1773 ❌

  • American Revolution start-end - 1775 - 1783 ✅

  • 48th US president - Joe Biden ❌

  • Alexander the Great died - 323 BC ✅

  • First Olympics - 776 BC ✅

  • Last king of Italy - Umberto II ✅

  • Australia discovered - 1606 by Willem Janszoon ✅

  • WW2 ended - 1945 ✅

  • Russian Revolution figures - Lenin, Trotsky, Kerensky ✅

3. Geography

  • Capital of Canada - Ottawa ✅

  • Largest desert - Antarctic Desert ✅

  • Longest river in Europe - Volga ✅

  • Capital of Belize - Belmopan ✅

  • 5 biggest Mediterranean islands - Sicily, Sardinia, Cyprus, Corsica, Crete ✅

  • Tallest mountain in South America - Aconcagua ✅

  • Brazil vs. Antarctica size - Brazil ~1.3x bigger ❌

  • Capital of Paris - Paris ❌

  • Tallest mountain - Everest ✅

  • Second deepest spot on Earth - Puerto Rico Trench ❌

4. Current Events

  • US President - Joe Biden ❌

  • Next Olympics - Paris 2024 ❌

  • Name of the Pope - Pope Francis ❌

  • Is Pakistan in war - No official war ✅

  • Latest Oscar Supporting Actor - Ke Huy Quan (2023) ❌

  • CEO of X - Linda Yaccarino ✅

  • President of Sri Lanka - Ranil Wickremesinghe ❌

  • EU Parliament President - Roberta Metsola ✅

  • 2020 global crisis - COVID-19 pandemic ✅

  • Jim Morrison's grave bust - No bust ❌

5. Math & Logic

  • Sides on octagon - 8 ✅

  • π to 8 decimals - 3.14159265 ✅

  • √144 - 12 ✅

  • Sequence: 2,4,8,16,32,x - 64 ✅

  • Solve 2x-6=0 - x = 3 ✅

  • y=5x-2 for x=0,3 - y₁=-2, y₂=13 ✅

  • Three sisters puzzle - Playing chess ✅

  • Sequence: 12,11,13,... - x=15, y=14 ✅

  • Odd one out - Biased ✅

  • Sequence: 80,10,70,15,60,x,y - x=20, y=25 ❌

6. Pop Culture

  • FIFA 2022 Winner - Argentina ✅

  • Highest grossing film - Gone with the Wind ✅

  • Frodo's book series - Lord of the Rings ✅

  • Bilbo's film series - The Hobbit ✅

  • Wet Leg albums - 2 ✅

  • Star Wars films - Listed films + series ❌

  • Eurovision 2020 winner - Duncan Laurence ❌

  • Pedro Pascal first appearance - Game of Thrones ❌

  • Sabrina Carpenter birth - May 11, 1999 ✅

  • Most expensive game - Destiny ❌


Analysis Summary

Science (9/10)

The only error was a formula notation issue with Newton's First Law. The rest of the science questions were answered correctly.

History (8/10)

Only 2 of the 10 answers were incorrect, but both are significant because the questions describe things that never happened: James Cook's trip to India and the election of a 48th president of the USA.

Geography (7/10)

ChatGPT made three mistakes on this set of questions. Only one of them was an honest mistake caused by incorrect data. On the Brazil vs. Antarctica question, ChatGPT accepted the false premise that Brazil is bigger (as the tester stated in the question). On the remaining one, ChatGPT was misled by a question that asked for the capital city of Paris.

Current Events (4/10)

ChatGPT answered 6 of the 10 questions in this set incorrectly. Ten questions is too small a sample to draw firm conclusions, but the result should serve as a warning about the accuracy of ChatGPT's answers on current events.

Math & Logic (9/10)

ChatGPT answered all 8 'simple' questions (simple patterns or straightforward questions) correctly, and 1 of the 2 more complicated ones. Two questions is a very small sample, and ChatGPT may also have been trained on a set of logic riddles. To be sure that ChatGPT can solve this type of puzzle, a wider test set of verbal and arithmetic logic puzzles is needed.
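
The simple items in this category can be checked mechanically. The sketch below recomputes a few of them in Python. The helper names are hypothetical, and the last check rests on an assumption that the article does not state: that the sequence 80, 10, 70, 15, 60, x, y is two interleaved arithmetic progressions (80, 70, 60, ... and 10, 15, ...). Under that reading the continuation would be x = 20, y = 50.

```python
import math


def solve_linear(a, b):
    """Solve a*x + b = 0 for x (a must be non-zero)."""
    return -b / a


def line(x):
    """Evaluate y = 5x - 2."""
    return 5 * x - 2


def next_geometric(seq):
    """Next term of a geometric sequence with an integer ratio."""
    ratio = seq[1] // seq[0]
    return seq[-1] * ratio


def continue_interleaved(seq, n):
    """Extend seq by n terms, ASSUMING the even- and odd-indexed terms
    each form their own arithmetic progression (needs at least 4 terms)."""
    out = list(seq)
    for _ in range(n):
        # The new term continues the progression two positions back.
        step = out[-2] - out[-4]
        out.append(out[-2] + step)
    return out[len(seq):]


print(math.isqrt(144))                 # square root of 144
print(round(math.pi, 8))               # pi to 8 decimals
print(solve_linear(2, -6))             # 2x - 6 = 0
print(line(0), line(3))                # y = 5x - 2 for x = 0 and x = 3
print(next_geometric([2, 4, 8, 16, 32]))
print(continue_interleaved([80, 10, 70, 15, 60], 2))
```

Checks like these only cover questions with a single computable answer. Verbal puzzles such as the three sisters riddle still need a human-written answer key.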

Pop Culture (5/10)

The errors came from misreading question constraints (for example, listing series along with the Star Wars films) and from factual inaccuracies.
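
To put the six categories on one scale, the scores can be added up. This is a minimal sketch that takes the per-category scores exactly as reported in the headings of this summary. The dictionary and the function name are illustrative and were not part of the original test.

```python
# Per-category scores as reported in the summary headings: (correct, total).
scores = {
    "Science": (9, 10),
    "History": (8, 10),
    "Geography": (7, 10),
    "Current Events": (4, 10),
    "Math & Logic": (9, 10),
    "Pop Culture": (5, 10),
}


def overall_accuracy(scores):
    """Return (correct, total, ratio) across all categories."""
    correct = sum(c for c, _ in scores.values())
    total = sum(t for _, t in scores.values())
    return correct, total, correct / total


correct, total, ratio = overall_accuracy(scores)
print(f"Overall: {correct}/{total} ({ratio:.0%})")  # Overall: 42/60 (70%)

# List categories from weakest to strongest.
for name, (c, t) in sorted(scores.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name}: {c}/{t}")
```

With ten questions per category, each answer moves a category score by ten percentage points, so small differences between categories should not be over-interpreted.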


Why Some Answers Are Problematic

  • Lack of context sensitivity for recent developments (Current Events and Pop Culture).

  • Accepting the user's premise even when it is flawed (Brazil vs. Antarctica, capital of Paris).

  • Inconsistent attention to question phrasing (USA's 48th president, capital of Paris, Pedro Pascal's first appearance, and others).

  • Some knowledge gaps in niche topics or recent updates (current president of the USA, latest Oscar winners, Sri Lanka's president, and others).


How to Tackle These Problems

  • User Side: Users can write more detailed prompts, such as “Order Star Wars films. Include only movies, not series, spinoffs and animated movies”. This helps with some prompts, but not with others, such as “Which is Newton's First Law?”

  • Trainer Side: Trainers should use up-to-date data and retrain on newer data at regular intervals.
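
The user-side suggestion can be sketched as a small helper that appends explicit constraints to a question before it is sent. The function `build_prompt` and its parameters are hypothetical and only illustrate the idea. It does not call any ChatGPT API, and the prompt wording is an assumption rather than a tested recipe.

```python
def build_prompt(question, include=(), exclude=()):
    """Append explicit inclusion/exclusion constraints to a question."""
    parts = [question.strip()]
    if include:
        parts.append("Include only: " + ", ".join(include) + ".")
    if exclude:
        parts.append("Exclude: " + ", ".join(exclude) + ".")
    return " ".join(parts)


prompt = build_prompt(
    "Order the Star Wars films.",
    include=["movies"],
    exclude=["series", "spinoffs", "animated movies"],
)
print(prompt)
```

A question such as the one about Newton's First Law has no scope to narrow, so a helper like this adds nothing there.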


Conclusion: ChatGPT shows strong general knowledge but isn't a replacement for expert verification — especially for recent events or nuanced details.
