Zero-Shot Generalization Test on ChatGPT: Can ChatGPT reason through unfamiliar tasks without guidance?

George Perdikas

Objective
This test explores ChatGPT’s zero-shot generalization abilities. In other words, can it perform a task it has never been explicitly trained or prompted to do, without instructions?

Specifically, we tested whether the model could rank a set of objects by weight, using only its prior knowledge of common objects, without being told how to approach the task.


Methodology

The tester provided a list of six varied objects, differing widely in size, context, and function:

  • Laptop

  • Feather

  • Book

  • Shark

  • Watermelon

  • Couch

Then, ChatGPT was asked to sort these items from lightest to heaviest, assuming typical size and average weight. No hints or examples were given, just the raw task.
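
For readers who want to reproduce the setup, here is a minimal sketch of the same prompt issued through the OpenAI Python SDK. The model name and client configuration are assumptions; the article does not say which ChatGPT version was used or whether the test ran in the web UI.

```python
# Minimal sketch: reissuing the test prompt via the OpenAI Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set; the model name
# below is an assumption, since the article does not specify one.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Can you please sort and reason the following items, based on their "
    "weight? Laptop, feather, book, shark, watermelon & couch."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model
    messages=[{"role": "user", "content": PROMPT}],
)

print(response.choices[0].message.content)
```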


Expected behavior

ChatGPT should be able to:

  • Estimate the typical weights of common objects

  • Sort them in ascending order of weight

  • Optionally, display the approximate weights for clarity

This tests the model’s reasoning and world knowledge under zero-shot conditions.
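
As a simple sanity check, the expected ordering can be derived mechanically from assumed typical weights. The figures below are illustrative guesses, not measurements; note that the book and laptop ranges overlap, so their relative order is sensitive to the assumptions.

```python
# Build a reference ordering from assumed typical weights (kg).
# These figures are rough guesses; book and laptop overlap in practice,
# so their relative order depends on the numbers chosen.
typical_weight_kg = {
    "feather": 0.001,
    "laptop": 2.0,
    "book": 1.0,
    "watermelon": 7.0,
    "couch": 100.0,
    "shark": 300.0,
}

reference_order = sorted(typical_weight_kg, key=typical_weight_kg.get)
print(" < ".join(reference_order))
# feather < book < laptop < watermelon < couch < shark
```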

The Interaction

Tester: Can you please sort and reason the following items, based on their weight? Laptop, feather, book, shark, watermelon & couch.

ChatGPT: Sure! Let’s reason through and sort the items from lightest to heaviest, based on typical average weights:

  1. Feather

    Reason: Feathers are extremely light.

    Typical weight: Less than 1 gram.

  2. Laptop

    Reason: Laptops are portable and relatively light compared to most objects.

    Typical weight: 1.5–3 kg (3–6.5 lbs).

  3. Book

    Reason: Heavier than a laptop if large, but usually average books weigh around the same or slightly more.

    Typical weight: 0.5–2 kg (1–4 lbs), but textbooks can be heavier. We'll place this after laptop for a typical comparison.

  4. Watermelon

    Reason: Watermelons are heavier than books and laptops.

    Typical weight: 5–10 kg (11–22 lbs).

  5. Shark

    Reason: Even a small shark is significantly heavier than a watermelon.

    Typical weight: 90–500+ kg (200–1100 lbs), depending on species. We'll assume a mid-sized one.

  6. Couch

    Reason: One of the heaviest items here.

    Typical weight: 70–150 kg (150–330 lbs). Depending on size, some couches can be heavier than small sharks, but not generally.

✅ Final Sorted Order (lightest → heaviest):

Feather < Laptop < Book < Watermelon < Couch < Shark

Note: If the shark is a large species (like a great white), it definitely outweighs a couch. If it’s a small species, you might swap shark and couch. But on average, sharks are heavier.


Result

This test is marked as passed. ChatGPT produced reasonable weight estimates and understood what "sorting by weight" meant. It also noted the wide range in shark weights, pointing out that a smaller shark might be lighter than a heavy couch.
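
If this test were scripted rather than graded by hand, the pass condition could be expressed as a set of acceptable orderings instead of a single answer, mirroring the shark/couch ambiguity ChatGPT itself flagged. A minimal sketch, assuming the model's answer has already been parsed into a list of item names:

```python
# Grade a parsed answer against the acceptable orderings. Both couch/shark
# orders are accepted, since their weight ranges overlap.
ACCEPTABLE_ORDERS = [
    ["feather", "laptop", "book", "watermelon", "couch", "shark"],
    ["feather", "laptop", "book", "watermelon", "shark", "couch"],
]

def grade(model_order: list[str]) -> bool:
    """Return True if the model's ordering matches an accepted answer."""
    normalized = [item.lower() for item in model_order]
    return normalized in ACCEPTABLE_ORDERS

# ChatGPT's answer from the interaction above:
print(grade(["Feather", "Laptop", "Book", "Watermelon", "Couch", "Shark"]))  # True
```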


Conclusion

This case probed a common weakness in LLMs under zero-shot conditions: ChatGPT had no trouble applying logic consistently when the task required reasoning rather than recall.

This type of test is particularly useful in LLM evaluation, as it surfaces subtle gaps between knowledge, application, and inference.
