Get Structured Outputs from OpenAI Using Pydantic

Are you struggling to get structured outputs from OpenAI? Tired of inconsistent formats and hallucinated fields? 🤯

Well, worry no more! OpenAI now supports Structured Outputs, exposed through the beta namespace of its Python SDK. Here's how you can take advantage of it.

Using Pydantic Models for Structured Outputs

First, you need to define your Pydantic models, which serve as a schema for the values you want to extract or generate.

Step 1: Define Your Schema

Let's define a schema to extract novel details from OpenAI. Create a file called schema.py and add the following:

from pydantic import BaseModel, Field
from typing import List
from datetime import date

class NovelDetails(BaseModel):
    novel_name: str = Field(..., description="The name of the novel")
    writer_name: str = Field(..., description="The author's name")
    year_published: date = Field(..., description="The publication date of the novel")

class Novel(BaseModel):
    novels: List[NovelDetails] = Field(..., description="List of novels with their details")

Step 2: Use OpenAI's Structured Output API

Now, let's use OpenAI's beta structured output API to extract these fields.

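from openai import OpenAI
from schema import Novel

# The client reads the API key from the OPENAI_API_KEY environment variable,
# so the key never has to be hardcoded in the script.
client = OpenAI()

# parse() accepts the Pydantic model directly and validates the reply against it.
# Newer SDK releases also expose this helper as client.chat.completions.parse.
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # Structured Outputs needs gpt-4o-2024-08-06 or later, or gpt-4o-mini
    messages=[
        {"role": "system", "content": "You are a book expert with vast knowledge about books. Answer user questions accurately."},
        {"role": "user", "content": "Give me a list of 10 thriller novels."}
    ],
    response_format=Novel,
)

# .parsed is already a Novel instance (it is None if the model refused to answer).
parsed_data = response.choices[0].message.parsed
print(parsed_data.model_dump_json(indent=2))

What’s Happening Here?

Define the schema: The Novel class describes the exact shape the API response must follow.
Send a request: client.beta.chat.completions.parse takes the Novel model as response_format and asks OpenAI to return data matching it.
Parse the response: The SDK validates the reply and returns it as a Novel instance in message.parsed, so no manual JSON parsing is needed.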
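
One case worth handling is a refusal: when the response comes from the SDK's parse helper, message.parsed can be None and message.refusal carries the model's explanation instead. The helper below is a hedged sketch of that check. unwrap_parsed is a hypothetical name, and the SimpleNamespace objects are stand-ins for real response messages so the snippet runs without an API call.

```python
from types import SimpleNamespace


def unwrap_parsed(message):
    """Return message.parsed, or raise if the model refused or nothing was parsed."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        raise RuntimeError(f"Model refused the request: {refusal}")
    if message.parsed is None:
        raise ValueError("No parsed content in the response")
    return message.parsed


# Stand-in messages for illustration only; a real one would come from
# response.choices[0].message.
ok_message = SimpleNamespace(parsed={"novels": []}, refusal=None)
refused_message = SimpleNamespace(parsed=None, refusal="I can't help with that.")

print(unwrap_parsed(ok_message))
try:
    unwrap_parsed(refused_message)
except RuntimeError as exc:
    print(exc)
```

In the script above, this would be called as unwrap_parsed(response.choices[0].message) before using the data.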

Output Example

{
  "novels": [
    {
      "novel_name": "Gone Girl",
      "writer_name": "Gillian Flynn",
      "year_published": "2012-05-24"
    },
    {
      "novel_name": "The Girl with the Dragon Tattoo",
      "writer_name": "Stieg Larsson",
      "year_published": "2005-08-23"
    }
  ]
}
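
If you store this JSON and load it later, the same models can validate it again. The sketch below assumes Pydantic v2 (model_validate_json) and repeats trimmed versions of the models so it runs on its own. It loads one entry from the example output, then shows that a malformed date is rejected rather than silently accepted.

```python
from datetime import date
from typing import List

from pydantic import BaseModel, ValidationError


class NovelDetails(BaseModel):
    novel_name: str
    writer_name: str
    year_published: date


class Novel(BaseModel):
    novels: List[NovelDetails]


raw = """
{
  "novels": [
    {
      "novel_name": "Gone Girl",
      "writer_name": "Gillian Flynn",
      "year_published": "2012-05-24"
    }
  ]
}
"""

# Valid JSON is converted into typed objects; the ISO string becomes a date.
catalog = Novel.model_validate_json(raw)
first = catalog.novels[0]
print(first.novel_name, first.year_published.year)

# A value that is not an ISO date fails validation.
bad = '{"novels": [{"novel_name": "Gone Girl", "writer_name": "Gillian Flynn", "year_published": "sometime in 2012"}]}'
try:
    Novel.model_validate_json(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"Rejected: {exc.error_count()} validation error(s)")
```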