Gen AI for JavaScript Devs: Hands-on with OpenAI SDK

Arsalan Yaldram

In our previous posts, we explored the basics of Large Language Models (LLMs) and how to choose the right one for your needs. Now it’s time to roll up our sleeves and dive into the code. This post walks through getting started with the OpenAI SDK in JavaScript, so you can bring the power of ChatGPT directly into your projects.

Getting Your OpenAI Project Keys

First things first—let’s get you set up with the necessary credentials. Head over to the OpenAI platform and log in using your ChatGPT credentials (or create an account if you’re new). Once logged in, go to User Settings > OpenAI API. You might need to set up billing—it’s a quick process and just a formality to get started.

OpenAI organizes your work neatly: your account can house multiple projects under one organization, which is great for managing different AI applications.

Now, let’s create your Project Key. Navigate to API Keys > OpenAI API and click "Create secret key." Select your project, and voilà! Your key is ready. Make sure to copy and store it safely—you won’t be able to see it again. With this key in hand, you’re all set to start working with the OpenAI SDK.

💡 A quick reminder: working with the SDK isn’t free. As we covered in a previous post, LLMs are billed based on the tokens you use (both in your prompts and the AI’s responses). You can keep track of your usage in the OpenAI Billing Dashboard.
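
To make token billing concrete, here is a minimal sketch that estimates the cost of a single call from the usage object that Chat Completions responses include (with prompt_tokens and completion_tokens fields). The estimateCost helper and the per-million-token prices are hypothetical placeholders, not OpenAI’s actual rates, so check the pricing page for current numbers.

```javascript
// Hypothetical helper: estimates the cost of one request from its token usage.
// Prices are USD per one million tokens and are placeholders, not real rates.
function estimateCost(usage, prices) {
  const inputCost = (usage.prompt_tokens / 1_000_000) * prices.inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * prices.outputPerMillion;
  return inputCost + outputCost;
}

// Shape of the `usage` field on a chat completion response (example numbers).
const exampleUsage = { prompt_tokens: 1000, completion_tokens: 500, total_tokens: 1500 };

// Placeholder prices for illustration only.
const examplePrices = { inputPerMillion: 1, outputPerMillion: 2 };

console.log("Estimated cost (USD):", estimateCost(exampleUsage, examplePrices));
```

In a real app you would pass response.usage from an actual request instead of the example object.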

Making a Request with the OpenAI SDK

Now for the fun part—let’s get your Node.js project set up. Follow these steps to get up and running:

  1. Install the necessary packages:

     npm install openai dotenv
    
  2. Create a .env file in the root of your project, and add your OpenAI Project key:

     OPENAI_PROJECT_KEY=your_openai_project_key_here
    
  3. Create a file named app.js and paste the following code:

     require("dotenv/config");
     const { OpenAI } = require("openai");
    
     const openai = new OpenAI({
       apiKey: process.env.OPENAI_PROJECT_KEY,
     });
    
     async function main() {
       const response = await openai.chat.completions.create({
         model: "gpt-4o-mini",
         messages: [{ role: "user", content: "What is the capital of India?" }],
       });
    
       console.log("AI Response", response.choices[0].message);
     }
    
     main();
    
  4. Run the file with node app.js. You should see something like this in your terminal:

     AI Response {
       role: 'assistant',
       content: 'The capital of India is New Delhi.',
       refusal: null
     }
    

Congratulations! You’ve just made your first AI call. Let’s break down what’s happening:

  • Model: When working with the OpenAI SDK, it’s important to select the right LLM for your needs. OpenAI offers various models, each with its own strengths, context length, and pricing. For instance, GPT-3.5 is great for text generation, while gpt-4o and gpt-4o-mini can handle both text and images. Check OpenAI’s pricing page for the cost of each model, and consider your project’s requirements before choosing one. I highly recommend reading my previous post on this topic.

  • Making the Request: You asked the AI a question—“What is the capital of India?”—which you sent in an array of messages with the role set to 'user'. This role tells the AI that the message is coming from a user.

  • The Response: The AI replies with a message where the role is 'assistant'. This indicates the response is from the AI, and the content contains the actual answer: “The capital of India is New Delhi.”

  • The Role Parameter: The role parameter is key to managing conversations. By setting role: 'user', you tell the AI that the message is from the user. When the AI responds, it uses role: 'assistant' to indicate that it’s replying. This system helps structure interactions.
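
The roles above also let you carry a conversation across turns: you append the assistant’s reply to the messages array and send the whole array again with the next question. Here is a minimal sketch of that bookkeeping. The addMessage helper is hypothetical, and the assistant reply is hardcoded so the example runs without an API call.

```javascript
// Hypothetical helper: returns a new messages array with one more message appended.
function addMessage(messages, role, content) {
  return [...messages, { role, content }];
}

// First turn: the user's question.
let messages = addMessage([], "user", "What is the capital of India?");

// In a real app this would be response.choices[0].message.content;
// it is hardcoded here so the sketch runs without an API call.
messages = addMessage(messages, "assistant", "The capital of India is New Delhi.");

// Follow-up question: the model only sees earlier turns if you send them again.
messages = addMessage(messages, "user", "What is its population?");

console.log(messages.map((m) => m.role).join(" -> "));
```

The resulting array is what you would pass as messages in the next openai.chat.completions.create call.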

Customizing Roles

You can add more depth to your interactions by customizing roles. For example, you can introduce a 'system' role to guide the AI’s behavior:

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
        { role: "system", content: "You are a friendly teacher, teaching 4-year-olds." },
        { role: "user", content: "What is the capital of India?" }
    ],
  });

  console.log("AI Response", response.choices[0].message);
}

Here’s what the AI might say:

AI Response {
  role: 'assistant',
  content: "The capital of India is New Delhi! It's a big city where important government buildings are located. Have you ever seen pictures of New Delhi? It has beautiful parks and monuments!",
  refusal: null
}

In this example, the system role acts as a guide, setting the context for the AI. By telling the AI to be a friendly teacher, you influence how it responds to the user’s question. The system role is handy for scenarios where you need consistent behavior from the AI. For example:

  • Customer Support: Set the AI to be patient and empathetic.

  • Educational Tool: Instruct the AI to be detailed and thorough in its explanations.

  • Creative Writing: Guide the AI to adopt a particular style or tone, like being humorous or serious.
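
One way to keep this behavior consistent is to store a system prompt per use case and build the messages array from it. The sketch below assumes three hypothetical presets matching the list above. The SYSTEM_PROMPTS object, its wording, and the buildMessages helper are illustrative and not part of the SDK.

```javascript
// Hypothetical presets: one system prompt per use case from the list above.
const SYSTEM_PROMPTS = {
  support: "You are a patient and empathetic customer support agent.",
  education: "You are a teacher who gives detailed and thorough explanations.",
  creative: "You are a creative writer with a humorous tone.",
};

// Builds the messages array for a request: system prompt first, then the user's question.
function buildMessages(useCase, userQuestion) {
  const systemPrompt = SYSTEM_PROMPTS[useCase];
  if (!systemPrompt) {
    throw new Error(`Unknown use case: ${useCase}`);
  }
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userQuestion },
  ];
}

console.log(buildMessages("support", "My order has not arrived yet."));
```

The returned array can be passed directly as the messages option of openai.chat.completions.create.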

Working with Multimodal LLMs

Using multimodal LLMs like gpt-4o-mini is straightforward. These models can handle both text and images, offering a wide range of applications. Here’s how you can work with an image in your AI request:

require("dotenv/config");
const { OpenAI } = require("openai");
const fs = require("fs");

const openai = new OpenAI({
  apiKey: process.env.OPENAI_PROJECT_KEY,
});

async function main() {
  // readFileSync returns a Buffer, which can be base64-encoded directly.
  const base64Image = fs.readFileSync("laptop.jpeg").toString("base64");

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "You are a friendly teacher, teaching 4-year-olds.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "What is this image about?" },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${base64Image}` },
          },
        ],
      },
    ],
  });

  console.log("AI Response", response.choices[0].message);
}

main();

Here’s an example of what the AI might say:

AI Response {
  role: 'assistant',
  content: "The image shows a laptop, specifically a MacBook Pro, with a scenic nature background displayed on the screen. The background seems to depict hills and trees, suggesting a beautiful outdoor landscape. The laptop's keyboard and trackpad are visible as well. It appears to be resting on a colorful tablecloth.",
  refusal: null
}

In this case, the content of the user message is an array of objects containing both text and image data. The image_url can be either a data URL holding the base64-encoded image (for files stored on your device) or a regular URL for images on the web.
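
Because a data URL has to declare the image’s media type, it helps to derive that type from the file extension rather than hardcode it. Here is a minimal sketch, assuming a hypothetical toDataUrl helper and a small extension-to-MIME table. Neither is part of the SDK.

```javascript
// Maps common image file extensions to their MIME types.
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Hypothetical helper: builds a data URL from a file name and its raw bytes.
function toDataUrl(fileName, bytes) {
  const extension = fileName.slice(fileName.lastIndexOf(".")).toLowerCase();
  const mimeType = MIME_TYPES[extension];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${fileName}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// With a real file you would pass fs.readFileSync("laptop.jpeg") as the bytes.
const imageUrl = toDataUrl("laptop.jpeg", Buffer.from("not a real image"));
console.log(imageUrl);
```

The returned string goes into the url field of the image_url object.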

Streaming

The stream parameter is an option you can use when making requests to the OpenAI API. By default, when you request a completion from the API, the entire response is generated and sent back to you all at once. This can take some time, especially for longer responses.

When you set the stream parameter to true, the API sends the response back in smaller chunks as they are generated. This means you can start processing or displaying the response before the entire completion is finished.

Streaming is most useful where real-time feedback matters, such as chat applications, and for tasks that generate long responses, where you can start processing the output before generation finishes. In both cases it reduces perceived latency, making your application feel faster and more interactive.

async function main() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
        { role: "system", content: "You are a friendly teacher, teaching 4-year-olds." },
        { role: "user", content: "What is the capital of India?" }
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
}

main()
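
If you also need the complete text once streaming finishes (for example, to store it), you can accumulate the chunks while printing them. The sketch below assumes a hypothetical collectStream helper and uses a fake async generator in place of the SDK stream, so it runs without an API call. The chunk shape mirrors the one used in the loop above.

```javascript
// Hypothetical helper: passes each piece of text to onText as it arrives
// (printing it by default) and returns the full text at the end.
// Works with any async iterable whose chunks have the Chat Completions delta shape.
async function collectStream(stream, onText = (text) => process.stdout.write(text)) {
  let fullText = "";
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content || "";
    onText(text);
    fullText += text;
  }
  return fullText;
}

// Stand-in for the SDK stream so the sketch runs without an API call.
async function* fakeStream() {
  for (const piece of ["New", " Delhi", "."]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
  // Some chunks carry no text; an empty delta models that case.
  yield { choices: [{ delta: {} }] };
}

collectStream(fakeStream()).then((fullText) => {
  console.log("\nFull text:", fullText);
});
```

With the real SDK you would pass the stream returned by openai.chat.completions.create (with stream: true) instead of fakeStream().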

Conclusion

In this post, we finally got our hands dirty with some code, starting with the OpenAI Node.js SDK. I encourage you to experiment with different models and prompts to familiarize yourself with the SDK. In our next post, we’ll dive deeper into the OpenAI SDK, exploring LLM memory, tokens, temperature, and other parameters you can tweak. Until next time, happy coding!
