Get Started Easily with LangchainJS and Ollama


This blog post is a from-scratch introduction to using the LangchainJS framework with Ollama as the "LLM engine".
Ollama is an open-source project that allows you to run "large language models" (LLMs) locally on a personal computer. Ollama also offers a command-line interface and a REST API for integration with other applications.
Today, we're interested in the REST API so we can build our own generative AI applications. Ollama offers a JavaScript SDK, but we'll use the LangchainJS framework instead: it provides an abstraction that lets you develop generative AI applications against multiple AI solutions. Simply put, if you build an application that works with Ollama using LangchainJS, porting it to another AI system (Anthropic, MistralAI, Gemini, ...) becomes much easier, as does building an application that uses several solutions at the same time.
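To give a concrete idea of this portability, here is a minimal sketch of my own (not code from this post): it assumes you have installed the @langchain/anthropic package and exported an ANTHROPIC_API_KEY, and the model name is purely illustrative. Only the construction of the model changes; the invoke and stream calls used throughout this post stay the same:

import { ChatOllama } from "@langchain/ollama";
// import { ChatAnthropic } from "@langchain/anthropic"; // hypothetical alternative provider

// Local model served by Ollama
const llm = new ChatOllama({
  model: "qwen2.5:0.5b",
  baseUrl: "http://localhost:11434",
});

// Swapping providers mostly means swapping this constructor, for example:
// const llm = new ChatAnthropic({ model: "claude-3-5-haiku-latest" }); // illustrative model name

const response = await llm.invoke("Who is Jean-Luc Picard?");
console.log(response.content);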
Geeky and Introductory Note
I'm a big fan of Star Trek (all series) and I often use characters as examples in my articles and presentations. I've noticed that depending on the country and generation, Star Trek culture isn't necessarily the same, and my examples can fall flat.
Star Trek is an American science fiction franchise. The Star Trek universe takes place in a future where humanity has developed interstellar travel and is part of the United Federation of Planets, an alliance of different species and civilizations.
And today, the character who will accompany me will be Jean-Luc Picard: He is the captain of the starship USS Enterprise-D in the television series "Star Trek: The Next Generation" (1987-1994), then of the Enterprise-E in the films that followed, and finally the protagonist of the series "Star Trek: Picard" (2020-2023).
Prerequisites
To reproduce the examples in this blog post, you will need the following:
On Ollama's Side
You will need Ollama, then you will need a model (an LLM). We will use a "very small" LLM, so that even if you don't have a powerful machine with a GPU, you can still comfortably run the examples. My preferred choice is qwen2.5:0.5b. It's not the smartest of all, but it works even on a Raspberry Pi 5 (8GB RAM) (its bigger brothers 1.5b and 3b will also work reasonably well on small configurations).
So, get the model with the following command:
ollama pull qwen2.5:0.5b
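If you want to check that the model was downloaded correctly, you can list the models available locally:

ollama list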
On LangchainJS's Side
Create a directory with the name of your choice. In this directory, create a package.json file with the following content:
{
  "name": "first-steps",
  "type": "module",
  "dependencies": {
    "@langchain/ollama": "^0.1.4",
    "dotenv": "^16.4.7",
    "langchain": "^0.3.12"
  }
}
And finally, run the following command to install the necessary dependencies:
npm install
And there you have it, you're ready to start.
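Optional note: dotenv appears in the dependencies because the examples read the OLLAMA_BASE_URL environment variable. If Ollama runs somewhere other than your own machine, one possible approach (my suggestion, not something required by the rest of this post) is to create a .env file and load it with import "dotenv/config" at the top of your script:

# .env
OLLAMA_BASE_URL=http://localhost:11434

If the variable isn't set, the code below simply falls back to http://localhost:11434.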
First Question to qwen2.5:0.5b
In the same directory, create a file index.js (you can actually name it whatever you want), with the following content:
import { ChatOllama } from "@langchain/ollama";

const llm = new ChatOllama({
  model: 'qwen2.5:0.5b',
  baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
})

const question = "Who is Jean-Luc Picard?"

const response = await llm.invoke(question)

console.log(`Answer: ${response.content}`)
Then, save and run the program:
node index.js
And you should get (after a shorter or longer wait time) a result like this (don't worry about whether the answer is correct, qwen2.5:0.5b is a baby LLM and doesn't know everything; the important thing is to understand how to use LangchainJS):
Answer: Jean-Luc Picard is the main character in the Star Trek Enterprise episode
"The Defenders". He is a senior officer aboard the Enterprise-D and serves as the captain of
the ship.
In this episode, Jean-Luc is a highly skilled and experienced pilot who takes on the role of
a commanding officer in a dangerous situation. He leads his crew through an interstellar
conflict with a powerful alien species known as the Borg.
During the battle, he sacrifices himself to protect his crewmates.
The episode also highlights themes of loyalty, sacrifice, and the importance of teamwork in
the face of adversity. Jean-Luc is a symbol of the Star Trek universe's values and plays
an important role in shaping the characters' relationships with other crew members.
Jean-Luc Picard is an iconic figure in science fiction literature, known for his charismatic
personality and ability to lead under pressure.
He has become one of the most beloved and respected characters in the Star Trek franchise.
If you run the program multiple times, you won't necessarily get the same response.
So, as you see, asking a question to an LLM with Ollama and LangchainJS is not rocket science.
It is possible to receive the LLM's response as it builds it. That is, you can ask Ollama to "stream" the elements of the response. This way you can display the response progressively, thus reducing the "waiting" effect and providing a better user experience.
For this, instead of using llm.invoke(question), we'll use llm.stream(question). So modify your code as follows:
import { ChatOllama } from "@langchain/ollama";

const llm = new ChatOllama({
  model: 'qwen2.5:0.5b',
  baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
})

const question = "Who is Jean-Luc Picard?"

const stream = await llm.stream(question)

for await (const chunk of stream) {
  process.stdout.write(chunk.content)
}
So:
We define a "stream" with const stream = await llm.stream(question).
Then, using the loop for await (const chunk of stream), we can display the response bits (the chunks) as they arrive.
Again, save and run the program:
node index.js
Feel free to try with other LLMs (for example with the bigger brothers of qwen2.5:0.5b) to find your preferred model.
Improve the LLM's Responses with a List of Messages
It is possible to influence the behavior of the LLM by providing additional information to guide it. So rather than just asking a simple question like "Who is Jean-Luc Picard?", you can send it a list of messages. The methods llm.invoke and llm.stream also accept an array of messages ([messages]) as a parameter. These messages will help improve its responses and also provide it with additional information it doesn't have. You may have noticed that qwen2.5:0.5b doesn't necessarily give very accurate answers (and sometimes quite incorrect ones). So let's see how to educate our baby model a bit.
We'll build a list of messages.
The structure of a message is as follows:
[role, text_content]
role has three possible values:
system: allows giving instructions, context, ...
user: the user's question for the LLM
assistant: allows adding a response that the LLM could have made (this is the role used in the code below)
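For example, a message list combining these three roles (the text content here is just illustrative) could look like this:

var messages = [
  ["system", "You are a helpful AI agent and an expert in the Star Trek domain."],
  ["user", "Who is Jean-Luc Picard?"],
  ["assistant", "Jean-Luc Picard is the captain of the USS Enterprise-D."]
]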
Let's Have a Little Conversation About Jean-Luc Picard
If you test, still with qwen2.5:0.5b, the following code:
import { ChatOllama } from "@langchain/ollama";

const llm = new ChatOllama({
  model: 'qwen2.5:0.5b',
  baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
})

var question = "Who is Jean-Luc Picard?"

var stream = await llm.stream(question)
for await (const chunk of stream) {
  process.stdout.write(chunk.content)
}

console.log("\n\n-----------------------------------\n")

question = "Who is his best friend?"

stream = await llm.stream(question)
for await (const chunk of stream) {
  process.stdout.write(chunk.content)
}
You'll get responses like this:
Picard is known for creating iconic characters such as the TARDIS and the TARDIS crew,
who inhabit the starship "Enterprise" of the science fiction series "Star Trek."
In addition to his work in comics and illustration, Jean-Luc Picard has also been involved
in film projects, particularly with Steven Spielberg's "Interstellar."
-----------------------------------
As an AI language model, I don't have personal relationships or friendships.
My main focus is to provide helpful responses and assist with tasks to the best of
my abilities based on my training data. However, if you're looking for a specific friend,
I can certainly help by providing information about them!
Regarding the first response, it's completely confused and qwen2.5:0.5b associates Jean-Luc Picard with the excellent British series Doctor Who (that's the reference to the TARDIS). I happen to love that series too, but Jean-Luc has absolutely nothing to do with The Doctor.
Regarding the second question "Who is his best friend?", qwen2.5:0.5b doesn't see what we're talking about at all, because it has no memory: we made a second request, so we start from scratch.
Let's see how to fix all this.
Improve the Response with Context
Modify the source code as follows:
import { ChatOllama } from "@langchain/ollama";

const llm = new ChatOllama({
  model: 'qwen2.5:0.5b',
  baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
})
let systemInstructions = `You are a helpful AI agent and an expert in the Star Trek domain.
Answer questions to the best of your ability,
first using the provided context which takes precedence over your current knowledge.`
let context = `CONTEXT:
Jean-Luc Picard is a fictional character in the Star Trek franchise.
He is the captain of the starship USS Enterprise-D and later the USS Enterprise-E in the television series Star Trek:
The Next Generation (1987-1994), its subsequent films,
and the series Star Trek: Picard (2020-2023).
Jean-Luc Picard's closest friends include:
1. William Riker - His First Officer and trusted confidant
2. Beverly Crusher - The ship's doctor and occasional romantic interest
3. Guinan - The Enterprise's bartender who offers wisdom and perspective
4. Data - The android officer with whom Picard develops a mentor-like friendship
5. Deanna Troi - The ship's counselor who shares a professional bond with Picard
6. Geordi La Forge - The chief engineer who respects Picard deeply
7. Worf - The Klingon security officer who values Picard's leadership
`
let userQuestion = `Who is Jean-Luc Picard?`
var messages = [
  ["system", systemInstructions],
  ["system", context],
  ["user", userQuestion]
]
var answer = ""

var stream = await llm.stream(messages)
for await (const chunk of stream) {
  answer += chunk.content
  process.stdout.write(chunk.content)
}
With systemInstructions, I tell the LLM what its job is and instruct it to first use the data I provide.
With context, I provide additional data to the LLM. You'll notice that I start the text with CONTEXT: to connect it with the system instructions. I've noticed that this helps the model.
Notes: the longer the context, the more it needs to be structured (a topic we'll cover in a future blog post), and the "smaller" the LLM, the harder it will be for it to stay focused on a large context (we'll also address this in a future blog post), so you'll have to experiment to achieve the expected results.
I then grouped the messages with their respective roles in an array of messages that I can "send" to Ollama with llm.stream(messages):
var messages = [
  ["system", systemInstructions],
  ["system", context],
  ["user", userQuestion]
]
✋ You'll also notice that I save the LLM's response in an answer variable that will be used later:
var answer = ""

var stream = await llm.stream(messages)
for await (const chunk of stream) {
  answer += chunk.content
  process.stdout.write(chunk.content)
}
If you run the program again, you should get a more relevant response, like this:
Jean-Luc Picard is a fictional character in the Star Trek franchise.
He is the captain of the starship USS Enterprise-D and later the USS Enterprise-E
in the television series "The Next Generation" (1987-1994), its subsequent films,
and the long-running "Star Trek: Picard" series (2020-present).
If you run it several times, you won't necessarily always get the same response, and sometimes the LLM will "improvise", but we're getting closer to the truth, and we'll see later how to frame that a bit more.
Now let's see how to fix the LLM's memory issue.
But Who is Jean-Luc Picard's Best Friend?
We'll move on to the next part of the conversation by adding this to the end of our code to ask the second question "Who is his best friend?":
console.log("\n\n-----------------------------------\n")

let newUserQuestion = `Who is his best friend?`

messages.push(
  ["assistant", answer],
  ["user", newUserQuestion]
)

stream = await llm.stream(messages)
for await (const chunk of stream) {
  answer += chunk.content
  process.stdout.write(chunk.content)
}
So for the second request, I reuse the previous list of messages and add the response with the role "assistant" (which represents the LLM) and finally my second question. This way I maintain a "conversational memory", which allows the model to continue the conversation and respond coherently.
So, if you run the program again, you should get a response like this:
Jean-Luc Picard is a fictional character from the Star Trek franchise,
most notably known for being the captain of the starship USS Enterprise-D and
later its flagship, the USS Enterprise-E.
He is one of the main antagonists in the series "The Next Generation" (1987-1994),
which features his portrayal as William Riker on this vessel.
Jean-Luc Picard has had a distinguished career in Star Trek, serving multiple roles such
as captain, first officer, and mentor for characters including Wesley Crusher (Borg),
Worf, and Deanna Troi.
-----------------------------------
According to the context provided, Jean-Luc Picard's closest friends are William Riker,
Beverly Crusher, Geordi La Forge, Deanna Troi, and Worf.
This list includes both physical and emotional friends as well as fellow officers such
as Captain Data, Chief Engineer Worf, Chief Assistant Engineer Wesley Crusher,
and Chief Surgeon Deanna Troi.
That's much better! 🙂
Notes: The more messages you add to preserve the conversation memory, the more you increase the size of the content you send to the model. Depending on the size of the model and its capabilities, there will come a point where it will no longer be able to process the information correctly. So you'll need to choose, for example, how much history you keep. It's also possible to specify the context size that the model can ingest (here the context is represented by the sum of the messages sent). LangchainJS provides abstractions to manage conversational memory more easily (this will also be the subject of another blog post).
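To make that idea concrete, here is a minimal sketch of my own (not one of the LangchainJS memory abstractions mentioned above; the trimMessages helper and MAX_HISTORY value are hypothetical) that keeps the system messages and only the last few conversation messages before each new request:

// Illustrative helper: keep the system messages and the last MAX_HISTORY conversation messages
const MAX_HISTORY = 6

function trimMessages(messages) {
  // System messages (instructions + context) are always kept
  const systemMessages = messages.filter(([role]) => role === "system")
  // Keep only the most recent user/assistant messages
  const conversation = messages.filter(([role]) => role !== "system")
  return [...systemMessages, ...conversation.slice(-MAX_HISTORY)]
}

// Before each new request, send the trimmed history instead of the full one:
// stream = await llm.stream(trimMessages(messages))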
The sequence diagram below summarizes the two interactions between the application and Ollama:

sequenceDiagram
    participant User
    participant App as JavaScript Application
    participant Ollama as Ollama API (qwen2.5:0.5b)

    Note over App: Initialize ChatOllama with model config
    App->>App: Set system instructions
    App->>App: Set Star Trek context
    App->>App: Prepare initial user question

    Note over App: First Interaction
    App->>Ollama: Stream request with messages array
    Note right of Ollama: Process:<br/>- System instructions<br/>- Context about Picard<br/>- "Who is Jean-Luc Picard?"
    Ollama-->>App: Stream response chunks
    App->>App: Accumulate response chunks
    App->>User: Display streamed output

    Note over App: Second Interaction
    App->>App: Add previous answer to messages
    App->>App: Add follow-up question<br/>"Who is his best friend?"
    App->>Ollama: Stream request with updated messages
    Note right of Ollama: Process with full conversation history
    Ollama-->>App: Stream response chunks
    App->>App: Accumulate response chunks
    App->>User: Display streamed output
Improve the LLM's Responses with Some Adjustments
When "playing" with qwen2.5:0.5b
, you may have noticed that its responses for the same subject are not consistent, sometimes the LLM "hallucinates" and goes in loops. Nevertheless, it is possible to "propose" settings for Ollama to modify the model's behavior.
You can do this when creating the client, as follows:
const llm = new ChatOllama({
  model: 'qwen2.5:0.5b',
  baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
  temperature: 0.0,
  repeatLastN: 2,
  repeatPenalty: 2.2,
  topK: 10,
  topP: 0.5,
})
Some Explanations
I'll explain the options used with Ollama:
Temperature (0.0): Controls the level of creativity/randomness in responses. A value of 0.0 makes responses very deterministic and predictable. The model will always choose the most probable tokens, which is ideal for tasks requiring consistent and accurate responses like code or facts.
RepeatLastN (2): Sets how far back the model looks when applying the repetition penalty; this helps avoid repetitions.
RepeatPenalty (2.2): Penalizes tokens that have already been generated. A value of 2.2 is relatively high, which means the model will strongly avoid repeating the same patterns. This helps reduce loops and unnecessary repetitions.
TopK (10): Limits the choices for the next token to the 10 most probable tokens. It's a way to filter unlikely options while maintaining some diversity. This helps the LLM "stay focused".
TopP (0.5): The model ignores low-probability tokens and focuses on the most relevant choices. Again, this helps the LLM "stay focused".
This configuration favors consistency and accuracy (low temperature) while actively avoiding repetitions (high repeatPenalty), with fairly restrictive token selection (topK and topP).
Note: Tokens are the basic units used by LLMs to process text. A token can correspond to an entire word, part of a word, a character, or a punctuation mark, depending on how the model was trained to divide text. The number of tokens directly influences the memory and performance of the LLM, particularly its ability to maintain context in a conversation.
If you run the modified code, you'll get shorter and more deterministic responses:
Jean-Luc Picard is a fictional character in the Star Trek franchise.
He is the captain of the starship USS Enterprise-D and later the USS Enterprise-E
in the television series Star Trek: The Next Generation (1987-1987), its subsequent films,
and the series Star Trek: Picard (2016-2017).
-----------------------------------
Jean-Luc Picard's closest friends include William Riker, Beverly Crusher, Guinan,
Data (the android officer), Deanna Troi, Geordi La Forge, and Worf
Try running the code several times, and also modifying the parameters to check the influence they can have on the LLM.
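As a quick way to experiment (a simple sketch of my own, not code from the original post), you could loop over a few temperature values and compare the answers:

import { ChatOllama } from "@langchain/ollama";

// Illustrative experiment: compare answers at different temperatures
for (const temperature of [0.0, 0.5, 1.0]) {
  const llm = new ChatOllama({
    model: 'qwen2.5:0.5b',
    baseUrl: process.env.OLLAMA_BASE_URL || "http://localhost:11434",
    temperature,
  })
  const response = await llm.invoke("Who is Jean-Luc Picard?")
  console.log(`temperature=${temperature}:\n${response.content}\n`)
}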
It's also possible to override these settings on a per-request basis:
const stream = await llm.stream(messages, {
  temperature: 0.5,
  topK: 10,
  topP: 0.5,
})
And there you have it! That's it for today. You now have the elements to move forward in creating generative AI applications with LangchainJS and Ollama.
You can find the source code used here: https://github.com/ollama-tlms-langchainjs/01-initialize/tree/main/01-first-steps