Run Language Models Locally with Ollama: A Comprehensive Guide

Spheron Network
7 min read

Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine. With Ollama, you can easily download, install, and interact with LLMs without the usual complexities.

To get started, download Ollama from the official site (ollama.com/download). Once installed, open a terminal and type:

ollama run phi3

OR

ollama pull phi3
ollama run phi3

Both commands fetch the layers of the "phi3" model: ollama pull only downloads it, while ollama run downloads it if needed and then starts it. Once the model is loaded, Ollama drops you into a REPL (Read-Eval-Print Loop), an interactive session where you can type prompts and commands and see immediate results.

To explore the available commands within the REPL, type:

/?

This will show you a list of commands you can use. For example, to exit the REPL, type /bye. You can also display the models you have installed using:

ollama ls

If you need to remove any model, use:

ollama rm phi3

For a complete list of available models, visit the Ollama model library (ollama.com/library), which includes details such as model sizes and parameter counts. Ollama also has hardware requirements: plan on at least 8 GB of RAM to run a 7B model, 16 GB for a 13B model, and 32 GB for a 33B model. If you have a GPU, Ollama supports it (see the Ollama GitHub page for details); on a CPU alone, expect noticeably slower generation.

Ollama also allows you to set a custom system prompt. For example, to instruct the system to explain concepts at a basic level, you can use:

/set system Explain concepts as if you are talking to a primary school student.

You can then save and reuse this setup by giving it a name:

/save forstudent

To run this system prompt again:

ollama run forstudent
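
Under the hood, /save creates a new local model named forstudent. You can capture the same setup declaratively in a Modelfile, Ollama's format for building custom models. A minimal sketch (the file contents here are illustrative):

FROM phi3
SYSTEM "Explain concepts as if you are talking to a primary school student."

Build and run it with:

ollama create forstudent -f Modelfile
ollama run forstudent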

Integration with LangChain

Ollama can be used with LangChain, a framework for composing prompts, models, and other components into more complex LLM applications. To get started with LangChain and Ollama, first pull the required model:

ollama pull llama3

Then, install the necessary packages:

pip install langchain langchain-ollama ollama
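
The ollama package installed above can call a local model directly, before adding LangChain on top. A minimal sketch:

import ollama

# Send one chat message to the locally running llama3 model.
reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "What's up?"}],
)
print(reply["message"]["content"])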

LangChain adds a composable layer on top. For example, you can invoke a basic conversation:

from langchain_ollama import OllamaLLM

# Wrap the locally served llama3 model in a LangChain-compatible interface.
model = OllamaLLM(model="llama3")

# Send a single prompt and print the full response.
response = model.invoke("What's up?")
print(response)

The model might respond with something like:

"Not much! Just an AI, waiting to chat with you. How about you? What's new and exciting in your world?"

Building a Simple Chatbot

Using LangChain, you can also build a simple AI chatbot:

from langchain_ollama import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate

template = """
The user will ask you questions. Answer them.

The history of this conversation: {context}

Question: {question}

Answer:
"""

model = OllamaLLM(model="llama3")
prompt = ChatPromptTemplate.from_template(template)
chain = prompt | model  # pipe the formatted prompt into the model

def chat():
    context = ""
    print("Welcome to the AI Chatbot! Type 'exit' to quit.")
    while True:
        question = input("You: ")
        if question.lower() == "exit":
            break
        # Fill the template with the running history and the new question.
        response = chain.invoke({"context": context, "question": question})
        print(f"AI: {response}")
        # Append this exchange so the model can refer back to it.
        context += f"\nUser: {question}\nAI: {response}"

chat()

This will create an interactive chatbot session where you can ask the AI questions, and it will respond accordingly. For example:

You: What's up?
AI: Not much, just getting started on my day. How about you?
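
One caveat: because context grows with every exchange, a long session will eventually overflow the model's context window. A simple guard is to keep only the most recent turns; the turn limit below is an arbitrary assumption:

# Keep only the last MAX_TURNS exchanges; the limit is an arbitrary choice.
MAX_TURNS = 10

def trim_context(turns):
    """Join the most recent turns into the context string fed to the prompt."""
    return "\n".join(turns[-MAX_TURNS:])

Storing each exchange in a list and joining it with trim_context() before calling chain.invoke() keeps the prompt bounded.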

Using AnythingLLM with Ollama

AnythingLLM is another useful application: an AI agent and retrieval-augmented generation (RAG) tool that can also run fully locally. To try it out, pull a model, such as:

ollama pull llama3:8b-instruct-q8_0

In AnythingLLM, select Ollama as the provider in the preferences and assign a name to your workspace. Responses can be slow on modest hardware, but once configured, the setup works reliably.

You can also interact with Ollama through a web UI, such as Open WebUI, by following its installation instructions.
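
Web UIs like this talk to the REST API that the Ollama server exposes locally (port 11434 by default). You can call it directly as well; a minimal sketch using Python's requests library:

import requests

# Ask the local Ollama server for a single, non-streamed completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])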

For more details, visit Ollama’s official pages and documentation to explore the full range of features and models available.

Powerful Tools You May Not Have Heard Of

Several tools complement or serve as alternatives to LangChain and AnythingLLM for working with large language models (LLMs) and building AI-powered applications. They orchestrate interactions with LLMs, enabling more advanced AI-driven workflows, automating tasks, or integrating AI into other applications. Here are some notable examples:

1. Haystack by Deepset

Haystack is an open-source framework for building search engines and question-answering systems with LLMs. It lets developers connect components such as retrievers, readers, and generators into an information-retrieval pipeline.

Key Features:

  • Offers a pipeline-based approach for search, Q&A, and generative tasks.

  • Supports integration with models from Hugging Face, OpenAI, and local models.

  • Can combine LLMs with external data sources such as databases, knowledge graphs, and APIs.

  • Great for production-grade applications with robust scalability and reliability.

Link: Haystack GitHub (https://github.com/deepset-ai/haystack)

2. LlamaIndex (formerly GPT Index)

LlamaIndex is a data framework that helps you index and retrieve information efficiently from large datasets using LLMs. It's designed for document-based workflows: structuring data, indexing it, and retrieving it when interacting with LLMs.

Key Features:

  • Integrates with external data sources such as PDFs, HTML, CSVs, or custom APIs.

  • Builds on top of LLMs for more efficient data querying and document summarization.

  • Helps optimize the performance of LLMs by constructing memory-efficient indices.

  • Provides compatibility with LangChain and other frameworks.

Link: LlamaIndex GitHub (https://github.com/run-llama/llama_index)
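
As a quick illustration, here is the typical indexing-and-querying flow. This sketch assumes a local data/ folder of documents; by default LlamaIndex calls OpenAI for embeddings and generation, so an API key would be needed:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents, build an in-memory vector index, then query it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are these documents about?"))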

3. Chroma

Chroma is an open-source embedding database designed for LLMs. It helps store and query high-dimensional vector embeddings of data, enabling you to work with semantic search, retrieval-augmented generation (RAG), and more.

Key Features:

  • Embedding search for documents or large datasets using models like OpenAI or Hugging Face transformers.

  • Scalable and optimized for efficient retrieval of large datasets with millisecond latency.

  • Works well for semantic search, content recommendations, or building conversational agents.

Link: Chroma GitHub (https://github.com/chroma-core/chroma)
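
A minimal sketch of storing and querying documents with Chroma's default embedding function (the documents and query are illustrative):

import chromadb

# Create an in-memory client and a collection to hold the documents.
client = chromadb.Client()
collection = client.create_collection("docs")

# add() embeds the documents automatically using the default embedding function.
collection.add(
    documents=["Ollama runs LLMs locally.", "Pinecone is a managed vector database."],
    ids=["doc1", "doc2"],
)

# Semantic search: the closest document is returned even without keyword overlap.
results = collection.query(query_texts=["run a model on my own machine"], n_results=1)
print(results["documents"])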

4. Hugging Face Transformers

Hugging Face provides a library of pretrained transformers that can be used for various NLP tasks such as text generation, question-answering, and classification. It offers easy integration with LLMs, making it a great tool for working with different models in a unified way.

Key Features:

  • Supports a wide range of models, including GPT, BERT, T5, and custom models.

  • Provides pipelines for quick setup of tasks like Q&A, summarization, and translation.

  • Hugging Face Hub hosts a large variety of pre-trained models ready for deployment.

Link: Hugging Face Transformers (https://github.com/huggingface/transformers)
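
For example, the pipeline API sets up a task in a few lines. This sketch uses the default question-answering model, which is downloaded on first use:

from transformers import pipeline

# Extractive QA: the answer is a span pulled from the supplied context.
qa = pipeline("question-answering")
result = qa(
    question="What does Ollama do?",
    context="Ollama is an open-source platform for running large language models locally.",
)
print(result["answer"])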

5. Pinecone

Pinecone is a managed vector database that allows you to store, index, and query large-scale vectors produced by LLMs. It is designed for high-speed semantic search, vector search, and machine-learning applications.

Key Features:

  • Fast, scalable, and reliable vector search for applications requiring high performance.

  • Integrates seamlessly with LLMs to power retrieval-based models.

  • Handles large datasets and enables search across millions or billions of vectors.

Link: Pinecone Website (https://www.pinecone.io)
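
A minimal sketch of upserting and querying vectors. It assumes an API key and an existing index named "demo" whose dimension matches the toy vectors below:

from pinecone import Pinecone

# The key and index name are placeholders for a pre-created index.
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("demo")

# Store one toy vector, then look up its nearest neighbor.
index.upsert(vectors=[("vec1", [0.1, 0.2, 0.3])])
print(index.query(vector=[0.1, 0.2, 0.3], top_k=1))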

6. OpenAI API

OpenAI’s API gives access to a wide range of LLMs, including the GPT series (like GPT-3.5 and GPT-4). It provides text generation, summarization, translation, and code generation capabilities.

Key Features:

  • Access to state-of-the-art models like GPT-4 and DALL-E for image generation.

  • Supports fine-tuning and prompt-based control of model behavior.

  • Simplifies AI integration into applications without needing to manage infrastructure.

Link: OpenAI API (https://platform.openai.com)
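
A minimal sketch of a chat completion with the official Python SDK, assuming an OPENAI_API_KEY environment variable is set:

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize what a vector database does."}],
)
print(response.choices[0].message.content)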

7. Rasa

Rasa is an open-source framework for building conversational AI assistants and chatbots. It allows for highly customizable AI assistants trained on specific tasks and workflows, making it a good alternative to pre-trained LLM chatbots.

Key Features:

  • Supports NLU (Natural Language Understanding) and dialogue management.

  • Highly customizable for domain-specific applications.

  • Can integrate with LLMs to enhance chatbot capabilities.

Link: Rasa Website (https://rasa.com)

8. Cohere

Cohere offers NLP APIs and large-scale language models similar to OpenAI. It focuses on tasks like classification, text generation, and search, providing a powerful platform for LLM-based applications.

Key Features:

  • Provides easy access to LLMs through an API, allowing developers to implement NLP tasks quickly.

  • Offers fine-tuning options for domain-specific applications.

  • Well-suited for tasks like customer support automation and text classification.

Link: Cohere Website (https://cohere.com)
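
A minimal sketch of chatting with a Cohere model through its Python SDK; the API key is a placeholder, and the model name may vary by account and SDK version:

import cohere

# The key and model name are illustrative placeholders.
co = cohere.Client("YOUR_API_KEY")
response = co.chat(model="command-r", message="Classify this support ticket: 'My order never arrived.'")
print(response.text)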

9. Vercel AI SDK

Vercel AI SDK provides tools for building AI-powered applications using frameworks like Next.js. It simplifies the development process by integrating APIs from OpenAI, Hugging Face, and other AI providers into web applications.

Key Features:

  • Seamless integration with AI models in serverless environments.

  • Supports building interactive applications with fast deployments using Vercel’s infrastructure.

  • Focuses on web-based applications and LLM-powered front-end experiences.

Link: Vercel AI SDK (https://sdk.vercel.ai)


Conclusion

Beyond LangChain and AnythingLLM, many powerful tools and frameworks cater to different needs when working with LLMs. Whether you want to build conversational agents, semantic search engines, or specialized AI applications, platforms like Haystack, LlamaIndex, Chroma, and others offer flexible and scalable solutions. Depending on your specific use case, you can choose the most suitable tool for integrating LLMs into your projects.
