Run Language Models Locally with Ollama: A Comprehensive Guide
Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine. With Ollama, you can easily download, install, and interact with LLMs without the usual complexities.
To get started, download Ollama from the official Ollama website. Once installed, open a terminal and type:
ollama run phi3
Alternatively, pull the model first and then run it:
ollama pull phi3
ollama run phi3
This will download the required layers of the model "phi3". After the model is loaded, Ollama enters a REPL (Read-Eval-Print Loop), which is an interactive environment where you can input commands and see immediate results.
To explore the available commands within the REPL, type:
/?
This will show you a list of commands you can use. For example, to exit the REPL, type /bye. You can also display the models you have installed using:
ollama ls
To remove a model, pass its name to the rm command, for example:
ollama rm phi3
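The installed models can also be listed from code. The sketch below assumes a local Ollama server is running at its default address (http://localhost:11434) and uses its /api/tags endpoint; the helper names `parse_model_names` and `list_local_models` are hypothetical. Only the parsing step is exercised here, on a hand-written sample response, so the snippet runs without a server.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list:
    """Ask a running Ollama server which models are installed."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))


# Offline demonstration with a hand-written sample response.
sample = {"models": [{"name": "phi3:latest"}, {"name": "llama3:latest"}]}
print(parse_model_names(sample))
```

With the server running, calling `list_local_models()` should return the same names that `ollama ls` prints.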
For a complete list of available models, visit the Ollama model library, which lists model sizes, parameter counts, and more. Ollama also has hardware requirements: you should have at least 8 GB of RAM to run a 7B model, 16 GB for a 13B model, and 32 GB for a 33B model. Ollama supports GPU acceleration, and more details can be found on its GitHub page. If you are running on a CPU only, expect noticeably slower generation.
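The RAM guidance can be turned into a quick check before pulling a model. This is a minimal sketch that only encodes the three figures quoted in this guide; the function name `min_ram_gb` is hypothetical, and rounding in-between sizes up to the next tier is an assumption rather than an official rule.

```python
from typing import Optional

# (largest model size in billions of parameters, minimum RAM in GB),
# taken from the figures quoted in this guide.
RAM_TIERS = [(7, 8), (13, 16), (33, 32)]


def min_ram_gb(params_billion: float) -> Optional[int]:
    """Return the suggested minimum RAM, or None when the model is
    larger than the biggest tier this guide gives a figure for."""
    for max_params, ram_gb in RAM_TIERS:
        if params_billion <= max_params:
            return ram_gb
    return None  # no guidance quoted for models beyond 33B


for size in (3.8, 7, 13, 33, 70):
    print(f"{size}B -> {min_ram_gb(size)} GB")
```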
Ollama also allows you to set a custom system prompt. For example, to instruct the system to explain concepts at a basic level, you can use:
/set system Explain concepts as if you are talking to a primary school student.
You can then save and reuse this setup by giving it a name:
/save forstudent
To start a new session with the saved setup later:
ollama run forstudent
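A comparable named model can also be defined outside the REPL with a Modelfile, using its FROM and SYSTEM instructions. The sketch below only builds the Modelfile text; the helper name `build_modelfile` is hypothetical, and using phi3 as the base model is an assumption carried over from the earlier examples.

```python
def build_modelfile(base_model: str, system_prompt: str) -> str:
    """Return Modelfile text that sets a system prompt on a base model."""
    # Triple quotes allow the SYSTEM text to span several lines.
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'


text = build_modelfile(
    "phi3",  # assumption: the same base model used earlier in this guide
    "Explain concepts as if you are talking to a primary school student.",
)
print(text)
```

Saving the printed text to a file named Modelfile and running `ollama create forstudent -f Modelfile` in a terminal should then make `ollama run forstudent` available.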
Integration with LangChain
Ollama can be used with LangChain, a framework for building applications on top of LLMs. To get started with LangChain and Ollama, first pull the required model:
ollama pull llama3
Then, install the necessary packages:
pip install langchain langchain-ollama ollama
You can interact with the model through code, such as invoking a basic conversation:
from langchain_ollama import OllamaLLM
model = OllamaLLM(model="llama3")
response = model.invoke(input="What's up?")
print(response)
The model might respond with something like:
"Not much! Just an AI, waiting to chat with you. How about you? What's new and exciting in your world?"
Building a Simple Chatbot
Using LangChain, you can also build a simple AI chatbot:
from langchain_ollama import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
template = """
The user will ask you questions. Answer them.
The history of this conversation: {context}
Question: {question}
Answer:
"""
model = OllamaLLM(model="llama3")
prompt = ChatPromptTemplate.from_template(template)
chain = prompt | model
def chat():
    context = ""
    print("Welcome to the AI Chatbot! Type 'exit' to quit.")
    while True:
        question = input("You: ")
        if question.lower() == "exit":
            break
        response = chain.invoke({"context": context, "question": question})
        print(f"AI: {response}")
        context += f"\nUser: {question}\nAI: {response}"

chat()
This will create an interactive chatbot session where you can ask the AI questions, and it will respond accordingly. For example:
You: What's up?
AI: Not much, just getting started on my day. How about you?
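One thing to watch in this chatbot is that the `context` string grows with every turn and will eventually exceed the model's context window. A minimal sketch of one way to cap it is shown below, keeping only the most recent turns. The names `MAX_TURNS` and `render_context` are hypothetical, and the limit of 3 is an arbitrary value chosen for illustration.

```python
from collections import deque

MAX_TURNS = 3  # hypothetical limit; tune it to the model's context window


def render_context(turns) -> str:
    """Format stored (question, answer) pairs like the chatbot's context."""
    return "".join(f"\nUser: {q}\nAI: {a}" for q, a in turns)


# A deque with maxlen drops the oldest turn automatically once it is full.
history = deque(maxlen=MAX_TURNS)
for i in range(1, 6):
    history.append((f"question {i}", f"answer {i}"))

context = render_context(history)
print(context)
```

In the chat loop, this would mean appending `(question, response)` to `history` after each turn and passing `render_context(history)` as the `context` value.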
Using AnythingLLM with Ollama
AnythingLLM is another useful tool that acts as an AI agent and RAG (retrieval-augmented generation) tool, which can also run locally. To try this out, pull a model, such as:
ollama pull llama3:8b-instruct-q8_0
In AnythingLLM, select Ollama as the LLM provider in the preferences and give your workspace a name. Responses can be slow on modest hardware, but the system works well once it is set up.
You can also interact with Ollama via a web UI by following the installation instructions provided.
For more details, visit Ollama’s official pages and documentation to explore the full range of features and models available.
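Front ends like these typically connect to the local Ollama server over HTTP, and the same API can be called directly. The sketch below assumes the default server address (http://localhost:11434) and the /api/generate endpoint with streaming disabled; the helper names are hypothetical. Only the request is built and printed here, so the snippet runs without a server.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434"  # assumption: default local address


def build_generate_request(model: str, prompt: str) -> Request:
    """Build a non-streaming POST request for /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to a running Ollama server and return its reply."""
    with urlopen(build_generate_request(model, prompt), timeout=120) as resp:
        return json.load(resp)["response"]


# Offline demonstration: inspect the request without sending it.
req = build_generate_request("llama3", "What's up?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the server running and the model pulled, `generate("llama3", "What's up?")` should return the model's reply as a string.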
Other Useful Tools You May Not Have Heard Of
Several alternatives and complementary tools to LangChain and AnythingLLM provide capabilities for working with LLMs and building AI-powered applications. These tools help orchestrate interactions with LLMs, enabling more advanced AI-driven workflows, automating tasks, or integrating AI into various applications. Here are some notable examples:
1. Haystack by Deepset
Haystack is an open-source framework for building search engines and question-answering systems with LLMs. It lets developers connect different components, such as retrievers, readers, and generators, to create an information retrieval pipeline.
Key Features:
Offers a pipeline-based approach for search, Q&A, and generative tasks.
Supports integration with models from Hugging Face, OpenAI, and local models.
Can combine LLMs with external data sources such as databases, knowledge graphs, and APIs.
Great for production-grade applications with robust scalability and reliability.
Link: Haystack GitHub
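The pipeline idea can be illustrated in plain Python. This is a conceptual sketch only and does not use Haystack's API: the component functions, the `DOCS` list, and `run_pipeline` are all hypothetical, and the "retriever" is a naive word-overlap ranking standing in for a real one.

```python
from typing import Callable, Dict, List

DOCS = [
    "Ollama runs large language models on your local machine.",
    "Haystack connects retrievers and generators into pipelines.",
    "Chroma stores vector embeddings for semantic search.",
]


def retrieve(state: Dict) -> Dict:
    """Toy retriever: rank documents by words shared with the query."""
    words = set(state["query"].lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return {**state, "documents": ranked[:1]}


def build_prompt(state: Dict) -> Dict:
    """Toy prompt builder: the text a generator component would receive."""
    context = "\n".join(state["documents"])
    prompt = f"Context:\n{context}\n\nQuestion: {state['query']}"
    return {**state, "prompt": prompt}


def run_pipeline(steps: List[Callable[[Dict], Dict]], query: str) -> Dict:
    """Pass a shared state dict through each component in order."""
    state = {"query": query}
    for step in steps:
        state = step(state)
    return state


result = run_pipeline([retrieve, build_prompt], "What does Haystack connect?")
print(result["prompt"])
```

Each component takes the output of the previous one, which is the property that lets retrievers, readers, and generators be swapped independently.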
2. LlamaIndex (formerly GPT Index)
LlamaIndex (formerly GPT Index) is a data framework that helps you index and retrieve information efficiently from large datasets using LLMs. It's designed to handle document-based workflows by structuring data, indexing it, and enabling retrieval when interacting with LLMs.
Key Features:
Integrates with external data sources such as PDFs, HTML, CSVs, or custom APIs.
Builds on top of LLMs for more efficient data querying and document summarization.
Helps optimize the performance of LLMs by constructing memory-efficient indices.
Provides compatibility with LangChain and other frameworks.
Link: LlamaIndex GitHub
3. Chroma
Chroma is an open-source embedding database designed for LLMs. It helps store and query high-dimensional vector embeddings of data, enabling you to work with semantic search, retrieval-augmented generation (RAG), and more.
Key Features:
Embedding search over documents or large datasets, using embedding models from providers such as OpenAI or Hugging Face.
Scalable and optimized for efficient retrieval of large datasets with millisecond latency.
Works well for semantic search, content recommendations, or building conversational agents.
Link: Chroma GitHub
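The core operation behind an embedding database is nearest-neighbor search over vectors. The sketch below illustrates it in plain Python with cosine similarity; it does not use Chroma's API, the `STORE` dictionary and `query` function are hypothetical, and the 3-dimensional vectors are hand-made stand-ins for embeddings that would normally come from a model.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made toy "embeddings"; real ones are produced by an embedding model.
STORE = {
    "cats and kittens": [0.9, 0.1, 0.0],
    "dogs and puppies": [0.8, 0.3, 0.1],
    "stock market news": [0.0, 0.2, 0.9],
}


def query(vector, top_k=2):
    """Return the top_k stored texts most similar to the query vector."""
    scored = [(cosine_similarity(vector, v), text) for text, v in STORE.items()]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]


print(query([1.0, 0.0, 0.0]))
```

A dedicated vector database replaces this linear scan with indexes designed for large collections.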
4. Hugging Face Transformers
Hugging Face Transformers is a library of pretrained transformer models for NLP tasks such as text generation, question answering, and classification. It offers a unified interface to many different models, which makes it easy to switch between them.
Key Features:
Supports a wide range of models, including GPT, BERT, T5, and custom models.
Provides pipelines for quick setup of tasks like Q&A, summarization, and translation.
Hugging Face Hub hosts a large variety of pre-trained models ready for deployment.
Link: Hugging Face Transformers
5. Pinecone
Pinecone is a managed vector database that allows you to store, index, and query large-scale vectors produced by LLMs. It is designed for high-speed semantic search, vector search, and machine-learning applications.
Key Features:
Fast, scalable, and reliable vector search for applications requiring high performance.
Integrates seamlessly with LLMs to power retrieval-based models.
Handles large datasets and enables search across millions or billions of vectors.
Link: Pinecone Website
6. OpenAI API
OpenAI’s API gives access to a wide range of LLMs, including the GPT series (like GPT-3.5 and GPT-4). It provides text generation, summarization, translation, and code generation capabilities.
Key Features:
Access to state-of-the-art models such as GPT-4, as well as DALL-E for image generation.
Supports prompt engineering and fine-tuning to control model behavior.
Simplifies AI integration into applications without needing to manage infrastructure.
Link: OpenAI API
7. Rasa
Rasa is an open-source framework for building conversational AI assistants and chatbots. It allows for highly customizable AI assistants trained on specific tasks and workflows, making it a good alternative to pre-trained LLM chatbots.
Key Features:
Supports NLU (Natural Language Understanding) and dialogue management.
Highly customizable for domain-specific applications.
Can integrate with LLMs to enhance chatbot capabilities.
Link: Rasa Website
8. Cohere
Cohere offers NLP APIs and large-scale language models similar to OpenAI. It focuses on tasks like classification, text generation, and search, providing a powerful platform for LLM-based applications.
Key Features:
Provides easy access to LLMs through an API, allowing developers to implement NLP tasks quickly.
Offers fine-tuning options for domain-specific applications.
Well-suited for tasks like customer support automation and text classification.
Link: Cohere Website
9. Vercel AI SDK
Vercel AI SDK provides tools for building AI-powered applications using frameworks like Next.js. It simplifies the development process by integrating APIs from OpenAI, Hugging Face, and other AI providers into web applications.
Key Features:
Seamless integration with AI models in serverless environments.
Supports building interactive applications with fast deployments using Vercel’s infrastructure.
Focuses on web-based applications and LLM-powered front-end experiences.
Link: Vercel AI SDK
Conclusion
Beyond LangChain and AnythingLLM, many powerful tools and frameworks cater to different needs when working with LLMs. Whether you want to build conversational agents, semantic search engines, or specialized AI applications, platforms like Haystack, LlamaIndex, Chroma, and others offer flexible and scalable solutions. Depending on your specific use case, you can choose the most suitable tool for integrating LLMs into your projects.
Written by
Spheron Network