Serving LLMs Locally with Ollama


When working with large language models (LLMs) locally, Ollama is a powerful and efficient solution. It lets developers run a variety of open-source LLMs on their own machines without relying on cloud-based services, which makes it an excellent option for development and testing, especially for those who need to work with AI models offline or prefer local execution.
Getting Started with Ollama
To use Ollama, you first need to download and install it on your local machine. Once installed, you can pull the models you wish to use with the ollama
command-line tool. Some of the popular models available include:
Meta’s Llama 3.1
Google’s Gemma
Alibaba’s Qwen
MistralAI’s Mistral 7B
For example, to pull the Gemma 2B model to your machine, use the following command:
$ ollama pull gemma:2b
Similarly, to use MistralAI’s Mistral 7B model, execute:
$ ollama pull mistral:7b
To check which models are installed on your machine, use:
$ ollama list
For more detailed information on the installed models, you can query Ollama's REST API (shown here with HTTPie):
$ http http://localhost:11434/api/tags -b
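Before wiring a model into an application, you can also try it out interactively from the terminal. For instance, assuming you pulled the Gemma 2B model as shown above:
$ ollama run gemma:2b
This opens an interactive chat session with the model; type /bye to exit.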
Using Ollama with Spring AI
Once the models are installed and Ollama is running, you can integrate them into your project using Spring AI’s Ollama starter dependency:
implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
Unlike cloud-based AI providers such as OpenAI or MistralAI, Ollama does not require an API key, as it runs locally. This simplifies setup and enhances privacy by keeping data on your machine.
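Spring AI talks to Ollama through its local REST endpoint (the same one queried above). If your Ollama instance runs on a different host or port, you can point Spring AI at it explicitly via the spring.ai.ollama.base-url property, for example:
spring.ai.ollama.base-url=http://localhost:11434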
By default, Spring AI uses the Mistral 7B model with Ollama. If you want to use a different model, specify it using the spring.ai.ollama.chat.model
property in your configuration:
spring.ai.ollama.chat.model=gemma:2b
This setting directs Spring AI to use the Gemma 2B model from Ollama running on your local machine.
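With the starter on the classpath and the model configured, calling the model from application code is straightforward. Below is a minimal sketch, assuming a recent Spring AI version that provides the fluent ChatClient API; the LocalChatService class and its ask method are illustrative names, not part of the Board Game Buddy project.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class LocalChatService {

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder backed by the
    // spring.ai.ollama.* properties, so no API key or extra setup is needed.
    public LocalChatService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String ask(String question) {
        // Sends the prompt to the locally running Ollama model and
        // returns the generated text.
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}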
Exploring More
For a practical implementation of a Spring AI-based project, you can check out this GitHub repository: Board Game Buddy - GitHub
Conclusion
Ollama is an excellent choice for running AI models locally without the need for cloud services. By integrating it with Spring AI, developers can streamline AI development while maintaining full control over their data and computational resources. Whether for development, testing, or privacy-sensitive applications, Ollama provides a robust and flexible solution for working with LLMs on your local machine.