Running LLMs Locally Using OLLAMA for AI-based App Development


Large Language Models (LLMs) have revolutionized how we build intelligent applications, but working with them can be challenging when relying solely on cloud-based solutions. This post explores how developers can leverage OLLAMA to run powerful language models locally, streamlining development workflows while reducing costs and dependencies.
Understanding LLMs and Generative AI
What Are Large Language Models?
Large Language Models represent a breakthrough in artificial intelligence that can understand, generate, and manipulate human language with remarkable sophistication. Unlike traditional machine learning models that perform specific, narrow tasks, LLMs demonstrate broad capabilities across various language tasks—from answering questions and summarizing content to generating creative text and translating languages.
These models are "large" not just in their parameter count (often ranging from billions to trillions of parameters), but also in their ability to capture complex patterns and relationships within language. Their architecture, typically based on transformer neural networks, allows them to process and generate text with an understanding of context, semantics, and even subtle nuances of human communication.
The Rise of Generative AI
Generative AI refers to artificial intelligence systems that can create new content rather than simply analyzing existing data. In the context of language models, this means the ability to generate human-like text based on provided prompts or instructions.
This technology has enabled a new generation of applications that can:
Draft emails and documents
Generate code for software development
Create creative content like stories and poetry
Assist with research by summarizing findings
Power conversational interfaces and chatbots
The transformative potential of generative AI lies in its ability to augment human creativity and productivity across numerous domains, from software engineering to content creation.
Challenges in AI Application Development
The Cloud Dependency Problem
Building applications with state-of-the-art LLMs traditionally requires relying on cloud-based API services provided by companies like OpenAI, Anthropic, or other AI providers. While convenient, this approach introduces several significant challenges:
Cost Concerns: API calls to commercial LLM services can quickly become expensive, especially for applications with high usage or during the iterative development process when frequent testing is necessary.
Latency Issues: Network calls to remote APIs introduce latency that can degrade user experience, particularly for applications requiring real-time responses.
Privacy and Data Security: Sending potentially sensitive data to third-party services raises privacy concerns, especially when developing applications that handle confidential or proprietary information.
Development Workflow Bottlenecks
Beyond infrastructure challenges, the development process itself faces friction:
Dependency on Internet Connectivity: Without local models, development grinds to a halt during internet outages or in environments with limited connectivity.
Limited Customization: Cloud-based APIs offer less flexibility to adjust model parameters or behaviors compared to running models directly.
Testing Constraints: Comprehensive testing of AI features can become prohibitively expensive or time-consuming when relying solely on external APIs.
These challenges highlight the need for developers to have options for running LLMs locally, especially during the development and testing phases of AI-enabled applications.
Introducing OLLAMA
What is OLLAMA?
OLLAMA is an open-source framework designed to simplify running Large Language Models locally on personal computers. It provides a streamlined way to download, run, and interact with various open-source LLMs without the complexity typically associated with setting up these models.
At its core, OLLAMA makes sophisticated AI technology accessible to developers by handling the technical complexities of model management, optimization, and inference. It packages models in a way that makes them easy to install and run with minimal configuration, even on consumer-grade hardware.
Key Features and Capabilities
OLLAMA stands out from other local LLM solutions through several distinguishing features:
Simplified Model Management: Download and switch between different models with simple commands
Optimized Performance: Quantized model builds and inference optimizations for running on consumer hardware
REST API: Built-in API server for integrating with applications and tools (see the example after this list)
Cross-Platform Support: Available for macOS, Windows, and Linux
Wide Model Compatibility: Supports a growing ecosystem of open-source models
Resource Efficiency: Runs models with reasonable memory and CPU/GPU requirements
By providing these capabilities in a lightweight package, OLLAMA enables developers to incorporate powerful language AI into their workflows without the overhead of cloud dependencies.
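As a quick example of the built-in REST API mentioned above: the OLLAMA server listens on port 11434 by default, so once it is running you can send a completion request with nothing more than curl. The model name here is only an illustration; substitute any model you have pulled locally:
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'
Setting "stream" to false returns the whole response as a single JSON object rather than a stream of tokens, which is convenient for quick experiments.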
Getting Started with OLLAMA
Installation Process
Getting OLLAMA up and running on your development machine is straightforward regardless of your operating system. The process involves downloading the appropriate installer for your platform and following a few simple setup steps.
For macOS:
Download the installation package from the official OLLAMA website
Open the downloaded file and follow the installation prompts
Once installed, OLLAMA will be available from your terminal
For Linux:
curl -fsSL https://ollama.com/install.sh | sh
After installation, verify that OLLAMA is working correctly by running a simple command in your terminal:
ollama -v
This should display the installed version of OLLAMA, confirming that the installation was successful.
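The output is a single line similar to the following; the version number here is only an illustration, and yours will reflect the release you installed:
ollama version is 0.5.7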
Basic Command Structure
OLLAMA uses a simple command-line interface with intuitive commands for managing and using language models. The basic structure follows this pattern:
ollama [command] [options]
Common commands include:
pull: Download a model
run: Run a model and start an interactive session
list: Show available models
rm: Remove a model
serve: Start the OLLAMA server
Understanding these fundamental commands will allow you to start working with language models immediately.
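The pull and run commands are demonstrated in the next section; the others follow the same pattern. For example (the model name is just a placeholder for one you have installed):
ollama list
ollama rm deepseek-r1:1.5b
ollama serve
Note that serve runs the API server in the foreground; the installers typically set OLLAMA up as a background service, so you may not need to run it manually.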
Deploying LLMs Locally: Hands-on Demo
Pulling Your First Model
Let's begin by downloading a model to your local machine. OLLAMA makes this process as simple as specifying which model you want. In this example, I will use the deepseek-r1:1.5b model to test DeepSeek locally on my machine:
ollama pull deepseek-r1:1.5b
This command downloads the DeepSeek R1 1.5B model, a powerful open-source language model that balances performance and resource requirements well.
The download might take some time depending on your internet connection speed and the model size. Once complete, the model is ready to use locally.
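Before moving on, you can confirm that the model is available locally:
ollama list
The newly pulled model should appear in the output along with its ID, size, and when it was last modified.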
Running Models Through the Command Line
With a model downloaded, you can start interacting with it directly from your terminal:
ollama run deepseek-r1:1.5b
This command launches an interactive session where you can chat with the model by typing prompts and receiving generated responses.
Try asking various questions or giving different instructions to explore the model's capabilities:
> Explain what is DevOps and the responsibilities of a DevOps engineer.
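A few slash commands built into OLLAMA's interactive prompt are also worth knowing:
/?          list the available in-session commands
/show info  print details about the loaded model
/bye        exit the session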
Using OLLAMA with Open Web UIs
Available Web Interfaces
While the command line provides direct access to models, graphical interfaces can enhance usability and provide additional features. Several open-source web UIs work seamlessly with OLLAMA:
Open WebUI: A feature-rich interface with conversation management, file uploads, and plugin support.
Ollama WebUI: A lightweight, minimalist interface focused on clean chat interactions.
LM Studio: A comprehensive desktop tool for not just chatting with models but also comparing and benchmarking them (note that it runs models with its own engine rather than through OLLAMA).
Setting Up Open WebUI
Let's walk through setting up Open WebUI, one of the most popular interfaces for OLLAMA:
- Install Open WebUI using Podman or Docker:
podman run -d -p 3000:8080 -e OLLAMA_API_BASE_URL=http://localhost:11434/api --name openwebui --restart always ghcr.io/open-webui/open-webui:main
NOTE: If you are not using podman, you can use the docker command instead to pull the image and create the container.
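For reference, the Docker equivalent should look like this (assuming the Docker CLI is installed and OLLAMA is listening on its default port 11434):
docker run -d -p 3000:8080 -e OLLAMA_API_BASE_URL=http://localhost:11434/api --name openwebui --restart always ghcr.io/open-webui/open-webui:main
Depending on your container networking, the container may not be able to reach localhost on the host; in that case, pointing the URL at host.docker.internal instead usually resolves it.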
- Navigate to http://localhost:3000 in your web browser.
- Start a new chat session with one of your installed models:
Creating and Managing Chat Sessions
Web interfaces enhance the chat experience by providing conversation management features that the command line lacks:
Save and name conversations for future reference
Switch between different models mid-conversation
Share conversation links with team members
Export conversations in various formats
These features make web UIs particularly valuable for collaborative development scenarios or when you need to maintain a history of your interactions with the model.
Conclusion
OLLAMA provides developers with a powerful way to incorporate Large Language Models into their development workflow without the dependencies and costs associated with cloud-based APIs. By running models locally, you gain greater control, enhanced privacy, and the ability to work offline—all valuable advantages during the development and testing phases.
As demonstrated throughout this post, setting up and using OLLAMA is straightforward, with flexible options for interacting with models through the command line, web interfaces, or direct API integration. The wide range of supported models ensures you can find options suitable for various use cases and hardware configurations.
Whether you're building AI-enhanced applications, creating developer tools, or simply exploring what's possible with modern language models, OLLAMA represents an essential tool in the modern developer's toolkit.
Next Steps
To continue your journey with local LLM development:
Experiment with different models to find the best fit for your use cases
Explore the growing ecosystem of tools built around OLLAMA
Consider contributing to the open-source community by sharing your experiences and utilities
Stay updated on new model releases that push the boundaries of what's possible locally
The landscape of AI development is evolving rapidly, and having the flexibility to run models locally provides valuable options as you build the next generation of intelligent applications.
In the next blog post, I will demonstrate how to integrate OLLAMA into your development workflows programmatically, using the Python OpenAI library to interact with locally hosted models.