Running LLMs Locally Using OLLAMA for AI-based App Development

Hein Htet Win

Large Language Models (LLMs) have revolutionized how we build intelligent applications, but working with them can be challenging when relying solely on cloud-based solutions. This post explores how developers can leverage OLLAMA to run powerful language models locally, streamlining development workflows while reducing costs and dependencies.

Understanding LLMs and Generative AI

What Are Large Language Models?

Large Language Models represent a breakthrough in artificial intelligence that can understand, generate, and manipulate human language with remarkable sophistication. Unlike traditional machine learning models that perform specific, narrow tasks, LLMs demonstrate broad capabilities across various language tasks—from answering questions and summarizing content to generating creative text and translating languages.

These models are "large" not just in their parameter count (often ranging from billions to trillions of parameters), but also in their ability to capture complex patterns and relationships within language. Their architecture, typically based on transformer neural networks, allows them to process and generate text with an understanding of context, semantics, and even subtle nuances of human communication.

The Rise of Generative AI

Generative AI refers to artificial intelligence systems that can create new content rather than simply analyzing existing data. In the context of language models, this means the ability to generate human-like text based on provided prompts or instructions.

This technology has enabled a new generation of applications that can:

  • Draft emails and documents

  • Generate code for software development

  • Create creative content like stories and poetry

  • Assist with research by summarizing findings

  • Power conversational interfaces and chatbots

The transformative potential of generative AI lies in its ability to augment human creativity and productivity across numerous domains, from software engineering to content creation.

Challenges in AI Application Development

The Cloud Dependency Problem

Building applications with state-of-the-art LLMs traditionally requires relying on cloud-based API services provided by companies like OpenAI, Anthropic, or other AI providers. While convenient, this approach introduces several significant challenges:

Cost Concerns: API calls to commercial LLM services can quickly become expensive, especially for applications with high usage or during the iterative development process when frequent testing is necessary.

Latency Issues: Network calls to remote APIs introduce latency that can degrade user experience, particularly for applications requiring real-time responses.

Privacy and Data Security: Sending potentially sensitive data to third-party services raises privacy concerns, especially when developing applications that handle confidential or proprietary information.

Development Workflow Bottlenecks

Beyond infrastructure challenges, the development process itself faces friction:

Dependency on Internet Connectivity: Without local models, development grinds to a halt during internet outages or in environments with limited connectivity.

Limited Customization: Cloud-based APIs offer less flexibility to adjust model parameters or behaviors compared to running models directly.

Testing Constraints: Comprehensive testing of AI features can become prohibitively expensive or time-consuming when relying solely on external APIs.

These challenges highlight the need for developers to have options for running LLMs locally, especially during the development and testing phases of AI-enabled applications.

Introducing OLLAMA

What is OLLAMA?

OLLAMA is an open-source framework designed to simplify running Large Language Models locally on personal computers. It provides a streamlined way to download, run, and interact with various open-source LLMs without the complexity typically associated with setting up these models.

At its core, OLLAMA makes sophisticated AI technology accessible to developers by handling the technical complexities of model management, optimization, and inference. It packages models in a way that makes them easy to install and run with minimal configuration, even on consumer-grade hardware.

Key Features and Capabilities

OLLAMA stands out from other local LLM solutions through several distinguishing features:

  • Simplified Model Management: Download and switch between different models with simple commands

  • Optimized Performance: Automatic quantization and optimization for running on consumer hardware

  • REST API: Built-in API server for integrating with applications and tools (see the curl example below)

  • Cross-Platform Support: Available for macOS, Windows, and Linux

  • Wide Model Compatibility: Supports a growing ecosystem of open-source models

  • Resource Efficiency: Runs models with reasonable memory and CPU/GPU requirements

By providing these capabilities in a lightweight package, OLLAMA enables developers to incorporate powerful language AI into their workflows without the overhead of cloud dependencies.
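
To illustrate the built-in REST API, here is a minimal sketch using curl. It assumes the OLLAMA server is listening on its default port 11434 (started automatically by the desktop app, or with ollama serve) and that the deepseek-r1:1.5b model used later in this post has already been pulled; the prompt text is just an example:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Summarize what OLLAMA does in one sentence.",
  "stream": false
}'

The response is a JSON object whose response field contains the generated text; setting "stream": false returns the whole answer in a single payload instead of token-by-token chunks.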

Getting Started with OLLAMA

Installation Process

Getting OLLAMA up and running on your development machine is straightforward regardless of your operating system. The process involves downloading the appropriate installer for your platform and following a few simple setup steps.

For macOS:

  1. Download the installation package from the official OLLAMA website

  2. Open the downloaded file and follow the installation prompts

  3. Once installed, OLLAMA will be available from your terminal

For Linux:

curl -fsSL https://ollama.com/install.sh | sh

After installation, verify that OLLAMA is working correctly by running a simple command in your terminal:

ollama -v

This should display the installed version of OLLAMA, confirming that the installation was successful.

Basic Command Structure

OLLAMA uses a simple command-line interface with intuitive commands for managing and using language models. The basic structure follows this pattern:

ollama [command] [options]

Common commands include:

  • pull: Download a model

  • run: Run a model and start an interactive session

  • list: Show available models

  • rm: Remove a model

  • serve: Start the OLLAMA server

Understanding these fundamental commands will allow you to start working with language models immediately.
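
Putting these together, a typical first session might look like the following sketch (llama3.2 here is just an example model tag; any model from the OLLAMA library works the same way):

ollama pull llama3.2    # download the model to your machine
ollama list             # confirm it appears in your local library
ollama run llama3.2     # start an interactive chat session
ollama rm llama3.2      # remove it when no longer needed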

Deploying LLMs Locally: Hands-on Demo

Pulling Your First Model

Let's begin by downloading a model to your local machine. OLLAMA makes this process as simple as specifying which model you want. In this example, I will use the deepseek-r1:1.5b model to try out DeepSeek locally on my machine:

ollama pull deepseek-r1:1.5b

This command downloads the DeepSeek R1 1.5B model, a compact open-source reasoning model that balances capability with modest resource requirements.

The download might take some time depending on your internet connection speed and the model size. Once complete, the model is ready to use locally.
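
Once the download completes, you can confirm the model is available with the list command, which shows each local model's name, ID, size, and last-modified time:

ollama list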

Running Models Through the Command Line

With a model downloaded, you can start interacting with it directly from your terminal:

ollama run deepseek-r1:1.5b

This command launches an interactive session where you can chat with the model by typing prompts and receiving generated responses.

Try asking various questions or giving different instructions to explore the model's capabilities:

> Explain what DevOps is and the responsibilities of a DevOps engineer.
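
Inside the session, type /? to see the available in-session commands, and /bye to exit when you are finished:

> /bye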

Using OLLAMA with Open Web UIs

Available Web Interfaces

While the command line provides direct access to models, graphical interfaces can enhance usability and provide additional features. Several interfaces work well alongside OLLAMA:

Open WebUI: A feature-rich interface with conversation management, file uploads, and plugin support.

Ollama WebUI: A lightweight, minimalist interface focused on clean chat interactions.

LM Studio: A standalone desktop application for downloading, chatting with, and comparing local models; it runs models itself rather than through OLLAMA, but is worth knowing as an alternative.

Setting Up Open WebUI

Let's walk through setting up Open WebUI, one of the most popular interfaces for OLLAMA:

  1. Install Open WebUI using Podman (or Docker):

podman run -d -p 3000:8080 -e OLLAMA_API_BASE_URL=http://localhost:11434/api --name openwebui --restart always ghcr.io/open-webui/open-webui:main

NOTE: If you are not using podman, you can use the equivalent docker command to pull the image and create the container, as shown below.
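
For reference, a direct docker equivalent of the command above would look like this. One assumption to watch: if OLLAMA runs on the host while Open WebUI runs in a container, localhost inside the container may not reach the host, so you may need host.docker.internal on macOS/Windows or --network=host on Linux:

docker run -d -p 3000:8080 -e OLLAMA_API_BASE_URL=http://localhost:11434/api --name openwebui --restart always ghcr.io/open-webui/open-webui:main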

  2. Navigate to http://localhost:3000 in your web browser.

  3. Start a new chat session with one of your installed models.

Creating and Managing Chat Sessions

Web interfaces enhance the chat experience by providing conversation management features that the command line lacks:

  • Save and name conversations for future reference

  • Switch between different models mid-conversation

  • Share conversation links with team members

  • Export conversations in various formats

These features make web UIs particularly valuable for collaborative development scenarios or when you need to maintain a history of your interactions with the model.

Conclusion

OLLAMA provides developers with a powerful way to incorporate Large Language Models into their development workflow without the dependencies and costs associated with cloud-based APIs. By running models locally, you gain greater control, enhanced privacy, and the ability to work offline—all valuable advantages during the development and testing phases.

As demonstrated throughout this post, setting up and using OLLAMA is straightforward, with flexible options for interacting with models through the command line, web interfaces, or direct API integration. The wide range of supported models ensures you can find options suitable for various use cases and hardware configurations.

Whether you're building AI-enhanced applications, creating developer tools, or simply exploring what's possible with modern language models, OLLAMA represents an essential tool in the modern developer's toolkit.

Next Steps

To continue your journey with local LLM development:

  1. Experiment with different models to find the best fit for your use cases

  2. Explore the growing ecosystem of tools built around OLLAMA

  3. Consider contributing to the open-source community by sharing your experiences and utilities

  4. Stay updated on new model releases that push the boundaries of what's possible locally

The landscape of AI development is evolving rapidly, and having the flexibility to run models locally provides valuable options as you build the next generation of intelligent applications.

In the next blog post, I will demonstrate how to integrate OLLAMA into your development workflows by using the Python OpenAI library to interact with locally hosted models programmatically.

