Setting Up Your Self-Hosted AI Stack - Part 1: Building the foundation with Open WebUI, Ollama, and Postgres

Farzam Mohammadi
10 min read

What we're building today

By the end of this post, you'll have a fully operational chat interface connected to the LLM of your choice, running completely locally. No external APIs, no data leaving your machine. Just pure local AI power.

We'll implement and orchestrate the foundational self-hosted LLM inference tools that form the backbone for everything we'll build in this series.

The bigger picture

I'm currently building a self-hosted AI stack from scratch, and I'm documenting everything as I go. I want to capture the knowledge while it's fresh and share the real-world lessons I've learned along the way.

Here's the thing: even with today's resources, piecing together a solid self-hosted AI stack is surprisingly challenging. The more complex your needs get, the scarcer the guidance becomes. This series is the comprehensive guide I wish I had when I started this journey. At least this way, people who are totally new to this, like you and me, can get up and running quickly, then evolve and expand from there.

What's coming up (outline may evolve as I discover better integration approaches while writing):

  • Part 2: OCR and RAG with Apache Tika and Qdrant vector database

  • Part 3: Workflow automation with N8N

  • Part 4: WebUI model configuration with tools and knowledge integration

  • Part 5: WebUI filter inlet into N8N summarization workflows

Prerequisites

  • A machine with at least 16GB of RAM and 50GB of disk space

  • Docker installed

  • GPU recommended for better performance. Without a GPU, expect slower responses (10-30 seconds vs 1-3 seconds) as models run on CPU. Still totally usable, just requires patience. If performance is too slow, consider downloading a smaller model.

Credit where credit's due

Massive thanks to the creators and contributors of the incredible open source projects listed below; they make all this possible.

Our foundation stack

We're building with battle-tested open source tools that work beautifully together. This foundation stays simple and robust, perfect for expanding on in later parts.

Here's what we're working with:

  • Ollama - Self-hosted LLM hosting made simple

  • Open WebUI - Beautiful and feature-rich chat UI for LLMs

  • PostgreSQL - Reliable data storage for conversations and configs

  • pgadmin - Web-based administration UI for PostgreSQL

  • Docker & Docker Compose - Container orchestration to tie it all together

Why these specific tools? They're what I'm actually running in production, and I know their quirks inside and out. Feel free to swap any component for your preferred alternatives. The architecture stays the same.

Quick start (skip the explanations, just get it running)

  1. Install Ollama: Head to https://ollama.com/download

    Mac users: Use the app instead of Docker. My experience with Docker on an M4 MacBook was rough. The containerized version couldn't access my GPU, making everything painfully slow.

  2. Download a model (this only downloads the weights; the model loads the first time you chat with it):

     ollama pull qwen3:8b
    

    Why qwen3:8b? It hits a sweet spot between capability and hardware requirements: LLMs are demanding, but this model is small enough to run on most machines without specialized hardware. Check out other options at https://ollama.com/search.

  3. Verify your model downloaded:

     ollama list
    

    You should see qwen3:8b listed in the output:

     NAME                                   ID              SIZE      MODIFIED    
     qwen3:8b                               bdbd181c33f2    5.3 GB    1 hour ago
    

    Starting Ollama: If you ever need to get Ollama running, you can either start the app from your Applications folder or run ollama serve in your terminal. If you see "Error: listen tcp 127.0.0.1:11434: bind: address already in use", that means Ollama is already running.

  4. Clone the repository:

     git clone https://github.com/FarzamMohammadi/self-hosted-ai-stack
    
  5. Navigate to the foundation:

     cd part-1-building-the-foundation
    
  6. Fire it up (make sure Docker is running):

     docker compose up -d
    

That's it! Open http://localhost:3000, sign up, select qwen3:8b in the top-left corner, and start chatting with your self-hosted AI.

The detailed walkthrough (if you want to understand what's happening)

Ollama

Ollama makes running LLMs locally dead simple. It's built on top of the excellent llama.cpp project, trading some customization for incredible ease of use. Installation takes minutes, and Ollama automatically picks a sensible configuration for your hardware. No tweaking required. Just download and run.

Installation options

You can run Ollama via Docker or as a native app, but this guide uses the native app since that's what our docker-compose setup expects.

Download the native app from https://ollama.com/download and install it following the standard process for your operating system.

Why I go with the native app

My experience with Docker on an M4 MacBook was pretty disappointing. The containerized version couldn't tap into my machine's GPU, turning what should be snappy responses into sluggish waits. The native app, however, plays nicely with macOS's Metal framework and delivers the performance you'd expect.

Choosing your model

The model library at https://ollama.com/search is extensive. You can pull any listed model with a simple command, or even create custom models using Modelfiles.
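
If you ever want a tweaked variant later, a Modelfile is just a small text file layered on top of a model you've already pulled. Here's a minimal sketch; the custom model name, temperature value, and system prompt are purely illustrative:

# Write a tiny Modelfile on top of an existing base model
cat > Modelfile <<'EOF'
FROM qwen3:8b
PARAMETER temperature 0.6
SYSTEM "You are a concise assistant running on a self-hosted stack."
EOF

# Build the custom model and chat with it
ollama create qwen3-concise -f Modelfile
ollama run qwen3-concise

After that, ollama list shows qwen3-concise alongside the base model.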

For this tutorial, I'm using qwen3:8b. It strikes a great balance between capability and resource requirements:

ollama pull qwen3:8b

Docker setup alternative

If you prefer setting it up via Docker, here's an example setup:

services:
  ollama:
    image: docker.io/ollama/ollama:latest
    ports:
      - 7869:11434
    volumes:
      - ./ollama/ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_HOST=0.0.0.0
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false

Source: mythrantic/ollama-docker
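
If you go this containerized route, keep in mind that models get pulled inside the container (the mounted volume keeps them across restarts), and with this example's port mapping the API answers on host port 7869 rather than 11434. For instance, assuming the container_name above:

# Pull the model inside the running Ollama container
docker exec -it ollama ollama pull qwen3:8b

# With this mapping, the API is exposed on host port 7869
curl http://localhost:7869/api/tags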

PostgreSQL & pgadmin

PostgreSQL is our data backbone, storing conversations, configurations, and eventually (in later parts) data for the additional services we'll integrate. I went with PostgreSQL because it's rock-solid and Open WebUI supports it well. pgadmin gives us a clean web interface for exploring the database.

The configuration

Keeping it straightforward with just the essentials:

services:
  postgres:
    image: postgres:15-alpine
    container_name: postgres
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_DB=openwebui
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=securepassword123
    volumes:
      - ./volumes/postgres/data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d openwebui']
      interval: 10s
      timeout: 5s
      retries: 5

  pgadmin:
    image: dpage/pgadmin4:latest
    container_name: pgadmin
    ports:
      - '5050:80'
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@local.ai
      - PGADMIN_DEFAULT_PASSWORD=admin123
      - PGADMIN_CONFIG_SERVER_MODE=False
      - PGADMIN_CONFIG_MASTER_PASSWORD_REQUIRED=False
    volumes:
      - ./volumes/pgadmin:/var/lib/pgadmin
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy

Configuration notes:

  • Using postgres:15-alpine to keep the image small while staying on a well-supported major version

  • The healthcheck prevents connection race conditions (there's a quick way to inspect it right after this list)

  • pgadmin runs in desktop mode since this is local-only

  • Simple credentials for localhost. Change them if you expose this externally
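
Curious what that healthcheck is actually reporting? Docker records the status on the container, so you can peek at it directly (assuming the container_name of postgres from the config above):

docker inspect --format '{{.State.Health.Status}}' postgres

It should print healthy once pg_isready starts succeeding; starting or unhealthy means give it a moment or check the logs.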

Open WebUI

Open WebUI delivers a ChatGPT-like experience, essentially a polished chat interface for working with LLMs, but it runs entirely on your machine. After testing various interfaces (text-generation-webui, SillyTavern, and others), Open WebUI won me over with its clean design and deep customization options.

What makes it special

  • Clean interface that actually works without fuss

  • Built-in RAG support (we'll dig into this in Part 2)

  • Seamless Ollama integration

  • Excellent APIs for integration with your own projects

  • Active development with a supportive community

Setting it up

Here's the configuration that ties everything together:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui
    ports:
      - '3000:8080'
    volumes:
      - ./volumes/open-webui/data:/app/backend/data
    environment:
      # Ollama connection
      - OLLAMA_BASE_URL=http://host.docker.internal:11434

      # Database connection
      - DATABASE_URL=postgresql://postgres:securepassword123@postgres:5432/openwebui

      # Basic settings
      - WEBUI_SECRET_KEY=your-secret-key-here
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=true
      - DEFAULT_MODELS=qwen3:8b

    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy

Note on extra_hosts: This setting is required because we're running Ollama as a native app (not in Docker). If you choose to run Ollama in Docker instead, remove the extra_hosts section and update OLLAMA_BASE_URL to use the container name (e.g., http://ollama:11434).

Key settings explained:

  • OLLAMA_BASE_URL points to our native Ollama app via host.docker.internal

  • Database URL connects to our PostgreSQL container

  • Auth is enabled. Disable for single-user setups if preferred

  • DEFAULT_MODELS should match your downloaded model
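
One tweak worth making even for a local setup: WEBUI_SECRET_KEY is used to sign session tokens, so replace the your-secret-key-here placeholder with something random. A quick way to generate one, assuming openssl is available on your machine:

openssl rand -hex 32

Paste the output into the WEBUI_SECRET_KEY line, or move it into a .env file if you'd rather keep secrets out of the compose file.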

Putting it all together

Time to wire everything up. Here's the complete docker-compose.yml that orchestrates our entire self-hosted AI stack.

The complete configuration

Create a new directory for your project and drop in this docker-compose.yml:

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    container_name: postgres
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_DB=openwebui
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=securepassword123
    volumes:
      - ./volumes/postgres/data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d openwebui']
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - local-ai-network

  pgadmin:
    image: dpage/pgadmin4:latest
    container_name: pgadmin
    ports:
      - '5050:80'
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@local.ai
      - PGADMIN_DEFAULT_PASSWORD=admin123
      - PGADMIN_CONFIG_SERVER_MODE=False
      - PGADMIN_CONFIG_MASTER_PASSWORD_REQUIRED=False
    volumes:
      - ./volumes/pgadmin:/var/lib/pgadmin
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - local-ai-network

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui
    ports:
      - '3000:8080'
    volumes:
      - ./volumes/open-webui/data:/app/backend/data
    environment:
      # Ollama connection
      - OLLAMA_BASE_URL=http://host.docker.internal:11434

      # Database connection
      - DATABASE_URL=postgresql://postgres:securepassword123@postgres:5432/openwebui

      # Basic settings
      - WEBUI_SECRET_KEY=your-secret-key-here
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=true
      - DEFAULT_MODELS=qwen3:8b

    extra_hosts:
      - 'host.docker.internal:host-gateway'
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - local-ai-network

networks:
  local-ai-network:
    driver: bridge

Starting the stack

Three simple steps:

  1. Make sure Docker is running

  2. Confirm Ollama is active: Run ollama serve in your terminal. If it starts, great! If you see "Error: listen tcp 127.0.0.1:11434: bind: address already in use", that means Ollama is already running

  3. Fire it up from your project directory:

docker compose up -d

The -d flag runs everything in the background. First startup takes a few minutes while Docker downloads the images.

Once everything is up and running, head to http://localhost:3000, where you'll need to create an account before you can use the interface. The first user to sign up automatically becomes the admin.

Testing your setup

Let's verify everything works. Here's my quick validation routine:

1. Container health check

docker compose ps

All three containers should show as running. If any show "Exited", investigate with:

docker compose logs [service-name]

2. Verify Ollama connection

Quick test to ensure Ollama responds:

Mac/Linux/Windows Command Prompt:

curl http://localhost:11434/api/tags

Windows PowerShell:

Invoke-RestMethod http://localhost:11434/api/tags

Browser fallback: Open http://localhost:11434/api/tags in your browser

You should see your downloaded models in the response.
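
Want to go one step further than listing models? You can chat with the model directly through Ollama's API. A minimal sketch for Mac/Linux (PowerShell users can adapt it with Invoke-RestMethod):

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{ "role": "user", "content": "Say hello in five words." }],
  "stream": false
}'

With "stream": false you get back a single JSON object containing the assistant's reply instead of a stream of chunks.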

3. Access the web interface

  1. Navigate to http://localhost:3000

  2. Create an account (first user becomes admin automatically)

  3. You should land on the clean chat interface

4. First conversation

  1. The qwen3:8b model should be selected by default (since we set it as DEFAULT_MODELS in our docker-compose). If it isn't, you can find the model selector in the top-left corner

  2. Choose qwen3:8b if not already selected

  3. Send a test message and wait for the response

Slow responses? Check that Ollama is running and your model downloaded completely.

5. Peek at the database (optional)

Curious about what's happening under the hood?

  1. Visit http://localhost:5050

  2. Login with admin@local.ai / admin123

  3. Add a server connection:

    1. Right click on Servers (in the left side menu) → Register → Server

    2. In the General tab → Name: local-ai

    3. Switch to the Connection tab and enter:

      • Host name/address: postgres

      • Port: 5432

      • Maintenance database: postgres

      • Username: postgres

      • Password: securepassword123

    4. Click Save

    5. Once connected, navigate to local-ai → Databases → openwebui → Schemas → public → Tables to see the tables Open WebUI created automatically on its first startup
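
Prefer the terminal over pgadmin? You can list the same tables with a one-liner against the postgres service from our compose file:

docker compose exec postgres psql -U postgres -d openwebui -c '\dt'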

Troubleshooting common issues

"Cannot connect to Ollama"

  • Verify Ollama is actually running (check system tray)

  • If using Docker Ollama, double-check port mappings

"Database connection failed"

  • Give PostgreSQL more initialization time

  • Confirm postgres container health: docker compose ps

"Port already in use"

  • Modify port mappings in docker-compose.yml

  • Stop whatever service is occupying those ports

"Performance is awful"

  • Usually means Ollama can't access your GPU

  • On Mac, stick with the native app over Docker

  • Try a smaller model if resources are tight
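
One quick way to tell is ollama ps, which recent Ollama versions include. It lists the models currently loaded and whether each is running on the GPU or falling back to CPU:

# Shows loaded models and their processor (GPU vs CPU)
ollama ps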

What's next

Congratulations! You've built a solid foundation for your self-hosted AI stack. You're now running:

  • Ollama serving your self-hosted LLM

  • Open WebUI providing a beautiful chat interface

  • PostgreSQL storing conversations and configurations

  • pgadmin for database management

This foundation gives us a rock-solid base for the advanced capabilities we'll add in upcoming parts.

Coming up in Part 2

We'll supercharge our setup with RAG (Retrieval-Augmented Generation) by adding:

  • Apache Tika for document processing (PDFs, Word docs, images, etc.)

  • Qdrant vector database for semantic search

  • Document upload and intelligent retrieval through Open WebUI

Everything we've built today will integrate seamlessly with these new components.

Homework before Part 2

Take some time to explore what we've built:

  • Try different models (llama3.2, codellama, mistral); the pull commands are just after this list

  • Experiment with Open WebUI's settings and themes

  • Upload some documents and see how basic file handling works

  • Poke around the conversation history in pgadmin
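
If you want to queue up the alternative models from the first item, the pulls look like this (all three are in the Ollama library; check https://ollama.com/search for sizes before committing disk space):

ollama pull llama3.2
ollama pull mistral
ollama pull codellama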

This is part of my "Complete Self-Hosted AI Infrastructure" series. Follow along as we build increasingly sophisticated AI capabilities, all running self-hosted on your machine.


Written by

Farzam Mohammadi

I'm Farzam, a Software Engineer specializing in backend development. My mission: Collaborate, share proven tricks, and help you avoid the pricey surprises I've encountered along the way.