Running ollama on Proxmox VM

Andre Wong
4 min read

I am using a Gigabyte GTX 1060 G1 Gaming 3GB GDDR5 card as my GPU to run ollama models. To do that, I first need to set up GPU passthrough to my virtual machine.

proxmox host setup for GPU passthrough

  1. Verify CPU supports hardware virtualization and IOMMU

    • you might have to enable these settings in the BIOS

    • mine was under Asus BIOS → CPU advanced settings → vmx virtualization technology

  2. In the Proxmox host machine shell, add intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub (on AMD CPUs, use amd_iommu=on instead)

     # /etc/default/grub
    
     GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
    
  3. Run update-grub and then sudo reboot to apply the settings

  4. Verify IOMMU is turned on: dmesg | grep -e DMAR -e IOMMU

     dmesg | grep -e DMAR -e IOMMU
    
     [    0.007531] ACPI: DMAR 0x000000008EC3B3C0 000070 (v01 INTEL  KBL      00000001 INTL 00000001)
     [    0.007554] ACPI: Reserving DMAR table memory at [mem 0x8ec3b3c0-0x8ec3b42f]
     [    0.027442] DMAR: IOMMU enabled
     [    0.074297] DMAR: Host address width 39
     [    0.074298] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
     [    0.074303] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap d2008c40660462 ecap f050da
     [    0.074305] DMAR: RMRR base: 0x0000008eaee000 end: 0x0000008eb0dfff
     [    0.074308] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed90000 IOMMU 0
     [    0.074309] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
     [    0.074310] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
     [    0.075656] DMAR-IR: Enabled IRQ remapping in x2apic mode
     [    0.262942] DMAR: No ATSR found
     [    0.262943] DMAR: No SATC found
     [    0.262944] DMAR: dmar0: Using Queued invalidation
     [    0.263559] DMAR: Intel(R) Virtualization Technology for Directed I/O
    
  5. we do not want the host machine to use the GPU, so we blacklist Nvidia drivers

     # /etc/modprobe.d/blacklist.conf
    
     blacklist nouveau
     blacklist nvidiafb
    
  6. Find our GPU hardware device, lspci -nn

    In the output below, 01:00.0 and 01:00.1 correspond to our graphics card; let's note the vendor and device IDs: 10de:1c02 and 10de:10f1

     lspci -nn
    
     01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
     01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
    
  7. Bind the graphics card we want to pass through to vfio-pci, using the vendor and device IDs we found earlier, then restart

     echo "options vfio-pci ids=10de:1c02,10de:10f1 disable_vga=1" > /etc/modprobe.d/vfio.conf
     update-initramfs -u
     sudo reboot
    
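Steps 6 and 7 can be glued together with a small shell sketch. The gpu_lines variable here just holds the captured lspci output so the example is self-contained; on the actual host you would pipe in lspci -nn | grep -i nvidia instead:

```shell
# Captured `lspci -nn` lines for the card (stand-in for the live command):
gpu_lines='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)'

# Keep only the bracketed vendor:device pairs and join them with commas.
ids=$(printf '%s\n' "$gpu_lines" \
  | grep -oP '\[\K[0-9a-f]{4}:[0-9a-f]{4}(?=\])' \
  | paste -sd, -)

# Print the line that goes into /etc/modprobe.d/vfio.conf.
echo "options vfio-pci ids=$ids disable_vga=1"
```

After the reboot, lspci -nnk -s 01:00.0 should report "Kernel driver in use: vfio-pci", confirming the host is no longer claiming the card.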

proxmox virtual machine setup for GPU passthrough

  1. Most of the steps are similar to creating a normal virtual machine

  2. When selecting a CPU type, be sure to use host instead of an emulated CPU model, as an emulated model may not expose instruction sets (such as AVX) that ollama needs

  3. Be sure to allocate enough memory for the ollama model that you are using

  4. After the virtual machine is created, do not start it yet, go to hardware → add → PCI device and select our graphics card

  5. Start the virtual machine and go through the OS installation as usual

  6. Install GPU drivers in the virtual machine

     # running as su
    
     add-apt-repository ppa:graphics-drivers/ppa
     apt update
     apt upgrade
     ubuntu-drivers autoinstall
     echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
     echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
     update-initramfs -u
     reboot
    
  7. Check our GPU status with nvidia-smi

     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
     |-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA GeForce GTX 1060 3GB    Off |   00000000:01:00.0 Off |                  N/A |
     |  0%   41C    P8              7W /  180W |       5MiB /   3072MiB |      0%      Default |
     |                                         |                        |                  N/A |
     +-----------------------------------------+------------------------+----------------------+
    
     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |  No running processes found                                                             |
     +-----------------------------------------------------------------------------------------+
    
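On the memory question in step 3: a rough rule of thumb (my own assumption, not an official ollama figure) is about half a byte per parameter for the Q4-quantised models ollama ships by default, plus headroom for the KV cache and the OS. A back-of-envelope sketch for deepseek-r1:14b:

```shell
# Back-of-envelope sizing for a 14B-parameter model at Q4 quantisation.
params=14000000000                 # 14B parameters
weight_bytes=$(( params / 2 ))     # ~0.5 bytes per parameter at Q4
echo "~$(( weight_bytes / 1024 / 1024 / 1024 )) GiB for the weights alone"
```

With only 3 GB of VRAM on the GTX 1060, most of that spills into system RAM, so the VM allocation needs to cover it.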

installing ollama

curl -fsSL https://ollama.com/install.sh | sh

ollama run deepseek-r1:14b
# test ollama inference
>>> what is 2 + 2
<think>
I see the question is asking for the sum of two plus two.

First, I'll identify the numbers involved in the addition.

Next, I'll add them together to find the total.

Finally, I'll present the final answer clearly.
</think>

Sure! Let's solve \(2 + 2\) step by step:

1. **Identify the Numbers:**
   - We have two numbers to add: 2 and 2.

2. **Add the Numbers:**
   \[
   2 + 2 = 4
   \]

3. **Final Answer:**
   \[
   \boxed{4}
   \]

Now we can run our model from the command line; next, let's connect a front end to it as well.
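Under the hood, the ollama CLI talks to a local REST server on port 11434, which is the same endpoint a front end connects to. For example, a one-off completion can be requested by POSTing a body like this to http://127.0.0.1:11434/api/generate (e.g. with curl -d @body.json); setting "stream": false returns a single JSON object instead of a token stream:

```json
{
  "model": "deepseek-r1:14b",
  "prompt": "what is 2 + 2",
  "stream": false
}
```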

installing open-webui

# install docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh
sudo usermod -aG docker $USER
logout

# pull open-webui container
docker pull ghcr.io/open-webui/open-webui:main

# run container on host network
docker run -d -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data --network=host --name open-webui ghcr.io/open-webui/open-webui:main

# visit website
http://<host ip>:8080
  • After visiting the website and creating an account, go to settings → connections and set the Ollama API connection to http://127.0.0.1:11434 (open-webui and ollama run in the same virtual machine)

  • We can ask any question like ChatGPT

    • not the best hardware, as it took about 6 minutes to generate a response, but it is something to try =)
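As an aside, the docker run command above maps onto a compose file, which is handy for bringing the container back up after host reboots; a sketch mirroring the same flags:

```yaml
# docker-compose.yml - equivalent of the docker run flags used above
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    network_mode: host                          # same as --network=host
    environment:
      - OLLAMA_BASE_URL=http://127.0.0.1:11434
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui:
```

Start it with docker compose up -d from the directory holding the file.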
