Running ollama on Proxmox VM

Andre Wong
4 min read

I am using a Gigabyte GTX 1060 G1 Gaming 3GB GDDR5 card as my GPU to run ollama models. To do that, I first need to set up GPU passthrough to my virtual machine.

proxmox host setup for GPU passthrough

  1. Verify CPU supports hardware virtualization and IOMMU

    • you might have to enable these settings in the BIOS

    • mine was under Asus BIOS → CPU advanced settings → vmx virtualization technology

  2. In the Proxmox host machine shell, add intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub (on AMD CPUs, use amd_iommu=on instead)

     # /etc/default/grub
    
     GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
    
  3. Run update-grub and then sudo reboot to apply the settings

  4. Verify IOMMU is turned on: dmesg | grep -e DMAR -e IOMMU

     dmesg | grep -e DMAR -e IOMMU
    
     [    0.007531] ACPI: DMAR 0x000000008EC3B3C0 000070 (v01 INTEL  KBL      00000001 INTL 00000001)
     [    0.007554] ACPI: Reserving DMAR table memory at [mem 0x8ec3b3c0-0x8ec3b42f]
     [    0.027442] DMAR: IOMMU enabled
     [    0.074297] DMAR: Host address width 39
     [    0.074298] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
     [    0.074303] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap d2008c40660462 ecap f050da
     [    0.074305] DMAR: RMRR base: 0x0000008eaee000 end: 0x0000008eb0dfff
     [    0.074308] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed90000 IOMMU 0
     [    0.074309] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
     [    0.074310] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
     [    0.075656] DMAR-IR: Enabled IRQ remapping in x2apic mode
     [    0.262942] DMAR: No ATSR found
     [    0.262943] DMAR: No SATC found
     [    0.262944] DMAR: dmar0: Using Queued invalidation
     [    0.263559] DMAR: Intel(R) Virtualization Technology for Directed I/O
    
  5. we do not want the host machine to use the GPU, so we blacklist Nvidia drivers

     # /etc/modprobe.d/blacklist.conf
    
     blacklist nouveau
     blacklist nvidiafb
    
  6. Find our GPU hardware device, lspci -nn

    In the output below, 01:00.0 and 01:00.1 correspond to our graphics card; let's note the vendor and device IDs: 10de:1c02 and 10de:10f1

     lspci -nn
    
     01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
     01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
    
  7. Bind the graphics card we want to pass through to vfio-pci, using the vendor and device IDs we found earlier, then restart

     echo "options vfio-pci ids=10de:1c02,10de:10f1 disable_vga=1" > /etc/modprobe.d/vfio.conf
     update-initramfs -u
     sudo reboot
    
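Steps 6 and 7 can be glued together with a small shell sketch. The gpu_lines variable here just holds the captured lspci output so the example is self-contained; on the actual host you would pipe in lspci -nn | grep -i nvidia instead:

```shell
# Captured `lspci -nn` lines for the card (stand-in for the live command):
gpu_lines='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)'

# Keep only the bracketed vendor:device pairs and join them with commas.
ids=$(printf '%s\n' "$gpu_lines" \
  | grep -oP '\[\K[0-9a-f]{4}:[0-9a-f]{4}(?=\])' \
  | paste -sd, -)

# Print the line that goes into /etc/modprobe.d/vfio.conf.
echo "options vfio-pci ids=$ids disable_vga=1"
```

After the reboot, lspci -nnk -s 01:00.0 should report "Kernel driver in use: vfio-pci", confirming the host is no longer claiming the card.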

proxmox virtual machine setup for GPU passthrough

  1. Most of the steps are similar to creating a normal virtual machine

  2. When selecting a CPU type, be sure to use host instead of an emulated CPU model, as an emulated model may not expose instruction sets (such as AVX) that ollama needs

  3. Be sure to allocate enough memory for the ollama model that you are using

  4. After the virtual machine is created, do not start it yet, go to hardware → add → PCI device and select our graphics card

  5. Start the virtual machine and go through the OS installation as usual

  6. Install GPU drivers in the virtual machine

     # running as su
    
     add-apt-repository ppa:graphics-drivers/ppa
     apt update
     apt upgrade
     ubuntu-drivers autoinstall
     echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
     echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
     update-initramfs -u
     reboot
    
  7. Check our GPU status with nvidia-smi

     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
     |-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA GeForce GTX 1060 3GB    Off |   00000000:01:00.0 Off |                  N/A |
     |  0%   41C    P8              7W /  180W |       5MiB /   3072MiB |      0%      Default |
     |                                         |                        |                  N/A |
     +-----------------------------------------+------------------------+----------------------+
    
     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |  No running processes found                                                             |
     +-----------------------------------------------------------------------------------------+
    
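On the memory question in step 3: a rough rule of thumb (my own assumption, not an official ollama figure) is about half a byte per parameter for the Q4-quantised models ollama ships by default, plus headroom for the KV cache and the OS. A back-of-envelope sketch for deepseek-r1:14b:

```shell
# Back-of-envelope sizing for a 14B-parameter model at Q4 quantisation.
params=14000000000                 # 14B parameters
weight_bytes=$(( params / 2 ))     # ~0.5 bytes per parameter at Q4
echo "~$(( weight_bytes / 1024 / 1024 / 1024 )) GiB for the weights alone"
```

With only 3 GB of VRAM on the GTX 1060, most of that spills into system RAM, so the VM allocation needs to cover it.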

installing ollama

curl -fsSL https://ollama.com/install.sh | sh

ollama run deepseek-r1:14b
# test ollama inference
>>> what is 2 + 2
<think>
I see the question is asking for the sum of two plus two.

First, I'll identify the numbers involved in the addition.

Next, I'll add them together to find the total.

Finally, I'll present the final answer clearly.
</think>

Sure! Let's solve \(2 + 2\) step by step:

1. **Identify the Numbers:**
   - We have two numbers to add: 2 and 2.

2. **Add the Numbers:**
   \[
   2 + 2 = 4
   \]

3. **Final Answer:**
   \[
   \boxed{4}
   \]

Now we can run our model from the command line; next, let's connect a front end to it as well.
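Under the hood, the ollama CLI talks to a local REST server on port 11434, which is the same endpoint a front end connects to. For example, a one-off completion can be requested by POSTing a body like this to http://127.0.0.1:11434/api/generate (e.g. with curl -d @body.json); setting "stream": false returns a single JSON object instead of a token stream:

```json
{
  "model": "deepseek-r1:14b",
  "prompt": "what is 2 + 2",
  "stream": false
}
```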

installing open-webui

# install docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh
sudo usermod -aG docker $USER
logout

# pull open-webui container
docker pull ghcr.io/open-webui/open-webui:main

# run container on host network
docker run -d -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data --network=host --name open-webui ghcr.io/open-webui/open-webui:main

# visit website
http://<host ip>:8080
  • After visiting the website and creating an account, go to settings → connections and set the Ollama API connection to http://127.0.0.1:11434 (open-webui and ollama run in the same virtual machine)

  • We can ask any question like ChatGPT

    • not the best hardware, as it took about 6 minutes to generate a response, but it is something to try =)
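As an aside, the docker run command above maps onto a compose file, which is handy for bringing the container back up after host reboots; a sketch mirroring the same flags:

```yaml
# docker-compose.yml - equivalent of the docker run flags used above
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    network_mode: host                          # same as --network=host
    environment:
      - OLLAMA_BASE_URL=http://127.0.0.1:11434
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui:
```

Start it with docker compose up -d from the directory holding the file.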
