Running Ollama on a Proxmox VM


I am using a Gigabyte GTX 1060 G1 Gaming 3GB GDDR5 as my GPU to run Ollama models. To do that, I first need to pass the GPU through to my virtual machine.
Reference for this tutorial
Proxmox host setup for GPU passthrough
Verify that the CPU supports hardware virtualization and IOMMU
You might have to enable these settings in the BIOS
Mine was under Asus BIOS → CPU advanced settings → VMX Virtualization Technology
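One way to check the virtualization flag from a shell, before going into the BIOS, is to look for vmx (Intel) or svm (AMD) in /proc/cpuinfo. This is a minimal sketch; the function name cpu_virt_flag and its optional file argument are illustrative and not part of any tool. It only covers CPU virtualization; IOMMU is verified separately through dmesg.

```shell
# Print the hardware virtualization flag (vmx for Intel, svm for AMD) found in
# a cpuinfo-style file. Returns non-zero if neither flag is present.
# cpu_virt_flag and its file argument are illustrative names.
cpu_virt_flag() (
    cpuinfo="${1:-/proc/cpuinfo}"
    # -o prints only the matched word, -w matches whole words only
    flag=$(grep -owE 'vmx|svm' "$cpuinfo" | head -n 1)
    [ -n "$flag" ] && printf '%s\n' "$flag"
)

# Usage on the Proxmox host:
#   cpu_virt_flag || echo "virtualization flag not found, check the BIOS"
```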
In the Proxmox host machine shell, add intel_iommu=on to /etc/default/grub:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
Then run update-grub and sudo reboot to apply the settings.
Verify IOMMU is turned on:
dmesg | grep -e DMAR -e IOMMU
[    0.007531] ACPI: DMAR 0x000000008EC3B3C0 000070 (v01 INTEL KBL 00000001 INTL 00000001)
[    0.007554] ACPI: Reserving DMAR table memory at [mem 0x8ec3b3c0-0x8ec3b42f]
[    0.027442] DMAR: IOMMU enabled
[    0.074297] DMAR: Host address width 39
[    0.074298] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.074303] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.074305] DMAR: RMRR base: 0x0000008eaee000 end: 0x0000008eb0dfff
[    0.074308] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed90000 IOMMU 0
[    0.074309] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[    0.074310] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.075656] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.262942] DMAR: No ATSR found
[    0.262943] DMAR: No SATC found
[    0.262944] DMAR: dmar0: Using Queued invalidation
[    0.263559] DMAR: Intel(R) Virtualization Technology for Directed I/O
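Once the IOMMU is enabled, the kernel exposes IOMMU groups under /sys/kernel/iommu_groups. Devices that share a group generally have to be passed through together, so it can be worth checking which group the GPU landed in. The following is a small sketch for listing them; list_iommu_groups is a made-up helper name, and the root-directory argument exists only so the function can be pointed at a test tree.

```shell
# List every PCI device together with its IOMMU group number.
# list_iommu_groups is an illustrative helper; on a real host the default
# root /sys/kernel/iommu_groups is used.
list_iommu_groups() (
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exposed
        group="${dev%/devices/*}"      # strip "/devices/<pci address>"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
)

# Usage on the Proxmox host:
#   list_iommu_groups | grep 01:00
```

If the command prints nothing, the IOMMU is probably not active and the GRUB and BIOS settings are the first things to recheck.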
We do not want the host machine to use the GPU, so we blacklist the Nvidia drivers:
# /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nvidiafb
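A blacklist only takes effect the next time the host boots, so after a reboot it may be useful to confirm that the driver really stayed out. A rough sketch, assuming the standard /proc/modules layout (module name in the first column); module_loaded is an illustrative helper, not an existing command.

```shell
# Succeed if the named kernel module appears in a /proc/modules-style file.
# module_loaded and its optional file argument are illustrative.
module_loaded() (
    name="$1"
    modules="${2:-/proc/modules}"
    # /proc/modules has one module per line, with the name as the first field
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules"
)

# Usage on the Proxmox host after rebooting:
#   module_loaded nouveau && echo "nouveau is still loaded"
```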
Find our GPU hardware device with lspci -nn.
We can see that 01:00.0 and 01:00.1 here correspond to our graphics card, so let's get their vendor and device IDs: 10de:1c02 and 10de:10f1
lspci -nn
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
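Rather than copying the IDs by hand, they can be pulled out of the lspci -nn output with a small filter. This is a sketch only: pci_ids_for_slot is a hypothetical helper name, and it assumes the default lspci format shown above, where the vendor:device pair is the bracketed group of two 4-digit hex numbers.

```shell
# Read `lspci -nn` output on stdin and print the vendor:device IDs of every
# function in the given slot, comma-separated (the form vfio-pci's ids= takes).
# pci_ids_for_slot is an illustrative helper name.
pci_ids_for_slot() (
    slot="$1"
    grep "^$slot\." \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | paste -sd, -
)

# Usage on the Proxmox host:
#   lspci -nn | pci_ids_for_slot 01:00
```

For the card in this post, the result should be the same pair of IDs noted above, joined by a comma.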
Add the graphics card we want to pass through using the vendor and device IDs we found earlier, then reboot:
echo "options vfio-pci ids=10de:1c02,10de:10f1 disable_vga=1" > /etc/modprobe.d/vfio.conf
update-initramfs -u
sudo reboot
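After the reboot, lspci -nnk reports which kernel driver is bound to each device, so it is a quick way to see whether vfio-pci claimed the card. A minimal sketch; drivers_in_use is an illustrative helper that just extracts the "Kernel driver in use" lines.

```shell
# Read `lspci -nnk` output on stdin and print the kernel driver bound to each
# listed device, one per line. drivers_in_use is an illustrative helper name.
drivers_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage on the Proxmox host:
#   lspci -nnk -s 01:00 | drivers_in_use
```

If passthrough is set up as intended, each function of the card should report vfio-pci here. A device with no driver bound prints nothing.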
Proxmox virtual machine setup for GPU passthrough
Most of the steps are the same as creating a normal virtual machine
When selecting a CPU type, be sure to use host instead of a virtual CPU, as a virtual CPU may not support certain instruction sets
Be sure to allocate enough memory for the Ollama model that you are using
After the virtual machine is created, do not start it yet. Go to Hardware → Add → PCI Device and select our graphics card
Start our virtual machine, follow the installation, etc.
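The same settings can also be applied from the host shell with Proxmox's qm tool instead of the web UI. The sketch below only prints the commands so they can be reviewed first. The VM ID 100 and the 8192 MiB memory size are placeholders, not values from this setup, and passthrough_cmds is an illustrative helper.

```shell
# Print the qm commands for CPU type, memory, and PCI passthrough instead of
# running them. passthrough_cmds is an illustrative helper name.
passthrough_cmds() (
    vmid="$1"; pci="$2"; mem_mib="$3"
    printf 'qm set %s --cpu host\n' "$vmid"                 # expose the host CPU type
    printf 'qm set %s --memory %s\n' "$vmid" "$mem_mib"     # memory in MiB
    printf 'qm set %s --hostpci0 %s\n' "$vmid" "$pci"       # no function suffix: all functions
)

# Usage on the Proxmox host (placeholder VM ID and memory size):
#   passthrough_cmds 100 01:00 8192        # review the output
#   passthrough_cmds 100 01:00 8192 | sh   # apply it
```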
Install GPU drivers in the virtual machine
# running as root
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt upgrade
ubuntu-drivers autoinstall
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
Check our GPU status
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 3GB    Off |   00000000:01:00.0 Off |                  N/A |
|  0%   41C    P8              7W /  180W |       5MiB /   3072MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
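For scripting, nvidia-smi can also emit selected fields as CSV, which is easier to parse than the table. Here is a small sketch that turns the used and total memory figures into free VRAM per GPU. gpu_free_mib is an illustrative helper, and it assumes the ", " separator that the csv,noheader,nounits format produces.

```shell
# Read lines of "name, used, total" (memory in MiB, no units) on stdin and
# print the free VRAM per GPU. gpu_free_mib is an illustrative helper name.
gpu_free_mib() {
    awk -F', ' '{ printf "%s: %d MiB free\n", $1, $3 - $2 }'
}

# Usage inside the virtual machine:
#   nvidia-smi --query-gpu=name,memory.used,memory.total \
#              --format=csv,noheader,nounits | gpu_free_mib
```

Comparing this figure with the size of the model gives a rough idea of how much of it can fit on the GPU.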
Installing Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama run deepseek-r1:14b
# test ollama inference
>>> what is 2 + 2
<think>
I see the question is asking for the sum of two plus two.
First, I'll identify the numbers involved in the addition.
Next, I'll add them together to find the total.
Finally, I'll present the final answer clearly.
</think>
Sure! Let's solve \(2 + 2\) step by step:
1. **Identify the Numbers:**
- We have two numbers to add: 2 and 2.
2. **Add the Numbers:**
\[
2 + 2 = 4
\]
3. **Final Answer:**
\[
\boxed{4}
\]
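Besides the interactive prompt, Ollama serves an HTTP API on port 11434, so the same question can be sent with curl. The sketch below builds the JSON body for the /api/generate endpoint. ollama_payload is an illustrative helper, and its escaping is deliberately minimal (double quotes and backslashes only), so it suits simple one-line prompts rather than arbitrary text.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# ollama_payload is an illustrative helper; only double quotes and
# backslashes in the prompt are escaped.
ollama_payload() (
    model="$1"
    prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    # "stream": false asks for one JSON response instead of a token stream
    printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
)

# Usage inside the virtual machine:
#   curl -s http://127.0.0.1:11434/api/generate \
#        -d "$(ollama_payload deepseek-r1:14b 'what is 2 + 2')"
```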
Now we can run our model from the command line. We can also connect a front end to it.
Installing Open WebUI
# install docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh
sudo usermod -aG docker $USER
logout
# pull open-webui container
docker pull ghcr.io/open-webui/open-webui:main
# run container on host network
docker run -d -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data --network=host --name open-webui ghcr.io/open-webui/open-webui:main
# visit website
http://<host ip>:8080
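The container can take a little while to start answering, so a short polling loop may save some page refreshes. A minimal sketch: wait_for_url is an illustrative helper, and the defaults of 30 attempts and a 2-second delay are arbitrary.

```shell
# Poll a URL until it answers successfully or the attempts run out.
# wait_for_url is an illustrative helper; the defaults are arbitrary.
wait_for_url() {
    wfu_url="$1"; wfu_tries="${2:-30}"; wfu_delay="${3:-2}"
    wfu_i=0
    while [ "$wfu_i" -lt "$wfu_tries" ]; do
        # -f: treat HTTP errors as failure, -s: quiet, -o /dev/null: discard the body
        if curl -fs -o /dev/null "$wfu_url"; then
            return 0
        fi
        wfu_i=$((wfu_i + 1))
        sleep "$wfu_delay"
    done
    return 1
}

# Usage inside the virtual machine:
#   wait_for_url http://127.0.0.1:8080 && echo "Open WebUI is up"
```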
After visiting the website and creating an account, go to Settings → Connection and update the Ollama API connection to http://127.0.0.1:11434 (they are in the same virtual machine).
We can now ask any question, as with ChatGPT.
Not the best hardware, seeing that it took about 6 minutes to generate a response, but it is something to try =)
Written by

Andre Wong