Setting Up a GPU Server on Ubuntu for Azure N-Series VMs: A Step-by-Step Guide

kushagra

Many of us work with GPU servers, but setting one up on an Ubuntu machine can be challenging because of dependency issues: installing the CUDA toolkit, selecting the correct NVIDIA driver, and choosing a compatible PyTorch version. This guide aims to simplify the process, particularly for Azure N-series VMs with NVIDIA Tesla T4 GPUs.

Steps to Set Up the GPU Server:

1. Set Up the Virtual Machine:

Start by creating a virtual machine with an N-series instance. Azure accounts often have no quota for N-series GPUs by default. If yours does not, request a quota increase, which is usually approved within 1-2 business days.
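Before creating the VM, it can save time to check the current quota from the Azure CLI. The snippet below is a minimal sketch, not part of the original steps: the region `eastus` is a placeholder, the `filter_gpu_quota` helper is hypothetical, and it assumes the T4 family shows up in the listing with "NC" in its name (for example "Standard NCASv3_T4 Family vCPUs").

```shell
#!/usr/bin/env bash
# Sketch: look up N-series (NC family) vCPU quota in one region.
# LOCATION is a placeholder; use the region you plan to deploy in.
LOCATION="${LOCATION:-eastus}"

# Hypothetical helper: keep only usage rows that mention the NC family.
# Assumes the T4 family name contains "NC" (e.g. "Standard NCASv3_T4 Family vCPUs").
filter_gpu_quota() {
  grep 'NC'
}

if command -v az >/dev/null 2>&1; then
  # "az vm list-usage" prints current usage and limits per VM family.
  az vm list-usage --location "$LOCATION" --output table | filter_gpu_quota
else
  echo "Azure CLI (az) not found; install it or use Azure Cloud Shell."
fi
```

If the limit for the family is 0, the VM size will not be available until the quota request is approved.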

2. Choose the Right OS Image:

Use the Ubuntu 20.04 image. Avoid images labeled as “Data Science” since they might lack proper security credentials and consume around 30-40 GB of space unnecessarily.

3. Configure SSD Storage:

Select SSD storage with at least 128 GB. You can attach an existing disk or create a new one. If you attach an existing disk, remember to mount it manually on your VM. Keep in mind that the /mnt space provided by Azure VMs is a temporary disk: its contents may be wiped (for example, when the VM is stopped and deallocated, resized, or redeployed), so don't keep anything there that you need to persist.
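The three steps above can be sketched with the Azure CLI and a few standard Linux commands. Everything named here is an assumption for illustration: the resource group, VM name, admin user, mount point, and device path are placeholders, and `Standard_NC4as_T4_v3` is assumed as the T4 size. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a placeholder, not taken from the guide.
RESOURCE_GROUP="${RESOURCE_GROUP:-gpu-rg}"
VM_NAME="${VM_NAME:-gpu-vm}"
VM_SIZE="${VM_SIZE:-Standard_NC4as_T4_v3}"   # assumed N-series size with a T4
IMAGE="${IMAGE:-Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest}"
DATA_DEV="${DATA_DEV:-/dev/sdc1}"            # assumed device; confirm with lsblk
MOUNT_POINT="${MOUNT_POINT:-/data}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On your workstation: create the VM with a 128 GB premium SSD OS disk.
create_vm() {
  run az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --size "$VM_SIZE" \
    --image "$IMAGE" \
    --os-disk-size-gb 128 \
    --storage-sku Premium_LRS \
    --admin-username azureuser \
    --generate-ssh-keys
}

# On the VM: mount an attached disk that already contains a filesystem.
mount_data_disk() {
  run lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT
  run sudo mkdir -p "$MOUNT_POINT"
  run sudo mount "$DATA_DEV" "$MOUNT_POINT"
}

create_vm
mount_data_disk
```

A mount made this way does not survive a reboot; for a permanent mount you would typically add an entry to /etc/fstab.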

Installing CUDA and NVIDIA Drivers:

After setting up the machine, you can install the necessary drivers and CUDA toolkit. Here’s how to proceed:

1. Update the Machine:

sudo apt update

2. Install Ubuntu Drivers:

sudo apt install -y ubuntu-drivers-common
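Before installing anything, it can be useful to see which driver package the tool considers the best match for the GPU. This is a small optional sketch: `pick_recommended` is a hypothetical helper, and it assumes the `ubuntu-drivers devices` output marks one line with the word "recommended" in the form `driver : <package> - ... recommended`.

```shell
#!/usr/bin/env bash
# Sketch: print the driver package that ubuntu-drivers marks as recommended.

# Hypothetical helper: reads "ubuntu-drivers devices" output on stdin and
# prints the package name from the line tagged "recommended".
# Assumed line format: "driver   : nvidia-driver-535 - distro non-free recommended"
pick_recommended() {
  grep 'recommended' | awk '{print $3}'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
  # "ubuntu-drivers devices" lists detected hardware and candidate drivers.
  ubuntu-drivers devices | pick_recommended
else
  echo "ubuntu-drivers not found; install ubuntu-drivers-common first."
fi
```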

3. Install the Latest NVIDIA Drivers:

sudo ubuntu-drivers install

4. Reboot the VM:

Restart your virtual machine to apply the changes.
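Once the VM is back up (for example after `sudo reboot`), a quick check confirms that the driver loaded and can see the GPU. This is a minimal sketch; the `check_gpu` function name is made up for illustration, and it relies only on `nvidia-smi`, which is installed with the driver.

```shell
#!/usr/bin/env bash
# Sketch: after the reboot, confirm the NVIDIA driver is loaded and sees the GPU.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the driver is not installed or not on PATH."
    return 1
  fi
  # Prints one CSV line per GPU with its name and driver version.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

check_gpu || echo "GPU check failed; review the driver installation."
```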

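5. Install CUDA Toolkit:

Follow these steps to install CUDA 12.2. Note that compatibility runs in one direction: an NVIDIA driver that supports CUDA 12.2 can also run applications built against an older CUDA runtime (e.g., a model that needs CUDA 11.7), but a driver limited to CUDA 11.7 cannot run applications built for CUDA 12.2.

# Pin the CUDA repository so apt prefers NVIDIA's packages
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Download and register the local CUDA 12.2 repository
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
# Install the repository signing key
sudo cp /var/cuda-repo-ubuntu2004-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
# Refresh the package index and install CUDA.
# The cuda meta-package also pulls in NVIDIA's driver; cuda-toolkit-12-2 installs the toolkit only.
sudo apt-get update
sudo apt-get -y install cuda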
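After the install, the CUDA tools still need to be on your PATH, and it is worth confirming the versions line up. The sketch below assumes the toolkit landed in `/usr/local/cuda-12.2` (the usual location for this package) and that the `nvidia-smi` banner contains a `CUDA Version: X.Y` field; `version_at_least` and `driver_cuda_version` are hypothetical helper names, and 11.7 is just the example version from the note above.

```shell
#!/usr/bin/env bash
# Sketch: expose the CUDA 12.2 tools and compare driver vs. required CUDA version.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.2}"   # assumed install location
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Succeeds when version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering (sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads nvidia-smi output on stdin and prints the highest CUDA version
# the driver supports, taken from the "CUDA Version: X.Y" banner field.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Toolkit check: nvcc reports the installed compiler release.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | tail -n 2
else
  echo "nvcc not found in $CUDA_HOME/bin"
fi

# Driver check: can this driver run a build that needs CUDA 11.7?
REQUIRED="${REQUIRED:-11.7}"
if command -v nvidia-smi >/dev/null 2>&1; then
  supported="$(nvidia-smi | driver_cuda_version)"
  if version_at_least "$supported" "$REQUIRED"; then
    echo "Driver supports CUDA $supported, enough for CUDA $REQUIRED builds."
  else
    echo "Driver supports CUDA ${supported:-unknown}, below required $REQUIRED."
  fi
else
  echo "nvidia-smi not found; install the driver first."
fi
```

To keep the PATH change across logins, the two `export` lines can be added to `~/.bashrc`.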
    