How to Configure NVLink on a VPS with Multi-GPU Setup

NVLink is NVIDIA’s high-bandwidth, energy-efficient interconnect for linking multiple GPUs, providing significantly higher communication bandwidth compared to traditional PCIe. For data scientists, ML engineers, and high-performance computing (HPC) users, NVLink offers seamless GPU memory sharing, reducing bottlenecks and enabling faster training and inference.

But can you use NVLink in a VPS? The answer is yes, but with important caveats. Let’s explore what NVLink is, how to configure it, and what’s required for it to work in a virtualized server environment.

Limitations and Caveats

  • Not all VPS providers support NVLink setups.

  • Only bare-metal-based VPS or dedicated GPU virtual machines with direct passthrough allow NVLink to work.

  • Containerized environments such as Docker only get NVLink if the container runs on a host or VM that already has both bridged GPUs exposed to it (see the quick check below).
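If your provider does hand you a full VM with both GPUs attached, containers running on it can still use NVLink. A minimal sanity check, assuming Docker with the NVIDIA Container Toolkit is installed and that the nvidia/cuda image tag shown is available for your setup:

# Run nvidia-smi inside a throwaway container with all GPUs exposed;
# the topology matrix should show the same NV# links as on the host/VM
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi topo -m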

What Is NVLink?

NVLink allows two or more compatible NVIDIA GPUs to:

  • Share memory across GPUs for large datasets

  • Exchange data at up to 600 GB/s of total bidirectional bandwidth on data-center GPUs such as the A100 (consumer cards like the RTX 3090 offer considerably less)

  • Perform faster multi-GPU training without CPU involvement

Supported on GPUs like:

  • NVIDIA A100, V100, RTX 3090, RTX A6000, and other NVLink-equipped cards (the GeForce RTX 40-series, including the 4090, dropped NVLink support)

  • Usually requires a physical NVLink bridge
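A quick way to confirm that a particular card actually exposes NVLink is the nvlink subcommand of nvidia-smi (this assumes the NVIDIA driver is already installed on the machine you are checking):

# Shows the state and speed of every NVLink link on each GPU;
# cards without NVLink report no active links
nvidia-smi nvlink --status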

NVLink in VPS: Prerequisites

Before attempting to configure NVLink on a VPS, ensure the following:

Host Hardware

  • The physical server must have:

    • At least two NVLink-compatible GPUs

    • NVLink bridge(s) installed

    • BIOS and firmware that supports NVLink

  • Common compatible setups include dual A100s or dual RTX 3090s joined by an NVLink bridge.
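Before provisioning anything, it is worth confirming on the host itself that both cards are present and bridged. A minimal host-side check, assuming the host has the NVIDIA driver loaded:

# List the NVIDIA PCI devices and note their bus addresses for passthrough later
lspci -nn | grep -i nvidia
# With the bridge installed, the host's topology matrix should already show NV# links
nvidia-smi topo -m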

VPS Configuration

  • The VPS must be provisioned on a GPU passthrough-enabled hypervisor, like:

    • KVM/QEMU with VFIO (PCI passthrough)

    • VMware ESXi with DirectPath I/O

    • Proxmox VE with GPU passthrough

⚠️ Note: NVLink does not work across virtualized devices unless both GPUs are passed through as full PCIe devices to the same VM.
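Full PCIe passthrough also depends on IOMMU support being enabled in the BIOS and on the kernel command line (intel_iommu=on or amd_iommu=on, depending on the platform). A quick host-side sanity check:

# Confirm the kernel initialized the IOMMU (exact messages vary by vendor and kernel)
dmesg | grep -i -e DMAR -e IOMMU
# Each GPU should sit in an IOMMU group you can pass through in its entirety
ls /sys/kernel/iommu_groups/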

Step-by-Step: How to Configure NVLink on a VPS

Step 1: Ensure Passthrough of GPUs

The host needs to pass both physical GPUs directly to your VPS.

For KVM/QEMU with VFIO:

# Example: detach two GPUs from their current driver and hand them to vfio-pci
# (replace the PCI addresses and the "vendor_id device_id" pair with values from lspci -nn)
echo "0000:65:00.0" > /sys/bus/pci/devices/0000:65:00.0/driver/unbind
echo "0000:66:00.0" > /sys/bus/pci/devices/0000:66:00.0/driver/unbind
echo "vendor_id device_id" > /sys/bus/pci/drivers/vfio-pci/new_id

Then update the libvirt domain XML (or the QEMU command line) so that both GPUs are passed through to the same VM, as sketched below.
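For libvirt, that usually means one <hostdev> entry per GPU in the domain definition (edited with virsh edit <vm-name>). A minimal sketch, reusing the example bus addresses 0000:65:00.0 and 0000:66:00.0 from above:

<!-- Pass both physical GPUs to the same VM so the NVLink bridge stays usable -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x66' slot='0x00' function='0x0'/>
  </source>
</hostdev>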

Step 2: Install NVIDIA Drivers

Inside the VPS (guest OS), install a recent NVIDIA driver (the 535 branch is used here as an example):

sudo apt update
sudo apt install -y nvidia-driver-535

Reboot after installation.
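Once the guest is back up, confirm the driver sees both passed-through GPUs before going further:

# Both GPUs should appear here with the installed driver version
nvidia-smi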

Step 3: Verify NVLink Topology

Once inside the guest OS:

nvidia-smi topo -m

You should see:

        GPU0    GPU1    CPU Affinity
GPU0     X      NV1     0-15
GPU1    NV1      X      0-15

Here NV1 means a single NVLink connection is active between GPU0 and GPU1 (NV2, NV4, and so on indicate multiple links); entries like PHB or SYS would mean the GPUs only communicate over PCIe.

Step 4: Verify Peer-to-Peer Access (Optional but Recommended)

nvidia-smi topo -p2p r

Every GPU pair in the matrix should report OK; NS (or a similar code) means peer-to-peer access is not available over that path.
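For an actual bandwidth measurement rather than a capability check, NVIDIA's open-source cuda-samples repository ships a p2pBandwidthLatencyTest sample. The repository URL is real, but the build steps and in-tree path of the sample vary between releases, so treat this as a rough sketch:

# Rough sketch: measure GPU-to-GPU copy bandwidth (requires the CUDA toolkit in the guest)
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
# Build p2pBandwidthLatencyTest as described in the repo README, then run it;
# NVLink-connected pairs should report far higher GB/s than PCIe-only paths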

Security Considerations

  • Isolated access: Ensure your VPS is not oversubscribed or co-hosted with others when using full GPU passthrough.

  • No shared memory leakage: NVLink creates a shared memory space—limit access to trusted environments.

  • Audit access to the /dev/nvidia* device nodes (see the example below).
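A simple way to perform that audit from inside the VPS is to inspect the ownership and permissions of the device nodes:

# Show which users and groups can open the NVIDIA device nodes
ls -l /dev/nvidia*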

Troubleshooting NVLink Issues

  • NVLink not shown in nvidia-smi: the GPUs are not bridged properly. Power off the host and reseat the physical NVLink bridge.

  • Only one GPU visible: passthrough misconfiguration. Check the VM XML/device passthrough settings.

  • Peer-to-peer disabled: driver mismatch or BIOS settings. Upgrade the driver and check the BIOS for NVLink support.

  • Low bandwidth: NVLink lanes underutilized. Run nvidia-smi nvlink --status to verify link count and speed.

 

NVLink is a game-changer for GPU-intensive workloads, offering immense performance advantages when properly configured—even in virtual environments. With direct GPU passthrough and careful setup, you can harness the power of multi-GPU interconnects on a VPS, turning it into a high-performance computing node for demanding applications.