Leveraging AMD GPUs for Hugging Face Model Inference: A Step-by-Step Guide


With the growing importance of generative AI and large language models, having access to high-performance GPU accelerators is critical for model inference. Traditionally, NVIDIA’s CUDA ecosystem has dominated the AI landscape. However, AMD’s ROCm (Radeon Open Compute) platform provides a strong alternative for AMD GPUs, including the cutting-edge AMD Instinct MI300X accelerators.

(Photo by AMD)

In this blog, we will walk through setting up AMD GPUs for Hugging Face model inference: installing the ROCm drivers, configuring the software stack, and running a model end to end.


Step 1: Understanding AMD ROCm – The Equivalent of CUDA

AMD ROCm (Radeon Open Compute) is an open software platform for GPU computing that provides tools for running AI workloads. Similar to NVIDIA’s CUDA, ROCm enables deep learning frameworks and libraries to utilize AMD GPUs for training and inference.

Key components:

  • HIP (Heterogeneous-compute Interface for Portability): A runtime and kernel compilation API for writing GPU programs that can run on AMD and NVIDIA GPUs.
  • MIOpen (AMD's deep learning library): Equivalent to cuDNN, it optimizes deep learning primitives such as convolutions on AMD GPUs.
  • rocBLAS: A GPU-accelerated library for matrix operations, similar to cuBLAS.
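
In practice, these components sit behind the frameworks you already use. The short sketch below (assuming the ROCm build of PyTorch installed in Step 4) shows that ordinary tensor code needs no AMD-specific changes: matrix multiplies are dispatched to AMD's GPU BLAS libraries and convolutions to MIOpen, just as CUDA builds dispatch to cuBLAS and cuDNN.

    import torch

    # On ROCm builds of PyTorch, the torch.cuda API maps to HIP,
    # so "cuda" here targets the AMD GPU.
    device = torch.device("cuda")

    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    c = a @ b  # dispatched to AMD's GPU BLAS (rocBLAS/hipBLAS)

    conv = torch.nn.Conv2d(3, 16, kernel_size=3).to(device)
    x = torch.randn(1, 3, 224, 224, device=device)
    y = conv(x)  # dispatched to MIOpen

    print(c.shape, y.shape)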

Step 2: Specifications of AMD Instinct MI300X

Through its collaboration with AMD, IBM plans to make the AMD Instinct MI300X accelerator available on IBM Cloud. The MI300X brings significant advancements tailored to generative AI and HPC workloads:

  • 192GB HBM3 Memory: Handles large models efficiently.
  • Support for ROCm Software Stack: Compatible with frameworks like PyTorch and TensorFlow.
  • Optimized for IBM watsonx AI and Red Hat OpenShift AI Platforms.

This guide focuses on using these GPUs for Hugging Face model inference.
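
To make the memory headroom concrete, here is a rough back-of-the-envelope estimate (approximate, and ignoring activation and KV-cache overhead) of the weight footprint of a 70B-parameter model in fp16:

    # Rough weight-memory estimate for a 70B-parameter model in fp16.
    params = 70e9          # parameter count
    bytes_per_param = 2    # fp16 stores 2 bytes per parameter
    weights_gib = params * bytes_per_param / 1024**3
    print(f"~{weights_gib:.0f} GiB of weights")  # ~130 GiB

A model of that size would span multiple 80 GB-class GPUs, but its weights fit on a single MI300X.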


Step 3: Installing AMD ROCm Drivers

Follow these steps to install AMD ROCm drivers and configure the environment for AMD GPU inference.

1. Check GPU Compatibility

Ensure your server supports AMD ROCm. MI300X accelerators require ROCm 6.0 or later; earlier Instinct cards are covered by ROCm 5.x.

  • Run the following command to check your GPU:

    lspci | grep -i amd
    

    Look for entries from Advanced Micro Devices (AMD). Note that Instinct accelerators typically enumerate as "Processing accelerators" rather than VGA devices, which is why a plain grep for "vga" can miss them.

2. Add ROCm Repository

AMD provides ROCm packages for Ubuntu and RHEL-based distributions.

For Ubuntu (shown for ROCm 6.0 on Ubuntu 22.04 "jammy"; substitute the ROCm release and distribution codename you are targeting, and note that apt-key is deprecated on Ubuntu 22.04 and later, so consult AMD's installation guide for the keyring-based equivalent):

wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0 jammy main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update

3. Install ROCm Software

Install the ROCm platform and required libraries, then add your user to the groups that grant access to the GPU device files (log out and back in for the change to take effect):

sudo apt install rocm-dev rocm-libs miopen-hip
sudo usermod -aG render,video $LOGNAME

4. Verify Installation

  • Check if ROCm is correctly installed:

    /opt/rocm/bin/rocminfo
    
  • Ensure your AMD GPU is visible:

    /opt/rocm/opencl/bin/clinfo
    

Step 4: Installing PyTorch with ROCm Support

Hugging Face models rely on PyTorch or TensorFlow as the backend. PyTorch natively supports ROCm.

  1. Install PyTorch for ROCm (replace rocm6.0 with the wheel index matching your installed ROCm version; see pytorch.org for the available builds):

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
    
  2. Test GPU Availability:

    import torch
    print(torch.cuda.is_available())      # True: ROCm builds expose HIP through the torch.cuda API
    print(torch.cuda.get_device_name(0))  # Should report your AMD GPU
    

Step 5: Loading Hugging Face Models for Inference

  1. Install Hugging Face Transformers:

    pip install transformers
    
  2. Load the Model for GPU Inference: Move the model and input tensors to the GPU. PyTorch's ROCm build reuses the "cuda" device string, so no AMD-specific changes are needed. (A pipeline-based alternative follows this list.)

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # meta-llama checkpoints are gated; authenticate with
    # `huggingface-cli login` before downloading.
    model_name = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

    # Inference
    inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
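
As an alternative, the higher-level pipeline API wraps the same steps; a minimal sketch, again assuming the gated Llama 2 checkpoint and a ROCm build of PyTorch:

    from transformers import pipeline

    # device=0 selects the first GPU; on ROCm builds this is the AMD GPU.
    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-hf",
        device=0,
    )
    print(generator("Hello, how are you?", max_new_tokens=50)[0]["generated_text"])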

Step 6: Troubleshooting Common Issues

  • Issue: GPU Not Recognized

    • Ensure ROCm drivers are correctly installed and that your user belongs to the render and video groups.

    • Check if the GPU is visible:

      rocminfo
      
  • Issue: PyTorch Fails to Detect GPU

    • Ensure the correct PyTorch version with ROCm support is installed.

    • Verify installation with:

      python -m torch.utils.collect_env
      
  • Issue: Slow Performance

    • Use larger batch sizes where memory allows and enable mixed precision (a batched-generation sketch follows this list):

      import torch

      # Run eligible ops in reduced precision; works on ROCm as on CUDA.
      with torch.autocast(device_type="cuda"):
          outputs = model.generate(**inputs)
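
For batching specifically, a rough sketch (assuming the tokenizer and model objects from Step 5): tokenize several prompts together with padding so they are processed in a single forward pass. Llama tokenizers ship without a pad token, so one is assigned here, and left padding is used because decoder-only models generate from the end of the sequence.

    # Batch several prompts into one generate() call.
    prompts = [
        "Hello, how are you?",
        "Summarize the benefits of GPU inference.",
    ]

    tokenizer.pad_token = tokenizer.eos_token   # reuse EOS as the pad token
    tokenizer.padding_side = "left"             # left-pad for decoder-only generation

    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=50)
    for seq in outputs:
        print(tokenizer.decode(seq, skip_special_tokens=True))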

Step 7: Next Steps

IBM Cloud’s upcoming deployment of AMD Instinct MI300X accelerators opens new opportunities to scale generative AI workloads efficiently. By leveraging ROCm and AMD GPUs, you can significantly reduce costs and boost performance for Hugging Face models.

Stay tuned for updates on IBM Cloud’s support for MI300X accelerators and how to integrate them into your AI workflows.


Enjoy building your generative AI models on AMD-powered infrastructure!
