GUIDE

Hugging Face & Local LLMs

Practical tutorials for Transformers, local models, embeddings, GGUF conversion, and open-source LLM workflows.

Ruslan Magana Vsevolodovna

Machine Learning Engineer · Data Scientist · Physicist

Genoa, Italy

Featured tutorial

Convert a Hugging Face model to GGUF in Google Colab

Quantize and export models from the Hugging Face Hub to GGUF for llama.cpp and Ollama. A step-by-step workflow using Transformers, bitsandbytes, and Colab.

GGUF · Colab · Transformers · llama.cpp

Latest tutorials

Models & Storage

How to download Hugging Face models to a custom cache folder

Organize model files, snapshots, and revisions with a custom HF_HOME directory or the cache_dir argument.

Hugging Face · Cache

Local LLMs

Run FLAN-T5 and GPT locally with Gradio

Launch interactive demos of local models through a simple Gradio interface.

Gradio · FLAN-T5 · GPT

Transformers & Fine-tuning

Fine-tune Llama 3 with Unsloth (LoRA + GGUF)

Fine-tune Llama 3 as an instruction-following model on a single Colab GPU with LoRA, then export it to GGUF for local use.

Unsloth · Llama 3 · LoRA

Transformers & Fine-tuning

Build a basic LLM (GPT) from scratch in Python

Implement a small GPT-style language model and understand the transformer loop end to end.

Transformers · LLM · From Scratch

Transformers & Fine-tuning

Fine-tune a transformer model for text classification

Prepare data, train, evaluate, and save your own transformer model with the Hugging Face Trainer.

Transformers · Fine-tuning · NLP

Embeddings & Vector Search

Build embeddings and semantic search with Sentence Transformers

Create embeddings and a vector search pipeline with FAISS and Hugging Face Sentence Transformers.

Embeddings · Sentence Transformers · FAISS

Deployment & Inference

Deploy a Hugging Face model with FastAPI

Build a production-ready API for inference with FastAPI and Uvicorn.

FastAPI · Deployment · API

Deployment & Inference

Use Hugging Face pipelines for rapid prototyping

Use pipelines for text, vision, speech, and other tasks in just a few lines of code.

Pipelines · NLP · Vision

FAQ

What is Hugging Face used for?

Hugging Face provides models, datasets, libraries, and tools for building and deploying machine learning and LLM workflows.

Can I run Hugging Face models locally?

Yes. Many models can be downloaded and run locally with Transformers, llama.cpp, Ollama, Gradio, or FastAPI, depending on the model's format and your hardware.
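Whether a model fits on your hardware depends mostly on its parameter count and numeric precision. The sketch below is a rough, weights-only estimate under stated assumptions: it ignores the KV cache, activations, and runtime overhead, so real memory use will be higher. The function name weight_memory_gb and the 7-billion-parameter example are illustrative, not taken from any library.

```python
# Rough, weights-only memory estimate for running a model locally.
# Assumption: ignores the KV cache, activations, and runtime overhead,
# all of which add to the real footprint.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (10**9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 7e9  # a hypothetical 7-billion-parameter model
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gb(params, bits):.1f} GB")
```

For 7 billion parameters this gives about 14 GB at fp16, 7 GB at 8 bits, and 3.5 GB at 4 bits. Quantized formats also store per-block scales, so their effective bits per weight are somewhat above the nominal figure (for example, about 4.5 for a 4-bit type with one 16-bit scale per 32 weights).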

What is GGUF and why is it useful?

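GGUF is a single-file model format used by llama.cpp and local inference tools built on it, such as Ollama. It stores the weights, often quantized to 8, 5, or 4 bits, together with the metadata needed to load them, which makes large models practical to run on local machines.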
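To make "quantized" concrete, here is a minimal sketch of symmetric block-wise 8-bit quantization, modeled on the idea behind GGUF's Q8_0 type: weights are split into blocks of 32, each block keeps one scale (its largest absolute value divided by 127), and each weight is stored as a small integer. This is an illustration in plain Python lists, assuming nothing about the binary layout; it does not read or write real GGUF files, and the function names are hypothetical.

```python
import math

# Sketch of symmetric per-block 8-bit quantization in the spirit of Q8_0.
# Assumption: plain Python lists; the real format packs int8 values and a
# float16 scale per block into a binary layout.
BLOCK_SIZE = 32


def quantize_block(values):
    """Map one block of floats to (scale, list of ints in [-127, 127])."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 1.0  # all-zero block: any scale works
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from one quantized block."""
    return [scale * x for x in q]


def quantize(weights):
    """Split weights into blocks of BLOCK_SIZE and quantize each block."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for scale, q in blocks:
        out.extend(dequantize_block(scale, q))
    return out


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, max abs error {max_err:.4f}")
```

Because each value is rounded to the nearest multiple of its block's scale, the reconstruction error is at most half a scale step per weight. Storing 32 one-byte integers plus a two-byte scale costs 34 bytes per 32 weights, roughly half the size of fp16.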