Featured tutorial
Convert a Hugging Face model to GGUF in Google Colab
Quantize and export models from the Hub to GGUF for llama.cpp and Ollama. Step-by-step workflow using Transformers, bitsandbytes, and Colab.
Read tutorial →
GUIDE
Practical tutorials for Transformers, local models, embeddings, GGUF conversion, and open-source LLM workflows.
Latest tutorials
Organize model files, snapshots, and revisions with a custom HF_HOME directory and cache_dir.
Read tutorial →
Launch interactive demos for local models with an easy Gradio interface.
Read tutorial →
Fine-tune an instruction-following Llama 3 model on a single Colab GPU with LoRA, then export it to GGUF for local use.
Read tutorial →
Implement a small GPT-style language model and understand the transformer loop end to end.
Read tutorial →
Prepare data, train, evaluate, and save your own transformer model with the Hugging Face Trainer.
Read tutorial →
Create embeddings and a vector search pipeline with FAISS and Hugging Face Sentence Transformers.
Read tutorial →
Build a production-ready API for inference with FastAPI and Uvicorn.
Read tutorial →
Leverage pipelines for text, vision, speech, and more in just a few lines of code.
Read tutorial →
Hugging Face provides models, datasets, libraries, and tools for building and deploying machine learning and LLM workflows.
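Many Hugging Face models can be downloaded and run locally. Transformers, llama.cpp, and Ollama handle inference, while Gradio and FastAPI put a demo interface or an API in front of the model. Which option fits depends on the model format and your hardware.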
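GGUF is a single-file model format used by llama.cpp and tools built on it, such as Ollama. It stores the model weights, often quantized, together with the metadata needed to load them, which makes quantized models practical to run on local machines.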
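To make the format less abstract, here is a minimal sketch that reads the fixed-size header at the start of a GGUF file using only the Python standard library. It assumes the layout used by GGUF version 2 and later (a 4-byte `GGUF` magic, a 32-bit version, then 64-bit tensor and metadata counts) and little-endian byte order. The names `read_gguf_header` and `write_demo_header` are hypothetical helpers for this illustration, not part of llama.cpp or any library, and the demo file contains made-up counts rather than a real model.

```python
import struct
import tempfile
from pathlib import Path

GGUF_MAGIC = b"GGUF"
# Assumed header layout (GGUF v2+, little-endian):
# 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count.
HEADER = struct.Struct("<4sIQQ")


def read_gguf_header(path):
    """Return the fixed-size header fields of a GGUF file as a dict."""
    with open(path, "rb") as f:
        raw = f.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError("file is too short to contain a GGUF header")
    magic, version, tensor_count, kv_count = HEADER.unpack(raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def write_demo_header(path, version=3, tensor_count=2, kv_count=5):
    """Write only a header with made-up counts (demo data, not a usable model)."""
    Path(path).write_bytes(HEADER.pack(GGUF_MAGIC, version, tensor_count, kv_count))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.gguf"
        write_demo_header(demo)
        print(read_gguf_header(demo))
```

Pointing `read_gguf_header` at a real `.gguf` file is a quick sanity check that a download or conversion produced a GGUF file at all. It does not validate the metadata or tensor data that follow the header, and it does not handle big-endian files.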