Practical guides for working with Hugging Face Transformers and running large language models locally.
Models & storage
- How to download Hugging Face models to a custom cache folder (Python): cache_dir, HF_HOME, TRANSFORMERS_CACHE, and save_pretrained.
Run LLMs locally
- Convert a Hugging Face model to GGUF in Google Colab — quantize and export for llama.cpp / Ollama.
- Run FLAN-T5 and GPT large language models locally with Gradio — a local web UI in Python.
Build models from scratch
- Build a basic LLM (GPT) from scratch in Python
- Text generation from scratch: build a generative AI model in Python
Related topics
Continue with Generative AI, AI Agents, and the daily AI Rankings of top AI repositories, papers, and packages.
FAQ
How do I stop Hugging Face from filling up my default drive?
Point the cache elsewhere with cache_dir / HF_HOME — see downloading models to a custom cache folder.
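As a rough illustration of that answer, the sketch below sets HF_HOME before the library is imported and also passes cache_dir per call. The helper names configure_hf_cache and load_model are hypothetical, made up for this example. The sketch assumes the transformers package is installed, and the model ID and path in the usage note are only placeholders.

```python
import os
from pathlib import Path


def configure_hf_cache(cache_root: str) -> Path:
    """Create cache_root if needed and point HF_HOME at it.

    Hypothetical helper: call it before importing transformers,
    because the library reads HF_HOME when it is first imported.
    """
    path = Path(cache_root).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    return path


def load_model(model_id: str, cache_dir: str):
    """Download (or reuse) a model, storing its files under cache_dir.

    Hypothetical helper: the import is deferred so that an HF_HOME value
    set by configure_hf_cache is already in place when transformers loads.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)
    model = AutoModel.from_pretrained(model_id, cache_dir=cache_dir)
    return tokenizer, model
```

A call might look like `cache = configure_hf_cache("/mnt/data/hf-cache")` followed by `load_model("google/flan-t5-small", str(cache))`. Setting HF_HOME moves the whole Hugging Face cache, while cache_dir only affects the call it is passed to, so either one alone may be enough depending on the setup.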
How do I run a Hugging Face model offline / locally?
Convert it to GGUF in Google Colab and run it with llama.cpp or Ollama, or use a local Gradio app for FLAN-T5 / GPT.