Hugging Face Hub is the central repository for open-source machine learning models, datasets, and applications. Hosting hundreds of thousands of models with versioning, access control, and serving infrastructure, it serves as the GitHub of machine learning and the primary distribution channel for open-source AI.
What Is Hugging Face Hub?
- Definition: Platform for hosting and sharing ML artifacts.
- Content: Models, datasets, Spaces (apps), documentation.
- Scale: 500K+ models, 100K+ datasets.
- Integration: Native with transformers, diffusers libraries.
Why Hub Matters
- Discovery: Find pre-trained models for any task.
- Distribution: Share your models with the community.
- Versioning: Track model versions and changes.
- Infrastructure: Free hosting, serving, and compute.
- Community: Collaborate, discuss, contribute.
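The versioning and distribution points above rest on a simple convention: every file in a Hub repo is addressable at a stable URL of the form `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`, where `revision` can be a branch, tag, or commit hash. A minimal sketch of building such a link (the repo name and tag below are illustrative placeholders):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hub repo.

    Pinning `revision` to a tag or commit hash makes the link immutable;
    "main" always points at the latest commit.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Hypothetical repo and tag, for illustration only
print(hub_file_url("my-username/my-model", "config.json", revision="v1.0"))
```

Pinning a revision this way is what makes Hub downloads reproducible across model updates.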
Using Hub Models
Basic Model Loading:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (this repo is gated: accept the license on
# the model page and authenticate with `huggingface-cli login` first)
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
Inference with Pipeline:
```python
from transformers import pipeline
# Quick inference
generator = pipeline("text-generation", model="gpt2")
output = generator("Hello, I am", max_length=50)
print(output[0]["generated_text"])
# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
# [{"label": "POSITIVE", "score": 0.99}]
```
Model Card:
Every model page includes:
- Model description and capabilities
- Usage examples
- Training details
- Limitations and biases
- Evaluation results
- License
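Concretely, a model card is just the repo's `README.md` with a YAML metadata header that the Hub parses for search filters (license, language, tags). A minimal sketch that assembles one as a string; all field values here are illustrative:

```python
def build_model_card(license_id: str, language: str, tags: list[str], description: str) -> str:
    """Assemble a README.md model card: YAML front matter plus a prose body."""
    tag_lines = "\n".join(f"- {t}" for t in tags)
    return (
        "---\n"
        f"license: {license_id}\n"
        f"language: {language}\n"
        f"tags:\n{tag_lines}\n"
        "---\n\n"
        "# Model Description\n\n"
        f"{description}\n"
    )

# Placeholder values for illustration
card = build_model_card("apache-2.0", "en", ["text-generation"], "A demo model.")
print(card)
```

Saving this string as `README.md` in the repo root is enough for the Hub to render it and index the metadata.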
Uploading Models
Via Python:
```python
from huggingface_hub import HfApi
api = HfApi()
# Create repo
api.create_repo("my-username/my-model", private=False)
# Upload model files
api.upload_folder(
    folder_path="./model_output",
    repo_id="my-username/my-model",
)
```
Via Transformers:
```python
# After training
model.push_to_hub("my-username/my-model")
tokenizer.push_to_hub("my-username/my-model")
```
Via CLI:
```bash
# Login first
huggingface-cli login
# Upload
huggingface-cli upload my-username/my-model ./model_output
```
Dataset Hub
```python
from datasets import load_dataset
# Load dataset
dataset = load_dataset("squad")
# Load specific split
train_data = load_dataset("squad", split="train")
# Load from Hub
custom_data = load_dataset("my-username/my-dataset")
# Preview
print(dataset["train"][0])
```
Spaces (ML Apps)
Create Gradio Demo:
```python
import gradio as gr
def predict(text):
    return f"You said: {text}"
demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()
# Deploy to Space
# Create Space on HF, push this code
```
Popular Space Types:
```
Type      | Framework | Use Case
----------|-----------|-------------------
Gradio    | gradio    | Interactive demos
Streamlit | streamlit | Dashboards
Docker    | Docker    | Custom apps
Static    | HTML/JS   | Simple pages
```
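The Hub decides which of these runtimes to use from the YAML front matter of the Space's `README.md`, specifically the `sdk` field. A minimal sketch for the Gradio demo above; title and version are illustrative values:

```yaml
---
title: Echo Demo
sdk: gradio
sdk_version: 4.0.0   # pinning the SDK version keeps builds reproducible
app_file: app.py
---
```

With this header in place, pushing `app.py` and a `requirements.txt` to the Space repo is enough to get a running app.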
Model Discovery
Search Filters:
- Task: text-generation, image-classification, etc.
- Library: transformers, diffusers, timm
- Dataset: Models trained on specific data
- Language: en, zh, multilingual
- License: MIT, Apache, commercial
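The same filters are exposed over the Hub's public REST endpoint at `https://huggingface.co/api/models`, which the Python client wraps. A sketch that builds such a query URL with only the standard library; the parameter values are examples:

```python
from urllib.parse import urlencode

def hub_search_url(task: str, sort: str = "downloads", limit: int = 10) -> str:
    """Build a query URL for the Hub's public model-listing REST endpoint."""
    params = urlencode({"pipeline_tag": task, "sort": sort, "limit": limit})
    return f"https://huggingface.co/api/models?{params}"

print(hub_search_url("text-generation"))
# Issuing a GET against this URL (no auth needed for public models)
# returns a JSON list of matching model repos
```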
API Access:
```python
from huggingface_hub import HfApi
api = HfApi()
# Search models
models = api.list_models(
    filter="text-generation",
    sort="downloads",
    limit=10,
)
for model in models:
    print(f"{model.id}: {model.downloads} downloads")
```
Inference API
```python
import requests
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Hello, I am"},
)
print(response.json())
```
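The same request can be assembled without third-party dependencies. A standard-library sketch that builds (but does not send) the POST request, which also makes the auth-header pattern explicit; the token is a placeholder:

```python
import json
import urllib.request

def build_inference_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Assemble a POST request for the hosted Inference API without sending it."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_inference_request("gpt2", "Hello, I am", "YOUR_TOKEN")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the call; responses are JSON, as in the `requests` example above.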
Best Practices
- Model Cards: Always write thorough documentation.
- Licensing: Choose appropriate license for your use case.
- Versioning: Use branches/tags for different versions.
- Testing: Verify model works before publishing.
- Community: Engage with issues and discussions.
Hugging Face Hub is the infrastructure backbone of open-source AI, providing the discovery, distribution, and collaboration tools that let the community share and build on each other's work, democratizing access to state-of-the-art models.