Hugging Face Hub is the central repository for open-source machine learning models, datasets, and applications. Hosting hundreds of thousands of models with versioning, access control, and serving infrastructure, it serves as the GitHub of machine learning and the primary distribution channel for open-source AI.
What Is Hugging Face Hub?
- Definition: Platform for hosting and sharing ML artifacts.
- Content: Models, datasets, Spaces (apps), documentation.
- Scale: 500K+ models, 100K+ datasets.
- Integration: Native with transformers, diffusers libraries.
Why Hub Matters
- Discovery: Find pre-trained models for any task.
- Distribution: Share your models with the community.
- Versioning: Track model versions and changes.
- Infrastructure: Free hosting, serving, and compute.
- Community: Collaborate, discuss, contribute.
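The versioning and distribution points above rest on a simple convention: every file in a Hub repo is addressable at a stable URL of the form `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`, where `revision` can be a branch, tag, or commit hash. A minimal sketch of building such a link (the repo name and tag below are illustrative placeholders):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hub repo.

    Pinning `revision` to a tag or commit hash makes the link immutable;
    "main" always points at the latest commit.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Hypothetical repo and tag, for illustration only
print(hub_file_url("my-username/my-model", "config.json", revision="v1.0"))
```

Pinning a revision this way is what makes Hub downloads reproducible across model updates.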
Using Hub Models
Basic Model Loading:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (this repo is gated: accept the license on
# the model page and authenticate with `huggingface-cli login` first)
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
Inference with Pipeline:
```python
from transformers import pipeline
# Quick inference
generator = pipeline("text-generation", model="gpt2")
output = generator("Hello, I am", max_length=50)
print(output[0]["generated_text"])
# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
# [{"label": "POSITIVE", "score": 0.99}]
```
Model Card:
Every model page includes:
- Model description and capabilities
- Usage examples
- Training details
- Limitations and biases
- Evaluation results
- License
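Concretely, a model card is just the repo's `README.md` with a YAML metadata header that the Hub parses for search filters (license, language, tags). A minimal sketch that assembles one as a string; all field values here are illustrative:

```python
def build_model_card(license_id: str, language: str, tags: list[str], description: str) -> str:
    """Assemble a README.md model card: YAML front matter plus a prose body."""
    tag_lines = "\n".join(f"- {t}" for t in tags)
    return (
        "---\n"
        f"license: {license_id}\n"
        f"language: {language}\n"
        f"tags:\n{tag_lines}\n"
        "---\n\n"
        "# Model Description\n\n"
        f"{description}\n"
    )

# Placeholder values for illustration
card = build_model_card("apache-2.0", "en", ["text-generation"], "A demo model.")
print(card)
```

Saving this string as `README.md` in the repo root is enough for the Hub to render it and index the metadata.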
Uploading Models
Via Python:
```python
from huggingface_hub import HfApi
api = HfApi()
# Create repo
api.create_repo("my-username/my-model", private=False)
# Upload model files
api.upload_folder(
    folder_path="./model_output",
    repo_id="my-username/my-model",
)
```
Via Transformers:
```python
# After training
model.push_to_hub("my-username/my-model")
tokenizer.push_to_hub("my-username/my-model")
```
Via CLI:
```bash
# Login first
huggingface-cli login
# Upload
huggingface-cli upload my-username/my-model ./model_output
```
Dataset Hub
```python
from datasets import load_dataset
# Load dataset
dataset = load_dataset("squad")
# Load specific split
train_data = load_dataset("squad", split="train")
# Load from Hub
custom_data = load_dataset("my-username/my-dataset")
# Preview
print(dataset["train"][0])
```
Spaces (ML Apps)
Create Gradio Demo:
```python
import gradio as gr
def predict(text):
    return f"You said: {text}"
demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()
# Deploy to Space
# Create Space on HF, push this code
```
Popular Space Types:
```
Type      | Framework | Use Case
----------|-----------|-------------------
Gradio    | gradio    | Interactive demos
Streamlit | streamlit | Dashboards
Docker    | Docker    | Custom apps
Static    | HTML/JS   | Simple pages
```
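The Hub decides which of these runtimes to use from the YAML front matter of the Space's `README.md`, specifically the `sdk` field. A minimal sketch for the Gradio demo above; title and version are illustrative values:

```yaml
---
title: Echo Demo
sdk: gradio
sdk_version: 4.0.0   # pinning the SDK version keeps builds reproducible
app_file: app.py
---
```

With this header in place, pushing `app.py` and a `requirements.txt` to the Space repo is enough to get a running app.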
Model Discovery
Search Filters:
- Task: text-generation, image-classification, etc.
- Library: transformers, diffusers, timm
- Dataset: Models trained on specific data
- Language: en, zh, multilingual
- License: MIT, Apache, commercial
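The same filters are exposed over the Hub's public REST endpoint at `https://huggingface.co/api/models`, which the Python client wraps. A sketch that builds such a query URL with only the standard library; the parameter values are examples:

```python
from urllib.parse import urlencode

def hub_search_url(task: str, sort: str = "downloads", limit: int = 10) -> str:
    """Build a query URL for the Hub's public model-listing REST endpoint."""
    params = urlencode({"pipeline_tag": task, "sort": sort, "limit": limit})
    return f"https://huggingface.co/api/models?{params}"

print(hub_search_url("text-generation"))
# Issuing a GET against this URL (no auth needed for public models)
# returns a JSON list of matching model repos
```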
API Access:
```python
from huggingface_hub import HfApi
api = HfApi()
# Search models
models = api.list_models(
    filter="text-generation",
    sort="downloads",
    limit=10,
)
for model in models:
    print(f"{model.id}: {model.downloads} downloads")
```
Inference API
```python
import requests
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Hello, I am"},
)
print(response.json())
```
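The same request can be assembled without third-party dependencies. A standard-library sketch that builds (but does not send) the POST request, which also makes the auth-header pattern explicit; the token is a placeholder:

```python
import json
import urllib.request

def build_inference_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Assemble a POST request for the hosted Inference API without sending it."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_inference_request("gpt2", "Hello, I am", "YOUR_TOKEN")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the call; responses are JSON, as in the `requests` example above.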
Best Practices
- Model Cards: Always write thorough documentation.
- Licensing: Choose appropriate license for your use case.
- Versioning: Use branches/tags for different versions.
- Testing: Verify model works before publishing.
- Community: Engage with issues and discussions.
Hugging Face Hub is the infrastructure backbone of open-source AI, providing the discovery, distribution, and collaboration tools that let the community share and build on each other's work, democratizing access to state-of-the-art models.