Hugging Face Hub

Keywords: hugging face, model hub, transformers, datasets, spaces, open source models, model hosting

Hugging Face Hub is the central repository for open-source machine learning models, datasets, and applications — hosting hundreds of thousands of models with versioning, access control, and serving infrastructure, making it the GitHub of machine learning and the primary distribution channel for open-source AI.

What Is Hugging Face Hub?

- Definition: Platform for hosting and sharing ML artifacts.
- Content: Models, datasets, Spaces (apps), documentation.
- Scale: 500K+ models, 100K+ datasets.
- Integration: Native with transformers, diffusers libraries.

Why Hub Matters

- Discovery: Find pre-trained models for any task.
- Distribution: Share your models with the community.
- Versioning: Track model versions and changes.
- Infrastructure: Free hosting, serving, and compute.
- Community: Collaborate, discuss, contribute.
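Since every Hub repo is a Git repository, versions can be pinned by branch, tag, or commit hash. As a minimal sketch, `hf_hub_url` from `huggingface_hub` shows how a file resolves at a specific revision (the repo id and tag here are hypothetical placeholders):

```python
from huggingface_hub import hf_hub_url

# Build the download URL for one file at a pinned revision.
# "my-username/my-model" and "v1.0" are placeholder values.
url = hf_hub_url(
    repo_id="my-username/my-model",
    filename="config.json",
    revision="v1.0",
)
print(url)
# https://huggingface.co/my-username/my-model/resolve/v1.0/config.json
```

The same `revision` argument is accepted by `from_pretrained` and `hf_hub_download`, so downstream code can be pinned to an exact model version rather than a moving `main` branch.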

Using Hub Models

Basic Model Loading:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (gated repo: requires accepting the
# license on the model page and authenticating with an HF token)
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```

Inference with Pipeline:
```python
from transformers import pipeline

# Quick inference
generator = pipeline("text-generation", model="gpt2")
output = generator("Hello, I am", max_length=50)
print(output[0]["generated_text"])

# Sentiment analysis (falls back to a default English model)
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
# [{"label": "POSITIVE", "score": 0.99}]
```

Model Card:
```
Every model page includes:
- Model description and capabilities
- Usage examples
- Training details
- Limitations and biases
- Evaluation results
- License
```
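The `huggingface_hub` library also exposes model cards programmatically: a card is a Markdown file with a YAML metadata header, which the `ModelCard` class parses. A minimal sketch with invented content:

```python
from huggingface_hub import ModelCard

# A model card is Markdown with a YAML front-matter block.
# The content below is purely illustrative.
content = """---
license: mit
language: en
---
# My Model

A short description of what the model does.
"""
card = ModelCard(content)
print(card.data.license)  # metadata field from the YAML header: "mit"
print(card.text.strip().splitlines()[0])  # first line of the Markdown body
```

The same class can load an existing card with `ModelCard.load(repo_id)` and save or push edits back to the Hub.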

Uploading Models

Via Python:
```python
from huggingface_hub import HfApi

api = HfApi()

# Create repo
api.create_repo("my-username/my-model", private=False)

# Upload model files
api.upload_folder(
    folder_path="./model_output",
    repo_id="my-username/my-model",
)
```

Via Transformers:
```python
# After training
model.push_to_hub("my-username/my-model")
tokenizer.push_to_hub("my-username/my-model")
```

Via CLI:
```bash
# Login first
huggingface-cli login

# Upload
huggingface-cli upload my-username/my-model ./model_output
```

Dataset Hub

```python
from datasets import load_dataset

# Load dataset
dataset = load_dataset("squad")

# Load specific split
train_data = load_dataset("squad", split="train")

# Load from Hub
custom_data = load_dataset("my-username/my-dataset")

# Preview
print(dataset["train"][0])
```

Spaces (ML Apps)

Create Gradio Demo:
```python
import gradio as gr

def predict(text):
    return f"You said: {text}"

demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()

# Deploy to Space
# Create Space on HF, push this code
```

Popular Space Types:
```
Type | Framework | Use Case
------------|-------------|------------------------
Gradio | gradio | Interactive demos
Streamlit | streamlit | Dashboards
Docker | Docker | Custom apps
Static | HTML/JS | Simple pages
```
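Each Space is configured by a YAML front-matter block at the top of its `README.md`, which tells the Hub which SDK to run. A minimal sketch for a Gradio Space (all values are illustrative):

```yaml
---
title: My Demo
emoji: 🚀
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
```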

Model Discovery

Search Filters:
```
- Task: text-generation, image-classification, etc.
- Library: transformers, diffusers, timm
- Dataset: Models trained on specific data
- Language: en, zh, multilingual
- License: MIT, Apache, commercial
```

API Access:
```python
from huggingface_hub import HfApi

api = HfApi()

# Search models
models = api.list_models(
    filter="text-generation",
    sort="downloads",
    limit=10,
)

for model in models:
    print(f"{model.modelId}: {model.downloads} downloads")
```

Inference API

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Hello, I am"},
)
print(response.json())
```

Best Practices

- Model Cards: Always write thorough documentation.
- Licensing: Choose appropriate license for your use case.
- Versioning: Use branches/tags for different versions.
- Testing: Verify model works before publishing.
- Community: Engage with issues and discussions.

Hugging Face Hub is the infrastructure backbone of open-source AI — providing the discovery, distribution, and collaboration tools that enable the community to share and build upon each other's work, democratizing access to state-of-the-art models.
