Hf Mcp

Automate and integrate Hugging Face model and pipeline workflows through the Model Context Protocol

Hf Mcp is a community skill for building Model Context Protocol (MCP) servers that integrate Hugging Face model inference, dataset access, and Hub operations into AI agent workflows through a standardized tool interface.

What Is This?

Overview

Hf Mcp provides patterns for creating MCP servers that expose Hugging Face capabilities as discoverable tools for AI agents. It covers model inference endpoints, dataset loading and querying, model card retrieval, Space deployment, and Hub search operations. The skill enables AI assistants to access the full Hugging Face ecosystem through the Model Context Protocol without requiring direct knowledge of the underlying APIs.

Who Should Use This

This skill serves developers building AI agents that need access to Hugging Face models and datasets, platform engineers creating unified tool servers for ML operations, and teams that want AI assistants to search, evaluate, and deploy models through natural language interaction.

Why Use It?

Problems It Solves

Accessing Hugging Face services from AI agents requires writing custom integration code for each capability. Model selection involves manual hub browsing rather than programmatic search based on task requirements. Dataset loading and preprocessing steps must be scripted individually for each use case. Without an MCP layer, switching between AI agent platforms requires reimplementing Hugging Face integrations from scratch.

Core Highlights

Model inference tools run predictions on Hugging Face hosted models through a single tool interface. Dataset tools load, filter, and preview datasets from the Hub without manual download scripts. Search tools query the Hub for models, datasets, and spaces with filtering by task type, library, and popularity. Resource endpoints expose model cards and dataset documentation as readable context for AI clients.
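
As a concrete illustration of the resource pattern, here is a minimal sketch that exposes a model card as readable context. It assumes the FastMCP server instance defined under Basic Usage below; the modelcard:// URI scheme is illustrative, not a standard.

from huggingface_hub import ModelCard

# Sketch of a resource endpoint; `server` is the FastMCP instance from
# Basic Usage, and the modelcard:// scheme is an illustrative choice.
@server.resource("modelcard://{owner}/{name}")
def model_card(owner: str, name: str) -> str:
    """Expose a model card's README as readable context for AI clients."""
    card = ModelCard.load(f"{owner}/{name}")
    return card.content  # YAML metadata header plus Markdown body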

How to Use It?

Basic Usage

from mcp.server.fastmcp import FastMCP
from mcp.types import TextContent
from huggingface_hub import InferenceClient, HfApi
import json

# FastMCP provides the @tool() decorator used below; the low-level
# mcp.server.Server class instead registers handlers through
# list_tools()/call_tool() and does not expose a tool() decorator.
server = FastMCP("huggingface-tools")
hf_api = HfApi()
inference = InferenceClient()

@server.tool()
async def search_models(
    query: str, task: str = "", limit: int = 5
) -> list[TextContent]:
    """Search Hugging Face Hub for models."""
    models = hf_api.list_models(
        search=query, task=task or None, limit=limit,
        sort="downloads", direction=-1
    )
    results = [{
        "id": m.id, "task": m.pipeline_tag,
        "downloads": m.downloads, "likes": m.likes
    } for m in models]
    return [TextContent(type="text", text=json.dumps(results, indent=2))]

@server.tool()
async def run_inference(
    model_id: str, text: str
) -> list[TextContent]:
    """Run inference on a Hugging Face model."""
    result = inference.text_generation(text, model=model_id, max_new_tokens=200)
    return [TextContent(type="text", text=result)]
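
To make the server runnable end to end, a standard entry point can be appended at the bottom of the module. FastMCP defaults to the stdio transport, which is what most MCP clients expect when spawning a local server process:

if __name__ == "__main__":
    # stdio is the default transport; MCP clients spawn this process
    # and communicate over stdin/stdout.
    server.run()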

Real-World Examples

from datasets import load_dataset

@server.tool()
async def preview_dataset(
    dataset_id: str, split: str = "train", rows: int = 5
) -> list[TextContent]:
    """Load and preview rows from a Hugging Face dataset."""
    # Streaming avoids downloading the full dataset just for a preview.
    ds = load_dataset(dataset_id, split=split, streaming=True)
    samples = list(ds.take(rows))
    return [TextContent(type="text", text=json.dumps(samples, indent=2, default=str))]

@server.tool()
async def get_model_info(
    model_id: str
) -> list[TextContent]:
    """Get detailed information about a model."""
    info = hf_api.model_info(model_id)
    details = {
        "id": info.id, "pipeline_tag": info.pipeline_tag,
        "library_name": info.library_name,
        "downloads": info.downloads, "likes": info.likes,
        "tags": info.tags[:10] if info.tags else []
    }
    return [TextContent(type="text", text=json.dumps(details, indent=2))]

@server.tool()
async def classify_text(
    text: str, model_id: str = "distilbert-base-uncased-finetuned-sst-2-english"
) -> list[TextContent]:
    """Classify text using a Hugging Face model."""
    result = inference.text_classification(text, model=model_id)
    # Output elements are dataclasses in recent huggingface_hub versions,
    # so convert to plain dicts before JSON serialization.
    scores = [{"label": r.label, "score": r.score} for r in result]
    return [TextContent(type="text", text=json.dumps(scores, indent=2))]

Advanced Tips

Cache model search results and dataset metadata to reduce Hub API calls during iterative exploration. Use streaming dataset loading for large datasets that would not fit in memory. Implement tool-level rate limiting to stay within Hugging Face API quotas.
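
A minimal sketch of the caching idea, using a simple in-process TTL cache. The cached helper below is illustrative, not part of the mcp or huggingface_hub APIs:

import time
from typing import Callable

_search_cache: dict[tuple, tuple[float, str]] = {}
CACHE_TTL_S = 300  # five-minute freshness window

def cached(key: tuple, compute: Callable[[], str]) -> str:
    """Return a fresh cached value, recomputing after the TTL expires."""
    now = time.monotonic()
    hit = _search_cache.get(key)
    if hit is not None and now - hit[0] < CACHE_TTL_S:
        return hit[1]
    value = compute()
    _search_cache[key] = (now, value)
    return value

Inside search_models, the serialized results can then be wrapped as cached((query, task, limit), lambda: ...), so repeated exploration of the same query hits the Hub only once per TTL window.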

When to Use It?

Use Cases

Build AI assistants that search and evaluate Hugging Face models for specific tasks. Create data exploration agents that preview and analyze datasets through conversation. Develop ML workflow tools that let agents run inference and compare model outputs.
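
As one sketch of the model-comparison use case, a hypothetical compare_models tool (the name and shape are illustrative) can fan the same prompt out to several models and return the outputs side by side:

@server.tool()
async def compare_models(
    prompt: str, model_ids: list[str]
) -> list[TextContent]:
    """Run the same prompt through several models and collect the outputs."""
    outputs = {}
    for model_id in model_ids:
        try:
            outputs[model_id] = inference.text_generation(
                prompt, model=model_id, max_new_tokens=100
            )
        except Exception as exc:  # record per-model failures, keep going
            outputs[model_id] = f"error: {exc}"
    return [TextContent(type="text", text=json.dumps(outputs, indent=2))]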

Related Topics

Model Context Protocol specification, Hugging Face Hub API, transformers library, dataset loading patterns, and AI agent tool integration.

Important Notes

Requirements

Python with the mcp, huggingface_hub, and datasets packages installed. A Hugging Face API token for accessing gated models and elevated rate limits. An MCP-compatible client for testing tool interactions.
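
The token is typically passed through the environment rather than hard-coded, since huggingface_hub picks up the HF_TOKEN variable automatically. The client-side sketch below, using the stdio transport from the official mcp package, spawns the server with HF_TOKEN forwarded and calls one tool to verify the wiring; the server script path is a placeholder:

import asyncio, os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def smoke_test():
    params = StdioServerParameters(
        command="python", args=["hf_mcp_server.py"],  # placeholder path
        env={"HF_TOKEN": os.environ["HF_TOKEN"]},     # for gated models / quotas
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_models", {"query": "sentiment", "limit": 3}
            )
            print(result.content[0].text)

asyncio.run(smoke_test())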

Usage Recommendations

Do: use streaming mode for large datasets to avoid memory exhaustion. Implement timeout handling for inference calls that may take longer on large models. Return structured JSON from tools for reliable parsing by AI clients.

Don't: expose model deletion or repository write operations without explicit authorization checks. Load entire large datasets into memory when only a preview is needed. Skip error handling for model inference failures that may occur with incompatible inputs.
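
One way to implement both the timeout and error-handling recommendations, assuming the synchronous InferenceClient from Basic Usage: asyncio.to_thread keeps the blocking call off the event loop, and asyncio.wait_for bounds its duration.

import asyncio

@server.tool()
async def safe_inference(
    model_id: str, text: str, timeout_s: float = 60.0
) -> list[TextContent]:
    """Run text generation with a timeout and structured error reporting."""
    try:
        result = await asyncio.wait_for(
            asyncio.to_thread(
                inference.text_generation, text,
                model=model_id, max_new_tokens=200,
            ),
            timeout=timeout_s,
        )
        payload = {"ok": True, "output": result}
    except asyncio.TimeoutError:
        payload = {"ok": False, "error": f"inference timed out after {timeout_s}s"}
    except Exception as exc:
        payload = {"ok": False, "error": str(exc)}
    return [TextContent(type="text", text=json.dumps(payload, indent=2))]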

Limitations

Inference API rate limits restrict the frequency of model predictions. Large model inference may have significant latency on free-tier hosted endpoints. Not all models on the Hub support the Inference API, requiring local deployment for some architectures.