Building an MCP Server with Unstructured API

Unstructured

Mar 13, 2025

Authors

Maria Khalusova

Unstructured

Imagine telling Claude to preprocess your unstructured data and make your documents RAG-ready. Or asking it questions about any document, without worrying about file type limitations. How would you make that possible? Unstructured provides the tools you need for data processing, but how can Claude Desktop, Cursor, or your own LLM agent access and use them? That's where the Model Context Protocol (MCP) comes in.

In this tutorial, we'll walk through building an MCP server that integrates with the Unstructured API. You'll learn the fundamentals of MCP architecture, explore Unstructured’s capabilities, and follow a step-by-step implementation guide. While this tutorial doesn’t cover full MCP integration, it provides a solid foundation to get you started.

What is MCP?

Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to LLMs. Think of it as a common language that helps LLMs communicate effectively with your applications. MCP is particularly powerful because it:

Standardizes context delivery to LLMs
Enables building complex workflows and agents
Promotes reusability and interoperability
Simplifies the integration with different APIs

Unlike other integration methods that may require a custom implementation for each LLM, MCP offers a standardized approach that works across clients. Once you build your MCP server, you can use it from Claude Desktop, Windsurf, or any custom LLM tool or agent.

Why Unstructured API?

The Unstructured API is a powerful interface to Unstructured's enterprise ETL product, which is designed for processing and managing unstructured data for GenAI applications. The API provides all of the capabilities currently available in the Unstructured UI, for example:

Managing document processing workflows
Handling multiple source and destination connectors
Extracting and structuring content from various document formats
Automating document processing pipelines

By building an MCP server on top of the Unstructured API, you can provision Unstructured data for your GenAI systems simply by telling your LLM what to do in natural language!

Imagine asking your LLM, in plain English, for GenAI-ready data:

"Transform these PDFs into JSON"
"Convert data from this S3 bucket to JSON"
"Run this Unstructured workflow: transform, chunk, and embed the data in my Google Drive with text-embedding-3-small and push it to Pinecone"

Once it's built, you can integrate your MCP server with Claude Desktop, Cursor, Windsurf, or any other client and control the entire data preprocessing pipeline as if you were chatting with a teammate.

Isn’t it exciting? Let's walk through how you can build your own MCP server to expose some of the Unstructured API functionality to LLMs, LLM agents and other MCP clients.

Note: We are currently working on our Unstructured MCP server implementation, and you can follow our progress in this GitHub repo. We’ll use this repo as a reference for this tutorial.

Project Setup

Let's start by setting up the project. First, set up your uv environment and add the required dependencies (see README.md in the repo).

You will also need to create a .env file with your Unstructured API key:

UNSTRUCTURED_API_KEY=your_api_key_here
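The server code reads this key with os.getenv, which only sees variables already present in the process environment, so something has to load the .env file first. The python-dotenv package's load_dotenv() is the usual choice. To show what that step does, here is a stdlib-only sketch, assuming the file holds simple KEY=VALUE lines (no escaping rules, no multi-line values). load_env_file is a hypothetical helper, not part of any library.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical minimal .env loader: copies KEY=VALUE lines into os.environ.

    Assumes a simple format: blank lines and '#' comments are skipped,
    surrounding quotes on values are stripped, and no variable expansion is done.
    Returns every parsed pair, whether or not it was written to os.environ.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # No file: rely on the existing environment
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # By default, don't clobber variables that are already set
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value
    return loaded
```

Not overriding existing variables by default means a key exported in your shell still wins over the file, which is also python-dotenv's default behavior.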

To start your free 14-day trial, sign up for Unstructured and follow these steps to get the API key you need for the Unstructured API.

Here's the project structure we'll be working with. Keep in mind that the structure in the linked GitHub repo may differ, because the project is under active development:

uns-mcp/
├── .env
├── pyproject.toml
├── server.py
├── connectors/
│   ├── __init__.py
│   ├── source/
│   │   ├── __init__.py
│   │   └── s3.py
│   └── destination/
│       ├── __init__.py
│       └── s3.py

Building the MCP Server

Core MCP Concepts

In general, MCP servers can provide three main types of capabilities:

Resources: File-like data that can be read by clients
Tools: Functions that can be called by the LLM
Prompts: Pre-written templates that help users accomplish specific tasks

For our Unstructured MCP server, we only need tools for managing document processing workflows and connectors through the Unstructured API.

To build the core of this server, we'll use FastMCP, which significantly simplifies MCP server implementation. With the FastMCP class, Python type hints and docstrings are used to generate tool definitions automatically, making MCP tools easy to create and maintain.
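To make the idea concrete, here is a simplified, stdlib-only illustration of how a tool definition can be derived from a function's signature and docstring. This is not FastMCP's actual implementation (FastMCP builds its schemas with Pydantic and produces richer output); the type mapping, the tool_definition helper, and the get_workflow_info example function are all hypothetical.

```python
import inspect
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Simplified mapping from Python types to JSON Schema type names (illustrative only)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_definition(func) -> dict:
    """Build a rough MCP-style tool definition from type hints and the docstring."""
    hints = get_type_hints(func)
    properties: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints.get(name, str)
        # Optional[X] is Union[X, None]; unwrap it to X for the schema type
        if get_origin(annotation) is Union:
            non_none = [a for a in get_args(annotation) if a is not type(None)]
            annotation = non_none[0] if non_none else str
        properties[name] = {"type": _JSON_TYPES.get(annotation, "string")}
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def get_workflow_info(workflow_id: str, verbose: Optional[bool] = None) -> str:
    """Get detailed information about a workflow."""
    return workflow_id


print(tool_definition(get_workflow_info)["inputSchema"]["required"])  # ['workflow_id']
```

In the real server you never write this yourself: decorating a typed, documented function with @mcp.tool() is enough for clients to discover its name, description, and parameters.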

Core Components

FastMCP Server: The main server implementation using the FastMCP class
Application Context: MCP servers use a lifecycle management system to handle initialization, state management, and cleanup.
MCP Tools: A collection of tools for interacting with the Unstructured API

Server Implementation

Here's our main server implementation:

import os
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator

from dotenv import load_dotenv
from mcp.server.fastmcp import FastMCP, Context
from unstructured_client import UnstructuredClient

# Load UNSTRUCTURED_API_KEY from the .env file into the environment (requires python-dotenv)
load_dotenv()


@dataclass
class AppContext:
    client: UnstructuredClient


@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    """Manage Unstructured API client lifecycle"""
    api_key = os.getenv("UNSTRUCTURED_API_KEY")
    if not api_key:
        raise ValueError("UNSTRUCTURED_API_KEY environment variable is required")
    # Create the client once at startup; every tool call reuses it
    client = UnstructuredClient(api_key_auth=api_key)
    try:
        yield AppContext(client=client)
    finally:
        # Shutdown cleanup goes here; nothing needs to be released explicitly
        pass


# Create MCP server instance
mcp = FastMCP("Unstructured API", lifespan=app_lifespan)

The application context lets you initialize resources once, share them across requests, and clean them up properly when the server shuts down, which prevents resource leaks.
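The lifespan pattern itself is plain Python: an async context manager that sets up shared state, yields it while the server runs, and tears it down in a finally block. The sketch below runs without MCP or the Unstructured SDK; FakeClient and its close() method are hypothetical stand-ins for a client that holds network connections, and the server argument that the real app_lifespan receives is omitted.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class FakeClient:
    """Hypothetical stand-in for an API client that owns connections."""
    closed: bool = False
    calls: list[str] = field(default_factory=list)

    async def close(self) -> None:
        self.closed = True


@dataclass
class AppContext:
    client: FakeClient


@asynccontextmanager
async def app_lifespan() -> AsyncIterator[AppContext]:
    client = FakeClient()  # Created once at startup
    try:
        yield AppContext(client=client)  # Shared by every request while the server runs
    finally:
        await client.close()  # Runs on shutdown, even if a request raised


async def simulate_server() -> FakeClient:
    async with app_lifespan() as ctx:
        # Two "requests" reuse the same client instead of creating new ones
        for request in ("list_workflows", "list_sources"):
            ctx.client.calls.append(request)
        return ctx.client


client = asyncio.run(simulate_server())
print(client.calls, client.closed)  # ['list_workflows', 'list_sources'] True
```

Putting the cleanup in finally is the key design choice: it runs whether the server exits normally or because of an error.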

MCP Tools

Tools are the core functionality providers in an MCP server. With FastMCP, we can implement tools for interacting with the Unstructured API by decorating ordinary Python functions with @mcp.tool(). Here's an example of a tool that calls the Unstructured API to list available workflows:

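# Additional imports for the tools (place them at the top of server.py;
# Context is already imported there)
from typing import Optional

from unstructured_client.models.operations import ListWorkflowsRequest
from unstructured_client.models.shared import WorkflowState


@mcp.tool()
async def list_workflows(
    ctx: Context,
    destination_id: Optional[str] = None,
    source_id: Optional[str] = None,
    status: Optional[str] = None,
) -> str:
    """
    List workflows from the Unstructured API.
    Args:
        destination_id: Optional destination connector ID to filter by
        source_id: Optional source connector ID to filter by
        status: Optional workflow status to filter by
    Returns:
        String containing the list of workflows
    """
    # Reuse the client created once in app_lifespan
    client = ctx.request_context.lifespan_context.client
    request = ListWorkflowsRequest(
        destination_id=destination_id,
        source_id=source_id,
    )
    if status:
        try:
            request.status = WorkflowState[status]
        except KeyError:
            return f"Invalid workflow status: {status}"
    response = await client.workflows.list_workflows_async(request=request)
    # Sort workflows by name (case-insensitive)
    sorted_workflows = sorted(
        response.response_list_workflows,
        key=lambda workflow: workflow.name.lower(),
    )
    if not sorted_workflows:
        return "No workflows found"
    # Format the response as a plain-text list the LLM can read
    result = ["Available workflows:"]
    for workflow in sorted_workflows:
        result.append(f"- {workflow.name} (ID: {workflow.id})")
    return "\n".join(result)


# Entry point for `uv run server.py` (the command used in the Claude Desktop config);
# keep it at the end of server.py so every tool is registered before the server starts
if __name__ == "__main__":
    mcp.run()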
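Because the sorting and formatting in list_workflows don't depend on the API, they are easy to pull into a plain function and test without network access. A minimal sketch, assuming a hypothetical Workflow dataclass that exposes the same name and id attributes the tool reads from the SDK's response objects:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Workflow:
    """Hypothetical stand-in for the SDK's workflow objects (only name and id are used)."""
    name: str
    id: str


def format_workflows(workflows: Iterable[Workflow]) -> str:
    """Sort workflows case-insensitively by name and render them as a bullet list."""
    sorted_workflows = sorted(workflows, key=lambda workflow: workflow.name.lower())
    if not sorted_workflows:
        return "No workflows found"
    lines = ["Available workflows:"]
    lines.extend(f"- {wf.name} (ID: {wf.id})" for wf in sorted_workflows)
    return "\n".join(lines)


print(format_workflows([Workflow("s3-to-pinecone", "wf-2"), Workflow("Drive ingest", "wf-1")]))
# Available workflows:
# - Drive ingest (ID: wf-1)
# - s3-to-pinecone (ID: wf-2)
```

Inside the tool, the final block of list_workflows could then shrink to `return format_workflows(response.response_list_workflows)`.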

Running the Server

Once ready, you can run the server with the following command:

mcp run server.py

To make this MCP server available in Claude Desktop on macOS, go to ~/Library/Application Support/Claude/ and create (or edit) the claude_desktop_config.json file.

In this file add the following:

{
    "mcpServers": {
        "UNS_MCP": {
            "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
            "args": [
                "--directory",
                "ABSOLUTE/PATH/TO/UNS-MCP",
                "run",
                "server.py"
            ],
            "disabled": false
        }
    }
}
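If claude_desktop_config.json already exists (for example, because you configured other MCP servers), overwriting it would drop those entries. Here is a small sketch that adds or updates a single server entry while keeping the rest of the file intact. The add_mcp_server helper is hypothetical, and the paths in the commented usage are placeholders you need to replace, just as in the JSON above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Insert or update one entry under "mcpServers", preserving all other settings."""
    config: dict = {}
    if config_path.exists():
        # Treat an empty file as an empty config
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


# Example usage (placeholder paths, replace with your own):
# add_mcp_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "UNS_MCP",
#     "ABSOLUTE/PATH/TO/.local/bin/uv",
#     ["--directory", "ABSOLUTE/PATH/TO/UNS-MCP", "run", "server.py"],
# )
```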

Restart the Claude Desktop application, and you should be able to ask Claude to interact with Unstructured.

You should see a small hammer icon showing how many tools are available.

Now you can use what you built: type in your requests, questions, or instructions, and you are ready to go!

Best Practices

When working with FastMCP, take full advantage of its built-in context management: the lifespan context keeps shared resources such as the API client in one place and keeps your code organized.
Use a modular design. Keep connectors in separate modules so they stay manageable and reusable, and keep each tool focused on a single task so it stays simple and easy to maintain.
Validate inputs so malformed or harmful values never reach the API, and handle sensitive data such as API keys securely at every stage of your app.
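As an example of input validation, the status argument of list_workflows could be normalized so that "active", "Active", and "ACTIVE" are all accepted, while anything else produces a clear error message the LLM can act on. This sketch uses a hypothetical local WorkflowStatus enum with assumed members; check the SDK's WorkflowState for the actual values before adapting it.

```python
from enum import Enum
from typing import Optional


class WorkflowStatus(str, Enum):
    """Hypothetical stand-in for the SDK's workflow state enum (members are assumptions)."""
    ACTIVE = "active"
    INACTIVE = "inactive"


def parse_status(status: Optional[str]) -> tuple[Optional[WorkflowStatus], Optional[str]]:
    """Return (parsed_status, error_message); at most one of them is not None."""
    if status is None or not status.strip():
        return None, None  # No filter requested
    try:
        # Accept any capitalization and surrounding whitespace
        return WorkflowStatus[status.strip().upper()], None
    except KeyError:
        allowed = ", ".join(member.value for member in WorkflowStatus)
        return None, f"Invalid workflow status: {status!r}. Expected one of: {allowed}"


print(parse_status("paused")[1])
# Invalid workflow status: 'paused'. Expected one of: active, inactive
```

Returning the error as a string rather than raising keeps the feedback in the conversation, which matches how the original tool handles an unknown status, and listing the allowed values gives the LLM what it needs to retry.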

Conclusion

Building an MCP server with the Unstructured API unlocks a powerful new way to preprocess unstructured data for GenAI applications. By leveraging MCP, you can seamlessly integrate Unstructured’s document processing capabilities into Claude Desktop, Cursor, Windsurf, or any other LLM-powered client. Instead of manually handling data transformations, you can simply tell your LLM what you need in natural language—bridging the gap between AI and enterprise-grade data processing.

If you're excited about this integration, we encourage you to build your own MCP server using the approach outlined in this guide. Whether you want to customize tools, expand workflows, or streamline how your LLM interacts with data, the possibilities are vast.

To get started, check out our GitHub repo, where we're actively developing an Unstructured MCP server. Use it as a reference, contribute, or adapt it for your own use cases.

We can’t wait to see what you build. Let us know how you’re using MCP with Unstructured API—happy coding! 🚀