open-data · agents
$ ./connect --to open-data

Plugging agents into
open data

The shift: for years we built open data for humans to read. Increasingly the consumer is an AI agent. So we need open-data systems designed for agents, not just people.

Four ways in: CLIs, APIs, Skills, and MCP. Today: how to actually implement MCP servers over open data.

01 · the map
four ways in

How an agent reaches your data

Same goal, different contracts. Each layer answers "who handles auth, routing, and reasoning?"

CLI

The agent's native surface. Run a command, pipe it, chain it. In 2026 the terminal became the place where agents retrieve and act. Examples: gh, aws, ckanapi.

API

Raw reach. Maximum surface, zero opinion. The agent must know the endpoints.

Skill

Packaged know-how: instructions + scripts the client runs. Great front door over messy systems.

MCP

A running server exposing tools over a protocol. Where you put shared, agentic logic.

Today's focus: MCP servers — and when to lean on CLIs & Skills instead.

02 · skill vs mcp
runs where?

Skill vs MCP

They live on different sides of the wire.

client-side

Skill

A folder of instructions + code the model loads on demand. Zero infra to run. Perfect for wrapping a CLI or a few API calls behind plain-language steps. Lives with the agent.

server-side

MCP server

A process that advertises typed tools, resources, and prompts over a standard protocol. Any compliant client can connect. Put shared, stateful, or heavy logic here, once, for everyone.

Rule of thumb: personal & lightweight → Skill. Shared service or agentic backend → MCP.

03 · cli for auth
let the binary hold the keys

Skill + CLI = auth solved for free

skill → shells out to an authed CLI
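# The CLI already did the OAuth dance,
# stores & refreshes the token for us.
import json
import subprocess

# Placeholder portal URL; point this at your own CKAN instance.
PORTAL = "https://demo.ckan.org"

def search_open_data(query):
    out = subprocess.run(
        ["ckanapi", "action", "package_search",
         f"q={query}", "--remote", PORTAL],
        capture_output=True, text=True,
        check=True,  # raise if the CLI exits non-zero
    )
    # ckanapi prints the action's result directly, so the
    # matching datasets sit under "results".
    return json.loads(out.stdout)["results"]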
  • No secrets in the agent or prompt
  • Token refresh handled by the tool
  • Same path your team uses by hand
  • The Skill is just orchestration over a trusted binary

CLIs are the cheapest auth boundary you'll ever get. Wrap, don't re-implement.

04 · minimal mcp
fastmcp · python

An MCP server is a decorated function

FastMCP turns a typed Python function into a discoverable tool. The signature is the schema.

server.py
from fastmcp import FastMCP

mcp = FastMCP(name="OpenDataServer")

@mcp.tool
def find_datasets(query: str, limit: int = 5) -> list[dict]:
    """Search the open-data catalogue for datasets."""
    # `catalogue` stands in for your own search index (not shown here)
    hits = catalogue.search(query, k=limit)
    return [d.summary() for d in hits]

if __name__ == "__main__":
    mcp.run()
  • Type hints → JSON schema, automatically
  • Docstring → the tool's description
  • Return value → handed back to the model
  • FastMCP: a widely used Python MCP framework (Prefect / J. Lowin).
04b · typescript
vercel ai sdk · mcp-handler

Same idea in TS: re-expose tools you already have

WASH AI wraps its existing knowledge-retrieval tools and serves the Sanihub tenant over one MCP endpoint.

app/api/[transport]/route.ts
import { createMcpHandler } from 'mcp-handler';
import { knowledgeSearch } from '@/lib/ai/tools/knowledge-search';

// existing AI SDK tool, scoped + opinionated
const sanihub = knowledgeSearch({
  tenantId: 'sanihub',
  allowedTenants: ['sanihub'],   // tenant gating
  enableReranking: true,         // server-side rerank
});

const handler = createMcpHandler((server) => {
  server.registerTool('knowledgeSearch',
    { description: sanihub.description,
      inputSchema: sanihub.inputSchema.shape },
    async (args) => ({ content: [{ type: 'text',
      text: await sanihub.execute(args) }] }));
});
export { handler as GET, handler as POST };
any AI SDK client connects
import { experimental_createMCPClient as createMCPClient,
  generateText } from 'ai';

const mcp = await createMCPClient({
  transport: { type: 'sse',
    url: 'https://washai.org/api/mcp',
    headers: { Authorization: `Bearer ${tok}` }}});

const tools = await mcp.tools();   // MCP → AI SDK
await generateText({ model, tools, prompt });
  • Reuse the app's tools, zero rewrite
  • One endpoint extends Sanihub to any client
05 · the trap
⚠ context bloat

Every tool's schema is paid for in tokens

Auto-generate 50–200 tools and the JSON schemas eat the window before the model reasons.

one tool, expanded into context
{
  "name": "get_dataset_resource_view_v3",
  "description": "Return a resource view ...",
  "inputSchema": {
    "type": "object",
    "properties": {
      "dataset_id": {"type":"string","description":"..."},
      "resource_id":{"type":"string","description":"..."},
      "view_id":   {"type":"string","description":"..."},
      "include_draft":{"type":"boolean"},
      "format":   {"enum":["json","csv","xml"]}
    },
    "required":["dataset_id","resource_id"]
  }
}   # × 200 tools …

Context window used by tool schemas:

~15k+ tokens · before reasoning
  • Higher cost & latency, every call
  • Model confusion → wrong tool picks
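A rough way to check this on your own server is to serialize the tool definitions and estimate their size. The sketch below reuses the tool shown above and assumes about 4 characters per token, which is only a common heuristic (real tokenizers differ); `estimate_schema_tokens` and `CHARS_PER_TOKEN` are hypothetical names.

```python
import json

# One tool definition, mirroring the example above (descriptions elided).
TOOL = {
    "name": "get_dataset_resource_view_v3",
    "description": "Return a resource view ...",
    "inputSchema": {
        "type": "object",
        "properties": {
            "dataset_id": {"type": "string", "description": "..."},
            "resource_id": {"type": "string", "description": "..."},
            "view_id": {"type": "string", "description": "..."},
            "include_draft": {"type": "boolean"},
            "format": {"enum": ["json", "csv", "xml"]},
        },
        "required": ["dataset_id", "resource_id"],
    },
}

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_schema_tokens(tool: dict, n_tools: int = 1) -> int:
    """Approximate tokens spent on n_tools schemas of similar size."""
    chars = len(json.dumps(tool, separators=(",", ":")))
    return (chars * n_tools) // CHARS_PER_TOKEN


if __name__ == "__main__":
    print("1 tool   :", estimate_schema_tokens(TOOL))
    print("200 tools:", estimate_schema_tokens(TOOL, 200))
```

With real, multi-sentence descriptions in place of the elided ones, the per-tool figure grows accordingly.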
06 · the fix
discovery, not dumping

Expose one search tool, not two hundred

The agent searches for capability on demand; only matched schemas enter context.

the only tool the client sees up front
@mcp.tool
def search_tools(need: str) -> list[ToolCard]:
    """Find the right tool for a task, by intent."""
    return registry.rank(need)        # keyword + embeddings

# agent: search_tools("query a CSV resource")
#  → returns 2–3 candidates, full schema on demand

Context window used by tool schemas:

~2–3k tokens
  • 200 tools become 1 entry point
  • Schemas load just-in-time
  • ~50k → ~2–3k tokens of overhead

FastMCP's tool search / code mode ships this pattern out of the box.
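For intuition, the ranking behind `search_tools` can be sketched in plain Python. This is a keyword-overlap version only, with no embeddings, and it is not how FastMCP implements tool search; `ToolCard`, `Registry`, the stopword list, and the three sample tools are all hypothetical.

```python
import re
from dataclasses import dataclass

# Small stopword list so filler words do not count as matches (assumption).
STOPWORDS = {"a", "an", "the", "of", "to", "for", "and"}


@dataclass(frozen=True)
class ToolCard:
    """Short, cheap-to-show summary of a tool (hypothetical shape)."""
    name: str
    description: str


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


class Registry:
    """Keyword-only ranking; a real server might blend in embeddings."""

    def __init__(self, cards: list[ToolCard]):
        self.cards = cards

    def rank(self, need: str, k: int = 3) -> list[ToolCard]:
        wanted = _words(need)
        scored = []
        for card in self.cards:
            overlap = len(wanted & _words(card.name + " " + card.description))
            if overlap:
                scored.append((overlap, card))
        # Highest overlap first; the sort is stable, so ties keep registration order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [card for _, card in scored[:k]]


registry = Registry([
    ToolCard("query_csv_resource", "Run a filter query over a CSV resource"),
    ToolCard("get_dataset_metadata", "Return title, licence and schema of a dataset"),
    ToolCard("list_organizations", "List publishing organizations"),
])

if __name__ == "__main__":
    for card in registry.rank("query a CSV resource"):
        print(card.name)
```

Only the matched cards go back to the agent; the full input schema for a chosen tool can then be fetched in a second step.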

07 · free leverage
don't hand-write what a spec already describes

OpenAPI & CKAN → MCP in a few lines

any REST API → MCP server
import httpx
from fastmcp import FastMCP

client = httpx.AsyncClient(base_url="https://api.example.com")
spec   = httpx.get("https://api.example.com/openapi.json").json()

mcp = FastMCP.from_openapi(
    openapi_spec=spec,
    client=client,          # auth lives on the client
    name="CatalogueAPI",
)
mcp.run()

Open data already has standard APIs

CKAN powers data.gov, open.canada.ca, and dati.gov.it. Socrata powers data.calgary.ca and many other city portals. Wrap the standard once and every portal gets the same agent interface.

  • Every endpoint becomes a tool by default
  • RouteMap to include / exclude / retag
07b · shipped
open-data MCPs, already live

The discovery pattern

A whole portal behind a handful of intent tools. Search to find, inspect the schema, then query.

Calgary · Socrata

3 tools, any dataset

search_datasets(query)
get_dataset_metadata(id) → schema
query_dataset(id, select, where, order)

SoQL over the live portal. The agent never sees 200 schemas; it discovers them.
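As an illustration of what a `query_dataset` tool might do under the hood, the sketch below turns its arguments into a Socrata SODA query URL (`/resource/<id>.json` with `$select`, `$where`, `$order`, `$limit`). It only builds the URL and sends nothing. `build_soql_url`, `MAX_ROWS`, the dataset id, and the `year` column are hypothetical, and this is not the Calgary server's actual code.

```python
from urllib.parse import urlencode

PORTAL = "https://data.calgary.ca"  # Socrata portal named above
MAX_ROWS = 100  # assumption: hard cap so one call cannot pull a whole table


def build_soql_url(dataset_id: str, select: str = "*", where: str = "",
                   order: str = "", limit: int = 20) -> str:
    """Build a SODA query URL for one dataset; does not send the request."""
    params = {"$select": select, "$limit": min(limit, MAX_ROWS)}
    if where:
        params["$where"] = where
    if order:
        params["$order"] = order
    # urlencode percent-escapes "$", spaces and operators for us.
    return f"{PORTAL}/resource/{dataset_id}.json?{urlencode(params)}"


if __name__ == "__main__":
    # "abcd-1234" is a placeholder id, not a real Calgary dataset.
    print(build_soql_url("abcd-1234", where="year = 2024", order="year DESC"))
```

Because the SoQL clauses pass straight through, the server stays generic: the agent reads the schema from `get_dataset_metadata` and writes the filter itself.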

IATI · aid data

domain-shaped tools

query_activities · search_text
search_by_country · _by_sector · _by_organization
get_facets · get_activity

Humanitarian & development funding, queryable in plain language. Hosted, with OAuth.

Calgary leans generic (SoQL passes through); IATI leans opinionated (aid concepts as tools).

08 · the catch
auto-generated ≠ done

A thin wrapper pushes the work onto the client

Mirror an API 1:1 and the agent must know which of 200 endpoints to call, in what order, with what params.

client agent (must be smart): picks tools · chains calls · re-ranks · cleans
generic mcp (thin pass-through): 1 tool per endpoint
api: raw responses

Fine for prototypes. Fragile when the client is small or the API is large & noisy.

09 · the other end
put the brains behind the wire

Opinionated MCP → a dumb client can win

one intent-shaped tool, heavy lifting hidden
@mcp.tool
def find_knowledge(question: str) -> list[Passage]:
    """Answer-ready evidence for a question."""
    hits   = hybrid_search(question)      # BM25 + vectors
    ranked = cross_encoder_rerank(question, hits)
    ranked = dedupe(ranked)
    return [trim(p) for p in ranked[:5]]  # small, clean
client agent: can be simple
opinionated mcp: search · rerank · dedupe · trim
stores: index + vectors

The agentic work moves server-side once — every client inherits it.

10 · the dial
it's a spectrum, choose on purpose

Generic reach vs. opinionated depth

generic · API→MCP

Choose when: broad coverage fast, capable client, internal tools, exploration.

Cost: smart client, schema bloat, brittle chaining.

opinionated MCP

Choose when: shared product surface, thin or many clients, quality & safety matter.

Cost: design & maintain the logic yourself.

Most real servers mix both: tool search for breadth + a few intent tools for the hot paths.

10b · guardrails
open ≠ unguarded

Auth, gating & the cost of AI on the server

An open endpoint is still your endpoint. Decide who gets in, how often, and who pays for the compute.

Auth & data access

IATI: the user's API key is encrypted into a stateless JWT (OAuth 2.1), with no server storage. WASH AI: validateTenantAccess blocks private tenants while public data flows freely.

Gating & rate limits

Open data invites scraping. Rate-limit per token, paginate with hard caps, and separate public from private scopes so a noisy client can't drain the portal.
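One common way to rate-limit per token is a token bucket. The sketch below is a minimal in-memory version, assuming a single server process; `RateLimiter`, `Bucket`, and `clamp_page_size` are hypothetical names, and a multi-instance deployment would need shared storage instead of a dict.

```python
import time
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float   # requests currently available
    updated: float  # clock reading at the last refill


class RateLimiter:
    """Token bucket per client token: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable, so tests need not sleep
        self.buckets: dict[str, Bucket] = {}

    def allow(self, client_token: str) -> bool:
        now = self.clock()
        bucket = self.buckets.setdefault(client_token, Bucket(self.burst, now))
        # Refill in proportion to elapsed time, never above the burst size.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.burst, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False


def clamp_page_size(requested: int, hard_cap: int = 100) -> int:
    """Pagination with a hard cap: never return more than hard_cap rows."""
    return max(1, min(requested, hard_cap))


if __name__ == "__main__":
    limiter = RateLimiter(rate=1.0, burst=2)
    print([limiter.allow("client-a") for _ in range(3)])
    print(clamp_page_size(5000))
```

Each client token gets its own bucket, so one noisy client runs dry without slowing anyone else.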

AI runs cost money

Reranking, embeddings, research search: every call burns tokens + latency server-side. Budget & meter per call / per tenant, or one agent loop bills you all day.
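Metering per tenant can start as simply as a counter checked before each expensive step. The sketch below is one possible shape, with an in-memory store and made-up prices; `TenantMeter`, `BudgetExceeded`, and the `COST` table are hypothetical, and the units are abstract (tokens, cents, whatever you bill in).

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a tenant has used up its allowance for the period."""


class TenantMeter:
    """Tracks spend per tenant in abstract units."""

    def __init__(self, limit_per_tenant: int):
        self.limit = limit_per_tenant
        self.spent: dict[str, int] = defaultdict(int)

    def charge(self, tenant_id: str, units: int) -> int:
        """Record a call's cost up front; return the units still available."""
        if self.spent[tenant_id] + units > self.limit:
            raise BudgetExceeded(f"{tenant_id} is over its {self.limit}-unit budget")
        self.spent[tenant_id] += units
        return self.limit - self.spent[tenant_id]


# Assumed per-call prices; real numbers depend on your models and hosting.
COST = {"embed": 1, "rerank": 5}

if __name__ == "__main__":
    meter = TenantMeter(limit_per_tenant=10)
    print(meter.charge("sanihub", COST["embed"]))
    print(meter.charge("sanihub", COST["rerank"]))
```

Charging before the work runs means a runaway agent loop hits `BudgetExceeded` instead of the invoice.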

Push intelligence server-side on purpose, then put a price tag and a gate on it.

recap
$ summary --open-data

Six things to walk out with

  • Skill = client, MCP = server. Lightweight & personal vs shared & agentic.
  • Let the agent drive a CLI. A composable surface, auth included.
  • Watch the token bill. 200 schemas can eat half your window.
  • Expose one search_tools. Discovery beats dumping.
  • Re-expose what you built. mcp-handler wraps existing app tools, like Sanihub.
  • Gate & price it. Auth, rate limits, and a budget on server-side AI.

Open data already speaks CKAN / Socrata / OpenAPI. Meet it where it is, then add opinion where it pays.

fastmcp · gofastmcp.com · ckan-mcp-server

whoami
ai tools for social good

Baobab Tech

A small team building open-source AI for social impact. We help organizations tap into the richness of their data and find new ways of working, now for agents as much as people.

github.com/baobab-tech · linkedin.com/company/baobabtech · thanks ✦