Making a Country Legible to AI: My mcp.sv Hobby Project

I have a hobby project called mcp.sv. The one-line version, straight from
its about page:

"An attempt to make the whole state of El Salvador visible from one place, from any AI client."

This post isn't the deep technical one — I wrote that already,
about the multilingual search and the RAG layers. This is the why, and the more interesting
question hiding behind it: what does it actually take to make a country understandable to an
AI? Turns out it's not "scrape everything and embed it." It's a series of deliberate layers,
and building them has taught me more about my own country's plumbing than a decade of living
abroad did.

Why I'm building it

I was born in Suchitoto. I've spent the last decade as a digital nomad, working remotely and
watching from a distance as El Salvador went through a great deal of structural change:
in technology, in legislation, in the administrative machinery that lets people actually
transact with their government. The country quietly built the kind of digital state surface
most outside observers stopped paying attention to years ago.

But that surface is fragmented. Two dozen institutions each publish on their own website, in
their own format, under their own conventions. A human can eventually find what they need by
clicking through five portals. An AI can't — there's no single place to ask, and the data
isn't shaped for a machine to reason over. So when someone asks ChatGPT "what are the rules for
issuing electronic invoices in El Salvador?", the honest answer is usually a vague,
out-of-date guess.

mcp.sv is my attempt to fix that for one country. It's where my professional habits (.NET,
Postgres, embeddings, MCP servers) and my origins happen to meet. The tagline I gave it: "a modern
channel for a modernizing country."

Where it is today

As of the latest ingest, the corpus is 279 documents / ~19,000 searchable chunks / 24
sources, across 11 categories:

legal 124 · fiscal 37 · government 27 · health 17 · education 16
· economy 13 · history 11 · diaspora 11 · tourism 9 · bitcoin 8 · environment 6

The sources are real institutions — Asamblea Legislativa, the central bank (BCR), the electronic
invoicing (DTE) spec, the ministries of Education, Health, Environment, Tourism, the Supreme
Court's documentation center, CORSATUR for tourism, simple.sv for government procedures, and
more. Another dozen are anchored but not yet populated — DGII, Presidencia, Migración, the
National Registry, the Bitcoin Office — waiting for me to write their "discoverers."

And here's the part that matters for the thesis: an AI doesn't browse any of this. It asks.
mcp.sv speaks the Model Context Protocol, so any MCP-capable
assistant gets a small set of tools:

search_knowledge — keyword + semantic search over the whole corpus
get_country_context — curated context on a topic ("bitcoin", "remittances")
get_corpus_stats — what's indexed, by category and source
explain_document — one document with its full text and source
get_recent_signals / get_daily_brief — recent dated facts
plus category-scoped searches (government, tourism, economy)
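
For a sense of what "asking" looks like on the wire: MCP tool invocations are JSON-RPC 2.0
messages with the method tools/call. Below is a minimal Python sketch that builds one for
search_knowledge. The tool name comes from the list above; the argument names (query, limit)
are assumptions for illustration, not the server's documented schema. A real client would
read the actual schema from the tool listing first.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "search_knowledge",
    # Hypothetical argument names; check the tool's published schema.
    {"query": "factura electrónica requisitos", "limit": 5},
)
print(json.dumps(request, ensure_ascii=False, indent=2))
```

In practice an MCP client library handles this envelope for you; the point is only that the
whole interaction is a named tool plus a small JSON argument object.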

It's read-only and addressable from any client. That's the whole interface: a country you can
query.

What it actually takes to make a country legible to an AI

This is the question the project keeps forcing me to answer. A pile of PDFs is not knowledge an
AI can use well. Making it usable is a stack of layers — each one a step from "raw documents" to
"a country a machine can reason about."

Layer 1 — Unify the surface

The first step is unglamorous: get everything into one place with one shape. Twenty-four
portals → one corpus, one query endpoint. Until this exists, nothing else matters, because the AI
has nowhere to point. Most of my actual hours go here — writing per-source "discoverers" that find
new documents and diff them against a manifest.
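
As a rough idea of what "diff against a manifest" can mean, here is an illustrative Python
sketch, not the project's actual code. It assumes the manifest is a mapping from document URL
to a content hash recorded at the last ingest; the names ManifestDiff and
diff_against_manifest and the example.org URLs are made up for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestDiff:
    new: list[str]       # URLs seen for the first time
    changed: list[str]   # URLs whose content hash differs from the manifest
    removed: list[str]   # URLs in the manifest that the source no longer lists


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def diff_against_manifest(
    manifest: dict[str, str], discovered: dict[str, bytes]
) -> ManifestDiff:
    """Compare freshly discovered documents with the last ingest's manifest."""
    new, changed = [], []
    for url, data in discovered.items():
        if url not in manifest:
            new.append(url)
        elif manifest[url] != content_hash(data):
            changed.append(url)
    removed = [url for url in manifest if url not in discovered]
    return ManifestDiff(sorted(new), sorted(changed), sorted(removed))


# Placeholder data: one unchanged file, one edited file, one new, one gone.
manifest = {
    "https://example.org/ley-a.pdf": content_hash(b"v1"),
    "https://example.org/ley-b.pdf": content_hash(b"v1"),
    "https://example.org/ley-c.pdf": content_hash(b"v1"),
}
discovered = {
    "https://example.org/ley-a.pdf": b"v1",
    "https://example.org/ley-b.pdf": b"v2",
    "https://example.org/ley-d.pdf": b"v1",
}
diff = diff_against_manifest(manifest, discovered)
print(diff)
```

Only the new and changed entries need re-chunking and re-embedding, which is what keeps a
repeated ingest cheap.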

Layer 2 — Structure, not just text

A scanned decree is a wall of words. What makes it legible is the structure I attach: every
legal document carries a decree number (D.L. 33/2023), an issue date, a last-reform
date, and the regulator that issued it. Each chunk knows its page and the article or
chapter heading above it. That's the difference between "the AI said something about taxes" and
"Article 57 of Decreto 233, page 3 says…" — a checkable citation instead of a vibe.

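The fields described above map naturally onto two small records. This is a sketch in Python
under my own naming (LegalDocument, Chunk, cite are illustrative, not the real schema), and
the sample values are placeholders, not a real decree record.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegalDocument:
    title: str
    decree_number: str                 # e.g. "D.L. 33/2023"
    issued_on: date
    last_reformed_on: Optional[date]   # None if never reformed
    regulator: str                     # the institution that issued it


@dataclass(frozen=True)
class Chunk:
    document: LegalDocument
    page: int
    heading: str   # article or chapter heading above the chunk
    text: str


def cite(chunk: Chunk) -> str:
    """Render a checkable citation from the chunk's structural metadata."""
    doc = chunk.document
    return f"{chunk.heading}, {doc.decree_number} ({doc.regulator}), p. {chunk.page}"


# Placeholder values for illustration only.
doc = LegalDocument(
    title="Ejemplo de decreto",
    decree_number="D.L. 33/2023",
    issued_on=date(2023, 1, 1),
    last_reformed_on=None,
    regulator="Asamblea Legislativa",
)
chunk = Chunk(document=doc, page=3, heading="Art. 57", text="...")
print(cite(chunk))
```

Because the citation is assembled from stored fields rather than generated text, a reader
can open the cited page and check it.
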
Layer 3 — Cross the language barrier

The corpus is Spanish; the world asks in many languages. So I bridge languages in the index,
ahead of time — synthetic English/Russian keyword aliases, stored translations of titles and
summaries, and accent-folding so "constitucion" finds "Constitución." An AI (or a Russian
researcher) can ask in their own language and still hit a Spanish decree. A country shouldn't be
legible only to people who already speak its language.
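
Accent-folding plus index-time aliases can be sketched in a few lines of Python. The fold
function below uses the standard unicodedata module; the tiny keyword index and its alias
list are hypothetical stand-ins for whatever the real index stores.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip combining accents and case so 'Constitución' == 'constitucion'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each folded keyword (including stored aliases) to document ids."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            index[fold(keyword)].add(doc_id)
    return index


def lookup(index: dict[str, set[str]], term: str) -> set[str]:
    return index.get(fold(term), set())


# Hypothetical document with English and Russian aliases stored ahead of time.
docs = {"constitucion-sv": ["Constitución", "constitution", "конституция"]}
index = build_index(docs)

print(lookup(index, "constitucion"))   # unaccented Spanish query
print(lookup(index, "Constitution"))   # English alias
print(lookup(index, "конституция"))    # Russian alias
```

One trade-off to be aware of: this style of folding also turns "ñ" into "n", so "año" and
"ano" collide. Whether that is acceptable depends on the corpus.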

Layer 4 — Trust and provenance

For a country, "where did this come from" is not optional. Every source has a trust level —
Official, Verified, Community — and every answer can cite the issuing institution. An AI
summarizing the law should be able to say "according to the Asamblea Legislativa" and mean it.
Legibility without provenance is just confident guessing.
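
One way to make trust levels do real work is to carry them on every search hit. The Python
sketch below is illustrative: the three level names come from the text above, but the numeric
ordering, the idea of ranking by trust before relevance, and the Hit and Source types are my
assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrustLevel(IntEnum):
    COMMUNITY = 1
    VERIFIED = 2
    OFFICIAL = 3


@dataclass(frozen=True)
class Source:
    institution: str
    trust: TrustLevel


@dataclass(frozen=True)
class Hit:
    text: str
    score: float   # relevance score from search, higher is better
    source: Source


def rank(hits: list[Hit], min_trust: TrustLevel = TrustLevel.COMMUNITY) -> list[Hit]:
    """Drop hits below min_trust, then order by trust level and relevance."""
    kept = [h for h in hits if h.source.trust >= min_trust]
    return sorted(kept, key=lambda h: (h.source.trust, h.score), reverse=True)


def attribute(hit: Hit) -> str:
    """Prefix an answer fragment with its issuing institution."""
    level = hit.source.trust.name.title()
    return f"According to {hit.source.institution} ({level}): {hit.text}"


asamblea = Source("Asamblea Legislativa", TrustLevel.OFFICIAL)
forum = Source("a community wiki", TrustLevel.COMMUNITY)  # hypothetical source
hits = [
    Hit("community summary", 0.9, forum),
    Hit("text of the decree", 0.5, asamblea),
]
for hit in rank(hits):
    print(attribute(hit))
```

Sorting strictly by trust first is a blunt choice; a weighted blend of trust and relevance
would be a reasonable alternative.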

Layer 5 — Facts, not only documents

This is where I'm still building. Documents answer "what does the law say?" They don't answer
"what's the price of fuel this week?" or "what's the current exchange rate?" Those are signals
— small, dated, structured facts that change over time. I've started a Signal shape and
get_recent_signals / get_daily_brief tools for exactly this. A country is not just its
archive; it's its current state, and that needs a different data shape than a PDF.
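
To make "a different data shape" concrete, here is a Python sketch of what a signal record
and a recent-signals query might look like. The field names, the key format, and the
recent_signals helper are assumptions, and the numbers are placeholders, not real prices or
rates.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Signal:
    key: str           # what is measured, e.g. "fuel.regular"
    value: float
    unit: str
    observed_on: date  # the date the fact was true, not the ingest date
    source: str


def recent_signals(signals: list[Signal], today: date, days: int = 7) -> list[Signal]:
    """Return the latest observation per key within the last `days` days."""
    cutoff = today - timedelta(days=days)
    latest: dict[str, Signal] = {}
    for s in signals:
        if not (cutoff <= s.observed_on <= today):
            continue
        if s.key not in latest or s.observed_on > latest[s.key].observed_on:
            latest[s.key] = s
    return sorted(latest.values(), key=lambda s: s.key)


# Placeholder observations for illustration only.
today = date(2025, 1, 15)
signals = [
    Signal("fuel.regular", 1.0, "USD/gal", date(2025, 1, 6), "example source"),
    Signal("fuel.regular", 2.0, "USD/gal", date(2025, 1, 13), "example source"),
    Signal("fuel.diesel", 3.0, "USD/gal", date(2024, 12, 1), "example source"),
]
for s in recent_signals(signals, today):
    print(s.key, s.value, s.unit, s.observed_on.isoformat())
```

The key difference from a document chunk is the observation date: an old signal is
superseded rather than merely old, so queries need to pick the latest value per key.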

Layer 6 — The missing entities

The honest gap: I have documents, but not yet a structured map of the country — ministries
and the officials who run them, institutions and how they relate, laws cross-referenced by topic,
municipalities and basic demographics, the archaeological and indigenous-language record (Náhuat
barely appears yet). And one feature I keep circling: a legal-status flag —
vigente / reformado / derogado — so an AI never cites a repealed law as current. That single
field might be the highest-value thing left to build.
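
The legal-status flag is not built yet, so the following is only a Python sketch of how such
a field could be used once it exists. The three status values come from the text above; the
Law type, the helper names, and the sample decree numbers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LegalStatus(Enum):
    VIGENTE = "vigente"      # in force
    REFORMADO = "reformado"  # in force, but amended since issue
    DEROGADO = "derogado"    # repealed


@dataclass(frozen=True)
class Law:
    decree_number: str
    status: LegalStatus


def citable_as_current(laws: list[Law]) -> list[Law]:
    """Keep only laws an assistant may present as current law."""
    return [law for law in laws if law.status is not LegalStatus.DEROGADO]


def status_note(law: Law) -> str:
    """A short caveat to attach to any citation of this law."""
    notes = {
        LegalStatus.VIGENTE: "in force",
        LegalStatus.REFORMADO: "in force as amended; check the latest reform",
        LegalStatus.DEROGADO: "repealed; cite only as historical context",
    }
    return f"{law.decree_number}: {notes[law.status]}"


# Hypothetical decree numbers for illustration only.
laws = [
    Law("D.L. 1/2000", LegalStatus.VIGENTE),
    Law("D.L. 2/2001", LegalStatus.REFORMADO),
    Law("D.L. 3/2002", LegalStatus.DEROGADO),
]
for law in citable_as_current(laws):
    print(status_note(law))
```

Filtering repealed laws out of "current law" answers while still keeping them retrievable
for historical questions is the design choice this sketch assumes.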

The pattern across all six: legibility is the work you do so a machine doesn't have to
guess. Unify, structure, translate, attribute, quantify, connect. Each layer turns a little
more of a country from "documents a human can eventually find" into "knowledge a machine can
reason over."

Why this generalizes

The project is called mcp.sv "for historical reasons" — but every record carries an ISO country
code, so the same engine serves Guatemala, Honduras, Nicaragua the moment someone writes their
source discoverers. It's positioned as El Salvador's national knowledge MCP — a product of
the Sivar ecosystem, not a member of its app fleet — and the read-only
corpus + pgvector engine is meant to be reusable "national MCP" tech.

That's the quiet ambition under the hobby: not just one country indexed, but a repeatable recipe
for making any country legible to AI. Start with the fragmented surface every government already
has, and add the layers until a machine can answer questions about the place with citations
instead of hallucinations.

The honest status

It's a hobby project. Nobody is using it heavily yet — there's a punch list a mile long
(normalize ingest URLs, expand the Supreme Court jurisprudence from 10 boletines toward the
thousands that exist, wire a nightly job that auto-generates multilingual questions and
translation candidates, add the fuel-price signal as a proof of the structured-data path). Some
features are deliberately deferred until there's a real user-reliance signal — the legal-status
flag, a terms of service, a personal-data filter.

But the core works: you can point an AI client at https://mcp.sv, ask about Salvadoran law or
the bitcoin framework or how to do a government procedure, and get back grounded, cited answers
from real institutions — in Spanish, English, or Russian.

A country you can query. That's the whole idea, and it's been one of the most satisfying things
I've built precisely because the country is the dataset. If you're thinking about doing this
for your country, or want to compare notes, you can reach me through the links on the about page.