Dogfood

We used our own MCP to find a 155-company gap in our public data.

We shipped the OnlyData MCP server, then used it to compare our own catalog against a private curated list of 199 agentic AI companies. ~155 of them were completely missing. Here's what happened next.

Cam Fortin · April 2026

Yesterday we shipped two things in the same session: the OnlyData MCP connector on Claude.ai (so you can query our entire catalog in any conversation), and the artifacts feature (so anything you build with that data gets its own permanent home on your OnlyData profile). The two halves close a loop that has been missing from every data platform I've used: upload sloppy data, have it normalized and matched, reason about it with Claude, then publish the output — all in one place.

Within an hour of the MCP going live, I pointed it at our own data to test it. What I found is embarrassing and fantastic at the same time: we had a massive blind spot in the most commercially interesting category in AI right now, and we only knew because the tool we'd just built let us see it.

The setup

I had a private custom dataset on my OnlyData profile called Agentic Ecosystem 2026. 199 companies, 19 sublists, with richer metadata than our standard schema: funding stage, primary differentiator, curated one-line descriptions. I'd been building it for weeks as a reference, but it was stuck in my private uploads. Nobody else could see it, no agent could query it, and it wasn't feeding into our public datasets.

We already have The AI Economy — our curated public dataset of ~265 companies across 21 sectors, from chips to foundation models to enterprise SaaS. It's the flagship list. When I started thinking about making the Agentic Ecosystem public, my first question was:

Is this even net-new? Or does The AI Economy already cover most of these companies?

If the overlap was 80%, promoting Agentic to its own dataset would be redundant. If the overlap was 20%, it'd be a flagship in its own right. I didn't know. And nobody had a way to answer that question quickly — until we built the MCP.

The test: one prompt, full round-trip

In a fresh Claude.ai conversation with the OnlyData connector attached, I asked:

Call query_custom_dataset for my Agentic Ecosystem 2026 and retrieve all 199 rows. Then call browse_dataset for ai-economy. Compare them by domain and tell me (1) how many are already in AI Economy, (2) how many are net-new, (3) the categories with the least overlap, and (4) the 10 most interesting net-new companies.

That's it. Four MCP tool calls — query_custom_dataset, browse_dataset, then a second browse_dataset call to paginate past 100 results, plus whatever joins Claude did internally — and about 90 seconds of reasoning. It came back with a full diff and a rendered visualization I could share. Here's what it produced:
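For the curious, the comparison in that prompt boils down to a set diff on normalized domains over paginated results. Here is a minimal Python sketch of that idea. It is not what Claude or the MCP server actually runs: `fetch_page`, the row shape (a dict with a `domain` key), the page size, and the `.example` sample rows are all hypothetical stand-ins.

```python
from typing import Callable, Iterator
from urllib.parse import urlparse


def normalize_domain(raw: str) -> str:
    """Reduce a URL or bare domain to a comparable key (lowercase, no scheme, no www.)."""
    raw = raw.strip().lower()
    # urlparse only fills netloc when a scheme is present, so add one if missing.
    host = urlparse(raw if "://" in raw else f"https://{raw}").netloc
    host = host.split(":")[0]  # drop any port
    return host[4:] if host.startswith("www.") else host


def fetch_all(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator[dict]:
    """Keep requesting pages until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


def diff_by_domain(private_rows, public_rows):
    """Split private rows into (already_public, net_new) by normalized domain."""
    public_domains = {normalize_domain(r["domain"]) for r in public_rows}
    overlap, net_new = [], []
    for row in private_rows:
        key = normalize_domain(row["domain"])
        (overlap if key in public_domains else net_new).append(row)
    return overlap, net_new


# Hypothetical sample rows; real rows would come from the two tool calls.
public = [{"domain": "https://www.alpha.example"}, {"domain": "beta.example"}]
private = [{"domain": "Alpha.example"}, {"domain": "gamma.example"}]


def fake_page(offset: int, limit: int) -> list:
    """Stand-in for a paginated catalog call."""
    return public[offset:offset + limit]


overlap, net_new = diff_by_domain(private, list(fetch_all(fake_page, page_size=1)))
print(len(overlap), "already public,", len(net_new), "net-new")
```

The normalization step matters more than it looks: without it, `https://www.alpha.example` and `Alpha.example` would count as two different companies and inflate the net-new number.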

Artifact · Agentic Ecosystem 2026 vs AI Economy · Overlap analysis across 19 categories

Total agentic: 199 · In AI Economy*: see note · Confirmed net-new: ~155 · Zero-overlap categories: 11

*Matched 99 of 265 AI Economy companies returned by browse_dataset; est. ~30–35 total matches if full set were returned

[Chart: Overlap rate by Agentic category. Legend: some overlap; zero overlap (biggest gap)]

The finding

Eleven of the nineteen categories in my private list had zero overlap with the AI Economy — not low overlap, not partial overlap, none at all. Entire subcategories that every credible analyst writes about in 2026 did not exist anywhere in our public data:

Agent Memory & Context (10 companies) — vector DBs and purpose-built memory layers: Mem0, Zep, Letta, Pinecone, Qdrant, Weaviate, Milvus, Chroma, LanceDB, Cognee. Missing.
Agent Security & Guardrails (10 companies) — Guardrails AI, Virtue AI, Aembit, Wiz AI-SPM, WhyLabs, CrowdStrike AIDR, Palo Alto AIRS. One of the hottest categories of 2026. Missing.
Agent Infrastructure (10 companies) — E2B, Daytona, Modal, Together AI, Fireworks AI, Replicate, Railway, Vercel, Northflank, SiliconFlow. The "where agents actually run" layer. Missing.
Agent-Native CRMs & Workflows (10 companies) — Zapier, Make, n8n, Retool, Monday, Databricks, Postman, reframed through an agent-first lens. Missing.
Vertical Agents — Legal (6 companies) — Harvey ($11B), Legora ($8B), Darrow, Paxton, Garfield AI, Lawgeex. The entire legal AI vertical. Missing.

Even the categories that did overlap matched on only 10–29% of their companies. The AI Economy skews toward the model and infrastructure layer: chips, data centers, foundation labs, GPU clouds. Agentic 2026 fills the application and tooling layer above it, which is where most of the commercial action in 2026 actually lives. We had a hole where the entire agentic stack was supposed to be.
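The per-category numbers are simple to reproduce once each private row carries a category and a domain. Here is a rough sketch of the arithmetic, assuming rows shaped like `{"category": ..., "domain": ...}` and a set of domains already present in the public catalog. The sample rows and `.example` domains are invented for illustration, not taken from the real dataset.

```python
from collections import defaultdict


def overlap_by_category(private_rows, public_domains):
    """Return {category: (matched, total, rate)} for the private list."""
    counts = defaultdict(lambda: [0, 0])  # category -> [matched, total]
    for row in private_rows:
        bucket = counts[row["category"]]
        bucket[1] += 1
        if row["domain"] in public_domains:
            bucket[0] += 1
    return {cat: (m, t, m / t) for cat, (m, t) in counts.items()}


def zero_overlap(stats):
    """Categories with no matches at all, largest first, then alphabetical."""
    empty = [(cat, total) for cat, (matched, total, _) in stats.items() if matched == 0]
    return sorted(empty, key=lambda item: (-item[1], item[0]))


# Hypothetical rows: two categories, one of which has a single public match.
rows = [
    {"category": "Agent Memory & Context", "domain": "m1.example"},
    {"category": "Agent Memory & Context", "domain": "m2.example"},
    {"category": "Frameworks", "domain": "f1.example"},
    {"category": "Frameworks", "domain": "f2.example"},
]
stats = overlap_by_category(rows, public_domains={"f1.example"})
for cat, (matched, total, rate) in stats.items():
    print(f"{cat}: {matched}/{total} ({rate:.0%})")
print("zero overlap:", zero_overlap(stats))
```

Sorting the zero-overlap categories by size puts the biggest gaps first, which is the order you would want to fix them in.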

The analysis flagged ten net-new companies as the "biggest plays" for OnlyData:

Featured net-new

Harvey ($11B legal AI, 25K+ custom agents) · Sierra ($10B Benioff-backed enterprise CX) · Legora ($8B Series F legal) · E2B (AWS Lambda for agent code execution) · Mem0 (AWS's exclusive agent memory partner) · Aembit (Zero Trust identity for non-human agents) · Browserbase (managed cloud browsers for agents) · Skyvern (vision-based browser automation, 85.85% WebVoyager) · Hippocratic AI (150M+ clinical interactions) · SkillsMP (425K+ indexed agent skill definitions — the npm registry for agent capabilities).

The action: write the promotion flow, run it once

Once the gap was measurable, the fix was mechanical. We wrote a small promotion script — scripts/promote-agentic-ecosystem.py — that reads the private custom dataset via Supabase, normalizes each row into our canonical taxonomy, writes a checked-in CSV, and upserts every company into our public od_businesses table:

Matches existing rows by domain first. Companies already in the catalog get patched with the curated super_category, source='agentic_ecosystem_curated', and data_tier='tier_1' — they're promoted, not duplicated.
Inserts net-new companies with the full curated payload (name, domain, description, city/state, category).
Queues the full enrichment pipeline for every row: agent readiness scoring, B2B classification, industry tagging, AI-native detection, agent role, and employee estimation. The same models that score every other business in OnlyData run against these too, so the Agentic landing page surfaces live AR scores within an hour of the ingest.

Idempotent by design — safe to re-run. The CSV gets committed to the repo as the canonical source of truth going forward, so future updates flow through code review.
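To make the patch-or-insert logic concrete, here is a simplified sketch of the upsert step. It is not the contents of scripts/promote-agentic-ecosystem.py: a plain dict keyed by domain stands in for the od_businesses table, the enrichment queue is left out, and copying the row's `category` into `super_category` is my assumption about how the mapping works. Only the `source` and `data_tier` values come from the description above.

```python
import copy

# Field values taken from the post; everything else here is illustrative.
CURATED_PATCH = {
    "source": "agentic_ecosystem_curated",
    "data_tier": "tier_1",
}


def promote(catalog: dict, curated_rows: list) -> dict:
    """Upsert curated rows into `catalog` (keyed by domain) and return counts.

    `catalog` is an in-memory stand-in for the public businesses table.
    """
    result = {"patched": 0, "inserted": 0}
    for row in curated_rows:
        domain = row["domain"].strip().lower()
        patch = {**CURATED_PATCH, "super_category": row["category"]}
        if domain in catalog:
            # Existing company: promote in place and keep its other fields.
            catalog[domain].update(patch)
            result["patched"] += 1
        else:
            # Net-new company: insert the curated payload plus the patch.
            catalog[domain] = {**row, "domain": domain, **patch}
            result["inserted"] += 1
    return result


# Hypothetical data: one company already in the catalog, one net-new.
catalog = {"alpha.example": {"name": "Alpha", "domain": "alpha.example", "data_tier": "tier_3"}}
rows = [
    {"name": "Alpha", "domain": "Alpha.example", "category": "Agent Infrastructure"},
    {"name": "Gamma", "domain": "gamma.example", "category": "Agent Memory & Context"},
]

first = promote(catalog, rows)
snapshot = copy.deepcopy(catalog)
second = promote(catalog, rows)  # re-run: nothing new to insert
print(first, second, catalog == snapshot)
```

Because the match key is the domain and the patch is a fixed set of values, a second run only re-applies the same fields. That is what makes re-running safe in this sketch.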

Before and after

Here's what our public data looked like before the promotion ran, and what it looks like after:

Before

~22 agent-adjacent companies in public catalog

5 overlapping sub-categories inside AI Economy (frameworks, MCP infra, dev tools, workflow automation, AI-native SaaS)
Zero coverage of memory, security, agent infrastructure, marketplaces, or any vertical agent category
199 curated agentic companies sitting in one private custom dataset, visible only to me
Richer metadata (funding stage, differentiator) trapped in schemaless JSON

After

199+ companies in the public Agentic Ecosystem 2026 dataset

19 sublists covering the entire agentic stack — 11 of which didn't exist in any other OnlyData dataset before
Every company flagged data_tier=tier_1 and queued for the full enrichment pipeline
Live landing page at onlydata.club/datasets/agentic-ecosystem with stats, sublist tiles, company cards, and a "Biggest Plays" featured strip
Queryable via browse_dataset on the MCP — anyone with the OnlyData connector installed can now pull the full agentic map into any Claude conversation

Net-new: ~155 · Sublists: 19 · Zero-overlap categories: 11 · Promotion script: 1

Why this is the real pitch for OnlyData Club

There's a reason I'm writing this as a blog post instead of a tweet. This is the full loop of what OnlyData Club is supposed to be, and this is the first time I've watched it run end-to-end on our own data:

Upload sloppy data — a CSV of 199 agentic AI companies with different column conventions than our main schema. Uploaded through the same custom dataset tool any user has.
Have it normalized, deduped, and matched — the promotion script maps 19 raw category names to canonical sublists, matches against od_businesses by domain, and decides for each company whether to patch or insert.
Analyze it with Claude — four MCP tool calls and 90 seconds of reasoning surfaced a structural gap in our public data that would have taken an analyst half a day to find by hand.
Publish the result as a shareable artifact — the overlap chart you saw above is a real artifact saved to my OnlyData profile. It has its own URL, its own OG preview, its own copy-link and LinkedIn share buttons. Anyone can see it; nothing lives inside a chat that evaporates when I close the window.
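The normalize step in that loop is mostly a lookup problem: raw category names from an uploaded CSV need to land on a fixed set of canonical sublists. Here is a small sketch of one way to do it, assuming an alias table keyed on a cleaned-up version of the raw name. The slugs and the `canonical_sublist` helper are hypothetical, not our actual taxonomy or code.

```python
import re

# Hypothetical alias table: cleaned raw name -> canonical sublist slug.
ALIASES = {
    "agent memory and context": "agent-memory-context",
    "agent security and guardrails": "agent-security-guardrails",
    "agent infrastructure": "agent-infrastructure",
}


def canonical_sublist(raw: str) -> str:
    """Map a raw category name to its canonical slug, or fail loudly."""
    # Treat "&" as "and", collapse runs of whitespace, ignore case.
    key = re.sub(r"\s+", " ", raw.replace("&", " and ")).strip().lower()
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"unmapped category: {raw!r}") from None


print(canonical_sublist("Agent Memory & Context"))
print(canonical_sublist("  agent security &  guardrails "))
```

Raising on an unmapped name, rather than guessing a slug, keeps a typo in the upload from silently creating a new sublist.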

That sequence — upload → normalize → analyze → publish — is the thing every data platform has been trying to build separately for twenty years. The MCP is what stitches the analysis step to the data, and the artifacts surface is what stitches the analysis step to the durable output. OnlyData Club now has both. In one place. On your profile.

We didn't set out to build this by eating our own dogfood. We built the MCP for users, and the artifacts feature for users, and then accidentally used them to fix our own data. If it works this well on us, it will work this well on you.

See the Agentic Ecosystem 2026 live

199+ companies across 19 sublists — foundation models, frameworks, orchestration, protocols, memory, security, infrastructure, tools, data, marketplaces, agent-native CRMs, and vertical agents (legal, healthcare, sales, coding, HR, finance, marketing). Every company scored for agent readiness. First-party curated. Free to query.

Explore the dataset → Install the MCP