Datasets
Open business data for the agent economy. Community-sourced, entity-resolved, enriched.
AI Builders
The definitive public directory of people shipping AI: researchers, engineers, founders, PMs, investors, creators. Every entry cites a public artifact (arXiv paper, OSS commit, shipped product, program roster) in the why_in_db field. No scraped emails; public profiles only. OnlyData's first person-based public dataset.
People New Cited evidence
788
Best-of Tech Companies
Agent readiness rankings across 88 categories. How ready is your software stack for AI agents? llms.txt, MCP manifests, OpenAPI specs, and more.
Agent Readiness Scored
500+
The AI Economy
From chips to agents. 26 sectors powering the AI revolution, including foundation models, agent frameworks, vector databases, and robotics. Every company is scored for agent readiness.
Tier 1
127
Consumer AI Brands
Pre-seed weighted map of 127 consumer-facing AI companies where AI is the core value prop — not B2B tools for brands. 12 sublists ranked by Pre-Seed Density Score: Travel & Planning (PDS 3.0) and AI Companions (PDS 2.0) top the list. Built as a Collab Fund leave-behind.
Tier 1 New Pre-seed weighted
130+
Consumer CPG — Next Wave
Pre-seed weighted map of emerging CPG brands — food, beverage, beauty, household, baby, pet, wellness. 12 sublists focused on the post-2019 founder wave, not the mature DTC era. Ranked by Pre-Seed Density Score for Collab Fund.
Tier 1 New Pre-seed weighted
90
Vibe Coder Stack
The 2026 AI builder toolkit. 9 sublists — AI code editors, app builders, LLM providers, deploy & cloud, databases, vector/RAG, agent frameworks, observability, dev workflow. Cursor, Vercel, Anthropic, Supabase, Pinecone, LangChain, Linear, and more.
Tier 1 Re-curated
100+
AI Hardware Ecosystem
From wafer to data center. 12 sublists — training chips, inference accelerators, HBM, networking, photonics, foundries, semicap, neoclouds, systems, cooling, and AI PCs. NVIDIA, TSMC, ASML, CoreWeave, Cerebras, Groq, and more.
Tier 1 New
170+
AI Power & Infrastructure
The electrons behind the AI economy. 13 sublists — SMRs, microreactors, fusion, hyperscale DC operators, construction, liquid cooling, UPS & gensets, grid-scale storage, clean PPAs, transmission, nuclear fuel, geothermal, and thermal energy storage. Oklo, Helion, CoreWeave, Vertiv, Constellation, Fervo, Rondo, and more.
Tier 1 New
92
AI Education Ecosystem
Who's teaching AI and who's selling AI transformation. 14 sublists across consumer + K-12 + higher-ed + enterprise L&D platforms, Tier-1 + mid-tier + AI-native boutique consultancies, the PE rollup universe — and a Boise-specific cut backing the OnlyData Club local-first consultancy thesis. Accenture (Udacity), Deloitte, McKinsey QuantumBlack, OpenAI Academy, MagicSchool, Speak, Slalom, Faculty, ICCU, Scentsy, and more.
Tier 1 New
160+
The Identity Economy
From anonymous visitor to paid ad impression. 15 sublists — B2C & B2B visitor ID, consumer identity graphs, B2B contact data, CDPs, reverse-ETL, ESPs, SMS, DSPs, retail media, clean rooms, consent, attribution, predictive AI, and server-side tracking. LiveRamp, Retention.com, Tie, ZoomInfo, Klaviyo, Segment, The Trade Desk, OneTrust, and more.
Tier 1 New
200+
Agentic Ecosystem 2026
The full agentic AI stack. 13 sublists — foundation models, agent frameworks, orchestration, protocols, memory, tool providers, data providers, vertical agents, enterprise platforms, and more. LangChain, LlamaIndex, CrewAI, AutoGen, Anthropic, OpenAI agents, and the operating layer of the agent era.
Tier 1 New
100+
AdTech Ecosystem
The advertising technology stack — identity resolution, DSPs, SSPs, CTV, measurement, retail media, CDPs, clean rooms, mobile, social, contextual, audio, DOOH. LiveRamp, The Trade Desk, Magnite, AppLovin, and more.
Tier 1 New
Agent Ready 100
Top 100 Agent-Ready Companies
The definitive ranking of agent-ready companies. Scored with Algorithm E (Spread, v3) — web quality up to 30, AI signals up to 75, dev-subdomain probing. Updated nightly.
Tier 1
Regional Samples
Sample business data from select metros. Used for pipeline testing and local market exploration. Tier 2 data — enriched but not curated.
6,218
Boise / Treasure Valley
Idaho metro area businesses. Firmographics, NAICS/SIC classification, B2B/B2C, employee estimates.
Sample
8,618
Raleigh-Durham
Research Triangle businesses. Full firmographic enrichment with industry classification and business model.
Sample
6,488
Charlotte Metro
Charlotte metro businesses including Gastonia, Concord, Mooresville. OSM-sourced with NAICS/SIC classification.
Sample
How the data works
Company Data
Sourced from public records: OpenStreetMap, Google Places, state registries, community contributions. Every record goes through entity resolution — matching, enrichment, and validation.
Person Data
Always first-party. Users create their own profiles. No scraping, no inference, no data brokerage. You are the source of truth.
Entity Types
Company — name, address, phone, website, industry (NAICS/SIC), employee size, B2B/B2C, agent readiness score.
Person — first-party profiles with headline, skills, roles, location.
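As an illustration of the two entity types above, here is a minimal sketch of how the records might be modeled. The field names and types are assumptions chosen to mirror the attributes listed; they are not OnlyData's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Company:
    # Hypothetical field names mirroring the attributes listed above.
    name: str
    address: Optional[str] = None
    phone: Optional[str] = None
    website: Optional[str] = None
    naics: Optional[str] = None           # industry code (NAICS)
    sic: Optional[str] = None             # industry code (SIC)
    employee_size: Optional[str] = None   # e.g. a size band; format assumed
    business_model: Optional[str] = None  # "B2B" or "B2C"
    agent_readiness_score: Optional[int] = None


@dataclass
class Person:
    # First-party profile: every value is supplied by the person it describes.
    headline: str
    skills: list[str] = field(default_factory=list)
    roles: list[str] = field(default_factory=list)
    location: Optional[str] = None
```

Optional fields default to empty so a record can exist before enrichment fills it in, which matches the staged pipeline described on this page.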
MCP Tools
Connect any AI agent:
35 tools: search, match, upload, scan agent readiness, profiles, datasets, and more. No API key needed.
npx @onlydata/mcp-server or remote at mcp.onlydata.club/mcp
Upload Your Data
Upload CSV or JSON. Each row is entity-resolved — matched, enriched, or created.
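To make the three outcomes concrete, the sketch below resolves CSV rows against an in-memory index keyed by normalized domain. The matching key, the `website` column name, and the rule for telling "matched" from "enriched" are all assumptions for illustration; the real resolver may match on other attributes.

```python
import csv
import io


def normalize_domain(website: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    d = website.strip().lower()
    for prefix in ("https://", "http://"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    if d.startswith("www."):
        d = d[4:]
    return d.split("/")[0]


def resolve_rows(csv_text: str, index: dict) -> list:
    """Return one outcome per row: 'matched', 'enriched', or 'created'.

    Assumes every row has a non-empty 'website' column (hypothetical name).
    """
    outcomes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = normalize_domain(row.get("website") or "")
        incoming = {k: v for k, v in row.items() if v}
        existing = index.get(key)
        if existing is None:
            index[key] = incoming          # no record yet: create one
            outcomes.append("created")
            continue
        # Only fill fields the existing record is missing; never overwrite.
        new_fields = {k: v for k, v in incoming.items() if not existing.get(k)}
        if new_fields:
            existing.update(new_fields)
            outcomes.append("enriched")
        else:
            outcomes.append("matched")
    return outcomes
```

For example, uploading a row for a known domain that adds a phone number would come back as "enriched", while a row that repeats what is already stored would come back as "matched".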
How a domain becomes a full profile
1
~5 sec
Domain scan + entity creation
Railway
Immediate
Domain submitted via MCP, homepage scanner, or API. Scanner checks 7 endpoints (robots.txt, sitemap, openapi, ai-plugin, mcp.json, llms.txt + soft-404 canary). Agent Readiness score calculated. Company record created in database + public profile page goes live.
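The scan step can be sketched roughly as follows. The exact URL paths, the canary strategy (requesting a random path that should not exist), and the function names are assumptions; only the list of checked files comes from the description above. Score calculation is left out because the weighting is not specified here.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable, Dict, Tuple

# Assumed paths for the six named files; the real scanner may probe others.
ENDPOINTS = {
    "robots.txt": "/robots.txt",
    "sitemap": "/sitemap.xml",
    "openapi": "/openapi.json",
    "ai-plugin": "/.well-known/ai-plugin.json",
    "mcp.json": "/.well-known/mcp.json",
    "llms.txt": "/llms.txt",
}

Fetch = Callable[[str], Tuple[int, str]]  # url -> (status code, body)


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Plain HTTP GET; returns status 0 on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""


def scan_domain(domain: str, fetch: Fetch = http_fetch) -> Dict[str, bool]:
    """Check each endpoint, discounting hits that look like soft 404s."""
    base = f"https://{domain}"
    # Canary: a random path that should not exist. A 200 here means the
    # site answers 200 for anything, so matching bodies are not real files.
    canary_status, canary_body = fetch(f"{base}/{uuid.uuid4().hex}.txt")
    soft_404 = canary_status == 200
    found = {}
    for name, path in ENDPOINTS.items():
        status, body = fetch(base + path)
        ok = status == 200 and bool(body.strip())
        if ok and soft_404 and body == canary_body:
            ok = False
        found[name] = ok
    return found
```

Passing `fetch` as a parameter keeps the check logic testable without network access; `scan_domain("example.com")` would use the real HTTP fetcher.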
2
~30 sec
Name + metadata extraction
Railway
Immediate
Company name extracted from og:site_name, <title> tag, or og:title. Domain status set (active/inactive). Profile page shows "Enriching soon" state with AR score visible.
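A minimal sketch of the name extraction, using only the Python standard library. It assumes the three sources are tried in the order listed (og:site_name, then the title tag, then og:title); the actual fallback order and any title cleanup are not specified here.

```python
from html.parser import HTMLParser
from typing import Optional


class _NameParser(HTMLParser):
    """Collects og:site_name, og:title, and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = (a.get("property") or a.get("name") or "").lower()
            if prop in ("og:site_name", "og:title") and a.get("content"):
                # Keep the first occurrence of each tag.
                self.meta.setdefault(prop, a["content"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_company_name(html: str) -> Optional[str]:
    """Return the best available name, or None if no source is present."""
    parser = _NameParser()
    parser.feed(html)
    title = "".join(parser.title_parts).strip()
    return (
        parser.meta.get("og:site_name")
        or title
        or parser.meta.get("og:title")
        or None
    )
```

Preferring og:site_name makes sense because page titles often carry extra text such as "Acme | Home", while the site name is usually just the brand.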
3
overnight
Industry classification
Stu (Mac Studio)
Every 10 min
Local LLM (gemma4) classifies into 35 super-categories, maps to NAICS-6 + SIC-4 codes. B2B/B2C classification. $0 inference cost — runs on Mac Studio M4 Max via Ollama.
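One way the classification call could look against a local Ollama server is sketched below. The endpoint is Ollama's default local generate API; the prompt wording, the placeholder category list (the real 35 super-categories are not listed here), and the function names are assumptions. The NAICS-6 and SIC-4 mapping is omitted.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port
# Hypothetical placeholders; the real list has 35 super-categories.
CATEGORIES = ["Software", "Restaurants", "Healthcare"]


def build_request(company_name: str, description: str,
                  model: str = "gemma4") -> dict:
    """Build a non-streaming generate request that asks for JSON output."""
    prompt = (
        "Classify the business into exactly one category from this list: "
        + ", ".join(CATEGORIES)
        + '. Reply as JSON with keys "category" and "business_model" '
        + '(either "B2B" or "B2C").\n'
        + f"Business: {company_name}\nDescription: {description}"
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_classification(response_text: str) -> Optional[dict]:
    """Validate the model's reply; return None for anything off-list."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("category") not in CATEGORIES:
        return None
    if data.get("business_model") not in ("B2B", "B2C"):
        return None
    return {"category": data["category"],
            "business_model": data["business_model"]}


def classify(company_name: str, description: str) -> Optional[dict]:
    """Send one record to the local model (requires a running Ollama)."""
    body = json.dumps(build_request(company_name, description)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.loads(resp.read())
    return parse_classification(reply.get("response", ""))
```

Validating the reply against the fixed category list matters with small local models, which can return labels outside the allowed set; rejected records can simply be retried on the next pass.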
4
overnight
Description + employee estimate
Stu (Mac Studio)
Local LLM (llama3.1:8b) generates a 1-2 sentence business description and estimates employee size. 100 records per batch, runs continuously overnight.
5
overnight
MX profiling + similarity scoring
Stu (Mac Studio)
DNS-based email provider detection (Google Workspace, Microsoft 365, etc.). Attribute-based similarity scoring against all other businesses in the database. Entity resolution for deduplication.
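The two ideas in this step can be sketched as follows. The MX hostname suffixes are the ones Google Workspace and Microsoft 365 commonly publish; the attribute names and the simple "share of matching attributes" similarity are assumptions, not the scoring the pipeline actually uses. The DNS lookup itself is left out, since it needs a resolver library; the function takes MX hostnames that have already been resolved.

```python
# Suffixes commonly seen in MX records for each provider.
MX_SUFFIXES = {
    "Google Workspace": (".google.com", ".googlemail.com"),
    "Microsoft 365": (".mail.protection.outlook.com",),
}

# Hypothetical attribute names used for comparison.
SIMILARITY_KEYS = ("naics", "business_model", "employee_size",
                   "email_provider", "city")


def detect_email_provider(mx_hosts) -> str:
    """Map resolved MX hostnames to a provider label, else 'Other'."""
    for host in mx_hosts:
        h = host.strip().lower().rstrip(".")  # drop the trailing DNS dot
        for provider, suffixes in MX_SUFFIXES.items():
            if h.endswith(suffixes):
                return provider
    return "Other"


def attribute_similarity(a: dict, b: dict) -> float:
    """Fraction of attributes present in both records that are equal."""
    compared = [k for k in SIMILARITY_KEYS if a.get(k) and b.get(k)]
    if not compared:
        return 0.0
    matches = sum(1 for k in compared if a[k] == b[k])
    return matches / len(compared)
```

Two records that share an industry code and city but differ on B2B/B2C would score 2/3 under this scheme. Ignoring attributes that are missing on either side avoids penalizing records that are only partially enriched.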
6
complete
Full profile live
Complete
Company page shows AR score, industry classification, description, employee estimate, email provider, and similar businesses. Appears in Agent Ready 100 rankings, audience builder filters, and MCP search results. Owner can claim the profile to manage it.
Total cost per record
$0
AR scan: HTTP checks (Railway). LLM enrichment: local Ollama on Mac Studio M4 Max.