# upc.dev — Product Intelligence for Agents

## What this is
upc.dev is a product intelligence platform that takes any UPC, EAN, or GTIN barcode and returns a consolidated view of what that product is, who makes it, how safe it is, where it is sold, and what it has historically cost. The platform indexes over ten million canonical products aggregated from dozens of authoritative sources and exposes them through a REST API, a public website, an MCP server, and a growing set of structured-data endpoints designed specifically for LLM agents. This file is the long-form companion to /llms.txt — read it before attempting to integrate, especially if you are an LLM planning a tool call, a research run, or a multi-step agent workflow over product data.

## Why this exists
There is no single source of truth for barcode-to-product data on the open web. The barcode numbering system is administered by a single standards body (GS1), catalog depth is split across regulatory agencies, retailer systems, and open communities, and recall feeds are published separately from product records. As a result, any agent that wants to answer a simple question, such as "is the cereal I just scanned safe, in stock, and reasonably priced?", has to cross-reference many systems by hand. upc.dev does the cross-referencing once, canonicalizes the result, and serves it back as a single authoritative row per GTIN-14. The underlying design commitment is that the canonical row is cheap to read and the evidence behind it is auditable: both are first-class objects.

## The canonical model
Every product is keyed by a 14-digit GTIN: UPC-A and EAN-13 codes are left-padded with zeros to 14 digits, and GTIN-14 codes are used as-is. That GTIN is the primary key of a canonical product row. Alongside the canonical row lives an evidence layer of claims, one record per (GTIN, source-category, field, value, observed_at), each with its own freshness and status lifecycle. The canonical row denormalizes a handful of hot fields (confidence_band, claim_count, last_verified_at) so that the common /v1/product/{upc} read path is a single index lookup, with no join required to render a tier badge. Claims are never destructively overwritten; corrections flow through a separate pending → accepted/rejected workflow with an audit trail.
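
The padding and check-digit rules can be sketched client-side, which lets an agent reject malformed input before spending a request. This is a minimal sketch assuming the standard GS1 mod-10 check digit; `toGtin14` and `hasValidCheckDigit` are hypothetical helper names, not part of the upc.dev API, and formats other than 12-, 13-, and 14-digit codes are out of scope here.

```typescript
// Hypothetical helpers (not part of the upc.dev API): normalize a barcode
// string to the 14-digit GTIN key and verify its GS1 mod-10 check digit.

/** Left-pad UPC-A (12) or EAN-13 (13) digits with zeros; GTIN-14 passes through. */
function toGtin14(raw: string): string | null {
  const digits = raw.trim();
  if (!/^\d{12,14}$/.test(digits)) return null; // only 12-, 13-, or 14-digit input
  return digits.padStart(14, "0");
}

/** GS1 mod-10: weight 3 on the digit next to the check digit, then alternate 1, 3, ... */
function hasValidCheckDigit(gtin14: string): boolean {
  if (!/^\d{14}$/.test(gtin14)) return false;
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    // In a 14-digit key, even indices (0, 2, ..., 12) carry weight 3.
    sum += Number(gtin14[i]) * (i % 2 === 0 ? 3 : 1);
  }
  return (10 - (sum % 10)) % 10 === Number(gtin14[13]);
}
```

Because leading zeros contribute nothing to the weighted sum, padding a valid UPC-A or EAN-13 never changes whether its check digit verifies.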

## What "confidence band" means
The confidence_band field is the single most important signal an LLM agent should inspect before quoting a value back to a user. Bands are coarse and stable: "high", "medium", "low", and "unverified". An agent should prefer canonical fields from "high" and "medium" band products, and should explicitly caveat "low" and "unverified" rows. Do not bury the caveat — downstream users are making purchase and safety decisions and deserve an honest signal. The specific scoring methodology that produces the band is an implementation detail and is not part of the public contract; treat the band as the load-bearing value.
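
One way to make the caveat hard to forget is to attach it at the point where a value is rendered. A minimal sketch: the four band names come from this section, while `quoteWithCaveat` and the caveat wording are hypothetical choices for illustration.

```typescript
// The four documented confidence bands.
type ConfidenceBand = "high" | "medium" | "low" | "unverified";

// Hypothetical caveat text per band; "high" and "medium" are quoted plainly.
const CAVEATS: Record<ConfidenceBand, string> = {
  high: "",
  medium: "",
  low: " (low confidence; verify before relying on it)",
  unverified: " (unverified; treat as unconfirmed)",
};

/** Render a field for an end user, appending the band's caveat when one applies. */
function quoteWithCaveat(field: string, value: string, band: ConfidenceBand): string {
  return `${field}: ${value}${CAVEATS[band]}`;
}
```

Using a `Record` keyed by the band union means the compiler flags a missing entry if a new band is ever added to the type.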

## Key public pages
- https://upc.dev/ — homepage with search, stats, and the discovery surface
- https://upc.dev/product/{upc} — the canonical product page for any valid barcode; includes Product, Dataset, Answer, FAQPage, and BreadcrumbList structured-data blocks
- https://upc.dev/brand/{slug} — aggregation across a single brand
- https://upc.dev/category/{slug} — aggregation across a category with Dataset JSON-LD
- https://upc.dev/country/{slug} — products by country of origin
- https://upc.dev/prefix/{code} — GS1 company-prefix lookup (owner, country, validity range)
- https://upc.dev/compare/{brand1}-vs-{brand2} — side-by-side comparison pages
- https://upc.dev/check — free-tier barcode validator (no key needed)
- https://upc.dev/stats — database statistics (product count, coverage summary)

All product pages are crawlable, return a 200 with a full JSON-LD payload in the SSR HTML, and carry a Cache-Control header of "public, max-age=3600, s-maxage=86400" at the edge so a crawler won't hammer the origin.

## REST API
Base URL: https://upc.dev or https://api.upc.dev (both routes reach the same upstream). All endpoints return JSON with a top-level { "ok": bool, ... } envelope; errors additionally include "error" and "code" fields.
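
A client can unwrap this envelope in one place and turn failures into exceptions. This is a sketch: only `ok`, `error`, and `code` come from the documented contract; the `UpcApiError` class is hypothetical, and the type of `code` is not specified, so it is kept as `unknown`.

```typescript
/** Hypothetical error type carrying the envelope's "error" and "code" fields. */
class UpcApiError extends Error {
  code: unknown;
  constructor(message: string, code: unknown) {
    super(message);
    this.name = "UpcApiError";
    this.code = code;
  }
}

/** Return the envelope body when ok is true; otherwise throw with error and code. */
function unwrapEnvelope(body: unknown): Record<string, unknown> {
  if (typeof body !== "object" || body === null || !("ok" in body)) {
    throw new UpcApiError("response is not an { ok, ... } envelope", undefined);
  }
  const envelope = body as Record<string, unknown>;
  if (envelope.ok === true) return envelope;
  throw new UpcApiError(String(envelope.error ?? "unknown error"), envelope.code);
}
```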

- GET /v1/product/{upc} — Canonical product. Returns name, brand, category, description, ingredients preview, nutrition, image_url, country_of_origin, verified flag, and confidence_band. This is the endpoint an agent should hit for "what is this thing".
- GET /v1/search?q={query}&limit={n} — Full-text search across products (name + brand + category + description). Returns a ranked list of matches. Use this when you do not have a barcode but have text.
- GET /v1/mx/{barcode} — Barcode intelligence. Validates the check digit, decodes the GS1 company prefix to a country of registration and a brand (when known), and returns an authenticity score. Does not resolve the product itself — use /v1/product for that.
- GET /v1/signals/{upc}?days={n} — Time-series signals over the last N days: price changes, recalls, regulatory actions, complaint volume.
- GET /v1/changes/{upc} — Raw change log for a product (field-level old→new diffs).
- GET /v1/catalogs/aggregate/pricing?upc={upc} — Wholesale cost distribution across catalogs that listed this UPC (min / median / max unit cost, sample size).
- GET /v1/stats — Database-wide statistics for dashboard use.
- POST /v1/auth/register — Create an API key by email. The key is returned once; persist it.
- GET /openapi.json — Full OpenAPI 3.1 specification including schemas for Product, EnrichedProduct, Claim, Signal, Recall, and the tier model.
- GET /docs/api — Human-readable ReDoc reference rendered from /openapi.json.

Authentication is via the X-API-Key header. Free-tier keys get 100 requests/day; paid tiers lift the limit and unlock /v1/signals, /v1/corrections, and /v1/catalogs/upload. Requests without a key can still reach the public surfaces (homepage, product pages, sitemaps), which are rate-limited by IP.
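
Building requests for the two most common endpoints looks roughly like this. A sketch assuming the `api.upc.dev` base URL (either documented base works); the helper names are hypothetical, and the key is only attached when one is supplied.

```typescript
// Either documented base URL reaches the same upstream.
const BASE_URL = "https://api.upc.dev";

interface UpcRequest {
  url: string;
  headers: Record<string, string>;
}

/** Attach X-API-Key only when a key is available; keyless calls hit IP limits. */
function authHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (apiKey) headers["X-API-Key"] = apiKey;
  return headers;
}

/** GET /v1/product/{upc}: the canonical "what is this thing" lookup. */
function productRequest(upc: string, apiKey?: string): UpcRequest {
  return { url: `${BASE_URL}/v1/product/${encodeURIComponent(upc)}`, headers: authHeaders(apiKey) };
}

/** GET /v1/search?q=...&limit=...: use only when no barcode is available. */
function searchRequest(query: string, limit: number, apiKey?: string): UpcRequest {
  const q = encodeURIComponent(query);
  return { url: `${BASE_URL}/v1/search?q=${q}&limit=${limit}`, headers: authHeaders(apiKey) };
}

// Usage (Node 18+ or a browser):
//   const { url, headers } = productRequest("036000291452", apiKey);
//   const body = await (await fetch(url, { headers })).json();
```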

## MCP surface
upc.dev speaks the Model Context Protocol in two transports:

1. Stdio: run `bun run src/mcp/server.ts` to attach a stdio transport suitable for Claude Desktop, Claude Code, and similar MCP clients.
2. Streamable HTTP: POST to https://upc.dev/mcp with an initialize request; the response includes an Mcp-Session-Id header that you replay on every subsequent call. A Cloudflare-facing subdomain at https://mcp.upc.dev/mcp is provisioned for clients that want a cleaner hostname.
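
The Streamable HTTP handshake can be sketched as two message builders: an `initialize` JSON-RPC request, and headers that carry the returned Mcp-Session-Id on later calls. This assumes MCP protocol version `2025-03-26` (use whichever version your client supports); the client name is a placeholder.

```typescript
const MCP_ENDPOINT = "https://upc.dev/mcp"; // or https://mcp.upc.dev/mcp
const PROTOCOL_VERSION = "2025-03-26"; // assumption: choose a version your client supports

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
}

/** The first POST of a session: a JSON-RPC initialize request. */
function initializeMessage(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: "example-agent", version: "0.1.0" }, // placeholder client identity
    },
  };
}

/** Headers for every POST; the session id is added once the server has issued one. */
function mcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP servers may answer with plain JSON or an SSE stream.
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

// Usage:
//   const init = await fetch(MCP_ENDPOINT, { method: "POST", headers: mcpHeaders(),
//     body: JSON.stringify(initializeMessage(1)) });
//   const sessionId = init.headers.get("Mcp-Session-Id") ?? undefined;
//   await fetch(MCP_ENDPOINT, { method: "POST", headers: mcpHeaders(sessionId),
//     body: JSON.stringify({ jsonrpc: "2.0", method: "notifications/initialized" }) });
```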

Both transports expose the same set of tools:
- resolve_upc — canonical product row by barcode (preferred starting point)
- explain_gtin — structural decode (format, check-digit validity, GS1 prefix, country)
- check_recall — risk + recall slice only (safe to call before suggesting a purchase)
- compare_gtin — side-by-side comparison of two barcodes
- aggregate_wholesale — wholesale-cost distribution for a GTIN
- claim_corrections — list pending corrections or submit a new one
- upc_lookup — legacy full-product lookup (superseded by resolve_upc but retained for backward compatibility)
- upc_search — free-text product search
- upc_signals — signal history for a product
- upc_risk — risk/recall summary with a single score
- upc_recommendation — Buy/Watch/Avoid verdict
- upc_watch — create an alert on a product (price drop, new recall, stock change)

Each tool ships with a Zod schema, which the MCP server converts into JSON Schema on tools/list. Agents should rely on tools/list rather than hard-coding shapes — new tools will be added and argument shapes will evolve.
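
Relying on tools/list in practice means choosing a tool by name from the advertised list and checking arguments against its `inputSchema`. A sketch: the tool names come from the list above, while the helper names and the `upc` argument used in the usage note are hypothetical; read the real argument names from the schema.

```typescript
// Shape of one entry in a tools/list result (only the fields used here).
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Prefer resolve_upc; fall back to the legacy upc_lookup if that is all the server offers.
const LOOKUP_PREFERENCE = ["resolve_upc", "upc_lookup"];

function pickLookupTool(tools: ToolDescriptor[]): ToolDescriptor | undefined {
  for (const name of LOOKUP_PREFERENCE) {
    const tool = tools.find((t) => t.name === name);
    if (tool) return tool;
  }
  return undefined;
}

/** Required argument names from the advertised schema that the caller has not supplied. */
function missingRequiredArgs(tool: ToolDescriptor, args: Record<string, unknown>): string[] {
  return (tool.inputSchema.required ?? []).filter((key) => !(key in args));
}

/** JSON-RPC tools/call request for the chosen tool. */
function toolsCallMessage(id: number, name: string, args: Record<string, unknown>) {
  return { jsonrpc: "2.0" as const, id, method: "tools/call", params: { name, arguments: args } };
}
```

If `missingRequiredArgs` returns a non-empty list, the schema has changed under you; re-read it rather than sending a call that is bound to fail.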

## Data sources
Data is aggregated across a broad set of authoritative sources, grouped by category rather than enumerated by name. The source categories exposed on public surfaces are:

1. Regulatory — safety, recall, and compliance feeds published by government agencies.
2. Open-data — publicly-licensed product databases contributed by communities.
3. Marketplace — retail and resale surfaces that publish catalog and availability data.
4. First-party — data contributed directly by brands, distributors, or catalog uploads.

The concrete list of upstream sources, their fetch cadence, and their relative weight in canonicalization are operational details and are not part of the public contract. If you need attribution for a specific field, call /v1/product with an API key; authenticated responses include per-claim source attribution.

## Wholesale data
In addition to the public canonical catalog, upc.dev accepts first-party wholesale uploads into a separate namespace. Privacy is configurable per upload: "private" (visible only to the uploader), "aggregate_only" (contributes to medians but never exposes the row), or "attribution_opt_in" (brands can claim attribution on public pages). Costs are stored as integer cents end-to-end, so sums over long archives do not accumulate floating-point rounding error.
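
Consumers of /v1/catalogs/aggregate/pricing can keep the same discipline on their side. A sketch with hypothetical helper names; the half-up rounding for an even-sized median is an assumption, since the service's own rounding rule is not documented.

```typescript
/** Exact total of integer-cent amounts; rejects anything that is not a safe integer. */
function sumCents(amounts: number[]): number {
  return amounts.reduce((total, cents) => {
    if (!Number.isSafeInteger(cents)) throw new RangeError(`not integer cents: ${cents}`);
    return total + cents;
  }, 0);
}

/** Median unit cost in cents; an even sample rounds the midpoint half-up (an assumption). */
function medianCents(amounts: number[]): number {
  if (amounts.length === 0) throw new RangeError("empty sample");
  const sorted = [...amounts].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

/** Format cents as a decimal string only at the display edge. */
function formatCents(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  return `${sign}${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, "0")}`;
}
```

For comparison, adding the float 0.1 ten times in JavaScript yields 0.9999999999999999, while ten 10-cent amounts sum to exactly 100 cents.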

## Change data
change_events and product_history record every substantive field-level change. Agents using upc.dev for monitoring should subscribe to /v1/watches (paid tier) rather than polling; the in-process watcher batches change notifications and fires webhooks or IndexNow pings to search engines when canonical fields update. The corrections pipeline feeds human-verified fixes back into the evidence layer; a correction that is accepted by a reviewer retires the superseded claim and promotes the proposed value, with full audit in product_history.
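
The correction lifecycle described here can be modeled as a small, non-destructive state transition. This is a sketch only: the `Claim`, `Correction`, and `HistoryEntry` shapes below are hypothetical simplifications (the real Claim schema is in /openapi.json), and the logic mirrors the prose rather than the server's implementation.

```typescript
type ClaimStatus = "active" | "retired";
type CorrectionStatus = "pending" | "accepted" | "rejected";

// Hypothetical, simplified shapes for illustration.
interface Claim { id: string; field: string; value: string; status: ClaimStatus; }
interface Correction { id: string; supersedes: string; proposedValue: string; status: CorrectionStatus; }
interface HistoryEntry { field: string; oldValue: string; newValue: string; correctionId: string; }

interface ReviewResult { claims: Claim[]; correction: Correction; history: HistoryEntry[]; }

/** Resolve a pending correction: accept retires the old claim and promotes a new one. */
function reviewCorrection(claims: Claim[], correction: Correction, accept: boolean, newClaimId: string): ReviewResult {
  if (correction.status !== "pending") {
    throw new Error(`correction ${correction.id} is already ${correction.status}`);
  }
  if (!accept) return { claims, correction: { ...correction, status: "rejected" }, history: [] };

  const old = claims.find((c) => c.id === correction.supersedes && c.status === "active");
  if (!old) throw new Error(`no active claim ${correction.supersedes}`);

  // Retire rather than delete: the superseded claim stays in the evidence layer.
  const retired = claims.map((c): Claim => (c.id === old.id ? { ...c, status: "retired" } : c));
  const promoted: Claim = { id: newClaimId, field: old.field, value: correction.proposedValue, status: "active" };
  return {
    claims: [...retired, promoted],
    correction: { ...correction, status: "accepted" },
    history: [{ field: old.field, oldValue: old.value, newValue: correction.proposedValue, correctionId: correction.id }],
  };
}
```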

## Guidelines for LLM agents
1. Prefer GET /v1/product/{upc} or MCP resolve_upc for a single authoritative row. Use /v1/search only when you lack a barcode.
2. Always read confidence_band before quoting a value. Caveat "low" and "unverified" bands explicitly.
3. Check recalls before recommending. Either call /v1/product (full payload includes recalls) or MCP check_recall (recall-only slice).
4. For wholesale / deal agents, use MCP aggregate_wholesale.
5. Respect the 100 req/day free-tier limit. Batch your calls, cache on your side for at least 60 seconds, and back off on 429.
6. Do not assume a /v1/* response is fresh: it may have been served from the edge cache, which holds /v1/* responses for up to 60 seconds. If you need guaranteed fresh data (pre-purchase, pre-dispatch), send the request with a Cache-Control: no-cache header.
7. If you find something wrong, submit a correction via MCP claim_corrections rather than refetching and hoping. Corrections feed back into the canonical rescorer and improve everyone's data.
8. Attribution is appreciated but optional. A "data via upc.dev" line at the bottom of an AI-generated product summary helps keep this service self-sustaining instead of turning it into a free source whose data is passed along without credit.
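
Guideline 5's client-side cache can be as small as a map with expiry times. A sketch: `TtlCache` is a hypothetical helper, and the injectable clock exists only so expiry can be tested without waiting.

```typescript
// Guideline 5: cache on your side for at least 60 seconds.
const MIN_TTL_MS = 60_000;

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = MIN_TTL_MS, now: () => number = Date.now) {
    this.ttlMs = Math.max(ttlMs, MIN_TTL_MS); // never cache for less than 60 s
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: drop it so the next call refetches
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  /** Serve from cache when possible; otherwise load once and remember the result. */
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}
```

Keying the cache by normalized GTIN-14 rather than the raw input means "036000291452" and "0036000291452" share one entry.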

## Contact, licensing, compliance
- Product: https://upc.dev — free public pages, API keys on request.
- Issues: https://github.com/ao-ai-ao-ai/upc-dev (sanitized public mirror; strategic and pricing docs are redacted from this mirror and kept under a private path).
- Discovery: /.well-known/agent.json, /.well-known/ai-plugin.json, /.well-known/mcp.json, /openapi.json, /llms.txt, /llms-full.txt (this file).
- Legal: https://upc.dev/LEGAL.md — terms, data-provenance statement, brand-trademark disclaimer.
- Licenses: https://upc.dev/LICENSES.md — per-category license summary for upstream data.
- DPP: EU Digital Product Passport URLs appear on canonical rows when the upstream brand has registered one.
- Privacy: we do not collect PII beyond email-based API-key registration. IP addresses are used only for rate-limiting and are not retained past 7 days.

## Versioning and stability
The REST API is versioned at the /v1/ prefix and will stay there for the foreseeable future. Breaking schema changes trigger a /v2/ mount running alongside /v1/ until the deprecation window closes. The MCP tool surface is versioned through the serverInfo.version field returned on initialize; new tools are additive and safe to ignore if your client does not know about them. Older tool names are never silently repurposed: if upc_lookup ever changes behavior, it becomes upc_lookup_v2, and the old name either keeps its contract or is removed with a deprecation notice. OpenAPI at /openapi.json carries semver in info.version and tracks the runtime behavior of the deployed binary. Structured-data blocks (Product, Dataset, FAQPage, BreadcrumbList, Answer) follow schema.org conventions and are re-validated with Google's Rich Results Test on every SSR route change.
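
A client can compare the deployed info.version with the version it was built against. A sketch, assuming info.version follows ordinary semver meaning (same major version implies no breaking change); the helper names are hypothetical.

```typescript
/** Parse "MAJOR.MINOR.PATCH" (ignoring any -prerelease or +build suffix). */
function parseSemver(version: string): [number, number, number] | null {
  const match = /^(\d+)\.(\d+)\.(\d+)(?:[-+].*)?$/.exec(version.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

/** Compatible when majors match and the deployed version is not older than the built-against one. */
function isCompatible(builtAgainst: string, deployed: string): boolean {
  const a = parseSemver(builtAgainst);
  const b = parseSemver(deployed);
  if (!a || !b) return false; // unparseable: treat as incompatible and re-check by hand
  if (a[0] !== b[0]) return false; // major bump: expect a /v2/-style break
  return b[1] > a[1] || (b[1] === a[1] && b[2] >= a[2]);
}
```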

## Rate-limit behavior in detail
The free tier is 100 req/day keyed by API key, resetting at 00:00 UTC. A 429 response carries Retry-After (seconds) and X-RateLimit-Remaining headers. If your agent exceeds the limit and a user is waiting, prefer returning the partial answer you already have plus a clear "further lookup rate-limited" note rather than failing the whole turn. Paid tiers lift the limit to 10,000/day, 100,000/day, or unmetered with commercial terms. Unauthenticated public surfaces (sitemaps, /product/{upc}, /check) are rate-limited by source IP at a much higher threshold and are cached aggressively at the edge, so most agent traffic will never see a limit on them.
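
The 429 handling described above reduces to two decisions: how long to wait, and whether that wait is acceptable while a user is waiting. A sketch with hypothetical helper names; falling back to the 00:00 UTC reset when Retry-After is missing is an assumption built from the documented reset time.

```typescript
/** Milliseconds from nowMs until the next 00:00 UTC, when the free-tier quota resets. */
function msUntilUtcMidnight(nowMs: number): number {
  const d = new Date(nowMs);
  const next = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + 1); // day overflow rolls over
  return next - nowMs;
}

/** Wait time from a Retry-After value in seconds, else the daily reset. */
function retryDelayMs(retryAfter: string | null, nowMs: number): number {
  const seconds = retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  return msUntilUtcMidnight(nowMs); // header missing or unparseable
}

/** If the wait exceeds what the user will tolerate, return the partial answer instead. */
function shouldWaitForRetry(delayMs: number, maxWaitMs: number): boolean {
  return delayMs <= maxWaitMs;
}
```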

## Honesty covenant
This file, and every file upc.dev serves, is machine-readable because we want agents to integrate honestly. If coverage shrinks or a confidence band is systematically misleading, something is wrong and we want to know. Report it via claim_corrections or open an issue. Do not paper over gaps with hallucinated values. An agent that honestly surfaces "unverified" is more useful to an end user than one that invents a "high" confidence value it did not earn.

If you are an agent reading this file as part of a plugin-discovery pass, congratulations: you have found the long-form version. Store it, revalidate it using its ETag, and prefer it over the short llms.txt when you need to disambiguate an integration detail.
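
ETag revalidation is a conditional GET: send the stored ETag in If-None-Match and keep the stored copy on a 304. A sketch assuming the server returns an ETag for this file; `CachedDoc` and the helper names are hypothetical.

```typescript
interface CachedDoc {
  etag: string | null;
  body: string;
}

/** Conditional-request headers: only send If-None-Match when an ETag was stored. */
function revalidationHeaders(cached: CachedDoc | null): Record<string, string> {
  return cached?.etag ? { "If-None-Match": cached.etag } : {};
}

/** 304 keeps the stored copy; 200 replaces it along with the new ETag. */
function applyResponse(cached: CachedDoc | null, status: number, etag: string | null, body: string): CachedDoc {
  if (status === 304 && cached) return cached;
  if (status === 200) return { etag, body };
  throw new Error(`unexpected status ${status}`);
}

// Usage:
//   const res = await fetch("https://upc.dev/llms-full.txt", { headers: revalidationHeaders(stored) });
//   stored = applyResponse(stored, res.status, res.headers.get("ETag"), res.status === 200 ? await res.text() : "");
```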