Methodology

How We Build Product Intelligence

Multi-Source Aggregation

upc.dev queries dozens of sources for every product. No single source is treated as authoritative on its own: we cross-reference them to build a composite profile. When sources disagree, the conflict itself becomes a signal.
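
To make the cross-referencing idea concrete, here is a minimal sketch that groups what each source reported for one field and flags a conflict when the values differ. The function name, the source keys, and the sample brand values are illustrative assumptions, not the actual upc.dev implementation.

```python
from collections import defaultdict


def cross_reference(field, reports):
    """Group source reports for one field by the value they reported.

    `reports` maps a source name to the value that source returned.
    """
    by_value = defaultdict(list)
    for source, value in reports.items():
        # Compare case- and whitespace-insensitively so trivial formatting
        # differences are not reported as conflicts.
        by_value[str(value).strip().lower()].append(source)
    return {
        "field": field,
        "values": {v: sorted(s) for v, s in by_value.items()},
        "conflict": len(by_value) > 1,
    }


# Made-up sample data: two sources agree, one disagrees.
result = cross_reference("brand", {
    "open_food_facts": "Acme Foods",
    "usda": "acme foods ",
    "upcitemdb": "Acme Inc",
})
```

Here `result["conflict"]` is true because the third source reports a different brand, and the grouping shows which sources stand behind each value.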

Data Pipeline

Ingestion: Bulk imports from Open Food Facts (4M+ products), USDA FoodData Central, and Web Data Commons. Real-time enrichment from APIs on first lookup.
Normalization: Barcode formats (UPC-A, EAN-13, GTIN-14) are normalized to a canonical form. Brand names, categories, and units are standardized across sources.
Caching: Every external API response is cached with a timestamp in our omniscience cache (DuckDB). This creates a temporal record of every data point.
Signal Generation: When a cached value changes — a price shifts, a recall is issued, ingredients are updated — we generate a signal. Signals are the atomic unit of product intelligence.
Risk Scoring: Signals from safety-critical sources (FDA, CPSC, SaferProducts.gov) feed into a composite risk score. The algorithm weights recency, severity, and source authority.
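
The normalization, signal generation, and risk scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: the canonical form is assumed to be a zero-padded 14-digit GTIN validated with the standard GS1 check digit, and the names `Signal`, `detect_signals`, `risk_score`, the per-source authority weights, the 180-day half-life, and the way weights are combined are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def gs1_check_digit(body: str) -> int:
    """GS1 mod-10 check digit for the digits that precede it."""
    total = 0
    # Walking right to left, weights alternate 3, 1, 3, ...
    for i, ch in enumerate(reversed(body)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def normalize_barcode(raw: str) -> str:
    """Return the zero-padded 14-digit form of a UPC-A, EAN-13, or GTIN-14."""
    digits = raw.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("barcode must contain only digits")
    if len(digits) not in (12, 13, 14):
        raise ValueError(f"unsupported barcode length: {len(digits)}")
    if gs1_check_digit(digits[:-1]) != int(digits[-1]):
        raise ValueError("check digit mismatch")
    return digits.zfill(14)


@dataclass
class Signal:
    gtin: str
    field: str
    old: object
    new: object
    observed_at: datetime


def detect_signals(gtin, cached, fresh, now):
    """Emit one Signal per field whose value differs between two snapshots."""
    signals = []
    for field in sorted(cached.keys() | fresh.keys()):
        if cached.get(field) != fresh.get(field):
            signals.append(
                Signal(gtin, field, cached.get(field), fresh.get(field), now)
            )
    return signals


# Assumed weights: the real authority values and decay rate are not published.
SOURCE_AUTHORITY = {"fda": 1.0, "cpsc": 1.0, "saferproducts": 0.7}
HALF_LIFE_DAYS = 180


def risk_score(events, now):
    """Combine (source, severity in 0..1, observed_at) events into a 0..1 score."""
    remaining = 1.0
    for source, severity, observed_at in events:
        age_days = max((now - observed_at).total_seconds() / 86400, 0.0)
        recency = 0.5 ** (age_days / HALF_LIFE_DAYS)  # halves every half-life
        remaining *= 1.0 - SOURCE_AUTHORITY[source] * severity * recency
    return 1.0 - remaining


# Made-up sample values for illustration.
gtin = normalize_barcode("036000291452")  # UPC-A, padded to 14 digits
now = datetime(2026, 4, 1, tzinfo=timezone.utc)
signals = detect_signals(gtin, {"price": 3.99}, {"price": 4.49, "recall": True}, now)
score = risk_score([("fda", 0.5, now - timedelta(days=180))], now)
```

Padding to 14 digits means the same product resolves to one key whether it was scanned as a UPC-A or as the equivalent EAN-13 with a leading zero. The multiplicative combination in `risk_score` is one possible design choice: each additional event can only raise the score, and the result stays between 0 and 1.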

Source Quality Tiers

Tier 1 (Government): FDA, CPSC, USDA, EPA — highest authority, used for safety and regulatory signals
Tier 2 (Standards Bodies): GS1, HTS — authoritative for barcode registration and classification
Tier 3 (Open Data): Open Food Facts, Wikidata — community-verified, high coverage
Tier 4 (Commercial): UPCitemdb, marketplace pricing — useful but verify against higher tiers
Tier 5 (Community): Reddit, HN — consumer sentiment signals, not factual data
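
One way the tiers could drive value selection is sketched below: the value from the best available tier wins, Tier 5 sources are never used for facts, and a value that comes only from Tier 4 is flagged for verification. The source keys, the `resolve` function, and the tie-breaking rule are assumptions made for illustration.

```python
# Tier numbers follow the list above; the source keys are illustrative.
SOURCE_TIER = {
    "fda": 1, "cpsc": 1, "usda": 1, "epa": 1,
    "gs1": 2, "hts": 2,
    "open_food_facts": 3, "wikidata": 3,
    "upcitemdb": 4, "marketplace": 4,
    "reddit": 5, "hn": 5,
}
COMMERCIAL_TIER = 4
SENTIMENT_TIER = 5  # sentiment signals only, never a source of facts


def resolve(reports):
    """Pick a field value from the highest-authority (lowest-numbered) tier.

    `reports` maps a source name to the value it reported. Returns
    (value, source, needs_verification).
    """
    factual = {s: v for s, v in reports.items() if SOURCE_TIER[s] < SENTIMENT_TIER}
    if not factual:
        return None, None, False
    # Ties within a tier are broken alphabetically to keep the result stable.
    best = min(factual, key=lambda s: (SOURCE_TIER[s], s))
    needs_verification = SOURCE_TIER[best] >= COMMERCIAL_TIER
    return factual[best], best, needs_verification


# Made-up sample data.
chosen = resolve({
    "upcitemdb": "Acme Inc",
    "open_food_facts": "Acme Foods",
    "reddit": "some acme thing",
})
```

In this example the Tier 3 value is chosen over the Tier 4 one, and the Tier 5 report is ignored entirely.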

Freshness

Product pages serve cached data instantly. If a cached entry is older than 24 hours, a background re-enrichment runs automatically. Safety-critical data (recalls, enforcement actions) is refreshed more frequently through scheduled checks.
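
The serve-then-refresh behavior can be sketched as below. Only the 24-hour threshold comes from the description above; the cache layout, the `serve` function, and the `enqueue_refresh` callback are hypothetical stand-ins for whatever the real system uses.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # staleness threshold described above


def serve(gtin, cache, enqueue_refresh, now=None):
    """Return cached data immediately; schedule a refresh if it is stale.

    `cache` maps a GTIN to a (data, fetched_at) pair. `enqueue_refresh` is
    called with the GTIN and is expected to return without blocking.
    """
    now = now or datetime.now(timezone.utc)
    entry = cache.get(gtin)
    if entry is None:
        return None  # first-lookup enrichment is out of scope for this sketch
    data, fetched_at = entry
    if now - fetched_at > MAX_AGE:
        enqueue_refresh(gtin)  # the caller still gets the cached copy right away
    return data


# Made-up sample entries: one stale, one fresh.
now = datetime(2026, 4, 1, tzinfo=timezone.utc)
cache = {
    "00036000291452": ({"name": "stale item"}, now - timedelta(hours=25)),
    "04006381333931": ({"name": "fresh item"}, now - timedelta(hours=1)),
}
refresh_queue = []
stale = serve("00036000291452", cache, refresh_queue.append, now)
fresh = serve("04006381333931", cache, refresh_queue.append, now)
```

After these two calls, both lookups have returned their cached data, and only the stale GTIN has been queued for re-enrichment.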

Limitations

Coverage varies by category. Food products have the deepest data (Open Food Facts). Electronics and general merchandise have fewer sources.
Pricing data depends on marketplace API availability and may not reflect real-time prices.
Risk scores are algorithmic assessments, not safety certifications. For safety-critical decisions, always verify against official sources.

Last updated: April 2026