Methodology
How We Build Product Intelligence
Multi-Source Aggregation
upc.dev queries 73 independent data sources for every product. No single source is authoritative — we cross-reference to build a composite profile. When sources disagree, the conflict itself becomes a signal.
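As a rough illustration of cross-referencing, the sketch below groups the values each source reports for a field and flags the fields where sources disagree. The source names, field names, and the `find_conflicts` helper are hypothetical and chosen for the example; they are not upc.dev's actual schema or code.

```python
from collections import defaultdict


def find_conflicts(reports):
    """Return the fields on which sources disagree.

    `reports` maps a source name to a dict of field -> value
    (a hypothetical shape used only for this sketch).
    """
    values_by_field = defaultdict(lambda: defaultdict(list))
    for source, fields in reports.items():
        for field, value in fields.items():
            # Light normalization so "Acme" and "acme " are not treated as a conflict.
            key = str(value).strip().lower()
            values_by_field[field][key].append(source)
    # A field is in conflict when more than one distinct value was reported.
    return {
        field: {value: sorted(sources) for value, sources in variants.items()}
        for field, variants in values_by_field.items()
        if len(variants) > 1
    }


reports = {
    "open_food_facts": {"brand": "Acme", "net_weight_g": 500},
    "upcitemdb": {"brand": "acme ", "net_weight_g": 454},
}
conflicts = find_conflicts(reports)
```

Here only `net_weight_g` is reported as a conflict, because the two brand spellings normalize to the same value. A real pipeline would likely need per-field comparison rules (units, rounding, synonyms) rather than simple string matching.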
Data Pipeline
- Ingestion: Bulk imports from Open Food Facts (4M+ products), USDA FoodData Central, and Web Data Commons. Real-time enrichment from APIs on first lookup.
- Normalization: Barcode formats (UPC-A, EAN-13, GTIN-14) are normalized to a canonical form. Brand names, categories, and units are standardized across sources.
- Caching: Every external API response is cached with a timestamp in our omniscience cache (DuckDB). This creates a temporal record of every data point.
- Signal Generation: When a cached value changes — a price shifts, a recall is issued, ingredients are updated — we generate a signal. Signals are the atomic unit of product intelligence.
- Risk Scoring: Signals from safety-critical sources (FDA, CPSC, SaferProducts.gov) feed into a composite risk score. The algorithm weights recency, severity, and source authority.
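The normalization and signal generation steps above can be sketched as follows. This is a minimal example under stated assumptions: it takes zero-padded GTIN-14 as the canonical barcode form (the text does not say which form is used), validates the standard GS1 mod-10 check digit, and uses plain dictionaries in place of the DuckDB cache. The `Signal` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def gs1_check_digit(body: str) -> int:
    """Compute the GS1 mod-10 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for i, ch in enumerate(reversed(body)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def to_gtin14(code: str) -> str:
    """Normalize a UPC-A, EAN-13, or GTIN-14 code to 14 digits (assumed canonical form)."""
    digits = code.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (12, 13, 14):
        raise ValueError(f"not a UPC-A, EAN-13, or GTIN-14 code: {code!r}")
    if gs1_check_digit(digits[:-1]) != int(digits[-1]):
        raise ValueError(f"check digit mismatch: {code!r}")
    # Left-padding with zeros leaves the check digit unchanged.
    return digits.zfill(14)


@dataclass(frozen=True)
class Signal:
    gtin: str
    field: str
    old: object
    new: object
    observed_at: datetime


def diff_snapshots(gtin, previous, current, observed_at):
    """Emit one Signal per field whose cached value changed between two snapshots."""
    signals = []
    for field in sorted(previous.keys() | current.keys()):
        old, new = previous.get(field), current.get(field)
        if old != new:
            signals.append(Signal(gtin, field, old, new, observed_at))
    return signals


gtin = to_gtin14("036000291452")  # a UPC-A code
signals = diff_snapshots(
    gtin,
    {"price_usd": 3.49, "recall": None},
    {"price_usd": 3.99, "recall": None},
    datetime(2026, 4, 1, tzinfo=timezone.utc),
)
```

Because every cached response carries a timestamp, comparing the latest snapshot with the previous one is enough to detect a change; in this example the only signal produced is for `price_usd`.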
Source Quality Tiers
- Tier 1 (Government): FDA, CPSC, USDA, EPA. Highest authority; used for safety and regulatory signals.
- Tier 2 (Standards Bodies): GS1, HTS (Harmonized Tariff Schedule). Authoritative for barcode registration and classification.
- Tier 3 (Open Data): Open Food Facts, Wikidata. Community-verified, with high coverage.
- Tier 4 (Commercial): UPCitemdb, marketplace pricing. Useful, but should be verified against higher-authority tiers.
- Tier 5 (Community): Reddit, Hacker News (HN). Consumer sentiment signals, not factual data.
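To make the idea of weighting recency, severity, and source authority concrete, here is one possible way to combine them into a score between 0 and 1. Everything numeric in it is an assumption made for illustration: the per-tier weights, the 90-day half-life, the 0-to-1 severity scale, and the combination rule are not the actual algorithm.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical authority weight per source tier (Tier 1 = most authoritative).
TIER_WEIGHT = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
# Hypothetical half-life: a signal counts half as much after this many days.
HALF_LIFE_DAYS = 90.0


def signal_weight(tier: int, severity: float, age_days: float) -> float:
    """Weight of a single signal; severity is assumed to lie in [0, 1]."""
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return TIER_WEIGHT[tier] * severity * recency


def risk_score(signals, now: datetime) -> float:
    """Combine signal weights as 1 - prod(1 - w), which stays within [0, 1]."""
    remaining = 1.0
    for s in signals:
        age_days = max((now - s["observed_at"]).total_seconds() / 86400, 0.0)
        remaining *= 1.0 - signal_weight(s["tier"], s["severity"], age_days)
    return 1.0 - remaining


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
example_signals = [
    {"tier": 1, "severity": 0.5, "observed_at": now - timedelta(days=90)},
    {"tier": 1, "severity": 0.2, "observed_at": now},
]
score = risk_score(example_signals, now)
```

With these assumed parameters the two signals carry weights of 0.25 and 0.2, giving a combined score of about 0.4. The multiplicative form means additional signals can only raise the score, and a single fresh, maximum-severity Tier 1 signal drives it to 1.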
Freshness
Product pages serve cached data instantly. If the cached data is more than 24 hours old, a background re-enrichment runs automatically. Safety-critical data (recalls, enforcement actions) is refreshed more frequently through scheduled checks.
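The serve-then-refresh behavior resembles the common stale-while-revalidate pattern. A minimal sketch follows, assuming a cache entry is a dictionary with `gtin`, `data`, and `cached_at` keys and that `schedule_refresh` is a callback that queues background work; these names are hypothetical, and only the 24-hour threshold comes from the text.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # staleness threshold stated in the text


def serve(entry, now, schedule_refresh):
    """Return cached data immediately; queue a background refresh if it is stale."""
    if now - entry["cached_at"] > MAX_AGE:
        schedule_refresh(entry["gtin"])
    # The caller always gets the cached copy and never waits for the refresh.
    return entry["data"]


refresh_queue = []
now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
stale_entry = {
    "gtin": "00036000291452",
    "data": {"brand": "Acme"},
    "cached_at": now - timedelta(hours=30),
}
fresh_entry = {
    "gtin": "04006381333931",
    "data": {"brand": "Example"},
    "cached_at": now - timedelta(hours=2),
}
stale_result = serve(stale_entry, now, refresh_queue.append)
fresh_result = serve(fresh_entry, now, refresh_queue.append)
```

Only the 30-hour-old entry ends up in `refresh_queue`; both calls return their cached data without blocking. Scheduled checks for safety-critical data would run separately from this request path.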
Limitations
- Coverage varies by category. Food products have the deepest data, largely from Open Food Facts. Electronics and general merchandise are covered by fewer sources.
- Pricing data depends on marketplace API availability and may not reflect real-time prices.
- Risk scores are algorithmic assessments, not safety certifications. Always verify with official sources for safety-critical decisions.
Last updated: April 2026