Reads raw Publix email receipts from a Gmail mbox export and extracts structured item-level data. Handles two distinct receipt formats (Club Publix / legacy Presto), BOGO attribution, digital coupons, voided items, and deduplication.
data/parse_receipts.py

React/Recharts dashboard showing monthly spending vs. savings, savings rate trend line, store breakdown, and deal type distribution (BOGO, sale, multi-buy, coupons).
dashboard.jsx

Playwright-based scraper that loads publix.com in a real browser (bypassing Cloudflare), scrolls to lazy-load all deal cards, and extracts structured deal data. No public API exists, so DOM scraping is the only reliable path.
scraper/scrape_weekly_ad.py

Fuzzy-matches your purchase history staples (3+ purchases) against this week's deals. Expands receipt abbreviations (COMM COFF AMER CLS → "community coffee") using a curated expansion dict + rapidfuzz token scoring.
scraper/match_deals.py

Converts matched deals into a styled HTML alert showing which of your staples are on sale, with BOGO badges, savings amounts, purchase frequency bars, and a quality filter to suppress false positives.
scraper/generate_alert.py

Runs every Thursday at 7am when the Publix weekly ad flips: scrapes current deals, runs the matcher, and drops a fresh alert HTML into this folder automatically.
Scheduled · Thursdays 7am

| Decision | Choice | Why |
|---|---|---|
| Data source | Gmail mbox export | Publix emails receipts automatically to Club Publix members. mbox gives 14+ months of clean history with no manual entry. |
| Publix "API" | Browser DOM scraping (Playwright) | No public API exists. Internal endpoint (services.publix.com/storeproductssavings) is Cloudflare-protected. A real browser bypasses this cleanly. Kroger is the only major chain with a genuine public API — and they have no Florida stores. |
| Fuzzy matching | Token expansion + rapidfuzz | Receipt names are truncated to ~18 chars (COMM COFF AMER CLS). An abbreviation expansion dict + token set ratio bridges the gap to full product names on the weekly ad. |
| BOGO accounting | Two line items per pair | Receipts print both the paid and free item at full retail price. The parser attributes savings to the free unit, so Total Paid = Retail − Saved is accurate. Example: Community Coffee totals $220 retail, $110 paid, $110 saved. |
| Deduplication | (date, store, total) key | Publix sent 3 identical emails for one void transaction. Keying on date + store + total collapses duplicates without dropping legitimate same-day trips. |
| Product direction | Analysis layer, not alerts | Flipp, iHeartPublix, and Publix's own Club Publix app already do deal alerts. Item-level spending history — paid vs. retail, price inflation over time, savings rate per product — is genuinely novel and not available anywhere else. |
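
Two of the parser decisions in the table above, the (date, store, total) deduplication key and the BOGO attribution rule, can be sketched together. This is a minimal illustration rather than the code in data/parse_receipts.py: the `Item` and `Receipt` classes, the `bogo_free` flag, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Item:
    name: str
    retail: Decimal          # full retail price printed on the receipt line
    bogo_free: bool = False  # hypothetical flag: True for the free unit of a BOGO pair


@dataclass(frozen=True)
class Receipt:
    date: str                # ISO date string, e.g. "2026-03-05"
    store: str
    total: Decimal           # total paid, as printed on the receipt
    items: tuple = ()


def dedupe(receipts):
    """Keep the first receipt seen for each (date, store, total) key."""
    seen = set()
    unique = []
    for receipt in receipts:
        key = (receipt.date, receipt.store, receipt.total)
        if key not in seen:
            seen.add(key)
            unique.append(receipt)
    return unique


def bogo_totals(items):
    """Return (retail, saved, paid), attributing savings to the free unit."""
    retail = sum((i.retail for i in items), Decimal("0"))
    saved = sum((i.retail for i in items if i.bogo_free), Decimal("0"))
    return retail, saved, retail - saved
```

Two trips on the same day to the same store survive deduplication as long as their totals differ. `Decimal` is used instead of `float` so that Retail − Saved stays exact to the cent.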
scraper/scrape_coupons.py is written and ready; run it with a logged-in Playwright browser to match available coupons to your staples using the same fuzzy logic as the weekly ad matcher.
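
The matching logic shared by the weekly ad and coupon matchers can be sketched roughly as follows. This version uses only the standard library: `token_overlap` is a simplified stand-in for rapidfuzz's token set ratio, not the real scorer, and the `EXPANSIONS` entries, function names, and the threshold of 75 are assumptions for illustration (the project's curated dict is not reproduced here).

```python
from collections import Counter

# Hypothetical expansion entries; the curated dict in the project is larger.
EXPANSIONS = {
    "COMM": "community",
    "COFF": "coffee",
}


def staples(purchased_names, min_purchases=3):
    """Return item names bought at least min_purchases times."""
    counts = Counter(purchased_names)
    return [name for name, n in counts.items() if n >= min_purchases]


def expand(receipt_name):
    """Lowercase a receipt line and expand known abbreviations token by token."""
    return " ".join(
        EXPANSIONS.get(tok.upper(), tok).lower() for tok in receipt_name.split()
    )


def token_overlap(a, b):
    """Score 0-100: share of the smaller token set that appears in the other.

    Ignores word order and duplicates, but only counts exact token matches,
    so it is cruder than rapidfuzz's token set ratio.
    """
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return 100.0 * len(ta & tb) / min(len(ta), len(tb))


def match_staples(staple_names, deals, threshold=75.0):
    """Pair each staple with its best-scoring deal at or above the threshold."""
    matches = {}
    for staple in staple_names:
        expanded = expand(staple)
        best = max(deals, key=lambda d: token_overlap(expanded, d.lower()), default=None)
        if best is not None and token_overlap(expanded, best.lower()) >= threshold:
            matches[staple] = best
    return matches
```

Because the score divides by the smaller token set, a very short deal name can match too easily, which is one reason a quality filter on the results is worth keeping.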
| File | Description |
|---|---|
| data/parse_receipts.py | Core receipt parser — reads mbox, outputs receipts.json |
| data/receipts.json | 93 parsed receipts with item-level detail |
| data/publix.mbox | Source email archive from Gmail export |
| dashboard.jsx | React/Recharts spending & savings dashboard |
| items_report.html | All-items purchase history table (sortable, searchable) |
| alert_2026-03-05.html | Weekly deal alert — week of March 5, 2026 |
| scraper/scrape_weekly_ad.py | Playwright scraper for publix.com weekly ad |
| scraper/match_deals.py | Fuzzy matcher — staples vs. weekly deals |
| scraper/generate_alert.py | HTML alert generator from match results |
| scraper/scrape_coupons.py | Playwright coupon scraper — requires Club Publix login; matches available digital coupons to your staples |
| scraper/weekly_deals.json | This week's scraped deals (refreshed Thursdays) |
| scraper/matched_deals.json | Match results from last scraper run |
| docs/research/receipt-format.md | Receipt format documentation & parser edge case notes |
| index.html | This page |
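
For orientation, the first step of the receipt parser (reading data/publix.mbox) might look like the sketch below, which uses Python's standard `mailbox` module. The function name, the `sender_hint` filter, and the choice to prefer the plain-text part are assumptions for the example; the actual parser also has to handle both receipt formats and may read the HTML part instead.

```python
import mailbox
from email.utils import parsedate_to_datetime


def iter_receipt_emails(mbox_path, sender_hint="publix"):
    """Yield (date, subject, body) for messages whose From header contains sender_hint."""
    # create=False raises an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            if sender_hint not in str(msg.get("From", "")).lower():
                continue
            body = ""
            # walk() yields the message itself for single-part emails.
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    body = payload.decode(charset, errors="replace")
                    break
            date_header = msg.get("Date")
            # parsedate_to_datetime raises on a malformed Date header.
            date = parsedate_to_datetime(date_header) if date_header else None
            yield date, str(msg.get("Subject", "")), body
    finally:
        box.close()
```

Each yielded tuple would then be handed to the format-specific line parsing, with the results written to data/receipts.json.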