Personal grocery spending intelligence — built from your actual Publix receipts.
📍 Gandy Commons · Tampa, FL 33611 · Store #1722
93 Receipts parsed
577 Unique items
$6,119 Retail value
$4,509 Total paid
$1,610 Total saved
26.3% Savings rate
Nov '24 – Feb '26 Date range

Reports & Outputs

What's Been Built

📬 Receipt Parser

Reads raw Publix email receipts from a Gmail mbox export and extracts structured item-level data. Handles two distinct receipt formats (Club Publix / legacy Presto), BOGO attribution, digital coupons, voided items, and deduplication.

data/parse_receipts.py
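
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: it assumes each parsed receipt is a dict with `date`, `store`, and `total` fields (hypothetical names; the real receipts.json schema is not shown on this page) and uses the date + store + total key described under Key Technical Decisions. The sample receipts are made up.

```python
def dedupe_receipts(receipts):
    """Keep the first receipt seen for each (date, store, total) key."""
    seen = set()
    unique = []
    for receipt in receipts:
        # Field names are assumptions for illustration only.
        key = (receipt["date"], receipt["store"], receipt["total"])
        if key in seen:
            continue  # the same trip was emailed more than once
        seen.add(key)
        unique.append(receipt)
    return unique


if __name__ == "__main__":
    # Made-up sample: one trip emailed three times, plus a second trip that day.
    emails = [
        {"date": "2025-06-01", "store": "1722", "total": "42.17"},
        {"date": "2025-06-01", "store": "1722", "total": "42.17"},
        {"date": "2025-06-01", "store": "1722", "total": "42.17"},
        {"date": "2025-06-01", "store": "1722", "total": "8.99"},
    ]
    print(len(dedupe_receipts(emails)))  # prints 2
```

Totals are kept as strings here so the key compares exactly; two same-day trips are only merged if they also share a store and an identical total.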

📈 Spending Dashboard

React/Recharts dashboard showing monthly spending vs. savings, savings rate trend line, store breakdown, and deal type distribution (BOGO, sale, multi-buy, coupons).

dashboard.jsx

🕷️ Weekly Ad Scraper

Playwright-based scraper that loads publix.com in a real browser (bypassing Cloudflare), scrolls to lazy-load all deal cards, and extracts structured deal data. No public API exists — DOM scraping is the only reliable path.

scraper/scrape_weekly_ad.py

🔍 Deal Matcher

Fuzzy-matches the staples in your purchase history (items bought 3+ times) against this week's deals. Expands receipt abbreviations (COMM COFF AMER CLS → "community coffee") using a curated expansion dict + rapidfuzz token scoring.

scraper/match_deals.py
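
The expand-then-score idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in match_deals.py: a plain token-overlap score stands in for rapidfuzz's token-set scoring so the example has no dependencies, the `ABBREVIATIONS` table holds only the two expansions implied by the example above, and the deal titles and the 0.5 threshold are made up.

```python
import re

# Hypothetical expansion table; the project's curated dict is not reproduced here.
ABBREVIATIONS = {
    "COMM": "community",
    "COFF": "coffee",
}


def expand(receipt_name):
    """Expand known abbreviations token by token and lower-case the result."""
    return {ABBREVIATIONS.get(tok, tok).lower() for tok in receipt_name.upper().split()}


def title_tokens(deal_title):
    """Split a weekly-ad title into lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", deal_title.lower()))


def overlap_score(receipt_name, deal_title):
    """Share of expanded receipt tokens that also appear in the deal title."""
    receipt_tokens = expand(receipt_name)
    if not receipt_tokens:
        return 0.0
    return len(receipt_tokens & title_tokens(deal_title)) / len(receipt_tokens)


def best_match(receipt_name, deal_titles, threshold=0.5):
    """Return the best-scoring deal title, or None if nothing clears the threshold."""
    scored = [(overlap_score(receipt_name, title), title) for title in deal_titles]
    if not scored:
        return None
    score, title = max(scored)
    return title if score >= threshold else None


if __name__ == "__main__":
    deals = ["Pillsbury Crescent Rolls", "Community Coffee, 12-oz bag"]  # made-up titles
    print(best_match("COMM COFF AMER CLS", deals))
```

Unexpanded tokens such as AMER and CLS simply fail to match, which is why the threshold sits well below 1.0; a fuzzy scorer like rapidfuzz would be more forgiving of partial and misspelled tokens than this exact-token stand-in.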

🚨 Alert Generator

Converts matched deals into a styled HTML alert showing which of your staples are on sale — BOGO badges, savings amounts, purchase frequency bars, and a quality filter to suppress false positives.

scraper/generate_alert.py

⏰ Scheduled Task

Runs every Thursday at 7am when the Publix weekly ad flips — scrapes current deals, runs the matcher, and drops a fresh alert HTML into this folder automatically.

Scheduled · Thursdays 7am

Key Technical Decisions

Decision | Choice | Why
Data source | Gmail mbox export | Publix emails receipts automatically to Club Publix members. mbox gives 14+ months of clean history with no manual entry.
Publix "API" | Browser DOM scraping (Playwright) | No public API exists. Internal endpoint (services.publix.com/storeproductssavings) is Cloudflare-protected. A real browser bypasses this cleanly. Kroger is the only major chain with a genuine public API, and they have no Florida stores.
Fuzzy matching | Token expansion + rapidfuzz | Receipt names are truncated to ~18 chars (COMM COFF AMER CLS). An abbreviation expansion dict + token set ratio bridges the gap to full product names on the weekly ad.
BOGO accounting | Two line items per pair | Receipts print both the paid and free item at full retail price. The parser attributes savings to the free unit, so Total Paid = Retail − Saved is accurate. Community Coffee: $220 retail, $110 paid, $110 saved.
Deduplication | (date, store, total) key | Publix sent 3 identical emails for one void transaction. Keying on date + store + total collapses duplicates without dropping legitimate same-day trips.
Product direction | Analysis layer, not alerts | Flipp, iHeartPublix, and Publix's own Club Publix app already do deal alerts. Item-level spending history (paid vs. retail, price inflation over time, savings rate per product) is genuinely novel and not available anywhere else.
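
The BOGO accounting rule can be illustrated with a small worked example. This is a sketch under assumptions, not the parser itself: the `retail`, `saved`, and `paid` field names are hypothetical, and the $13.59 unit price is borrowed from the Community Coffee figure quoted elsewhere on this page.

```python
from decimal import Decimal


def bogo_pair(retail_price):
    """Model one BOGO pair as two line items, both printed at full retail."""
    price = Decimal(retail_price)
    zero = Decimal("0")
    return [
        {"retail": price, "saved": zero, "paid": price},  # the paid unit
        {"retail": price, "saved": price, "paid": zero},  # the free unit carries the savings
    ]


def totals(items):
    """Sum retail, saved, and paid across line items."""
    retail = sum(item["retail"] for item in items)
    saved = sum(item["saved"] for item in items)
    paid = sum(item["paid"] for item in items)
    return retail, saved, paid


if __name__ == "__main__":
    retail, saved, paid = totals(bogo_pair("13.59"))
    print(retail, saved, paid)   # prints 27.18 13.59 13.59
    print(retail - saved == paid)  # prints True
```

Because the savings sit on the free unit only, summing any set of line items preserves Total Paid = Retail − Saved, which is the same relationship as the $220 retail, $110 paid, $110 saved figures in the table.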

Product Thinking

What's differentiated
The item-level spending history is the moat. Publix's Club Publix shows "Picked for you" deals but doesn't tell you that you've spent $110 cash on Community Coffee over 14 months, or that the retail price rose 29% since your first purchase. That's personal financial intelligence, not a coupon list.
Built
Price inflation tracker
14 months of item prices are in receipts.json. Community Coffee: $10.49 → $13.59 (+29%). A "personal grocery CPI" chart — your actual basket getting more expensive over time — is something no existing app produces, and it's politically resonant right now.
Next up
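
A rough sketch of the two calculations behind a price inflation tracker: per-item percent change and a simple basket index. Everything here is an assumption for illustration. The index is an unweighted fixed basket (one unit of each item, base period = 100), which is only one of several ways to build a "personal CPI", and the second basket item and its prices are made up.

```python
def pct_change(first_price, latest_price):
    """Percent change from the first observed price to the latest one."""
    return (latest_price - first_price) / first_price * 100


def basket_index(base_prices, current_prices):
    """Cost of the shared basket at current prices, with the base period set to 100."""
    common = base_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no items appear in both periods")
    base_total = sum(base_prices[name] for name in common)
    current_total = sum(current_prices[name] for name in common)
    return 100 * current_total / base_total


if __name__ == "__main__":
    # Community Coffee prices come from this page; "example item" is made up.
    base = {"community coffee": 10.49, "example item": 4.00}
    now = {"community coffee": 13.59, "example item": 4.40}
    print(f"{pct_change(10.49, 13.59):.1f}%")
    print(f"{basket_index(base, now):.1f}")
```

For the Community Coffee prices quoted above, the percent change comes out just under 30%. Weighting each item by how often it is bought would make the index track actual spending more closely than this one-unit-each version.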
Spend velocity
You buy Community Coffee every ~18 days based on purchase dates. Knowing your repurchase cadence turns the alert from "it's on BOGO this week" into "you're going to run out in 5 days and it's BOGO right now." A meaningfully better signal.
Next up
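
Repurchase cadence can be estimated from purchase dates alone. The sketch below is one simple way to do it, assuming a plain average of the gaps between purchases; the function names and the sample dates are hypothetical, chosen so the gaps match the ~18-day cadence mentioned above.

```python
from datetime import date, timedelta


def average_interval_days(purchase_dates):
    """Mean number of days between consecutive purchases."""
    ordered = sorted(purchase_dates)
    if len(ordered) < 2:
        raise ValueError("need at least two purchases")
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def days_until_next_purchase(purchase_dates, today):
    """Days from today until the next purchase is expected (negative if overdue)."""
    interval = round(average_interval_days(purchase_dates))
    expected = max(purchase_dates) + timedelta(days=interval)
    return (expected - today).days


if __name__ == "__main__":
    # Made-up purchase dates, 18 days apart.
    history = [date(2026, 1, 1), date(2026, 1, 19), date(2026, 2, 6)]
    print(average_interval_days(history))                         # prints 18.0
    print(days_until_next_purchase(history, date(2026, 2, 19)))   # prints 5
```

This treats every purchase as lasting one interval. A BOGO trip brings home two units, so a fuller version would probably scale the interval by quantity bought.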
Items never bought on sale
Which regular purchases have you consistently paid full price for, even though they go BOGO periodically? Pure found money — just requires a behavior change, not extra spending. Computable once the scraper has accumulated a few months of weekly ad history.
Next up
Deal capture rate
Of all the weeks Community Coffee was BOGO, what % did you actually shop that week? "You missed 3 BOGOs last year — that's $93 left on the table." Requires several months of accumulated weekly ad data from the Thursday scraper.
Later
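
Once weekly ad history exists, deal capture rate reduces to a set intersection over ad weeks. The sketch below assumes ad weeks run Thursday through Wednesday (inferred from the Thursday ad flip; the actual cycle may differ by market) and identifies each week by its starting Thursday. The function names and sample dates are hypothetical.

```python
from datetime import date, timedelta


def ad_week_start(day):
    """Thursday that begins the ad week containing `day` (Thursday-to-Wednesday weeks assumed)."""
    # date.weekday(): Monday is 0, Thursday is 3.
    return day - timedelta(days=(day.weekday() - 3) % 7)


def capture_rate(bogo_week_starts, purchase_dates):
    """Fraction of BOGO weeks in which the item was actually purchased."""
    if not bogo_week_starts:
        raise ValueError("no BOGO weeks recorded")
    purchase_weeks = {ad_week_start(day) for day in purchase_dates}
    return len(set(bogo_week_starts) & purchase_weeks) / len(set(bogo_week_starts))


if __name__ == "__main__":
    # Made-up data: three BOGO weeks, purchases landing in two of them.
    bogo_weeks = {date(2026, 1, 1), date(2026, 1, 29), date(2026, 2, 26)}
    purchases = [date(2026, 1, 3), date(2026, 2, 10), date(2026, 3, 4)]
    print(f"{capture_rate(bogo_weeks, purchases):.0%}")  # prints 67%
```

The missed weeks (BOGO weeks with no purchase) multiplied by the unit price would give the "left on the table" dollar figure.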
Acquisition problem
As a consumer app: exporting a Gmail mbox and running a parser is high friction. The people willing to do it already use Flipp. The people who'd benefit most from the spending analysis don't think about grocery deals at all.
Open Q
The B2B angle
Item-level receipt data is valuable to CPG brands. 18 repeat Community Coffee purchases from one Tampa shopper driven by BOGO — that's a loyalty signal brands pay for. But that's a different company entirely, and requires scale.
Open Q
Digital coupon finder
Investigated: Publix digital coupons are behind Club Publix login — the page renders skeleton cards until authenticated. Historical analysis from receipts shows 9 coupon uses in 14 months ($15.24 saved): DC Pillsbury (3×), DC Stur (2×), DC Ben & Jerry's, DC Jimmy Dean, DC Bob Evans, DC Publix. That's <1% of total savings — BOGOs are where the real money is. scraper/scrape_coupons.py is written and ready; run it with a logged-in Playwright browser to match available coupons to your staples using the same fuzzy logic as the weekly ad matcher.
Next up

Immediate Next Steps

1
Price inflation chart
Plot price-over-time for the top 20 staples using existing receipt data. Show a personal grocery CPI index across the 14-month date range. All the data is already there — just needs the chart.
2
Spend velocity per item
Calculate average days between purchases for each staple. Combine with the weekly deal scraper to flag "buy now" vs. "can wait" on a per-item basis in the alert.
3
Items always paid full price
Cross-reference purchase history against the weekly ad data accumulating each Thursday. Surface items where you're consistently leaving savings on the table.
4
Keep receipts current
Re-run parse_receipts.py on a fresh Gmail mbox export periodically. The scraper handles the deal side automatically on Thursdays — the receipt side still needs a manual mbox refresh every few months.
?
Is there a product here beyond personal use?
The analysis layer is differentiated. The acquisition path for a consumer app is hard. Worth deciding whether the angle is personal finance (Mint for groceries), B2B data, or purely a personal tool before building further.
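
Step 2 above (spend velocity per item) could feed the alert with a "buy now" vs. "can wait" flag along these lines. This is a hedged sketch: the rule, the function name, and the `horizon_days` parameter (how close to running out counts as urgent, defaulting to one ad week) are all assumptions, and the dates are made up.

```python
from datetime import date


def flag_staple(last_purchase, avg_interval_days, on_deal, today, horizon_days=7):
    """Return "buy now" when an item is on deal and due for repurchase soon."""
    days_since = (today - last_purchase).days
    days_left = avg_interval_days - days_since  # negative means already overdue
    if on_deal and days_left <= horizon_days:
        return "buy now"
    return "can wait"


if __name__ == "__main__":
    today = date(2026, 2, 19)
    # On deal, bought 13 days ago on an 18-day cadence: about 5 days left.
    print(flag_staple(date(2026, 2, 6), 18, True, today))    # prints buy now
    # On deal, but bought 2 days ago: about 16 days left.
    print(flag_staple(date(2026, 2, 17), 18, True, today))   # prints can wait
```

A stricter variant might also flag items that are nearly out but not on deal, so the alert can distinguish "wait for a sale" from "buy at full price anyway".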

File Map

File | Description
data/parse_receipts.py | Core receipt parser: reads mbox, outputs receipts.json
data/receipts.json | 93 parsed receipts with item-level detail
data/publix.mbox | Source email archive from Gmail export
dashboard.jsx | React/Recharts spending & savings dashboard
items_report.html | All-items purchase history table (sortable, searchable)
alert_2026-03-05.html | Weekly deal alert for the week of March 5, 2026
scraper/scrape_weekly_ad.py | Playwright scraper for publix.com weekly ad
scraper/match_deals.py | Fuzzy matcher: staples vs. weekly deals
scraper/generate_alert.py | HTML alert generator from match results
scraper/scrape_coupons.py | Playwright coupon scraper; requires Club Publix login and matches available digital coupons to your staples
scraper/weekly_deals.json | This week's scraped deals (refreshed Thursdays)
scraper/matched_deals.json | Match results from last scraper run
docs/research/receipt-format.md | Receipt format documentation & parser edge case notes
index.html | This page