Publix Saver — Project Overview

Reports & Outputs

What's Been Built

📬 Receipt Parser

Reads raw Publix email receipts from a Gmail mbox export and extracts structured item-level data. Handles two distinct receipt formats (Club Publix / legacy Presto), BOGO attribution, digital coupons, voided items, and deduplication.

data/parse_receipts.py
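A minimal sketch of the deduplication step, assuming each parsed receipt is a dict with `date`, `store`, and `total` fields. Those field names and the sample values are illustrative; the actual schema in receipts.json may differ.

```python
def dedupe_receipts(receipts):
    """Keep the first receipt seen for each (date, store, total) key."""
    seen = set()
    unique = []
    for r in receipts:
        # Hypothetical field names. Rounding the total keeps float noise
        # from splitting otherwise identical receipts.
        key = (r["date"], r["store"], round(r["total"], 2))
        if key in seen:
            continue
        seen.add(key)
        unique.append(r)
    return unique


# Illustrative data: one receipt emailed three times, plus a second trip.
emails = [
    {"date": "2025-06-01", "store": "Store A", "total": 42.17},
    {"date": "2025-06-01", "store": "Store A", "total": 42.17},
    {"date": "2025-06-01", "store": "Store A", "total": 42.17},
    {"date": "2025-06-01", "store": "Store A", "total": 8.99},
]
unique_receipts = dedupe_receipts(emails)
```

Two trips to the same store on the same day with the same total would collapse into one under this key, which is rare enough to accept for a personal tool.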

📈 Spending Dashboard

React/Recharts dashboard showing monthly spending vs. savings, savings rate trend line, store breakdown, and deal type distribution (BOGO, sale, multi-buy, coupons).

dashboard.jsx
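The monthly series behind the dashboard could be prepared along these lines. This is a sketch, not the dashboard's actual data pipeline: the `date`, `paid`, and `saved` fields are assumed names, and the savings rate is assumed to be saved divided by retail, where retail is paid plus saved.

```python
from collections import defaultdict


def monthly_summary(receipts):
    """Aggregate paid and saved amounts per YYYY-MM and add a savings rate."""
    totals = defaultdict(lambda: {"paid": 0.0, "saved": 0.0})
    for r in receipts:
        month = r["date"][:7]  # assumes ISO dates such as 2025-01-03
        totals[month]["paid"] += r["paid"]
        totals[month]["saved"] += r["saved"]

    rows = []
    for month in sorted(totals):
        paid = totals[month]["paid"]
        saved = totals[month]["saved"]
        retail = paid + saved
        rows.append({
            "month": month,
            "paid": round(paid, 2),
            "saved": round(saved, 2),
            # Assumed definition: share of the retail price that was not paid.
            "savings_rate": round(saved / retail, 3) if retail else 0.0,
        })
    return rows
```

The resulting list of rows is the shape a Recharts line or bar chart can consume directly once serialized to JSON.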

🕷️ Weekly Ad Scraper

Playwright-based scraper that loads publix.com in a real browser (bypassing Cloudflare), scrolls to lazy-load all deal cards, and extracts structured deal data. No public API exists — DOM scraping is the only reliable path.

scraper/scrape_weekly_ad.py
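The scroll-to-lazy-load step can be sketched with Playwright's sync API as follows. The function names, the pause length, and the headed-browser setting are assumptions, and the deal-card extraction is left out because the page's selectors are not documented here.

```python
def scroll_until_stable(page, pause_ms=750, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scroll rounds performed.
    """
    last_height = page.evaluate("document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)  # give lazy-loaded cards time to render
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return rounds
        last_height = height
    return max_rounds


def fetch_rendered_html(url):
    """Load `url` in a real browser, scroll to the end, and return the HTML."""
    # Imported lazily so the scroll helper can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # assumption: headed browser
        page = browser.new_page()
        page.goto(url)
        scroll_until_stable(page)
        html = page.content()
        browser.close()
    return html
```

Comparing the page height before and after each scroll avoids hard-coding how many deal cards to expect, and `max_rounds` caps the loop if the page keeps growing.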

🔍 Deal Matcher

Fuzzy-matches staples from your purchase history (items bought 3+ times) against this week's deals. Expands receipt abbreviations (COMM COFF AMER CLS → "community coffee") using a curated expansion dict, then scores candidates with rapidfuzz token matching.

scraper/match_deals.py
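A rough sketch of the expand-then-score idea. The two expansion entries, the `min_score` threshold, and the deal names are illustrative, not the contents of match_deals.py. The difflib branch is only a stand-in so the sketch runs without rapidfuzz installed; it is not the same scoring function.

```python
import re

try:
    from rapidfuzz import fuzz

    def similarity(a, b):
        return fuzz.token_set_ratio(a, b)
except ImportError:  # stand-in scorer, not equivalent to rapidfuzz
    from difflib import SequenceMatcher

    def similarity(a, b):
        a_sorted = " ".join(sorted(set(a.split())))
        b_sorted = " ".join(sorted(set(b.split())))
        return 100 * SequenceMatcher(None, a_sorted, b_sorted).ratio()

# Hypothetical entries; the real curated dict is larger.
EXPANSIONS = {"COMM": "community", "COFF": "coffee"}


def expand(receipt_name):
    """Replace known abbreviations token by token, lowercasing the rest."""
    return " ".join(EXPANSIONS.get(tok, tok.lower()) for tok in receipt_name.split())


def normalize(text):
    """Lowercase and replace punctuation with spaces so tokens compare cleanly."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).strip()


def best_deal(receipt_name, deal_names, min_score=60):
    """Return (deal_name, score) for the best match, or None if nothing clears min_score."""
    if not deal_names:
        return None
    query = normalize(expand(receipt_name))
    score, name = max((similarity(query, normalize(d)), d) for d in deal_names)
    return (name, score) if score >= min_score else None


match = best_deal(
    "COMM COFF AMER CLS",
    ["Folgers Classic Roast Coffee", "Community Coffee, 12 oz"],
)
```

Token-set scoring rewards shared words regardless of order, which is why unexpanded fragments like "amer" and "cls" do not sink the match as long as the brand tokens expand correctly.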

🚨 Alert Generator

Converts matched deals into a styled HTML alert showing which of your staples are on sale — BOGO badges, savings amounts, purchase frequency bars, and a quality filter to suppress false positives.

scraper/generate_alert.py

⏰ Scheduled Task

Runs every Thursday at 7am when the Publix weekly ad flips — scrapes current deals, runs the matcher, and drops a fresh alert HTML into this folder automatically.

Scheduled · Thursdays 7am
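Because the ad flips on Thursdays, each run can be labeled by the Thursday that starts its ad week. The sketch below assumes alert files are named after that Thursday, which is inferred from alert_2026-03-05.html rather than confirmed; the function names are hypothetical.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0


def ad_week_start(day):
    """Return the most recent Thursday on or before `day`."""
    return day - timedelta(days=(day.weekday() - THURSDAY) % 7)


def alert_filename(day):
    """Assumed naming scheme: alert_<ad week Thursday>.html."""
    return f"alert_{ad_week_start(day).isoformat()}.html"
```

Keying the filename on the ad week rather than the run date means a late or repeated run overwrites the same alert instead of creating a near-duplicate.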

Key Technical Decisions

Decision	Choice	Why
Data source	Gmail mbox export	Publix emails receipts automatically to Club Publix members. mbox gives 14+ months of clean history with no manual entry.
Publix "API"	Browser DOM scraping (Playwright)	No public API exists. Internal endpoint (services.publix.com/storeproductssavings) is Cloudflare-protected. A real browser bypasses this cleanly. Kroger is the only major chain with a genuine public API — and they have no Florida stores.
Fuzzy matching	Token expansion + rapidfuzz	Receipt names are truncated to ~18 chars (COMM COFF AMER CLS). An abbreviation expansion dict + token set ratio bridges the gap to full product names on the weekly ad.
BOGO accounting	Two line items per pair	Receipts print both the paid and free item at full retail price. The parser attributes savings to the free unit, so Total Paid = Retail − Saved is accurate. Community Coffee: $220 retail, $110 paid, $110 saved.
Deduplication	(date, store, total) key	Publix sent 3 identical emails for a single voided transaction. Keying on date + store + total collapses those duplicates while keeping legitimate same-day trips, unless two trips to the same store happen to share an identical total.
Product direction	Analysis layer, not alerts	Flipp, iHeartPublix, and Publix's own Club Publix app already do deal alerts. Item-level spending history — paid vs. retail, price inflation over time, savings rate per product — is genuinely novel and not available anywhere else.
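The BOGO accounting in the table above can be illustrated with a small sketch. Field names are hypothetical, and the $13.59 unit price is the Community Coffee retail price quoted elsewhere on this page. Decimal is used so the totals stay exact to the cent.

```python
from decimal import Decimal


def attribute_bogo(name, unit_retail):
    """Return the two line items of a BOGO pair with savings on the free unit."""
    price = Decimal(unit_retail)
    zero = Decimal("0")
    return [
        {"name": name, "retail": price, "saved": zero, "paid": price},   # paid unit
        {"name": name, "retail": price, "saved": price, "paid": zero},   # free unit
    ]


def totals(items):
    """Sum retail and savings; paid is derived as retail minus saved."""
    retail = sum(i["retail"] for i in items)
    saved = sum(i["saved"] for i in items)
    return {"retail": retail, "saved": saved, "paid": retail - saved}


pair = attribute_bogo("COMM COFF AMER CLS", "13.59")
summary = totals(pair)
```

Keeping both units at full retail and putting the discount on the free one means the per-item rows and the receipt-level identity (paid = retail minus saved) agree without any special casing.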

Product Thinking

What's differentiated

The item-level spending history is the moat. The Club Publix app shows "Picked for you" deals but doesn't tell you that you've spent $110 cash on Community Coffee over 14 months, or that the retail price rose 29% since your first purchase. That's personal financial intelligence, not a coupon list.

Built

Price inflation tracker

14 months of item prices are in receipts.json. Community Coffee: $10.49 → $13.59 (+29%). A "personal grocery CPI" chart — your actual basket getting more expensive over time — is something no existing app produces, and it's politically resonant right now.

Next up

Spend velocity

You buy Community Coffee every ~18 days based on purchase dates. Knowing your repurchase cadence turns the alert from "it's on BOGO this week" into "you're going to run out in 5 days and it's BOGO right now." A meaningfully better signal.

Next up

Items never bought on sale

Which regular purchases have you consistently paid full price for, even though they go BOGO periodically? Pure found money — just requires a behavior change, not extra spending. Computable once the scraper has accumulated a few months of weekly ad history.

Next up

Deal capture rate

Of all the weeks Community Coffee was BOGO, what % did you actually shop that week? "You missed 3 BOGOs last year — that's $93 left on the table." Requires several months of accumulated weekly ad data from the Thursday scraper.

Later
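Once weekly ad history exists, the deal capture rate for one item could be computed roughly like this. The sketch assumes an ad week runs Thursday through Wednesday and that BOGO weeks are stored as the Thursday that starts them; both are assumptions, as are the function names.

```python
from datetime import date, timedelta


def ad_week(day):
    """Thursday that starts the ad week containing `day` (Monday is weekday 0)."""
    return day - timedelta(days=(day.weekday() - 3) % 7)


def capture_rate(deal_weeks, purchase_dates):
    """Return (captured, total, rate) for one item.

    deal_weeks: Thursdays on which the item's BOGO week began.
    purchase_dates: dates on which you bought that item.
    """
    deal_weeks = set(deal_weeks)
    shopped_weeks = {ad_week(d) for d in purchase_dates}
    captured = len(deal_weeks & shopped_weeks)
    total = len(deal_weeks)
    rate = captured / total if total else None
    return captured, total, rate
```

Multiplying the missed weeks (total minus captured) by the item's retail price gives a rough dollar figure for the savings left on the table.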

Acquisition problem

As a consumer app, exporting a Gmail mbox and running a parser is high friction. The people willing to do it already use Flipp. The people who'd benefit most from the spending analysis don't think about grocery deals at all.

Open Q

The B2B angle

Item-level receipt data is valuable to CPG brands. 18 repeat Community Coffee purchases from one Tampa shopper driven by BOGO — that's a loyalty signal brands pay for. But that's a different company entirely, and requires scale.

Open Q

Digital coupon finder

Investigated: Publix digital coupons are behind Club Publix login — the page renders skeleton cards until authenticated. Historical analysis from receipts shows 9 coupon uses in 14 months ($15.24 saved): DC Pillsbury (3×), DC Stur (2×), DC Ben & Jerry's, DC Jimmy Dean, DC Bob Evans, DC Publix. That's <1% of total savings — BOGOs are where the real money is. scraper/scrape_coupons.py is written and ready; run it with a logged-in Playwright browser to match available coupons to your staples using the same fuzzy logic as the weekly ad matcher.

Next up

Immediate Next Steps

Price inflation chart

Plot price-over-time for the top 20 staples using existing receipt data. Show a personal grocery CPI index across the 14-month date range. All the data is already there — just needs the chart.
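One simple way to build the index, sketched under assumptions: each observation is a (date, item, unit price) tuple, and the index is an unweighted average of price relatives against each item's first observed price. Official CPIs weight items by spending, so this is a simplification. The dates below are illustrative; only the two Community Coffee prices come from the receipts.

```python
from collections import defaultdict


def price_index(observations):
    """Return {YYYY-MM: index}, where 100 means each item's first observed price.

    observations: iterable of (iso_date, item, unit_price) in any order.
    """
    base = {}
    relatives = defaultdict(list)
    for day, item, price in sorted(observations):  # ISO dates sort chronologically
        base.setdefault(item, price)               # first price seen is the base
        relatives[day[:7]].append(price / base[item])
    return {
        month: round(100 * sum(vals) / len(vals), 1)
        for month, vals in sorted(relatives.items())
    }


index = price_index([
    ("2025-01-10", "COMM COFF AMER CLS", 10.49),  # illustrative date
    ("2026-02-20", "COMM COFF AMER CLS", 13.59),  # illustrative date
])
```

Months in which a staple was not purchased simply contribute no observation for it; carrying the last known price forward would smooth the line at the cost of implying prices you never actually saw.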

Spend velocity per item

Calculate average days between purchases for each staple. Combine with the weekly deal scraper to flag "buy now" vs. "can wait" on a per-item basis in the alert.
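A sketch of the cadence calculation and the resulting flag. The 7-day horizon (one ad week) and the function names are assumptions; the sample dates are illustrative and spaced 18 days apart to mirror the Community Coffee cadence.

```python
from datetime import date


def average_gap_days(purchase_dates):
    """Mean days between consecutive distinct purchase dates, or None if fewer than two."""
    dates = sorted(set(purchase_dates))
    if len(dates) < 2:
        return None
    return (dates[-1] - dates[0]).days / (len(dates) - 1)


def buy_signal(purchase_dates, today, on_deal, horizon_days=7):
    """Flag 'buy now' when the item is on deal and expected to run out soon."""
    gap = average_gap_days(purchase_dates)
    if gap is None:
        return "unknown"
    days_left = gap - (today - max(purchase_dates)).days
    if on_deal and days_left <= horizon_days:
        return "buy now"
    return "can wait"


history = [date(2026, 1, 1), date(2026, 1, 19), date(2026, 2, 6)]
signal = buy_signal(history, today=date(2026, 2, 19), on_deal=True)
```

Here the last purchase was 13 days ago against an 18-day cadence, so the item is expected to run out in about 5 days and an active deal tips it into "buy now".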

Items always paid full price

Cross-reference purchase history against the weekly ad data accumulating each Thursday. Surface items where you're consistently leaving savings on the table.
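The cross-reference could look something like the sketch below. It assumes purchases carry `item` and `saved` fields and that item names in the ad history have already been mapped to receipt names by the matcher; the 3-purchase threshold mirrors the staple definition used by the deal matcher.

```python
def always_full_price(purchases, items_seen_on_deal, min_purchases=3):
    """Staples that appeared in the ad history but were never bought at a discount.

    purchases: list of dicts with hypothetical 'item' and 'saved' fields.
    items_seen_on_deal: set of item names that showed up in scraped weekly ads.
    """
    counts = {}
    discounted = set()
    for p in purchases:
        counts[p["item"]] = counts.get(p["item"], 0) + 1
        if p["saved"] > 0:
            discounted.add(p["item"])
    return sorted(
        item
        for item, n in counts.items()
        if n >= min_purchases
        and item not in discounted
        and item in items_seen_on_deal
    )
```

Items that never appear in the ad history are excluded on purpose: paying full price for something that is never discounted is not a missed saving.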

Keep receipts current

Re-run parse_receipts.py on a fresh Gmail mbox export periodically. The scraper handles the deal side automatically on Thursdays — the receipt side still needs a manual mbox refresh every few months.

Is there a product here beyond personal use?

The analysis layer is differentiated. The acquisition path for a consumer app is hard. Worth deciding whether the angle is personal finance (Mint for groceries), B2B data, or purely a personal tool before building further.

File Map

File	Description
data/parse_receipts.py	Core receipt parser — reads mbox, outputs receipts.json
data/receipts.json	93 parsed receipts with item-level detail
data/publix.mbox	Source email archive from Gmail export
dashboard.jsx	React/Recharts spending & savings dashboard
items_report.html	All-items purchase history table (sortable, searchable)
alert_2026-03-05.html	Weekly deal alert — week of March 5, 2026
scraper/scrape_weekly_ad.py	Playwright scraper for publix.com weekly ad
scraper/match_deals.py	Fuzzy matcher — staples vs. weekly deals
scraper/generate_alert.py	HTML alert generator from match results
scraper/scrape_coupons.py	Playwright coupon scraper — requires Club Publix login; matches available digital coupons to your staples
scraper/weekly_deals.json	This week's scraped deals (refreshed Thursdays)
scraper/matched_deals.json	Match results from last scraper run
docs/research/receipt-format.md	Receipt format documentation & parser edge case notes
index.html	This page
