→ Live Report · → R3 vs R4 Comparison · → Dashboard
Autonomous Marketing Mix Modeling powered by Claude. Point an AI agent at program.md and it runs three independent MMM models, critiques its own analysis through a five-agent loop, iterates on model configuration, and produces a stakeholder report — without human involvement.
The three-model approach is deliberate: Ridge (fast, regularised), PyMC (Bayesian), and LightweightMMM (positive-constrained) each make different assumptions. Where all three agree, you can act with confidence. Where they disagree, that’s a diagnostic — thin data, collinearity, or a modelling assumption worth questioning. No single model can tell you this.
The six specialist agents keep roles separated so no single agent can both produce and approve its own output:
| Agent | Role | Output |
|---|---|---|
| Data Explorer | EDA before training — collinearity, anomalies, VIF, readiness score | rounds/R01_data_exploration.md |
| Tuner | Proposes one config change per round based on prior fit metrics | Updated config.json |
| Analyst | Interprets ROI, contributions, and model agreement into a business narrative | rounds/R{N}_analysis.md |
| Critic | Runs 6 quality checks — overfitting, sign correctness, plausibility, consensus honesty, collinearity, sample size | APPROVED or REVISE |
| Reporter | Rewrites the approved analysis in plain English for a CMO audience, generates deck | report.md + report.pptx |
| Proofreader | Final check — number accuracy, uncertainty language, jargon, consistency, omissions. Edits report directly if needed | PROOFREAD_CLEAN or PROOFREAD_CORRECTED |
Notion knowledge layer (field definitions, business context, per-channel benchmarks)
↑↓ every round (discover.py --no-overwrite-config)
metadata.json config.json ← written once by discover.py, then owned by Tuner
↓ ↓
ORCHESTRATOR (program.md) ← reads state.json, coordinates all agents
│
├── [every round] discover.py
│ Pulls latest Notion knowledge into metadata.json.
│ Does NOT overwrite config.json (Tuner owns that).
│
├── DATA EXPLORER (agents/data_explorer.md) ← Round 1 only
│ EDA on raw dataset: overview, KPI distribution, channel spend,
│ collinearity (VIF), anomalies, multi-entity check, readiness score.
│ Returns: EXPLORATION_DONE
│
├── TUNER (agents/tuner.md) ← Round 2+ only
│ Reads prior round fit metrics + metadata.json (Notion knowledge).
│ Proposes one config change per round — adstock decay, Hill slope,
│ or PyMC samples. Edits config.json directly.
│ Returns: CONFIG_UPDATED or NO_CHANGE
│
├── run_models.py
│ Runs all 3 MMM models in sequence:
│ • Ridge — regularised regression + 200-sample bootstrap CI
│ • PyMC — full Bayesian with DelayedSaturatedMMM
│ • LightMMM — Google JAX-based Hill + adstock
│ Fallback to scipy NNLS only if JAX / pymc-marketing not installed.
│ Saves results/latest.json + rounds/R{N}_results.json
│
├── ANALYST (agents/analyst.md)
│ Reads model output + EDA report + metadata.json (business context).
│ Covers ROI rankings, model agreement/disagreement, contribution %,
│ and what the data can and cannot support.
│ Returns: ANALYSIS_DONE
│
├── CRITIC (agents/critic.md)
│ Six-point quality gate — challenges the Analyst before anything
│ reaches the report:
│ 1. Overfitting (R²=1.0 on small samples)
│ 2. Sign correctness (negative ROI despite confirmed spend)
│ 3. Contribution plausibility (<5% or >80% attributed to media)
│ 4. Consensus honesty (did Analyst ignore model disagreements?)
│ 5. Collinearity (channels that co-moved, confusing attribution)
│ 6. Sample size caveat (limitation clearly communicated?)
│ On REVISE: Analyst fixes once, Critic re-reviews. Max one cycle.
│ Returns: APPROVED or REVISE: <reason>
│
├── REPORTER (agents/reporter.md)
│ Only runs after APPROVED. Rewrites findings in plain English —
│ no jargon, no model names in the headline, no unexplained CIs.
│ Audience: marketing director or CMO.
│ Runs report_builder.py → report.md + report.pptx
│ Returns: REPORT_DONE
│
└── PROOFREADER (agents/proofreader.md)
Final gate before delivery. Checks number accuracy, uncertainty
language, jargon, consistency, and omissions against the raw
results CSVs. Edits report.md directly if corrections needed.
Re-runs report_builder.py if corrected.
Returns: PROOFREAD_CLEAN or PROOFREAD_CORRECTED
Flow each round:
Round 1: Data Explorer → [skip Tuner] → Models → Analyst → Critic → Reporter
Round 2+: [skip Explorer] → Tuner → Models → Analyst → Critic → Reporter
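The round gating above can be summarised in a few lines. This is an illustrative sketch of the ordering only — program.md implements this orchestration in prose for the agent, not in Python:

```python
def round_stages(round_num: int) -> list[str]:
    """Ordered stages for one round, following the orchestrator tree above."""
    # Data Explorer runs on Round 1 only; the Tuner needs prior-round metrics,
    # so it only runs from Round 2 onward.
    first = ["data_explorer"] if round_num == 1 else ["tuner"]
    return first + ["models", "analyst", "critic", "reporter", "proofreader"]
```
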
PrismMMM uses a Notion knowledge layer as a live data dictionary. Business teams can update field descriptions, expected ROI ranges, and known data issues directly in Notion — no code changes needed. Every time discover.py runs it pulls the latest knowledge and merges it into metadata.json, which every agent reads.
Three Notion databases:
| Database | What it stores |
|---|---|
| Field Definitions | Column name, label, type (channel/kpi/control), unit, expected ROI min/max, description |
| Business Context | Brand, market, currency, seasonality notes, typical media share |
| Known Issues | Data quality problems with severity (high/medium/low) and recommended action |
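The merge step can be sketched as follows. This is a hypothetical illustration: the exact field names inside metadata.json and the Notion row schema are assumptions, not the repo's actual shapes.

```python
def merge_notion_knowledge(metadata: dict, field_defs: list[dict]) -> dict:
    """Overlay Notion field definitions onto an auto-profiled metadata dict.

    Notion is treated as the source of truth for labels and ROI ranges;
    auto-detected values are kept wherever Notion has no entry.
    (Key names here are illustrative, not the repo's actual schema.)
    """
    channels = metadata.setdefault("channels", {})
    for row in field_defs:
        col = row["column"]
        entry = channels.setdefault(col, {})
        entry.update({
            "label": row.get("label", col),
            "type": row.get("type", entry.get("type", "channel")),
            "expected_roi": [row.get("roi_min"), row.get("roi_max")],
        })
    return metadata

meta = merge_notion_knowledge(
    {"channels": {"google_search": {"type": "channel"}}},
    [{"column": "google_search", "label": "Google Search",
      "roi_min": 1.5, "roi_max": 4.0}],
)
```
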
... → Connections → connect your integration. Then run:

python discover.py --source csv --path ./data.csv \
    --notion-token $NOTION_TOKEN
Store the token as an environment variable — never hard-code it:
export NOTION_TOKEN=ntn_...
The demo uses the Multi-Region MMM Dataset for eCommerce Brands published on Figshare under CC BY 4.0. Results are illustrative and not from a real brand.
prepare.py supports three data sources via config.json:
| Source | Config key | Install |
|---|---|---|
| CSV file | "source": "csv" | none |
| BigQuery | "source": "bigquery" | pip install google-cloud-bigquery |
| Google Sheets | "source": "gsheet" | pip install gspread google-auth |
git clone https://github.com/ScarlettQiu/prismmmm.git
cd prismmmm
pip install -r requirements.txt
Install JAX and the full model libraries (recommended):
pip install jax jaxlib lightweight_mmm pymc-marketing
All of these install cleanly on Python 3.10 (CPU, Apple Silicon and x86). Without them, LightweightMMM and PyMC fall back to scipy NNLS.
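The fallback path can be illustrated with scipy's NNLS solver — function names here are mine, not the repo's:

```python
import numpy as np
from scipy.optimize import nnls

def fit_nnls_fallback(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Non-negative least squares: a fallback fit for when the full Bayesian
    libraries are unavailable. Coefficients are constrained >= 0, matching
    the positive-ROI assumption for media channels."""
    coefs, _residual = nnls(X, y)
    return coefs

def have_full_models() -> bool:
    """True only if the optional model libraries import cleanly."""
    try:
        import jax  # noqa: F401
        import pymc_marketing  # noqa: F401
        return True
    except ImportError:
        return False
```
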
Option A — use discover.py (recommended for any new dataset):
python discover.py --source csv --path ./your_data.csv \
--notion-token $NOTION_TOKEN # optional — enriches with Notion knowledge
This auto-detects columns, generates config.json and metadata.json, and pulls your knowledge layer from Notion.
Option B — use the included Conjura eCommerce MMM dataset:
data.csv in this repo is a ready-to-use sample: 132 weekly observations for an Apparel brand (2021–2024), with 8 Google + Meta spend channels and revenue as the KPI. Already profiled and ready to run.
Source: Multi-Region Marketing Mix Modeling MMM Dataset for Several eCommerce Brands — Conjura via Figshare. Contains 93 brands across multiple regions and verticals.
# Already included — just run the loop
Open a claude terminal session in this directory and enter:

Read program.md and run the loop.
Claude orchestrates all five agents. Round 1 runs the Data Explorer first:
PrismMMM starting. Reading state.json…
Round 1: skipping Tuner (no prior results)
Spawning Data Explorer...
EXPLORATION_DONE: rounds/R01_data_exploration.md
Readiness score: 3/5 — 12 periods, August anomaly flagged
Running models... ridge ✓ pymc ✓ lightweight_mmm ✓
Spawning Analyst...
ANALYSIS_DONE: rounds/R01_analysis.md
Spawning Critic...
REVISE: 0% media contribution must be labelled as model failure, not neutral finding
Analyst revising...
APPROVED
Spawning Reporter...
REPORT_DONE: results/report.md + results/report.pptx
Round 1 complete.
Each round the Tuner tries one config improvement:
# New rows appended — re-profile, keep round history
python discover.py --source csv --path ./data.csv --notion-token $NOTION_TOKEN
Read program.md and run the loop.
# BigQuery source
python discover.py --source bigquery \
--query "SELECT * FROM project.dataset.mmm_weekly" \
--project my-gcp-project \
--notion-token $NOTION_TOKEN
Each round the agent tried one config change and the Critic evaluated whether the results were trustworthy. Here is what happened:
| Round | Change | Best MAPE | Key Finding |
|---|---|---|---|
| 1 | Baseline | 23.21% | Data Explorer flagged 5 KPI anomalies, 76% zero-spend on Google Shopping |
| 2 | adstock_max_lag 2 → 1 | 20.39% | Short lag better for digital channels — MAPE improved 2.8pp |
| 3 | hill_ec 0.5 → 0.3 | 13.05% | Lower saturation threshold unlocked attribution — MAPE improved 7.3pp |
| 4 | Per-channel adstock decays (from Notion) | 13.12% | Meta Facebook achieved ✅ High cross-model agreement (CV 71% → 7.9%) for the first time |
Rounds 1–3 used a single global adstock decay of 0.4 for all channels. But paid search decays in days (intent-driven), while video builds brand awareness over weeks. Applying a uniform decay rate misrepresents how different media types carry over — and the models can’t figure this out from data alone.
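The effect of the decay parameter can be seen in a geometric adstock transform — a minimal illustration of the carryover being tuned here, not the repo's exact implementation:

```python
import numpy as np

def geometric_adstock(spend: np.ndarray, decay: float, max_lag: int = 4) -> np.ndarray:
    """Carry a fraction `decay` of each period's effect into following
    periods, up to max_lag periods back."""
    weights = decay ** np.arange(max_lag + 1)
    out = np.zeros(len(spend))
    for t in range(len(spend)):
        lo = max(0, t - max_lag)
        window = spend[lo:t + 1][::-1]          # most recent period first
        out[t] = float(window @ weights[:len(window)])
    return out

pulse = np.array([100.0, 0.0, 0.0, 0.0])
geometric_adstock(pulse, decay=0.2)   # search-like: effect nearly gone by week 2
geometric_adstock(pulse, decay=0.7)   # video-like: effect persists for weeks
```
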
Round 4 connected a Notion knowledge layer via MCP. Business context was added to Notion (per-channel decay benchmarks, purchase cycle, seasonality) and discover.py pulled it into metadata.json automatically at the start of the round. The Tuner then applied the domain-informed decay rates:
"channel_adstock_decays": {
"google_search": 0.2, ← intent-driven, decays in days
"google_video": 0.7, ← brand building, multi-week carryover
"google_display": 0.6, ← awareness, medium carryover
"meta_facebook": 0.5, ← social, 2–3 week carryover
"meta_instagram": 0.5
}
Result: Meta Facebook’s cross-model disagreement dropped from CV=71% to CV=7.9% — the first channel to reach ✅ High agreement across models. This is domain knowledge the model could not derive from 132 rows of data on its own.
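Cross-model agreement as a coefficient of variation can be computed like this. This is one plausible definition (std/mean across the three models' ROI estimates, in percent); the repo's exact thresholds for ✅ High agreement are not shown here:

```python
import numpy as np

def agreement_cv(rois: dict[str, float]) -> float:
    """Coefficient of variation of one channel's ROI across models,
    in percent. Lower = tighter cross-model agreement."""
    vals = np.array(list(rois.values()), dtype=float)
    return float(100.0 * vals.std() / vals.mean())
```
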
See the full comparison: Round 3 vs Round 4
No single MMM model is right in all situations. Each makes different assumptions about how media drives sales.
1. No model is always correct Ridge is fast and transparent but can shrink correlated channels to zero. PyMC captures diminishing returns and uncertainty but is slow and sensitive to prior choices. LightweightMMM enforces positive-only ROI but may over-attribute to correlated channels. Each has blind spots the others don’t share.
2. Agreement builds confidence, disagreement reveals risk When all three rank the same channel as top performer, you can act. When they disagree, that’s a diagnostic — thin data, collinearity, or a modelling assumption worth questioning. A single model can’t tell you this.
3. Different models suit different situations
| Situation | Best model |
|---|---|
| Quick first pass, any data size | Ridge — runs in seconds |
| Small dataset (<30 periods) | PyMC — priors compensate for thin data |
| Production budget decisions | PyMC — full credible intervals |
| Need positive-constrained estimates fast | LightweightMMM |
| Final validation | All three — consensus = trustworthy |
| Model | Method | Uncertainty | Requires |
|---|---|---|---|
| Ridge | Regularised regression + bootstrap (200 samples) | Confidence intervals | sklearn only |
| PyMC | Full Bayesian with DelayedSaturatedMMM | Posterior distribution | pip install pymc-marketing |
| LightweightMMM | Google’s JAX-based Hill + adstock | Posterior samples | pip install lightweight_mmm |
Ridge always runs with no extra dependencies. LightweightMMM and PyMC fall back to scipy NNLS only if JAX / pymc-marketing are not installed.
Role: EDA on the raw dataset before any model training — runs once per dataset (Round 1 only).
Produces a structured report covering: dataset overview, KPI distribution, channel spend analysis, pairwise collinearity (Pearson r + VIF), anomaly detection (z > 3σ), multi-entity check, and a 1–5 readiness score with specific recommended actions. The Analyst and Critic read this report every round to ground their interpretation in data quality facts.
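The collinearity piece of this EDA can be sketched with statsmodels — an illustration of the check itself, not the agent's actual code:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(spend: pd.DataFrame) -> pd.Series:
    """VIF per channel; values above roughly 5-10 signal collinearity
    that will confuse attribution between channels."""
    X = spend.assign(const=1.0).to_numpy()   # intercept column for the regressions
    vifs = [variance_inflation_factor(X, i) for i in range(spend.shape[1])]
    return pd.Series(vifs, index=spend.columns)
```
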
Role: Iterates model configuration between rounds to improve fit.
Proposes exactly one config change per round — adstock decay, Hill slope, or PyMC sampling depth. One change per round keeps experiments comparable.
Decision rules cover adjustments to adstock_max_lag and hill_slope.

Role: Interprets raw model numbers into a business narrative.
Reads model output + EDA report. Covers ROI rankings, model agreement/disagreement, contribution plausibility, and what the data can and cannot support. Under 400 words, always cites actual numbers.
Role: Quality gate — challenges the Analyst before anything reaches the report.
Runs six checks. Issues REVISE with a specific reason if any check fails. Analyst fixes once, Critic re-reviews. Max one revision cycle — no infinite loops.
| Check | What it catches |
|---|---|
| Overfitting | R²=1.0 on small samples |
| Sign correctness | Negative ROI despite confirmed spend |
| Contribution plausibility | Media <5% or >80% of KPI |
| Consensus honesty | Analyst ignored model disagreements |
| Collinearity | Channels that co-moved, confusing attribution |
| Sample size caveat | Limitation not clearly communicated |
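A few of these checks can be expressed as simple predicates. The thresholds and field names here are illustrative assumptions, not the Critic's actual prompt logic:

```python
def critic_checks(fit: dict, analysis: str) -> list[str]:
    """Illustrative subset of the six checks; empty list means APPROVED."""
    issues = []
    if fit["r2"] >= 0.999 and fit["n_obs"] < 50:
        issues.append("overfitting: near-perfect R² on a small sample")
    if any(roi < 0 for roi in fit["roi"].values()):
        issues.append("sign: negative ROI on a channel with confirmed spend")
    share = fit["media_contribution_pct"]
    if share < 5 or share > 80:
        issues.append(f"plausibility: media contribution {share:.0f}% outside 5-80%")
    if fit["n_obs"] < 100 and "limitation" not in analysis.lower():
        issues.append("sample-size caveat missing from the analysis")
    return issues
```
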
Role: Translates the approved analysis into stakeholder language.
Only runs after APPROVED. Plain English, no jargon, no model names in the headline. Uses business-friendly channel labels from the Notion knowledge layer. Produces report.md + report.pptx.
| File | Contents |
|---|---|
| metadata.json | Dataset profile + Notion knowledge layer (channels, ROI ranges, known issues) |
| rounds/R01_data_exploration.md | EDA report: collinearity, anomalies, readiness score |
| results/report.md | Final stakeholder report (plain English) |
| results/report.pptx | PowerPoint deck with model overviews, ROI charts, recommendations |
| results/roi_comparison.csv | Channel × model ROI table |
| results/contribution_comparison.csv | Channel × model contribution % |
| results/model_fit.csv | R², train MAPE, test MAPE per model |
| results/latest.json | Full raw results (latest round) |
| rounds/R{N}_results.json | Raw model output per round |
| rounds/R{N}_tuning.md | Tuner's config change log |
| rounds/R{N}_analysis.md | Analyst's interpretation |
| rounds/R{N}_review.md | Critic's six-check review |
| state.json | Current round, best scores, run history |
prismmmm/
├── program.md ← orchestrator (start here)
├── discover.py ← auto-profiles dataset, fetches Notion knowledge layer
├── config.json ← dataset + model parameters (auto-generated by discover.py)
├── metadata.json ← dataset profile + Notion knowledge (read by all agents)
├── prepare.py ← multi-source data loader (CSV / BigQuery / GSheet)
├── run_models.py ← runs all 3 models, saves results
├── compare.py ← ROI/contribution comparison, agreement scoring
├── report_builder.py ← generates report.md + report.pptx
├── state.json ← round counter, best scores
├── data_dictionary.csv ← optional: your column descriptions (imported by discover.py)
├── agents/
│ ├── data_explorer.md ← EDA agent (Round 1 only)
│ ├── analyst.md ← interprets results, writes narrative
│ ├── critic.md ← six-check quality gate
│ ├── tuner.md ← iterates config between rounds
│ └── reporter.md ← plain-English report for stakeholders
├── models/
│ ├── ridge_mmm.py ← Ridge + bootstrap
│ ├── pymc_mmm.py ← Bayesian MMM (DelayedSaturatedMMM)
│ └── lightweight_mmm.py ← Google LightweightMMM / NNLS fallback
└── requirements.txt
Requires Claude Code (claude) for the agent loop.

Core dependencies:

numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.3.0
scipy>=1.10.0
tabulate>=0.9.0
python-pptx>=0.6.21
statsmodels>=0.14.0
Recommended (enables full LightweightMMM and PyMC models):
jax>=0.6.0
jaxlib>=0.6.0
lightweight_mmm>=0.1.9
pymc-marketing
Optional (for non-CSV data sources):
google-cloud-bigquery # BigQuery
gspread google-auth # Google Sheets
MIT License
Copyright (c) 2026 ScarlettQiu
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.