Round 3 · Critic Approved ✓

PrismMMM

Autonomous Marketing Mix Modelling — three independent models,
five specialist agents, zero manual tuning.

13.1%
Best Test MAPE
2.9×
Meta Instagram ROI
3
Rounds Run
44%
MAPE Reduction
PrismMMM · Architecture

Architecture

Five specialist agents coordinate autonomously — no human involvement between rounds. Each agent has a single role so no agent can both produce and approve its own output.

The 5-Agent Loop
🔍
Data Explorer · Round 1 only
EDA on raw data — collinearity, anomalies, VIF, readiness score (1–5)
→ EXPLORATION_DONE
⚙️
Tuner · Round 2+
Reads prior fit metrics, proposes one config change per round (adstock, Hill slope, samples)
→ CONFIG_UPDATED or NO_CHANGE
🧮
Models · every round
Ridge + bootstrap · PyMC Bayesian · LightweightMMM/NNLS — run in sequence, results saved to JSON
→ results/latest.json
📊
Analyst
ROI rankings, model agreement, contribution plausibility, business narrative — under 400 words
→ ANALYSIS_DONE
🔎
Critic
6-point quality gate: overfitting, sign correctness, plausibility, consensus honesty, collinearity, sample size
→ APPROVED or REVISE
📝
Reporter · after APPROVED
Plain English for CMO audience — no jargon, no model names in headline. Generates report.md + report.pptx
→ REPORT_DONE
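Stubbed as plain Python, the round loop above might look like this. The function and state keys are hypothetical stand-ins (the real agents are autonomous LLM calls); only the signal strings and the results path come from the cards above:

```python
def run_round(round_num, state):
    """One pass through the pipeline: Explorer (round 1) or Tuner (round 2+),
    then Models, Analyst, and the Critic's verdict."""
    if round_num == 1:
        state["explorer"] = "EXPLORATION_DONE"
    else:
        state["tuner"] = "CONFIG_UPDATED"        # one config change per round
    state["results"] = "results/latest.json"     # all three models write shared JSON
    state["analyst"] = "ANALYSIS_DONE"
    return "APPROVED"                            # Critic's 6-point gate, stubbed here

state = {}
verdicts = [run_round(r, state) for r in range(1, 4)]
if verdicts[-1] == "APPROVED":
    state["reporter"] = "REPORT_DONE"            # Reporter runs only after APPROVED
```

The key design property survives even in the stub: the Critic's verdict gates the Reporter, so no agent approves its own output.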
Why Three Models?
Ridge
Regularised regression + 200-sample bootstrap. Runs in seconds. Always runs — no extra dependencies.
🎲
PyMC (Bayesian)
Full posterior distributions. Handles small samples via priors. Gold standard for production budget decisions.
LightweightMMM / NNLS
Positive-constrained estimates — no negative ROI. JAX-accelerated when available, NNLS fallback otherwise.
🎯
Agreement = confidence. Disagreement = diagnostic.
Where all three agree on a channel, act with confidence. Where they disagree — that's the models telling you the data is thin, channels are collinear, or an assumption is wrong. No single model can tell you this.
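As a sketch, the agreement check can be as simple as measuring per-channel ROI spread across models. `agreement` is a hypothetical helper and the ROI numbers are illustrative, not from this run:

```python
def agreement(rois_by_model, rel_tol=0.5):
    """Split channels into agreed / disputed: agreed means every model's ROI
    sits within rel_tol of the cross-model mean (and the mean is positive)."""
    channels = rois_by_model[next(iter(rois_by_model))].keys()
    agreed, disputed = [], []
    for ch in channels:
        vals = [m[ch] for m in rois_by_model.values()]
        mean = sum(vals) / len(vals)
        if mean > 0 and all(abs(v - mean) <= rel_tol * mean for v in vals):
            agreed.append(ch)
        else:
            disputed.append(ch)
    return agreed, disputed

rois = {
    "ridge": {"meta_fb": 1.6, "google_display": -0.5},
    "nnls":  {"meta_fb": 1.8, "google_display": 0.9},
}
agreed, disputed = agreement(rois)  # meta_fb agreed; google_display disputed
```

Disputed channels are the diagnostic: a sign flip like google_display's usually points at collinear spend, not a real negative effect.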
To run
Read program.md and run the loop.
Notion Knowledge Layer
Notion · 3 Databases: Field Definitions · Business Context · Known Issues
→ discover.py · Auto-profiling: Detects columns · Pulls Notion context · Generates config
→ metadata.json · Shared Context: Expected ROI ranges · Seasonality notes · Data quality flags
→ All 5 Agents · Live Knowledge: No code changes; business teams update Notion directly
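A discover.py-style profiling step could be sketched like this. The heuristics, field names, and metadata shape are assumptions for illustration, not the actual script:

```python
import json
import re

def profile_columns(columns):
    """Guess media-spend columns and the KPI column from names; purely heuristic."""
    channels = [c for c in columns if re.match(r"(google|meta)_", c)]
    kpi = next((c for c in columns if "revenue" in c), None)
    return {"kpi": kpi, "channels": channels}

def build_metadata(profile, notion_context):
    """Merge the column profile with Notion-sourced business context."""
    return {**profile, **notion_context}

cols = ["week", "revenue_usd", "google_search", "google_shopping", "meta_facebook"]
meta = build_metadata(
    profile_columns(cols),
    {"expected_roi_range": [0.5, 5.0], "seasonality": "May + Dec peaks"},
)
metadata_json = json.dumps(meta)  # would be persisted as metadata.json
```

Because the agents read the merged context rather than hard-coded config, business teams editing Notion change model behaviour without touching code.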
Data Explorer · Round 1

EDA before any model training

✅ Readiness Score: 4 / 5 — 132 weekly periods · 0 missing values · no severe collinearity
Revenue over time (weekly, USD)

Mean $30.3M · Max $114M (holiday spikes) · CV 67% — right-skewed, seasonal pattern

Total spend by channel (2021–2024)
Anomalies detected
2022-05-30
Revenue $114M + google_shopping spend spike — synchronised outlier, likely a promo event
z = 4.1σ
2023-05-29
Revenue $113M — possible annual promotional period
z = 4.1σ
2023-12-11
Revenue $104M — holiday season spike
z = 3.6σ
2023-05-22
Revenue $95M — keep with flag
z = 3.2σ
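Behind each flag is a plain z-score: deviation from the mean revenue in standard-deviation units. A minimal sketch on toy numbers, not the real revenue series:

```python
def flag_anomalies(dates, revenue, z_thresh=3.0):
    """Return (date, z) pairs for weeks more than z_thresh sigmas above the mean."""
    n = len(revenue)
    mean = sum(revenue) / n
    std = (sum((v - mean) ** 2 for v in revenue) / n) ** 0.5
    return [(d, round((v - mean) / std, 1)) for d, v in zip(dates, revenue)
            if (v - mean) / std > z_thresh]

weeks = [f"w{i}" for i in range(21)]
rev = [10.0] * 20 + [200.0]          # one promo-style spike at the end
flags = flag_anomalies(weeks, rev)   # only the spike week is flagged
```

The Explorer keeps flagged weeks rather than dropping them, which is why the later risk card asks for an event indicator instead of deletion.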
Channel correlation with revenue (Pearson r)

google_display (r=0.02) and google_video (r=−0.05) show near-zero KPI correlation — low identifiability.

Flags before modelling
⚠️ google_shopping — 76.5% zero-spend weeks · sparse · may not be identifiable
⚠️ 2022-05-30 — add binary event indicator before attribution
✅ No severe collinearity · max VIF 6.57 · no channel pair |r| > 0.7
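For intuition on the VIF threshold: VIF is 1/(1−R²) from regressing one predictor on the others, and in the two-predictor case R² is simply the squared Pearson correlation. A toy sketch, not the actual channel data:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); values above ~10 are usually called severe."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

vif = vif_two_predictors([1, 2, 3, 4], [2, 1, 4, 3])  # r = 0.6 -> VIF = 1.5625
```

On that scale the run's max VIF of 6.57 corresponds to moderate but tolerable collinearity, which is why the Explorer passed the data.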
01 — The Dataset

Conjura
eCommerce MMM

Weekly spend and revenue across 8 Google + Meta channels for an Apparel brand. KPI: first-purchase revenue in USD.

Source: Figshare · Multi-Region MMM Dataset · 93 brands

132
Weekly observations
128 train · 4 holdout
8
Media channels
Google Search, Shopping, PMax, Display, Video · Meta FB, IG, Other
$30M
Average weekly revenue
USD — peak $114M (holiday)
5
Revenue anomalies flagged
Seasonal spikes · z > 3σ · May & Dec peaks
02 — Three Models

Each model has a different blind spot

Agreement is signal. Disagreement is a diagnostic. Running all three is the only way to know which result to trust.

Ridge

frequentist · L2 regularisation
Strengths
Fast, runs in seconds
Best out-of-sample MAPE this run
Limitations
Alpha=109.85 zeroed all channel coefficients
Baseline-only model — no channel signal
13.1%
Test MAPE · R²=0.46

NNLS

non-negative least squares · fallback
Strengths
Only model producing channel-level ROI
Enforces positive attribution
Limitations
No uncertainty bounds
Sensitive to outlier spend weeks
13.1%
Test MAPE · R²=0.46

PyMC

bayesian · fallback BLM
Strengths
Principled uncertainty quantification
HalfNormal priors constrain positivity
Limitations
Full MMM timed out — no C compiler
Fallback BLM gives zero channel ROI
13.1%
Test MAPE · R²=0.46
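The positivity constraint NNLS enforces can be illustrated with a tiny coordinate-descent solver. This is a stand-in for scipy.optimize.nnls on toy data, not the pipeline's implementation:

```python
def nnls_cd(X, y, iters=200):
    """Non-negative least squares via projected coordinate descent (illustrative)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Residual with feature j's own contribution removed
            resid = [yi - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                     for i, yi in enumerate(y)]
            num = sum(X[i][j] * resid[i] for i in range(len(y)))
            den = sum(X[i][j] ** 2 for i in range(len(y)))
            beta[j] = max(0.0, num / den) if den else 0.0  # clamp at zero
    return beta

# A channel whose best unconstrained coefficient would be negative gets clamped to 0
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 2.0]
beta = nnls_cd(X, y)
```

The clamp is exactly why NNLS can never report a negative ROI: implausible negatives show up as zeroed channels instead.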
03 — Autonomous Tuning

Three rounds, one change per round

The Tuner reads fit metrics and picks one parameter to change; the models re-run, and the Critic approves or requests revision before the next round starts.

R1
23.2%
Baseline
lag=2 · slope=1.5
APPROVED
R2
20.4%
adstock lag
2 → 1 ▼12%
APPROVED
R3
13.1%
hill_ec
0.5 → 0.3 ▼36%
APPROVED ✓
Best MAPE 23.2% → 20.4% → 13.1% ✓ — 44% total reduction · 3 rounds
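The two knobs the Tuner turned are standard MMM transforms: geometric adstock (carryover across a lag window) and Hill saturation (diminishing returns, with half-effect at the ec point). A minimal sketch using the Round 3 values:

```python
def geometric_adstock(spend, decay=0.5, lag=1):
    """Each week inherits geometrically decayed spend from up to `lag` prior weeks."""
    return [sum(spend[t - k] * decay ** k for k in range(lag + 1) if t - k >= 0)
            for t in range(len(spend))]

def hill(x, ec=0.3, slope=1.5):
    """Hill saturation on normalised spend x: returns 0.5 exactly at x == ec."""
    return x ** slope / (x ** slope + ec ** slope)

carry = geometric_adstock([1.0, 0.0, 0.0], decay=0.5, lag=1)  # [1.0, 0.5, 0.0]
half = hill(0.3, ec=0.3, slope=1.5)                            # 0.5 by construction
```

Shortening the lag (Round 2) cuts how long a spend pulse lingers; lowering ec (Round 3) makes channels saturate earlier, i.e. more of the observed spend range sits on the flat part of the curve.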
04 — Channel ROI

Meta dominates.
Two models agree.

First cross-model signal: Ridge and NNLS both rank Meta Facebook #1 and Meta Instagram #2. ROI shown is NNLS; Ridge values are similar.

2.9×
Meta Instagram ROI · NNLS · Round 3
Meta confirmed
by two
models.
Meta Facebook 37–42% · Meta Instagram 10–20% · Ridge + NNLS agree · Round 3
1.7×
Meta Facebook ROI (cross-model mean)
47–62%
Meta revenue share across models
2.9×
Meta Instagram ROI · best channel
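ROI here is attributed revenue divided by spend. The attributed-revenue figures below are back-calculated from the reported multiples and the spend levels cited in the recommendations ($226M and $624M), purely for illustration:

```python
def channel_roi(attributed_revenue_m, spend_m):
    """ROI multiple per channel: attributed revenue / spend, both in $M."""
    return {ch: round(attributed_revenue_m[ch] / spend_m[ch], 1) for ch in spend_m}

roi = channel_roi(
    {"meta_instagram": 655.4, "meta_facebook": 1060.8},  # implied by 2.9x and 1.7x
    {"meta_instagram": 226.0, "meta_facebook": 624.0},
)
```

This is also why Instagram can be the more efficient channel despite Facebook's larger contribution share: the denominator matters as much as the attribution.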
05 — Contribution Breakdown

How each model attributes GMV

NNLS Contribution (Round 3)
67% MEDIA SHARE
Meta Facebook — 42.5%
Meta Instagram — 20.1%
Google Shopping — 4.5%
Baseline / Organic — 33%

First cross-model signal

Ridge and NNLS now both attribute positive revenue to Meta Facebook and Meta Instagram — the first agreement across two models. Ridge adds Google Search (19.5%) which NNLS misses. PyMC fallback still returns zero channel attribution. Google Display (−12.8%) and Google Video (−2.6%) show negative Ridge contributions — a sign-confounding artefact from correlated spend, not a real negative effect.

Ridge Contribution
NNLS Contribution
06 — Risks & Caveats

01
⚠️
Outlier Spend Week
2022-05-30: google_shopping spend spiked z=10.2σ and revenue spiked z=4.1σ in the same week. NNLS attributes the full revenue uplift to google_shopping, inflating its ROI to 22.3×. This is almost certainly a data artefact, not a real response curve. Add an event indicator before trusting that estimate.
Needs event flag
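The event-indicator fix is one binary column per flagged week, so the promo uplift is absorbed by the indicator instead of the media coefficients. A sketch with hypothetical row dicts:

```python
def add_event_flag(rows, event_dates):
    """Append a 0/1 is_event column; flagged weeks get 1, all others 0."""
    return [dict(r, is_event=int(r["week"] in event_dates)) for r in rows]

flagged = add_event_flag(
    [{"week": "2022-05-30", "revenue": 114.0},
     {"week": "2022-06-06", "revenue": 30.0}],
    {"2022-05-30"},
)
```

With the indicator in the design matrix, the model can explain the $114M week without bending google_shopping's response curve to fit it.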
02
🧮
Only One Model Has Signal
Ridge (alpha=109.85) and PyMC fallback both zero all channel coefficients. Only NNLS produces channel attribution — making cross-model agreement structurally impossible. Install gcc for PyMC's full MMM and fix Ridge's regularisation before trusting any ROI ranking.
3 models needed
03
📉
4-Period Holdout
Test MAPE is calculated on just 4 weeks. A single outlier week (holiday season, promo) can swing MAPE by 5–10 percentage points. Expand holdout to 8–13 weeks (one full quarter) for reliable out-of-sample evaluation.
Expand holdout
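Why 4 weeks is fragile: a single outlier week dominates a short-window MAPE. A toy comparison with made-up revenue figures:

```python
def mape(actual, pred):
    """Mean absolute percentage error over the holdout weeks."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)

# Same single miss (a $114M spike predicted at $60M), two holdout lengths
short = mape([30.0, 30.0, 30.0, 114.0], [30.0, 30.0, 30.0, 60.0])   # ~11.8%
long_ = mape([30.0] * 12 + [114.0], [30.0] * 12 + [60.0])           # ~3.6%
```

One miss swings the 4-week score by roughly 8 percentage points relative to the 13-week score, which is the instability the risk card is pointing at.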
07 — Recommendations

Three actions, in priority order

01
Meta Instagram and Facebook are the only confirmed channels
Ridge and NNLS both rank Meta Facebook #1 and Meta Instagram #2 — the first cross-model agreement in this project. Meta Instagram delivers more per dollar (NNLS: 2.9×, Ridge: 1.5×) despite lower absolute spend ($226M vs $624M). This is directionally actionable within the Meta portfolio — consider shifting budget toward Instagram at the margin.
02
Do not act on Google channel ROIs yet
Google Display shows −$55 Ridge ROI, Google Video −$4.4 — sign confounding from correlated spend patterns. Google Search shows 30.7× Ridge ROI which is implausibly high (meta_other shows 291× for the same reason: tiny spend amplifies any positive coefficient). These signals need PyMC's priors to stabilise before any reallocation.
03
Install gcc to unlock full Bayesian MMM
Run brew install gcc — this enables PyTensor C compilation, cutting PyMC sampling from 58 minutes to ~5 minutes. With full PyMC running, cross-model consensus on Meta channels would be confirmed with posterior credible intervals — enough to justify a budget reallocation decision at the marketing director level.
Bottom line: First real finding — Meta Facebook and Instagram confirmed by two independent models. Meta Instagram is more efficient per dollar. Google channel ROIs are noisy and need PyMC to stabilise. Install gcc and run Round 4 for board-level confidence.

Dataset: Multi-Region MMM Dataset for eCommerce Brands, Figshare, CC BY 4.0. Results are illustrative and not from a real brand.