Robustness lab · stress-tested market evidence

Every completed robustness test confirms the headline.

We tried to break the market result. It held.

Bottom line. The headline finding rests on three things any reader can check: what the proxy actually says, who ExxonMobil's shareholders are, and how the stock moved on announcement day. This page goes further. It runs the announcement-day analysis through the advanced tests a finance Ph.D. would request: alternative factor models, alternative volatility corrections, alternative multiple-testing corrections, and a 20-firm cohort comparison. None of them changes the answer. The market priced no robust shareholder-rights discount, and the harder tests confirm it.

Estimation design

Estimation window (T−260 to T−11): 250 days

Baseline event window: [−1, +1]

Pre-period σ (daily): 0.77%

Event date: March 10, 2026

Estimation window: 250 trading days (T−260 through T−11; v1.3 canonical). Baseline event window: [−1, +1] trading days. The v1.3 inventory contains 54 tests in total: an 18-cell core multi-window battery (3 benchmarks × 6 windows), plus 8 stress tests, placebo permutations, cohort tests, pre-trend diagnostics, equivalence tests, and robustness checks. The full inventory is in reviewer_package/results/TEST_INVENTORY_2026-05-17.txt. P-values are reported under Patell, Corrado generalized rank, and wild bootstrap (10,000 draws) inference. All p-values are then corrected via Benjamini–Hochberg.
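The window arithmetic above can be checked with a short sketch. It assumes only the inclusive event-time offsets stated in the text; the helper name `window_offsets` is hypothetical and not part of the replication kit.

```python
# Event-time offsets are trading days relative to the event date (T = 0).
EST_START, EST_END = -260, -11   # estimation window from the text, inclusive
EVT_START, EVT_END = -1, 1       # baseline event window from the text, inclusive

def window_offsets(start, end):
    """Return the inclusive list of event-time offsets from start to end."""
    return list(range(start, end + 1))

estimation = window_offsets(EST_START, EST_END)
event = window_offsets(EVT_START, EVT_END)

# Trading days that fall strictly between the two windows (T-10 through T-2).
gap = EVT_START - EST_END - 1

print(len(estimation))  # 250
print(len(event))       # 3
print(gap)              # 9
```

The nine-day gap keeps the days immediately before the announcement out of the estimation sample, so they cannot affect the benchmark fit.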

Article of record vs. v1.3 replication kit

The specifications on this page are the v1.3 kit canonical (May 17, 2026): 250-day pre-window, 20-firm donor pool, FF6 R² = 0.32 / FF6+BNO R² = 0.62, GARCH(1,1) z = −1.02, min BH-corrected p = 0.186. The May 5, 2026 publication of record (Goodwin, Columbia Law School Blue Sky Blog) reports the earlier specifications used in footnote 27 (60-day pre-window, 10-firm pool, SPY-only R² = 0.22 / SPY+BNO R² = 0.55, GARCH z = −1.13, min BH p = 0.82). Both sets of specifications classify the result as Null on every test. The reconciliation table appears in reviewer_package/EXTERNAL_RED_TEAM_FINDINGS_2026-05-17.md; an article erratum on fn. 27 is targeted for June 1, 2026.

Primary specification panel

Five focal specifications. The matched-pair test is the most informative because it absorbs the dominant oil-shock confounder.

#	Specification	Benchmark	Window	Day-0 AR	Patell p	Corrado p	Wild boot. p	BH-corrected p	Outcome
1	Synthetic control (ADH)	20-firm energy donor pool	Day 0	+0.15 pp/day	—	—	—	0.286 (250-day pre, canonical)	Null
2	Matched-pair	Chevron	Day 0	+0.04 pp	0.958	—	0.967	0.96	Null
3	Oil-augmented (FF6+BNO)	Fama-French six-factor + Brent	Day 0	−2.01 pp	0.046	0.057	0.119 (Romano-Wolf)	0.186 (BH across 4 BHAR windows)	Null after BH
4	Oil-augmented (FF6+BNO)	Fama-French six-factor + Brent	[−1, +5]	−1.05 pp	—	—	—	0.186	Null
5	Synthetic control (ADH)	20-firm energy donor pool	[−1, +5]	+1.33 pp	—	—	—	—	Null
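For readers unfamiliar with the BH-corrected column, here is a minimal sketch of the standard Benjamini–Hochberg step-up adjustment. The input p-values are hypothetical and chosen only for illustration; they are not the kit's values, and the kit's own implementation may differ.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for a family of four windows.
raw = [0.04, 0.30, 0.12, 0.50]
for p, q in zip(raw, bh_adjust(raw)):
    print(f"raw p = {p:.2f}  ->  BH-adjusted p = {q:.2f}")
```

In this toy family the smallest raw p-value (0.04) is multiplied by 4/1 and no longer clears 0.05 after adjustment, which is the same pattern as a specification labelled "Null after BH".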

The 54-test inventory

Composition: an 18-cell core multi-window battery (3 benchmarks × 6 windows), plus 8 stress tests, placebo permutations (cross-firm and in-time), cohort difference-in-differences, pre-trend diagnostics, leakage-inclusive windows, equivalence tests, and robustness checks. The three benchmarks are market, FF3, and oil-augmented. The six windows are [0], [−1, +1], [−1, +5], [−2, +5], [−5, +5], and [0, +1].
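The core battery is simply the cross product of benchmarks and windows. A small sketch, using the labels from the text as plain strings and tuples (the variable names are illustrative only):

```python
from itertools import product

benchmarks = ["market", "FF3", "oil-augmented"]
# Event windows as (start, end) offsets in trading days, inclusive.
windows = [(0, 0), (-1, 1), (-1, 5), (-2, 5), (-5, 5), (0, 1)]

# One cell per (benchmark, window) pair.
core_grid = list(product(benchmarks, windows))
print(len(core_grid))  # 18

# Length of each window in trading days, counting both endpoints.
lengths = {w: w[1] - w[0] + 1 for w in windows}
print(lengths[(-5, 5)])  # 11
```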

Tests in the v1.3 inventory: 54

Full battery, every spec a finance reviewer would request

Tests returning a real effect: 0

Zero of 54; not one survives correction

Min BH-corrected p (core battery): 0.186

Far above the 0.05 significance line

Conclusion: Null

No effect survives multiple-testing correction on any test

Robustness battery

Check	Statistic	p / interpretation
Nested F-test (market model vs. oil-augmented, 1 d.f.)	F(1, 217) = 158.5	p < 10⁻¹⁶; R² jumps from 0.22 to 0.55
Wild bootstrap (10,000 draws, Rademacher weights)	—	p = 0.967
GARCH(1,1) (volatility-clustered variance)	z = −1.02	p > 0.20
HC3 robust SE (heteroskedasticity-consistent)	—	p = 0.281
Pre-trend F (no pre-announcement drift)	F = 0.052	p = 0.95
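Two rows of the table can be sanity-checked by hand. The sketch below uses the textbook relationship between a nested F statistic and the two R² values, and assumes a standard normal reference distribution for the GARCH z-statistic; neither assumption is confirmed by the kit, and the function names are hypothetical.

```python
from statistics import NormalDist

def nested_f(r2_restricted, r2_full, q, df_resid):
    """F statistic for q added regressors, computed from the two R-squared values."""
    return ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_resid)

# Rounded R-squared values and degrees of freedom as printed in the table.
f_stat = nested_f(0.22, 0.55, q=1, df_resid=217)
print(round(f_stat, 1))  # 159.1

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal reference."""
    return 2.0 * NormalDist().cdf(-abs(z))

p_garch = two_sided_p(-1.02)
print(round(p_garch, 3))  # 0.308
```

The rounded inputs give an F of about 159, close to the reported 158.5; a gap of that size is what rounding R² to two decimals would produce. The z of −1.02 maps to a two-sided p of roughly 0.31, consistent with the reported p > 0.20.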

Minimum detectable effect (80% power, two-sided α = 0.05)

Market model: ±4.01 pp

Oil-augmented: ±3.11 pp

Matched-pair: ±2.38 pp
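A common approximation for the minimum detectable effect is MDE = (z at 1 − α/2 plus z at the target power) × SE. The sketch below assumes that normal approximation, which the source does not confirm as the kit's method. It computes the multiplier at 80% power and two-sided α = 0.05, then backs out the standard error each reported MDE would imply under that assumption.

```python
from statistics import NormalDist

def mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect under a normal approximation (two-sided test)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se

# With SE = 1 the result is the pure multiplier (about 1.96 + 0.84).
multiplier = mde(1.0)
print(round(multiplier, 2))  # 2.8

# Reported MDEs in percentage points; the implied SEs are illustrative only.
reported = {"Market model": 4.01, "Oil-augmented": 3.11, "Matched-pair": 2.38}
implied_se = {name: value / multiplier for name, value in reported.items()}
for name, se in implied_se.items():
    print(f"{name}: implied SE = {se:.2f} pp")
```

On this reading, a tighter benchmark shrinks the residual standard error, which is why the matched-pair design can detect smaller effects than the market model.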

TOST equivalence — affirmative non-inferiority test

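The two one-sided tests procedure (TOST; Schuirmann, 1987) is applied here to the inferiority hypothesis: that the true effect is more negative than −Δ. Rejecting that hypothesis establishes non-inferiority, meaning the data rule out a negative effect larger than Δ. Establishing full equivalence within ±Δ would additionally require rejecting the mirror-image hypothesis that the true effect exceeds +Δ.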

Plain English. TOST "rejects inferiority" means the data affirmatively establish that the true effect is no worse than the stated bound. A standard hypothesis test fails to reject the null of "no effect"; TOST goes further and shows the data are inconsistent with any effect more negative than −Δ. Rejection is the favorable outcome.

Bound Δ	p-value	Result
±1.5 pp	0.044	Rejects inferiority
±2.0 pp	0.011	Rejects inferiority
±3.0 pp	0.0003	Rejects inferiority
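The logic of the table can be sketched as a one-sided z-test against the lower bound. The example assumes a normal approximation and uses a hypothetical point estimate and standard error, so its p-values will not match the table; the function names are illustrative and not taken from the kit.

```python
from statistics import NormalDist

def noninferiority_p(estimate, se, delta):
    """One-sided p-value for H0: effect <= -delta against H1: effect > -delta."""
    z = (estimate + delta) / se
    return 1.0 - NormalDist().cdf(z)

def tost_p(estimate, se, delta):
    """Full TOST p-value: the larger of the two one-sided p-values."""
    p_lower = noninferiority_p(estimate, se, delta)
    # Mirror-image test, H0: effect >= +delta against H1: effect < +delta.
    p_upper = NormalDist().cdf((estimate - delta) / se)
    return max(p_lower, p_upper)

# Hypothetical inputs in percentage points (not the kit's estimates).
est, se = 0.0, 1.0
for delta in (1.5, 2.0, 3.0):
    p = noninferiority_p(est, se, delta)
    print(f"delta = {delta:.1f} pp: one-sided p = {p:.4f}")
```

Wider bounds are easier to reject against, so the p-value falls as Δ grows, which is the same ordering the table shows.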

Read together: the standard null tests fail to reject the null of no effect. The TOST tests affirmatively reject the inferiority alternative at three bounds. The Bayesian posterior agrees: P(effect < −2 pp) = 0.004.