The papers below represent published research across three practice areas: expert witness consulting, program evaluation, and regulatory analytics. Each paper is written for practitioners navigating real decisions — whether in litigation, program oversight, or regulatory governance — and addresses a specific analytical or accountability challenge within its domain. All papers are free to download and share.

Expert Witness Publications

Ten Questions To Ask Before You Trust an AI Expert Witness: A Due Diligence Guide for Legal Professionals

Before you retain an AI expert witness — or challenge one — you need to know what separates rigorous analysis from analysis that merely looks rigorous. This guide provides ten diagnostic questions, accessible to non-technical readers, that surface the difference.

Evaluating AI/ML Models in Legal Disputes: A Practitioner's Framework for Technical Scrutiny

A five-dimension framework — validity, feature appropriateness, fairness, robustness, and governance — for technically rigorous AI/ML model evaluation in legal proceedings. Includes guidance on discovery, expert evaluation, and Daubert considerations.

Proxy Discrimination and Insurance AI: Technical Standards for Detection and Legal Scrutiny

Explains why traditional disparate impact analysis fails for ML models in insurance, presents a rigorous detection methodology using external demographic data linkage and adversarial stress testing, and identifies the gap between NAIC Model Bulletin requirements and technically sound analysis.

The Vendor Said It Was Fair: Third-Party AI Models and the Governance Failures That Create Litigation Exposure

When an AI system causes harm, "the vendor said it was fair" increasingly fails as a legal and regulatory defense. This paper examines why vendor representations are not a substitute for independent governance, how to identify third-party AI governance failures in discovery, and what responsible oversight of vendor AI systems actually requires.

What the Model Remembers: Historical Bias in Training Data and Why Cleaning Inputs Doesn’t Clean Outputs

Removing race from an AI model does not make the model race-neutral — it makes the discrimination harder to see. This paper explains the mechanism by which historical inequality becomes structurally embedded in AI systems, why standard remediation approaches frequently fail to address it, and what the litigation and governance implications are for the cases that result.

Regulatory Analytics Publications

Regulating in the Dark: Why Agencies Don’t Evaluate Their Own Rules - and What It Costs

Regulatory agencies collect enormous quantities of data — and almost none of it is used to answer the question that matters most: is the regulation actually working? This paper examines why outcome evaluation is structurally absent from most state regulatory agencies, what it costs in concrete terms, and what agency leaders, legislators, and funders can each do to close the gap.

The Sophistication Gap: When Regulated Entities Know More Than Their Regulators

Regulated industries routinely submit analyses that are more technically sophisticated than the agencies reviewing them can fully evaluate — a structural asymmetry that erodes the substance of oversight without producing any single visible failure. This paper examines how the gap develops, why the standard responses fall short, and what a credible analytical capacity strategy for regulatory agencies actually requires.

The Average Is Not The Answer: Why Aggregate Metrics Fail Regulatory Oversight

Regulatory programs can hit every aggregate performance target while failing the very populations they were designed to protect — because market-level averages systematically conceal what is happening to specific consumers, geographic areas, and market segments. This paper examines the statistical mechanisms behind that failure, where it manifests across regulatory domains, and what agencies, legislators, and funders can each do to embed disaggregated analysis in how oversight actually works.

Program Evaluation Publications

Why Evaluations Fail Before They Start - Design Decisions That Compromise Findings

Program evaluation fails most often not because of bad analysis, but because of bad design — the vague question that cannot be falsified, the missing comparison condition, the evaluator engaged too late, the outcome metrics chosen for convenience rather than meaning. This paper identifies the five design decisions that most reliably compromise evaluation quality before the work begins, and what commissioners, program managers, and funders can each do to protect it.

Counting the Wrong Things - How Output Metrics Displace Outcome Evidence

Most programs measure what they deliver — participants served, sessions completed, dollars spent — and report those numbers as evidence that the program works. This paper examines why output metrics have displaced outcome evidence across government programs, nonprofits, and foundation-funded initiatives, what that substitution costs the populations programs are designed to serve, and what commissioners, funders, and program managers can each do to build accountability systems that actually answer whether programs work.