COEQ ML Model Testing Checklist

What Google tests — and what your organisation should too.

At COEQ, we believe AI quality is not optional — it is the foundation of trust. One of the most referenced frameworks in the AI testing space comes from Google: a structured set of 28 ML tests. It covers four critical domains — Data, Model Development, Infrastructure, and Monitoring — and it remains the benchmark every serious AI team should measure itself against.

Here is the checklist in full, with COEQ commentary on why each domain matters.

ML Data

Poor data is the silent killer of ML systems. Before a model is ever trained, the quality, governance, and testability of your data pipeline determines everything that follows.

  • Feature expectations are captured in a schema
  • All features are beneficial
  • No feature's cost is too much
  • Features adhere to meta-level requirements
  • The data pipeline has appropriate privacy controls
  • New features can be added quickly
  • All input feature code is tested
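The first item on the list, capturing feature expectations in a schema, is something you can start enforcing with very little machinery. The sketch below shows one possible shape for such a check; the schema format, feature names, and ranges are illustrative assumptions, not part of Google's checklist itself.

```python
# Minimal sketch: validating incoming feature rows against a declared schema.
# FEATURE_SCHEMA, the feature names, and the ranges are illustrative assumptions.

FEATURE_SCHEMA = {
    "age":     {"type": int,   "min": 0,   "max": 120},
    "income":  {"type": float, "min": 0.0, "max": 1e7},
    "country": {"type": str,   "allowed": {"DE", "FR", "US"}},
}

def validate_row(row: dict) -> list:
    """Return a list of schema violations for one feature row."""
    errors = []
    for name, spec in FEATURE_SCHEMA.items():
        if name not in row:
            errors.append(f"missing feature: {name}")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{name}: above maximum {spec['max']}")
        if "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{name}: value {value!r} not allowed")
    return errors

good = {"age": 34, "income": 52000.0, "country": "DE"}
bad  = {"age": -3, "income": 52000.0, "country": "XX"}
print(validate_row(good))  # → []
print(validate_row(bad))   # → two violations: age range, country not allowed
```

A check like this belongs in the data pipeline itself, so violations are caught before training or serving ever sees the row.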

Model Development

A model that performs well in non-production environments but fails in production is a liability, not an asset. These checks ensure your model is robust, fair, and genuinely better than the alternatives.

  • Model specs are reviewed and submitted
  • Offline and online metrics correlate
  • All hyperparameters have been tuned
  • The impact of model staleness is known
  • A simpler model is not better
  • Model quality is sufficient on important data slices
  • The model is tested for considerations of inclusion
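Checking quality on important data slices, rather than only in aggregate, is one of the easiest of these tests to automate. The sketch below shows one way to do it; the slice key, threshold, and toy data are illustrative assumptions.

```python
# Minimal sketch: computing model accuracy per data slice and flagging slices
# that fall below a quality threshold. Slice key, threshold, and the example
# data are illustrative assumptions.

from collections import defaultdict

def accuracy_by_slice(examples, slice_key):
    """Accuracy per slice; each example is (features, label, prediction)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for features, label, prediction in examples:
        s = features[slice_key]
        totals[s] += 1
        if prediction == label:
            hits[s] += 1
    return {s: hits[s] / totals[s] for s in totals}

examples = [
    ({"region": "EU"}, 1, 1),
    ({"region": "EU"}, 0, 0),
    ({"region": "EU"}, 1, 1),
    ({"region": "US"}, 1, 0),   # miss on the US slice
    ({"region": "US"}, 0, 0),
]

per_slice = accuracy_by_slice(examples, "region")
THRESHOLD = 0.8
failing = {s: acc for s, acc in per_slice.items() if acc < THRESHOLD}
print(per_slice)  # → {'EU': 1.0, 'US': 0.5}
print(failing)    # → {'US': 0.5}
```

A model with 90% aggregate accuracy can still be failing badly on a slice that matters, which is exactly what this test is designed to surface.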

ML Infrastructure

Even a great model will fail if the infrastructure around it is fragile. Reproducibility, testability, and rollback capability are non-negotiable in production ML systems.

  • Training is reproducible
  • Model specs are unit tested
  • The ML pipeline is integration tested
  • Model quality is validated before serving
  • The model is debuggable
  • Models are canaried before serving
  • Serving models can be rolled back
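The reproducibility item at the top of this list can be expressed as a literal test: train twice with the same seed and assert the results match. The toy "training" routine below is a stand-in for a real pipeline; the point is the assertion, not the model.

```python
# Minimal sketch: asserting that training is reproducible under a fixed seed.
# The train() routine here is a toy stand-in for a real training pipeline.

import random

def train(seed: int) -> list:
    """Stand-in training run: returns 'weights' derived from seeded randomness."""
    rng = random.Random(seed)
    weights = [0.0] * 3
    for _ in range(100):              # toy "gradient steps"
        i = rng.randrange(3)
        weights[i] += rng.uniform(-0.1, 0.1)
    return weights

run_a = train(seed=42)
run_b = train(seed=42)
assert run_a == run_b, "training is not reproducible under a fixed seed"
print("reproducible:", run_a == run_b)  # → reproducible: True
```

In a real pipeline the same assertion covers data ordering, initialisation, and any framework-level sources of nondeterminism — which is why it so often fails the first time it is written.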

Monitoring Tests

Deployment is not the finish line. ML systems degrade silently — through data drift, model staleness, and infrastructure regression. Monitoring is how you keep a model honest over time.

  • Dependency changes result in notification
  • Data invariants hold for inputs
  • Training and serving are not skewed
  • Models are not too stale
  • Models are numerically stable
  • Computing performance has not regressed
  • Prediction quality has not regressed
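The "data invariants hold for inputs" check is often the first monitoring test teams add: record the ranges seen in training, then alert when serving inputs fall outside them. The invariants, feature names, and batch below are illustrative assumptions.

```python
# Minimal sketch: monitoring that serving inputs stay within invariants
# observed in training data. Ranges and feature names are illustrative.

TRAINING_INVARIANTS = {
    "age":    (18, 90),      # (min, max) observed during training
    "income": (0.0, 5e5),
}

def check_invariants(batch: list) -> list:
    """Return alert messages for any feature outside its training range."""
    alerts = []
    for i, row in enumerate(batch):
        for name, (lo, hi) in TRAINING_INVARIANTS.items():
            value = row.get(name)
            if value is None or not (lo <= value <= hi):
                alerts.append(f"row {i}: {name}={value!r} outside [{lo}, {hi}]")
    return alerts

serving_batch = [
    {"age": 35, "income": 60000.0},
    {"age": 140, "income": 60000.0},   # drifted input
]
print(check_invariants(serving_batch))
```

Wired into a metrics or alerting system, a check like this turns silent data drift into a notification — which is the whole point of the monitoring domain.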

The COEQ Perspective

At COEQ, the ML model testing checklist sits at the core of how we assess, advise, and test AI systems for our clients. If your organisation is building or deploying ML systems without a structured test framework, you are not just taking a technical risk — you are taking a reputational one.

How many of these 28 checks can your team tick off today? If the answer is uncertain, that is exactly where COEQ can help.