How to Implement Post-Market Monitoring for AI Systems Under the EU AI Act
Compliance with the EU AI Act does not end when a high-risk AI system is deployed. Article 72 requires providers to maintain a proactive post-market monitoring system throughout the system's operational lifetime, collecting performance data, detecting anomalies, and reporting serious incidents to national authorities. This guide explains how to design and operate that monitoring programme.
Article 72 requires providers of high-risk AI systems to establish and document a post-market monitoring system, proportionate to the nature of the AI technologies and the risks they present. The system must actively and systematically collect, document, and analyse relevant data from deployers and, where applicable, users, throughout the system's lifetime. Article 73 sets out the obligation to report serious incidents to market surveillance authorities without undue delay, and no later than 15 days after the provider becomes aware.
Why Monitoring Cannot Stop at Go-Live
AI systems are not static artefacts. The real world changes: user populations change, data distributions shift, adversarial actors probe for weaknesses, and the contexts in which the system is used evolve. Model performance that was adequate at deployment may degrade silently over months as the statistical properties of incoming data diverge from the training distribution, a phenomenon known as data drift (or covariate shift); the related case where the relationship between inputs and outcomes itself changes is called concept drift.
Furthermore, rare failure modes may not surface during pre-deployment testing but will manifest at scale in production. A system processing tens of thousands of decisions per day will encounter edge cases that a testing dataset of thousands of examples cannot anticipate.
Post-market monitoring is the mechanism through which providers detect these issues before they cause harm and before regulators detect them first.
What to Monitor: Key Metrics
The core quantities to track are accuracy against pre-deployment baselines, distributional shift in the inputs, and anomalies in the outputs that may signal incidents. The steps below cover how to measure and act on each.
Step-by-Step: Establishing Your Post-Market Monitoring Programme
Define the Monitoring Plan Before Deployment
Post-market monitoring cannot be retrofitted effectively. Define your monitoring plan as part of the pre-deployment compliance process. The plan should specify: which metrics will be measured, at what frequency, by what method, and by whom. Define alert thresholds for each metric, the levels at which an alert is triggered and a human review is initiated. Document the plan in your technical documentation (Article 11) so it forms part of the compliance record. Regulators will expect to see a monitoring plan and evidence that it is being followed.
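A monitoring plan like the one above can be expressed as a machine-readable specification, so the same document drives both the compliance record and the alerting code. The sketch below is illustrative only: the metric names, owners, and threshold values are hypothetical, not prescribed by the Act.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """One monitored metric: what is measured, how often, by whom, and when to alert."""
    name: str
    frequency: str          # e.g. "daily", "weekly"
    method: str             # how the metric is computed
    owner: str              # named role responsible for review
    alert_threshold: float  # level at which human review is triggered
    direction: str          # "below" or "above": which side of the threshold alerts

    def breaches(self, value: float) -> bool:
        """True if the observed value should trigger an alert."""
        if self.direction == "below":
            return value < self.alert_threshold
        return value > self.alert_threshold


# Hypothetical plan for a credit-scoring model (names and values are examples).
plan = [
    MetricSpec("auc", "weekly", "rolling AUC vs. labelled outcomes",
               "Model Risk Lead", 0.72, "below"),
    MetricSpec("psi_income", "weekly", "PSI on income feature",
               "ML Ops Engineer", 0.2, "above"),
]

print(plan[0].breaches(0.70))  # AUC fell below 0.72 -> True
```

Keeping the plan in version control alongside the technical documentation gives regulators the evidence trail Article 11 expects: the plan itself, plus a history of when thresholds were changed and why.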
Monitor for Accuracy Drift
Performance degradation is often gradual. A drift of a few percentage points per month that individually might seem insignificant can cumulatively represent a meaningful decline. Measure accuracy continuously against your pre-deployment baselines and plot trends over time, not just point-in-time snapshots. Where ground truth is available (e.g., loan default rates for a credit scoring model, actual health outcomes for a diagnostic system), use it. Where ground truth arrives with a lag, design your monitoring accordingly and account for the delay in your alert logic.
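One way to account for lagging ground truth, as suggested above, is to compute accuracy only over decisions old enough for outcomes to have arrived. This is a minimal sketch under that assumption; the 90-day maturity lag is a placeholder you would set from your own outcome-arrival data.

```python
from datetime import date, timedelta


def matured_accuracy(records, as_of, lag_days=90):
    """Accuracy over decisions old enough for ground truth to have arrived.

    records: list of (decision_date, predicted, actual). Decisions younger
    than lag_days are excluded so not-yet-observed outcomes don't bias the
    metric downward or upward.
    """
    cutoff = as_of - timedelta(days=lag_days)
    mature = [(p, a) for d, p, a in records if d <= cutoff]
    if not mature:
        return None  # nothing mature yet: report "no data", not a number
    return sum(p == a for p, a in mature) / len(mature)


records = [
    (date(2025, 1, 10), 1, 1),
    (date(2025, 1, 20), 0, 1),
    (date(2025, 6, 1), 1, 1),   # too recent: outcome not yet mature
]
print(matured_accuracy(records, as_of=date(2025, 6, 15)))  # 0.5
```

Plotting this value weekly against the pre-deployment baseline gives the trend view the text calls for, rather than a point-in-time snapshot.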
Monitor for Distributional Shift
Even before performance degrades noticeably, input distributional shift, where the statistical properties of incoming data diverge from the training distribution, is a leading indicator of future performance problems. Implement statistical tests to detect drift in input features. Common approaches include Population Stability Index (PSI) for continuous variables and chi-squared tests for categorical variables, applied periodically to a rolling window of recent inputs. Alerts triggered by significant distributional shift should prompt investigation of whether model retraining or recalibration is needed.
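The Population Stability Index mentioned above can be implemented in a few lines. This is a simple equal-width-bin sketch; the conventional interpretation thresholds in the comment (0.1 / 0.25) are an industry rule of thumb, not anything specified by the Act.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a recent window.

    Bins are derived from the baseline's range; a small epsilon avoids log(0).
    Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift warranting investigation.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp values outside the baseline range
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [x / 100 for x in range(1000)]       # uniform on [0, 10)
shifted = [x / 100 + 3 for x in range(1000)]    # same shape, shifted right
print(psi(baseline, baseline) < 0.1)   # identical populations: True
print(psi(baseline, shifted) > 0.25)   # large shift: True
```

Applied periodically to a rolling window of recent inputs, per feature, this produces exactly the drift alerts the step describes.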
Set Up Automated Incident Detection
Define, precisely and in advance, what constitutes a "serious incident" for your high-risk AI system. Under the Act's definition, to which Article 73 refers, serious incidents include: the death of a person or serious harm to a person's health; a serious and irreversible disruption of the management or operation of critical infrastructure; infringement of obligations under Union law intended to protect fundamental rights; and serious harm to property or the environment. In practice, many serious incidents will be evident from output anomalies detectable in the monitoring data. Automate alerting for threshold breaches and route alerts to the appropriate response owner with a defined response-time SLA. Real-time visibility into AI inputs and outputs at the API layer is a practical foundation for this kind of automated detection.
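The threshold-breach routing described above can be sketched as a small evaluation loop. The metric names, owners, and SLA durations here are hypothetical placeholders for whatever your monitoring plan defines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Alert:
    metric: str
    value: float
    owner: str            # named response owner from the monitoring plan
    respond_by: datetime  # response-time SLA deadline


# Hypothetical thresholds, owners, and SLAs; values are illustrative.
THRESHOLDS = {
    "error_rate": (0.05, "Incident Response Lead", timedelta(hours=4)),
    "refusal_rate": (0.20, "Model Risk Lead", timedelta(hours=24)),
}


def evaluate(metrics: dict) -> list:
    """Compare observed metric values to thresholds; return alerts for breaches."""
    now = datetime.now(timezone.utc)
    alerts = []
    for name, value in metrics.items():
        if name in THRESHOLDS:
            limit, owner, sla = THRESHOLDS[name]
            if value > limit:
                alerts.append(Alert(name, value, owner, now + sla))
    return alerts


alerts = evaluate({"error_rate": 0.08, "refusal_rate": 0.10})
print([a.metric for a in alerts])  # ['error_rate']
```

In production this loop would run on a schedule against the captured API data and push each Alert to a paging or ticketing system rather than returning a list.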
Establish Incident Response Procedures
For each class of incident or threshold breach, define the response procedure: who is notified (named roles, not just job titles), what actions are taken (investigation, system suspension, model rollback, corrective action), and within what timeframe. Article 73 requires that serious incidents are reported to national market surveillance authorities without undue delay, and no later than 15 days after the provider becomes aware. Your incident response procedure must include this regulatory notification step, with clear ownership. Prepare reporting templates in advance: drafting them under pressure during an active incident is error-prone.
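Because the notification deadline is the step most likely to be missed under pressure, it is worth computing mechanically from the date of awareness. This sketch reflects one reading of the Article 73 deadlines (15 calendar days in the general case, with shorter limits for deaths and widespread infringements); verify the classes and day counts against the Regulation's current text before relying on them.

```python
from datetime import date, timedelta

# Deadlines in calendar days, keyed by incident class (our reading of Art. 73).
DEADLINES = {
    "serious_incident": 15,         # general case
    "death": 10,                    # incident involving the death of a person
    "widespread_infringement": 2,   # widespread infringement / critical-infrastructure disruption
}


def report_deadline(aware_on: date, incident_class: str) -> date:
    """Latest date by which the market surveillance authority must be notified."""
    return aware_on + timedelta(days=DEADLINES[incident_class])


print(report_deadline(date(2025, 3, 1), "serious_incident"))  # 2025-03-16
```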
Maintain Records of Monitoring Activities
Article 72 requires that post-market monitoring data is documented and analysed, and that records are maintained. Keep records of: monitoring metrics data over time, alert events and their dispositions, investigations conducted, corrective actions taken, and regulatory reports submitted. These records must be available to market surveillance authorities on request and must be retained for a minimum of ten years after the last high-risk AI system of that type has been placed on the market, unless sector-specific rules specify otherwise.
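One practical way to keep these records inspectable is an append-only log in which each entry references a hash of the previous one, so gaps or later edits are detectable. This is a minimal sketch, not a prescribed format; the field names and the temporary file path are illustrative.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def append_record(path, event):
    """Append a monitoring record as one JSON line, hash-chained to the
    previous line so later tampering or deletion is detectable on review."""
    prev_hash = "0" * 64  # sentinel for the first record in the log
    try:
        with open(path) as f:
            for line in f:
                prev_hash = hashlib.sha256(line.rstrip("\n").encode()).hexdigest()
    except FileNotFoundError:
        pass  # log does not exist yet
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "prev": prev_hash,
        **event,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


path = os.path.join(tempfile.mkdtemp(), "pmm_log.jsonl")
r = append_record(path, {"type": "alert", "metric": "psi_income",
                         "disposition": "investigated; retraining scheduled"})
print(r["type"])  # alert
```

Whatever the storage format, the essential properties are the ones Article 72 implies: the records are complete, time-stamped, retained for the required period, and producible to a market surveillance authority on request.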
Providers of high-risk AI systems must report serious incidents to the market surveillance authority of the Member State where the incident occurred. The report must be made without undue delay, and in any event no later than 15 days after the provider becomes aware of the serious incident; shorter deadlines apply in specific cases, such as incidents involving the death of a person (no later than 10 days) or a widespread infringement or serious and irreversible disruption of critical infrastructure (no later than 2 days). Where the same AI system is deployed in multiple Member States and a serious incident occurs, each relevant national authority must be notified. Deployers must inform providers of serious incidents without delay so that the provider can report to the authorities.
The API Layer as a Monitoring Foundation
For AI systems accessed via APIs, whether internal model deployments or third-party AI services, monitoring at the API layer provides a natural and comprehensive data source for post-market monitoring. Every AI inference request passes through the API; capturing inputs, outputs, latency, error codes, and metadata at this layer provides a complete dataset from which all the metrics described in this guide can be computed.
This approach is particularly valuable for organisations deploying third-party AI systems where direct access to model internals is not available. The API layer is the natural boundary at which the deployer can observe system behaviour and gather the data needed to fulfil Article 72 obligations.
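Capturing at the API boundary can be as simple as wrapping every inference call. This sketch assumes a generic callable model client and a pluggable log sink; `model_fn` and `log_fn` are placeholders for your inference client and logging pipeline.

```python
import json
import time
import uuid


def monitored_call(model_fn, payload, log_fn=print):
    """Wrap an AI inference call at the API boundary, capturing the fields
    that post-market metrics are computed from: input, output, latency,
    status, and a request id for tracing alerts back to individual calls."""
    record = {"request_id": str(uuid.uuid4()), "input": payload}
    start = time.perf_counter()
    try:
        record["output"] = model_fn(payload)
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        # Logged even on failure, so error rates and latency are never lost.
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        log_fn(json.dumps(record, default=str))
    return record.get("output")


# Usage with a stand-in model and a silent sink:
out = monitored_call(lambda p: {"score": 0.92}, {"applicant_id": "A-123"},
                     log_fn=lambda s: None)
print(out)  # {'score': 0.92}
```

From the resulting log stream, the accuracy, drift, and incident-detection metrics described earlier can all be computed without any access to model internals, which is what makes this approach workable for third-party AI systems.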