Do you believe in AI after launch? How to tackle the PMCF challenges of AI medical devices
For AI medical devices, post-market clinical follow-up (PMCF) is a crucial process to address the data-driven, dynamic nature of AI systems and their interaction with clinical environments. In this blog we will explore the peculiarities and challenges of PMCF for AI medical devices and look at strategies to address them.
General Requirements for PMCF Under EU MDR
PMCF is a continuous process that updates the clinical evaluation of a medical device after it is placed on the market. It is an integral part of post-market surveillance (PMS) required by the EU MDR. The regulation specifies that PMCF activities must be planned before the device reaches the market, in a PMCF plan which Notified Bodies assess as part of the QMS and technical documentation review for devices of risk class IIa and above.
The PMCF activities should:
- Confirm the safety and performance of the device throughout its lifecycle
- Identify previously unknown side-effects and monitor the identified side-effects and contraindications
- Identify and analyse emergent risks on the basis of factual evidence
- Ensure the continued acceptability of the benefit-risk ratio
- Identify possible systematic misuse or off-label use of the device, with a view to verifying that the intended purpose is correct
And do so by proactively and systematically gathering post-market and real-world clinical data.
Key PMCF activities include:
- Clinical trials
- RWE (real-world evidence) studies (see our blog post on RWE for clinical evaluation of AI devices)
- Analysis of device registries containing clinical data on the manufacturer’s own device and/or on similar devices
- Surveys to collect data about the use of the device and user feedback
- Screening of scientific literature
The findings of the PMCF activities must be systematically reported in PMCF reports and the Clinical Evaluation Report (CER) must be regularly updated accordingly.
It is important to note that PMCF activities must be planned in detail, covering their objectives, design, data collection and analytical methods, and a justification of their fitness for purpose.
For example, the PMCF activity plan for a clinical study must contain detailed information and justification about its objective, study design, endpoints, population, sample size calculation, statistical analysis, and expected quality of outcomes in light of the intended use and residual risk of the device. MDCG 2020-7 provides a useful PMCF plan template to help fulfil regulatory requirements.
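As an illustration, here is a minimal sketch of one common way to approach the sample-size element of such a plan: computing the number of positive cases needed to estimate the device’s sensitivity within a target confidence-interval half-width. The expected sensitivity, confidence level, and precision below are hypothetical placeholders, not values prescribed by the MDR or MDCG guidance.

```python
import math

# Illustrative sample-size calculation for a PMCF study endpoint:
# positive cases needed to estimate sensitivity within +/- 5%
z = 1.96   # z-score for a 95% confidence level
p = 0.90   # expected sensitivity (hypothetical, from pre-market validation)
d = 0.05   # target half-width of the confidence interval

n = math.ceil(z**2 * p * (1 - p) / d**2)
print(f"Positive cases required: {n}")  # -> 139
```

In a real PMCF plan this calculation would be accompanied by a justification of each assumption and adjusted for expected drop-out and data quality.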
How to deal with the PMCF challenges of AI medical devices
AI medical devices have specific characteristics that must be accounted for in their PMCF. Their reliance on diverse, often heterogeneous datasets for training and validation, their opaque and probabilistic nature, and their varying degrees of integration into clinical workflows all require tailored PMCF approaches. Let’s explore several of these challenges here.
Data dependency, temporal evolution, and generalisation challenges
AI systems rely on diverse, often heterogeneous datasets for training and validation. The data-driven nature of AI systems often results in performance variability across different clinical settings, patient populations, or imaging modalities. Models trained on limited or non-representative datasets may fail to generalise effectively, leading to discrepancies in performance.
Additionally, over time the real-world data encountered by the AI system may diverge from the data it was originally trained on, leading to reduced accuracy or relevance (“data drift”). The relationships between the input features and the predicted variable(s) learned during training may also change over time and/or across geographical areas (“concept shift”). Examples of data drift and concept shift, as well as more information on RWE for clinical evaluation of medical devices, can be found in this previous blog post.
The PMCF activities should thus enable close monitoring of model performance across the intended-use population with minimal temporal lag, so that performance issues are detected and resolved quickly.
Examples of PMCF strategies:
- Design PMCF studies to actively collect real-world data from diverse and representative clinical settings, geographies, demographics, and workflows to monitor performance and identify potential performance issues across time, patient groups, and geographical locations
- Continuously gather and incorporate real-world data from underrepresented groups through PMCF initiatives to improve model robustness and real-world applicability
- Establish a process for periodic model retraining with updated datasets while ensuring compliance with change-management procedures in PMCF
- Implement robust PMCF data-monitoring systems to detect shifts in input data distributions (see the sketch after this list)
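As a starting point for such monitoring, one simple and widely used option is to compare the distribution of each input feature in recent real-world data against a training-time reference, for example with a two-sample Kolmogorov–Smirnov test. The sketch below assumes tabular inputs and a hypothetical feature name; a production system would add multivariate checks, alerting, and documented thresholds.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(reference, current, alpha=0.01):
    """Flag features whose recent distribution differs significantly
    from the training-time reference (two-sample KS test)."""
    drifted = {}
    for feature in reference:
        stat, p_value = ks_2samp(reference[feature], current[feature])
        if p_value < alpha:
            drifted[feature] = {"ks_stat": round(stat, 3), "p_value": p_value}
    return drifted

# Hypothetical example: patient age skews older at a new deployment site
rng = np.random.default_rng(42)
reference = {"age": rng.normal(55, 10, 5000)}  # training-time distribution
current = {"age": rng.normal(62, 12, 800)}     # recent real-world inputs
print(detect_feature_drift(reference, current))  # flags "age" as drifted
```

Note that a statistically significant shift is a trigger for investigation, not automatically a safety issue; the clinical impact still needs to be assessed and documented.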
Probabilistic nature of AI models and model calibration
Many AI systems provide probabilistic outputs, such as confidence scores or risk predictions. These predictions may support clinical decision making and provide information to patients and healthcare professionals. The reliability of these scores depends on proper model calibration.
Model calibration refers to the process of ensuring that the probabilities or confidence scores provided by an AI model correspond accurately to real-world outcomes. For example, if a model predicts a 70% likelihood of a disease, this should mean that the disease is present in 70% of such cases in practice. Calibration is essential for building trust in the model's predictions and accurately informing clinical decision making.
Many factors can distort the calibration of risk predictions. Causes can be unrelated to the model’s development (e.g. varying disease prevalence across sites) or related to it (e.g. model overfitting), leading to over- or under-confidence in predictions.
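To illustrate, here is a minimal sketch of how the Expected Calibration Error (ECE) mentioned below can be computed for a binary-outcome model: predictions are grouped into probability bins, and the gap between the average predicted probability and the observed event rate is averaged across bins. The data is hypothetical.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: weighted average gap between predicted probability
    and observed event rate across probability bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.sum() == 0:
            continue
        avg_conf = probs[mask].mean()   # mean predicted probability in bin
        frac_pos = labels[mask].mean()  # observed event rate in bin
        ece += (mask.sum() / len(probs)) * abs(avg_conf - frac_pos)
    return ece

# Hypothetical post-market batch: predicted disease probabilities
# and confirmed outcomes (1 = disease present)
probs = np.array([0.9, 0.8, 0.75, 0.3, 0.2, 0.65, 0.7, 0.1])
labels = np.array([1, 1, 0, 0, 0, 1, 1, 0])
print(f"ECE: {expected_calibration_error(probs, labels):.3f}")
```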
Examples of PMCF strategies:
- Regularly evaluate calibration quality during PMCF using a calibration plot or metrics such as the Expected Calibration Error (ECE), potentially across different patient subgroups and deployment settings
- Monitor how confidence scores influence clinical decision-making. Assess whether clinicians adjust their decisions based on over- or under-confidence in the model
- Plan measures to adaptively recalibrate the model in response to observed changes, ensuring that confidence scores remain clinically meaningful in real-world settings (a minimal recalibration sketch follows this list)
- Test the model in diverse real-world settings to understand its generalisability and reliability in various clinical workflows
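One common recalibration technique, shown below purely as a sketch on hypothetical data, is isotonic regression: it learns a monotone mapping from the model’s raw scores to observed outcome frequencies while preserving the ranking of predictions. In a regulated context, any such update would of course pass through the device’s change-management process.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical recent real-world batch: the model's raw confidence
# scores and the confirmed outcomes (1 = condition present)
raw_scores = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
outcomes   = np.array([1,    1,   0,    1,   1,   0,   0,   1,   0,   0])

# Fit a monotone mapping from raw scores to observed frequencies
recalibrator = IsotonicRegression(out_of_bounds="clip")
recalibrator.fit(raw_scores, outcomes)

# Recalibrated probabilities for new predictions
print(recalibrator.predict(np.array([0.9, 0.5, 0.15])))
```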
Opaque nature of AI models
Many AI systems operate as "black boxes", a widely used term for models whose internal logic is not readily interpretable. This opacity makes it challenging to identify the root causes of performance variations or unexpected outcomes.
Examples of PMCF strategies:
- Include domain experts in PMCF processes to review and interpret complex cases or cases where the model failed to provide a good prediction
- Establish qualitative and quantitative analytical frameworks for root-cause analysis of AI model failures
- Use Explainable AI (XAI) techniques in PMCF to improve the transparency and interpretability of the model’s decisions (see the sketch after this list)
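As one example of an XAI technique that fits naturally into PMCF, permutation importance measures how much a model’s performance degrades when each input feature is shuffled, highlighting which inputs actually drive predictions. The sketch below uses a synthetic stand-in model and hypothetical feature names; in practice the analysis would run on held-out real-world cases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the deployed model and its input data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

# How much does shuffling each feature degrade performance?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(["age", "bmi", "biomarker", "noise"],
                       result.importances_mean):
    print(f"{name}: {score:.3f}")
```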
Clinical workflow integration
The successful deployment of AI devices in clinical settings depends on their seamless integration into workflows and on the ability of healthcare professionals to use them effectively. Poor integration can affect the device’s safety and performance, leading to errors which in turn impact the benefit-risk profile. For AI devices, where predictions often influence decision making, monitoring user-device interaction in real-world settings is essential to ensure that the device performs as intended.
PMCF activities should focus on identifying whether usability issues or workflow challenges compromise clinical outcomes, safety, or the effectiveness of the device. These insights help maintain compliance with the intended use, optimise training, and ensure the reliability of the device in diverse clinical settings.
Examples of PMCF strategies:
- Continuously gather feedback from clinicians through targeted PMCF surveys or interviews to assess usability issues affecting the safety or performance of the AI device
- Assess the adequacy of user training by gathering feedback on clinicians’ proficiency and confidence in using the AI system. Identify gaps that could impact the device’s performance in practice
- Monitor the frequency and relevance of generated alerts or notifications. Measure override rates (see the sketch after this list) and gather clinician feedback to identify whether excessive or irrelevant alerts negatively impact clinical workflows or safety
- Establish a mechanism to report and analyse adverse events or errors related to user interaction with the AI device. Use this data to identify and mitigate usability-related risks
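Override-rate monitoring can be as simple as aggregating an alert log. The sketch below assumes a hypothetical log with per-alert override flags; a persistently high rate at one site can signal alert fatigue or poor workflow fit and warrant follow-up with users.

```python
import pandas as pd

# Hypothetical alert log exported from the deployment environment
log = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B", "B", "B"],
    "alert_type": ["high_risk"] * 7,
    "overridden": [True, False, True, True, True, True, False],
})

# Share of alerts that clinicians overrode, per site and alert type
override_rates = log.groupby(["site", "alert_type"])["overridden"].mean()
print(override_rates)  # site B: 0.75 -> investigate possible alert fatigue
```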
Unlocking AI’s power
The unique challenges posed by AI medical devices demand a tailored and robust approach to post-market clinical follow-up. By addressing issues such as data drift, model calibration, workflow integration, and the opacity of AI systems, manufacturers can ensure the continued safety, performance, and clinical benefit of their devices.
Proactive and well-planned PMCF activities not only support regulatory compliance but also foster trust among clinicians, patients, and stakeholders, ultimately enabling AI to deliver on its transformative potential in healthcare.
