Clinical evaluation for SaMD: cover the whole intended purpose
Clinical evaluation under the UK and EU MDR lives or dies on how well you evidence the intended purpose.
A neat way to plan a clinical evaluation strategy is to break the intended purpose into its key components and then choose clinical evaluation activities that, together, cover them all.
Start with the intended purpose
- Patient population: Who is the software for? Age ranges, comorbidities, etc.
- Clinical conditions: Which signs, symptoms, or diseases will the software address?
- Use environment(s): Where will it be used — home, ambulance, primary care, radiology workstation, operating theatre?
- User(s): Who will use it — lay people, nurses, radiographers, physicians, or mixed teams?
Map activities to components
Different activities naturally illuminate different parts of the intended purpose:
- Usability and human-factors testing usually involves the intended users working in the intended-use environments with realistic tasks and use errors
- Retrospective clinical-performance studies — for example, running the algorithm on curated records or images — primarily involve the patient population and the clinical conditions
- Clinical investigations generally put the device in the hands of the intended users and recruit the intended patient population, so that the relevant clinical conditions are identified, managed, or predicted in practice
- The constrained nature of clinical investigations may mean the device is not used in a setting fully reflective of its intended-use environment
- Real-world studies (for example, prospective observational deployments in routine care) can cover all four elements at once
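One way to keep track of this mapping is a simple coverage matrix in the clinical evaluation plan. The sketch below is a minimal illustration in Python, assuming hypothetical activity names and a hand-written mapping to the four components; it is not a prescribed format, just a way to check that the planned portfolio leaves no component unevidenced.

```python
# Minimal sketch of a coverage check for a clinical evaluation plan.
# Activity names and the component mapping are illustrative assumptions.

COMPONENTS = {"patient population", "clinical conditions", "use environments", "users"}

# Hypothetical portfolio: each activity lists the components it credibly evidences.
portfolio = {
    "usability / human factors testing": {"users", "use environments"},
    "retrospective performance study": {"patient population", "clinical conditions"},
    "prospective clinical investigation": {"patient population", "clinical conditions", "users"},
    "real-world deployment study": set(COMPONENTS),  # can cover all four
}

covered = set().union(*portfolio.values())
missing = COMPONENTS - covered
if missing:
    print(f"Coverage gaps: {sorted(missing)}")
else:
    print("All four intended-purpose components are covered.")

# Show which activities evidence each component.
for component in sorted(COMPONENTS):
    activities = [name for name, comps in portfolio.items() if component in comps]
    print(f"{component}: {', '.join(activities)}")
```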
An example portfolio (imaging triage software)
Imagine software that flags suspected pneumothorax on chest X-rays to speed up radiologist review, intended for adult emergency patients in hospital radiology and emergency departments.
- Usability and human factors: Simulated reading sessions with radiologists and emergency physicians using the actual workflow in the picture archiving system and emergency-department viewing stations
- Goal: demonstrate safe, effective interaction, and risk control for use errors relevant to those environments
 
- Retrospective clinical performance: Multi-site dataset with ground-truth pneumothorax diagnoses from blinded consensus covering male and female adults (across all adult age brackets) presenting to emergency departments.
- Endpoints: sensitivity, specificity, and the time-to-notification distribution, with pre-specified subgroup analyses (portable vs fixed X-rays, trauma vs non-trauma). This directly addresses the target patients and condition; a short sketch of these endpoint calculations appears at the end of this example portfolio.
 
- Prospective clinical investigation: Study where readers act on software outputs in real time, but within a controlled research protocol.
- Measures: impact on time to first read, change in reporting accuracy, and workflow effects. This adds evidence about real users acting in practice, though not in unconstrained environments
 
- Real-world deployment study (where feasible pre-market, or planned as post-market clinical follow-up): Limited-scope rollout across emergency and radiology departments with routine staffing, real interruptions, and site variability.
- Collect effectiveness, safety signals, and usability in context — covering all four components together. Note that while real-world evidence may sound like a panacea, it has its own specific limitations to consider, including data representativeness, bias, confounding, and difficulty demonstrating causal effects.
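To make the retrospective endpoints above concrete, here is a minimal sketch assuming a per-case results table with a reference-standard label, the algorithm's flag, a notification time, and an acquisition subgroup. All field names and values are hypothetical and purely illustrative.

```python
# Minimal sketch of retrospective-performance endpoints: sensitivity, specificity,
# the time-to-notification distribution, and a pre-specified subgroup split.
# The per-case fields and values below are hypothetical, not a prescribed data model.

from statistics import median

cases = [
    # ground_truth: reference-standard pneumothorax label; flagged: algorithm output;
    # notify_s: seconds from image availability to notification (flagged cases only).
    {"ground_truth": True,  "flagged": True,  "notify_s": 42.0, "subgroup": "portable"},
    {"ground_truth": True,  "flagged": False, "notify_s": None, "subgroup": "fixed"},
    {"ground_truth": False, "flagged": False, "notify_s": None, "subgroup": "portable"},
    {"ground_truth": False, "flagged": True,  "notify_s": 55.0, "subgroup": "fixed"},
]

def sensitivity_specificity(rows):
    tp = sum(1 for r in rows if r["ground_truth"] and r["flagged"])
    fn = sum(1 for r in rows if r["ground_truth"] and not r["flagged"])
    tn = sum(1 for r in rows if not r["ground_truth"] and not r["flagged"])
    fp = sum(1 for r in rows if not r["ground_truth"] and r["flagged"])
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

sens, spec = sensitivity_specificity(cases)
print(f"Overall: sensitivity={sens:.2f}, specificity={spec:.2f}")

times = [r["notify_s"] for r in cases if r["notify_s"] is not None]
print(f"Median time to notification: {median(times):.1f}s")

# Pre-specified subgroup analysis, e.g. portable vs fixed X-rays.
for subgroup in ("portable", "fixed"):
    rows = [r for r in cases if r["subgroup"] == subgroup]
    s, p = sensitivity_specificity(rows)
    print(f"{subgroup}: sensitivity={s:.2f}, specificity={p:.2f}")
```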
 
Right-sizing for pre-market certification
Regulators expect a clinically meaningful body of evidence that reflects the way the software will actually be used. The most efficient portfolios combine activities so that all four components are covered before certification.
When an activity can credibly cover more than two components at once — such as a well-run real-world study — use it.
Where a prospective trial cannot represent the use environment, balance it with usability work in real settings or with a pragmatic deployment study.
Common pitfalls to avoid
- Over-reliance on curated retrospective data without showing performance on messy, real workflows
- Ignoring the use environment (for example, assuming radiology and emergency department contexts are interchangeable)
- Choosing endpoints that do not match the clinical decision your intended users must make
- Using a study population that is not representative of the target population (e.g. in terms of patient demographics, condition severity, or presentation)
Key takeaways
It's important to plan evidence against patient population, clinical conditions, use environments, and users.
Match activities to components: usability (users & environments), retrospective studies (population & conditions), prospective investigations (population & conditions & users), real-world studies (all four).
For pre-market certification, assemble a portfolio that covers every component; the more components each study addresses, the stronger — and usually more efficient — your case.
