Alright! Now the MDR is here and nobody really has a clue how to classify software. Well, we consultants typically pretend to understand it, but the truth is, if you ask multiple consultants, you just get lots of different answers (and bills). But now the MDCG has released a guidance document on MDR medical device classification to clear up the confusion. Great!
What the hell is the MDCG? It’s a bunch of people who translate non-human-readable legislation like the MDR into barely-human-readable, badly formatted PDFs, called guidance documents. To their credit, I appreciate their initiative of actually providing concrete examples and implementation hints. Then again, shouldn’t that have been part of the actual legislation? Why do we need a separate group of people to translate legislation into guidance? And then you need yet another group to translate guidance into human language, those are consultants. Aren’t those also unnecessary? Oh wait, that’s us, and that’s how we earn money, so.. um, let’s continue. Let’s talk about the actual guidance document, MDCG 2021-24. And then let’s see if we can learn something about how to classify software as a medical device in general.
Update: We’ve published a list of MDR class I software medical devices on the market and their intended uses - take a look.
As a quick recap, if you want to bring a medical device to market, you need to comply with a bunch of regulations. How big that bunch is depends on the class of your medical device. There are multiple classes, the most common ones being I, IIa, IIb and III. Put simply, riskier medical devices end up in higher classes, and higher classes mean more regulations to comply with.
Now, this guidance document, MDCG 2021-24, gives examples of what sort of medical devices belong in which classes.
As we’ll be only focusing on software here, I’ve omitted around 95% of the document because that relates to hardware. Further, I’ll only focus on rule 11 of the MDR here because that’s what we spend 95% of our time on when discussing classification stuff with our clients, and, generally speaking, everyone is crapping their pants over rule 11.
MDR Rule 11: What is it About?
What’s rule 11? It’s a section in the MDR which had the goal of classifying software. I’m not sure how well it achieved that goal, but here it is:
6.3. Rule 11
Software intended to provide information which is used to take decisions with diagnosis or therapeutic purposes is classified as class IIa, except if such decisions have an impact that may cause:
- death or an irreversible deterioration of a person’s state of health, in which case it is in class III; or
- a serious deterioration of a person’s state of health or a surgical intervention, in which case it is classified as class IIb.
Software intended to monitor physiological processes is classified as class IIa, except if it is intended for monitoring of vital physiological parameters, where the nature of variations of those parameters is such that it could result in immediate danger to the patient, in which case it is classified as class IIb.
All other software is classified as class I.
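Read literally, rule 11 is just a decision tree. Here's a minimal sketch of my own naive reading in Python - all function names, parameter names and impact categories are my invention, not an official interpretation, and the hard part (deciding which impact category your software actually falls into) is exactly what this whole article is about:

```python
from enum import Enum

class MdrClass(Enum):
    I = "I"
    IIA = "IIa"
    IIB = "IIb"
    III = "III"

def classify_rule_11(
    informs_diagnosis_or_therapy: bool,
    worst_decision_impact: str,  # "death_or_irreversible", "serious_or_surgical", "other"
    monitors_physiology: bool,
    vital_params_immediate_danger: bool,
) -> MdrClass:
    """Naive decision-tree reading of MDR rule 11 (Annex VIII, 6.3)."""
    # First paragraph: software informing diagnostic/therapeutic decisions.
    if informs_diagnosis_or_therapy:
        if worst_decision_impact == "death_or_irreversible":
            return MdrClass.III
        if worst_decision_impact == "serious_or_surgical":
            return MdrClass.IIB
        return MdrClass.IIA
    # Second paragraph: monitoring of physiological processes.
    if monitors_physiology:
        if vital_params_immediate_danger:
            return MdrClass.IIB
        return MdrClass.IIA
    # Third paragraph: everything else.
    return MdrClass.I
```

The stroke-imaging example below would then be `classify_rule_11(True, "death_or_irreversible", False, False)`, i.e. class III. Note how everything hinges on the two fuzzy inputs: does the software "inform decisions" at all, and how bad could a wrong decision get?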
First off, did you also spot the probable typo? What the hell is a decision with diagnosis purposes? Did they mean diagnostic? That’s how they translated it in the official German version. I’m sure those people checked for typos before they released regulation affecting all medical devices in the EU.. right? Right?
But did you also notice the other huge bombshell? It says that software which is used to take decisions with diagnosis or therapeutic purposes is classified as class IIa. Typically, that sort of software was classified as class I under MDD.
The MDCG 2021-24 provides examples on many things, but the most interesting ones, for our purposes, are those for software under rule 11. So let’s have a look at those now! Quick notes: I replaced “MDSW” with “Software” because that’s more human-readable. I also removed the examples for “monitoring of physiological processes” because they’re not directly relevant here.
MDCG 2021-24 Examples With Comments
Class III: “Software intended to perform diagnosis by means of image analysis for making treatment decisions in patients with acute stroke.”
Okay, so this would be a typical Radiology AI startup developing some sort of machine learning model which detects abnormalities in images, in this case, stroke. Specifically, in this example, the software probably (?) detects whether a stroke exists and how long ago it happened (quick medical refresher: stroke treatment depends on how much time has passed since onset). So the software tells the physician the stroke onset and the physician makes a decision on which treatment to perform.
It ends up being class III because it could lead to “death or an irreversible deterioration of a person’s state of health” as per rule 11. But can it really? I’m not so sure. If the result of the software (stroke duration) is shown alongside the actual (CT) brain images, well, then the final treatment decision is still made by the physician, who can also review the original images. I suppose it depends on whether you could prove that physicians would always catch erroneous software output (every time!). In that case, it wouldn’t be class III - but then again, I agree that this case is hard to argue.
If you imagine a slightly different software which only sends the result to a physician via text message, without the possibility to review the original images, then yes, that is definitely class III.
But clearly, this example leaves room for many questions: Who makes the final decision when it comes to diagnosis of stroke? And what about treatment? It would make a huge difference if the software acts fully autonomously, analyzing all patient images and providing a final diagnosis and treatment decision to physicians, versus simply rendering some colorful square boxes into images to show where stroke lesions were detected.
If the software were fully autonomous (imagine a physician initiating surgery on a stroke patient simply because “the computer told them to” - wait, that doesn’t sound very realistic), then yes, this sounds like very risky software. But which software nowadays is fully autonomous? There’s always a physician in the loop somewhere. All ML-assisted Radiology software is somewhere on the spectrum between “fully autonomous” and “useless additional information”. It would have been really interesting to see whether classification differences exist in these cases.
But.. oh boy, this will be fun for all Radiology AI companies. I can imagine regulators jumping on this example and falsely trying to apply it to all Radiology AI companies. As soon as your software gets close to doing some sort of image analysis on life-threatening conditions, you run the risk of becoming class III.
Class IIb: “A mobile app intended to analyse a user’s heartbeat, detect abnormalities and inform a physician”
Okay, so this is confusing! Simply analyzing a user’s heartbeat doesn’t necessarily make a software a medical device at all. It depends on what you do with it (we call this the Intended Use). If it were for optimizing your meditation (yes, meditation, not medication), it wouldn’t be a medical device. But! The second half, “detecting abnormalities and informing a physician”, is what matters here. Because, I would assume, the software automatically makes some sort of diagnostic decision before informing a physician.
So this example makes sense.. initially. The software is clearly providing information used to take decisions with diagnosis or therapeutic purposes. Or, in human language, the software alerts a physician because it made an initial diagnosis.
But why does it become class IIb then, and not class IIa? Because it could lead to “a serious deterioration of a person’s state of health”. But.. doesn’t that depend on which diseases the software is intended to diagnose? If it’s for detecting weird anatomical variants of the heart with no implications whatsoever, I don’t see this being class IIb. If however it would be intended to detect atrial fibrillation, then, sure, it’s class IIb. But then again, atrial fibrillation doesn’t necessarily lead to serious deterioration of a person’s state of health. Confusing.
Talking about detecting atrial fibrillation - isn’t that exactly what the Apple Watch does? And, surprisingly, that’s not an EU MDD / MDR medical device at all. Something really doesn’t add up here.
Class IIb: “Software intended for diagnosing depression based on a score resulting from inputted data on patient symptoms (e.g. anxiety, sleep patterns, stress etc.).”
This is probably the first time someone has used the past tense of “input”: Inputted. Interesting.
So the idea of this software is probably that it computes a preliminary diagnosis. Something like, an online form which people fill out and, if the software thinks that someone has depression, it forwards it to a physician. Class IIb makes sense here because if the software misses a diagnosis, that could lead to a serious deterioration of a person’s state of health - worst case, someone is suicidal and the software doesn’t spot it. Wait, wouldn’t suicide make it class III? Hm.
But this is the first time I sense the huge elephant in the room. Do you also see it?
It’s this: What if the software doesn’t make any diagnosis and only forwards data to the physician to make the final diagnosis? Wouldn’t it be class I then? Or is it not even a medical device at all as it’s now only a data forwarding tool?
Or, let me make this more complicated: The software only calculates a (validated) depression score but leaves the final diagnosis up to the physician. Is the software still providing information used to take decisions with diagnostic or therapeutic purposes? I mean, the physician could also choose not to use the software and calculate the score manually by looking it up on Wikipedia.
And now, if you say “yes, calculating the score clearly is providing information for diagnostic purposes”, what about the data forwarding tool? With that line of argumentation, the data forwarding tool also sounds like a class IIa device because it also provides information for diagnostic purposes. But if the data forwarding tool is a medical device, wouldn’t WhatsApp and email also become medical devices because they’re also data forwarding tools? I don’t know, man.
This is a similar problem as the Radiology example above: There’s no clear differentiation based on the “level of autonomy” in which the software operates.
Class IIa: “Software that lists and ranks all available chemotherapy options for BRCA-positive individuals.”
This is, in my opinion, a hilariously bad example. Can you think of an example of a “Software that lists and ranks all available chemotherapy options for BRCA-positive individuals”?
I can, and it’s called Wikipedia, PubMed and Google Scholar. There are lots of places where you can read up on chemotherapy options for BRCA-positive patients. None of those are actual medical devices.
Giving the MDCG authors the benefit of the doubt, I think they were focusing on the feature of the software that lists and ranks the available therapies. In this case, I suppose the software would analyze a patient’s BRCA mutation and rank available chemotherapies based on their effectiveness (I’m making a lot of assumptions here!). In that case, yes, the software clearly is providing information for therapeutic purposes and I agree with that classification.
But as we’re talking about cancer treatment here, why is it class IIa and not IIb? Imagine the software suggests the worst possible chemotherapy which leads to cancer progress. That certainly is a serious deterioration of a person’s state of health which would make it class IIb.
Or.. is this classification assuming that the physician makes the final decision here and that’s the reason why this software is class IIa? We will never know. A good guidance document creates more questions than it answers!
Class IIa: “Cognitive therapy Software where a specialist determines the necessary cognitive therapy based on the outcome provided by the Software.”
I have no clue what sort of software this should be. What the hell is an “outcome provided by the software”? If I message a Psychotherapist via WhatsApp and they determine I need cognitive therapy.. did WhatsApp “provide an outcome”?
Then again, it’s called “Cognitive therapy Software” in the example. But.. why would a specialist initiate cognitive therapy if the software is already providing cognitive therapy?
Making another gigantic ton of assumptions here, it could look like this: Patients use the software to do cognitive therapy - like some sort of gamification thing. The software also monitors their progress and calculates indicators (“outcomes”) for whether a patient may need additional, human cognitive therapy. Okay. That way the software is providing information used to take decisions with therapeutic purposes. I’d tend to agree with that.
But.. ready for some chaos? How does this differ from the earlier example, “Software intended for diagnosing depression based on a score resulting from inputted data on patient symptoms”, which was class IIb? Cognitive therapy is a valid therapy for depression. And if a depressed patient is using this software and it fails to detect that the person has become suicidal, then suddenly this is class IIb, not IIa. Or even class III, as noted above?
I guess the main takeaway is that it doesn’t really depend on the cognitive therapy part, but 1) on what sort of patients are meant to be treated and 2) whether the software provides any indicators which influence diagnosis or therapy.
Or did they actually mean Cognitive Behavioral Therapy? I hope they asked some competent Psychiatrists before writing down such an example.
Class I: “Software app intended to support conception by calculating the user’s fertility status based on a validated statistical algorithm. The user inputs health data including basal body temperature (BBT) and menstruation days to track and predict ovulation. The fertility status of the current day is reflected by one of three indicator lights: red (fertile), green (infertile) or yellow (learning phase/cycle fluctuation).”
And finally! Our class I example. And yes, there is only one. I don’t know why. Because providing multiple examples would have actually reduced confusion?
I really like how they went into detail here, because this is the only example where further detail is really not helpful. I would have loved to have this level of detail on the other examples!
Unfortunately, this example is so beside the point that it’s hardly worth arguing about. By being about fertility planning, it cleverly circumvents doing any sort of diagnosis or therapy. So it’s class I and clearly not up-classified by rule 11. Not much to talk about here.
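For what it’s worth, the behavior described in the example is easy to picture in code. Here’s a toy sketch using a naive calendar heuristic - emphatically not the “validated statistical algorithm” the example assumes, and all names are my own invention - just to show how mechanical and diagnosis-free this class I example really is:

```python
from datetime import date, timedelta

def fertility_light(today: date, period_starts: list[date]) -> str:
    """Toy calendar heuristic: predict ovulation ~14 days before the
    next expected period and treat the 6-day window ending on ovulation
    day as fertile. Returns "red" (fertile), "green" (infertile) or
    "yellow" (learning phase: fewer than 3 tracked cycles)."""
    if len(period_starts) < 3:
        return "yellow"  # not enough data yet: learning phase
    starts = sorted(period_starts)
    # Average cycle length from the tracked periods.
    gaps = [(b - a).days for a, b in zip(starts, starts[1:])]
    cycle_len = round(sum(gaps) / len(gaps))
    next_period = starts[-1] + timedelta(days=cycle_len)
    ovulation = next_period - timedelta(days=14)
    fertile_window_start = ovulation - timedelta(days=5)
    if fertile_window_start <= today <= ovulation:
        return "red"
    return "green"
```

Notice there’s no disease, no diagnosis, no therapy anywhere in there - which is exactly why rule 11 leaves it alone.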
But that leads us back to the elephant in the room: Is there any sort of software which is somewhat closer to diagnosis and therapy, yet still MDR class I? Like, you know, almost all medical apps out there?
What’s Really Class I Now?
When I started reading the guidance document, I was hopeful that I would finally be able to answer this question now and give startups some definitive examples for MDR class I software applications. That’s definitely not the case - well, unless you’re developing a fertility tracking app.
Still, if I read (heavily) between the lines of the guidance document, I can come up with a few ideas which might work:
Prevention of disease. An application which prevents disease still is a medical device (due to prevention of disease) but has solid chances of avoiding rule 11 because it’s somewhat distanced from providing information used to take decisions for diagnostic purposes. Like, an app to prevent depression in healthy people. However, there’s also a fine line which you shouldn’t cross, specifically, something like “alerting a physician if a person suddenly has the disease”. That would sound like making a diagnostic decision (or diagnosis decision? chuckle) and move you quite close to the above examples of the depression scoring and cognitive-therapy-software-forwarding-thing.
Direct therapy of disease. An application which directly provides therapy to the patient without influencing diagnostic or therapeutic decisions. Building on the depression-app example above, this could similarly be an app which provides an online depression course to patients. I admit that this may get dangerously close to providing information used to take decisions for therapeutic purposes. Or does it? If a person has already been diagnosed with depression by a healthcare professional and the application doesn’t provide any further information which may influence any decisions, we could be good, right? It will be up to the local state authorities to assess that. This will turn out to be very interesting!
Monitoring of disease. We’re moving closer to risky edge cases here. Something like a symptom tracker for rheumatism: People track what sort of pain they’ve had over time and can create helpful plots and data exports for their physician. But.. wouldn’t that be providing information used to take decisions for diagnostic or therapeutic purposes? Yes, I think it gets very close to that. It really depends on what that information is used for. If it’ll be used for changing your medication, then yes, that’s probably a therapeutic purpose. If it’ll be used for diagnosing disease progression, that may even be a diagnostic purpose. But maybe it’s only used to monitor the disease in general and optimize lifestyle and nutritional interventions? Then it’s still a medical device, because it monitors disease, but may have a chance of getting around rule 11.
And I think this is the core of the problem. It’s still unclear what providing information used to take decisions for diagnostic or therapeutic purposes is, because a diagnostic or therapeutic decision is made based on various pieces of information with varying levels of abstraction and riskiness. The rheumatism symptom tracker is a great example. That doesn’t sound very risky - the patient and physician look at the tracked pain levels and discuss whether to adapt the treatment. Is that already class IIa? Now, what if the rheumatism app provides direct treatment advice to the patient, like “increase your medication now!”? Is that class IIa again?! That doesn’t feel right.
Again, I’m not sure. It’ll be up to the state authorities. Speaking of which..
Let’s Take a Minute and Think About the State Authorities
Enforcement of the MDR for class I manufacturers rests upon the state authorities, at least here in Germany. And, oh boy, I really wouldn’t want to be working there right now. The problem they’ll be facing is this: A random startup brings a supposedly-class-I device to market. They hear about it in the news, because the startup reached unicorn status overnight. They go there and audit them. It’s one of the edge cases above. Now they have to decide whether it’s really class I and, if not, make the startup take the device off the market. (Also, they read the guidance document and are as clueless as I am.)
The huge problem here is that class I devices get no prior approval - with class IIa and higher medical devices, you can be reasonably sure that your classification is correct because a notified body took a few months (yup) to check your documentation.
So, I’m not saying that we should introduce a pre-market audit process for class I devices (that would kill the last few remnants of innovation), but we need some sort of authority which gives definitive answers on class I edge cases before companies build them.
Other than that, contrary to my skepticism above, there are actually some helpful takeaways which surface after making some huge assumptions and reading between the lines:
If your software does direct diagnosis or influences therapeutic decisions, it’s at least class IIa. This is interesting because it includes ranking therapeutic options and making any sort of triage (in a wide sense), e.g. notifying physicians when patients fulfil certain criteria.
If a malfunction in that software could lead to suboptimal diagnosis or therapy which leads to a “serious deterioration of a person’s state of health”, it’s at least class IIb. This is much less clear-cut because the examples contradict each other. Based on those, if your software is triaging depression, you’re class IIb. But if you’re monitoring cognitive therapy treatment, which could also include depression, you’re class IIa?
Radiology AI companies may be in a class III situation as soon as their software handles any sort of life-threatening disease, even if their software doesn’t provide “risky” diagnostic / therapeutic information. Good luck with that.
All right, that’s it! I hope you’ve come away with slightly more knowledge after reading this. And even if it’s just a confirmation that we’re all puzzled by this classification chaos and are all asking the same questions - we’re all in this together :)