Alright, so you were probably reading the IEC 62366 summary and now you’ve ended up here, wondering how the hell to actually do a summative usability evaluation for IEC 62366 compliance.
First off, if you haven’t read the IEC 62366 summary yet, go over there right now and read it first! And then feel free to come back to this article.
Before I get started: In our Wizard, I’ve actually recorded a few videos in which I show you how to do a usability test of software as a medical device. It even has actual videos of the tests, so that’s pretty cool. Okay, end of “internal company ad so that I can earn some money to pay my rent”. Moving forward - where do you start with a summative evaluation for IEC 62366 compliance?
Who Will Be Your Test Subjects?
First off, you need to find a group of people who will be your test subjects. They should resemble the actual people who will be using your medical device. A few quick examples:
- If your software is geared towards specialists of a certain area (e.g. radiologists), you probably need those for your test.
- If it’s more of a “consumer” app (e.g. a patient-facing app like those DiGAs in Germany), you probably can get away with recruiting a broad group of “normal” people, regardless of whether they suffer from the disease you’re treating (more on that below).
- If your app has multiple different user groups, you might have to recruit all of them. For example, if your app has features for doctors and other features for patients, you might have to recruit both of them and include them in your usability tests. You might even have to come up with separate test tasks for each of those groups.
Side Note: “Normal” People Instead of Sick People?
For many patient-facing devices, you might actually be able to run usability tests with people who don’t have the disease you’re treating. For example, if your app treats people with obesity, well, you could recruit pretty much anyone (obese and non-obese) for your usability tests. You’d need to argue that, for the purpose of your usability test, there’s no difference between obese and non-obese people because both groups are equally capable of using smartphones, reading instructions, etc.
As always, “it depends”. If your app is targeted at diabetics, those people might actually have impairments that affect how they use smartphones, e.g. trouble using touchscreens (diabetic neuropathy) or reading text (diabetic retinopathy). So, as always: apply good judgement and document your reasoning. If that reasoning is sound, chances are your auditor is going to agree with you.
How Many Test Subjects Do You Need?
It’s somewhat hilarious that IEC 62366 doesn’t mention a number in this context. From what I’ve seen in audits, the minimum seems to be five subjects, based on an ancient publication by Jakob Nielsen from approximately a million years ago (that’s the usability researcher behind the Nielsen Norman Group, not the market research company with the same name). His theory was that five people are sufficient to discover the biggest problems, and it seems like we haven’t revisited that theory in the meantime, or at least we haven’t found any data which would refute it.
The FDA, on the other hand, recommends at least 15 participants per distinct user group in its guidance document on human factors engineering.
So now the choice is yours. 5 people for minimum compliance, 15 people for better (and FDA) compliance, or something in between. Your call.
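If you want to see where the five-user argument comes from, it’s usually explained with a simple problem-discovery model: if each participant independently runs into a given usability problem with probability p, the chance that this problem shows up at least once in n sessions is 1 - (1 - p)^n. The sketch below just evaluates that formula. The value p = 0.31 is the average detection rate commonly quoted from Nielsen’s work and is purely an assumption here; problems in your device may be much rarer, and the model treats all participants as identical, which they aren’t if you have several user groups.

```python
def discovery_probability(p: float, n: int) -> float:
    """Probability that a usability problem, which each participant
    independently encounters with probability p, is observed at least
    once across n participants: 1 - (1 - p)^n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which discovery_probability(p, n) >= target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while discovery_probability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Assumption: commonly quoted average detection rate, not a property of your device.
    P_ASSUMED = 0.31
    for n in (5, 10, 15):
        print(f"{n:>2} participants: {discovery_probability(P_ASSUMED, n):.1%}")
    print("participants for a 95% target:", participants_needed(P_ASSUMED, 0.95))
```

With the assumed p = 0.31, this prints 84.4%, 97.6% and 99.6% for 5, 10 and 15 participants, and 9 participants for a 95% target. Keep in mind that this model is about finding problems, while the summative evaluation is about showing that users can get through the (hazard-related) scenarios; it doesn’t replace your own justification for the number you pick.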
I did the usability tests for a client once and recruited around 10 people. I think that was quite reasonable. Also, I did all the tests remotely via Google Meet, and each test only took 30 minutes or less. So I got everything done in one day, and I think you can, too!
What To Test?
Generally speaking, you want to test the main use scenarios of your product, and especially the hazard-related use scenarios. From experience, you can cover those with 10 tests or fewer, usually far fewer.
You’ll have those use scenarios documented somewhere as “user needs” or similar. If you’re using our free templates (which I greatly recommend), you’ll add them to the User Needs List (template link). Or, if you’re using our eQMS, Formwork (which I recommend even more greatly), you’ll have separate forms for that which tell you exactly what to enter, and you can even ask the software to generate stuff for you.
Let me give you some examples. Let’s think about the user needs of a fancy Covid diagnosis app which diagnoses Covid based on selfie pictures of the users (just as an abstract example - hear me out):
- Users want to receive a Covid diagnosis by using their smartphone (hazard-related)
- Users want to receive a recommendation on what to do next
- Users want to review past diagnoses and their times
Just as in the IEC 62366 summary article, I marked one of those user needs as “(hazard-related)”. Arguably, making a Covid diagnosis could be related to hazards, e.g. if you make the wrong diagnosis (false positive / false negative) or if you show no diagnosis at all, or if you show a diagnosis too late.
So, the summative evaluation requirement would now be to test all of those user needs. Generally speaking, you should have at least one test covering each of the user needs above, but you could also have multiple tests covering one. If you think this starts getting messy in spreadsheets, I agree with you, it’ll be a huge mess - that’s where software like Formwork helps a lot. Let’s just assume a one-to-one mapping for now and draft a user test for the first user need above:
- Give the user a smartphone with the pre-installed application.
- Ask them to open the app and perform a Covid diagnosis.
- Ask the user about the diagnosis result and what they’re going to do next.
For each of these tasks, you should define acceptance criteria. Something like:
- User finds diagnosis feature in app, takes a selfie and receives diagnosis
- User understands diagnosis and next steps
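If you’d rather think of the mapping between user needs and tests as data than as a spreadsheet, here’s a minimal sketch. The IDs (UN-1, UT-1, …) and field names are hypothetical, not taken from our templates or from Formwork, and the only check it performs is the one described above: every user need should be covered by at least one test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserNeed:
    id: str
    text: str
    hazard_related: bool = False


@dataclass
class UsabilityTest:
    id: str
    user_need_ids: list[str]  # a test may cover one or several user needs
    tasks: list[str]
    acceptance_criteria: list[str]


def uncovered_user_needs(needs: list[UserNeed], tests: list[UsabilityTest]) -> list[str]:
    """Return the IDs of user needs that no test covers, in input order."""
    covered = {need_id for test in tests for need_id in test.user_need_ids}
    return [need.id for need in needs if need.id not in covered]


needs = [
    UserNeed("UN-1", "Receive a Covid diagnosis by using the smartphone", hazard_related=True),
    UserNeed("UN-2", "Receive a recommendation on what to do next"),
    UserNeed("UN-3", "Review past diagnoses and their times"),
]

# One-to-one mapping, as in the draft test above.
tests = [
    UsabilityTest(
        id="UT-1",
        user_need_ids=["UN-1"],
        tasks=[
            "Open the pre-installed app and perform a Covid diagnosis",
            "Tell us the diagnosis result and what you are going to do next",
        ],
        acceptance_criteria=[
            "User finds diagnosis feature in app, takes a selfie and receives diagnosis",
            "User understands diagnosis and next steps",
        ],
    ),
]

print(uncovered_user_needs(needs, tests))  # ['UN-2', 'UN-3']
```

The output tells you that UN-2 and UN-3 still need tests, which is exactly the kind of gap that’s easy to miss once a spreadsheet grows.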
So, if you’re now thinking about how you always have to help your parents fix their smartphones on Christmas Eve, this is very similar. One of the main goals of the summative evaluation, at least in my opinion, is to uncover situations in which users get “stuck” because they don’t know what to do next (“which button should I push? I can’t find any button!”) or situations in which users significantly misunderstand things (“it says I have Covid, so I’ll meet all my old friends to play chess tonight, that’s not dangerous, right?”). Again, if you’re thinking about your parents now, those are exactly the people to keep in mind: they’re usually not super tech-savvy, and good software should be easy to use, even for them (which is actually a sad statement about today’s software, as it’s often rather hard to use).
By the way, a one-to-one mapping of user tests to user needs clearly doesn’t work here, because we’d still need to test for the actual hazards of the first user need. For example, we’d want to test what happens when the app returns a false-positive diagnosis.
In other words, the app tells a user that they have Covid, while the user actually is healthy. What would you want the user to do? Probably you’d display some sort of warning stating “Dude, you might have Covid but we’re not sure because all of this is based on a selfie picture. Do a real Covid test now. Also, our app is rather useless”. So you’d expect the user to do a Covid test. That would be the expected outcome of your usability test - you’d run through the scenario with the user, possibly modifying the app so that it returns a false-positive result, and ask the user afterwards what they’ll do next. The answer should be “I would be doing a Covid test now”. Hopefully.
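“Modifying the app so that it returns a false-positive result” is easiest if your test build has a switch for it. Here’s a minimal sketch, assuming a hypothetical `DiagnosisService` that wraps your actual classifier; the class, the result values and both screen texts are made up for illustration. The idea is that a forced result goes through the same display logic as a real one, so the participant sees exactly what a real user would see.

```python
from enum import Enum
from typing import Callable, Optional


class Result(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


# Hypothetical wording; use the exact texts from your real user interface.
POSITIVE_WARNING = (
    "You might have Covid. This result is based on a selfie only and may be wrong. "
    "Please do a Covid test now."
)
NEGATIVE_TEXT = "No signs of Covid detected."


class DiagnosisService:
    def __init__(self, classifier: Callable[[bytes], Result],
                 forced_result: Optional[Result] = None) -> None:
        # forced_result should only ever be set in a dedicated usability-test build.
        self._classifier = classifier
        self._forced_result = forced_result

    def diagnose(self, selfie: bytes) -> Result:
        if self._forced_result is not None:
            return self._forced_result
        return self._classifier(selfie)

    def screen_text(self, selfie: bytes) -> str:
        # Same display logic for forced and real results.
        if self.diagnose(selfie) is Result.POSITIVE:
            return POSITIVE_WARNING
        return NEGATIVE_TEXT


def always_negative(_selfie: bytes) -> Result:
    # Stand-in for the real model: the participant is healthy.
    return Result.NEGATIVE


# Simulate a false positive for a healthy participant.
test_build = DiagnosisService(always_negative, forced_result=Result.POSITIVE)
print(test_build.screen_text(b"selfie"))
```

The participant then reads the positive warning even though the stand-in classifier says “negative”, and you can ask them what they’d do next.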
Alright, so you know who to recruit for testing and what to test. How do you perform the actual test?
How To Do The Actual Test?
You can do it either in person or remotely. Note that, when doing it remotely, the tech setup is not necessarily trivial: you might want to see the person (via webcam) but also their smartphone / computer screen (via screen sharing) to really understand what they’re doing or why they’re stuck. All of these are solvable problems. You’re in a tech startup, so you’ll figure it out. I’m just telling you that it’s slightly more work than simply setting up a video call - but then again, often not much more than that.
At the very minimum, you’ll have a participant and a test instructor in each session. The test instructor can also observe how the participant handles the tasks and write down all relevant findings. If you have enough people in your company, you could also split these roles and have a separate instructor and observer. Either way, someone from your side has to be present for the whole session, as you can’t simply leave participants on their own, so with a small team you’ll be running the sessions one after another.
And then you just walk them through the test based on the test steps above.
What To Document?
Yeah, right, I almost forgot. You have to document all of this stuff. Generally speaking, and based on our templates, you’ll describe your general setup and methodology in your Usability Evaluation Plan. The actual tests and their results go into the Usability Evaluation Protocol, and the summary of all tests, along with your final conclusion, goes into the Usability Evaluation Report.
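The step from protocol to report is essentially an aggregation: each protocol entry records one participant performing one test against its acceptance criteria, and the report summarizes those entries per test. Here’s a minimal sketch with hypothetical field names, participant IDs and observations; it only counts outcomes and collects the observations for failed attempts. The actual conclusion (e.g. whether a failure points to an unacceptable risk) is still yours to argue in the report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolEntry:
    participant_id: str
    test_id: str
    passed: bool  # were the acceptance criteria met?
    observation: str = ""


def summarize(entries: list[ProtocolEntry]) -> dict[str, dict]:
    """Group protocol entries by test and count passes and failures."""
    summary: dict[str, dict] = defaultdict(
        lambda: {"passed": 0, "failed": 0, "failures": []}
    )
    for entry in entries:
        row = summary[entry.test_id]
        if entry.passed:
            row["passed"] += 1
        else:
            row["failed"] += 1
            row["failures"].append((entry.participant_id, entry.observation))
    return dict(summary)


# Hypothetical protocol entries for illustration only.
protocol = [
    ProtocolEntry("P01", "UT-1", True),
    ProtocolEntry("P02", "UT-1", False, "Could not find the diagnosis button"),
    ProtocolEntry("P03", "UT-1", True),
]

for test_id, row in summarize(protocol).items():
    print(test_id, row)
```

Keeping the failed participants’ observations next to the counts makes it easy to quote them in the report when you explain why a failure is (or isn’t) acceptable.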
Record Summative Usability Evaluation As Videos?
Do you have to record those sessions as videos? The answer is no. Writing up the protocols is enough. But it might not hurt to record them if it’s easy to accomplish (e.g. in Google Meet). You never know whether those recordings could come in handy, e.g. if you want to revisit them to see if you missed something. All the usual GDPR stuff applies here, of course - among other things, you need consent from your test subjects.
And.. that’s it? Ah wait, one more question!
Summative Evaluation On Prototype Or Final Medical Device?
The most common question I get is whether you can do the summative evaluation on a version of your software (or device) which is not final yet, i.e. during development. Yes, technically that’s possible, but you’d need to argue that the usability-relevant parts of it will not change in the final release. In other words, the version you’re testing now will not differ significantly from the version you’ll be shipping. And as we all know, software tends to change a lot, so just test your final software.