If you’ve just started reading the regulations, you might assume that a number like this would never be stated explicitly. You’re not far off: it always depends. As a software-as-a-medical-device (SaMD) manufacturer, your first point of orientation on usability questions might be the IEC 62366 standard.
Here comes the first hurdle: you will not find a useful answer in the IEC 62366-1 standard itself. For guidance on sample sizes for usability testing, you need to turn to the technical report IEC 62366-2. Its Annex K describes the relationship between the probability of occurrence of a usability defect and the number of test participants, and lays out the basics for a statistical determination of an appropriate sample size in about three pages. To cut it even shorter: a sample size of five to eight participants is regarded as standard best practice. For even more context, IEC 62366-2 references a separate guidance document: AAMI HE75:2009.
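The underlying statistics rest on a simple binomial model: if a given usability problem affects each test participant independently with probability p, the chance of observing it at least once across n sessions is 1 − (1 − p)^n. A quick sketch of that relationship (the function names are mine, not from the standard):

```python
import math

def detection_probability(p: float, n: int) -> float:
    """Chance that a problem occurring with per-user probability p
    is observed at least once among n test participants."""
    return 1 - (1 - p) ** n

def required_sample_size(p: float, target: float) -> int:
    """Smallest n that detects the problem with at least the target
    probability: solve 1 - (1 - p)**n >= target for n."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))
```

Note how quickly the numbers explode for rare defects: a problem that hits 31% of users is caught about 84% of the time with just five participants, while one that hits only 1% of users would need nearly 190 participants for the same confidence. That asymmetry is why usability testing complements, rather than replaces, other risk controls.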
If your software is geared towards radiologists, it may be difficult and expensive to find anyone willing to sacrifice their time for your garage project. If, however, it’s a consumer-facing app, recruiting becomes easy: you could literally ask a bunch of friends. Your sample size therefore depends not only on statistical and regulatory considerations, but also on the availability of resources. So let’s squeeze this guidance a little more: is it five or is it eight participants at minimum?
Research by Jakob Nielsen at Sun Microsystems suggests that you could even go for the lower end. He advocated for smaller, but more frequent usability tests with five participants each – quoting from Wikipedia:
> Elaborate usability tests are a waste of resources. The best results come from testing no more than five users and running as many small tests as you can afford.
Note, however, that he suggested multiple tests instead! So if you choose that route, you should consider doing more testing after your initial product is certified and you have more money. Again quoting Wikipedia:
> It is worth noting that Nielsen does not advocate stopping after a single test with five users; his point is that testing with five users, fixing the problems they uncover, and then testing the revised site with five different users is a better use of limited resources than running a single usability test with 10 users. In practice, the tests are run once or twice per week during the entire development cycle, using three to five test subjects per round, and with the results delivered within 24 hours to the designers. The number of users actually tested over the course of the project can thus easily reach 50 to 100 people. Research shows that user testing conducted by organisations most commonly involves the recruitment of 5-10 participants.
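Nielsen’s arithmetic is easy to reproduce with a simple binomial detection model, 1 − (1 − p)^n, plugging in his estimated per-user problem-detection probability of 31% (an assumption from his research, not a regulatory figure):

```python
p = 0.31  # Nielsen's estimate: chance that one user hits a given problem

five_users = 1 - (1 - p) ** 5   # ~0.84: one round of five finds most problems
ten_users = 1 - (1 - p) ** 10   # ~0.98: doubling the sample adds little

# The second batch of five is therefore better spent on the *revised*
# design, where it can also catch problems introduced by the fixes.
```

In other words, the jump from five to ten participants buys roughly 14 percentage points of extra detection in a single round, which is why Nielsen prefers spending those five participants on a second, iterative round instead.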
The United States’ FDA provides one final data point. Have a look at its publication ‘Applying Human Factors and Usability Engineering to Medical Devices’:
Based on study observations, the FDA recommends a minimum of 15 participants for your usability tests – though not without device-specific considerations: if your device is meant to be operated by lay users, or if you target large user groups, you should recruit more participants; in the case of multiple user groups, you should have 30 participants per group. And finally, the FDA also requires you to provide a rationale for your sample size determination.
Note that FDA guidance is obviously not binding for you if you bring your device to market with a CE mark in Europe. Among all the guidance out there, however, it likely represents the upper end of recommended sample sizes.