Errors and human attention in a world of AI note taking

One of the AI technologies increasingly being used by clinicians is Ambient Voice Technology (AVT), also known as AI scribes. These are AI tools that record a patient-clinician interaction, convert the audio into a transcript and then summarise it for the medical notes. They can also generate letters that convey information or make referrals to other parts of the health and care system.

In some settings these tools have demonstrated time savings, though not in every setting. Clinicians report that AVT improves their working conditions by reducing cognitive load and enabling better interactions with patients. A potential benefit is that more of the clinician's attention can be focused on the patient, allowing closer observation and drawing on the tacit knowledge that comes from consultations.

As I wrote previously, there are unanticipated negative implications of AI use which we need to be alert to so they can be mitigated. In the case of AVT, for example, notes are getting longer, which can make it harder to find relevant information. The same applies to letters.

AVT is far from perfect, but neither are people. That is why clinicians are required to check summaries and letters for errors. The combination of AVT with clinician checking has the potential to improve the accuracy of medical notes and letters. Until, that is, you consider human behaviour, such as what happens as AVT error rates fall while the pressure to see more patients remains high: checking is likely to become more cursory, as has been seen across different sectors, and errors get missed. This raises two questions: what is the value of clinician checking if it is not catching errors? And at what point are AVT summaries accurate enough to be acceptable without it?

There is existing research on errors in medical documentation from before the introduction of AVT. In this research, documentation errors include missing information, incorrect information and incorrect medical codes. The accuracy of medical records is highly variable across different providers and settings. A systematic study shows that hospital records have a median accuracy of approximately 83 per cent; interestingly, regulatory requirements and payment incentives have been shown to improve this. Surveys of patients in the UK and US indicate that approximately 20 per cent have identified inaccuracies or missing information in their medical records, most commonly relating to diagnosis and treatment errors, incorrect medical history, and medication or allergy errors. It's important to remember that this is not an exhaustive list of research publications and that there is a high degree of variation.

AVT, with its potential for audio errors and hallucinations, will have imperfect accuracy, but, as we can see, so does the currently accepted standard of documentation. If the AVT error rate can be brought below the average error rate of our current documentation process, it will reduce inaccuracies in medical records. AVT also has the potential to reduce errors of omission where the missing information already exists in the records. For example, in follow-on communications after an appointment, AVT can ensure that all the relevant information is included in a letter. But it won't help collate information that was not discussed in the consultation. And prompting the clinician to add missing context is going to be less effective when they are no longer looking at their screen!

In an environment where there isn't enough staff time or money, we need honesty and transparency to understand what the most important use of staff time is and what we need the technology to be able to do. This means understanding:

What is the baseline current accuracy of medical notes?
What is the baseline current accuracy of AVT, and do errors cluster around specific words in ways that increase or reduce harm?
What proportion of clinicians are fully attentive when reviewing AI summaries? Can alternative user interface designs (not alerts!) and incentives make a difference?

Some economists argue that AI replacement is more likely to occur when ‘AI + People’ adds nothing beyond what the AI achieves alone. An NHS where clinicians review AI outputs without critically appraising them is complying with regulatory requirements without actually improving safety or quality. We have a choice about how good AVT needs to be before it runs without clinical oversight, but we need the evidence, acceptance and cultural shift first.

I hope you enjoyed this post. If so, please share it with others and subscribe to receive posts directly via email.

Get in touch via Bluesky or LinkedIn.

Transparency on AI use: GenAI tools have been used to help draft and edit this publication and to create the images. But all content, including its validation, has been the work of the author.

Pritesh Mistry