AI v exams - medical education and common sense

2022 was an astounding year for AI. The functionality of generative AI that became available to the general public leapt years ahead of expectations.

Generative AI is a type of AI program that generates text, images and audio in response to a text-based request (a prompt). This powerful software has been coupled with a reasonably intuitive interface, making it very easy to use. Together, these opened up generative AI to the general public and sparked the imagination of many.

An AI model called ChatGPT in particular has captured the imagination for its potential to impact healthcare services and how staff work. ChatGPT is what’s known as a large language model (LLM). One way to think about this form of AI is that it is presented with huge amounts of text (large amounts of language). It doesn’t learn knowledge as people do. Instead, we can think of it as software that builds a statistical mapping of language: which words tend to follow which others in a sequence (a probability model). When a user asks it to answer a question or write a story in a particular style, it uses that mapping to produce a sequence of words that matches the request. This means it can produce text on a particular subject in a particular style of prose.
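To make the idea of a statistical mapping of language a little more concrete, here is a toy sketch in Python. It is not how ChatGPT is built. Real LLMs use neural networks trained on vastly more text. It only illustrates the "which word tends to follow which" idea described above. The training text, the function names and the choice of looking at just one previous word are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(model, word):
    """Turn the follow-on counts for one word into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def generate(model, start, length, seed=0):
    """Build a word sequence by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        counts = model.get(output[-1])
        if not counts:
            break  # the last word was never followed by anything
        candidates = list(counts)
        weights = [counts[w] for w in candidates]
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Made-up training text, purely for illustration.
corpus = (
    "the patient has a fever . the patient has a cough . "
    "the doctor has a plan ."
)
model = build_bigram_model(corpus)
print(next_word_probabilities(model, "the"))
print(generate(model, "the", 6))
```

In this tiny example "patient" follows "the" twice and "doctor" follows it once, so the model says "patient" is twice as likely to come next. Nothing in the code knows what a patient or a doctor is. It only knows which words have appeared together.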

The “answer” the LLM provides is essentially a composition of all the relevant text the software has been provided with. Here, relevant means the words that the statistical mapping says have a high probability of occurring in sequence. What it can do is impressively broad. For example, it can write job applications, song lyrics in the style of Nick Cave, and university essays.

An irrelevant test

Since ChatGPT was released for public use there has been lots of experimentation, and surprise at how well AI does at exams for law, MBAs and medicine. But these exams are irrelevant ways of testing AI.

As mentioned, one way of thinking about LLMs is as a statistical model with the ability to recognise patterns and create new text that matches those patterns. In reality, all an LLM’s success in a medical examination tells us is that there’s enough openly accessible information on the internet to provide high quality answers to exam questions. The medical examination doesn’t quantify the ability of AI to reason and deduce. It’s giving an answer based on all the data openly available on the web. Considering the existence of rich troves of health information like the NHS website, Mayo Clinic health information and many others (perhaps even Wikipedia), it shouldn’t be surprising that AI with access to all this information can answer these questions.

But there's also a lot of poor quality information, and this is the problem: the "answer" AI gives is a combination of the good and the bad. Worse still, there is no way to check what comes from a trustworthy source. What we need to start asking is what application of AI we are seeking, and what is the best way to test and benchmark it?

Person + AI to rethink education and testing

It’s important to keep in mind that testing software has different constraints to testing people. Software doesn’t tire and can be fed orders of magnitude more questions. That’s exactly what we need to do to benchmark this tech: identify the application area and test appropriately for it. What we can and can’t test for then needs to become part of education, training, examinations and regulation.

There’s huge potential for LLMs to continue to transform healthcare services, but applying the AI to medical education examinations designed for people doesn’t tell us much. Instead we need to think about what an AI assistive tool can do and how we educate staff with AI. Then, and very importantly, we need to test people in combination with the AI. It is this augmentation of people with machines that can drive significant change. But it will need significant change to education, training and examinations to bring the best of people working with AI into healthcare.

It’s incredible to have the cumulative information from the internet at your fingertips in an easily digestible and usable form. LLMs could bring wonderful changes to the existing bottlenecks and limitations in healthcare services. But to get there we need to avoid the trap of thinking about the tech first, and instead think about the problem and what different technologies enable.

Share your thoughts

If you’d like to help inform this thinking you can send me a voice message on WhatsApp (+44 7732 190 482).

OR:

You can comment and vote on other people’s comments here.


Pritesh Mistry

