Artificial Intelligence (AI) has been on the edge of my consciousness as a great hope for solving many of the problems in clinical medicine for maybe a decade. In 2011, IBM's Watson, a program able to answer questions posed in plain English and search its data sources fast enough to compete in real time, beat two human champions on the quiz show Jeopardy! After that game-show success, the program was used to build a chatbot to help people buy diamonds, to write recipes for Bon Appétit, and by various financial firms to increase profits. It also has healthcare applications, including diagnosis and treatment recommendations as well as decision support for imaging. But that's just Watson, which isn't the big name in AI right now.
OpenAI released its first GPT language model in 2018, and ChatGPT became available to the public in late 2022. Last year I signed up for it and used it a little bit for what it seemed to be good for. I tried asking it questions, mainly medical ones, and I used it a little to generate text explaining complicated things to patients. My young family members told me it was great for writing emails, which I declined to do because I think I am just fine at writing emails. It was also supposed to be great for writing essays in general. Also blog posts, even ones in my style! That seems like cheating, so I'll pass on that.
I liked the text generation a little bit, but I stopped using it because there is already so much good information available to explain the conditions relevant to my work, and I could never be sure the text ChatGPT spit out was accurate. When I go to a Mayo Clinic page, the American Diabetes Association, or an article from a journal I respect, the source of the information is clear and I even know what biases to expect. If I copied text from those sources into a patient's note, I could be confident it was accurate.
What is this "AI"? Much of what we think of as AI right now springs from large language models which produce answers for us based on what humans have written and that is available to the program. This means that they are very good at writing cover letters or even college entrance essays, but not good at answering questions that haven't already been answered. Also, if the question we ask has an answer that has changed in the last little bit, that may not be reflected in the response.
My hope for AI with regard to diagnosis was that it would identify variables associated with certain diagnoses that were diverse enough that my human brain couldn't see the connections. For instance, perhaps some combination of a person's features reliably predicts a condition. Maybe we could use the information we normally gather in a patient encounter to determine more accurately what condition they have, and use historical records to determine what treatment would work best. These obvious game-changers for my profession will not be delivered by the AI I have access to. What happened to patients in the past, including their diagnoses and treatments, has been based on such flawed and ever-changing practice that, even if it were possible to collate and examine the myriad data points, the output would likely be garbage.
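For what it's worth, the machinery for this kind of prediction is not exotic. Here is a minimal sketch, using scikit-learn and entirely invented encounter data (the variables, numbers, and "condition" are placeholders of mine, not anything clinically validated), of how routinely collected features could be combined to estimate the probability of a diagnosis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented encounter data: [age, systolic BP, BMI, fatigue (0/1)]
# and whether a hypothetical condition was ultimately diagnosed.
X = np.array([
    [34, 118, 22, 0],
    [61, 142, 31, 1],
    [47, 130, 28, 1],
    [70, 155, 33, 1],
    [29, 110, 24, 0],
    [55, 138, 30, 0],
])
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Estimated probability of the condition for a new patient encounter.
new_patient = np.array([[58, 146, 32, 1]])
print(model.predict_proba(new_patient)[0, 1])
```

The model, of course, is only as good as the historical labels it learns from, which is exactly the problem: feed it the flawed, ever-changing records described above and it will confidently return flawed answers.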
How about magnifying biases? We have read about AI's tendency to perpetuate our historical biases, and it will, because the neural networks at the heart of AI are trained on data sets from the past. Neural networks digest massive amounts of data in order to produce answers to our questions. If a chatbot produces an assessment and plan, it will resemble all the previous assessments and plans it has read. If we have routinely underestimated the pain experienced by people who are black (we have, see link), then a chatbot's assessments of black patients will underestimate their pain. Even if an AI is just using numerical data to detect associations (vital signs, lab tests, presence or absence of symptoms), as it might if it were being used to diagnose disease, that data will be overwhelmingly drawn from people with better access to care, so diagnoses based on it will be less accurate for patients who are socially disadvantaged. That doesn't mean AI and large language models would be completely useless, but it does make them less reliable. These problems will require remediation, and in the meantime we will need to examine AI's conclusions in light of what we know to be our historical biases.
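A minimal sketch of how this happens, using made-up numbers (nothing here is real patient data): suppose every patient in a training set had the same true pain, but one group's pain was systematically recorded two points lower. A model fit to those records faithfully reproduces the under-recording.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up historical records: true pain is identical across groups (7/10),
# but group B's pain was documented about two points lower on average.
n = 500
group = np.array(["A"] * n + ["B"] * n)
recorded_pain = 7.0 + rng.normal(0, 1, 2 * n)
recorded_pain[group == "B"] -= 2.0  # the historical bias baked into the records

# The simplest possible "model": predict the average recorded pain per group.
predictions = {g: recorded_pain[group == g].mean() for g in ("A", "B")}
print(predictions)
# e.g. {'A': ~7.0, 'B': ~5.0} -- the model has learned the bias, not the truth.
```

A fancier model doesn't fix this; it just learns the same bias more precisely.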
What we would really like AI to do is take over the jobs we hate. For instance, billing. Billing for our services is incredibly soul-killing. Kevin Schulman et al., from the departments of medicine and business at Stanford University, wrote an article in JAMA about how AI will likely fail to relieve us of this burden because the problem is just too large. As I see it, ideally AI would help us know what to include (or what edits to make) when we write our encounter notes so that we can bill for what we do, then produce the billing documentation, and we'd be done. No billers, no trying to figure out the proper codes, just work, then get paid. But it's not going to be that easy.
According to the authors' research, there are over 300,000 health care plans in the US, about one for every 1,000 people, and each of them may have different benefits and documentation requirements. There are up to 14 steps in processing each payment. There are nearly 600,000 different codes used to describe services, and because the prices attached to those codes are negotiated separately, there are around 57 billion distinct prices in all. This complexity funds (meaningless) jobs which (nevertheless) keep families fed, and it maintains the perceived need for health insurance corporations. The longstanding conflict between providers of medical care, who want to be paid as generously as possible, and health insurance companies, who want to pay us as little as possible, perpetuates the complexity. If AI simply completed our documentation, which would be nice, I suspect insurance companies would quickly respond by requiring more information, especially since AI is known for making things up. Complexity is at the heart of the huge cost and effort of billing, and it will need a legislative rather than a technological solution.
So what is it good for? AI is extremely useful at discovering patterns in data. That can be, and is, used in research, helping to discover which medications work for which conditions, among other things. That is AI in the broader sense, not large language models such as ChatGPT. Large language models are good at writing and at answering questions that can be addressed from databases of text. They can definitely write emails and make recommendations about what to eat when you have the stomach flu. They may be able to produce notes by listening to what people say during a medical encounter. That would be amazing, but it would definitely require editing. There will undoubtedly be ways they can make billing and referrals easier, if only by producing excellently worded correspondence. Eventually artificial intelligence will overcome some of these limitations, and we will probably simplify our processes to make that possible.
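As a rough illustration of the note-drafting idea, here is a minimal sketch using OpenAI's Python library (the model name, prompt wording, and transcript are placeholders of mine, and any draft produced this way would still need careful editing by the clinician):

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is already configured in the environment

transcript = (
    "Doctor: What brings you in today? "
    "Patient: I've had a cough and a low-grade fever for three days..."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable model would do
    messages=[
        {"role": "system",
         "content": "Draft a brief SOAP note from this visit transcript. "
                    "Flag anything you are unsure about rather than guessing."},
        {"role": "user", "content": transcript},
    ],
)

draft_note = response.choices[0].message.content
print(draft_note)  # a draft only -- the clinician must verify every detail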
It will be interesting to see what the next 10 years brings. I suspect I will be beyond caring when AI takes all of our jobs, kills most of us and turns the rest of us into slaves to produce the massive amount of electricity required for the upkeep of our robot overlords.