AI Mythbusting
EP. 09
In this episode, we tackle some common myths about how generative AI works, explain why these misconceptions arise, discuss the implications for healthcare, and suggest some quick fixes. The myths include: 1) that LLMs can explain their reasoning; 2) that LLMs can express uncertainty; 3) that LLMs can a) do maths, b) manage temporal data, c) apply guidelines, and d) handle negation; and finally 4) that AI will replace clinicians.
02:00 - Technical update: DeepSeek and other new models
10:00 - AI Mythbusting
15:50 - LLMs can explain their reasoning
21:50 - LLMs can express uncertainty
26:40 - LLM blindspots
41:50 - AI will replace clinicians
Some resources and papers we discuss:
McCoy LG, Swamy R, Sagar N, et al. "Do Language Models Think Like Doctors?" medRxiv. https://doi.org/10.1101/2025.02.11.25321822
Cabral S, Restrepo D, Kanjee Z, et al. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Intern Med. 2024;184(5):581–583. doi:10.1001/jamainternmed.2024.0295
Griot, M., Hemptinne, C., Vanderdonckt, J. et al. Large Language Models lack essential metacognition for reliable medical reasoning. Nat Commun 16, 642 (2025). https://doi.org/10.1038/s41467-024-55628-6
Ahn, Janice et al. "Large Language Models for Mathematical Reasoning: Progresses and Challenges." ArXiv abs/2402.00157 (2024). https://arxiv.org/abs/2402.00157
Fatemi, Bahare et al. "Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning." ArXiv abs/2406.09170 (2024). https://arxiv.org/abs/2406.09170
Wallat, Jonas et al. "A Study into Investigating Temporal Robustness of LLMs." ArXiv abs/2503.17073 (2025). https://arxiv.org/abs/2503.17073
Zondag AGM, Rozestraten R, Grimmelikhuijsen SG, Jongsma KR, van Solinge WW, Bots ML, Vernooij RWM, Haitjema S. The Effect of Artificial Intelligence on Patient-Physician Trust: Cross-Sectional Vignette Study. J Med Internet Res. 2024 May 28;26:e50853. doi: 10.2196/50853. PMID: 38805702
https://medium.com/@avanib28264/no-elephants-in-the-room-ai-seems-to-think-otherwise-7d70d6a7d5a4