In-context: June 9, 2025
Here’s a quick wrap-up of three papers we found interesting over the last few weeks, with some take-home points.
00:35 - Superhuman performance of a large language model on the reasoning tasks of a physician
06:20 - MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks
11:45 - Identifying and mitigating algorithmic bias in the safety net
Some resources and papers we discuss:
Brodeur, P.G. et al. (2024). Superhuman performance of a large language model on the reasoning tasks of a physician. arXiv, abs/2412.10849.
Bedi, S. et al. (2025). MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks. https://arxiv.org/abs/2505.23802
Mackin, S., Major, V.J., Chunara, R. et al. Identifying and mitigating algorithmic bias in the safety net. npj Digit. Med. 8, 335 (2025). https://doi.org/10.1038/s41746-025-01732-w
https://medium.com/data-science/reducing-ai-bias-with-rejection-option-based-classification-54fefdb53c2e