GPT-4 performs close to the level of expert doctors in eye assessments, study finds

A recent study by the University of Cambridge's School of Clinical Medicine found that OpenAI's GPT-4 performed nearly as well as experts in ophthalmological assessment, Engadget reports, citing the Financial Times.

In a study published in PLOS Digital Health, researchers tested GPT-4, its predecessor GPT-3.5, Google's PaLM 2, and Meta's LLaMA on 87 multiple-choice questions. Five expert ophthalmologists, three ophthalmology interns, and two non-specialized junior doctors took the same exam. The questions were drawn from a textbook used to test interns on everything from light sensitivity to lesions. The textbook is not publicly available, so the researchers believe its contents could not have appeared in the models' training data. ChatGPT, running GPT-4 or GPT-3.5, was given three attempts to answer definitively; otherwise its response was marked as null.
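The paper describes this three-attempt rule only in prose; purely as an illustration, here is a minimal Python sketch of how such a scoring loop might look. The names here (ask_model, mentioned_options, the letter-based option format) are hypothetical stand-ins, not the study's actual evaluation harness.

```python
import re

def mentioned_options(answer: str, letters: list[str]) -> set[str]:
    """Which option letters (A, B, ...) the reply names as whole words."""
    return {l for l in letters if re.search(rf"\b{l}\b", answer, re.IGNORECASE)}

def score_question(ask_model, question: str, letters: list[str],
                   correct: str, max_attempts: int = 3) -> int:
    """Give the model up to three chances to commit to exactly one option.
    A definitive answer scores 1 if correct and 0 if wrong; if no attempt
    is definitive, the response is treated as null and scores 0."""
    for _ in range(max_attempts):
        reply = ask_model(question, letters)
        picks = mentioned_options(reply, letters)
        if len(picks) == 1:  # definitive: names exactly one option
            return int(correct in picks)
    return 0  # null: never committed to a single option

# Toy usage: a stub "model" that hedges once, then commits.
if __name__ == "__main__":
    replies = iter(["It could be A or B.", "B"])
    stub = lambda q, opts: next(replies)
    print(score_question(stub, "Which finding suggests keratoconus?",
                         ["A", "B", "C", "D"], "B"))  # prints 1
```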

GPT-4 outperformed the interns and junior doctors, correctly answering 60 of the 87 questions. That is well above the junior doctors' average of 37 correct answers, but only marginally ahead of the interns' average of 59.7. While one expert ophthalmologist answered only 56 questions correctly, the five experts as a group averaged 66.4 correct answers, beating the machine. PaLM 2 scored 49, GPT-3.5 scored 42, and LLaMA came last with 28, below even the junior doctors. It is worth noting that these tests were conducted in mid-2023.

Although these results show promise, they come with risks and concerns. The researchers note that the study used a limited number of questions, especially in certain categories, so real-world performance may differ. LLMs also tend to "hallucinate," that is, make things up. That may be harmless for a trivial fact, but asserting the presence of cataracts or cancer is another matter entirely. As with many LLM applications, the systems also lack nuance, which creates further room for inaccuracy.

As a reminder, shortly before OpenAI CEO Sam Altman's four-day ouster, several researchers at OpenAI, the artificial intelligence research lab, wrote a letter to the board of directors warning of a powerful AI breakthrough that could pose a threat to humanity. Reuters reported this, citing two sources familiar with the matter.

OpenAI declined to respond to journalists' inquiries, but in an internal message to employees it acknowledged a project called Q* (pronounced Q-Star), which could be a breakthrough in the company's long-running pursuit of so-called artificial general intelligence (AGI): an autonomous AI system that surpasses humans at most economically valuable tasks. Given vast computing resources, the new model was able to solve certain mathematical problems.

Also read:

  • ChatGPT compared to a nuclear weapon in terms of danger
  • ChatGPT creates its own language: already communicates with users
  • Chat GPT: a leap into the future or a step into the abyss?