Artificial intelligence technology has advanced to the stage where it could be considered as accurate as trained medical experts in detecting illness and disease, according to a paper published in the Lancet Digital Health journal.
In a systematic review of 82 existing studies dating back as far as 1951, the paper compared the diagnostic performance of deep learning models with that of healthcare professionals in detecting any disease from medical imaging.
Deep learning is a form of AI which employs algorithms, big data and computing power to emulate human intelligence.
In medicine, it allows computers to identify patterns of disease by examining thousands of images before applying what they learn to new individual cases to provide a diagnosis.
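The "learn from thousands of labelled examples, then diagnose a new case" loop described above can be sketched in miniature. The toy below is purely illustrative: the two-number "features", the labels, and the nearest-neighbour rule are all stand-ins, not how a real deep learning system (a neural network trained on actual medical images) works.

```python
# Toy illustration of "learn from labelled examples, then apply to a new case".
# Feature vectors and labels are made up for illustration; real diagnostic
# systems train deep neural networks on thousands of medical images rather
# than doing a nearest-neighbour lookup over four hand-written points.
import math

# Each "image" is reduced to two hypothetical features, labelled by diagnosis.
training_data = [
    ((0.9, 0.8), "disease"),
    ((0.8, 0.9), "disease"),
    ((0.1, 0.2), "healthy"),
    ((0.2, 0.1), "healthy"),
]

def diagnose(case):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    return min(training_data, key=lambda ex: math.dist(case, ex[0]))[1]

print(diagnose((0.85, 0.75)))  # disease
print(diagnose((0.15, 0.25)))  # healthy
```

The principle is the same at both scales: patterns are extracted from labelled past cases and reused to classify unseen ones.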
Deep learning offers considerable promise for improving the accuracy and speed of diagnosis through medical imaging.
Lead author of the paper, Dr Xiaoxuan Liu of the University of Birmingham's NHS foundation trust, said the findings were encouraging but did not suggest AI could replace humans.
“There are a lot of headlines about AI outperforming humans, but our message is that it can at best be equivalent,” she said.
First of its kind
Studies for the review were carefully selected from 13 different specialty areas including ophthalmic disease, trauma and orthopaedics, cardiology, neurology and cancers of the breast, skin, lungs, thyroid, stomach and mouth.
Letters, preprints, scientific reports, and narrative reviews were excluded, as were studies based on animals or non-human samples and those which presented duplicate data.
The paper found that deep learning algorithms correctly detected disease in 87% of cases, compared with 86% achieved by healthcare professionals.
“To our knowledge, this is the first systematic review and meta-analysis on the diagnostic accuracy of healthcare professionals versus deep learning algorithms using medical imaging,” Dr Liu wrote.
“After careful selection of studies with transparent reporting of diagnostic performance and validation of the algorithm in an out-of-sample population, we found deep learning algorithms to have equivalent sensitivity and specificity to healthcare professionals.”
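Sensitivity and specificity, the two metrics the review pooled, are simple proportions computed from a confusion matrix. A minimal sketch below shows the arithmetic; the counts are hypothetical round numbers chosen for illustration, not data from the review.

```python
# Sensitivity and specificity from confusion-matrix counts.
# The example counts are hypothetical, purely for illustration --
# they are not figures from the Lancet Digital Health review.

def sensitivity(true_pos, false_neg):
    """Proportion of diseased cases correctly flagged (true positive rate)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of healthy cases correctly cleared (true negative rate)."""
    return true_neg / (true_neg + false_pos)

# Example: 87 of 100 diseased scans flagged, 93 of 100 healthy scans cleared.
print(f"sensitivity = {sensitivity(87, 13):.0%}")  # sensitivity = 87%
print(f"specificity = {specificity(93, 7):.0%}")   # specificity = 93%
```

"Equivalent sensitivity and specificity" therefore means the algorithms and the clinicians missed diseased cases, and wrongly flagged healthy ones, at statistically similar rates.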
But the review also found deficiencies in how the two were compared.
“Most studies took the approach of assessing deep learning diagnostic accuracy in isolation, in a way that does not reflect clinical practice,” Dr Liu wrote.
“Many studies were excluded at screening because they did not provide comparisons with healthcare professionals – ie human vs machine – and very few of the included studies reported comparisons with healthcare professionals using the same test dataset.”
Only four studies provided healthcare professionals with additional clinical information, as they would have in clinical practice; one study also tested the scenario in which prior or historical imaging was provided to the algorithm and the healthcare professional; and four studies also considered diagnostic performance in an algorithm-plus-clinician scenario.
Artificial intelligence technology has already started to transform daily life through applications such as photo captioning, speech recognition, natural language translation, robotics, and advances in self-driving cars.
Many people anticipate similar success in the health industry, particularly in diagnostics.
Some have even suggested that AI and deep learning applications will replace whole medical disciplines or create new roles for doctors to fulfil, such as “information specialists”.
But others, such as Tessa Cook, assistant professor of radiology at the University of Pennsylvania, are cautious about deep learning's perceived superiority over humans.
Dr Cook, who is based at the Perelman School of Medicine, said the findings of Dr Liu's paper require further investigation before being deemed conclusive.
“With increasing hype of the potential of AI in medicine, [the review’s results] could be misconstrued as machine diagnosis being better than human diagnosis: why have a human doctor when a digital one would be just as good, maybe better?” she said.
“Given the extensive discussion surrounding the limitations of the review, claiming equivalence or superiority of AI over humans could be premature – perhaps the better conclusion is that, in the narrow public body of work comparing AI with human physicians, AI is no worse than humans, but the data is sparse and it might be too soon to tell.”
Speed and accuracy
Dr Liu’s paper acknowledged the role of healthcare professionals in disease diagnosis, as well as the “enormous potential” of deep learning algorithms in improving the speed and accuracy of the process.
“It is important to note AI did not substantially outperform human diagnosis,” she wrote.
“From this exploratory meta-analysis, we cautiously state that the accuracy of deep learning algorithms is equivalent to healthcare professionals, while acknowledging that more studies considering the integration of such algorithms in real-world settings are needed.”
In the US, more than 30 AI algorithms for healthcare have been approved to date by the Food and Drug Administration.
Concerns have been raised about whether study designs are biased in favour of machine learning, and the degree to which the findings are applicable to real-world clinical practice.