Artificial intelligence models for the diagnosis and management of liver diseases
Abstract
With the development of more advanced methods for the diagnosis and treatment of diseases, the data required for medical care are becoming increasingly complex, and misinterpretation of information due to human error may have serious consequences. Human error can be avoided with the support of artificial intelligence (AI). AI models trained with various medical data for the diagnosis and management of liver diseases have been applied to hepatitis, fatty liver disease, liver cirrhosis, and liver cancer. Some of these models have been reported to outperform human experts, indicating their potential for supporting clinical practice, particularly given their high-speed output. This paper summarizes the recent advances in AI for liver disease and introduces the AI-aided diagnosis of liver tumors using B-mode ultrasonography.
Introduction
The introduction of artificial intelligence (AI) in the medical field has become indispensable for improving the efficiency of clinical tasks and medical care with limited human resources. In clinical practice, AI is expected to process various types of data appropriately and provide necessary information to medical professionals in a form that is easily comprehensible. In other words, the purpose of medical AI is to prevent misinterpretation and support the medical decision-making process. In the field of liver disease, various AI models have been reported for several tasks, such as prediction of the most likely diagnosis and prognosis and proposals for the most suitable therapeutic approach (Fig. 1). This review introduces AI models for the diagnosis and management of liver disease. In particular, it reviews AI models for the ultrasonographic (US) diagnosis of liver disease [1].
AI Models Using Laboratory Data and US Images for Liver Disease
To date, several scoring systems for liver disease have been reported for the assessment of liver function and condition, such as the Child-Pugh score and model for end-stage liver disease (MELD) score. Since they are associated with the stage of liver disease, these scores are applicable for predicting the prognosis and outcome of treatments. Additionally, attempts have been made to evaluate the performance of AI models compared to these conventional scores [2].
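For orientation, the original MELD score is a weighted sum of log-transformed laboratory values. The following minimal sketch uses the widely cited form of the formula; the function name, the clamping of inputs below 1.0, and the creatinine cap of 4.0 mg/dL follow common convention and are assumptions here, not details taken from the studies cited above. Later refinements such as the dialysis rule and sodium adjustment are omitted.

```python
import math


def meld_score(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> int:
    """Original (pre-sodium) MELD score; hypothetical helper for illustration."""
    # Values below 1.0 are set to 1.0 so that no logarithm becomes negative.
    bilirubin = max(bilirubin_mg_dl, 1.0)
    inr_value = max(inr, 1.0)
    # Creatinine is floored at 1.0 and capped at 4.0 mg/dL by convention.
    creatinine = min(max(creatinine_mg_dl, 1.0), 4.0)
    raw = (
        3.78 * math.log(bilirubin)
        + 11.2 * math.log(inr_value)
        + 9.57 * math.log(creatinine)
        + 6.43
    )
    return round(raw)


# Made-up laboratory values, for illustration only.
print(meld_score(bilirubin_mg_dl=2.0, inr=1.5, creatinine_mg_dl=1.2))
```

Because the score is a fixed linear combination of three inputs, it offers a simple baseline against which AI models that use many more variables can be compared.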
AI for the Management of Hepatitis, Steatosis, and Liver Cirrhosis
Using laboratory data, Ahmad et al. [3] published a model for classifying the stages of chronic hepatitis B. Another study presented an AI model using time-series laboratory data of chronic hepatitis C combined with machine learning; the results showed a good area under the receiver operating characteristic curve (AUROC) for predicting fibrosis [4-6]. Maiellaro et al. [7] developed an AI model for predicting the effect of interferon-ribavirin combination therapy using blood chemistry data of chronic hepatitis C cases. Interestingly, by applying 48,940 B-mode US images from chronic hepatitis C patients as training data in four machine learning models, Park et al. [8] reported an AI model that predicted treatment failure of direct-acting antiviral agents.
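Several of the studies above report the AUROC. As a reminder of what this metric measures, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions and are not taken from any cited study.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of model outputs, higher meaning "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the metric is convenient for comparing models that output continuous risk scores.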
Several models have also been used to distinguish non-alcoholic steatohepatitis (NASH) from non-alcoholic fatty liver disease (NAFLD) [9]. Yip et al. [10] showed high performance of the NAFLD ridge score using blood chemistry data and a history of hypertension. In addition, some models aimed to diagnose NAFLD and NASH using large-scale medical records and lipid profiles [11-15]. Oh et al. [16] applied a random forest algorithm to analyze the gut microbiota of patients with cirrhosis associated with NAFLD, indicating that NAFLD-related cirrhosis could be identified by the microbiome. Another study reported AI for diagnosis of NAFLD using comprehensive profiling of lipids, cytokines, hormones, and metabolites [17,18].
AI can also be used to predict complications of liver cirrhosis. Marozas et al. [19] used US elastography and blood tests as training data for AI models and discriminated cases with pressure differences between the portal vein and hepatic vein. Hong et al. [20] reported a neural network model that predicts the emergence of esophageal varices from the platelet count, splenomegaly, and portal vein diameter. For patients with hepatitis C virus (HCV)–positive liver cirrhosis, AI trained with clinical data from 3,972 patients could predict the presence of esophageal varices with an accuracy of 68.9% [21]. Dong et al. [22] reported the EVendo score for the prediction of esophageal and gastric varices through machine learning using blood data and ascites findings. Furthermore, the PSC risk estimate tool (PREsTo) was reported as a prediction model for the prognosis of primary sclerosing cholangitis (PSC); it applies gradient boosting to physical findings and blood chemistry data. The PREsTo model outperformed the MELD and Mayo PSC scores in predicting liver failure [23].
Prediction of the Emergence and Management of Hepatocellular Carcinoma
For predicting the emergence of hepatocellular carcinoma (HCC) in patients with cirrhosis, an AI model based on decision tree and random forest algorithms and trained with time-series data from 442 patients was more accurate than conventional regression analysis [24]. A neural network-based AI model that predicts HCC emergence was also reported; it was trained with time-series blood data covering more than 3 years from 48,151 patients with HCV-positive cirrhosis and showed high performance even in patients who achieved a sustained viral response [25].
Prediction of the Outcomes of Acute Liver Failure and Liver Transplantation
For the prediction of outcomes in patients with acute liver failure, AI models based on the gradient boosting algorithm and trained with clinical data from 527 cases have been reported to estimate the probability of death within 29 days after admission [26]. There have also been many reports on the prediction of fibrosis of transplanted livers and outcomes after transplantation using clinical findings and laboratory data from donors and recipients [27,28]. Ayllon et al. [29] attempted to match donors and recipients using the probability of graft survival 3 months after transplantation, as estimated by a neural network. Another report presented an AI model for estimating the risk of death within 3 months among patients who were on a waiting list for brain-dead donor liver transplantation [30]. Nitski et al. [31] reported a model that predicts survival at 1 and 5 years after liver transplantation with high accuracy using AI based on transfer learning. Furthermore, AI models for predicting tumor recurrence after liver transplantation in HCC cases have also been reported [32,33].
AI for US Diagnosis of Liver Disease
The interpretation and diagnosis of a large number of medical images are a challenge for medical professionals; this task can be affected by human error, especially under strict time constraints. Many AI models using medical images, including US, have been reported for the diagnosis of liver disease, and some studies aimed to predict the outcome of diseases from imaging data.
AI Models for Liver Tumors
US is a noninvasive imaging modality commonly used for the diagnosis of liver disease in many medical facilities. However, extensive skills and experience are required for an accurate diagnosis under time constraints. Therefore, the development of AI for US diagnosis is an attractive possibility. There have been many reports on the diagnosis of liver tumors by AI using B-mode US images; the early reports were mostly retrospective studies with small sample sizes that were not validated in an external cohort [1,34].
Hwang et al. [35] reported a model with 96% accuracy for the classification of cysts, hemangiomas, and malignant tumors. Hassan et al. [36] showed that the accuracy of AI for four-class discrimination of normal liver, cyst, hemangioma, and HCC was 97.2%. Another study reported an AI model with 95% accuracy when classifying four types of liver masses: cysts, hemangiomas, HCC, and metastatic cancers [37]. Schmauch et al. [38] reported a deep neural network-based AI in which the B-mode image intensity was adjusted using the US intensity of the abdominal wall to standardize differences in imaging conditions. They reported an AUROC of 0.9 or higher in tumor detection and five-class classification (HCC, metastatic tumor, hemangioma, cyst, and focal nodular hyperplasia).
Meanwhile, Zhang et al. [39] developed a model trained with a small number of contrast-enhanced ultrasound (CEUS) images using transfer learning of B-mode US. In addition, an AI model with an accuracy of 96.3% and AUROC of 0.994 was reported for benign and malignant tumor discrimination through combined training with B-mode US images and other information from medical records and blood data [40].
As a more practical model for detecting liver tumors, Yang et al. [41] created a convolutional neural network (CNN) model using 24,343 B-mode US images of liver tumors from 2,143 cases and validated its performance in an independent test cohort. This model showed considerable accuracy in the diagnosis of malignant tumors, with performance comparable to that of contrast-enhanced computed tomography (CT) and magnetic resonance imaging (MRI) diagnoses by a radiologist. In that study, three models were created: a model trained with tumor US images, a model trained with tumor and background liver US images, and a model trained with patient backgrounds in addition to US images. The performance in discriminating benign from malignant tumors was highest when both US images and patient backgrounds were used for training. Interestingly, when this discrimination ability was analyzed by stratification according to tumor size, the size of the tumor did not significantly affect the performance. Notably, this AI model can output a color map in which the color tone of each image pixel changes according to the degree to which it contributes to the diagnosis [41].
Dadoun et al. [42] reported an AI model trained using 2,551 B-mode US images of liver tumors from 1,026 cases and verified it in an external test cohort with 155 images from 48 cases. Using a detection transformer (DETR), this AI model showed 97% sensitivity and 90% specificity for identifying US images that show liver tumors. For tumor localization, it showed a positive predictive value of 77% and a sensitivity of 84%, with a mean intersection over union of 0.69. This performance was almost equivalent to that of two human experts. The sensitivity and specificity for distinguishing malignant from benign lesions were 82% and 81%, respectively, which were higher than those of the experts. In that report, six types of tumors (HCC, metastatic liver cancer, hemangiomas, focal nodular hyperplasia, hepatic adenomas, and cysts) were classified with an overall accuracy of 76%. The performance of DETR met or exceeded that of the two experts and of CNN-based models for these tasks [42].
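The intersection over union mentioned above quantifies how well a predicted tumor location overlaps the reference annotation. The following minimal sketch shows the calculation for two axis-aligned bounding boxes; the box format (x_min, y_min, x_max, y_max), the function name, and the example coordinates are assumptions for illustration and do not reproduce the implementation of the cited study.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and annotated boxes in pixel coordinates.
predicted = (0, 0, 2, 2)
annotated = (1, 1, 3, 3)
print(box_iou(predicted, annotated))  # overlap 1, union 7
```

A value of 1.0 means the two boxes coincide and 0.0 means they do not overlap; a mean value over a test cohort summarizes localization quality in a single number.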
In addition to B-mode examinations, CEUS is applied for the diagnosis of liver tumors, and a multi-kernel AI learning model for discriminating malignant tumors from benign lesions has been reported [43]. Furthermore, another report discriminated atypical HCC from focal nodular hyperplasia through an AI system pretrained with CEUS images; this type of discrimination is difficult for humans [44].
AI Models for Diffuse Liver Lesions
Many reported AI models have focused on classifying the stage of fibrosis and steatosis for the diagnosis of diffuse liver lesions [1]. A model where B-mode US images were applied for training a neural network showed an accuracy of 97.3%, sensitivity of 96%, and specificity of 100% for discrimination of normal liver, fatty liver, and cirrhosis [45]. Another report showed an AI model for estimating the presence of liver cirrhosis, portal hypertension, and esophageal varices through training a support vector machine (SVM) with images of shear wave elastography (SWE) [46,47]. A study also reported the use of AI for discriminating the stage of liver fibrosis by applying Doppler US parameters as training data [48]. Wang et al. [49] established a deep learning model using 1,990 images of SWE from 398 hepatitis B virus (HBV)–positive patients as training data; its performance for discriminating the fibrosis stage was superior to those of 2D-SWE, the aspartate aminotransferase to platelet ratio index (APRI), and the fibrosis-4 (FIB-4) index. AI models for discriminating the fibrosis stage using multiple B-mode and CEUS parameters have been reported as well [50].
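The APRI and FIB-4 index used as comparators above are simple formulas based on routine blood tests. The sketch below implements their commonly published definitions, assuming AST and ALT in U/L, platelet count in 10^9/L, and age in years; the function and parameter names are hypothetical, and the input values are made up for illustration only.

```python
import math


def apri(ast_u_l: float, ast_upper_limit_u_l: float, platelets_10e9_l: float) -> float:
    """AST to platelet ratio index: (AST / upper limit of normal) x 100 / platelets."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_10e9_l


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """Fibrosis-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Made-up values, not patient data.
print(apri(ast_u_l=80, ast_upper_limit_u_l=40, platelets_10e9_l=100))   # 2.0
print(fib4(age_years=50, ast_u_l=40, alt_u_l=25, platelets_10e9_l=200))  # 2.0
```

Both indices rise as AST increases and as the platelet count falls, which is the pattern that image-based models are compared against when staging fibrosis.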
Development and Perspective of B-mode US Diagnosis Support Model for Focal Liver Lesions
To accumulate training data for B-mode US diagnosis by AI, the authors have constructed a US image database. Based on these datasets, the development of AI models that support the diagnosis of liver tumors in B-mode US has been initiated. In addition, with a view to social implementation of the developed AI, evaluation tests for its performance are being conducted.
An AI system has been developed to assist in the diagnosis of focal liver lesions using B-mode US (Fig. 2) [51]. A total of 83,569 images of four types of liver masses (HCC, metastatic cancer, hemangiomas, and cysts) were used for training a 19-layer CNN. In 10-fold cross-validation, the accuracy of four-class discrimination was 90.8%, and the sensitivities for diagnosing HCC, metastatic cancer, hemangiomas, and cysts were 73.8%, 61.7%, 89.8%, and 99.1%, respectively. The accuracy, sensitivity, and specificity of malignancy discrimination were 94.2%, 85.6%, and 96.2%, respectively. Prior to this, AI models using 7,754, 24,575, 57,145, and 70,950 images for training were created, and the accuracy of four-class and malignant tumor discrimination improved as the amount of training data increased (Fig. 3) [51].
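For readers unfamiliar with 10-fold cross-validation, the data are divided into ten parts, and each part serves once as the test set while the remaining nine are used for training. The following is a minimal sketch of such a split; the function name is hypothetical, and it does not reproduce the procedure of the cited study. In practice the data would typically be shuffled first, and images from the same patient would usually be kept in the same fold; both steps are omitted here for brevity.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Every sample is used for testing exactly once across the folds.
for train, test in k_fold_indices(n_samples=23, k=10):
    print(len(train), len(test))
```

Averaging the metric over the ten test folds gives an estimate of performance that uses all available data without testing on images seen during training.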
The diagnostic capabilities of AI and human experts were also compared in an independent test cohort of tumor video images using a 70,950-image learning model (Fig. 4). For AI, the accuracy of the four-disease classification was 89.1% and for the diagnosis of malignant tumors was 90.9%. Meanwhile, the median accuracy of the four-disease classification by human experts was 67.3% (range, 63.6% to 69.1%), and for the diagnosis of malignant tumors by human experts, it was 80.0% (range, 74.5% to 83.6%). In addition, the probability of AI making the correct diagnosis increases with a larger amount of training data, indicating that a highly reliable estimation can be acquired by increasing the amount of training data [51].
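The four-class accuracy, per-class sensitivity, and malignancy discrimination figures reported above can all be derived from a single confusion matrix. The sketch below illustrates this with made-up counts; the matrix values, class order, and function names are assumptions for illustration and are not data from the cited study.

```python
CLASSES = ["HCC", "metastasis", "hemangioma", "cyst"]
MALIGNANT = {"HCC", "metastasis"}


def four_class_metrics(cm):
    """cm[i][j] = number of cases with true class i that were predicted as class j."""
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / total
    sensitivity = {CLASSES[i]: cm[i][i] / sum(cm[i]) for i in range(len(cm))}
    return accuracy, sensitivity


def malignancy_metrics(cm):
    """Collapse the four classes into malignant versus benign."""
    mal = [i for i, name in enumerate(CLASSES) if name in MALIGNANT]
    ben = [i for i, name in enumerate(CLASSES) if name not in MALIGNANT]
    tp = sum(cm[i][j] for i in mal for j in mal)  # malignant called malignant
    fn = sum(cm[i][j] for i in mal for j in ben)  # malignant called benign
    tn = sum(cm[i][j] for i in ben for j in ben)  # benign called benign
    fp = sum(cm[i][j] for i in ben for j in mal)  # benign called malignant
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy confusion matrix (rows: true class, columns: predicted class).
toy_cm = [
    [8, 1, 1, 0],    # HCC
    [2, 6, 2, 0],    # metastasis
    [1, 1, 18, 0],   # hemangioma
    [0, 0, 0, 10],   # cyst
]
print(four_class_metrics(toy_cm))
print(malignancy_metrics(toy_cm))
```

In this toy example, the accuracy of malignancy discrimination exceeds the four-class accuracy because confusion between HCC and metastasis still counts as a correct malignant call.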
AI for Other Imaging Modalities
AI Models for CT and MRI
Generally, a further examination is required for a definitive diagnosis of the lesions detected on US, and CT and MRI are recommended for this purpose [52]. Yasaka et al. [53] reported that a CNN-based AI model pretrained with CT images of 460 liver tumors for the diagnosis of malignant tumors had an accuracy of 84% and an AUROC of 0.927. Nayak et al. [54] performed three-dimensional image segmentation of the liver region and HCC from CT images using deep learning and reported that the prediction accuracy for each cross-section was 86.9% and the prediction accuracy for each case was 80%. Hamm et al. [55] used 434 MRI images as training data for a CNN to create an AI-aided detection and diagnosis model for HCC, which outperformed human radiologists. Feng et al. [56] used gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid (Gd-EOB-DTPA)-MRI images to preoperatively estimate microvascular invasion in HCC cases based on the pathological findings of resected specimens, which reportedly surpassed the diagnosis of the radiologist. For diffuse hepatic lesions, Choi et al. [57] reported a CNN model for diagnosing the fibrosis stage using 7,461 CT images as training data, showing superior accuracy compared to that of radiologists, APRI, and the FIB-4 index. In addition, several SVM-based AI models predicting liver fibrosis and liver stiffness based on MRI have been reported [58,59].
AI for Pathological Diagnosis of Liver Lesions
AI is also expected to support the pathological diagnosis of whole-slide images (WSIs), which is required for a definitive diagnosis of lesions. A CNN model for classifying nuclear atypia of HCC on biopsy was reported [60]. Liao et al. [61] used WSIs of hematoxylin and eosin (H&E) staining and tissue microarray to perform an automatic diagnosis of HCC and predict gene mutations. They showed that CTNNB1 (β-catenin) gene mutations could be predicted from pathological specimens of H&E staining. Chen et al. [62] reported a neural network for benign and malignant tumor discrimination with an accuracy of 96.0% and classification of well, moderately, and poorly differentiated HCC with an accuracy of 86.9% through training on WSIs of HCC and non-cancerous liver. They showed that the performance of AI was equivalent to that of a pathologist with 5 years of experience. They also reported that gene mutations from H&E specimens could be predicted using AI.
AI for Predicting the Prognosis in HCC
Although the selection of the best treatment based on the estimated prognosis is critical for the management of HCC, the outcome of treatment can be rather unpredictable in individual cases. Therefore, several AI models trained with imaging and omics data have been reported for predicting the prognosis after treatment.
Regarding AI based on CEUS images, deep learning radiomics models that may optimize the treatment of very early/early-stage HCC have been reported. The models identified that 17.3% of HCC patients who underwent radiofrequency ablation and 27.3% of patients who underwent surgical resection should have received the other treatment, which would have increased their average probability of 2-year progression-free survival by 12% and 15%, respectively [63]. Another report presented an AI model that predicted the effect of transcatheter arterial chemoembolization (TACE) by analyzing video images of contrast medium inflow into tumors on CEUS [64].
Using EOB-MRI images, Kim et al. [65] reported an AI model based on a random forest algorithm to predict postoperative recurrence-free survival in solitary HCC cases with diameters of 2 to 5 cm. For predicting the early recurrence of HCC, it has been reported that an AI model pretrained with tumor images of EOB-MRI, including up to 3 mm outside the tumor margin, showed the same performance as a model trained with pathological images of HCC. CNN-based AI models trained with CT images of HCC before treatment and a random forest-based AI trained with MRI images reportedly predicted tumor response to TACE [66,67].
Regarding pathological images, Saillard et al. [68] reported a CNN model that predicts postoperative survival time by learning from WSIs of resected specimens. Using WSIs and omics data from HCC cases, Shi et al. [69] reported a model that outputs a "tumor risk score," which was associated with survival. Survival prediction based on this score was superior to that of known conventional scores, and the score was associated with a specific tumor immune environment and genetic mutations. Chaudhary et al. [70] developed a deep neural network model and found two groups with different prognoses using RNA sequencing, microRNA sequencing, and methylation data from 360 patients with HCC.
Conclusion
A variety of AI models for the diagnosis of liver disease have been reported; some of them reportedly outperform human experts. However, the output from AI may be most informative for beginners and non-experts, because the performance advantage of AI is generally much larger over beginners and non-experts than over experts, especially in the field of imaging diagnosis [42,51]. Another study indicated that AI for diagnosing focal liver lesions in B-mode US shows the potential to assist less-experienced radiologists in improving their performance and lowering their dependence on sectional imaging in liver cancer diagnosis [41], although medical staff bear the final responsibility for medical decisions.
However, the use of AI may still be limited in the clinical setting because the performance of AI is completely dependent on the quantity and quality of training data. For example, regarding AI for image diagnosis, an imbalance in the distribution of diseases among the training images would lead to a biased output, that is, a preferential diagnosis of the disease most represented in the training data set [2,37]. Therefore, the performance of AI may fluctuate among test cohorts, and physicians must interpret the output of AI with caution until the results of AI studies are confirmed in clinical trials. In this context, it is becoming important to understand the advantages and disadvantages of medical AI in clinical settings.
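One common way to counteract the class imbalance described above is to weight each class inversely to its frequency during training. The sketch below shows this heuristic only; it is not a method reported in the cited studies, and the function name and toy label counts are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy training set in which cysts outnumber HCC three to one.
toy_labels = ["cyst"] * 6 + ["HCC"] * 2
print(balanced_class_weights(toy_labels))
```

With such weights, errors on the under-represented class contribute more to the training loss, which can reduce the tendency to favor the majority class; whether this is sufficient for a given data set remains an empirical question.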
Notes
Author Contributions
Conceptualization: Nishida N, Kudo M. Data acquisition: Nishida N. Data analysis or interpretation: Nishida N. Drafting of the manuscript: Nishida N. Critical revision of the manuscript: Nishida N, Kudo M. Approval of the final version of the manuscript: all authors.
MK received scholarship grants from GE Healthcare Japan.
Acknowledgements
AI development of the Japan Society of Ultrasonics in Medicine is carried out with the support of AMED (JP18lk1010030, JP20lk1010035), and the tumor discriminator is being developed under the initiative of Dr. Makoto Yamakawa and Professor Takeshi Shiina, Kyoto University Graduate School of Medicine, Department of Human Health Sciences. The tumor detector is being developed under the initiative of Professor Yoshito Mekada, Graduate School of Engineering, Chukyo University. We would also like to take this opportunity to thank the members of the Japan Society of Ultrasonics in Medicine for their cooperation in carrying out the research.
References
Key point
Recent advances in artificial intelligence (AI) for liver disease are summarized, and the AI-aided diagnosis of liver tumors using B-mode ultrasonography is introduced. The value of the AI output could be especially informative for beginners and non-experts.