Application of computer-aided diagnosis in breast ultrasound interpretation: improvements in diagnostic performance according to reader experience
Abstract
Purpose
The purpose of this study was to evaluate the usefulness of applying computer-aided diagnosis (CAD) to breast ultrasound (US), depending on the reader's experience with breast imaging.
Methods
Between October 2015 and January 2016, two experienced readers obtained and analyzed the grayscale US images of 200 cases according to the Breast Imaging Reporting and Data System (BI-RADS) lexicon and categories. They additionally applied CAD (S-Detect) to analyze the lesions and made a diagnostic decision subjectively, based on grayscale US with CAD. For the same cases, two inexperienced readers analyzed the grayscale US images using the BI-RADS lexicon and categories, added CAD, and came to a subjective diagnostic conclusion. We then compared the diagnostic performance depending on the reader's experience with breast imaging.
Results
The sensitivity values for the experienced readers, inexperienced readers, and CAD (for experienced and inexperienced readers) were 91.7%, 75.0%, 75.0%, and 66.7%, respectively. The specificity values for the experienced readers, inexperienced readers, and CAD (for experienced and inexperienced readers) were 76.6%, 71.8%, 78.2%, and 76.1%, respectively. When diagnoses were made subjectively in combination with CAD, the specificity of the experienced readers significantly improved (76.6% to 80.3%) without a change in sensitivity (91.7%). After subjective combination with CAD, both the sensitivity and the specificity of the inexperienced readers improved (75.0% to 83.3% and 71.8% to 77.1%, respectively). In addition, the area under the curve improved for both the experienced and inexperienced readers (0.84 to 0.86 and 0.73 to 0.80, respectively) after the addition of CAD.
Conclusion
CAD is more useful for less experienced readers. Combining CAD with breast US led to improved specificity for both experienced and inexperienced readers.
Introduction
Breast cancer is the most common malignancy in women, and the second leading cause of cancer-related mortality worldwide [1,2]. It is important to detect cancer early to reduce the mortality rate [3,4], and this requires accurate and reliable diagnoses [5]. In clinical practice, breast ultrasound (US) is an important modality for detecting breast cancer together with mammography [6]. Compared with mammography, breast US is easily available, does not involve radiation, and is inexpensive; moreover, it has a superior ability to image dense breast tissue, and it allows serial biopsy. However, its main limitation is operator dependence [7,8]. Many studies have applied computer-aided diagnosis (CAD) to breast US to demonstrate the efficiency of CAD systems and to evaluate the usefulness of CAD for improving diagnostic accuracy [9-15].
S-Detect (Samsung Medison Co. Ltd., Seoul, Korea) is a recently developed CAD system for breast US that provides assistance in the morphological analysis based on the Breast Imaging Reporting and Data System (BI-RADS) lexicon and final assessment [8]. A computer-based analysis based on the morphologic features of S-Detect may be very useful for improving the diagnostic performance of breast US [16]. S-Detect may be used as an additional diagnostic tool to improve the specificity of breast US in clinical practice, and as a guide in decision-making for breast masses detected on US for dedicated breast radiologists [8,17]. Moreover, S-Detect is known as a clinically feasible diagnostic tool with a moderate degree of agreement in the final assessments, regardless of the experience of the radiologists specializing in breast imaging [17]. To date, no study has evaluated whether the usefulness of S-Detect in conjunction with breast US depends on experience with breast imaging by comparing experienced and inexperienced readers.
The purpose of this study was to evaluate the role of CAD in breast US and the usefulness of combining CAD with breast US, depending on the operator’s experience with breast imaging.
Materials and Methods
This prospective study was approved by the ethics committee of our institution, and consent was obtained from all patients.
Study Population
This study was performed between October 2015 and January 2016. We enrolled patients who were planning to undergo ultrasonography for screening or for diagnostic purposes at our institution (a tertiary university hospital).
All suspicious or probably benign breast lesions were analyzed according to the BI-RADS lexicon and categories, meaning that lesions in BI-RADS categories 3, 4, and 5 were included. If one woman had multiple lesions, the most suspicious lesion or the largest lesion was included. Typical and multiple BI-RADS category 2 lesions were not included in this study because they are difficult to confirm or follow up.
Two hundred cases were enrolled in this study. Table 1 describes the characteristics of the patients and lesions. The mean age of the participants in this study was 49.5±11.8 years old (range, 21 to 77 years old). This study included 81 patients (40.5%) who received screening US, 17 patients (8.5%) who received diagnostic US, 50 patients (25.0%) who received US for postoperative surveillance or screening, and 52 patients (26.0%) who received US for follow-up of a probably benign lesion. Among the 200 patients, there were 23 patients with palpable lesions, of whom 17 received diagnostic US, two received postoperative screening, and four underwent US imaging for follow-up of probably benign lesions. Patients with a personal history included 50 patients with a history of breast cancer surgery and 14 patients with a history of excision for a borderline lesion. There were no patients with a family history or a confirmed BRCA mutation. Of the patients, 102 (51.0%) were premenopausal and 98 (49.0%) were postmenopausal.
US and CAD System
We used a US system (Samsung Ultrasound RS80A, Samsung Medison Co. Ltd.) that provides routine breast grayscale US together with the new CAD technology (S-Detect).
When we identified the center of the breast lesion by touching the screen, a region of interest (ROI) was drawn along the border of the mass automatically by the US system. The US features of the lesion were analyzed according to the BI-RADS lexicon and the final assessment classifications were automatically performed by the US system. In this system (S-Detect), the final assessment classification was divided into “possibly benign” or “possibly malignant” (Fig. 1).
Analysis by Radiologists
Two experienced readers (breast radiologists with 5 years of breast imaging experience) consecutively performed general whole-breast grayscale US for screening and diagnostic purposes using a US machine with CAD (S-Detect). When using this US machine, they first evaluated lesions without other information, such as mammography and previous US findings. When there was no lesion or a typical benign lesion such as a cyst, they excluded the scan from this study and analyzed it according to the BI-RADS lexicon and categories as usual. When there was a suspicious or probably benign lesion, they analyzed it using the BI-RADS lexicon and categories on grayscale US and added CAD (S-Detect) simultaneously. As with the grayscale US, the representative image was analyzed after two or more areas were identified as ROIs through the touchscreen in CAD. If there was a meaningful difference among two or more CAD images, the reader selected the most appropriate CAD image and analyzed it. They then made a diagnostic decision subjectively, based on grayscale US with CAD (Fig. 1).
In the same cases, one of two inexperienced readers (first-year residents with 1 week of training in breast imaging) evaluated them using targeted grayscale US. An inexperienced reader scanned the lesions detected by an experienced reader in real time, chose the proper BI-RADS lexicon and categories on grayscale US, and added CAD by themselves. They then made a diagnostic decision subjectively, based on the grayscale US with CAD. The inexperienced readers only diagnosed the lesions detected by experienced readers.
Pairs of one experienced breast radiologist and one first-year resident per patient were randomly matched depending on the US schedule.
Next, an uninvolved expert radiologist with 17 years of breast imaging experience arrived at a conclusive diagnosis, incorporating information from mammography and previous US findings. If the image had been lost or the findings were discordant with mammography and previous US findings, the expert radiologist recalled the patient and obtained a new scan for use in clinical practice.
Statistics
We compared the diagnostic performance of grayscale US, CAD, and grayscale US with CAD (subjective, conjunctive, and disjunctive) between experienced and inexperienced readers. The confirmatory diagnosis was defined as a diagnosis made on the basis of histopathology, no change on the 2-year follow-up image, or a typical benign involuting fibroadenoma or fatty lesion on mammography.
The area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were analyzed.
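As an illustration of how these measures follow from a 2x2 table, the following minimal Python sketch computes them from true-positive, false-negative, true-negative, and false-positive counts. The function name and the example counts are assumptions: the counts were back-calculated from the percentages reported for the inexperienced readers' grayscale US (12 malignant and 188 benign lesions) and are not taken from the study's tables.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard diagnostic measures from 2x2 confusion counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # malignant lesions called positive
        "specificity": tn / (tn + fp),   # benign lesions called negative
        "ppv": tp / (tp + fp),           # positive calls that are malignant
        "npv": tn / (tn + fn),           # negative calls that are benign
        "accuracy": (tp + tn) / total,   # all correct calls
    }


# Assumed counts, back-calculated from the reported percentages
# (inexperienced readers, grayscale US alone); not from the study's tables.
metrics = diagnostic_metrics(tp=9, fn=3, tn=135, fp=53)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, the sketch reproduces the reported sensitivity (75.0%), specificity (71.8%), and accuracy (72.0%) for this reader group.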
We analyzed the results of subjective, conjunctive, and disjunctive combinations of grayscale US with CAD. For the subjective combination, experienced and inexperienced readers made a diagnostic decision subjectively, based on the grayscale US with CAD. For the conjunctive combination, a finding of “not suspicious” on both the grayscale US (category 3) and CAD (possibly benign) was defined as negative, while a finding of “suspicious” on either the grayscale US (category 4 or above) or CAD (possibly malignant) was defined as positive. For the disjunctive combination, a finding of “not suspicious” on either the grayscale US or CAD was defined as negative, while a finding of “suspicious” on both the grayscale US and CAD was defined as positive.
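The two rule-based combinations defined above can be written as simple Boolean rules. The sketch below is illustrative only: the function names are hypothetical, and the BI-RADS category is simplified to an integer (3, 4, or 5) without the 4A/4B/4C subdivisions.

```python
def grayscale_suspicious(birads_category):
    """Grayscale US counts as 'suspicious' at BI-RADS category 4 or above."""
    return birads_category >= 4


def conjunctive_positive(birads_category, cad_possibly_malignant):
    """Conjunctive rule as defined in the text: positive if EITHER test is
    suspicious; negative only if both are not suspicious."""
    return grayscale_suspicious(birads_category) or cad_possibly_malignant


def disjunctive_positive(birads_category, cad_possibly_malignant):
    """Disjunctive rule as defined in the text: positive only if BOTH tests
    are suspicious; negative if either is not suspicious."""
    return grayscale_suspicious(birads_category) and cad_possibly_malignant


# Illustrative inputs: (BI-RADS category, CAD says "possibly malignant")
for category, cad in [(3, False), (3, True), (4, False), (5, True)]:
    print(category, cad,
          conjunctive_positive(category, cad),
          disjunctive_positive(category, cad))
```

Because the conjunctive rule calls a lesion positive more readily, it favors sensitivity, whereas the stricter disjunctive rule favors specificity.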
The combinations of CAD and grayscale US diagnostic parameters were compared using the McNemar test (sensitivity and specificity) or the generalized score statistic (PPV and NPV) for matched data. We used the Hanley and McNeil method to analyze the differences between pairs of AUCs and the chi-square test for overall percent agreement (accuracy) differences.
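For readers unfamiliar with the McNemar test, the sketch below shows the exact (binomial) form of the test for paired binary results, using only the Python standard library. It is not the SAS procedure used in this study, and the text does not state whether the exact or the asymptotic version was applied; the function name and the discordant-pair counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: lesions classified correctly by method A only
    c: lesions classified correctly by method B only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as extreme as the observed one (one tail)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 7 benign lesions correct only with CAD added,
# 0 correct only with grayscale US alone.
print(mcnemar_exact(7, 0))  # 0.015625
```

Only the discordant pairs enter the calculation; lesions classified identically by both methods carry no information about the difference between them.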
The degree of agreement between experienced and inexperienced readers was analyzed using kappa statistics. Agreement was categorized as poor (≤0.2), fair (0.21-0.4), moderate (0.41-0.6), good (0.61-0.8), or very good (0.81-1) [18]. We used the weighted least squares approach for comparing correlated kappa values.
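The kappa statistic and the agreement categories above can be sketched as follows. This is a minimal illustration that assumes a binary (positive/negative) assessment by two readers; the function names and the counts in the example are hypothetical, and the weighted least squares comparison of correlated kappa values is not shown.

```python
def cohen_kappa(both_pos, a_pos_only, b_pos_only, both_neg):
    """Cohen's kappa for two readers making binary assessments."""
    n = both_pos + a_pos_only + b_pos_only + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    a_pos = (both_pos + a_pos_only) / n           # reader A positive rate
    b_pos = (both_pos + b_pos_only) / n           # reader B positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    if expected == 1:
        return 1.0  # both readers always give the same single answer
    return (observed - expected) / (1 - expected)


def agreement_category(kappa):
    """Map a kappa value to the categories used in the text [18]."""
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "very good"


# Hypothetical counts for 200 lesions read by two readers
kappa = cohen_kappa(both_pos=30, a_pos_only=25, b_pos_only=30, both_neg=115)
print(round(kappa, 3), agreement_category(kappa))
```

Under this categorization, a kappa of 0.337 is classed as fair and a kappa of 0.457 as moderate.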
Findings of “not suspicious” on both the grayscale US (category 3) and CAD (possibly benign) and “suspicious” on both the grayscale US (category 4 or higher) and CAD (possibly malignant) were defined as consistent results.
All statistical analyses were performed using the software package SAS Enterprise Guide 4 (SAS Institute, Inc., Cary, NC, USA). A P-value of <0.05 was considered to indicate statistical significance.
Results
Characteristics of the Participants and Lesions
Among the 200 breast lesions, 12 (6.0%) were pathologically confirmed as malignant after a core-needle biopsy performed before surgery: seven were invasive ductal carcinoma, three were ductal carcinoma in situ, and two were mucinous carcinoma. The other 188 lesions were confirmed as benign, based on a benign histological diagnosis after a core-needle biopsy (n=128) or stability for more than 2 years (n=60). The mean size of the lesions was 1.2±0.8 cm (range, 0.2 to 4.4 cm) (Table 1).
Performance of Radiologists
The diagnostic performance of the experienced and inexperienced readers is summarized in Table 2. Although the sensitivity of CAD was lower than that of both the experienced and inexperienced readers, its specificity was higher than that of the experienced and inexperienced readers (Fig. 2).
The specificity significantly improved when CAD was subjectively and disjunctively combined by the experienced readers (76.6% to 80.3% and 88.8%, respectively) and the inexperienced readers (71.8% to 77.1% and 85.1%, respectively) (all P<0.05).
Accuracy also significantly improved when CAD was disjunctively combined by the experienced readers (77.0% to 88.0%, P=0.006) and the inexperienced readers (72.0% to 83.5%, P=0.006). The subjective and conjunctive combinations showed the highest sensitivity. The disjunctive combination showed the highest specificity and accuracy. In addition, the area under the curve improved for both the experienced and inexperienced readers (0.84 to 0.86 and 0.73 to 0.80, respectively) after the addition of CAD.
The subjective combination with CAD was found to improve the kappa value (k) of grayscale US between inexperienced and experienced readers from fair (k=0.337) to moderate (k=0.457) agreement (P=0.016). In 76.5% of the cases (153 of 200), the results of grayscale US and CAD were consistent for both experienced and inexperienced readers.
When the category was different between the inexperienced readers’ judgment and CAD, in 19 of the 47 cases (40.4%), the inexperienced readers preferred the CAD conclusion, and the lesions were ultimately confirmed to be fibrocystic changes (13), fibroadenomas (2), hamartoma (1), postoperative fibrosis (1), intraductal papilloma (1), and mucinous cancer (1) (Fig. 3). Experienced readers preferred the CAD conclusion in seven of 47 cases (14.9%), which were ultimately confirmed to be fibrocystic changes (4), fibrosis (1), intraductal papilloma (1), and fibroadenoma (1) (Table 3).
Discussion
In this study, subjectively combining CAD with grayscale US improved the accuracy and AUC for both the inexperienced readers (accuracy, 72.0% to 77.5%; AUC, 0.73 to 0.80) and the experienced readers (accuracy, 77.0% to 81.0%; AUC, 0.84 to 0.86). Moreover, the kappa value between inexperienced and experienced readers after subjective combination with CAD was significantly higher than the corresponding kappa value for grayscale US alone (P=0.016). According to Sahiner et al. [19], the accuracy of radiologists improved by using a CAD system, with the AUC increasing from 0.83 to 0.90 (P=0.006).
The specificity significantly improved when CAD was subjectively and disjunctively combined by the experienced readers (76.6% to 80.3% and 88.8%, respectively) and the inexperienced readers (71.8% to 77.1% and 85.1%, respectively) (all P<0.05). Some previously published studies have emphasized improvements in the specificity and accuracy of CAD, similar to the results of our study. Wang et al. [20] evaluated the effect of CAD for eight radiologists with different levels of experience, and the specificity improved by using CAD in both the senior group (67.1% to 76.5%) and the junior group (58.8% to 64.7%) [20]. In the study by Dromain et al. [21], the improved specificity of CAD allowed a reduction of up to 53% in unnecessary biopsies.
Subjectively combined CAD led to improved sensitivity (75.0% to 83.3%) in the inexperienced readers, but there was no change in sensitivity in the experienced readers (91.7% to 91.7%). This means that combined CAD can improve the sensitivity of the results reported by radiologists, especially less experienced radiologists. In some previous studies, the sensitivity of the US CAD system was reported to be high (between 88.9% and 100%) [6]. However, in our investigation, the sensitivity of CAD was lower (between 66.7% and 75.0%) than has been reported in other studies, although the sensitivity of combined CAD was higher (between 83.3% and 91.7%). This is likely because the cutoff for dichotomization of the final CAD assessment categories was set at BI-RADS category 4B [8]. The final assessment from S-Detect was divided into possibly benign and possibly malignant, and when the cutoff value was set to category 4B, the specificity increased, but the sensitivity decreased relative to category 4A [8]. However, we subdivided category 4 into categories 4A, 4B, and 4C according to the BI-RADS criteria, and the cutoff value was set to category 4A in grayscale US; consequently, the sensitivity was high, and the specificity was relatively low.
When the category differed between the radiologists’ assessments and CAD, the inexperienced readers chose the CAD result more often than the experienced readers did (19 cases [40.4%] for the inexperienced readers and seven cases [14.9%] for the experienced readers). This means that the less experienced radiologists relied more on the CAD results. This relates to the fact that using the CAD system led to improvements in sensitivity, specificity, and accuracy in the inexperienced readers. As such, CAD assistance can improve diagnostic performance and can be expected to play a role in providing a second opinion, especially for less experienced radiologists. Consequently, it can reduce misdiagnosis by less experienced radiologists, reduce the variability in radiologists’ interpretations, and overcome the effect of experience. These improvements in diagnostic performance by combining CAD and US may reduce unnecessary breast biopsies and the medical costs borne by patients.
There are certain limitations to this investigation. First, the data were derived from a small number of patients. This was a prospective study, and two examiners, an experienced reader and an inexperienced reader, evaluated each lesion, using both grayscale US and CAD. However, we analyzed real-time grayscale US and CAD data from both readers for the same lesion, which is a major advantage of this study. Second, when CAD is used to analyze any lesion, the radiologist must identify the center of the breast lesion that he or she scans, which can differ depending on the experience of the radiologist. Third, this study did not include calcifications or non-mass lesions in the analysis because they were rarely detected on US during the study period. Fourth, this study did not include typical benign lesions such as cysts due to the difficulty of confirmation and follow-up. These aspects of the present study may differ from how CAD is applied in clinical practice.
In this study, the newly developed CAD system (S-Detect) was useful for improving sensitivity and specificity, especially for less experienced radiologists. When CAD was combined with US, the specificity and accuracy improved for all radiologists. This improvement in the specificity with the use of combined CAD may reduce the number of unnecessary breast biopsies and the medical costs borne by patients.
Notes
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This research was supported by a grant from the Korean Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI15C0833).