The purpose of this study was to assess the reliability of automated breast ultrasound (ABUS) examinations of suspicious breast masses in comparison to handheld breast ultrasound (HHUS) with regard to Breast Imaging Reporting and Data System (BI-RADS) category assessment, and to investigate the factors affecting discrepancies in categorization.
A total of 135 masses that were assessed as BI-RADS categories 4 and 5 on ABUS and that underwent ultrasound (US)-guided core needle biopsy from May 2017 to December 2017 were included in this study. The BI-RADS categories were re-assessed using HHUS. Agreement of the BI-RADS categories was evaluated using kappa statistics, and the positive predictive value of each examination was calculated. Logistic regression analysis was performed to identify the mammography and US findings associated with discrepancies in the BI-RADS categorization.
The overall agreement between ABUS and HHUS in all cases was good (79.3%, kappa=0.61, P<0.001). Logistic regression analysis revealed that accompanying suspicious microcalcifications on mammography (odds ratio [OR], 4.63; 95% confidence interval [CI], 1.83 to 11.71; P=0.001) and an irregular shape on US (OR, 5.59; 95% CI, 1.43 to 21.83; P=0.013) were associated with discrepancies in the BI-RADS categorization.
The agreement between ABUS and HHUS examinations in the BI-RADS categorization of suspicious breast masses was good. The presence of suspicious microcalcifications on mammography and an irregular shape on US were factors associated with ABUS yielding a lower level of suspicion than HHUS in terms of the BI-RADS category assessment.
Early detection of breast malignancies significantly improves their clinical outcomes. Extensive efforts have been made to facilitate the early diagnosis of breast cancer via breast cancer screening. Screening mammography is the method of choice, with a reported sensitivity of 85%, and has been proven to reduce the mortality rate of breast cancer. However, the benefits of screening mammography are limited in women with dense breasts, because greater breast parenchymal density reduces the sensitivity of lesion detection to 30% [3,4]. The value of ultrasound (US) has received attention based on large prospective study results demonstrating that adding whole-breast handheld ultrasound (HHUS) screening to mammography significantly increased the sensitivity of cancer detection [5,6]. However, HHUS has a few drawbacks that limit its use in a screening setting: it is operator-dependent, not reproducible, and requires considerable operator time and skill [7,8]. To overcome these drawbacks, automated breast ultrasound (ABUS) has recently been introduced. ABUS provides reproducible, high-resolution images and does not depend on the operator, since it is performed using an automated scanner with a larger field of view. Several prospective studies have reported that supplementing mammography with ABUS screening resulted in similar positive outcomes to those associated with HHUS screening, such as increased detection of invasive cancer and reduced rates of interval cancer [9,10]. Additionally, despite the technical differences between ABUS and HHUS, previous studies found no significant differences between these techniques in diagnostic performance for breast cancer or in terms of intertechnique and inter-reader agreement [7,11-16]. ABUS detected small benign masses less frequently, but malignant masses were rarely missed. Therefore, the overall diagnostic performance of ABUS was reported to be comparable to that of HHUS [12,15].
However, it has been reported that ABUS is inferior to HHUS in the detection of masses with an irregular shape or non-circumscribed margin, as well as those belonging to Breast Imaging Reporting and Data System (BI-RADS) category 4 or 5.
However, given that the introduction of ABUS was relatively recent, little is known regarding the clinical application of ABUS assessments of BI-RADS categories solely for solid breast masses, and whether such results are in concordance with those of HHUS. In clinical practice, a precise categorization of suspicious breast masses provides important information for the interpretation of biopsy results in accordance with imaging findings. Discordance in BI-RADS categorization depending on the type of US used could significantly affect the quality of patient care, and potentially lead to delayed cancer diagnoses or unnecessary procedures. In addition, the appropriate use of assessment categories often affects the validity of diagnostic performance audits.
Therefore, we aimed to provide data that shed light on this possible issue by comparing diagnostic results obtained using ABUS and HHUS. The purpose of this study was to assess the reliability of ABUS examinations for suspicious breast masses by comparing the resulting BI-RADS category assessments to those made through HHUS examinations, and to investigate the factors affecting discrepancies in categorization.
Materials and Methods
This retrospective study was approved by the Institutional Review Board. The requirement for informed consent was waived. In our hospital, the breast US protocol had been changed from HHUS to ABUS. All patients were examined with ABUS unless there was a technical problem. A total of 189 masses in 185 patients who underwent ABUS and HHUS at a single large tertiary medical center from May 2017 to December 2017 were retrospectively reviewed. Then, 147 masses in 147 patients that were assessed as BI-RADS categories 4A, 4B, 4C, or 5 and underwent subsequent US-guided core needle biopsy were included in this study. In order to evaluate both mammography and US, we excluded 12 cases in 12 patients who did not also undergo mammography. Cases in which HHUS imaging was not performed before subsequent management, namely no further biopsy (n=32), excisional biopsy (n=6), or stereotactic biopsy (n=1), were excluded. Cases with poor-quality ABUS images due to technical problems (n=3) were also excluded. Finally, 135 masses in 135 patients were included in this study.
Imaging Technique and Interpretation
All ABUS examinations were performed using the same ABUS system (Invenia ABUS, Automated Breast Ultrasound System, GE Healthcare, Sunnyvale, CA, USA). The patients were placed in the supine position with a sponge beneath the shoulders to evenly spread the breast tissue during the examination. Three volume data sets were obtained from each breast: the anteroposterior volume, which covered the central part of the breast; the medial volume, which covered the inner and inferior parts; and the lateral volume, which covered the upper and outer parts. The nipple marker was placed according to each patient’s anatomy. In patients with larger breasts, additional views were taken to cover the entire breast tissue. The ABUS examinations were performed by two radiology technologists with extensive US training. The volume images were automatically transferred to a dedicated workstation. During the interpretation, multiplanar images in three different planes (axial, sagittal, and coronal) were used. A slice thickness of 0.5 mm was used to acquire volume data. Two breast radiologists (S.M.K. and M.J.) with 16 and 12 years’ experience in breast imaging, respectively, analyzed the 3D ABUS data on the dedicated ABUS workstation and reported the BI-RADS category assessment.
For suspicious breast masses that were assessed as BI-RADS categories 4 and 5 by ABUS, HHUS was performed prior to US-guided core biopsy. HHUS images were acquired using a linear transducer with a bandwidth of 7-15 MHz (iU22 Ultrasound System, Philips Ultrasound, Bothell, WA, USA). All HHUS examinations were performed by two breast radiologists (S.M.K. and M.J.) with 16 and 12 years’ experience in breast imaging, respectively. Findings using the BI-RADS lexicon (i.e., image findings of shape, margin, echogenicity, posterior echogenicity, calcification, orientation, and size of the mass on HHUS) were recorded in the database during the biopsy. Because only suspicious lesions of category 4A or higher were compared, the lesions were not judged as probably benign or benign.
Image interpretation and clinicopathology data
For a comparative analysis, the BI-RADS category as determined by HHUS was re-assessed by two blinded radiologists (G.Y. and M.J.) with 4 and 12 years’ experience, respectively. The HHUS-based BI-RADS category assessment was made from the static images taken during the biopsy. The radiologists were blinded to the Doppler and elastography images from the HHUS examination during the evaluation. If the mass required biopsy, it was judged to be category 4 or higher. When mammography was available at the time of interpretation, mammography findings were incorporated into the BI-RADS category assessment for both ABUS and HHUS. Patients’ medical records were reviewed for information including age, symptoms, mammography findings and density, final BI-RADS assessment categories, and the pathological results of the core needle biopsy or surgical excision. Each lesion was categorized as either benign or malignant according to the core needle biopsy result. In cases of core needle biopsy followed by excision, the final result of excision was used as the reference.
Outcomes and Statistical Analysis
The BI-RADS categories as determined by ABUS and HHUS were cross-tabulated. Kappa statistics were used to analyze the agreement in BI-RADS grading between ABUS and HHUS. Kappa values of <0.20 were considered to indicate slight agreement; 0.21-0.40 fair agreement; 0.41-0.60 moderate agreement; 0.61-0.80 substantial agreement; and 0.81-1.00 excellent agreement. All analyses were performed once for all cases and once for only cases of biopsy-proven malignancy. In order to compare the performance of ABUS and HHUS, positive predictive values (PPVs) were calculated using the pathological results as a reference standard and compared using the McNemar test. For the statistical evaluation of discrepancies in categorization, the exact and Monte Carlo symmetry tests were performed. Logistic regression was performed to test whether mammography and US imaging findings were associated with discordance in BI-RADS categorization between ABUS and HHUS. All statistical analyses were performed using statistical software (STATA 14.1, Stata, College Station, TX, USA). P-values of <0.05 were considered to indicate statistical significance.
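To illustrate the agreement analysis described above, the following minimal Python sketch computes an unweighted Cohen's kappa from a cross-tabulation and maps the value to the agreement bands listed in the text. The 2x2 table is hypothetical and is not the study's Table 2, the function names are illustrative, and the use of an unweighted kappa is an assumption, since the weighting used in the original STATA analysis is not stated.

```python
from typing import List


def cohen_kappa(table: List[List[int]]) -> float:
    """Unweighted Cohen's kappa for a square cross-tabulation.

    Rows are categories assigned by one method (e.g., ABUS) and
    columns are categories assigned by the other (e.g., HHUS).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions per category.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map a kappa value to the agreement bands listed in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


# Hypothetical 2x2 cross-tabulation (NOT data from this study).
example_table = [[40, 10],
                 [5, 45]]
example_kappa = cohen_kappa(example_table)
print(round(example_kappa, 2), agreement_label(example_kappa))
```

With these bands, a kappa of 0.61 falls in the 0.61-0.80 range and a kappa of 0.39 falls in the 0.21-0.40 range.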
This study analyzed 135 breast lesions in 135 patients (median age, 49 years; range, 35 to 82 years) who underwent both ABUS and HHUS followed by biopsy. Fifty-eight patients (43.0%) had symptoms (lump or nipple discharge), whereas 77 patients (57.0%) presented without clinical symptoms. The pathological analysis revealed 49 (36.3%) malignant lesions and 86 (63.7%) benign lesions. The average size of the masses was 18.6 mm (range, 4 to 66 mm). The characteristics of the study group are summarized in Table 1.
The overall agreement between ABUS and HHUS in all cases was 79.3% (kappa=0.61; P<0.001), while the agreement in confirmed malignancies was 55.1% (kappa=0.39, P<0.001). Table 2 presents the cross-tabulated data in detail. There were a total of 28 (20.7%) discrepancies in all cases and 22 (44.9%) in the confirmed malignancies. Among all discrepancies, there were 22 cases (78.6%) where a lower BI-RADS assessment category was assigned using ABUS than using HHUS. Among discrepancies in cases of malignancies, there were 16 (72.7%) cases where a lower BI-RADS assessment category was assigned using ABUS than using HHUS (Fig. 1).
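The proportions reported above follow directly from the stated counts. The short Python check below reproduces them as a worked example; the variable names are illustrative, and only counts given in the text are used.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the text.
total_masses = 135
malignant_masses = 49
discrepant_all = 28          # discordant BI-RADS categories, all cases
discrepant_malignant = 22    # discordant BI-RADS categories, malignancies
abus_lower_all = 22          # ABUS category lower than HHUS, all discrepancies
abus_lower_malignant = 16    # ABUS category lower than HHUS, malignant discrepancies

results = {
    "agreement_all": percent(total_masses - discrepant_all, total_masses),
    "agreement_malignant": percent(malignant_masses - discrepant_malignant, malignant_masses),
    "discrepancy_all": percent(discrepant_all, total_masses),
    "discrepancy_malignant": percent(discrepant_malignant, malignant_masses),
    "abus_lower_all": percent(abus_lower_all, discrepant_all),
    "abus_lower_malignant": percent(abus_lower_malignant, discrepant_malignant),
}

for name, value in results.items():
    print(f"{name}: {value}%")
```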
The PPV of BI-RADS category 4A lesions based on ABUS and HHUS was 15 of 95 (15.79%) and nine of 83 (10.84%), respectively. The PPV of BI-RADS category 4B lesions based on ABUS and HHUS was nine of 15 (60%) and seven of 18 (38.89%), respectively. The PPV of BI-RADS category 4C lesions based on ABUS and HHUS was eight of eight (100%) and 15 of 16 (93.75%), respectively. For BI-RADS category 5 lesions based on ABUS and HHUS, the PPVs were both 100%. The PPVs for each category did not differ significantly between ABUS and HHUS. In the logistic regression analysis of the likelihood of a lower BI-RADS category being assigned based on ABUS than based on HHUS, microcalcifications on mammography (odds ratio [OR], 4.63; 95% confidence interval [CI], 1.83 to 11.71; P=0.001) and an irregular shape on US were found to be statistically significant predictors (OR, 5.59; 95% CI, 1.43 to 21.83; P=0.013) (Table 3).
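As a worked check of the figures above, the following Python sketch recomputes the category-specific PPVs from the reported counts and back-calculates an approximate P-value from each reported odds ratio and 95% CI. The back-calculation assumes Wald-type intervals on the log-odds scale, which is an assumption about how the intervals were obtained rather than something stated in the text; the function and variable names are illustrative.

```python
import math

# (malignant lesions, total lesions) per BI-RADS category, as reported in the text.
ppv_counts = {
    "4A": {"ABUS": (15, 95), "HHUS": (9, 83)},
    "4B": {"ABUS": (9, 15), "HHUS": (7, 18)},
    "4C": {"ABUS": (8, 8), "HHUS": (15, 16)},
}


def ppv_percent(malignant: int, total: int) -> float:
    """Positive predictive value as a percentage, rounded to two decimals."""
    return round(100 * malignant / total, 2)


ppv = {
    category: {method: ppv_percent(*counts) for method, counts in by_method.items()}
    for category, by_method in ppv_counts.items()
}


def wald_check(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover the log-scale standard error, z statistic, and two-sided P-value.

    Assumes a symmetric 95% Wald interval on the log-odds scale.
    """
    z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return se, z, p_value


_, _, p_microcalc = wald_check(4.63, 1.83, 11.71)   # microcalcifications on mammography
_, _, p_irregular = wald_check(5.59, 1.43, 21.83)   # irregular shape on US

print(ppv)
print(round(p_microcalc, 3), round(p_irregular, 3))
```

Under the Wald assumption, the recovered P-values are close to the reported values of 0.001 and 0.013, which suggests that the odds ratios, intervals, and P-values are mutually consistent.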
In this study, the overall agreement in BI-RADS categorization between ABUS and HHUS was good. The presence of microcalcifications on mammography and an irregular shape on US were associated with a lower BI-RADS category being assigned based on ABUS than based on HHUS. Our study documented good overall interobserver agreement in BI-RADS categorization between ABUS and HHUS. In accordance with this result, a recent prospective study involving 1,886 patients reported excellent overall agreement (99.8%) between HHUS and ABUS, with a kappa value of 0.994. In addition, Kim et al. reported good interobserver agreement of HHUS and ABUS with regard to orientation, echogenicity, margin, shape, and BI-RADS categorization. In contrast, the interobserver agreement was significantly lower in confirmed malignancies, for which a fair kappa value was obtained. Further studies are warranted to investigate this issue. Interestingly, the majority of the discordant cases involved a lower grading of the BI-RADS category by ABUS than by HHUS. Several studies have suggested that the mass size and the surrounding tissue change might affect the interpretation of malignant lesions on both types of examinations [15-17,22]. Our results demonstrated that US findings of an irregular shape on HHUS were associated with a lower BI-RADS category being assigned by ABUS. A well-known limitation of ABUS is posterior shadowing, which is related to the recall or false-negative rate. In addition to the findings for lesion size, no association was shown in our study group between margin, orientation, or posterior acoustic features and a lower categorization using ABUS. Although the underlying reason remains unclear, we speculate that the inherent differences between ABUS and HHUS might have affected these results. During ABUS image acquisition, it is difficult to adjust the probe orientation, degree of compression, and machine settings.
Therefore, the general difference in the pressure while scanning might have affected the interpretation of suspicious findings in the US lexicon. Additionally, the presence of microcalcifications on mammography was associated with a lower BI-RADS category being assigned by ABUS. Previous studies have reported that neither HHUS nor ABUS could provide additional information regarding ductal carcinoma in situ or invasive ductal carcinoma presenting as microcalcifications [14,16]. In this retrospective review of cases exhibiting microcalcifications on mammography, uncertainty regarding the locational correlation between mammography and US findings and the conservative assessment of microcalcifications before magnification might have affected the BI-RADS grading.
To our knowledge, this study is the first to compare the reliability of suspicion of malignancy between HHUS and ABUS in suspicious breast masses. Both ABUS and HHUS are US modalities that provide relatively fast, radiation-free images. Nonetheless, the interpretation of the images acquired using these two different imaging techniques can vary depending on the reader’s experience. An increasing number of studies have attempted to validate the reliability of ABUS, and have reported comparable overall results between ABUS and HHUS [16,22]. As the role of ABUS as a breast cancer screening tool adjunctive to mammography is becoming recognized by multicenter clinical studies [9,10], it is imperative for radiologists to understand the characterization of positive findings on ABUS in order to provide suitable further recommendations. Furthermore, consideration should be given to potential issues regarding the differences in lesion characterization between ABUS and HHUS, including conspicuity. Due to the lack of an ABUS-guided biopsy technique, lesions detected on ABUS that are suspicious for malignancy should be scanned separately by HHUS for biopsy. This use of two different techniques can cause discrepancies with regard to the interpretation of breast lesions. In addition, after biopsy results are available, the interpretation of pathology and radiology discordance based on the two different US modalities could be complicated. Therefore, this study focused on comparing the reliability of suspicion of malignancy between ABUS and HHUS and on analyzing the findings associated with discordant cases.
Our study has several limitations. First, the enrolled study population was restricted to patients who underwent HHUS-guided biopsy after ABUS examinations, with a retrospective selection of cases. Thus, the HHUS interpretations might have been biased in that all cases were first assessed by ABUS and recommended for biopsy. Second, this study included a relatively small sample and might have only represented part of the general population. Third, our study used only static HHUS images, without elastography or Doppler images. In practice, including this additional information in the analysis of HHUS imaging findings is likely to improve the overall diagnostic performance. However, including only static 2-dimensional HHUS images reduced the amount of bias in the comparison of BI-RADS categories based on HHUS and ABUS.
In conclusion, BI-RADS categorization showed good agreement between ABUS and HHUS examinations in cases of suspicious breast masses. The presence of suspicious microcalcifications on mammography and an irregular shape on US were factors associated with a lower BI-RADS category being assigned by ABUS than by HHUS. Improved awareness of microcalcifications, which were found to be associated with a lower BI-RADS category being assigned by ABUS, might be beneficial for radiologists by promoting a better understanding of suspicion of malignancy based on imaging findings, with implications for further clinical decision-making.
Conceptualization: Jang M, Yun G. Data acquisition: Jang M, Yun G. Data analysis or interpretation: Jang M, Ahn HS. Drafting of the manuscript: Yun G. Critical revision of the manuscript: Kim SM, Yun BL. Approval of the final version of the manuscript: all authors.
1. Etzioni R, Urban N, Ramsey S, McIntosh M, Schwartz S, Reid B, et al. The case for early detection. Nat Rev Cancer 2003;3:243–252.
2. Shapiro S, Venet W, Strax P, Venet L, Roeser R. Ten- to fourteen-year effect of screening on breast cancer mortality. J Natl Cancer Inst 1982;69:349–355.
3. Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005;353:1773–1783.
4. Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 2002;225:165–175.
5. Berg WA, Zhang Z, Lehrer D, Jong RA, Pisano ED, Barr RG, et al. Detection of breast cancer with addition of annual screening ultrasound or a single screening MRI to mammography in women with elevated breast cancer risk. JAMA 2012;307:1394–1404.
6. Ohuchi N, Suzuki A, Sobue T, Kawai M, Yamamoto S, Zheng YF, et al. Sensitivity and specificity of mammography and adjunctive ultrasonography to screen for breast cancer in the Japan Strategic Anti-cancer Randomized Trial (J-START): a randomised controlled trial. Lancet 2016;387:341–348.
7. Schmachtenberg C, Fischer T, Hamm B, Bick U. Diagnostic performance of automated breast volume scanning (ABVS) compared to handheld ultrasonography with breast MRI as the gold standard. Acad Radiol 2017;24:954–961.
8. Berg WA. Tailored supplemental screening for breast cancer: what now and what next? AJR Am J Roentgenol 2009;192:390–399.
9. Brem RF, Tabar L, Duffy SW, Inciardi MF, Guingrich JA, Hashimoto BE, et al. Assessing improvement in detection of breast cancer with three-dimensional automated breast US in women with dense breast tissue: the SomoInsight Study. Radiology 2015;274:663–673.
10. Wilczek B, Wilczek HE, Rasouliyan L, Leifland K. Adding 3D automated breast ultrasound to mammography screening in women with heterogeneously and extremely dense breasts: report from a hospital-based, high-volume, single-center breast cancer screening program. Eur J Radiol 2016;85:1554–1563.
11. Kotsianos-Hermle D, Hiltawsky KM, Wirth S, Fischer T, Friese K, Reiser M. Analysis of 107 breast lesions with automated 3D ultrasound and comparison with mammography and manual ultrasound. Eur J Radiol 2009;71:109–115.
12. Wang HY, Jiang YX, Zhu QL, Zhang J, Dai Q, Liu H, et al. Differentiation of benign and malignant breast lesions: a comparison between automatically generated breast volume scans and handheld ultrasound examinations. Eur J Radiol 2012;81:3190–3200.
13. Barr RG, DeVita R, Destounis S, Manzoni F, De Silvestri A, Tinelli C. Agreement between an automated volume breast scanner and handheld ultrasound for diagnostic breast examinations. J Ultrasound Med 2017;36:2087–2092.
14. Hellgren R, Dickman P, Leifland K, Saracco A, Hall P, Celebioglu F. Comparison of handheld ultrasound and automated breast ultrasound in women recalled after mammography screening. Acta Radiol 2017;58:515–520.
15. Choi EJ, Choi H, Park EH, Song JS, Youk JH. Evaluation of an automated breast volume scanner according to the fifth edition of BI-RADS for breast ultrasound compared with hand-held ultrasound. Eur J Radiol 2018;99:138–145.
16. Vourtsis A, Kachulis A. The performance of 3D ABUS versus HHUS in the visualisation and BI-RADS characterisation of breast lesions in a large cohort of 1,886 women. Eur Radiol 2018;28:592–601.
17. Jeh SK, Kim SH, Choi JJ, Jung SS, Choe BJ, Park S, et al. Comparison of automated breast ultrasonography to handheld ultrasonography in detecting and diagnosing breast lesions. Acta Radiol 2016;57:162–169.
18. An YY, Kim SH, Kang BJ. The image quality and lesion characterization of breast using automated whole-breast ultrasound: a comparison with handheld ultrasound. Eur J Radiol 2015;84:1232–1235.
19. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. 5th ed. Reston, VA: American College of Radiology, 2013.
20. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
21. Kim EJ, Kim SH, Kang BJ, Kim YJ. Interobserver agreement on the interpretation of automated whole breast ultrasonography. Ultrasonography 2014;33:252–258.
22. Chang JM, Moon WK, Cho N, Park JS, Kim SJ. Breast cancers initially detected by hand-held ultrasound: detection performance of radiologists using automated breast ultrasound data. Acta Radiol 2011;52:8–14.