False-negative results on computer-aided detection software in preoperative automated breast ultrasonography of breast cancer patients
Abstract
Purpose
The purpose of this study was to measure the cancer detection rate of computer-aided detection (CAD) software in preoperative automated breast ultrasonography (ABUS) of breast cancer patients and to determine the characteristics associated with false-negative outcomes.
Methods
A total of 129 index lesions (median size, 1.7 cm; interquartile range, 1.2 to 2.4 cm) from 129 consecutive patients (mean age±standard deviation, 53.4±11.8 years) who underwent preoperative ABUS from December 2017 to February 2018 were assessed. An index lesion was defined as a breast cancer confirmed by ultrasonography (US)-guided core needle biopsy. The detection rate of the index lesions, positive predictive value (PPV), and false-positive rate (FPR) of the CAD software were measured. Subgroup analysis was performed to identify clinical and US findings associated with false-negative outcomes.
Results
The detection rate of the CAD software was 0.84 (109 of 129; 95% confidence interval [CI], 0.77 to 0.90). The PPV and FPR were 0.41 (221 of 544; 95% CI, 0.36 to 0.45) and 0.45 (174 of 387; 95% CI, 0.40 to 0.50), respectively. False-negative outcomes were more frequent in asymptomatic patients (P<0.001) and were associated with the following US findings: smaller size (P=0.001), depth in the posterior third (P=0.002), angular or indistinct margin (P<0.001), and absence of architectural distortion (P<0.001).
Conclusion
The CAD software showed a promising detection rate of breast cancer. However, radiologists should judge whether CAD software-marked lesions are true- or false-positive lesions, considering its low PPV and high FPR. Moreover, it would be helpful for radiologists to consider the characteristics associated with false-negative outcomes when reading ABUS with CAD.
Introduction
Ultrasound (US) imaging of the breast, in addition to mammography and magnetic resonance imaging (MRI), plays a pivotal role in the screening and staging of breast cancer [1,2]. Handheld ultrasound (HHUS) is a standard imaging modality of the breast; however, HHUS has several drawbacks, such as a long examination time and low reproducibility [3-5]. Automated breast ultrasound (ABUS) can overcome the shortcomings of HHUS [6-8] while maintaining diagnostic performance [9-12]. Furthermore, ABUS provides a wide field of view and coronal sections of the breast, which help to show spiculated margins of breast cancer and architectural distortion [10,13]. ABUS is a promising modality for breast cancer screening in women with dense breast tissue [14,15], and several attempts have been made to use ABUS in the diagnostic setting [16-18].
Several studies have reported that computer-aided detection (CAD) software may enhance the diagnostic performance of ABUS [19-22]. Improvement in the performance of CAD software has been observed with advances in machine learning algorithms [19]. However, to the best of our knowledge, no previous research has analyzed the shortcomings of ABUS CAD software in identifying breast cancer. Knowledge of the imaging features of breast cancers missed by the CAD software (false-negative cases) is crucial for optimizing the accuracy of diagnoses based on ABUS.
The purpose of this study was to measure the cancer detection rate of CAD software in preoperative ABUS of breast cancer patients and to determine the characteristics associated with false-negative outcomes.
Materials and Methods
Our institutional review board approved this study. The requirement for informed consent was waived because of the retrospective nature of the study.
Patient Inclusion
We identified 147 consecutive breast cancer patients who underwent preoperative ABUS at our tertiary urban teaching hospital from December 2017 to February 2018. Both symptomatic and asymptomatic patients were eligible; recurrent breast cancer was not analyzed. At our institution, bilateral ABUS was routinely performed for preoperative evaluation of breast cancer patients, whereas HHUS was performed only in patients for whom ABUS was technically infeasible. From the initial sample of 147 patients, we excluded those for whom ABUS was technically infeasible, those whose breast cancer was not visualized on ABUS, and those with bilateral breast cancers (Fig. 1). Specifically, three patients were excluded because ABUS was technically infeasible due to the large mass of their breasts. Eleven patients were excluded because their breast cancers were not visualized on ABUS: one with occult breast cancer, two who had undergone excisional biopsy, and eight with lesions not delineated on ABUS. Four patients with bilateral breast cancers were also excluded. Thus, we analyzed 129 index lesions (median size [interquartile range], 1.7 cm [1.2 to 2.4 cm]; minimum, 0.4 cm; maximum, 6 cm) from 129 patients (mean age±standard deviation [SD], 53.4±11.8 years) with breast cancer visualized on ABUS.
ABUS Acquisition
All ABUS examinations were performed using one of two ABUS systems (Invenia ABUS, GE Healthcare, Sunnyvale, CA, USA) by one of two well-trained technologists. The examinations were performed with the patients in the supine position. A sponge was placed under the shoulder to help the breast tissue spread out evenly and the nipple face the ceiling. A nipple marker was placed for accurate coordination. An ABUS-specific lotion was applied to the breast to avoid contact artifacts. The level of breast compression was optimized for each patient by controlling the breast compression setting of ABUS, not only to spread out the breast evenly for better image quality, but also to maximize the patient’s comfort.
The ABUS scan was continuous and automated with a 6-15-MHz wide-aperture linear probe. Volumetric data were obtained in the axial plane with a slice thickness of 0.2 mm starting from the inferior portion of the breast. Coronal and sagittal images were reconstructed from the axial images. The field of view was set to 15.4 cm×17.0 cm×up to 5 cm of depth from the skin to the chest wall. For each breast, three volumes were obtained: (1) the central volume with the nipple at the center of the footprint; (2) the lateral volume, which included the upper outer part of the breast tissue with the nipple located in the inferior-medial corner; and (3) the medial volume, which included the inner and inferior part of the breast tissue. Additional views were selectively obtained in patients with large breasts to avoid exclusion of tissue.
ABUS CAD Software
All ABUS volumetric data were loaded onto a workstation dedicated to commercially available, deep learning-based CAD software (QVCAD version 2.1.2, QView Medical, Los Altos, CA, USA). The CAD software was applied to all ABUS examinations.
The output of the CAD software could be presented in two forms: (1) markers intended to highlight potentially malignant lesions and (2) minimum intensity projection images of the coronal section in areas where the CAD software detected abnormalities. We only used CAD markers to evaluate the diagnostic performance of the CAD software (Fig. 2). The CAD markers did not display an exact value for the probability of malignancy.
The number of CAD markers displayed per ABUS volume could be adjusted by changing the value of the false-positive rate (FPR) in the configuration settings of the CAD software. According to the manual from the manufacturer [23], FPR was defined as the total number of false-positive CAD markers in non-cancer volumes divided by the total number of non-cancer volumes. In this study, we set the FPR to 0.2 (i.e., 1 false-positive CAD marker per 5 non-cancer volumes), which was its default setting.
Image Interpretation
Two breast radiologists (M.J. and S.M.K. with 12 and 16 years of experience in breast imaging, respectively, and 2 years of experience in ABUS) analyzed the 3D ABUS volume data at a dedicated ABUS workstation (Invenia ABUS Workstation, GE Healthcare). They thoroughly reviewed the ABUS images and recorded the most suspicious findings in the axial, coronal, and sagittal planes and reached consensus on their readings [10,13].
First, the radiologists searched for an index lesion, which was defined as a mass confirmed as breast cancer by US-guided core needle biopsy. If a patient had multiple breast masses that were confirmed as breast cancer on US-guided core needle biopsy, only the largest mass was regarded as the index lesion. We did not analyze non-mass lesions in this study. When searching for an index lesion on ABUS, the radiologists were not allowed to refer to the results of the CAD software. Instead, they were permitted to refer to clinical information and findings from available imaging such as HHUS performed during US-guided core needle biopsy, mammography, and preoperative MRI. After identifying the index lesion, the size, nipple-to-lesion distance, depth, shape, margin, and echo pattern of the lesion, as well as the background tissue echotexture and architectural distortion, were recorded. If an index lesion was visualized in multiple ABUS volumes, the largest size of the index lesion and the shortest nipple-to-lesion distance were recorded. To determine the depth of the index lesion, the depth of the breast was divided into three parts: the anterior third, the middle third, and the posterior third of the fibroglandular tissue. For a lesion spanning more than one layer, the depth was determined by the layer in which the center of the lesion was located. The background tissue echotexture was assessed using the method suggested by Kim et al. [24]. Otherwise, the findings on ABUS were assessed based on the Breast Imaging Reporting and Data System (BI-RADS) [25].
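The depth rule described above can be illustrated with a minimal sketch. It assumes that the three layers are equal thirds of the fibroglandular tissue thickness and that depths are measured in millimeters from the skin; the function name `classify_depth` and its parameters are hypothetical and are not part of the study's workflow, in which depth was judged by the radiologists.

```python
def classify_depth(center_depth_mm, tissue_top_mm, tissue_bottom_mm):
    """Return the third of the fibroglandular tissue that contains the lesion center.

    Assumption (not stated in the source): the three layers are equal thirds
    of the tissue thickness between tissue_top_mm and tissue_bottom_mm.
    """
    thickness = tissue_bottom_mm - tissue_top_mm
    if thickness <= 0:
        raise ValueError("tissue_bottom_mm must be greater than tissue_top_mm")
    if not tissue_top_mm <= center_depth_mm <= tissue_bottom_mm:
        raise ValueError("lesion center lies outside the fibroglandular tissue")

    # Relative position of the lesion center: 0 = anterior edge, 1 = posterior edge.
    fraction = (center_depth_mm - tissue_top_mm) / thickness
    if fraction < 1 / 3:
        return "anterior"
    if fraction < 2 / 3:
        return "middle"
    return "posterior"


# Hypothetical lesion spanning 8 to 14 mm in tissue extending from 0 to 30 mm:
# it crosses the anterior/middle boundary, so its center decides the layer.
lesion_center = (8 + 14) / 2
layer = classify_depth(lesion_center, 0, 30)
```

Using the lesion center rather than its extent gives every lesion exactly one depth category, which keeps the subgroup comparison a simple three-level categorical variable.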
After the image interpretation, the radiologists checked whether the CAD marker correctly pointed to the index lesion on the ABUS volume data in the CAD software-dedicated workstation. For an index lesion visualized in multiple volumes of ABUS, they considered that the CAD marker indicated the index lesion correctly if the CAD marker was placed at the index lesion on at least one ABUS volume.
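The lesion-level rule above (an index lesion counts as detected if a CAD marker lies on it in at least one volume) can be sketched as follows. The patient identifiers, volume names, and marker results are invented for illustration only and do not come from the study data.

```python
def lesion_detected(marker_on_lesion_by_volume):
    """True if a CAD marker was placed on the index lesion in at least one volume.

    marker_on_lesion_by_volume maps each volume in which the lesion is visible
    to a boolean: did a CAD marker point to the lesion in that volume?
    """
    return any(marker_on_lesion_by_volume.values())


# Hypothetical readings for three patients (illustrative values only).
readings = {
    "P01": {"central": False, "lateral": True},   # marked in one of two volumes
    "P02": {"central": False, "medial": False},   # never marked: false negative
    "P03": {"lateral": True},                     # visible and marked in one volume
}

detected = {pid: lesion_detected(volumes) for pid, volumes in readings.items()}
toy_detection_rate = sum(detected.values()) / len(detected)
```

Under this rule, a lesion visible in several volumes has more chances to be marked than a lesion visible in only one, which is worth keeping in mind when interpreting the lesion-level detection rate.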
Electronic Medical Record Review
Through a retrospective review of patients' electronic medical records, we recorded patients' age, menopausal status, family history of breast cancer, and symptoms. The final histopathology was determined by combining the pathological results of US-guided core needle biopsy and the surgical specimen. For the histopathological analysis, the index lesions were classified as invasive ductal carcinoma, invasive lobular carcinoma, and ductal carcinoma in situ. A diagnosis of invasive cancer was made when an invasive component was found either in US-guided core needle biopsy or surgical specimens. Ductal carcinoma in situ was diagnosed if both US-guided core needle biopsy and surgical specimens were judged to be free from an invasive component. In cases wherein an immunohistochemistry examination was performed in both US-guided core needle biopsy and surgical specimens, only the results from the US-guided core needle biopsy were used in the present study. The molecular subtype was classified into (1) hormone receptor (HR) positive and human epidermal growth factor receptor 2 (HER2) negative, (2) HER2 positive regardless of HR status, and (3) triple-negative/basal-like types.
Data Analysis and Statistical Analysis
The detection rate of index lesions of the CAD software on ABUS was measured. The detection rate was defined as the number of index lesions correctly indicated by the CAD marker divided by the total number of breast cancer patients in this study. Additionally, the positive predictive value (PPV) and FPR were measured to evaluate the diagnostic performance of the CAD software. PPV was defined as the number of markers properly located on the index lesion divided by the total number of markers in both non-cancer and cancer volumes. To check whether the FPR measured in our patient group was similar to the preset value of 0.2 in the CAD software, we defined the FPR exactly as in the manual of the CAD software [23] (i.e., the number of false-positive CAD markers in non-cancer volumes divided by the number of non-cancer volumes). We measured the FPR using the ABUS volume data obtained from the breast contralateral to the cancer, provided that it was categorized as BI-RADS category 1 or 2 on preoperative ABUS; contralateral breasts classified as BI-RADS category 3 or above were excluded from the FPR calculation. The results of preoperative MRI and US-guided core needle biopsy performed on the contralateral breast were also checked to ensure the absence of breast cancer in that breast.
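The three definitions above reduce to simple ratios. The sketch below applies them to the counts reported in the Results and compares the measured FPR with the preset value of 0.2; the helper `ratio` and the variable names are hypothetical, and this is not the authors' analysis code (the study used R).

```python
def ratio(numerator, denominator):
    """Plain ratio with a guard against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts as reported in the Results section.
detection_rate = ratio(109, 129)  # index lesions marked / breast cancer patients
ppv = ratio(221, 544)             # markers on index lesions / all markers
fpr = ratio(174, 387)             # false-positive markers / non-cancer volumes

# Comparison with the preset FPR of the CAD software (default setting).
PRESET_FPR = 0.2
expected_fp_markers = PRESET_FPR * 387  # markers expected if the preset held
fold_over_preset = fpr / PRESET_FPR     # measured FPR relative to the preset

print(f"detection rate = {detection_rate:.2f}, PPV = {ppv:.2f}, FPR = {fpr:.2f}")
print(f"expected false-positive markers at preset: {expected_fp_markers:.1f}")
print(f"measured FPR is {fold_over_preset:.1f} times the preset")
```

Note that the FPR as defined here is a count of markers per volume rather than a proportion of volumes, so in principle it can exceed 1.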
Subgroup analysis was performed to identify characteristics associated with false-negative outcomes of the CAD software. The Student t test and Fisher exact test were used for continuous and categorical variables, respectively. The tested variables were patients’ age, menopausal status, family history of breast cancer, symptoms, histopathology, molecular subtype, and ABUS findings (size, nipple-to-lesion distance, depth, shape, margin, and echo pattern of the lesion, background tissue echotexture, and architectural distortion).
All statistical analyses were performed using the open-source statistical software R (version 3.5.2, R Foundation for Statistical Computing, Vienna, Austria). Two-sided P-values of <0.05 were considered to indicate statistical significance.
Results
The detection rate of index lesions of the CAD software was 0.84 (109 of 129; 95% confidence interval [CI], 0.77 to 0.90). The PPV and FPR of the CAD software were 0.41 (221 of 544; 95% CI, 0.36 to 0.45) and 0.45 (174 of 387; 95% CI, 0.40 to 0.50), respectively.
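The source does not state how the 95% CIs were computed. As an illustration only, the sketch below computes exact (Clopper-Pearson) binomial intervals for the three reported counts using the Python standard library. Both the choice of method and the treatment of the FPR (a per-volume marker rate) as a binomial proportion are assumptions, and the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_increasing(func, iterations=100):
    """Root of an increasing function on (0, 1), found by bisection."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if func(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


reported = {"detection rate": (109, 129), "PPV": (221, 544), "FPR": (174, 387)}
intervals = {name: clopper_pearson(k, n) for name, (k, n) in reported.items()}
for name, (low, high) in intervals.items():
    k, n = reported[name]
    print(f"{name}: {k / n:.2f} (95% CI, {low:.2f} to {high:.2f})")
```

Other interval methods (for example, Wilson or normal-approximation intervals) can differ from the exact interval in the second decimal place, so small discrepancies from published values do not necessarily indicate an error.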
The baseline characteristics and histopathological results of the true-positive and false-negative groups are listed in Table 1. Symptom status differed significantly between the two groups (P<0.001); asymptomatic patients were more frequent in the false-negative group (true-positive vs. false-negative, 20.2% [22 of 109] vs. 65.0% [13 of 20]). Otherwise, there were no significant differences in age, menopausal status, family history of breast cancer, histopathological findings, or molecular subtypes between the two groups.
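The symptom comparison above can be checked with a two-sided Fisher exact test written with the Python standard library. This is a minimal sketch rather than the authors' R analysis. It assumes that symptom status was tested as a 2×2 table (asymptomatic vs. symptomatic), with the symptomatic counts (87 and 7) obtained by subtraction from the group sizes; Table 1 may use finer categories. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Rows: true-positive group, false-negative group.
# Columns: asymptomatic, symptomatic (symptomatic counts derived by subtraction).
p_symptoms = fisher_exact_two_sided(22, 109 - 22, 13, 20 - 13)
print(f"Fisher exact P = {p_symptoms:.2g}")
```

The same function could be applied to the architectural distortion counts (27 of 109 vs. 18 of 20), whereas variables with more than two categories, such as depth or margin, would need the generalized form of the test.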
The ABUS findings in the true-positive and false-negative groups are listed in Table 2. Index lesions were smaller in the false-negative group (mean±SD, true-positive vs. false-negative, 2.1±1.1 cm vs. 1.3±0.7 cm; P=0.001). The depth of the lesion differed significantly between the two groups (P=0.002), as depth in the posterior third was more frequent in the false-negative group (5.5% [6 of 109] vs. 25.0% [5 of 20]). The margin of the index lesion also differed significantly between the two groups (P<0.001), as angular margins (6.4% [7 of 109] vs. 35.0% [7 of 20]) and indistinct margins (28.4% [31 of 109] vs. 45.0% [9 of 20]) were more frequent in the false-negative group. Absence of architectural distortion was more frequent in the false-negative group (24.8% [27 of 109] vs. 90.0% [18 of 20]; P<0.001). A representative false-negative case is shown in Fig. 3.
Discussion
In this study, we measured the cancer detection rate of CAD software in preoperative ABUS of breast cancer patients and analyzed the characteristics associated with false-negative outcomes. The detection rate of index lesions of the CAD software was 0.84, and its PPV and FPR were 0.41 and 0.45, respectively. False-negative outcomes were more frequent in asymptomatic patients and were associated with the following US findings: smaller lesion size on ABUS, depth in the posterior third, an angular or indistinct margin, and absence of architectural distortion.
The low PPV and high FPR observed in our study may have several explanations. First, a heterogeneous background tissue echotexture on ABUS was noted in a considerable portion of our included patients, which may lower the detection rate and increase the FPR of CAD. Second, Asian women tend to have small and dense breasts. In a previous study performed in women with dense breasts, the PPV of the CAD software-based ABUS reading was 0.50 (95% CI, 0.45 to 0.55) [26]. Moreover, there is a possibility of artifacts owing to insufficient compression. Third, while we set the FPR of our CAD based on data assuming a screening setting of a Western general population [23], our study results reflect the hypothetical screening setting of Asian breast cancer patients.
A smaller size of the index lesion was associated with a lower cancer detection rate of the CAD software. In general, small cancers can be overlooked on ABUS used in the screening setting [27]. A recent study that utilized the same CAD software as ours showed that the cancer detection rate of the ABUS CAD software was associated with tumor size, and the detection rate was around 90% for invasive ductal carcinoma larger than 1 cm [28]. Furthermore, Kim et al. [29] reported that a small tumor size (<8 mm) was associated with false-negative outcomes of the CAD software, although their CAD software was different from that used in the present study. This trend is maintained even when ABUS is interpreted by radiologists. A previous study reported that the detection rate of malignancy on ABUS read by radiologists increased in proportion to the size of the lesion [30]. When the size of the target lesion was larger than 1.2 cm, radiologists could reliably find the lesion [13]. Therefore, the results of the present study could reflect the weakness of ABUS itself in visualizing small masses.
Absence of architectural distortion was associated with a decreased cancer detection rate of the CAD software. Some articles demonstrated that 3D ABUS helped radiologists detect a spiculated margin of breast cancer or architectural distortion by showing the coronal section of the breasts [10,13]. Furthermore, it was suggested that the detection of architectural distortion may contribute to the timely detection of breast cancer on ABUS [27]. Thus, the absence of architectural distortion might lower the diagnostic performance of the CAD software in detecting malignancy on ABUS.
Depth in the posterior third was associated with a higher false-negative rate of the CAD software. Lesions located in the deep posterior tissue are inherently difficult to visualize using transducers with a high insonating frequency due to attenuation of the US beam [31]. Moreover, detection of masses in the deeper portion of the breast can be hindered by the nipple shadow or by posterior shadowing from another lesion in a superficial location.
Angular and indistinct margins of the lesion were associated with a decreased cancer detection rate of the CAD software. Subtle non-circumscribed margins, artifacts, and architectural distortions may be difficult to capture on static images of US [31]. Moreover, an inability to freely adjust the degree of compression or scan angle could be a potential cause of false-negative outcomes on ABUS. Therefore, all available volume scan images must be sufficiently evaluated not only in the axial plane, but also in the coronal and sagittal planes, to overcome the false-negatives of the CAD software.
The CAD software used in the present study was based on a deep learning algorithm. Further improvement in the performance of the CAD software might be achieved by modifying it to focus on the challenging factors revealed by our study (i.e., factors associated with false-negative outcomes). Promising results have been reported in recent studies of CAD applications using the latest deep learning algorithms [32,33].
The present study has several limitations. First, this was a single-center retrospective study with inherent limitations regarding its generalizability. Second, patients with breast cancer undetected on ABUS were excluded from the statistical analyses; however, this was due to the limitations of the ABUS device itself. Third, the radiologists who reviewed ABUS in our study were not blinded to the fact that they were assessing BI-RADS category 6 lesions. Fourth, all the ABUS scans were performed after core needle biopsy. Thus, the interpretation of ABUS might have been influenced by the findings of HHUS performed during the core needle biopsy. Furthermore, the biopsy itself may affect the margin and shape of the lesion and may influence the detection rate of the CAD software. Fifth, interobserver variability in lesion description may have influenced our results and decreased their reproducibility. Finally, we only included the largest mass in patients with multifocal or multicentric breast cancers in our analyses.
In conclusion, the CAD software showed a promising detection rate of breast cancer. However, radiologists should judge whether a CAD software-marked lesion is a true- or false-positive lesion, considering its low PPV and high FPR. Moreover, it would be helpful for radiologists to keep in mind the characteristics associated with false-negative outcomes when reading ABUS with CAD.
Notes
Author Contributions
Conceptualization: Kim Y, Rim J, Kim SM, Park SY, Ahn HS, Kim B, Jang M. Data acquisition: Kim Y, Kim SM, Jang M. Data analysis or interpretation: Kim Y, Rim J, Kim SM, Yun BL, Park SY, Ahn HS, Kim B, Jang M. Drafting of the manuscript: Kim Y. Critical revision of the manuscript: Rim J, Kim SM, Yun BL, Park SY, Ahn HS, Kim B, Jang M. Approval of the final version of the manuscript: all authors.
The CAD application (QVCAD; QView Medical, Los Altos, CA, USA) was used for research purposes only. We received no consulting fees from QView Medical, Inc.
Acknowledgements
This work was supported by the Research Resettlement Fund for the new faculty of Seoul National University. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.