Application of the Thyroid Imaging Reporting and Data System in thyroid ultrasonography interpretation by less experienced physicians
Abstract
Purpose:
To verify the usefulness of the Thyroid Imaging Reporting and Data System (TI-RADS) for thyroid nodule diagnosis by less experienced physicians.
Methods:
From March 2012 to May 2012, ultrasonography-guided fine needle aspiration was performed on 204 thyroid nodules in 195 consecutive patients by four less experienced radiologists (<1 year of experience in thyroid imaging). The number of suspicious ultrasonography features and the total risk score of each thyroid nodule were calculated according to two models previously proposed by Kwak et al. The DeLong method was used to compare the areas under the receiver operating characteristic curve (AUCs) of the two models. Associations between the two models and the risk of malignancy were analyzed using penalized B-splines and the Cochran-Armitage trend test.
Results:
Among 204 thyroid nodules, 65 were malignant and 139 were benign. The probability of malignancy tended to increase as the number of suspicious ultrasonography features and the sum of risk scores increased. There was no significant difference in the AUCs of the two models (P=0.673). The Cochran-Armitage trend test demonstrated an increased risk of malignancy as the number of suspicious ultrasonography features and the total risk score increased (P=0.001).
Conclusion:
Both the number of suspicious ultrasonography features and the total risk score are applicable and show comparable results in the risk stratification of thyroid nodules by less experienced radiologists in thyroid imaging.
Introduction
The number of diagnosed nonpalpable thyroid nodules is increasing as a consequence of the widespread use of ultrasonography (US) in health surveillance and the increasing use of US-guided fine needle aspiration (US-FNA) [1]. Thyroid nodules are found by US in up to 67% of adults [2]. However, only 5.0%-6.5% of incidentally discovered thyroid nodules are malignant [3]. Thus, it is important to establish criteria for selecting thyroid nodules for FNA according to their risk of malignancy.
US is an important diagnostic tool for predicting thyroid malignancy and selecting thyroid nodules that should be assessed by FNA [4-6]. Known suspicious US features include marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and a taller-than-wide shape; a combination of these features is known to provide better diagnostic accuracy than any single feature alone [4,7,8]. Many organizations have recommended guidelines for selecting thyroid nodules for biopsy based on size criteria or suspicious US features [9-12]. However, the use of differing terms (such as probably benign, indeterminate, low suspicion, and suspicious) and differing criteria hinders effective communication between reporting radiologists and clinicians. In breast imaging, the Breast Imaging Reporting and Data System (BI-RADS) is widely used to assess the probability of malignancy and the need for biopsy [13]; each BI-RADS category is associated with an expected malignancy rate. Similar to BI-RADS, the Thyroid Imaging Reporting and Data System (TI-RADS) was developed for risk stratification of thyroid nodules using US features [14-16].
Although several studies have suggested that TI-RADS helps avoid confusion among physicians and patients and reduces the number of unnecessary FNAs yielding benign cytologic results, it is difficult to apply in daily practice because of its complexity. In a study seeking a simpler approach to TI-RADS, Kwak et al. [17] recently reported that as the number of suspicious US features increased, the fitted probability and risk of malignancy also increased. They also demonstrated that the following US features were significantly associated with malignancy: solid component, hypoechogenicity, marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and taller-than-wide shape [17]. However, that study was limited in that each suspicious US feature was assigned the same weight. Subsequently, a diagnostic prediction model based on a total risk score, reflecting the different probabilities of malignancy associated with each suspicious US feature, was proposed, and this model showed that US risk scores predicted thyroid malignancy well [18]. However, the radiologists involved in these previous studies had more than 5 years of experience in thyroid imaging. Therefore, the purpose of this study was to verify the usefulness of TI-RADS among less experienced physicians with less than 1 year of experience in thyroid imaging.
Materials and Methods
Study Population
This retrospective study was approved by the Institutional Review Board, and neither patient approval nor informed consent was required for review of patients’ images and medical records. However, written informed consent for US-FNA was obtained from all patients before each procedure according to our hospital's regular policy. From March 2012 to May 2012, 259 consecutive thyroid nodules in 248 patients were imaged with gray-scale US, and US-FNA was performed by 4 less experienced radiologists. Of these, 55 nodules with indeterminate (n=15) or nondiagnostic (n=40) cytologic results were excluded because they underwent neither surgery nor repeat US-FNA. Inclusion criteria were as follows: (1) thyroid nodules that underwent thyroid surgery (n=63); (2) nodules with benign or malignant cytologic results (n=139); and (3) nodules with benign or malignant results at repeat US-FNA or thyroid surgery after nondiagnostic cytologic results (n=2). Finally, 204 thyroid nodules in 195 patients (26 men and 169 women; mean age, 51 years; age range, 16 to 88 years) were enrolled in this study. Among the 204 thyroid nodules, 65 (32%) were malignant and 139 (68%) were benign. Postoperative pathologic results are shown in Table 1.
Real-time Gray-scale US
Real-time gray-scale US was independently performed by one of four board-certified less experienced radiologists (<1 year experience in thyroid imaging), who were assigned arbitrarily according to the hospital’s daily schedule, using a 6- to 14-MHz linear array transducer (EUB-7500, Hitachi Medical, Tokyo, Japan) or a 5- to 12-MHz linear array transducer (iU 22, Philips Medical Systems, Bothell, WA, USA). Before this study, each radiologist had experience performing thyroid US for 8, 9, 7, and 7 months, respectively, during resident practice in different hospitals. During this study period, all four radiologists had undergone fellowship training in thyroid imaging for 1 month in the same hospital. Of 204 thyroid nodules in this study, 15, 39, 55, and 95 thyroid nodules were imaged by physicians 1, 2, 3, and 4, respectively.
The US features of all thyroid nodules, including internal component, echogenicity, margins, calcifications, shape, and final assessment, were prospectively recorded for clinical use by the radiologist who performed the US. The internal component was classified as completely solid, cystic portion greater than 50%, or cystic portion less than or equal to 50%. Echogenicity was classified as hyperechogenicity, isoechogenicity, or hypoechogenicity (relative to normal thyroid parenchyma), or marked hypoechogenicity (defined as lower echogenicity than the strap muscle). Margins were classified as well-defined, microlobulated, or irregular. Calcifications were classified as microcalcifications (tiny, punctate, hyperechoic foci less than or equal to 1 mm in diameter, with or without acoustic shadowing), macrocalcifications, or no calcification. Shape was classified as taller-than-wide (ratio of the anteroposterior diameter to the transverse diameter ≥1) or wider-than-tall. Suspicious US features included marked hypoechogenicity, lack of well-defined margins (microlobulated or irregular margins), microcalcifications, and a taller-than-wide shape. Each radiologist classified thyroid nodules into one of two categories, positive or negative for malignancy [4,7]: nodules showing any of the suspicious features were classified as positive for malignancy, and nodules showing none of them were classified as negative for malignancy.
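As an illustration only, the dichotomous final assessment described above can be written as a small Python rule. This is a sketch under stated assumptions, not software used in the study: the names `Nodule`, `final_assessment`, `Echo`, `Margin`, and `Calc` are hypothetical, the internal component is omitted because it does not enter this rule, and taller-than-wide is derived from the two diameters using the ratio threshold of 1 given in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Echo(Enum):
    HYPER = 1
    ISO = 2
    HYPO = 3
    MARKED_HYPO = 4  # lower echogenicity than the strap muscle


class Margin(Enum):
    WELL_DEFINED = 1
    MICROLOBULATED = 2
    IRREGULAR = 3


class Calc(Enum):
    NONE = 1
    MICRO = 2  # punctate hyperechoic foci <= 1 mm
    MACRO = 3


@dataclass
class Nodule:
    echogenicity: Echo
    margin: Margin
    calcification: Calc
    ap_mm: float          # anteroposterior diameter
    transverse_mm: float  # transverse diameter

    @property
    def taller_than_wide(self) -> bool:
        # AP/transverse ratio >= 1 counts as taller-than-wide
        return self.ap_mm / self.transverse_mm >= 1.0


def final_assessment(nodule: Nodule) -> str:
    """Return "positive" if any suspicious feature is present, else "negative"."""
    suspicious = (
        nodule.echogenicity is Echo.MARKED_HYPO
        or nodule.margin is not Margin.WELL_DEFINED  # microlobulated or irregular
        or nodule.calcification is Calc.MICRO
        or nodule.taller_than_wide
    )
    return "positive" if suspicious else "negative"
```

Note that plain hypoechogenicity alone does not make a nodule positive under this rule; only marked hypoechogenicity does.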
US-FNA was performed on thyroid nodules with suspicious US features or, in patients with multiple thyroid nodules without suspicious US features, on the largest nodule [4,7]. US-FNA was performed with a 23-gauge needle attached to a 2-mL disposable plastic syringe, and each lesion was aspirated at least twice using the freehand technique. The aspirated samples were expelled onto glass slides, smeared, and immediately placed in 95% alcohol for Papanicolaou staining. The remaining material in the syringe was rinsed in saline for cell block processing [19]. Cytopathologists were not on site during the aspiration procedure, and the cytology slides were interpreted by an experienced pathologist to establish the cytologic diagnosis. Based on the Bethesda System for Reporting Thyroid Cytopathology, FNA cytology results were classified as nondiagnostic, benign, atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS), suspicious for a follicular neoplasm or suspicious for a Hürthle cell neoplasm, suspicious for malignancy, or malignant [20].
The number of suspicious US features of each thyroid nodule was counted using our prospectively recorded US data. Based on the results of a previous study, a solid component, hypoechogenicity, marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and a taller-than-wide shape were counted as suspicious US features [17]. According to the results of that study [17], we assigned TI-RADS category 3 (probably benign) to nodules with no suspicious US features (Fig. 1), category 4a (low suspicion for malignancy) to nodules with one suspicious US feature, category 4b (intermediate concern for malignancy) to nodules with two suspicious US features, category 4c (moderate concern but not classic for malignancy) to nodules with three or four suspicious US features (Fig. 2), and category 5 (highly suggestive of malignancy) to nodules with five suspicious US features (Fig. 3).
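The counting rule and the category mapping above can be sketched in Python as follows. The function names and string codes are hypothetical, and the sketch assumes that hypoechogenicity and marked hypoechogenicity are mutually exclusive grades of a single echogenicity finding, so a nodule contributes at most one echogenicity point; under that assumption the maximum count is five, which matches the definition of category 5.

```python
def count_suspicious_features(
    solid: bool,
    echogenicity: str,
    irregular_margin: bool,
    microcalcification: bool,
    taller_than_wide: bool,
) -> int:
    """Count suspicious US features as in the TI-RADS model of Kwak et al. [17].

    echogenicity is one of "hyper", "iso", "hypo", or "marked_hypo" (hypothetical
    codes). Hypo- and marked hypoechogenicity are treated as one mutually
    exclusive echogenicity finding (assumption), so they add at most one point.
    """
    hypoechoic = echogenicity in ("hypo", "marked_hypo")
    flags = [solid, hypoechoic, irregular_margin, microcalcification, taller_than_wide]
    return sum(1 for flag in flags if flag)


def tirads_category(n_features: int) -> str:
    """Map the number of suspicious features (0-5) to a TI-RADS category."""
    if not 0 <= n_features <= 5:
        raise ValueError("expected 0 to 5 suspicious features")
    if n_features == 0:
        return "3"   # probably benign
    if n_features == 1:
        return "4a"  # low suspicion for malignancy
    if n_features == 2:
        return "4b"  # intermediate concern for malignancy
    if n_features <= 4:
        return "4c"  # moderate concern but not classic for malignancy
    return "5"       # highly suggestive of malignancy
```

For example, a solid, markedly hypoechoic nodule with an irregular margin has three suspicious features, so `tirads_category(count_suspicious_features(True, "marked_hypo", True, False, False))` gives category 4c.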
Another study proposed a prediction model derived from a total risk score that took into account the different risk of malignancy associated with each suspicious US feature [18]. In that study, the risk score for each suspicious US feature was estimated from its odds ratio for thyroid malignancy. Based on this recent study by Kwak et al. [18], we assigned a score of 2 to hypoechogenicity, 6 to marked hypoechogenicity, 5 to microlobulated or irregular margins, 2 to microcalcifications, and 1 to a taller-than-wide shape. The total risk score of each thyroid nodule was then calculated by summing the scores of its suspicious US features, and the percentage chance of malignancy was obtained according to the total risk score.
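A minimal sketch of the summation, using only the integer weights listed above, is shown below. The function name and the string codes for echogenicity are hypothetical; the sketch again assumes that hypoechogenicity and marked hypoechogenicity are mutually exclusive, so the highest attainable score is 6 + 5 + 2 + 1 = 14. The conversion from total score to percentage chance of malignancy (Table 4) is not reproduced here because those values are not given in the text.

```python
# Integer weights listed in the text (Kwak et al. [18]); string codes are hypothetical.
ECHOGENICITY_POINTS = {"hyper": 0, "iso": 0, "hypo": 2, "marked_hypo": 6}
MARGIN_POINTS = 5              # microlobulated or irregular margin
MICROCALCIFICATION_POINTS = 2
TALLER_THAN_WIDE_POINTS = 1


def total_risk_score(echogenicity: str, irregular_margin: bool,
                     microcalcification: bool, taller_than_wide: bool) -> int:
    """Sum the points of each suspicious US feature present in a nodule."""
    # One echogenicity grade per nodule (assumption), so at most 6 points here
    score = ECHOGENICITY_POINTS[echogenicity]
    if irregular_margin:
        score += MARGIN_POINTS
    if microcalcification:
        score += MICROCALCIFICATION_POINTS
    if taller_than_wide:
        score += TALLER_THAN_WIDE_POINTS
    return score
```

Unlike the feature count, this score gives no points for a solid component, so a solid, isoechoic, well-defined nodule scores zero.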
Statistical Analysis
An independent two-sample t-test was used to determine the association of malignancy with patient age and nodule size. Patient gender was compared between benign and malignant nodules using the chi-squared test. The rate of malignancy in thyroid nodules according to TI-RADS category was also calculated. Receiver operating characteristic (ROC) analysis was performed to assess the accuracy of the two models for predicting thyroid malignancy, derived from the number of suspicious US features and from the total risk score, respectively. The DeLong method was used to compare the areas under the ROC curve (AUCs) of the two prediction models. The associations between the two models and the risk of malignancy were determined using penalized B-splines and the Cochran-Armitage trend test. Analysis was performed with SAS ver. 9.2 (SAS Institute, Cary, NC, USA). Statistical significance was assumed when the P-value was less than 0.05. All reported P-values are 2-sided.
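The AUC comparison can be illustrated with a small, dependency-free Python sketch of the nonparametric (Mann-Whitney) AUC and the DeLong test for two correlated AUCs measured on the same nodules. This is not the SAS procedure used in the study; the function names are hypothetical, ties are counted as one half, labels are coded 1 for malignant and 0 for benign, and the two-sided P-value uses the normal approximation.

```python
import math
from statistics import mean, variance


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the malignant nodule scores higher, 0.5 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(scores, labels):
    """DeLong structural components for malignant (1) and benign (0) nodules."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    v10 = [mean([_psi(x, y) for y in neg]) for x in pos]
    v01 = [mean([_psi(x, y) for x in pos]) for y in neg]
    return v10, v01


def auc(scores, labels) -> float:
    """Nonparametric AUC: probability that a malignant nodule outscores a benign one."""
    v10, _ = _components(scores, labels)
    return mean(v10)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test comparing two correlated AUCs on the same nodules."""
    a10, a01 = _components(scores_a, labels)
    b10, b01 = _components(scores_b, labels)
    auc_a, auc_b = mean(a10), mean(b10)
    # Var(AUC_a - AUC_b) from the sample variance of paired component differences
    var_diff = (variance([p - q for p, q in zip(a10, b10)]) / len(a10)
                + variance([p - q for p, q in zip(a01, b01)]) / len(a01))
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return auc_a, auc_b, z, p_value
```

In this setting, `scores_a` could hold each nodule's number of suspicious features and `scores_b` its total risk score; because both are ordinal integers with many ties, the one-half tie rule matters.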
Results
Table 2 shows the patient demographics. The mean size of the thyroid nodules was 16.2±11.6 mm (range, 2 to 65 mm). The malignant nodules were significantly smaller than the benign nodules (mean size, 12.6±11.2 mm vs. 17.9±11.6 mm, respectively; P=0.003). Gender did not differ significantly between the malignant and benign nodules (P=0.514), nor did age (P=0.350).
The malignancy rates in TI-RADS categories 3, 4a, 4b, 4c, and 5 were 4.3% (1/23), 0% (0/43), 13.5% (5/37), 56.1% (46/82), and 68.4% (13/19), respectively (Table 3). Of the 65 thyroid cancers, 1 (2%), 5 (8%), 46 (71%), and 13 (20%) were classified as TI-RADS categories 3, 4b, 4c, and 5, respectively (Table 3). The percentage of malignancy according to the total risk score of each thyroid nodule is shown in Table 4. The malignancy rate was 2.2% (1/46) in thyroid nodules with a total risk score of zero (Fig. 4). The AUC of the model based on the number of suspicious US features (Az=0.827) did not differ significantly from that of the model based on the total risk score (Az=0.833; P=0.673). Penalized B-splines demonstrated that the risk of malignancy tended to increase as the number of suspicious US features and the total risk score of each thyroid nodule increased (Fig. 5). According to the Cochran-Armitage trend test, the risk of malignancy increased as the number of suspicious US features and the total risk score of each thyroid nodule increased (P<0.001).
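For illustration, a Cochran-Armitage trend test can be applied to the category-level counts reported above (malignant/total nodules in TI-RADS 3, 4a, 4b, 4c, and 5). The sketch below is a dependency-free Python version of the standard statistic without a finite-sample correction. The equally spaced scores 0 to 4 assigned to the five categories are an assumption, because the scores used in the study's SAS analysis are not stated, so the output should not be read as a reproduction of the reported P-value. With these counts and scores the statistic is roughly z = 7.6, consistent with a strong increasing trend.

```python
import math


def cochran_armitage(cases, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    cases[i] / totals[i] is the malignancy rate in ordered group i; scores
    default to 0, 1, 2, ... (an assumption, see text).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(cases) / n  # pooled malignancy rate
    # Score-weighted sum of observed minus expected malignant counts
    t_stat = sum(s * (x - m * p_bar) for s, x, m in zip(scores, cases, totals))
    sum_sn = sum(s * m for s, m in zip(scores, totals))
    sum_s2n = sum(s * s * m for s, m in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (sum_s2n - sum_sn ** 2 / n)
    z = t_stat / math.sqrt(var_t)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return z, p_value


# Malignant and total nodules in TI-RADS 3, 4a, 4b, 4c, and 5 (from the Results)
malignant = [1, 0, 5, 46, 13]
total = [23, 43, 37, 82, 19]
z, p_value = cochran_armitage(malignant, total)
print(f"z = {z:.2f}, two-sided P = {p_value:.2g}")
```

Other score choices, such as the raw feature counts 0, 1, 2, 3.5, and 5, would change the statistic somewhat but not the direction of the trend in these data.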
Discussion
Although multiple guidelines suggest that suspicious US features and nodule size should be considered when selecting thyroid nodules for biopsy [4,6,9-12], a standardized lexicon for characterizing thyroid nodules and guidelines for selecting nodules that need FNA are still required to avoid unnecessary US examinations and repeated FNA and to reduce confusion among physicians and patients. Multiple studies have developed TI-RADS into a standardized US categorical system for predicting thyroid malignancy [14-18]. Horvath et al. [15] described 10 US patterns that encompass all types of thyroid nodules and retrospectively validated TI-RADS. However, selecting a specific category from among 10 descriptive US patterns can be ambiguous and inefficient in everyday clinical practice. Park et al. [14] proposed an equation for predicting thyroid malignancy using multiple logistic regression analysis. However, this regression equation is difficult to apply in routine practice because it is complicated and requires 12 US variables. To overcome these limitations of previous studies, Kwak et al. [17] used the number of suspicious US features to calculate the fitted probability of malignancy. Nevertheless, their approach was limited in that each suspicious US feature was weighted identically [17]. They subsequently established a TI-RADS model that applied a different risk score to each suspicious US feature [18].
Although US is noninvasive, a major limitation is its operator dependency. Several studies have reported relatively good interobserver agreement in the final assessment of thyroid nodules [21,22]. However, the diagnostic performance of US can improve greatly with the experience of the physician [23-25], so experience can be a critical factor affecting the diagnostic accuracy of US. Most radiologists involved in previous studies of TI-RADS had more than 5 years of experience in thyroid imaging. Therefore, we wanted to determine whether these approaches performed comparably when physicians were less experienced in thyroid imaging. Our study applied two previously proposed approaches: one in which the risk of malignancy increases with the number of suspicious US features, and one in which the predicted probability of malignancy increases with the total risk score. We found that the results were comparable to those of previous studies even when the examinations were performed by less experienced physicians. The percentage of malignancy according to the number of suspicious US features is shown in Table 3. As the number of suspicious US features increased, the risk of malignancy also increased (Fig. 5A). This result was comparable to that of a previous study by Kwak et al. [17] and also supports the conclusion of the American Association of Clinical Endocrinologists that the coexistence of two or more suspicious US features indicates a much higher risk of thyroid cancer than does a single suspicious US feature [9]. Of the 23 thyroid nodules assigned to TI-RADS 3, 17 were examined by physicians 1, 2, and 3, and none were malignant. The remaining 6 nodules were examined by physician 4, and one of them (16.7%, 1/6) was malignant (Fig. 4). Overall, the risk of malignancy was 4.3% (1/23) in nodules classified as TI-RADS 3, which is within the range of previous results (0%-9.6%) [14,16,17,26].
Of the 19 thyroid nodules assigned to TI-RADS 5, 2, 13, and 4 were examined by physicians 2, 3, and 4, respectively, and of these, 2 (100%, 2/2), 9 (69.2%, 9/13), and 2 (50%, 2/4) were malignant. Overall, the risk of malignancy was 68.4% (13/19) in nodules classified as TI-RADS 5 (Table 3). Table 4 shows the percentage of malignancy according to the total risk score, based on a previous study [18]. The risk of malignancy tended to increase as the risk score increased in our data, which was again comparable to a previous study (Fig. 5B) [18]. The risk of malignancy in nodules with a total score of 14 was 80.0% (4/5), comparable to that reported in a previous study (90%) [18]. In this study, the risk of malignancy was 2.2% among nodules with a total risk score of zero, whereas previous studies reported a broad range of malignancy rates (4.3% to 31.1%) in nodules corresponding to a risk score of zero [14-16,18]. The AUCs of the two models for predicting thyroid malignancy, derived from the number of suspicious US features and from the total risk score, were 0.827 and 0.833, respectively. These values were similar to those reported by Kwak et al. [18] (0.872) and Hambly et al. [16] (0.794-0.904).
TI-RADS may improve management and the cost-effectiveness of follow-up by unifying the language and codes used among radiologists. In addition, the two approaches tested in our study are simple to apply in clinical practice, because they require only counting suspicious US features or summing integer risk scores.
Our study has several potential limitations. First, we did not determine interobserver variability or intraobserver reproducibility among the 4 radiologists, since each radiologist prospectively recorded US features independently. Nevertheless, our results reproduced the findings of the two previously proposed models. Second, for some thyroid nodules with benign cytologic results, cytology served as the reference standard, and false-negative cytologic results may have existed among these nodules without surgical confirmation [27].
In conclusion, both the number of suspicious US features and the total risk score are applicable and show comparable results in the risk stratification of thyroid nodules by radiologists less experienced in thyroid imaging.
Notes
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This study was supported in part by the Research Fund of the Korean Society of Ultrasound in Medicine.