Reproducibility of liver stiffness measurements made with two different 2-dimensional shear wave elastography systems using the comb-push technique
Abstract
Purpose
The purpose of this study was to retrospectively compare the technical success and reliability of measurements made with two 2-dimensional (2D) shear wave elastography (SWE) systems from the same manufacturer, both using the comb-push technique, and to assess the intersystem reproducibility of the resultant liver stiffness (LS) measurements.
Methods
Ninety-four patients with suspected chronic liver diseases were included in this retrospective study. LS measurements were obtained using two 2D-SWE systems (LOGIQ E9 and LOGIQ S8) from the same manufacturer, with transient elastography (TE) serving as the reference standard, on the same day. The technical success rates and reliability of the measurements of the two 2D-SWE systems were compared. LS values measured using the two 2D-SWE systems and TE were correlated using Spearman correlation coefficients and 95% Bland-Altman limits of agreement. Thereafter, Bland-Altman limits of agreement and intraclass correlation coefficients (ICCs) were used to analyze the intersystem reproducibility of LS measurements.
Results
The two 2D-SWE systems showed similar technical success rates (98.9% for both) and reliability of LS measurements (92.3% for the LOGIQ E9, 91.2% for the LOGIQ S8; P=0.185). Despite the excellent correlation (ICC=0.92), the mean LS measurements obtained by the two 2D-SWE systems were significantly different (LOGIQ E9, 6.57±2.33 kPa; LOGIQ S8, 6.90±6.64 kPa; P=0.018).
Conclusion
Significant intersystem variability was observed in the LS measurements made using the two 2D-SWE systems. Therefore, even 2D-SWE systems from the same manufacturer should not be used interchangeably in longitudinal follow-up.
Introduction
Chronic liver disease (CLD) is a well-known worldwide problem, and common etiologies of CLD include viral hepatitis (hepatitis B or C), alcohol abuse, and non-alcoholic fatty liver disease. Regardless of the cause of CLD, it is important to determine the stage of liver fibrosis in order to identify the most suitable treatment for preventing progression to liver cirrhosis [1-3]. So far, liver biopsy has been the gold standard for evaluating liver fibrosis. However, it has several limitations, such as sampling variability, complications due to invasiveness, and intraobserver or interobserver variability in fibrosis staging [4-8].
As an alternative approach to liver biopsy for hepatic fibrosis staging, various ultrasound-based shear wave elastography (SWE) techniques have been developed, including transient elastography (TE) (FibroScan, Echosens, Paris, France) using a mechanical push, point SWE (p-SWE), and 2-dimensional SWE (2D-SWE) [9-12]. Currently, many p-SWE and 2D-SWE techniques are commercially available, and they share the principle of using an ultrasound-induced acoustic radiation force impulse to generate shear waves [13,14]. In clinical practice, both the diagnostic performance of different SWE systems for fibrosis staging and their reproducibility are important issues for detecting severe fibrosis and longitudinal monitoring of the therapeutic response [15,16]. Several previous studies on the reproducibility of various SWE techniques have recommended that liver stiffness (LS) measurements obtained using different techniques should not be used interchangeably [11,17,18]. However, intersystem reproducibility using the same SWE technique has not yet been well evaluated.
The purpose of this study was to evaluate the intersystem reproducibility of LS measurements with 2D-SWE using the comb-push technique for the evaluation of hepatic fibrosis, with TE as the reference method.
Materials and Methods
This study was performed retrospectively with approval from our institutional review board, and the requirement for informed consent was waived.
Study Population
Between May and September 2017, 94 patients (39 men, 55 women; mean age, 57.5±12.1 years; age range, 25-80 years; mean body mass index [BMI], 24.9±3.59 kg/m2; BMI range, 18.5-24.9 kg/m2) with suspected CLD who were referred to the Department of Radiology for fibrosis screening were included in this study (Fig. 1). No patients had ascites. Patients younger than 18 years and patients who had undergone right hepatectomy were not included in this study. During the study period, a LOGIQ S8 ultrasound (US) platform (GE Healthcare, Wauwatosa, WI, USA) equipped with TE and a LOGIQ E9 US platform using 2D-SWE were tested to analyze the intersystem reproducibility of LS measurements. After the use of two platforms during a routine clinical liver US examination had been explained, 2D-SWE exams and TE were performed with verbal agreement from the patients.
The etiologies of CLD in the study participants were chronic hepatitis B (n=62, 66.0%); chronic hepatitis C (n=10, 10.6%); chronic non-viral hepatitis, such as non-alcoholic steatohepatitis, alcoholic liver disease, and primary biliary cirrhosis (n=18, 19.1%); and others (n=4, 4.3%) (Table 1).
In addition, we classified the degree of liver fibrosis in patients based on LS values measured by TE, since TE is the best-validated method for evaluating liver fibrosis [19-21]. We used the LS cutoff values for TE proposed in the latest meta-analysis [20]: 7.9 kPa for moderate fibrosis (F≥2), 8.8 kPa for severe fibrosis (F≥3), and 11.7 kPa for liver cirrhosis (F=4). In the 64 patients who had reliable LS measurements from all examinations, the most common fibrosis stage was mild (F1) or no liver fibrosis (F0) (51 of 64, 79.7%), followed by liver cirrhosis (F4) (6 of 64, 9.4%), moderate (F2) liver fibrosis (4 of 64, 6.2%), and severe fibrosis (F3) (3 of 64, 4.7%) (Table 1).
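The TE-based staging rule described above can be sketched as a simple threshold lookup. This is only an illustration: the function name `te_fibrosis_stage` is hypothetical, and treating a value exactly equal to a cutoff as belonging to the higher stage is an assumption, since the text does not state how boundary values were handled.

```python
# Cutoffs (kPa) as quoted in the text for TE, ordered from highest to lowest.
# Assumption: a value equal to a cutoff is assigned to the higher stage.
TE_CUTOFFS_KPA = [(11.7, "F4"), (8.8, "F3"), (7.9, "F2")]


def te_fibrosis_stage(ls_kpa: float) -> str:
    """Map a TE liver stiffness value (kPa) to a fibrosis stage label."""
    for cutoff, stage in TE_CUTOFFS_KPA:
        if ls_kpa >= cutoff:
            return stage
    # Below the lowest cutoff: no or mild fibrosis.
    return "F0-F1"


if __name__ == "__main__":
    # Hypothetical LS values, for illustration only.
    for value in (6.5, 8.0, 9.0, 12.0):
        print(value, te_fibrosis_stage(value))
```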
SWE Examinations
All patients underwent US examinations after fasting for more than 6 hours. All examinations were performed with LOGIQ E9 and LOGIQ S8 US systems (GE Healthcare), and both systems were equipped with convex broadband probes (C1-6; 1.5-6 MHz; central frequency, 4 MHz). All US examinations were performed by one radiologist, who had 6 years of experience in US-based elastography (including 2D-SWE) and TE (>200 examinations), as well as 20 years of experience with abdominal US examinations. At first, conventional B-mode sonography was performed using a 4-MHz convex probe. LS measurements were then made using the same probe with an intercostal approach while patients maintained a supine position or right anterior oblique position, with the right arm in maximum abduction during the SWE examinations. For 2D-SWE using both systems, a 1×1-cm2 region of interest was placed in the right anterior segment of the liver, avoiding large vessels and areas with artifacts, 1.5-2.0 cm away from the Glisson capsule, and less than 6 cm deep from the capsule (Fig. 2). Finally, LS measurements were made with TE (FibroScan, Echosens) using the LOGIQ S8 platform on each patient. All TE examinations were performed by the same operator who performed the 2D-SWE examinations. The operator, who had previously performed more than 200 TE examinations, performed all FibroScan examinations according to the manufacturer's recommendations. The tip of the transducer probe (M+ probe or XL+ probe when prompted by the automatic probe selection tool) was placed on the skin between the ribs over the right lobe of the liver, and a valid LS measurement was obtained under the guidance of M-mode sonographic images [12,13,22]. During LS measurement, patients were instructed to hold their breath while avoiding deep inspiration or expiration. At least 10 valid measurements were made for every patient using each SWE method.
To reduce recall bias, the summary of the serial measurements of each system was not made available to the operator until all examinations were completed [18].
Definition of Technical Failure and Reliable (or Unreliable) Measurements
Technical failure of SWE was defined as the failure to acquire 10 valid measurements after at least 15 trials [18]. If the ratio of the interquartile range to the median LS was greater than 30%, the result was regarded as an unreliable measurement [23].
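As a minimal sketch of the two definitions above, the following code labels one examination from its list of valid LS values and the number of acquisition attempts. The function names and the "incomplete" label are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; the text does not specify how the interquartile range was computed.

```python
import statistics


def iqr_to_median_ratio(values):
    """Return IQR / median for a list of LS values (kPa)."""
    # Assumption: quartiles use the 'inclusive' interpolation method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / statistics.median(values)


def classify_examination(valid_values, n_trials):
    """Label an examination using the criteria quoted in the text."""
    if len(valid_values) < 10:
        # Failure is declared only after at least 15 attempts;
        # 'incomplete' is a hypothetical label for fewer attempts.
        return "technical failure" if n_trials >= 15 else "incomplete"
    if iqr_to_median_ratio(valid_values) > 0.30:
        return "unreliable"
    return "reliable"


if __name__ == "__main__":
    # Hypothetical measurement series, for illustration only.
    print(classify_examination([6.1, 6.3, 5.9, 6.0, 6.2, 6.4, 6.0, 6.1, 6.2, 6.3], 10))
    print(classify_examination([4, 4, 4, 4, 6, 6, 10, 10, 10, 10], 12))
```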
Statistical Analysis
Continuous data are presented as means and ranges, and categorical data as counts and percentages. The Friedman test was used to compare the rates of technical failure and unreliable measurements between the two 2D-SWE imaging systems and TE. For comparison of the fibrosis stages between patients with reliable and unreliable LS measurements, the Mann-Whitney test was used. The Wilcoxon signed-rank test was used to compare the LS measurements obtained by the two 2D-SWE imaging systems. The Spearman correlation coefficient and 2-way mixed model intraclass correlation coefficients (ICCs) with 95% confidence intervals (CIs) were obtained to evaluate the agreement between the two 2D-SWE systems and TE. Correlation coefficients were classified using the following definitions: 0-0.19, very weak; 0.20-0.39, weak; 0.40-0.59, moderate; 0.60-0.79, strong; and 0.80-1.0, very strong [24]. Agreement based on ICCs was classified using the following definitions: 0-0.39, poor; 0.40-0.59, fair; 0.60-0.74, good; and 0.75-1.0, excellent [25]. Bland-Altman analysis was used to evaluate method-related variations using the mean value between the two 2D-SWE systems and TE. The 95% limits of agreement, as well as the coefficient of reproducibility (CR=1.96×standard deviation of bias), were determined to evaluate the intersystem variability in the LS measurements. The coefficient of variation (CV) of the LS values between the two 2D-SWE systems was also calculated. All statistical analyses were performed using commercially available software programs (SPSS version 23, IBM Corp., Armonk, NY, USA; and MedCalc version 16, MedCalc Software, Mariakerke, Belgium). P-values less than 0.05 were considered to indicate statistical significance.
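The Bland-Altman quantities named above can be illustrated with a short sketch. It assumes that the "standard deviation of bias" means the sample standard deviation of the paired differences between two systems, and that the 95% limits of agreement are the mean difference ± CR. The function name and the input values are hypothetical and are not study data.

```python
import statistics


def bland_altman(system_a, system_b):
    """Bias, SD of differences, CR, and 95% limits of agreement for paired LS values."""
    diffs = [a - b for a, b in zip(system_a, system_b)]
    bias = statistics.mean(diffs)
    # Assumption: sample (n - 1) standard deviation of the paired differences.
    sd = statistics.stdev(diffs)
    cr = 1.96 * sd  # coefficient of reproducibility as defined in the text
    return {"bias": bias, "sd": sd, "cr": cr, "loa": (bias - cr, bias + cr)}


if __name__ == "__main__":
    # Hypothetical paired LS values (kPa) from two systems, for illustration only.
    system_a = [5.0, 6.0, 7.0, 8.0]
    system_b = [5.5, 6.0, 7.5, 9.0]
    result = bland_altman(system_a, system_b)
    print(result)
```

A nonzero bias with narrow limits of agreement is the pattern in which two systems can correlate strongly yet still differ systematically in their mean values.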
Results
All 94 patients were included in the assessment of the technical success rate and reliability of the measurements of the two 2D-SWE systems and TE. However, the reproducibility and performance of the two 2D-SWE systems were only evaluated in the 64 patients who had reliable LS values from both 2D-SWE systems and TE. A total of 30 patients were excluded due to technical failure (n=3) or an unreliable result from any technique (n=27) (Fig. 1).
Technical Failure and Unreliable Measurement Rates
Technical failure of LS measurements
Among the 94 patients, a set of 10 LS measurements was made successfully in 91 patients (91 of 94, 97%) by both 2D-SWE systems and TE (Fig. 1). One case of LS measurement failure occurred with each of the LOGIQ E9, LOGIQ S8, and TE in different patients (1 of 94; 1.1% for each). There was no significant difference in the technical success rate between the two 2D-SWE systems and TE (P>0.990). Regarding the elastography method, the 2D-SWE systems and TE had the same failure rate (1 of 94, 1.1%).
Unreliable LS measurements
Among patients with technically successful LS measurements, unreliable LS measurements were obtained in 8.6% (8 of 93) of the patients using the LOGIQ E9, 9.7% (9 of 93) using the LOGIQ S8, and 16.1% (15 of 93) using TE. Two patients failed to demonstrate a reliable measurement with both the LOGIQ E9 and LOGIQ S8 and one patient failed to show a reliable measurement with both the LOGIQ E9 and TE. There was no significant difference in the unreliable measurement rate among the two 2D-SWE systems and TE (P>0.990). Among the 76 patients with reliable LS measurements by TE, there was also no significant difference in fibrosis stage between patients with reliable LS measurements and those with unreliable LS measurements (P=0.475).
Overall Correlations and Intersystem Reproducibility of LS Values across 2D-SWE Systems
In the 64 patients who had reliable LS measurements from the two 2D-SWE systems and TE, the mean LS values obtained by the two 2D-SWE systems were significantly different (LOGIQ E9, 6.57±2.33 kPa; LOGIQ S8, 6.90±6.64 kPa; P=0.018) (Table 2). In a subgroup analysis according to fibrosis grade by TE, a significant difference was found in the mean LS values between the LOGIQ E9 and LOGIQ S8 in F1 patients (P=0.037), while no significant difference was found in the mean LS values in ≥F2 patients (P=0.131-0.223) (Table 3).
The LS values for the two 2D-SWE systems demonstrated a very strong positive correlation (r=0.86, P<0.001). The correlations of the LS values between the two 2D-SWE systems and TE were also strong (Fig. 3). The ICC for the LS measurements between the LOGIQ E9 and LOGIQ S8 was 0.92, indicating excellent agreement (95% CI, 0.85 to 0.95). The CV between the two 2D-SWE systems was 15.9% (Table 4).
The Bland-Altman plots for reproducibility between TE and the two 2D-SWE systems showed a tendency for larger LS values to be obtained by the two 2D-SWE systems than by TE in patients with a lower fibrosis grade. However, in patients with a higher fibrosis grade, smaller LS values were obtained by the two 2D-SWE systems than by TE (Fig. 4).
Performance of the Three SWE Techniques in Detecting Significant Fibrosis (F≥2)
Using the LS values of TE as the reference standard, the LOGIQ E9 showed an area under the receiver operating characteristic curve (AUROC) of 0.92 (95% CI, 0.83 to 0.97) using a cut-off value of >8.33 kPa for the diagnosis of significant fibrosis, with a positive predictive value (PPV) of 61.5% and a negative predictive value (NPV) of 98.0%. The LOGIQ S8 demonstrated an AUROC of 0.94 (95% CI, 0.85 to 0.98) using a cut-off value of >7.8 kPa for the diagnosis of significant fibrosis, with a PPV of 53.3% and an NPV of 97.9%. In the pairwise receiver operating characteristic curve comparison, the AUROCs of the two 2D-SWE techniques for the prediction of significant fibrosis were not significantly different (P=0.436) (Fig. 5).
Discussion
In this study, we tested the intersystem agreement of the comb-push 2D-SWE technique using two clinical US platforms from the same manufacturer. In recent years, various SWE techniques have been gaining wide acceptance among clinicians as noninvasive tools for the detection and precise staging of fibrosis and cirrhosis, which are very important for the timely initiation of appropriate therapeutic regimens [14]. Furthermore, the SWE technique could be widely used for monitoring liver fibrosis after the application of antiviral agents [26,27]. Therefore, for the longitudinal assessment or monitoring of liver fibrosis, the interchangeability of LS measurements made using different SWE techniques could be of significant value. In daily practice, however, more than one SWE method or US platform can be used for the longitudinal assessment of liver fibrosis during the management of CLD over time [18]. Therefore, an evaluation of the intersystem agreement of LS values would make a major contribution to resolving this issue. Importantly, we found that there was a significant difference in mean LS values measured by the two US platforms (P=0.018) despite good to excellent correlations. Based on our study results, absolute LS measurements from multiple systems, even when made with the same technique using systems from the same manufacturer, should not be used interchangeably during the follow-up or monitoring of liver fibrosis.
The two SWE systems use the same SWE software and beamforming technique. Therefore, the difference in mean LS values measured by the same 2D-SWE technique using two different platforms could be attributed to machine-specific factors [28]. Indeed, according to an interlaboratory study comparing shear wave velocities obtained with four different machines (FibroScan, Philips iU22, ACUSON S2000, and Aixplorer) using elastic phantoms, conducted by the Ultrasound Shear Wave Speed technical committee of the Quantitative Imaging Biomarker Alliance of the Radiological Society of North America, there was a statistically significant difference in shear wave speed estimates among systems [13,28]. The differences in LS measurements across different US systems could be related to variations across manufacturers in aspects of the hardware, such as the frequency of the transducer and beamforming technology, or the algorithm used to measure shear wave velocity [28]. Moreover, several previous studies also reported that a significant difference in LS measurements or shear wave velocity measurements was consistently observed across different SWE techniques [11,18,29,30].
In addition, the two 2D-SWE systems showed excellent technical success and reliable LS measurement rates, similar to those of TE. Our study results correspond well with the results of the previous study by Bende et al. [31], who demonstrated a similar rate of reliable LS measurements for TE and 2D-SWE (LOGIQ E9). We also found that for the two 2D-SWE systems, the AUROCs with the optimal cut-off values (7.8-8.3 kPa) were not significantly different for the detection of significant fibrosis (P=0.436), when using the cut-off values of TE as the reference standard.
Our study has several limitations. First, the study population was relatively small, with an uneven distribution of liver fibrosis grades. However, we believe that the sample size was large enough to estimate technical success, the reliability of LS measurements, and intersystem agreement in LS measurements. Second, there was no histological diagnosis of fibrosis staging in the patients. For the reference method, we used TE, which has been well validated in previous studies. Furthermore, the primary goal of our study was to evaluate intersystem variability in the measurement of LS. Third, a single radiologist performed all examinations with the two 2D-SWE systems and TE. However, we did not allow the operator to see the summary of the serial measurements of each system until all measurements made using the three systems were completed, and we think that recall bias was thereby reduced to a reasonable level. Finally, our study was performed retrospectively. Further large prospective studies with histologic confirmation are needed.
In conclusion, the two 2D-SWE systems showed a similar technical success rate and reliability of LS measurements, but there was significant intersystem variability. Therefore, even 2D-SWE systems from the same manufacturer should not be used interchangeably in longitudinal follow-up.
Notes
No potential conflict of interest relevant to this article was reported.