Unsupervised speckle noise reduction technique for clinical ultrasound imaging
Abstract
Purpose
Deep learning–based image enhancement has significant potential in the field of ultrasound image processing, as it can accurately model complicated nonlinear artifacts and noise, such as ultrasonic speckle patterns. However, training deep learning networks to acquire reference images that are clean and free of noise presents significant challenges. This study introduces an unsupervised deep learning framework, termed speckle-to-speckle (S2S), designed for speckle and noise suppression. This framework can complete its training without the need for clean (speckle-free) reference images.
Methods
The proposed network leverages statistical reasoning for the mutual training of two in vivo images, each with distinct speckle patterns and noise. It then infers speckle- and noise-free images without needing clean reference images. This approach significantly reduces the time, cost, and effort experts need to invest in annotating reference images manually.
Results
The experimental results demonstrated that the proposed approach outperformed existing techniques in terms of the signal-to-noise ratio, contrast-to-noise ratio, structural similarity index, edge preservation index, and processing time (up to 86 times faster). It also performed well on images obtained from ultrasound scanners other than the one used in this work.
Conclusion
S2S demonstrates the potential of employing an unsupervised learning-based technique in medical imaging applications, where acquiring a ground truth reference is challenging.
Introduction
Controlling speckle noise presents a significant challenge in ultrasound imaging. Typically, an ultrasound image is reconstructed by acquiring backscattered signals that result from the interaction between the tissue structure and the emitted acoustic wave [1]. During this process, Rayleigh scattering occurs when objects are smaller than the wavelength of the transmitted wave. This scattering causes the interference of randomly scattered coherent echoes, both constructive and destructive, which form unique speckle patterns. These patterns can obscure the detection of small or low-contrast lesions, thus complicating the diagnostic process [2]. Therefore, suppressing these speckle patterns is essential in most clinical applications to ensure an accurate diagnosis [2].
Conventional speckle reduction techniques include adaptive speckle filtering methods, such as moving-window filters, which are designed to suppress the multiplicative non-Gaussian characteristics of speckle patterns [3]. Filters developed by Lee and Kuan take into account local image statistics, whereas Frost's filter attempts to estimate noise-free images by convolving them with a spatially variable kernel [4-6]. However, these methods can become unstable when certain parameter combinations are used or in specific regions of the image [7].
Other methods include speckle-reducing anisotropic diffusion (SRAD) and optimized Bayesian non-local means filter (OBNLM). Conventional anisotropic diffusion mitigates speckle by modulating the diffusion rate according to the brightness gradient between each pixel and its neighbors. However, it is influenced by the size and shape of the filter window [7,8]. SRAD improves upon this technique by minimizing filter bias and enhancing performance, while also preserving edges [9]. OBNLM, an advanced version, adapts the non-local means filter specifically for speckle patterns, effectively reducing noise through averaging based on image similarity [10,11]. Specifically, OBNLM maintains edge definition and is more effective at speckle reduction. Nonetheless, this method necessitates empirical parameter selection, and its complex algorithm poses challenges for real-time implementation.
Recently, deep learning–based denoising techniques have effectively suppressed complex nonlinear noise. Zhang et al. [12] demonstrated that a convolutional neural network (CNN) architecture, based on a deep residual network, could learn to reduce speckle using conventional filtering methods. Lan and Zhang [13] enhanced image features and suppressed speckle noise in real-time using a mixed-attention based residual UNet. However, these supervised learning methods require ground truth data for training, and the quality of the training data significantly influences the results. Acquiring ground truths for medical images is generally challenging, and the difficulty in obtaining realistic ultrasound images without speckle patterns significantly hinders the application of artificial intelligence [14]. Therefore, medical image learning methods such as positron emission tomography image denoising [15], medical image translation using generative adversarial networks (e.g., computed tomography-positron emission tomography) [16], and anomaly detection in brain magnetic resonance imaging [17] are transitioning from supervised to unsupervised learning, which does not require reference images. Noise2Noise, introduced by Lehtinen et al. in 2018 [18], demonstrated that noise could be removed from images by training solely with corrupted (noisy) images, without the need for clean (ground truth) images, achieving results comparable to those of conventional supervised learning. Yin et al. [19] employed the Noise2Noise (N2N) technique by adopting a method that acquires independent speckle instances without changing coherent imaging scanning settings. Gobl et al. [20] generated independent speckle images using ultrasonic simulation data and applied this approach to Noise2Noise.
This study introduces an unsupervised learning-based framework named the S2S (speckle-to-speckle) network, designed to suppress speckle noise exclusively using in vivo data, without the need for reference data. The in vivo data are generated by leveraging the varying characteristics of speckle patterns, which change according to the steering angle of the ultrasound device during the image acquisition process. By applying the N2N concept, this method effectively removes speckle using only images that contain speckle. Both qualitative comparisons and quantitative evaluations with traditional speckle reduction methods demonstrate that this approach not only removes speckle efficiently but also significantly speeds up the speckle removal process. Furthermore, the present study assessed the performance of this method and aimed to establish its practical feasibility by using an extended (open-source) in vivo dataset, which includes images of various body regions randomly acquired with different scanners.
Materials and Methods
S2S: Unsupervised Learning–Based Framework
Background
Deep neural networks typically solve denoising problems by mapping corrupted signals to clean signals rather than using a priori statistical modeling. A conventional denoising network uses a regression model (e.g., a CNN) to learn from training pairs (x̂_i, y_i) of a corrupted input x̂_i and a clean target y_i:

argmin_θ Σ_i L( f_θ(x̂_i), y_i ),

where f_θ is the network function with parameters θ and L is the loss function. With the L2 loss, this function learns to minimize the arithmetic mean of the squared difference between the predicted value z = f_θ(x̂) and the target:

argmin_z E_y{ (z − y)² }.

The input and the target are not mapped 1:1 in the learning process; accordingly, various target values exist for the same input. Therefore, the L2 loss enables the network to learn the result of averaging all possible cases: the optimal prediction value is z = E_y{y}. In this case, y does not need to be a clean target. Even if each y is a random variable, the optimal z is unchanged as long as the expectation of the noisy targets matches the clean target, as shown in Fig. 1B. Therefore,

argmin_θ Σ_i L( f_θ(x̂_i), ŷ_i ),

where ŷ_i is a second corrupted observation of the same underlying signal as x̂_i.
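The core claim above, that an L2-trained network converges to the mean of its targets, so noisy targets with the correct expectation suffice, can be checked numerically. The sketch below (synthetic values, not the in vivo pipeline) grid-searches for the constant prediction z that minimizes the mean squared error against purely noisy targets:

```python
import numpy as np

rng = np.random.default_rng(0)

clean = 5.0                                            # underlying clean signal value
targets = clean + rng.normal(0.0, 1.0, size=100_000)   # noisy targets, zero-mean noise

# The L2 loss E{(z - y)^2} is minimized at z = E_y{y}; because the noise has
# zero mean, this expectation equals the clean value, so no clean target is needed.
z_grid = np.linspace(3.0, 7.0, 401)
losses = np.array([np.mean((z - targets) ** 2) for z in z_grid])
z_opt = float(z_grid[np.argmin(losses)])

print(z_opt)  # close to 5.0, the clean value, despite training only on noisy targets
```

The same argument carries over from a single scalar to per-pixel predictions, which is the basis of the Noise2Noise training scheme used here.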
S2S network design
The network architecture is based on the U-Net architecture, which consists of encoding and decoding stages. The encoding stage includes 3×3 convolution, leaky ReLU functions, and 2×2 max-pooling down-sampling functions. Initially, simple features are extracted by the front layer, followed by the extraction of high-semantic features in a subsequent layer. During this process, the network identifies the semantic attributes of an object. The decoding stage also utilizes 3×3 convolution, leaky ReLU functions, and 2×2 up-sampling functions. This up-sampling process helps in reconstructing features. Features that were previously extracted and then up-sampled are combined through channel-wise concatenation. By concatenating the input with the final stage of the decoder, the model is enabled to achieve high-quality image restoration. Therefore, features representing both global and local information are captured, as illustrated in Fig. 2.
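The exact layer widths are not listed in the text, but the shape arithmetic of such an encoder-decoder can be sketched in plain Python (hypothetical channel counts; 2×2 max-pooling halves the spatial size at each level, 2×2 up-sampling restores it, and channel-wise concatenation adds the saved encoder channels back):

```python
def unet_shape_flow(size=512, base_ch=48, depth=6):
    """Track (channels, spatial size) through hypothetical encoder/decoder stages."""
    ch, h = base_ch, size
    skips = []
    for _ in range(depth):       # encoder: save skip features, then 2x2 max-pool
        skips.append((ch, h))
        h //= 2                  # pooling halves the spatial size
        ch *= 2                  # assumption: channels double per level
    for _ in range(depth):       # decoder: 2x2 up-sample, concat matching skip
        h *= 2
        skip_ch, skip_h = skips.pop()
        assert skip_h == h       # skip and decoder features must align spatially
        ch = ch // 2 + skip_ch   # a conv halves channels; concatenation adds the skip's
    return ch, h

ch, h = unet_shape_flow()
print(h)  # 512: the symmetric decoder fully restores the input spatial size
```

This bookkeeping also shows why the input size must be divisible by 2^depth for the skip connections to align, a practical constraint when cropping training patches.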
Training and inference
Fig. 3 shows the learning and inference processes of the proposed network. To train the network using the proposed method, a dataset consisting of pairs of noisy images must be created, necessitating a variety of speckle pattern images. However, speckle possesses a unique characteristic in that it does not display typical noise patterns; it consistently follows the same pattern under identical acquisition conditions. Therefore, acquiring a range of speckle patterns poses a significant challenge.
Recently, researchers have attempted to address this challenge by artificially generating data through simulations that only vary in speckle [20]. However, since simulation data is not acquired in the same environment as real medical data, models trained on this simulated data often fail to produce consistent results when applied to in vivo data, which is influenced by various environmental factors [20].
Speckle can be generated by altering the ultrasound frequency or by adjusting the incident angle of the transmitted wave. While both the speckle and the imaging region change significantly with frequency, only the speckle pattern is affected by changes in the transmission angle. Fig. 4 shows the differences in speckle patterns at various transmission angles. Fig. 4A is an image of the carotid artery obtained at a 0° transmission angle. The red box represents the region of interest (ROI) for detailed observation of the speckle patterns. Fig. 4B illustrates the speckle patterns within the red box at transmission angles of -4°, 0°, and +4°, visually confirming the impact of transmission angle on the speckle pattern. Furthermore, Fig. 4C displays the differences between the speckle patterns at these angles as residual images, effectively highlighting the variations in the pattern due to changes in angle. The proposed method was designed to gather data through these phenomena. Therefore, the training dataset contained images acquired at multiple steering angles, and a single image (θ=0°) was used to infer the speckle-free state.
Experimental Setup
Data acquisition
Ultrasound data for learning and inference were collected by imaging the common carotid artery in 12 healthy volunteers. This study was approved by the Institutional Review Board of the Daegu Gyeongbuk Institute of Science & Technology (study no. DGISTIRB-202206-002) and followed the guidelines of the Ministry of Food and Drug Safety (Korea). The carotid artery was imaged in two orthogonal planes (longitudinal and transverse), and images were acquired at the same location over nine frames at each of nine steering angles (range, -4° to 4°). The dataset consisted of a total of 7,776 pairs (angle pairs×frames×volunteers=72×9×12=7,776). The acquisition data were organized as follows: for angle θ1, eight pairs were generated with angles θ2 to θ9, yielding a total of 72 ordered angle pairs across the nine angles. Of the total data, 80% (6,222 sets) were used for training; the remaining data were randomly assigned to validation and testing, with each receiving 10% of the total (777 sets each). Open-source data were then used to verify the generalizability of the proposed model. These data consisted of images of different structures (e.g., abdomen, breast, and thyroid) that were acquired using different medical ultrasound systems and collected from Kaggle [21,22]. Beamformed I/Q data were used for learning. The image size was 2,752×128 pixels, and random flipping and random cropping were used for data augmentation. The mean image size of the open-source data was 390×495 pixels.
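The dataset arithmetic above can be reproduced directly: each of the nine steering angles is paired with the eight others, giving 72 ordered pairs per frame (a bookkeeping sketch, not the actual data loader):

```python
from itertools import permutations

angles = [-4, -3, -2, -1, 0, 1, 2, 3, 4]     # nine steering angles (degrees)
frames_per_volunteer = 9
volunteers = 12

# Ordered pairs (theta_i, theta_j) with i != j: 9 * 8 = 72 angle pairs
angle_pairs = list(permutations(angles, 2))
print(len(angle_pairs))                      # 72

total_pairs = len(angle_pairs) * frames_per_volunteer * volunteers
print(total_pairs)                           # 7776, matching the reported dataset size
```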
The ultrasound data were acquired using a 256-channel programmable research scanner (Vantage 256, Verasonics, Inc., Redmond, WA, USA) and a 192-element linear probe with an 8.9 MHz central frequency and a 0.2 mm pitch size (L12-3, Verasonics, Inc.). A coherent compound transmission scheme was used, with a ±4° steering angle range, nine angles, and a 300 Hz pulse repetition frequency [23].
Implementation details
The PyTorch 1.9.0 Python library was utilized to implement the S2S network, which was trained on a four-way graphics processing unit server (GeForce RTX 3090, NVIDIA Corp., Santa Clara, CA, USA). The Adam optimization algorithm was employed [24], with a learning rate of 1×10⁻⁴, a batch size of 65, and 100 training epochs, and the L2 loss function was applied to minimize errors. Following the training phase, all tasks were executed and assessed on a desktop computer equipped with a 12-core central processing unit (Ryzen 9 3900X, 3.79 GHz, Advanced Micro Devices, Inc., Santa Clara, CA, USA).
To evaluate its performance, the proposed S2S method was compared with other speckle reduction methods (i.e., SRAD and OBNLM). The initial filtering parameters for these two methods were taken from Finn et al. [25] and adjusted as appropriate for the input images, as shown in Table 1. For all experiments, only the number of iterations was modified, based on the input image.
Evaluation metrics
The performance of several speckle reduction algorithms was compared using four performance metrics: the signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), mean structural similarity index measure (MSSIM), and edge preservation index (EPI). The SNR and CNR describe improvements in image quality, whereas the MSSIM and EPI indicate the preservation of edge information following speckle reduction.
Fig. 5 shows the original images utilized in this study. The red and blue boxes highlight the signal and background regions, respectively, with the edge boundaries marking the EPI target. Notably, the regions within the red boxes (Fig. 5A, B) correspond to the carotid artery. Despite this, these areas are treated as background regions due to their significantly lower signal intensity relative to the surrounding tissues. The MSSIM was computed across the entire image. The performance metrics are defined as follows.
The SNR, expressed in decibels (dB), is the ratio of the desired signal to the noise; a higher value indicates less noise:

SNR = 20 log10( μsig / σnoise ),

where μsig and σnoise are the mean of the signal region of interest and the standard deviation of the background (noise) region of interest, respectively.
The CNR is similar to the SNR, but the mean of the background is subtracted from the mean of the signal before the ratio is calculated, which is useful for images with significant bias:

CNR = 20 log10( |μsig − μnoise| / σnoise ).
The MSSIM represents structural similarity as a value between 0 and 1, with 1 indicating identical structure:

SSIM(x, y) = [ (2 μx μy + C1)(2 σxy + C2) ] / [ (μx² + μy² + C1)(σx² + σy² + C2) ],

where μx and μy, and σx and σy, are the means and standard deviations, respectively, in the region of interest, and σxy and Ci are the covariance and constants, respectively [26]. The MSSIM is the mean of the SSIM computed over local windows of the image.
Finally, the EPI reflects the extent to which edge details are maintained after denoising [27]:

EPI = Γ( Δs − μΔs, Δŝ − μΔŝ ) / sqrt( Γ( Δs − μΔs, Δs − μΔs ) · Γ( Δŝ − μΔŝ, Δŝ − μΔŝ ) ),

where the correlation operator Γ is defined as

Γ(s1, s2) = Σ_(i,j)∈ROI s1(i, j) · s2(i, j),

s and ŝ denote the reference and estimated (denoised) images, μΔs denotes the mean of Δs, and Δs(i, j) is a high-pass-filtered version of s(i, j), obtained through a 3×3 pixel approximation of the Laplacian operator. The EPI is based on correlation; the nearer it is to 1, the more similar the estimated image is to the reference image.
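As an illustration of these definitions, the metrics can be sketched in NumPy (synthetic arrays; the ROI selection and the CNR denominator convention are assumptions for illustration, not the authors' code):

```python
import numpy as np

def snr_db(sig, noise):
    # SNR = 20*log10(mu_sig / sigma_noise), per the definition above
    return 20 * np.log10(sig.mean() / noise.std())

def cnr_db(sig, bg):
    # CNR subtracts the background mean before taking the ratio; the exact
    # denominator convention varies between papers (assumed here: sigma_bg)
    return 20 * np.log10(abs(sig.mean() - bg.mean()) / bg.std())

def laplacian(img):
    # 3x3 pixel approximation of the Laplacian operator (high-pass filter)
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    p = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def epi(ref, est):
    # correlation of mean-removed, high-pass-filtered images
    ds, de = laplacian(ref), laplacian(est)
    ds, de = ds - ds.mean(), de - de.mean()
    return float((ds * de).sum() / np.sqrt((ds * ds).sum() * (de * de).sum()))

rng = np.random.default_rng(1)
img = rng.random((32, 32))
print(epi(img, img))  # 1.0: identical images are perfectly edge-correlated
```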
Statistical Analysis
The significance of differences between the proposed speckle reduction technique (S2S) and existing methods (SRAD and OBNLM) was assessed using paired t-tests for the SNR, CNR, MSSIM, and EPI metrics. Data for these comparisons were sourced from open ultrasound datasets on Kaggle, comprising 30 breast and 30 thyroid images. The statistical results, calculated from the raw data, are presented as mean±standard error. For each test, the null hypothesis (H0) posited no significant difference between the input data and the speckle-reduced data. A P-value below 0.05 was taken as the criterion for rejecting H0 at the 5% significance level, and P-values below 0.01 and 0.001 were considered to indicate high and very high significance, respectively. The statistical analysis was conducted using the T.TEST function built into Excel (Microsoft Corp., Redmond, WA, USA).
Results
Evaluation Using the Training Dataset
The validation results of the training network are shown in Fig. 6. The top part of the figure illustrates that the proposed technique (Fig. 6E) performs comparably to other speckle reduction methods (Fig. 6C, D). The intensity variations of these techniques along the yellow dotted line in the top part of Fig. 6 are detailed in the bottom part of the figure. The 0-25 mm section was divided into four subsections: [a]-[d]; subsections [b] and [d] were identified as speckle regions due to their low and unstable intensities compared to [a] and [c]. The suppression intensities of SRAD (yellow dotted line) and OBNLM (purple dotted line) were evaluated against the unstable intensity of the original image. Additionally, the intensity of the proposed technique (red) was similar to those of the other speckle reduction techniques, indicating efficient speckle reduction by the method introduced herein. While SRAD significantly reduced speckle, it made the internal details of the original image difficult to discern. In contrast, both OBNLM and S2S effectively reduced speckle while preserving the structural details. Moreover, the evaluation metrics presented in Table 2 indicate that both this method and OBNLM successfully maintained the image structures. The SNR and CNR results demonstrate that S2S outperformed the other methods.
Evaluation Using the Validation Dataset
Fig. 7 displays the validation results. The proposed method's ability to suppress speckle was comparable to that of the methods it was measured against, as illustrated in Table 2 and Fig. 7. While SRAD achieved the highest CNR value, it also recorded the lowest EPI and MSSIM values. Therefore, although SRAD can improve image quality, it is not the best choice for medical image processing where detail preservation is essential. Moreover, due to its outstanding performance across multiple metrics, the proposed S2S method is especially appropriate for medical imaging applications that require the retention of fine details.
S2S Generalizability to Ultrasound Images Acquired Using Different Scanners
The speckle reduction algorithms were applied to abdominal data from various ultrasound scanners, which were acquired from Kaggle (Fig. 8). Qualitatively, the OBNLM algorithm effectively suppressed the speckle patterns, although it resulted in the loss of kidney details. The image produced by SRAD was blurrier than that generated by the proposed technique. Quantitatively, S2S outperformed all other techniques across all evaluation metrics, as shown in Table 2. Additional experiments were conducted to further verify the effectiveness of the S2S algorithm on diverse ultrasound data. Breast and thyroid data were acquired from Kaggle, comprising 60 cases in total—30 for breast and 30 for thyroid, each dataset originating from different sources.
Table 3 and Fig. 9 present the statistical analysis of each metric using breast and thyroid datasets. Overall, S2S demonstrated superior performance in speckle suppression and the simultaneous retention of edges and structures compared to existing methods. Specifically, it surpassed SRAD across all metrics and matched or exceeded OBNLM in all indicators, with the exception of the SNR.
When comparing S2S with OBNLM, the latter demonstrated a higher SNR by 0.53 dB, a difference that was statistically significant (P<0.05). However, S2S showed superior performance in terms of CNR and MSSIM, with increases of 1.08 dB and 0.05, respectively, both of which were highly statistically significant (P<0.001). Additionally, the difference in EPI between S2S and OBNLM was not statistically significant. These findings suggest that S2S and OBNLM have comparable abilities in reducing speckle.
Overall, although S2S may not always produce the best results across all datasets, it generally remains competitive when compared to existing methods. Fig. 10 illustrates the performance of each method on images from the breast and thyroid datasets.
Processing Time Comparison
In real-time image processing, it is crucial to compare computation times. Traditional methods typically depend on iterative processes that can greatly increase the computational burden, even when operating at higher speeds. In contrast, the model proposed herein, S2S, utilizes a non-iterative approach, processing data through the network in a single pass. This design allows S2S to achieve significant improvements in speed. As demonstrated in Table 4 and Fig. 11, S2S significantly outperforms conventional methods such as SRAD and OBNLM across various data types.
For a simple numerical comparison, S2S is 86 times faster than SRAD and 33 times faster than OBNLM in terms of processing times. While conventional methods such as OBNLM can take up to 2.57 seconds to process an image, S2S can accomplish the same task in less than 0.3 seconds, demonstrating its ability to effectively process images in real time.
However, since the existing algorithms were implemented in MATLAB and the model proposed herein was developed in PyTorch, it was crucial to ensure a fair comparison. To mitigate potential biases arising from the use of different libraries, the following measures were implemented.
Uniform computing environment: All algorithms were executed with identical CPU specifications, memory capacities, and operating systems to negate hardware advantages.
Standardized time measurement: To ensure consistency in recording computation times, language-specific functions with similar purposes were employed: in Python, the time module was utilized, and in MATLAB, the "tic" and "toc" functions were used. Timing was measured strictly from just before processing began to immediately after its completion.
Single-thread execution: Both MATLAB and PyTorch are capable of multicore processing. However, to minimize the impact of environmental variations on performance, all tests were conducted in single-thread mode.
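On the Python side, this timing protocol might look as follows (the denoise function is a hypothetical stand-in for a single network forward pass; the MATLAB side would bracket the same step with tic/toc):

```python
import time

def denoise(image):
    # hypothetical stand-in for a single, non-iterative network forward pass
    return [[0.5 * v for v in row] for row in image]

image = [[1.0] * 128 for _ in range(128)]

start = time.perf_counter()             # timed strictly from just before processing...
result = denoise(image)
elapsed = time.perf_counter() - start   # ...to immediately after its completion

print(f"{elapsed:.6f} s")
```

Using a monotonic high-resolution clock such as time.perf_counter (rather than wall-clock time) keeps measurements comparable across repeated runs.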
The performance metrics outlined in Table 4, along with the fairness measures implemented, underscore the S2S model's comparative advantage over traditional iterative image processing techniques and its potential for broad application in medical and other real-time image processing domains.
Discussion
The results demonstrated that the proposed unsupervised learning-based S2S network could effectively reduce image speckle. Furthermore, its processing time was 33- and 86-fold shorter than those of the two compared methods; thus, this study confirmed that adopting a non-iterative approach significantly reduces computation time, making real-time implementation feasible. The common carotid artery was imaged multiple times, and the resulting data were used to construct a training dataset. Because the speckle pattern does not change across repeated acquisitions under identical conditions, this repetition is not needed for speckle reduction itself; however, it provides meaningful information about system noise. For datasets configured in this way, system noise can be removed according to the proposed theory, increasing the image SNR and thus improving image quality.
In this study, the existing N2N framework was applied to reduce speckle noise. Typically, N2N is optimized for uncorrelated noise using an unsupervised learning approach, which primarily focuses on the characteristics of white Gaussian additive noise. However, the authors’ research has shown through various experiments that this framework can also effectively handle speckle noise. This finding contrasts with previous studies [28,29] that recommended modifications to the N2N framework for dealing with noise types other than Gaussian additive noise, suggesting that this approach is also viable for speckle noise. In future research, the plan is to further validate the versatility and efficiency of these results by comparing them with existing studies that have implemented modified N2N frameworks.
Experiments were conducted to determine the optimal network depth and channel configuration for the proposed method. In these experiments, the network underwent four- to sixfold down-sampling, spanning configurations typical of U-Net structures, Noise2Noise-style networks, and the proposed network, respectively. As illustrated in Fig. 12, there were no significant differences in performance between the various depths and channel configurations; however, sixfold down-sampling demonstrated the best performance (Table 5). As the network depth increased, so did the number of training parameters; specifically, the number of parameters differed by approximately 6.3-fold between fourfold and sixfold down-sampling. It was anticipated that both the network depth and the number of channels would contribute to improved speckle reduction performance.
However, although improvements in speckle reduction were observed, the enhancements in the SNR and CNR values were considerably more noticeable. Therefore, the differences in system noise training, influenced by increases in network parameters, were reflected in the evaluation metrics. Furthermore, the network depth exceeded the number of channels. Accordingly, it is recommended to increase the channel depth to achieve better evaluation metrics in the images.
Fig. 13 shows the suppression intensities at various network depths and numbers of channel networks along the yellow dotted line shown in Fig. 12A. The networks exhibited similar reductions in speckle. Such reduction is not problematic for network depths of ≥4, a typical configuration in U-Net structures. Network performance is anticipated to decline at network depths of <4. With down-sampling depths of ≥4, the network depth can be optimized based on the performance of the equipment used to implement the method.
In practice, the present study focused on reducing speckle in clinical ultrasound images using the method proposed herein. To ensure the fairness of this approach, a simulation experiment was conducted, as depicted in Fig. 14. The simulation data were generated using the "Cyst phantom" in Field II software [30], using the default parameters specified in the example code for the simulation. The parameters of the methods applied to reduce speckle in the simulation data are detailed in Table 1. Each method was evaluated in two regions of interest (ROIs): ROI 1 (red box) and ROI 2 (green box).
Qualitatively, both S2S and the conventional methods (SRAD and OBNLM) effectively removed the speckle pattern. However, quantitatively, the SNR value in ROI 1 was highest for OBNLM, with only a marginal difference of about 0.2 dB compared to S2S. In all other metrics within the ROI areas, S2S outperformed the conventional methods (Table 6). Therefore, it was confirmed that the proposed method can effectively suppress speckle in simulation data.
S2S has two primary limitations. First, while it can be generalized using external abdominal data, the results are not entirely independent of the biases inherent in each system. Practical constraints also impede the collection and verification of data across different ultrasound systems. To address this, the authors plan to collect data from a variety of ultrasound systems and further investigate the generalizability of S2S through additional experiments. Second, the operational process of the network is challenging to clarify due to the complexities of deep learning. When deep learning is applied to general data, such as landscapes, numbers, or populations, ethical issues are minimal. However, in medical applications, where the technology directly impacts human health, trust in the technology is crucial. With the goal of implementing this technique in medical settings, it is essential to elucidate its processes. Explainable artificial intelligence, although still in its early stages, offers a solution to the opacity of deep learning. This emerging technology holds promise for enhancing S2S by addressing these 'black box' issues in future research.
This study proposes an unsupervised learning-based speckle reduction technique called the S2S network, which does not require clean (speckle-free) reference images for training. To confirm the effectiveness of the proposed method, in vivo studies were conducted and the performance of S2S was compared both qualitatively and quantitatively with existing speckle reduction techniques. Additionally, the processing time was significantly reduced compared to iterative speckle reduction algorithms, suggesting the potential for real-time implementation. The S2S network demonstrates the viability of using an unsupervised learning-based approach in medical imaging applications, where obtaining a ground truth reference is challenging.
Notes
Author Contributions
Conceptualization: Jung D, Kang M, Park SH, Yu J. Data acquisition: Jung D, Kang M, Guezzi N. Data analysis or interpretation: Jung D. Drafting of the manuscript: Jung D, Kang M, Guezzi N, Yu J. Critical revision of the manuscript: Jung D, Park SH, Yu J. Approval of the final version of the manuscript: all authors.
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-RS-2023-00211941, NRF-2018R1A5A1025511).
Key point
Deep learning–based image processing has potential for addressing complex nonlinear artifacts and noise. However, acquiring the necessary clean reference images for training presents a significant challenge. The proposed unsupervised "speckle-to-speckle (S2S)" deep learning framework effectively models complex ultrasonic speckle patterns and noise without requiring clean reference images. S2S significantly reduces the time, cost, and effort required for manual annotation and outperforms existing techniques in terms of signal-to-noise ratio, contrast-to-noise ratio, and processing speed.