Hi All,
I wanted to start this thread here to see what people's views are about whether to use a fixed bin size (FBS) or a fixed bin number (FBN) for gray value discretization.
At PyRadiomics, we generally advise using an FBS (although we now also provide an implementation of FBN), whereas the IBSI document most often recommends an FBN, with FBS as a valid alternative in some cases.
Moreover, other discretization methods exist, each with its own pros and cons.
As far as I know, there is no consensus on which method to use and, IMHO, there has not been enough discussion to settle the question.
Therefore, I thought it was a good idea to start a thread: on the one hand to get a bit of discussion going, on the other to provide a reference for users who are also struggling with this issue.
"
Fixed bin number.
In the fixed bin number method, intensities Xgl are discretised to a fixed
number of Ng bins.
(...)
In short, the intensity Xgl,k of voxel k is corrected by the lowest occurring intensity Xgl,min in
the ROI, divided by the bin width (Xgl,max − Xgl,min) /Ng, and subsequently rounded up to the
nearest integer.
The fixed bin number method breaks the relationship between image intensity and physiological
meaning (if any). However, it introduces a normalising effect which may be beneficial when
intensity units are arbitrary (e.g. raw MRI data and many spatial filters), and where contrast
is considered important. Furthermore, as values of many features depend on the number of grey
levels found within a given ROI, the use of a fixed bin number discretisation algorithm allows for a
direct comparison of feature values across multiple analysed ROIs (e.g. across different samples).
Fixed bin size.
Fixed bin size discretisation is conceptually simple. A new bin is assigned for
every intensity interval with width wb; i.e. wb is the bin width, starting at a minimum Xgl,min. The
minimum intensity may be a user-set value as defined by the lower bound of the re-segmentation
range, or data-driven as defined by the minimum intensity in the ROI Xgl,min = min (Xgl). In
all cases, the method used and/or set minimum value must be clearly reported. However, to
maintain consistency between samples, we strongly recommend to always set the same minimum
value for all samples as defined by the lower bound of the re-segmentation range (e.g. HU of -500
for CT, SUV of 0 for PET, etc.). In the case that no re-segmentation range may be defined due
to arbitrary intensity units (e.g. raw MRI data and many spatial filters), the use of the fixed bin
size discretisation algorithm is not recommended.
(...)
The fixed bin size method has the advantage of maintaining a direct relationship with the
original intensity scale, which could be useful for functional imaging modalities such as PET.
(...)
A comparison of discretisation methods.
As mentioned earlier, the discretisation method that leads to best feature inter- and intra-patient reproducibility is modality-dependent.
Recommendations for the possible combinations of different imaging intensity definitions,
re-segmentation ranges and discretisation algorithms are provided in Table 3.1.
The effect of the number of bins for fixed bin number discretisation was studied by Hatt
et al. (2015), in a large methodological study with 555 pretreatment FDG-PET images covering
a range of different tumours. They found that fixed bin number discretisation using
64 bins provides the best compromise between differentiation and resolution.
Leijenaar et al. (2015) compared the effect of fixed bin size and fixed bin number discretisation
methods on texture features from FDG-PET images recorded in a cohort of 35
non-small cell lung cancer patients. They concluded that fixed bin size may be more appropriate
for inter- and intra-patient comparison of texture feature values in a clinical setting.
In another methodological study van Velden et al. (2016) also assessed the effect of fixed
bin number versus fixed bin size methods. They concluded that texture features from FDG-PET
images had better repeatability and lower sensitivity to delineation changes using the
fixed bin size discretisation method.
It should also be noted that several studies have used fixed bin size for CT images, e.g.
(Aerts et al., 2014; van Dijk et al., 2017). Both studies used a bin size of 25 HU. However, the
authors of both studies did not report on the minimum grey level used in the discretisation
process, which essentially precludes the reproducibility of their findings.
"
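To make the two definitions quoted above concrete, here is a minimal sketch in plain Python. It is not taken from the IBSI reference material or from the PyRadiomics source; the function names are made up for this post, and the floor-plus-one formulation (with the ROI maximum clamped into the last bin) is an assumption about how the "rounded up" wording is meant to be implemented.

```python
import math


def discretise_fixed_bin_number(intensities, n_bins):
    """Map intensities to bins 1..n_bins spanning the ROI's own min..max.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # Flat ROI: there is no range to divide, so use the first bin.
        return [1 for _ in intensities]
    bins = []
    for x in intensities:
        # Shift by the ROI minimum, scale by the ROI range, floor, add 1.
        b = math.floor(n_bins * (x - lo) / (hi - lo)) + 1
        # Without this clamp the ROI maximum would land in bin n_bins + 1.
        bins.append(min(b, n_bins))
    return bins


def discretise_fixed_bin_size(intensities, bin_width, minimum=None):
    """Map intensities to bins of constant width, counted from `minimum`.

    Hypothetical helper for illustration only. If `minimum` is None the
    ROI minimum is used (the data-driven variant). Intensities below
    `minimum` are assumed to have been removed by re-segmentation.
    """
    if minimum is None:
        minimum = min(intensities)
    return [math.floor((x - minimum) / bin_width) + 1 for x in intensities]


# Made-up CT-like intensities in HU, purely for demonstration.
roi = [-500, -475, -400, 0, 100]

print(discretise_fixed_bin_number(roi, 4))
# [1, 1, 1, 4, 4]
print(discretise_fixed_bin_size(roi, 25, minimum=-500))
# [1, 2, 5, 21, 25]
```

With FBN the bin a voxel lands in depends on the minimum and maximum of that particular ROI, while with FBS and a fixed minimum the same HU value always maps to the same bin, whatever else is in the ROI.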
On the other hand, this is the rationale we follow at PyRadiomics (available in the
FAQ section of the documentation):
"
What about gray value discretization? Fixed bin width? Fixed bin count?
Currently, although many studies favour a fixed bin count over a fixed bin width, there is no hard evidence favouring either a fixed bin width or a fixed bin count in all cases. Therefore PyRadiomics implements both the option for setting a fixed bin count (binCount) and a fixed bin width (binWidth, default).
A fixed bin width has been chosen as the default in part because studies in PET show better reproducibility of features with a fixed bin width [1]. Furthermore, our reasoning is best illustrated by the following example: take an input of 2 images with 2 ROIs, where the range of gray values is {0-100} in the first and {0-10} in the second. If you use a fixed bin count, the “meaning” of a difference of 1 (discretized) gray value is different (in the first it means a difference of 10 gray values, in the second just 1). This means you are looking at texture based on very different contrasts.
This example does assume that the original gray values mean the same thing in both images, and in case of images with definite/absolute gray values (e.g. HU in CT, SUV in PET imaging), this holds true. However, in case of arbitrary/relative gray values (e.g. signal intensity in MR), this is not necessarily the case. In this latter case, we still recommend a fixed bin width, but with additional pre-processing (e.g. normalization) to ensure better comparability of gray values. Use of a fixed bin count would be possible here, but then the calculated features may still be very influenced by the range of gray values seen in the image, as well as noise caused by the fact that the original gray values are less comparable. Moreover, regardless of type of gray value discretization, steps must be taken to ensure good comparability, as the first order features largely use the original gray values (without discretization).
Finally, there is the issue of what value to use for the width of the bin. Again, there are currently no specific guidelines from literature as to what constitutes an optimal bin width. We try to choose a bin width such that the resulting number of bins is somewhere between 30 and 130, a range that shows good reproducibility and performance in literature for a fixed bin count [2]. This allows for differing ranges of intensity in ROIs, while still keeping the texture features informative (and comparable between lesions!).
"
Here, reference [1] is the same paper by Leijenaar et al. (2015) that is mentioned in the IBSI document.
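The FAQ example with gray value ranges {0-100} and {0-10} can be put into numbers with a small sketch in plain Python. The helper names are hypothetical, and the bin-count estimate (range divided by width, floored, plus one) is a simplifying assumption that may differ by a bin from what a given implementation produces, depending on how bin edges are aligned. The HU range in the second half is invented for illustration.

```python
import math


def gray_values_per_bin(lo, hi, bin_count):
    """Span, in original gray values, of one bin under a fixed bin count."""
    return (hi - lo) / bin_count


def bins_for_width(lo, hi, bin_width):
    """Approximate number of bins a fixed bin width yields for lo..hi."""
    return math.floor((hi - lo) / bin_width) + 1


# The two ROIs from the FAQ example, as (minimum, maximum) gray value.
rois = {"first ROI": (0, 100), "second ROI": (0, 10)}

# Fixed bin count of 10: one discretized step covers very different
# contrasts in the two ROIs.
for name, (lo, hi) in rois.items():
    print(name, gray_values_per_bin(lo, hi, bin_count=10))
# first ROI 10.0
# second ROI 1.0

# Checking a candidate bin width against the 30-130 bin range mentioned
# in the FAQ, for a made-up ROI spanning -500 to 400 HU.
n = bins_for_width(-500, 400, bin_width=25)
print(n, 30 <= n <= 130)
# 37 True
```

The same check can be run over the intensity ranges of all ROIs in a dataset to see whether one bin width keeps most of them inside the target range.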
Regards,
Joost