Fixed Bin Size (FBS) vs. Fixed Bin Number (FBN)

Joost van Griethuysen


Jun 28, 2018, 8:11:01 AM

to Image Biomarker Standardisation Initiative

Hi All,

I wanted to start this thread here to see what people's views are about whether to use a fixed bin size (FBS) or a fixed bin number (FBN) for gray value discretization.

At PyRadiomics, we generally advise using an FBS (although we now also provide an implementation of FBN), whereas the IBSI document most often recommends an FBN, with FBS as a valid alternative in some cases.

Moreover, other alternative methods exist, each with its own pros and cons.

As far as I know, there is no consensus on which method to use and, IMHO, not enough discussion to tackle this problem.

Therefore, I thought it would be a good idea to start a thread: on the one hand to get some discussion going, and on the other to provide a reference for users who are also struggling with this issue.

Below is a quotation from the IBSI work document (v6):

"

Fixed bin number.

In the fixed bin number method, intensities $X_{gl}$ are discretised to a fixed number of $N_g$ bins.

(...)

In short, the intensity $X_{gl,k}$ of voxel $k$ is corrected by the lowest occurring intensity $X_{gl,min}$ in the ROI, divided by the bin width $(X_{gl,max} - X_{gl,min})/N_g$, and subsequently rounded up to the nearest integer. The fixed bin number method breaks the relationship between image intensity and physiological meaning (if any). However, it introduces a normalising effect which may be beneficial when intensity units are arbitrary (e.g. raw MRI data and many spatial filters), and where contrast is considered important. Furthermore, as values of many features depend on the number of grey levels found within a given ROI, the use of a fixed bin number discretisation algorithm allows for a direct comparison of feature values across multiple analysed ROIs (e.g. across different samples).

Fixed bin size.

Fixed bin size discretisation is conceptually simple. A new bin is assigned for every intensity interval with width $w_b$; i.e. $w_b$ is the bin width, starting at a minimum $X_{gl,min}$. The minimum intensity may be a user-set value as defined by the lower bound of the re-segmentation range, or data-driven as defined by the minimum intensity in the ROI, $X_{gl,min} = \min(X_{gl})$. In all cases, the method used and/or set minimum value must be clearly reported. However, to maintain consistency between samples, we strongly recommend to always set the same minimum value for all samples as defined by the lower bound of the re-segmentation range (e.g. HU of -500 for CT, SUV of 0 for PET, etc.). In the case that no re-segmentation range may be defined due to arbitrary intensity units (e.g. raw MRI data and many spatial filters), the use of the fixed bin size discretisation algorithm is not recommended.

(...)

The fixed bin size method has the advantage of maintaining a direct relationship with the original intensity scale, which could be useful for functional imaging modalities such as PET.

(...)

A comparison of discretisation methods.

As mentioned earlier, the discretisation method that leads to best feature inter- and intra-patient reproducibility is modality-dependent.

Recommendations for the possible combinations of different imaging intensity definitions, re-segmentation ranges and discretisation algorithms are provided in Table 3.1.

The effect of the number of bins for fixed bin number discretisation was studied by Hatt et al. (2015), in a large methodological study with 555 pretreatment FDG-PET images covering a range of different tumours. They found that fixed bin number discretisation using 64 bins provides the best compromise between differentiation and resolution.

Leijenaar et al. (2015) compared the effect of fixed bin size and fixed bin number discretisation methods on texture features from FDG-PET images recorded in a cohort of 35 non-small cell lung cancer patients. They concluded that fixed bin size may be more appropriate for inter- and intra-patient comparison of texture feature values in a clinical setting.

In another methodological study van Velden et al. (2016) also assessed the effect of fixed bin number versus fixed bin size methods. They concluded that texture features from FDG-PET images had better repeatability and lower sensitivity to delineation changes using the fixed bin size discretisation method.

It should also be noted that several studies have used fixed bin size for CT images, e.g. (Aerts et al., 2014; van Dijk et al., 2017). Both studies used a bin size of 25 HU. However, the authors of both studies did not report on the minimum grey level used in the discretisation process, which essentially precludes the reproducibility of their findings.

"
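To make the two quoted definitions concrete, here is a minimal NumPy sketch of both methods. The function names are hypothetical (they are not from PyRadiomics or the IBSI document). For FBN it assumes floor(...) + 1 rather than a bare ceiling, because a literal "round up" would put a voxel at exactly $X_{gl,min}$ into bin 0; for FBS it takes the minimum as an explicit, user-set argument, as the quote recommends.

```python
import numpy as np


def discretise_fbn(roi_values, n_bins):
    """Fixed bin number: map ROI intensities to bins 1..n_bins.

    Assumption: floor(...) + 1 is used instead of a bare ceiling so that the
    ROI minimum falls into bin 1 rather than bin 0.
    """
    x = np.asarray(roi_values, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A single intensity in the ROI: everything goes into bin 1.
        return np.ones(x.shape, dtype=int)
    bins = np.floor(n_bins * (x - x_min) / (x_max - x_min)).astype(int) + 1
    # Without this, the ROI maximum would end up in bin n_bins + 1.
    bins[x == x_max] = n_bins
    return bins


def discretise_fbs(roi_values, bin_width, x_min):
    """Fixed bin size: a new bin every `bin_width` units, starting at x_min."""
    x = np.asarray(roi_values, dtype=float)
    if np.any(x < x_min):
        raise ValueError("x_min must not exceed the lowest ROI intensity")
    return np.floor((x - x_min) / bin_width).astype(int) + 1


# FBN rescales every ROI to the same number of levels, whatever its range.
print(discretise_fbn([0, 25, 50, 75, 100], n_bins=4))  # [1 2 3 4 4]

# FBS with a 25 HU bin width: the chosen minimum changes the grouping.
roi_hu = [-12, 0, 10, 20, 30]
print(discretise_fbs(roi_hu, 25, x_min=-500))         # [20 21 21 21 22]
print(discretise_fbs(roi_hu, 25, x_min=min(roi_hu)))  # [1 1 1 2 2]
```

The last two lines illustrate the reproducibility point about the 25 HU studies: with the same bin width, -12 and 0 HU fall into different bins when the minimum is -500 HU, but into the same bin when the minimum is taken from the ROI itself, so the minimum has to be reported.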

On the other hand, this is the rationale we follow at PyRadiomics (available in the FAQ section of the documentation):

"

What about gray value discretization? Fixed bin width? Fixed bin count?

Currently, although many studies favour a fixed bin count over a fixed bin width, there is no hard evidence favouring either a fixed bin width or a fixed bin count in all cases. Therefore, PyRadiomics implements both options: a fixed bin count (binCount) and a fixed bin width (binWidth, the default).

The reason a fixed bin width has been chosen as the default parameter is based in part on studies in PET that show better reproducibility of features when a fixed bin width is used [1]. Furthermore, our reasoning is best illustrated by the following example: consider 2 images, each with an ROI, where the gray values range over {0-100} in the first and {0-10} in the second. If you use a fixed bin count (e.g. 10 bins), the “meaning” of a difference of 1 discretized gray value differs between them: in the first it corresponds to 10 original gray values, in the second to just 1. This means you are looking at texture based on very different contrasts.

This example does assume that the original gray values mean the same thing in both images; for images with definite/absolute gray values (e.g. HU in CT, SUV in PET imaging), this holds true. However, for arbitrary/relative gray values (e.g. signal intensity in MR), this is not necessarily the case. In this latter case, we still recommend a fixed bin width, but with additional pre-processing (e.g. normalization) to ensure better comparability of gray values. Using a fixed bin count would be possible here, but the calculated features may then still be strongly influenced by the range of gray values in the image, as well as by noise arising from the lower comparability of the original gray values. Moreover, regardless of the type of gray value discretization, steps must be taken to ensure good comparability, as the first order features largely use the original (non-discretized) gray values.

Finally, there is the issue of what value to use for the bin width. Again, there are currently no specific guidelines in the literature as to what constitutes an optimal bin width. We try to choose a bin width such that the resulting number of bins is somewhere between 30 and 130, a range that has shown good reproducibility and performance in the literature for a fixed bin count [2]. This allows for differing ranges of intensity in ROIs, while still keeping the texture features informative (and comparable inter-lesion!).

"

Here, reference [1] is the same paper by Leijenaar et al. (2015) that is mentioned in the IBSI document.
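The {0-100} versus {0-10} example and the 30-130 bin heuristic can both be checked with a little arithmetic. This is a hypothetical sketch: the 10-bin count, the width of 5, the candidate widths and the CT ROI ranges are illustrative assumptions, and the bin count formula assumes the first bin starts exactly at the ROI minimum (an implementation that aligns bin edges differently may be off by one bin).

```python
import math


def gray_values_per_level_fbn(x_min, x_max, n_bins):
    """Original gray-value span of one discretised level under FBN."""
    return (x_max - x_min) / n_bins


def n_levels_fbs(x_min, x_max, bin_width):
    """Number of bins under FBS, assuming the first bin starts at x_min."""
    return math.floor((x_max - x_min) / bin_width) + 1


def acceptable_widths(roi_ranges, candidate_widths, lo=30, hi=130):
    """Candidate widths giving lo..hi bins for every ROI range (max - min)."""
    return [
        w for w in candidate_widths
        if all(lo <= n_levels_fbs(0, r, w) <= hi for r in roi_ranges)
    ]


# FBN, 10 bins: one level spans 10 gray values in ROI 1 but only 1 in ROI 2.
print(gray_values_per_level_fbn(0, 100, 10), gray_values_per_level_fbn(0, 10, 10))
# -> 10.0 1.0

# FBS, width 5: one level spans 5 gray values in both; the bin counts differ.
print(n_levels_fbs(0, 100, 5), n_levels_fbs(0, 10, 5))
# -> 21 3

# Hypothetical CT ROI ranges (HU) and candidate widths.
print(acceptable_widths([600, 900, 1500], [5, 10, 15, 20, 25, 50]))
# -> [15, 20]
```

The trade-off is visible in the output: FBN keeps the number of levels constant but changes what a level means, while FBS keeps the meaning constant and lets the number of levels vary, which is why the width is chosen against the expected ROI ranges.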

Regards,

Joost

Martin Vallières


Aug 20, 2018, 4:12:17 PM

to Image Biomarker Standardisation Initiative

Hi Joost,

Thank you very much for your message and for initiating a discussion about this crucial topic in radiomics feature computation.

In short, I believe that there is no one-size-fits-all solution here. In my view, the optimal type of discretization and optimal discretization bin width/count are both application-dependent (imaging modality, clinical problem investigated, etc.) AND feature-dependent. For that reason, I would definitely support the idea of defining new IBSI benchmarks for fixed bin width discretization of arbitrary intensities (e.g. raw MRI) via a normalization process (currently missing from the IBSI document). Could you please give more details about the normalization process that you use in PyRadiomics for performing fixed bin width discretization on arbitrary intensities? Thanks!

Best,

Martin

P.S. Apologies for the delay of my reply!

Joost van Griethuysen


Aug 24, 2018, 7:24:42 AM

to Image Biomarker Standardisation Initiative

Hi Martin,

I fully agree with you that there is no one-size-fits-all solution here. One could even use histogram equalization to define the bins (which, unlike in fixed bin number or fixed bin size, are then not equally spaced), as you did in your 2015 paper. It is therefore not my intention for this thread to reach a definitive answer, but rather to provide a place where we can discuss the different views, and a central location to which we can point other researchers with questions on this topic.
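For contrast with the equally spaced methods, here is a minimal sketch of equal-probability binning, where bin edges are placed at quantiles of the ROI intensities so that each bin holds roughly the same number of voxels. It is a hypothetical illustration of the histogram-equalization idea, not necessarily the exact algorithm used in the 2015 paper.

```python
import numpy as np


def discretise_equal_probability(roi_values, n_bins):
    """Map ROI intensities to bins 1..n_bins using quantile-based edges."""
    x = np.asarray(roi_values, dtype=float)
    # Inner edges at the 1/n, 2/n, ..., (n-1)/n quantiles of the ROI.
    inner_edges = np.quantile(x, np.arange(1, n_bins) / n_bins)
    # np.digitize returns 0..n_bins-1 for n_bins-1 edges; shift to 1..n_bins.
    return np.digitize(x, inner_edges) + 1


skewed = [1, 1, 1, 1, 2, 3, 50, 100]
print(discretise_equal_probability(skewed, 2))  # [1 1 1 1 2 2 2 2]
```

On the same skewed input, a fixed bin number of 2 with equally spaced bins would put seven of the eight voxels into bin 1 and only the value 100 into bin 2. Note that with many tied intensities, quantile edges can coincide and leave some bins empty.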

As to the normalization of arbitrary intensities, PyRadiomics currently achieves this by subtracting the mean and then dividing by the standard deviation. The mean and std here are calculated using the whole image as input (including background etc.).

Furthermore, an additional scaling factor can be applied, and outliers can optionally be excluded (currently by setting them to the specified outlier value, although it may be better to simply exclude those voxels, i.e. re-segmentation).

This is the link to the part of the pyradiomics code/documentation that handles normalization
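As a rough illustration of the whole-image normalization described above, a NumPy sketch might look like the following. It is not the PyRadiomics source; the function and parameter names are hypothetical. Following the description, outliers are clipped to the outlier value rather than excluded, and the optional scale is applied last (an assumption about ordering).

```python
import numpy as np


def normalize_image(image, scale=1.0, outlier_sd=None):
    """Z-score normalization using the whole image (background included).

    Hypothetical sketch: values more than `outlier_sd` standard deviations
    from the mean are clipped to +/- outlier_sd, then everything is scaled.
    """
    img = np.asarray(image, dtype=float)
    z = (img - img.mean()) / img.std()  # population std (ddof=0)
    if outlier_sd is not None:
        z = np.clip(z, -outlier_sd, outlier_sd)
    return z * scale


print(normalize_image([0, 0, 10, 10], scale=100).tolist())  # [-100.0, -100.0, 100.0, 100.0]
print(normalize_image([0] * 9 + [100], outlier_sd=2).max())  # 2.0
```

Because the statistics come from the whole image, the amount of background and any bright artefacts shift the mean and std, which is the limitation that motivates the reference-ROI alternative.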

However, there are of course other possibilities, such as linear rescaling, or using another source to determine the scaling. In the latter case, one could e.g. use a different ROI to determine the mean and std (thereby forcing the intensities within those ROIs to have a comparable mean and standard deviation across images). An example of this is the oft-used normalization to muscle tissue in MRI. This removes the potential influence of artefacts and of the number of background voxels, both of which affect the mean and std when they are computed from the entire image. Again, there is no one-size-fits-all solution, and the best method depends on the kind of dataset it is applied to.
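The reference-ROI variant changes only where the statistics come from. In this hypothetical sketch, the mean and standard deviation are taken from a boolean reference mask (standing in for, e.g., a muscle region) and applied to the whole image; the usage shows that two images differing only by a linear gain and offset become identical after normalization.

```python
import numpy as np


def normalize_to_reference(image, reference_mask):
    """Z-score the whole image using statistics from a reference ROI only."""
    img = np.asarray(image, dtype=float)
    ref = img[np.asarray(reference_mask, dtype=bool)]
    return (img - ref.mean()) / ref.std()


rng = np.random.default_rng(0)
image_a = rng.normal(100.0, 20.0, size=(8, 8))
image_b = 2.5 * image_a + 40.0  # same content, different arbitrary intensity scaling
muscle = np.zeros((8, 8), dtype=bool)
muscle[2:5, 2:5] = True  # hypothetical reference region

same = np.allclose(normalize_to_reference(image_a, muscle),
                   normalize_to_reference(image_b, muscle))
print(same)  # True
```

Voxels outside the mask (background, artefacts) do not enter the statistics at all, so they cannot shift the normalization.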

Regards,

Joost

Andrey Fedorov


Nov 4, 2018, 3:54:52 PM

to Image Biomarker Standardisation Initiative

Hey guys, related to this, I recently came across this article on Wikipedia, and it struck me that the discretization of a sample is something people have studied for quite some time: https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width. I wonder whether this earlier work is something the IBSI and radiomics community should consider?
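For reference, three of the classic rules on that Wikipedia page fit in a few lines. They map onto the two radiomics options: Sturges' rule yields a bin number, whereas Scott's rule and the Freedman-Diaconis rule yield a bin width. The function names below are hypothetical; NumPy's `np.histogram_bin_edges` also accepts `bins='sturges'`, `'scott'` and `'fd'` directly.

```python
import math

import numpy as np


def sturges_bin_count(x):
    """Sturges' rule (a bin number): k = ceil(log2 n) + 1."""
    n = np.asarray(x).size
    return math.ceil(math.log2(n)) + 1


def scott_bin_width(x):
    """Scott's rule (a bin width): h = 3.49 * sample std * n^(-1/3)."""
    x = np.asarray(x, dtype=float)
    return 3.49 * x.std(ddof=1) * x.size ** (-1 / 3)


def freedman_diaconis_bin_width(x):
    """Freedman-Diaconis rule (a bin width): h = 2 * IQR * n^(-1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1 / 3)


values = np.random.default_rng(1).normal(0.0, 1.0, size=1000)
print(sturges_bin_count(values))  # 11
print(scott_bin_width(values), freedman_diaconis_bin_width(values))
```

These rules were designed for density estimation from a sample, so whether they give sensible grey-level resolutions for texture features is an open question rather than something this sketch settles.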

Martin Vallières


Nov 9, 2018, 8:10:33 AM

to Image Biomarker Standardisation Initiative

Dear Andrey,

Many thanks for this link, this will be a useful reference. I will bring up this point at our next IBSI meeting and will let you know!

Best,

Martin
