Hi Oscar,
The thresholds are:
low_thresh <= low_clouds < mid_thresh
mid_thresh <= mid_clouds < high_thresh
high_thresh <= high_clouds
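
To make the binning above concrete, here is a minimal Python sketch. The function name `classify_level` and the threshold values in the usage note are hypothetical, chosen only for illustration, and the sketch assumes a vertical coordinate that increases upward (such as height); a pressure coordinate would need the comparisons reversed.

```python
def classify_level(z, low_thresh, mid_thresh, high_thresh):
    """Return 'low', 'mid', 'high', or None for a vertical coordinate z.

    Hypothetical helper: assumes z increases upward (e.g. height)
    and that low_thresh < mid_thresh < high_thresh.
    """
    if z >= high_thresh:
        return "high"   # high_thresh <= z
    if z >= mid_thresh:
        return "mid"    # mid_thresh <= z < high_thresh
    if z >= low_thresh:
        return "low"    # low_thresh <= z < mid_thresh
    return None         # below low_thresh: not counted in any layer
```

For example, with illustrative thresholds of 300, 2000 and 6000, `classify_level(1000, 300, 2000, 6000)` falls in the low bin and `classify_level(2000, 300, 2000, 6000)` falls in the mid bin, because the lower bound of each bin is inclusive.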
The code came from the WRF group, and the developer who wrote the algorithm has moved on from NCAR, so I'm not sure where she got the equations from. But I suspect they are a simplification of several empirical papers on the subject, or were pulled directly from the WRF source code. For the 4.0 and 3.0 case, cloud fraction values start to be non-zero once the relative humidity for a grid volume exceeds 75%. For the high clouds, this number drops to 60%.
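
As a rough illustration of how those numbers fit together, the sketch below assumes the equations are a linear ramp of the form slope * RH/100 - offset, clipped to the range 0 to 1. That form is an assumption inferred from the coefficients and the 75% and 60% onsets quoted here, not a copy of the actual routine, and the function name `cloud_fraction` is hypothetical.

```python
def cloud_fraction(rh_percent, slope, offset):
    """Assumed linear ramp: slope * RH/100 - offset, clipped to [0, 1].

    Hypothetical helper; rh_percent is relative humidity in percent.
    """
    frac = slope * rh_percent / 100.0 - offset
    return min(max(frac, 0.0), 1.0)


# Coefficients quoted in this thread (the pairing with the ramp is assumed).
LOWER_LEVELS = (4.0, 3.0)   # ramp leaves zero at 75% RH
HIGH_LEVELS = (2.5, 1.5)    # ramp leaves zero at 60% RH

at_75_lower = cloud_fraction(75.0, *LOWER_LEVELS)   # exactly on the onset
at_60_high = cloud_fraction(60.0, *HIGH_LEVELS)     # exactly on the onset
at_100_lower = cloud_fraction(100.0, *LOWER_LEVELS)
at_100_high = cloud_fraction(100.0, *HIGH_LEVELS)
```

Under this assumed form, both ramps are zero at their onset RH and reach 1 at 100% RH; the high-cloud ramp simply starts earlier and rises more slowly.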
In both cases, the algorithm is trying to estimate the cloud fraction from a single RH value that represents a decent-sized volume. A 3.3 km x 3.3 km x 1 km grid volume is pretty large. If 50% of that volume is at 100% RH (i.e. a cloud) and the rest is at 50% RH, the entire volume will have an RH value less than 100% (in this case 75%). To an observer on the ground, how much of the sky does this one cloud cover? Probably not very much if that volume is high up. The developer used 75% at lower levels as the threshold above which cloud fractions begin to appear. At higher levels, the grid volume gets larger because the vertical spacing increases, so that's my best guess as to why the lower numbers (2.5 and 1.5) are used.
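
The 75% figure in the half-cloudy example is just a volume-weighted average, which can be checked with a few lines of Python. The helper name `volume_mean_rh` is hypothetical, and treating grid-volume RH as a plain weighted mean of sub-volume RH values is a simplification made only to mirror the reasoning above.

```python
def volume_mean_rh(parts):
    """Volume-weighted mean RH from (volume_fraction, rh_percent) pairs.

    Hypothetical helper; a simple weighted average, used for illustration.
    """
    total = sum(fraction for fraction, _ in parts)
    return sum(fraction * rh for fraction, rh in parts) / total


# Half the grid volume is cloud (100% RH), the other half is at 50% RH.
mean_rh = volume_mean_rh([(0.5, 100.0), (0.5, 50.0)])
print(mean_rh)  # 75.0
```

So a grid volume that is half cloud can report exactly the RH at which the lower-level cloud fraction only starts to become non-zero.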
In reality, this product should be used more qualitatively, because you're trying to create information where there isn't any (at sub-grid level). As another option, WRF has several radiation schemes that will produce cloud fractions, but most of the time the values are either 0 or 1.
So, the cloud fraction product is a good product to determine where there are probably clouds, but if you compare against a total sky imager, I have no idea how well the results match up.
(Also, if anything I've typed above is wildly incorrect, and you know the citation for this equation, feel free to chime in and correct me.)
Hope this helps,
Bill