Re: general question on training sets

Rick Giuly

Feb 7, 2013, 8:48:18 PM
to kurt weiss, cyt...@googlegroups.com

Good question. My experience is this: I use (A), including within single
datasets. I prefer to normalize each slice. The brightness tends to vary
while the imaging process is physically running, which creates a very big
difference in the appearance (and the segmentation accuracy) of the slices.
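
To make "normalize each slice" concrete, here is a minimal sketch of one way to do it: linearly rescale every slice to the same mean and standard deviation. This is only an illustration, not Cytoseg code. The function names and the target values (128 and 40) are assumptions made up for the example, and each slice is assumed to be a plain 2D list of 8-bit gray values.

```python
from statistics import fmean, pstdev


def normalize_slice(slice_2d, target_mean=128.0, target_std=40.0):
    """Linearly rescale one slice toward a common mean and standard deviation.

    slice_2d is a list of rows of gray values. The targets are arbitrary
    illustration values, not anything Cytoseg prescribes.
    """
    pixels = [p for row in slice_2d for p in row]
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0:
        # A completely flat slice has no contrast to rescale.
        return [[target_mean for _ in row] for row in slice_2d]
    scale = target_std / std
    # Clip to the 8-bit range; clipping can shift the final mean slightly.
    return [
        [min(255.0, max(0.0, (p - mean) * scale + target_mean)) for p in row]
        for row in slice_2d
    ]


def normalize_stack(stack):
    """Normalize every slice independently, so brightness drift between
    slices is removed before training or segmentation."""
    return [normalize_slice(s) for s in stack]
```

Two slices that show the same structure at different brightness come out the same after this step, which is the point of doing it per slice rather than once per volume.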

(B) could work in theory. You would want about an equal number of
examples from each dataset. You'd also want to keep them as contiguous
as possible: like 10 slices of one and then 10 of the other (not
alternating). That's because it learns about pairs of contours that
are in adjacent planes.
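
The ordering described above can be sketched as follows. This is a hypothetical helper, not part of Cytoseg: the function name, the dataset names, and the idea of returning (dataset, slice index) pairs are all assumptions for illustration. How you then lay the slices out in the training folder depends on your setup.

```python
def build_training_order(slice_counts, block_size=10):
    """Return (dataset_name, slice_index) pairs in contiguous blocks.

    slice_counts maps a dataset name to how many slices it has.
    Each dataset contributes the same number of consecutive slices,
    and the blocks follow one another instead of alternating.
    """
    if not slice_counts:
        return []
    # Equal number from each dataset, limited by the smallest one.
    per_dataset = min(block_size, min(slice_counts.values()))
    order = []
    for name in slice_counts:
        # Consecutive indices keep adjacent planes next to each other.
        order.extend((name, i) for i in range(per_dataset))
    return order


order = build_training_order({"animal_a": 30, "animal_b": 12})
# First ten entries come from animal_a, the next ten from animal_b.
```

Keeping each block consecutive matters more than the exact block size; 10 is just the number from the example above.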

Note that method (B) will present a greater challenge to the classifiers
and probably be less accurate overall.

If I were you, I'd try to match the training data with the data to
process as much as possible. If you can make them look the same with
brightness/contrast adjustments, that's a good way to make it work.
However, if they (the two datasets) cannot be matched, it's probably
best to have different training sets for the different types.

On a related note: I've seen some mitochondria within a given dataset
that are so different from others (for example, in different cells) that
it might be best to treat them as totally different types of objects and
train for them separately.


Best,
-Rick


kurt weiss wrote:
> Hi Rick,
> I would like to use the same training set on all of my data-sets
> (various animals) because (1) making a training set takes time and
> (2) training all cytoseg runs to the same training set should ensure a
> more standardized classifier. However, not all data sets look exactly
> the same. Sometimes mitochondria have very visible cristae, sometimes
> they are very dark, etc. Thus, using the same training set on all data
> sets produces variation in the output accuracy. My question is how to
> best deal with this. Obviously getting the staining and imaging
> protocols down would be the best solution, but aside from that, one
> could either:
> (A) play with contrast/brightness and histograms to process the images to
> the point that they are very, very similar.
> (B) add a few (10?) images from each data set to your training set, thus
> broadening the range of images seen by the classifier/learning algorithm.
> (C) ?
>
> I am curious if you'd favor one approach or the other, and if the second
> approach would work at all. In attempting to do this, I did notice that
> all the images in the training folder must be the same size to make
> Cytoseg run at all, but I am curious if introducing variability here
> will strengthen or weaken the learning algorithm?
>
> As always, thanks for your help.
> Kurt
