Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Remove speckles in bilevel (bw) image

29 views
Skip to first unread message

Laurent S.

unread,
Jun 10, 2009, 10:12:18 PM6/10/09
to

Scanned bw images of print often have tiny speckles that are
"undesirable noise". Despeckling is the operation of
removing them while leaving the rest of the image intact.

Is there a freeware or shareware utility that does this
efficiently?

Conceptually, the problem is simple to solve. For
example, it is nearly a special case of the thin curve
removal problem of the recent thread
"Remove thin curves in image".

But my question is purely practical!

Cheers,

Laurent S.

Raj

unread,
Jun 11, 2009, 5:32:20 AM6/11/09
to

Can you post a sample image, Please? Will help to understand the
problem in hand.

iman islam

unread,
Jun 11, 2009, 2:10:21 PM6/11/09
to
> problem in hand.- Hide quoted text -
>
> - Show quoted text -

Understanding Islam
August 30th, 2008
What you are about to read might sound unusual but it could be very
enlightened.


Understanding Islam

by Fr. Theodore Pulcini
Muslims now constitute a significant minority in Western countries,
most notably France, Britain, Germany, Canada, and the United States.
Consequently, those in the West engaged in theological discourse and
pastoral work can no longer consign Islam to the outer limits of their
universe of religious concerns. Islam is no longer just "over there,"
an exotic feature of distant cultures; it is a well-established
component of our own religious landscape and deserves attention from
all who work to further the Reign of God in our culture.

Having taught courses in Islamic civilization as part of the religious
studies curriculum at both secular and church-related institutions, I
can give ample testimony to the antagonistic images of Islam obtaining
in, and actively perpetuated by, many Western circles. In some cases,
it is alarmism that fuels the antagonism ("Muslims are taking over the
world!"); in others, the indignation of post-modern Westerners who
resent the very existence of a powerful religious tradition which
seems to foster "unenlightened" values ("Islam is intolerant, it
oppresses women, etc."). It is a situation fraught with the real
possibility of bigotry and violence.

As "people of religion," we can be particularly effective in shaping
religious sentiment toward Muslims in our society. We can either stoke
the fires of antagonism, feeding into the dominant societal trend of
"demonizing" Islam and Muslims; or we can fight those fires,
challenging people to come to a well-informed, balanced appreciation
of this "other" in our midst. Most of us, I assume, would affirm the
desirability of the latter option. I would like to offer a few
suggestions as to how that option might be realized.

First, expose the caricatures — both our own and those of others. Such
caricatures are usually based on the assumption that Islam is
monolithic and that Muslim communities are homogeneous. Both
assumptions are false. Just as there are many "Christianities," there
are many "Islams" and most have very little to do with "Islamism,"
that militant, extremist fringe of Islam which, despite its claim to
"traditionalism," actually violates such perennial Islamic values as
tolerance, forbearance, hospitality, and broad-mindedness. A number of
excellent resources can help you in this process — see the attached
reading list. All the recommended authors are Christians who have done
much to dispel the rampant misinformation concerning Islam.

Second, reflect on what underlies our tendency to caricature Islam.
Many in the Christian world have thrown themselves headlong into the
process of challenging the traditional shape of our society and want
to eradicate the very memory of its "oppressive" structures. Modernity
is uncomfortable with the demands of tradition. When Islam presents
itself — unabashedly, unashamedly — as a traditional religion, i.e.,
as a religion based on the structures and values of a traditional
cultural system, those who are shaped by secular culture wince. They
are reminded of what our own communities once affirmed (and in some
quarters, still do affirm) to be true and what was once imposed (and
in some quarters, still is imposed) as obligatory. Moreover, I think
many recognize, even if only reluctantly, that in dismantling the
traditional shape of our religious life, in many ways our religious
communities have been debilitated. Islam’s vitality and self-
confidence reminds us of what we have lost. In short, the growing
strength of Islamic identity and the resurgence in Islamic practice
only serve to underscore the progressive weakening of Christian
identity and the steady diminishment of Christian practice in
secularized Western societies. We resent Islam’s newly found vitality
because it draws attention to our present malaise.

Third, appreciate the practical, external expressions of faith that
typify Islamic life. We have much to learn in this regard from Islam.
A few years ago even Pope John Paul II pointed to the Muslim fast
during the month of Ramadan as an example of the kind of zeal and
discipline Christians should, but today rarely do, bring to Lenten
fasting. Islam also requires regular prayer — at least five times a
day for the observant Muslim. (While at the University of Pittsburgh,
I would regularly chance upon a Muslim student in a quiet corner of a
library "making salat" on a prayer rug.) How many Christians can claim
to set aside time for prayer so regularly? Muslims must give alms
(zakat), not just when they feel moved to do so but as a requisite
part of their religious practice; year by year they return a certain
percentage of their wealth to the community to even up the
inequalities that separate the "haves" from the "have nots." Do we
feel so obliged to bridge the gap between the rich and the poor in our
communities? Islamic life requires pilgrimage, an experience now
largely de-emphasized in modern Christian life. It requires bodily
acts of worship like bowing and prostrating, gestures often dismissed
as archaic to the "sophisticated" modern Christian. In short, for all
of our talk of "incarnational" Christianity, we are becoming a
religion less and less likely to enflesh our religious sentiments in
external expression. We stress thought and emotion over physicality,
enforcing a kind of neo-Gnosticism that sees religion primarily as a
"spiritual" sentiment, having little to do with bodily performance.
This is, I would say, a most unfortunate trend. Islam reminds us of
the need for physical religious enactment.

Fourth, highlight the Islamic emphasis on community life and on the
individual’s accountability to community standards. As Christianity in
the Western world becomes more atomized and Christian spirituality
more privatized, Islam provides a strong testimony to the power of
community. One of my Muslim students once remarked, "Wherever I go,
whether in the Islamic world or outside it, even if I cannot find a
local community of Muslims, I am always aware that I am part of a
worldwide community. This is always at the forefront of my mind. It
forges my whole identity. It guides my every action. The umma [Islamic
community] gives me strength, and I willingly give it my loyalty." In
a culture where commitment to religious community is becoming
increasingly rare, and accountability of any sort (whether to a
religious tradition or any other "authority") is seen almost as an
infringement of personal rights, the communocentric emphasis of Islam
can seem somewhat archaic. It should, however, challenge us Christians
in particular to revitalize our communal structures, even if that
means drawing boundaries between ourselves and "the world," boundaries
that have been blurred by encroaching secularization. In re-thinking
our definition of religious communities and re-shaping the dynamics of
life within them, we can learn some valuable lessons from the Muslim
experience.

Fifth, use dialogue with Islam as a way not only to increase our
appreciation of the Islamic tradition but also to deepen our
appreciation of the distinctive features of our own. Make no mistake
about it: despite sizeable areas of "common ground," there is a wide
theological chasm between Islam and Christianity. It was largely in
reaction to an often distorted presentation of Christian doctrine that
Islam formed its own doctrinal heritage. Islamic doctrine challenges
us to embrace anew those facets of Christian theology which
differentiate us from Muslims — especially the mystery of the Trinity
and the divine Sonship of Christ — and then to find new and ever more
insightful ways of articulating these dogmas. Simple repetition of
traditional formulas usually does not suffice to foster greater
understanding of Christianity among Muslims (or among Christians, for
that matter)! In questioning the central Christian doctrines, Islam
serves us well: it requires us to focus specifically on those
distinctive beliefs that are constitutive of our view of God and the
world and to find more effective ways of proclaiming and explaining
them both to those within the "household of Christianity" and to those
without.

Sixth, and finally, make personal contact with Muslim communities and
individuals. It is much more difficult to caricature people we know
than those we keep at a distance. Call the local Islamic center and
ask to be put on the mailing list. These centers often sponsor
lectures of public interest; attend one and talk to members of the
host community. Groups from the mosque and your church may want to
exchange visits. Social service programs can provide opportunities for
mosque and church to join together in a common cause. The
possibilities for such encounters abound and, if realized, usually
bear much good fruit.

Conclusion: On their course evaluation forms, two students in my
"Introduction to Islamic Civilization" wrote remarks that I found
especially gratifying. The first wrote, "When I signed up for this
course, I had nothing but disdain for Muslims; now I am actually able
to see the beauty of their religion." The other wrote, "Studying Islam
has made me better able to see what it means for me to say that I am
Christian." These students articulated well what I consider the two
main reasons for us to come to an appreciation of Islam. Doing so will
enable us not only to affirm this important "other" in our midst and
but also to clarify our own identity.


———————


For more information about Islam

http://english.islamway.com/

http://www.islamhouse.com/

http://www.discoverislam.com/

http://www.islambasics.com/index.php

http://english.islamway.com/

http://www.islamtoday.net/english/

http://www.islamweb.net/ver2/MainPage/indexe.php

http://www.sultan.org/

Contact Us At

Imanwa...@gmail.com

ImageAnalyst

unread,
Jun 11, 2009, 4:09:28 PM6/11/09
to

------------------------------------------------------------------------------
Laurent S.
Several ways of approaching it. You can do a morphological closing
operation. Or, if you really don't want to change at all the larger
particles, then you can do an erosion and use that image as a marker
image for morphological reconstruction. Like you said, it's like the
thin curve problem expect that I don't really see it as a special
case. As I see it, they're identical - the shape of the small blobs
doesn't matter. Specks or lines, it's all the same process.

Another option (if "bw" is a black and white binary image) would be to
just find all the blobs and after their size is measured, just loop
through tossing out the blobs that are too small. If "bw" is a
monochrome gray scale image then you might try a median filter, which
is intended to get rid of impulsive noise like spikes, dots, specks,
dead pixels, that sort of thing (basically small isolated noise
pixels).
Regards,
ImageAnalyst

Laurent S.

unread,
Jun 12, 2009, 2:24:59 AM6/12/09
to

Raj requests:

> Can you post a sample image, Please? Will help to
> understand the problem in hand.

A typical image needing despeckling arises from a bw
(= bilevel, black-white) scan of print.
A black speckle is a small connected
black region completely surrounded by white pixels.
White speckles are similarly defined.
Small means of diameter not greater than some specified value.
One-pixel speckles are already very important in practice.

Here are some isolated examples snipped out of bw scans at
400dpi:

http://topo.math.u-psud.fr/~slc/speckle_samples/

Small would ideally mean of diameter definitely
less than that of any character (glyph).

I hope this clarifies the problem -- also for ImageAnalyst.

Laurent S.

Hesham

unread,
Jun 12, 2009, 10:02:36 AM6/12/09
to
On 12 June, 07:24, lcs7...@gmail.com (Laurent S.) wrote:
> Here are some isolated examples snipped out of bw scans at
> 400dpi:
>
>      http://topo.math.u-psud.fr/~slc/speckle_samples/
>

Have you tried the median filter?

see the output of the files you'd uploaded after applying the median
filter:
http://i43.tinypic.com/33jtrsz.jpg
http://i39.tinypic.com/2jfj3pi.jpg
http://i39.tinypic.com/2a1ell.jpg
http://i43.tinypic.com/2db5q9c.jpg

aruzinsky

unread,
Jun 12, 2009, 11:00:56 AM6/12/09
to

Hesham

unread,
Jun 12, 2009, 1:37:16 PM6/12/09
to

Apparently without Gaussian weighting the result is better :-)
Probably Laurent wants to use an OCR on his images (?), for that I
suppose a simple and fast algorithm does the job.

Laurent S.

unread,
Jun 12, 2009, 10:29:34 PM6/12/09
to

ImageAnalyst&Hesham and then Aruzinsky
successively suggested and illustrated despeckling
by use of median filtering and then Gaussian
filtering. Thanks for that!

The 3x3 median filtering eliminates both black
and white one-pixel speckles (neglecting effects on
pixels at edge of the image).

But I don't like 3x3 median filtering because,
unfortunately, it rounds

000000000000000000000000000000000000000000000000000000
XXXXXXXXXX000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXX000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXX000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000000000000000000000000000000000000000000000000

to

000000000000000000000000000000000000000000000000000000
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXX000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000000000000000000000000000000000000000000000000

It also erases any thin line like

000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000

even when long and hence NOT a speckle.

And it breaks an isthmus like

000000000000000000000000000000000000000000000000000000
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000000000000000000000000000000000000000000000000

yielding

000000000000000000000000000000000000000000000000000000
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXX000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXX00000XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
000000000000000000000000000000000000000000000000000000

giving a highly visible topological change.

For numerous such reasons I have always rejected 3x3
median filtering and favored true despeckling by
"bit whacking" on bw bitmaps.

Incidentally, Aruzinsky uses a more "sympathetic"
median filter along the image edges to eliminate
unpleasant edge effects met by Hesham. So, in

http://topo.math.u-psud.fr/~slc/speckle_samples/

I have added Aruzinsky's median filtered versions of
my samples -- distinguished by the length 4 suffix
"_med".

More importantly, Aruzinsky proposes replacing
3x3 median filtering by (grayscale) convolution with
the Gaussian kernel of standard deviation 0.7 pixel
--- followed by threshholding to bw at .5 gray. (Is
my technical interpretation correct?)

Aruzinsky's 3 results for the samples at the
above URL are now distinguished there by the suffix
"_bw_g7". They are quite satisfactory for practical
use in digitization for machine and human reading.
So I am eager to make more Gaussian tests before
resorting to bw bitwhacking to do exact despeckling.

Hence my questions for Aruzinsky and others:

(1) What exactly is the sd 0.7 discrete Gaussian
kernel for convolving a grayscale bitmap?

(2) Where all can one find freely available
precompiled convolvers for grayscale bitmaps?
And which are the best?

(3) What is a typical time for one such convolution
of a 4000 x 6000 pixel grayscale bitmap?

Thanks for all feedback!

Laurent S.

Laurent S.

unread,
Jun 12, 2009, 11:13:12 PM6/12/09
to


ImageAnalyst suggested:

> ... just find all the blobs and ...

Trivial :) -- but laborious.
Is this a module available in some
free and precompiled library?

Laurent S.

aruzinsky

unread,
Jun 13, 2009, 11:42:21 AM6/13/09
to

No, there was no Gaussian convolution in my examples. I used a
WEIGHTED median filter with Gaussian wieghts. Google "WEIGHTED median
filter" . With a std. dev < 1, it kinda sorta has the effect of what
a regular median filter with a smaller than 3x3 window would have if
that were possible which it isn't. It is more likely to preserve thin
lines.

ImageAnalyst

unread,
Jun 13, 2009, 3:16:14 PM6/13/09
to

-------------------------------------------------------------
Laurent S:
Not laborious at all. For the second method, I could to it in 4 lines
in MATLAB, fewer lines than you just wrote. Do you have MATLAB? Just
do this:
threshold your image (skip this is it's already binary)
call bwlabel()
call regionprops()
call ismember() to do the size filtering
You're done! Not hard at all. 3 or 4 lines. Let me know if you have
MATLAB and need nore help in figuring out how to call the above.

The first method is also 2-4 lines of code.
Good luck,
ImageAnalyst

Laurent S.

unread,
Jun 15, 2009, 3:31:43 AM6/15/09
to


Aruzinsky writes:

> No, there was no Gaussian convolution in my examples. I used a
> WEIGHTED median filter with Gaussian wieghts.

No explicit convolution I agree. However I suspect
(and assumed) that for BITONAL images
any "weighted median filter" is equivalent to a convolution with
a kernel function that is essentially your weighting --
followed by 50% threshholding.

So my question about kernel amounts to :

What are the coefficients of the
your 0.7 sd Gaussian weighting?


> It is more likely to preserve thin lines.

Certainly worth testing. Could you try both
--- 3 x 3 median, and
--- 0.7 sd Gaussian weighted median
on the new test file called "bw_03" in

http://topo.math.u-psud.fr/~slc/speckle_samples/

please?

That test file illustrates narrow isthmuses as well as thin curves.
Ithmuses are common in fonts with serifs, and will be thin in
small but vital characters such as second order subscripts in
scanned math.

Cheers

Laurent S.

aruzinsky

unread,
Jun 15, 2009, 11:12:11 AM6/15/09
to

I typically don't deal with binary images and I had to convert your
images to 8 bit/channel, 3 channel using Irfranview.

http://www.general-cathexis.com/images/bw_03Median.png
http://www.general-cathexis.com/images/bw_03Median0.7.png
http://www.general-cathexis.com/images/bw_03Median0.675.png
http://www.general-cathexis.com/images/bw_03Median0.625.png


If I were you, I would try the following because it is theoretically
much better:

1. Blur image A to get B (floating point)
2. Get gradients of B
3. For each pixel, get asymmetric Gaussian weight kernels elongated in
the gradient directions of the B.
4. Apply weighted median with weights obtained in 3. to A.

aruzinsky

unread,
Jun 15, 2009, 11:13:59 AM6/15/09
to
> http://www.general-cathexis.com/images/bw_03Median.pnghttp://www.general-cathexis.com/images/bw_03Median0.7.pnghttp://www.general-cathexis.com/images/bw_03Median0.675.pnghttp://www.general-cathexis.com/images/bw_03Median0.625.png

>
> If I were you, I would try the following because it is theoretically
> much better:
>
> 1. Blur image A to get B (floating point)
> 2. Get gradients of B
> 3. For each pixel, get asymmetric Gaussian weight kernels elongated in
> the gradient directions of the B.
> 4. Apply weighted median with weights obtained in 3. to A.- Hide quoted text -

>
> - Show quoted text -

CORRECTION:

3. For each pixel, get asymmetric Gaussian weight kernels shortened in

Jens Dierks

unread,
Jun 15, 2009, 6:48:17 PM6/15/09
to
Laurent Swrote:
> Aruzinsky writes:
>
> > No, there was no Gaussian convolution in my examples. I used a
> > WEIGHTED median filter with Gaussian wieghts.
>
> No explicit convolution I agree. However I suspect
> (and assumed) that for BITONAL images
> any "weighted median filter" is equivalent to a convolution with
> a kernel function that is essentially your weighting --
> followed by 50% threshholding.

I dont think that this is the same, weighted Median means that the
distances to the center of the sorted values are modified, not the
values itself.

> Certainly worth testing. Could you try both
> --- 3 x 3 median, and
> --- 0.7 sd Gaussian weighted median
> on the new test file called "bw_03" in
>
> http://topo.math.u-psud.fr/~slc/speckle_samples/
>
> please?
>
> That test file illustrates narrow isthmuses as well as thin curves.
> Ithmuses are common in fonts with serifs, and will be thin in
> small but vital characters such as second order subscripts in
> scanned math.

maybe the medhybrid Filter, described in:
http://www.cs.tau.ac.il/~turkel/notes/meanmed.pdf

would give pleaseable results?

But aruzinskys asymmetric Gaussian weight kernels should also
be quite good.

Laurent S.

unread,
Jun 16, 2009, 1:54:24 AM6/16/09
to


Hello again Aruzinsky,

Thanks for processing my example "bw_03"

I have posted these files at

http://topo.math.u-psud.fr/~slc/speckle_samples/

in bw TIFF Group 4 format. Is 6.25 the lowest sd that
assures suppression of one-pixel "speckles"? Testing
"black_01" would give the answer.


Your new suggestions:

> 1. Blur image A to get B (floating point)

> 2. Get gradients of B

> 3. For each pixel, get asymmetric Gaussian weight

> kernels shortened in the gradient directions of
> the B.

> 4. Apply weighted median with weights obtained
> in 3. to A.

seem promising in view of the above tested example
"bw_03". What tools would you recommend for steps
(2), (3), (4)?

My main worry is that the whole process may now be
more complex and slower than the blob-sorting approach
suggested by ImageAnalyst. In compensation, it may
do some independently useful smoothing of
"jaggies".

> I typically don't deal with binary images
> and I had to convert your
> images to 8 bit/channel, 3 channel using Irfranview.

Well, "8 bit/channel, 3 channel" is the big market.
And, the few utilities I have, tend to process nothing
else, even at a substantial and unnecessary sacrifice of
speed and RAM.

On the other hand bw (black-white, bitonal) is best and
dominant for scanned print. That is mostly because of
the excellent compression schemes available for bw,
both faithful and lossy. There are millions of pages
already scanned in bw. And most of them are marred by
"speckles" (as I have defined them).

Thanks again for all the help.

Cheers

Laurent S.

aruzinsky

unread,
Jun 16, 2009, 10:39:58 AM6/16/09
to
On Jun 15, 11:54 pm, lcs7...@gmail.com (Laurent S.) wrote:
> Hello again Aruzinsky,
>
>      Thanks for processing my example "bw_03"
>
>  >http://www.general-cathexis.com/images/bw_03Median.png
>  >http://www.general-cathexis.com/images/bw_03Median0.7.png
>  >http://www.general-cathexis.com/images/bw_03Median0.675.png
>  >http://www.general-cathexis.com/images/bw_03Median0.625.png
>
> I have posted these files at
>
>      http://topo.math.u-psud.fr/~slc/speckle_samples/
>
> in bw TIFF Group 4 format. Is 6.25 the lowest sd that
> assures suppression of one-pixel "speckles"? Testing
> "black_01" would give the answer.
>

0.7 is the lowest that assures removal of 1 pixel dots.

>      Your new suggestions:
>
>  > 1. Blur image A to get B (floating point)
>
>  > 2. Get gradients of B
>
>  > 3. For each pixel, get asymmetric Gaussian weight
>  > kernels shortened in the gradient directions of
>  > the B.
>
>  > 4. Apply weighted median with weights obtained
>  > in 3. to A.
>
> seem promising in view of the above tested example
> "bw_03".  What tools would you recommend for steps
> (2), (3), (4)?
>

A Visual C++ compiler.

> My main worry is that the whole process may now be
> more complex and slower than the blob-sorting approach
> suggested by ImageAnalyst.  In compensation, it may
> do some independently useful smoothing of
> "jaggies".
>

If you can live with a non-binary result, then isophote smoothing will
work. For some unknown reason, it removes 4x4 but not 1x1 dots, so I
had to enlarge 4X using a box kernel before applying.

Before: http://www.general-cathexis.com/images/black_024X.png
After: http://www.general-cathexis.com/images/black_024XIsophoteSmoothing.png

You can do this with my software that I sell in the root directory of
the above links.

ImageAnalyst

unread,
Jun 16, 2009, 9:52:19 PM6/16/09
to

----------------------------------------------------------------------------------------
Laurent S.:
Simple blob analysis avoids all those spatial filtering problems. If
you know the area of the smallest area you want to keep, then it just
simply throws out all the small shapes and leaves the larger shapes
untouched. Do you have MATLAB? I could write a demo using the image
of your choice in just a few lines.
Regards,
ImageAnalyst

Laurent S.

unread,
Jun 17, 2009, 1:08:19 AM6/17/09
to

Hi all,

At this point we have three or four good candidates for
despeckling bw scanned print --- the more subtle of which
may also do some desirable smoothing.

ImageAnalyst would like to test some typical
examples. I'll have to think twice to present examples
that have no copyright restrictions. Suppression of
speckles of =< 3 pixels will solve most problems. Even =<
1 is OK. Commercial jobs of 100K pages are not uncommon,
so speed is very important; so please give timings.

A small artificial test example of interest for all candidates
is the file "bw_04.tiff" at

http://topo.math.u-psud.fr/~slc/speckle_samples/.

All contestants are cordially invited!

Cheers

Laurent S.

Laurent S.

unread,
Jun 17, 2009, 1:12:11 AM6/17/09
to

Jens Dierks recommends the "MEDHYBRID FILTER":

> maybe the medhybrid Filter, described in:
>
> http://www.cs.tau.ac.il/~turkel/notes/meanmed.pdf
>

> would give pleasant results?

Thanks for the 'heads-up'. Turkel offers a wealth of
material in .../notes/...

Here is the relevant passage from his "meanmed.pdf"

|This is a edge-preserving median filter. It's a three-step
| ranking operation. In a 5x5 area pixels are ranked in two
| different groups as shown below.

| aa bb aa
| aa bb aa
| bb bb ab bb bb
| aa bb aa
| aa bb aa

| The median value of group 'a'
| and 'b' and the central pixel 'ab' is used as the new pixel
| value.

I do not understand exactly what Turkel is
proposing here. Is the group 'a' the pixels in the
'X shaped' subarea

aa aa
aa aa
aa
aa aa
aa aa

And is group 'b' similarly the '+ shaped' subarea? If so, we
presumably are required to take the median value of group 'a'
and also that of of group 'b'. But then what's to do?


| This filter can be used to remove 'shot' noise (in
| which individual pixels are corrupted or missing from an
| image) or to reduce random noise in the context of
| averaging.

| This median filter overcomes the tendancy of the other median
| filters (median3, mediannn, medtrunc) to erase lines which are
| narrower than the half-width of the neighborhood and to round
| corners.

Sounds like the features we want.

Is this (once understood) trivial to program within MATLAB?
As ImageAnalyst's recipe is!

Cheers

Laurent S.

ImageAnalyst

unread,
Jun 17, 2009, 11:20:05 AM6/17/09
to
If I gave you MATLAB code, would you be able to run it? I.e., do you
have MATLAB?

Jens Dierks

unread,
Jun 17, 2009, 11:52:20 AM6/17/09
to
Laurent S. wrote:
> Thanks for the 'heads-up'. Turkel offers a wealth of
> material in .../notes/...

Yes, thanks for the hint.

...


> I do not understand exactly what Turkel is
> proposing here. Is the group 'a' the pixels in the
> 'X shaped' subarea
>
> aa aa
> aa aa
> aa
> aa aa
> aa aa
>
> And is group 'b' similarly the '+ shaped' subarea?

Thats what i think.



> If so, we
> presumably are required to take the median value of group 'a'
> and also that of of group 'b'. But then what's to do?

And then the median from this two values plus the center value
becomes the new value, but i am also not really shure.

This way, the result for bw_04 would be:
http://freenet-homepage.de/JDierks/tmp/bw_04_medhybrid.png

Because it erases very much, here is a version with a center weight
of 3 for Group A and B:
http://freenet-homepage.de/JDierks/tmp/bw_04_medhybrid3Center.png

...

> Sounds like the features we want.

Maybe i misinterprated the paper, because it is somewhat unclear,
but i made also a very easy function, that looks in the 8 directions
from the center pixel and counts how many pixels have the same
value as the center and if so, one more pixel in this direction is taken
into account.
If the amount is at least 3, including the center pixel, the value stays,
otherwise it changes (so this function makes only sense for 2 level
images):
http://freenet-homepage.de/JDierks/tmp/bw_04_least3.png
And 2 iterations of it:
http://freenet-homepage.de/JDierks/tmp/bw_04_least3_2it.png

For comparison here are the results for a 3x3 Median with a center
weight of 5, that should come close to the gaussian07 weighted
median(?):
http://freenet-homepage.de/JDierks/tmp/bw_04_med9Cen5.png
And 2 iterations:
http://freenet-homepage.de/JDierks/tmp/bw_04_med9Cen5_2it.png

It erases more values from the tips of the letters.

> Is this (once understood) trivial to program within MATLAB?
> As ImageAnalyst's recipe is!

Cant tell you really, because i dont use matlab. But it should be
not very complicated.

Jens

Laurent S.

unread,
Jun 17, 2009, 10:03:41 PM6/17/09
to


Hello Jens Dierks,

It seems you are on the right track (best
performance yet) with both "bw_04_least3" and
"bw_04_med9Cen5". I have added all your tests based on
my "bw_04" to

http://topo.math.u-psud.fr/~slc/speckle_samples/

You are too modest in stating:

> that [ i.e., the test "bw_04_med9Cen5" ]


> should come close to the gaussian07 weighted
> median

since it is incomparably better than Arusinsky's
test "bw_03Median0.7" which uses his weighted sd
0.7 Gaussian weighted median filter.

What tools are you using? Hopefully freely
available?

My original plan was to use nothing but C to
eliminate rather small speckles --- with no other
smoothing/rounding effects whatever. Maybe you are
close to that goal?

Also, I recently wrote;

> At this point we have three or four good candidates for
> despeckling bw scanned print --- the more subtle of which
> may also do some desirable smoothing.

But, so far, I see no desirable smoothing (beyond
despeckling) in the tests presented so far. That
suggests we focus on despeckling alone --- seeking
speed, simplicity, and ready availability.

Cheers

Laurent S.

Laurent S.

unread,
Jun 17, 2009, 10:06:55 PM6/17/09
to

Re: Remove speckles in bilevel (bw) image

ImageAnalyst writes:

> If I gave you MATLAB code, would you be able to run it? I.e., do you
> have MATLAB?

It appears that I can (mod some delay) get temporary access to
MATLAB.

So MATLAB code from you could at least let me realisticly test
MATLAB's potential for despeckling. That would be great :=)

Thanks,

Laurent S.

Jens Dierks

unread,
Jun 19, 2009, 12:40:02 PM6/19/09
to
Laurent S. wrote:
> What tools are you using? Hopefully freely
> available?

I use Delphi (pascal), but if you have some experience
with c and image processing, this shouldnt be a big
problem.

> My original plan was to use nothing but C to
> eliminate rather small speckles --- with no other
> smoothing/rounding effects whatever. Maybe you are
> close to that goal?

You can extend the least3 function do greater sizes,
dependend on how big artefacts you want to remove:
Starting from the middle pixel, count all connected
pixel with the same color until you reach the minimum
count for letting the value as is.
Then go on to the next pixel with a different color.
If the minimum count isnt reached, change the color
of the middle pixel (and the next pixels, if they have
the same color).
Because the issue of no changing should be the normal
case, the main procedure is reading and skipping pixels.
No speed problems, beside reading and writing of the
files itself.
And you can do the changing on the fly, no need to
make a copy. And in the case of changing pixels:
change all connected pixels, so you dont have to look
in the direction of prior processed pixels, what makes
the code even easier.

> Also, I recently wrote;
>
> > At this point we have three or four good candidates for
> > despeckling bw scanned print --- the more subtle of which
> > may also do some desirable smoothing.
>
> But, so far, I see no desirable smoothing (beyond
> despeckling) in the tests presented so far. That
> suggests we focus on despeckling alone --- seeking
> speed, simplicity, and ready availability.

I dont know what the OCR prefers, you can do a gaussian
convolution + thresholding to smooth and maybe to
make the letters more bold.

Have you tried these things in matlab so far?
If it is fast enough, no need to program it yourself.

Jens

ImageAnalyst

unread,
Jun 19, 2009, 9:36:15 PM6/19/09
to

=======================================================
Laurent S.:
Here's some MATLAB code. No filtering. Just simply throws out
objects smaller than 180 pixels. Blobs bigger than that have their
original shape, except that I did a hole filling on it to fill a white
speck in the huge letter 2. Unfortunately this also filled in the
thin circle. Maybe you could fill after you did an area threshold.
But that big circle could actually be a big letter O. And your
slanted lines could actually be the letter "l" or number "1." So it
looks like simple filtering based on size won't work - you're going to
have to do full-blown OCR on it. Even then there are judgement calls,
like the circle - is it noise to be thrown out or is it a legitimate
character to be kept?
Regards,
ImageAnalyst
P.S. if you see 3D after the equals signs, then go to Google groups
and download the code from there (it handles mime while some
newsreaders don't)

clc;
clear all;
disp('Running BlobsDemo.m...');
originalImage = imread('C:\Documents and Settings\username\My Documents
\Temporary stuff\Letter pictures\bw_04_least3_2it.tif'); % Read in
demo image

% binaryImage = im2bw(originalImage, 0.4); % Threshold to binary
thresholdValue = 0;
binaryImage = originalImage == thresholdValue; % Alternate way
to threshold.
binaryImage = imfill(binaryImage, 'holes');

subplot(3,2,1); imagesc(originalImage); colormap(gray(256)); title
('Original Image');
subplot(3,2,2); imagesc(binaryImage); colormap(gray(256)); title
('Binary Image');

labeledImage = bwlabel(binaryImage, 8); % Label each blob so we
can make measurements of it
coloredLabels = label2rgb (labeledImage, 'hsv', 'k', 'shuffle'); %
pseudo random color labels

% Save the labeled image as a tif image.
% imwrite(uint16(labeledImage), 'tif_labels.tif');
% tifimage = imread('tif_labels.tif');
% imtool(tifimage, []); % Use imtool so she can inspect the values.

subplot(3,2,3); imagesc(labeledImage); title('Labeled Image');
subplot(3,2,4); imagesc(coloredLabels); title('Pseudo colored
labels');

blobMeasurements = regionprops(labeledImage, 'all'); % Get all the
blob properties.
numberOfBlobs = size(blobMeasurements, 1);
areas = [blobMeasurements.Area];
minimumAreaInPixels = 180;
% Get a list of the blobs that are fairly circular or oval with a
circularity < 1.3.
indexes = find(areas > minimumAreaInPixels);
bigBlobsImage = ismember(labeledImage, indexes);

labeledImage = bwlabel(bigBlobsImage, 8); % Label each blob so we
can make measurements of it
coloredLabels = label2rgb (labeledImage, 'hsv', 'k', 'shuffle'); %
pseudo random color labels
subplot(3,2,5); imagesc(labeledImage);
caption = sprintf('Labeled image with blobs > %d pixels',
minimumAreaInPixels);
title(caption);
subplot(3,2,6); imagesc(coloredLabels);
caption = sprintf('Pseudo colored labels image with blobs > %d
pixels', minimumAreaInPixels);
title(caption);

blobMeasurements = regionprops(labeledImage, 'all'); % Get all the
blob properties.
numberOfBlobs = size(blobMeasurements, 1);

fprintf(1,'Blob # Mean Intensity Area Perimeter Centroid
\n');
for k = 1 : numberOfBlobs % Loop through all blobs.
% Find the mean of each blob. (R2008a has a better way where you can
pass the original image
% directly into regionprops. The way below works for all versions
including earlier versions.)
thisBlobsPixels = blobMeasurements(k).PixelIdxList; % Get list of
pixels in current blob.
meanGL = mean(originalImage(thisBlobsPixels)); % Find
mean intensity (in original image!)
blobArea = blobMeasurements(k).Area; % Get area.
blobPerimeter = blobMeasurements(k).Perimeter; % Get perimeter.
blobCentroid = blobMeasurements(k).Centroid; % Get centroid.
fprintf(1,'#%d %18.1f %11.1f %8.1f %8.1f %8.1f\n', k, meanGL,
blobArea, blobPerimeter, blobCentroid);
end
% Sort by area

set(gcf, 'Position', get(0, 'ScreenSize')); % Maximize figure.
msgbox('Finished running BlobsDemo.m. Check out the figure window and
the command window for the results.');

Laurent S.

unread,
Jun 20, 2009, 12:02:30 AM6/20/09
to

Hi all,

I recently wrote:

> That suggests we focus on despeckling
> alone --- seeking speed, simplicity, and
> ready availability.

Here I describe a despeckling algorithm for bw
images that I have had in mind. It concentrate on a
concept called 'rampart' and relates it to 'small
blobs' (= bloblets = speckles). It remains to be
seen whether it has advantages over a direct quest
for for blobs of the sort that ImageAnalyst has
mastered in MATLAB.

Recall the definition of a black blob.
It is a connected component of the union B of
the black pixels, each pixel being viewed as
the closed unit square centered at an integer
point of the plane R^2. Similarly define
white blobs.

For today, I limit discussion to
elimination of black blobs of =< 3 pixels.
They are considered 'bad' and are to be
eliminated by painting them white. This is
already an extremely useful operation when
applied to scanned bw print. I may discuss
elimination of bigger blobs at a later time;
most of the methods to be used here do apply,
but enhancements of technique and
reformulation of results seem necessary.

There are (up to isometry) just a few
sorts of bad blobs (use a constant width font
here):

One-pixel: X

Two-pixel: XX X
X
Three-pixel:

X xxx X X X X
X XX X X
X X

Happily none of these bad blobs encloses a
white blob. So naive elimination of such blobs
has no impact on the white blobs.


Black blobs of >= 4 pixels may enclose a
white blob, for example the black blob

X
X X
X

encloses a one-pixel white blob. Naive
elimination of it would eliminate the
white blob, puting us in a foolish quandry...


For a fixed epsilon that is > 0 but very
small, consider the epsilon neighborhood N(B) of
the union B of all black pixels; its
frontier E(B) is a finite disjoint union of
embedded piecewise smooth cyclic paths
(often with corners). This E(B) lies
entirely in the interior of the union W of
white pixels. Call E(B) the 'external
rampart' of B. It projects in a standard way
onto the usual topological frontier F(B) of
B. The projection map

p_B: E(B) --> F(B)

is an immersion, i.e. surjective (= onto) locally
one-to-one. F(B) is a (possibly singular)
immersed curve and N(B) - B is naturally
identified to E(B) \times [0,1). Pause here to
draw some pictures!

This description is quite general; it can be
adapted to the more general situation where the
pixels are allowed to be the 2-cells of any
piecewise linear tesselation R^2.

Similarly, for the union W of white
pixels, one has N(W), F(W), E(W) and p_W.

These structures are canonical (i.e.
involve no choice). They can be similarly defined
for any connected union of pixels such as a
connected component of B. They are also
computable, even locally; but we will only
have to compute little local pieces of them.

Note that, in every bad blob B_0, every
pixel meets the frontier F(B_0) and the
external rampart E(B_0) is connected. We will
use (without proof) a converse as follows.

Consider a connected component of B_0 of B,
and suppose that a component loop C of the
rampart E(B_0) is outermost. Here C being
outermost means that the component of R^2 - B_0
containing C is unbounded.

PROPOSITION. With this data, suppose that p_B(C)
touches =< 3 pixels of B_0. Then p_B(C) touches all
pixels of B_0, and C = E(B_0). In particular B_0 is
a bad blob.

We can now outline the key despeckling
operation. Given a point q in the external
rampart E(B_0) of a connected component B_0 of B,
we calculate a component loop C_q in
E(B_0) starting at q. In doing so we easily
determine the distinct black pixels of B_0
touched by the projection p_B(C_q); likewise for
any initial segment of p_B(C_q).

Suppose that C_q is known to be the
outermost loop of the boundary E(B_0) of N(B_0);
a test for this will appear later. Suppose also
that p_{B_0}(C_q) lies in at most 3 pixels of
B_0. Then, according to the Proposition above,
B_0 is a bad blob. We then will suppress the
black blob B_0 by painting it white, and suitably
restart the process with another choice of q. If,
however, some initial segment of p_{C_q} touches
>= 4 black pixels, then we will immediately stop
calculating C_q, and restart the despeckling
process elsewhere.

We now need an efficient scheme for
repeatedly restarting the above process from a
suitable point q so that all the bad blobs are
ultimately eliminated. For this we view the
bitmap as a page of lines made of pixels, each
line breaking up as an alternating sequence of
black 'words' and white 'words'. Then we proceed
in 'latin prose order' through the bitmap.

Inductively, we can suppose that there remain
no bad blobs that involve any of the first N
pixels in that latin ordering. Consider the next
black word b (if any) that consists of =< 3 black
pixels. We must enquire whether it belongs to a
bad blob and, if so, suppress that blob. If it
does belong, then by the inductive hypothesis the
pixels touching b from above are all white and we
then choose the (unique!) q in E(B) projecting to
the top left corner of the black word b. Now, C_q
and the unbounded horizontal line through q
clearly both lie in the unbounded component of
R^2 - B. The Jordan curve theorem and a little
geometry then tells us that the unique bounded
connected component of (R^2 - C_q) necessarily is
the other one, which necessarily contains
p_B(C_q). Hence the blob containing p_B(C_q) is
also the black blob to which the black word b
belongs. If p_B(C_q) touches =< 3 pixels, then by
the above Proposition, these pixels form a whole
bad blob, and we immediately paint it white. If,
on the other hand, it touches >= 4, then
calculation of p_B(C_q) was aborted as soon as
possible. In either case, after a possible
suppression of a bad blob, we know that there is
no longer any bad blob that involves any of the
first N' pixels where N' > N is the serial number
attached to the first pixel of the first
remaining black word following b. Thus, after
finitely many induction steps, there will be no
more bad blobs.

I hope that this algorithm is correct (or
easily correctable :) and nearly optimal in
simplicity and speed.

It would be interesting to test a C-compiled
implementation against the similar despeckling
operations already available to members of this
group, notably those in MATLAB and those involving
median filtering.

I have not yet implemented this algorithm.
Programming and compiler operation is hard
work! If, as suspected from the outset, I
have just reinvented a pre-existing wheel
design, then I hope that clearly better wheels
in the form of ready-to-use binaries will show
up soon!

Apologies in advance for obscurities and gaps
in exposition.

Comments, questions, welcome!

Cheers,

Laurent S.

Laurent S.

unread,
Jun 21, 2009, 1:45:38 AM6/21/09
to

Yesterday I wrote:

> I may discuss elimination of bigger blobs at a
> later time; most of the methods to be used here
> do apply, but enhancements of technique and
> reformulation of results seem necessary.

Such a discussion seems to fall into place
without a serious hitch provided we agree that
the suppression of a black blob B_0 in general
entails painting in white somewhat more than the
pixels of B_0. Namely we consider the outermost
circle component C of the external rampart
E(B_0) of B_0. The unbounded connected component
of R^2 - C is is disjoint from B_0 and hence
homeomorphic to the open annulus C \times R, as
is the connected component of R^2 - B_0
containing C. The other component of R^2 - C
contains B_0 and has closure in R^2 that is a
2-disk D; it contains B_0 and perhaps some of the
other connected components, say B_1, B_2, ..., of
B. Assuming that B_0 turns out to be small, we
will paint white not just B_0 but \D \cap B which
is all of the connected components of B that
happen to lie in D.

One gets a clearer picture of B1, B_2,... by
considering the frontier loops C', C'',... of
N(B_0) that are distinct from the outermost loop
C. They are the boundaries of disjoint 2-disks
D', D'',... that we call the 'alveoles' of B_0
each disjoint from B_0. The connected components
of B that lie in some one of the alveoles D',
D'', ... are the components of B that get painted
white along with B_0.

This picture is not really useful to our
computer because the painting operation occurs
after it has calculated C but before it has
calculated C', C''. Fortunately, given a
calculation of C there are calculations
(e.g. row-by-row) of the pixels within C
--- well known in computer graphics programs.
Incidentally we have essentially described a slow
one yesterday. Namely use the projection
p_B(C) to paint white the pixels that this singular
path touches. This can be viewed as eroding D and
the collection of pixels in it, strictly reducing
the number of those pixels. One can iterate this
sort of erosion until all pixels in D are painted
white.

This more vigorous blob suppression is
reasonable because each of the other connected
components of B is always no larger than B_0
according to several common measures of size (but
not by all, e.g. perimeter!). The most convenient
measure of size in the present context seems to
be 'bounding box size'. The bounding box of a
compact set in R^2 is the least rectangle
containing it and having sides parallel to the
coordinate axes.

Lemma 1. Let D be a 2-disk embedded in R^2
with boundary loop C. Then the bounding
box \bbox C of C in R^2 is exactly the bounding
box \bbox D of D.\qed


Lemma 2. If X and Y are compacta in R^2
and Y is a neighborhood of X, then
\bbox Y is a neighborhood of \bbox X. \qed

______________


REMARK ON EFFICIENCY: In the algorithm of
yesterday for elimination of blobs of =< 3 pixels
we gained speed by aborting the calculation of
p_B(C) whenever an initial segment touched >= 4 pixels.
A similar trick applies in the present context.
If the bounding box for an initial segment of C
becomes larger than that allowed for a black blob
B_0 whose elimination is being contemplated,
then, by the above lemmas and the fact that
epsilon is arbitrarily small, we immediately know
that \bbox B_0 will also be too big, and we can
forthwith abort efforts to paint D \cap B white.

______________


Putting the above modifications onto the
algorithm described yesterday seems to establish:

ASSERTION.

Let I be a black-white image in the plane
R^2, whose pixels are the unit squares about the
integral points. Suppose that, outside some
bounded rectangle, all pixels are white, or else
all pixels are black.

For this data we have just described an
algorithm that, given a pair of positive integers
(a,b), paints white all the the black blobs that
lie within some pixel rectangle of width =< a and
height =< b. It alters no other black pixels and
it alters no white pixel.

______________

VARIANT 1.
The same of course holds with black and
white exchanged.

______________

Applying this and the first algorithm (or
better running them unterlaced?) yields:

VARIANT 2.

For the same data we have described
algorithms that, given two pairs of positive
integers (a,b) and (a',b'), eliminate all the
the black blobs that lie within some pixel
rectangle of width a and height b, and eliminate
all the white blobs that lie within some pixel
rectangle of width a' and height b'.

In this process:

(i) A black blob B_0 not lying in an a\times b
pixel rectangle is modified by filling with black
poxels all the white alveoles in it that lie in
an a' \times b' pixel rectangle. But B_0 is
altered in no other respect; in particular, its
outermost rampart loop and its bounding box are
unchanged.

(ii) A white blob W_0 not lying in an a'\times b'
pixel rectangle is modified by filling with white
the alveoles in it that lie in an
a \times b pixel rectangle. But W_0 is it is
altered in no other respect; in particular, its
outermost rampart loop and its bounding box are
unchanged.

______________

VARIATION 3. It seems that taxicab length of
diagonal of bounding box can replace the \bbox
size used above. The taxicab length of a pixel
rectangle diagonal is a+b for an (a \times b
bounding) box. Many very different boxes have the
same taxicab diagonal-length.


I hope that all this is not already too
complex to program :)

The blob-enumeration algorithms I have seen
discussed are quite different.

Cheers

Laurent S.

Laurent S.

unread,
Jun 24, 2009, 8:52:10 PM6/24/09
to

Hi all,

In my last post in this thread I described a 'despeckling'
algorithm for bitonal images on a grid of square pixels.
ImageAnalyst has been recommending use of an exhaustive
enumeration by MatLab of all connected components of the
union of black pixels. There are well known algorithms
for this that apply to any connected graph (planar or
not) and I suspect (but don't know) that MatLab uses one
of them.

Are the algorithms used by MatLab in some sense
published? In detail?

On the other hand, 'my' algorithm makes essential
use of some topology and geometry of the plane. Here I
will clarify this dependence. It remains to be seen
whether this planarity gives it some advantages -- say
speed, memory usage, applicability ... And it remains to
be seen whether 'my' algorithm is in fact well known or
even already implemented !-)


---------------------------

A TOPOLOGICAL USE OF PLANARITY

I repeatedly (and often implicitly) used the
Jordan-Schoenflies Theorem, a famous result about plane
topology that is over 100 years old. It's meaning and
proof can be explained through the long-known physical
behavior of wooden splines as used by architects before
the age of CAD. The spline is a thin uniform,
non-extensible but very flexible, and elastic, wooken
stick. When the stick is linear, its elastic energy is 0.
When constrained to pass through a sequence of points the
spline still has some liberty and progressively assumes a
shape of locally least elastic energy that is often
unique and in some sense optimal. The motion derived
from the elastic energy released has of course to be
damped by some friction or viscosity.

Suppose a special spline is constructed to form a
smooth endless hoop rather than a stick with 2 ends.
The relevant physical observation comes in two parts:

(i) This hoop-spline can be constrained to take
exactly the shape of any 'not too wiggly' smooth
smooth loop L embedded in the plane provided that L
has the same length as the hoop-spline.

(ii) When the constraint is completely released (but
the damping is maintained) the hoop-spline deforms
continuously and smoothly through smooth embeddings in
the plane to finally form a static circular shape.

It is known that any such smooth deformation in the
plane through smooth non-singular and 1-to-1
embeddings can be extended to a smooth time dependent
flow defined on the whole plane, indeed one that fixes
all points outside some bounded set. Observed physical
motion in a viscous liquid seems to 'prove' this.
Examining the terminal time of this flow we 'deduce':

THE JORDAN-SCHOENFLIES THEOREM: A smooth loop L
embedded in the plane R^2 is here called a Jordan
curve. The complement R^2 - L has exactly two
connected components. One of them is bounded (= of
finite diameter) and the other is unbounded and
contains all points of R^2 outside a set of finite
diameter. The closure of the bounded one is
homeomorphic (even diffeomorphic) to the closed round
2-disk B^2 in R^2, and the closure of the unbounded
one is homeomorphic (even diffeomorphic) to
R^2 - interior (B^2).

A diffeomorphism is a smooth homeomorphism whose inverse
is also smooth.

---------------------------

A GEOMETRIC USE OF PLANARITY

I have relied on a geometric lemma asserting roughly
this:

(*) Let the 'size' of compact subsets of R^2 be measured
in one of a number of 'allowable' ways. Then every compact
subset X of the bounded region of R^2 - L where L is a
Jordan curve has size less than that of L.

Two 'allowable' ways of measuring size are 'euclidean
diameter' and 'taxicab diameter'.

Note that the surface of a very mountainous country may
be homeomorphic to R^2 but not satisfy (*) --- when the
'size' is understood to be euclidean diameter (in R^3).

---------------------------

Cheers

Laurent S.

PS. For a related geometrical proof of the the above
Jordan-Schoenflies theorem, one that has the virtue of
being mathematically precise and rigourous, albeit long,
see M. Grayson, J. Differential Geom. 26 (1987),
285--314.

Laurent S.

unread,
Jun 24, 2009, 9:32:41 PM6/24/09
to

Hello ImageAnalyst,

Thanks for your "BlobsDemo.m".

Having located the nearest MatLab, I find that it
lacks the MatLab image processing module.

But I am not yet at the end of my tether. Could you
please indicate somehow which of the many optional
MatLab modules are required by your "BlobsDemo.m".

I prefer your usage of the word "blob" over my
original term "speck" or "speckle", which connote
small and negligible.

Is your usage of "blob" standard or just a 'good
idea'. Has it a meaning outside the world of
bilevel images?

Elsewhere, I have encountered "blob" meaning
just about any fragment of an image.

Cordially,

Laurent S.

ImageAnalyst

unread,
Jun 24, 2009, 10:35:36 PM6/24/09
to
Laurent S.
The MATLAB code I gave does require the image processing toolbox for
the bwlabel and regionprops methods. bwlabel is a "connected
components" or "labeling" routine, to use more standard terminology.
You can download a 30 day free trial that would include that toolbox,
or, if you have the student version, I think that student version
comes with the image processing toolbox (but no telephone support0.

"blob" is a fairly standard term and I think it's usage is pretty
widespread. It refers to any irregularly (or nicely) shaped object in
a binary image - a binary image which represents locations of the
objects you have found (hopefully found to your satisfaction but it
can include undesired objects also that you may get rid of later). It
usually doesn't apply to gray level images. If you're talking about
the regions in your original image, masked by the binary image so that
just the parts of the original image inside the blobs are being
considered, then they usually call those the "objects," "segmented
objects," or "regions."

And like I said before, you're going to need to do a more
sophisticated analysis of the blobs to decide if they belong or don't
belong in your image than just a simple area filtering like I showed.
Full blown OCR probably does some size/shape/intensity filtering but
then probably does more sophisticated, and proprietary, stuff beyond
that. And that is where the art of OCR comes in and where
manufacturers stake their competitive advantage.

Laurent S.

unread,
Jun 25, 2009, 11:50:27 PM6/25/09
to


Image Analyst,

Thanks for another helpful reply on 24 Jun 2009.

Yes "blob" seems to have a pretty broad meaning not
necessarily implying connectedness nor smallness. I'll have to
make my connectedness etc. explicit.

"speckling" and "speckle" on the other hand seem
usually have meaning substantially in conflict with mine:
in particular speckles are usually crowded together
and not bitonal whereas I was discussing conspicuously
isolated bw regions.

Size, OCR recognition (or not) plus 'degree of isolation'
will hopefully be pretty good tools for automatic suppression
of my 'bad blobs'.

Cheers

Laurent S.

Hesham

unread,
Jul 3, 2009, 11:48:15 AM7/3/09
to lcs...@gmail.com
Laurent, are you still looking for an algorithm/implementation?

I've implemented very efficient code for another application but I
suppose it can be used to remove 'bad blobs'.

Using integral images and then box filters you can detect the blobs.
And using different sizes of the box filters you can detect the scale
of the blob as well. Instead of box filters can use a small octagon
inside a larger one which is more isotropic, but a bit slower (two
more integral images should be computed and for computing the response
at each pixel 6 more additions are needed). For a 640x480 grayscale
image it takes a few millisecond.

In case it may come handy, next weekend I can work on this for 'bad
blobs' and send you the C code.

Laurent S.

unread,
Jul 5, 2009, 2:37:59 AM7/5/09
to

Greetings Hecham,

> I've implemented very efficient code for another application but I
> suppose it can be used to remove 'bad blobs'.

> ...

> In case it may come handy, next weekend I can work on this for 'bad
> blobs' and send you the C code.

Yes it could indeed come in handy --- especially if you could
contribute documented C-code and configurable binaries providing Linux
command-line filters.

What set me off on this track is a long-standing and but never
satisfied request for such despeckling of bilevel scans --- a request
emanating from an Amer. Math. Society activist involved in scanning
and internet posting of legacy math literature in the public domain.

In this thread, ImageAnalyst contributed some MatLab code, but that is
not the sort of solution that is readily accessible to the potential
users.

In this thread also, Jens Dierks has impressively worked some examples
using medial filtering in raw Pascal; that could lead to the requested
open code and binaries but he has not yet committed himself (yet:).
Nor estimated processinf times. But your methods may be close to his.

Incidentally you might try your methods on the same samples worked by
Jens Dierks to check that your approach is competitive.

>...


> For a 640x480 grayscale image it takes a few millisecond.

>...

Total processing time of < 1 sec per bilevel page at 600dpi on a 32 bits
2Ghz Linux box would be satisfactory. You can freely pick your bilevel source
and target file format, since good batch converters exist already.

Gray scale 'bad blob' elimination is less urgent since currently bilevel
scanning and enduse is state-of-the art for such public domain material.
Of course that will gradually change.

Cheers,
Laurent S.

PS. You can read up the original request and some subsequent discussion
on the archive of the Electronic Math journals email list:

http://math.albany.edu:8800/hm/emj/

Try the keyword "speck".

Jens Dierks

unread,
Jul 6, 2009, 12:22:00 PM7/6/09
to
Laurent S. wrote:
...
> In this thread also, Jens Dierks has impressively worked some examples
> using medial filtering in raw Pascal; that could lead to the requested
> open code and binaries but he has not yet committed himself (yet:).
> Nor estimated processinf times. But your methods may be close to his.

I thought my explanations would be enough, because the algorithms
are very easy.
You can download a Delphi6 version with sources and a compiled
version for windows:
http://freenet-homepage.de/JDierks/tmp/ScanFilter.zip

I had some errors in the open dialog, but couldnt find the reason,
its more a beta version for testing.
File format is bitmap, 2 colors.

Best regards,
Jens D.

Jens Dierks

unread,
Jul 6, 2009, 5:38:20 PM7/6/09
to
> You can download a Delphi6 version with sources and a compiled
> version for windows:
> http://freenet-homepage.de/JDierks/tmp/ScanFilter.zip

Some speed improvements:
http://freenet-homepage.de/JDierks/tmp/ScanFilterNew.zip

Ethan

unread,
Jul 8, 2009, 12:05:11 PM7/8/09
to
I just wrote a function in matlab that removes blobs of 3 pixels or
less with 8 way connectivity that uses brute force:

It cycles through the binary image one pixel at a time, row by row.
If the pixel is a 1 and the sum of the 3x3 area around the pixel is <
4 continue, otherwise go to the next pixel:
If all of the 8 surrounding pixels are 0, then set the pixel to 0
and set flag as done (skip the rest, e.g. while not done, do).
There are 4 patterns of two connected pixel. Check for each pattern
and set flag as done as soon as any pattern matches. For example, if
the pixel to the right to is 1 and the 10 pixels around the pair are
0, then set the two pixels to zero and mark set flag as done (skip the
rest).
Then if the pixel is still not done, do the same thing for all 20
patterns of 3 connected pixels. (Check that the 3 pixels in the
pattern (actually, only the 2 additional pixels) are 1 and the
surrounding pixels are 0.) As soon as any pattern matches, set the 3
pixels to zero, set flag as done and continue to the next pixel.
Otherwise unset flag and go to next pixel.

In order to work at the edges, I add three rows and columns of 0s to
the image and only cycle through the pixels within this frame. At the
end, I remove the frame.

Laurent S.

unread,
Jul 8, 2009, 10:31:32 PM7/8/09
to

Hi Jens,

You wrote Mon, 6 Jul 2009 23:38:20 +0200


Congratulations, you seem to have reached two key goals:

--- adequate speed for production work
--- an available and usable binary executable

I would like to make something approaching a
real-world despeckling test on one or more legacy math
articles that are bridled by copyrights.

But, for that, we need you to provide a batch processing
mode in your "ScanFilter" program.

Perhaps the easiest way is add to your "ScanFilter"
the ability to process a 'batch file' of which
a typical line might be:

page_17.bmp > page_17_filt.bmp

Of course a multi-file "drag-and-drop" mode
would be more user-freindly. But that could be
introduced later.

Cheers,

Laurent S

PS. Have you system access to translation

bw .bmp format <==> bw .tiff group 4 fax format

? If so, prefer that .tiff format.
Most bw scans use that .tiff format
to reduce file bulk.

Jens Dierks

unread,
Jul 9, 2009, 12:51:59 PM7/9/09
to
Laurent S.wrote:
>
> Hi Jens,

Hi, but why is this posting behind Ethan's message?

Also i dont know why the real world examples are not public, but
the OCR results will be?

...


> Congratulations, you seem to have reached two key goals:
>
> --- adequate speed for production work
> --- an available and usable binary executable
>
> I would like to make something approaching a
> real-world despeckling test on one or more legacy math
> articles that are bridled by copyrights.
>
> But, for that, we need you to provide a batch processing
> mode in your "ScanFilter" program.
>
> Perhaps the easiest way is add to your "ScanFilter"
> the ability to process a 'batch file' of which
> a typical line might be:
>
> page_17.bmp > page_17_filt.bmp

Ok, nearly the same:
call the program with:

ScanFilter inBmpFile outBmpFile Filtermode

where Filtermode is a number:
1 for Median9cw5
2 for Least3
3 for Least4

You can do a executeable batchfile like:
ScanFilter page_17.bmp page_17_filt.bmp 3
ScanFilter page_18.bmp page_18_filt.bmp 3
...

I know, its not really smart to start the program every
time again, but for know i doubt that these functions
are state of the art for real world examples.
And i dont want to waste much more time for free, if
this will not be useable in the end.

This is the update:
http://freenet-homepage.de/JDierks/tmp/ScanFilterBatch.zip

regards,
Jens


> PS. Have you system access to translation
>
> bw .bmp format <==> bw .tiff group 4 fax format

nope

Laurent S.

unread,
Jul 9, 2009, 10:46:27 PM7/9/09
to

Hi Jens,

Where I wrote "bridled" I meant to write "not bridled".
So my intended sentence was:

<< I would like to make something approaching a
real-world despeckling test on one or more legacy math

articles that are not bridled by copyrights. >>

My genaral intent in this 'despeckling project' is to
make it convenient to provide free access and good
quality for public domain mathematics literature.

In contrast, JSTOR has for a decade been providing
expensive and restricted (but good quality!) access to
a big slice of America's legacy math literature ---
including most of its math journal volumes now in the
public domain.

The fate of much legacy math literature is still
being decided. See NUMDAM.

> Also i dont know why the real world examples are not
> public, but the OCR results will be?

My real-world examples will be entirely 'public'.
They will eventually include OCR and text copy.
Sorry for my misleading sentence.

Thanks for the command line update "ScanFilterBatch". I
hope to have examples up soon.

> i doubt that these functions are state of the art for
> real world examples. And i dont want to waste much more
> time for free, if this will not be useable in the end.

My guess is that you have in hand the simplest way to
program the simplest despeckling and that your program
is fast and an improvement on anything hitherto
freely available.

This may be the moment to add some documentation!

Cheers

Laurent S.

Questions to ponder:

1. Can you completely eliminate speckles of 3 pixels?

2. Can you rigourously *prove* you can?

3. Can you assure that a class of 'sufficiently big
blobs' will remain absolutely intact?

4. What about allowing the user to vary the median
weightings used?

Laurent S.

unread,
Jul 9, 2009, 11:05:56 PM7/9/09
to

Hi Ethan,

You wrote:

> I just wrote a function in matlab that
> removes blobs of 3 pixels or

> less with 8 way connectivity that uses brute force:...

Thanks for the try.

My first quiery is about speed.
Could you please give us a fairly precise timing
for processing the sample file
"gutter.tiff" in

http://topo.math.u-psud.fr/~slc/speckle_samples/

?? And how does it compare in performance
with the MatLab code posted by ImageAnalyst.
(I am still waiting to get access to MatLab.)

The sample "gutter.tiff" involves the sort of
'gutter and edge' speckling that
bw scans of books often massively present.
(The gutter speckles often do mix with print.)

Cheers

Laurent S.

Ethan

unread,
Jul 10, 2009, 10:39:47 AM7/10/09
to
On Jul 9, 11:05 pm, lcs7...@gmail.com (Laurent S.) wrote:

> My first quiery is about speed.
> Could you please give us a fairly precise timing
> for processing the sample file
> "gutter.tiff" in

First, when I view the image, gutter.tiff, in a browser its polarity
is opposite that seen when I open it in Photoshop or Matlab. I am
going to assume that the image should be the way it appears in Matlab
& photoshop. So the image is black on the left, white on the right,
with narrow black stripe running down it on the right of center.

My code is written to remove "on" pixel noise. That is connected
groups of 3 pixels or more with a value of 1.
When I run the function in this way, it takes about 9.12 seconds.
But my guess is that you want to remove the black specks so I inverted
the image so the 1s are 0s and the 0s are 1s and then ran my function.
This took about 6.65 seconds. (not including inverting the image; just
the "algorithm")

My code is (purposefully) bad Matlab code in that it is using two
nested loops. It also has a sub function that returns the index values
of each of the patterns it checks. If this was incorporated into the
main function, it might be faster. But it was much easier to write the
other way. There are more efficient ways to write it in Matlab, but my
job is to write code that will eventually be translated to C. Anyway,
the execution of this "algorithm" after it is compiled in C would
probably be much faster. (Probably by more than x10, but I really
don't have enough experience to say.)

I'm sorry, but I am not going to try comparing my function to
ImageAnalyst's because even if I had the time, I would have to make
changes to try to figure out which parts are purely the algorithm part
and which parts are extra for management, explanation, image handling,
etc. So I don't want to make an unfair comparison.

I noticed in my code, that if the algorithm finds a pixel or group of
pixels that need to be eliminated, you might gain some speed by
skipping the check on pixels that have been eliminated already. I
tried implementing this, but adding this to the function made it
slower rather than faster. I suspect that making this "if,then" check
on every pixel is more costly than just checking to see if the pixel
is on or off and doing the other checks if it is on, at least in
Matlab.

I read in a paper,
"Three or less connected pixels, the “isolated” pixels, are
eliminated. In software this morphological operation is implemented in
a hierarchical way which involves purely random memory access. Since
this strategy is not well suited for an FPGA implementation, the
cleaning module is built as a pipeline with a logic unit (LU) with
parallel access to all relevant pixels. Again, the pixels are stored
in a [shift register array] of size 7 x 5. The LU detects in parallel
all possible combinations of three or less connected feature pixels."
There were no refernces given on the details.

Hope this helps.

Jens Dierks

unread,
Jul 10, 2009, 12:26:48 PM7/10/09
to
Hi,
Laurent S. wrote:
...

> My genaral intent in this 'despeckling project' is to
> make it convenient to provide free access and good
> quality for public domain mathematics literature.

Free access means no payment, or does it mean that
it is also for public, but for a (little) payment?
In the last case, maybe it would be fair to give the
people involved in this thread a "really free" access?

Maybe i misunderstood you, my english is not so
good.

...


> My guess is that you have in hand the simplest way to
> program the simplest despeckling and that your program
> is fast and an improvement on anything hitherto
> freely available.
>
> This may be the moment to add some documentation!

If the program is finished, i will make one.

> Questions to ponder:
>
> 1. Can you completely eliminate speckles of 3 pixels?

I guess this question belongs to the new "Least4"
function?
Yes, it should remove all 8way connected Pixels with less
then 4 counts. Except the 2 border pixels, that are not
processed.

> 2. Can you rigourously *prove* you can?

I am just a stupid programmer, just give me an image
containing all cases and i can give you the answer ;-)

> 3. Can you assure that a class of 'sufficiently big
> blobs' will remain absolutely intact?

For now, not all patterns of 4 or more connected
pixels are proofed, so there are cases in which also
one ore more pixels of larger blobs could be
removed.
But i can add the remaining patterns, that would be
no problem.
I think a better algorithm should not only do the
decision on the count, but also on the spread of the
pixels: more random -> blob, not random ->
containing informations, dont remove.
But this would be another function and problably not
fast enough.

> 4. What about allowing the user to vary the median
> weightings used?

Would be possible, in lack of some speed.
But a lower center weight would erase the 1 pixel
wide lines.
No much room for working weights.

For the example gutter.tiff:
In my opinion, this is a case where a grayscale
scanning would make much more sense. Because
gradients of the lightness can be removed without
having speckle artefacts.
Otherwise, these regions would be better cut away
completely by estimation of page borders?

So far for now, i do the program finishing and
documentation if i find some time.
Also, there are some implementations from others,
that could be promising, we will see.

regards, Jens

Laurent S.

unread,
Jul 11, 2009, 2:32:22 AM7/11/09
to

Hi Ethan,

> I am going to assume that the image should be the way
> it appears in Matlab & photoshop.

Correct.

> 9.12 seconds

> 6.65 seconds

Not good enough for production. OK for research.

> the execution of this "algorithm" after it is compiled in C would
> probably be much faster. (Probably by more than x10, but I really
> don't have enough experience to say.)

I would guess that a pefected brute force algorithm in C would be
fastest of any for =< 2 pixel speckles.
But beyond 4 pixels such programming seems to get horrid.

That is why I moved from brute force to (implicit) use of Jordan
curves; that seems to give a very nice though sophisticated algorithm
that would hopefully be quite competitive for eliminating blobs of many
pixels, say >=6.

> I read in a paper,
> "Three or less connected pixels

It is easy to immagine that this problem is open to
attack by massive parallelism.

On the other hand I am confident that excellent solutions
exist without.

Other considerable sources of extra speed are bit twiddling
and assembly language. The use for one byte per pixel is
convenient but wasteful both of RAM and of processing power.

> Hope this helps.

Indeed it does.

Could you post the despeckled "gutter.tiff".
It is a fair debugging test.

Cheers

Laurent S.

Laurent S.

unread,
Jul 11, 2009, 2:37:08 AM7/11/09
to


Hi Jens,


> Free access means no payment, or does it mean that
> it is also for public, but for a (little) payment?


I suspect freeware with an appeal for contribution
would get optimal results. It's your program and your decision.

> In the last case, maybe it would be fair to give the
> people involved in this thread a "really free" access?

Quite possible.

> If the program is finished, i will make one.

It is best to document while perfecting.

> Yes, it ["Least4"] should remove all 8way

> connected Pixels with less
> then 4 counts. Except the 2 border
> pixels, that are not processed.

Try it on "gutter.tiff". Bugs?

I have spotted no instance where a character is damaged
but a computer check is needed. Could you add a feature that
color codes all speckles to be eliminated.
Say by coloring them red.

> I think a better algorithm should not only do the
> decision on the count, but also on the spread of the
> pixels: more random -> blob, not random ->
> containing informations, dont remove.
> But this would be another function and problably not
> fast enough.

Not ripe for action.

> No much room for working weights.

I have encountered remarks that weighted median
filtering is only good for small blobs.
Worth exploring.

> For the example gutter.tiff:
> In my opinion, this is a case where a grayscale
> scanning would make much more sense. Because
> gradients of the lightness can be removed without
> having speckle artefacts.

There are millions of pages of bw scan in tiff G4 format.

> Otherwise, these regions would be better cut away
> completely by estimation of page borders?

Yes, but only after bw despeckling -- in my experience.

> Also, there are some implementations from others,
> that could be promising, we will see.

Yes, but you are furlongs ahead of us ;-)

Cheers

Laurent S.

PS. A switch in your program requesting
white speckle elimination would be very useful
in prectice.

Jens Dierks

unread,
Jul 11, 2009, 8:58:31 AM7/11/09
to
Hi,
Laurent S. schrieb:

> > Free access means no payment, or does it mean that
> > it is also for public, but for a (little) payment?
>
>
> I suspect freeware with an appeal for contribution
> would get optimal results. It's your program and your decision.

I dont ment my program, but the account to the math-papers.
But i think i get the point now. As said, my english is not the
best.

> > Yes, it ["Least4"] should remove all 8way
> > connected Pixels with less
> > then 4 counts. Except the 2 border
> > pixels, that are not processed.
>
> Try it on "gutter.tiff". Bugs?

You can test it with:
http://freenet-homepage.de/JDierks/tmp/gen3pixel.bmp

These patterns are all removed.
To keep all 4 and higher count blobs, i have to add some
code, as i mentioned before.


> I have encountered remarks that weighted median
> filtering is only good for small blobs.
> Worth exploring.

The (weighted) median cannot differ between good
and bad pixel, if you want to erase larger blobs, then it
has also an effect on larger parts of the letters, like
edge rounding.
If you dont want to change the letters at all, median
does not work.



> PS. A switch in your program requesting
> white speckle elimination would be very useful
> in prectice.

Should be makeable.

Regards, Jens

Laurent S.

unread,
Jul 11, 2009, 9:22:44 PM7/11/09
to

Hi Jens,

I proposed:

> PS. A switch in your program requesting
> white speckle elimination would be very useful

> in practice.

Oops, unless I am mistaken median filtering
(even weighted) treats black and white entirely similarly.

So presumably my proposal should read:

| PS. Switches in your program requesting
| only white speckle elimination
| or only black speckle elimination might be
| of some interest.

My original proposal would apply to Ethan's MatLab
program.

Cheers

Laurent S.

Steve Eddins

unread,
Jul 13, 2009, 7:46:34 AM7/13/09
to
Laurent S. wrote:
> Hi Ethan,
>
> You wrote:
>
> > I just wrote a function in matlab that
> > removes blobs of 3 pixels or
> > less with 8 way connectivity that uses brute force:...
>
> Thanks for the try.
>
> My first quiery is about speed.
> Could you please give us a fairly precise timing
> for processing the sample file
> "gutter.tiff" in
>
> http://topo.math.u-psud.fr/~slc/speckle_samples/
>
> ?? And how does it compare in performance
> with the MatLab code posted by ImageAnalyst.
> (I am still waiting to get access to MatLab.)

The following code removes black blobs containing fewer than 4 pixels.

bw2 = ~bwareaopen(~bw,4);

The time required is about 1.007 seconds on my computer.

MATLAB release: R2009a
Also requires: Image Processing Toolbox
Computer: IBM Thinkpad T60
Computer details:

CPU: x86 Family 6 Model 14 Stepping 8, GenuineIntel
The measured CPU speed is: 1815 MHz
Number of processors: 2
RAM: 2046 MB
Swap space: 4959 MB
Microsoft Windows XP

Processed image location:

http://blogs.mathworks.com/images/steve/2009/gutter_processed.tiff

---
Steve Eddins
http://blogs.mathworks.com/steve/

Ethan

unread,
Jul 13, 2009, 8:36:42 AM7/13/09
to

> Could you post the despeckled "gutter.tiff".
> It is a fair debugging test.

I posted the processed image here:
http://gallery.me.com/emontag#100041

Laurent S.

unread,
Jul 14, 2009, 12:32:24 AM7/14/09
to


Hi All,

Thanks to Jens D., Steve E., and Ethan M. for processing
the sample "gutter.tiff". Nice progress.


--- Ethan, unfortunately, I have failed to downmoad

http://gallery.me.com/emontag#100041

from your page. I believe that (for me) the URL will have to be
by full filepathname going below your top level.


--- Steve,

> bw2 = ~bwareaopen(~bw,4);

Is "bwareaopen" part of some standard MatLab library? Or where
does it come from? Is it fresh code? What relation to
ImageAnalyst's posted code that he said involves a complete
listing of connected components?

To better compare with Jens D.'s program could you (and Ethan)
please remove both black and white blobs from "gutter.tiff".


--- Jens, I have posted what seems to me the best output
that your median filter program is currently giving me, namely
"gutter_jd_1_max.tiff" of 66Ko. I will discuss its unique features
later. (Note that Jens' program currently removes some but not all
3-pixel connected blobes.)


I am continuing to post results at

http://topo.math.u-psud.fr/~slc/speckle_samples/

identifying your contributions as we discuss them. For uniformity,
I use the (optimally?) efficient TIFF G4 fax format.

Steve's new contribution is gutter_se_1.tiff of 140Ko.
The original test file was "gutter.tiff" of 318Ko.

Cheers,

Laurent S.

Steve Eddins

unread,
Jul 14, 2009, 8:29:32 AM7/14/09
to
Laurent S. wrote:
> Hi All,
>
> [snip]

>
> --- Steve,
>
> > bw2 = ~bwareaopen(~bw,4);
>
> Is "bwareaopen" part of some standard MatLab library? Or where
> does it come from? Is it fresh code? What relation to
> ImageAnalyst's posted code that he said involves a complete
> listing of connected components?

bwareaopen is a function in the Image Processing Toolbox, a MATLAB
add-on product; that's why I mentioned that the Image Processing Toolbox
was required. It works by doing connected component labeling and then
discarding the connected components containing too few pixels. You may
find its documentation here:

http://www.mathworks.com/access/helpdesk/help/toolbox/images/bwareaopen.html

I do not know exactly what ImageAnalyst's solution was. This thread has
been quite long and I confess to not reading every single bit of it. ;-)

However, I have corresponded with ImageAnalyst and know that he is
familiar with the Image Processing Toolbox and its use. I suspect his
code is doing something equivalent.

> To better compare with Jens D.'s program could you (and Ethan)
> please remove both black and white blobs from "gutter.tiff".

This code removes both black and white blobs:

bw2 = bwareaopen( ~bwareaopen( ~bw, 4 ), 4);

It runs in about 1.67 seconds on my computer (computer specs posted
previously).

The result is posted here:

http://blogs.mathworks.com/images/steve/2009/gutter_processed_2.tif

Jens Dierks

unread,
Jul 14, 2009, 1:49:51 PM7/14/09
to
Hi,
Laurent S. wrote:
...
> So presumably my proposal should read:
>
> | PS. Switches in your program requesting
> | only white speckle elimination
> | or only black speckle elimination might be
> | of some interest.

Yes, i thought of that.
My update is:
http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip

I changed the LeastX functions to a LeastN function,
what should be equal to bwareaopen in Matlab.

Plus Median with chooseable center weight and some
words about the using of the program.

regards,
Jens D.

Laurent S.

unread,
Jul 15, 2009, 1:28:05 AM7/15/09
to

Hello Jens,

Great progress Jens. This provides your first documentation,
-- a vital part. I have suggested some linguistic modifications
since you admit to lacking fluency in English; see
the folder "ScanFilter09_doc_var_ls" at

http://topo.math.u-psud.fr/~slc/speckle_samples/

Initially users from Linux and Mac platforms will be
visiting PCs to run your program. So I suggest adding
a line or two to indicate how to get from Windows to
the command line.

Cheers

Laurent S.

Jens Dierks

unread,
Jul 15, 2009, 4:48:04 AM7/15/09
to
Laurent S. wrote:
> Great progress Jens. This provides your first documentation,
> -- a vital part. I have suggested some linguistic modifications
> since you admit to lacking fluency in English; see
> the folder "ScanFilter09_doc_var_ls" at
>
> http://topo.math.u-psud.fr/~slc/speckle_samples/

Have you made the folder visible? I cant see it.

Laurent S.

unread,
Jul 15, 2009, 9:19:41 PM7/15/09
to

Jens,

> Have you made the folder visible? I can't see it.

Sorry, the full URL was:

http://topo.math.u-psud.fr/~slc/jd/ScanFilter09_doc_var_ls

Laurent S.

PS. I anticipate that to achieve good production speed
it will be necessary to avoid killing the
ScanFilter.exe process whenever another file is filtered.
(It is not urgent for small trials.)

Laurent S.

unread,
Jul 16, 2009, 1:49:46 AM7/16/09
to

*** graph component algorithm ***

Jens,

For a bw bitmap, you gave a quick description of the median
filter that you are using in the program ScanFilter. But you
gave no description of your "Least N" algorithm for deleting
small components of the union B of black pixels.

To inspire you to say more, I will describe below a 'graph
component algorithm' applicable to any finite graph G, one
that will enumerate the connected components of G containing
=< N vertices by specifying the =< N vertices in each such
component; the corresponding component of G is then the full
subgraph with exactly these vertices.

This 'graph component algorithm' applies to give a possible
"Least N" procedure by choosing the vertices V of the graph
G to be the black pixels and the edges E to be the
adjacencies of (closed, square) black pixels of the given bw
bitmap.

Is this nearly your algorithm? Or nearly Matlab's?

Here is the 'graph component algorithm'. At each stage we
will have a partition \P of the vertices V of G into
disjoint classes.

Initially the partition \P is into classes consisting of a
single vertex.

In general the partition is into disjoint classes as follows:

--- one class called "big league" consisting entirely of
vertices v, for each of which the connected component of G
containing v is known to contain more than N vertices.

--- any number of classes -- to be called clubs --
each of which is connected and consists of =< N vertices.

Given such a partition \P and a club C in it, we examine
the members of the club (a linked list would be suitable)
searching for an edge e leading to a vertex v' outside the
club. Then there are a few cases to handle:

* If v' is in the big league then we amalgamate C into
the big league.

* if v' is in another club C' and together the two clubs
have > N members then both clubs amalgamate into the
"big league".

In both above cases one restarts this process with the
next club, if any. (One can use a linked list of clubs.)

* if v' is in another club C' and together the two clubs
have =< N members then the clubs amalgamate to form a new
club. Linked lists enumerating the two clubs can be chained
together to get one for the amalgamated club.

* if there is no v' outside C then the club is a connected
component of G; it is output and one moves on to the next
club, if any.

When no club is left, all components have been output, and
the listing of small components is complete. The big league
then consists of all vertices not in a component of =< N
vertices.

Cheers

Laurent S.

Observation. The black despeckling algorithm I have
sketched in this thread can be viewed as a involving a very
trivial case of the above 'graph component algorithm'.
Namely its application to the frontier F of the union B of
black pixels. This F is is already a graph -- its edges
being the edges lying in F of square pixels. Each
'negatively oriented' component component F_0 of F
corresponds naturally to a connected component B_0 of B
namely the one containing F_0. And conversely, given B_0,
its F_0 is the 'outer' frontier component of B_0. The
'graph component algorithm' above becomes quite trivial for
the graph F since its components are essentially canonical
cyclic linked lists, coded for example as 'turtle tracks'.

Jens Dierks

unread,
Jul 16, 2009, 6:56:16 AM7/16/09
to
Laurent S. wrote:
> Sorry, the full URL was:
>
> http://topo.math.u-psud.fr/~slc/jd/ScanFilter09_doc_var_ls

Thank you, this helps a lot.
For the question about getting in windows to the command line
mode
(although the command line option was more designed as
an addition for different operating systems and emulators,
where this mode is maybe more common)
i have to know how the startmenu folder "Zubeh�r" and the
program "Eingabeaufforderung" (maybe console?) are named
in the english windows versions.

Then, i dont really understand what you have changed in
the description of the Least N function:
"LeastN:
counts of all pixels that have the same color as the 'center' pixel
and that are 8-way connected to the center pixel."

I try to explain the function again, it is very simple:
Like the floodfill algorithm, the function searches all pixels
with the same color as the starting pixel and that are
connected between each other. Except that the connectivity
in the floodfill algorithm is commonly a 4-way one, and
in the Least N function also the diagonal directions are
counted as connections. (I hope it is understandable)

So the function starts with the first pixel, counts all to this
pixel (8-way-) connected pixels having the same color, and
changes all these pixels, if the count is less then the given
N.
Then it moves to the next pixel that has to be proofed,
and so on.
I have not seen a difference in the gutter.tif image compared
to the output from the correspondending matlab bwareaopen
function, so it should be equal to it in the main steps.

> PS. I anticipate that to achieve good production speed
> it will be necessary to avoid killing the
> ScanFilter.exe process whenever another file is filtered.
> (It is not urgent for small trials.)

Of course, i will make an option to process a whole folder,
so the command line option will not be urgently needed.

Best regards,
Jens D.

Jens Dierks

unread,
Jul 16, 2009, 8:55:35 AM7/16/09
to
addition:

You described the options of the function:
" Least N : 1..100 = N, minimum pixelcount of connected blobs to stay. "

Since "blob" is defined here as all (8-way) connected pixels having the
same color, the desciption above is a bit misleading?
Because blobs arent connected, else it would be one blob...
N, minimum count of connected pixels to stay(?)

Laurent S.

unread,
Jul 16, 2009, 1:15:36 PM7/16/09
to

Jens,

Language is always slightly ambiguous like programs are
always slightly buggy. We need to debug the verbiage.
Am trying to contact you by email/telephone.
Will be back late -- after dinner.

Cheers

Laurent S.

Laurent S.

unread,
Jul 17, 2009, 1:15:13 AM7/17/09
to

** Making a despeckling algorithm more combinatorial

We have now encountered two rather different algorithms A and
A* for despeckling. A is essentially a general algorithm for
enumerating the connected components of any finite graph, and
A* is a much more special algorithm using the topology and
geometry of the planar pixel grid --- which I have described by
installments in this thread.

Yesterday I concluded with a comparison of A with A*. The
theme was that the general algorithm A operated on a graph
'dual' to B --- with vertices corresponging to the pixels,
while A* operated on the frontier F of B in the pixel plane.

I squeezed the comparison into a short paragraph by describing
correctly only the agreeable special case where the frontier B
is a compact 2-manifold and F is its boundary, and hence a
1-manifold.

> Observation. The black despeckling algorithm I have
> sketched in this thread can be viewed as a involving a very
> trivial case of the above 'graph component algorithm'.
> Namely its application to the frontier F of the union B of
> black pixels. This F is is already a graph -- its edges
> being the edges lying in F of square pixels. Each
> 'negatively oriented' component component F_0 of F
> corresponds naturally to a connected component B_0 of B
> namely the one containing F_0. And conversely, given B_0,
> its F_0 is the 'outer' frontier component of B_0. The
> 'graph component algorithm' above becomes quite trivial for
> the graph F since its components are essentially canonical
> cyclic linked lists, coded for example as 'turtle tracks'.

In general, F has not just bivalent vertices but also
quadrivalent vertices wherever two black pixels intersect on a
common corner point, say Q, and not more:

o--------o
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
o--------Q--------o
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
|XXXXXXXX|
o--------o

(You need a constant width font for this figure and the next.)

Up to now, my way of treating this general case has not been
very appetizing to a programmer, since it introduced a
plethora of geometry more reminiscent of geography than of
combinatorics. Here is another that seems neater. Let us
blow up each such corner point Q to become a new l-simplex s;
while the two square black pixels meeting at Q become black
pentagons meeting along s :

o-----------o
| |
| |
| |
o--------O |
| \ |
| \ |
| O--------o
| |
| |
| |
o-----------o

One can arrange that outside such vignettes nothing is
changed. This gives B' and F'. Note that B' is now
a 2-manifold with boundary F'. Furthermore, there is a natural
quotient map of the plane q: R^2 --> R^2 that crushes s onto Q
and is bijective elsewhere. Clearly q induces a bijection of
the components of B' onto those of B. The quoted agreeable
special case of algorithm A* readily adapts to despeckle
this new 'imaginary' bw image an hence also the original.

Laurent S.

PS. Beware that modified pixel grids for black and for
white are not the same. The relation between them is crucial
to the famous oriental game "Go"!

Laurent S.

unread,
Jul 18, 2009, 1:02:26 AM7/18/09
to

** Making the despeckling algorithm A* still clearer


I improve yesterday's discussion of natural combinatorics for
the despeckling algorithm A*. I am not happy with the
combinatorics of the resolution B' as defined yesterday for the
black pixels B of a given bw bitmap. Recall that B' is
intended to be a 2-manifold in R^2 whose connected components
are (i) obvious, and (ii) in natural one-to-one correspondence
with the connected components of B. The latter are the black
blobs that the algorithm A* is investigating.

The faults of B' as defined yesterday are:

-- a lack of natural and memorable geometry -- in spite of
clear combinatorics

-- unmentioned degeneracies:- 2-cells that are hexagons, heptagons,
or octagons may appear in addition to the pentagons mentioned.

Today's solution is to 'blow up' each 'singular point' Q of B (see
figure) into a small square 2-cell S thus:


|XXXXXXXX
|XXXXXXXX
|XXXXXXXX X indicates black
|XXXXXXXX
|XXXXXXXX
--------Q--------


XXXXXXXX|
XXXXXXXX|
XXXXXXXX|
XXXXXXXX|
XXXXXXXX|


-
/ \
|
| quotient map
|
---


|XXXXXXXX
|XXXXXXXX
oXXXXXXXX
/X\XXXXXXX
/XXX\XXXXXX
-----oXXXXXo-----
XXXXXX\XXX/
XXXXXXX\X/
XXXXXXXXo
XXXXXXXX|
XXXXXXXX|

(Use a constant width font to view the above 'ASCII' figure; it
will probably make the square appear as a rhombus, but it would
become square if viewed from a suitable vantage point.) This
'blowup' operation seems to do its job perfectly. Let B'' be
the resulting planar 2-manifold-with-boundary in R^2. A
square pixel in B is 'snubbed' at those of its four corners that
happen to be a singular point of B; it becomes a slightly
smaller convex 2-cell that has between 4 and 8 sides. We can
make the sides of all the inserted squares equal to any small
length. The most obvious common side length to choose is the
side length of a regular octagon with the same width as a pixel
of the original bitmap. Crushing each square to the
corresponding singular point of B we get a bijection of the
components of B'' to those of B; this lets one carry through the
despeckling algorithm A* with very clear combinatorics.

Relation to yesterday's resolution B':

The resolution B' used yesterday is naturally the quotient of
B'' defined by crushing each square S onto onto the
corresponding segment s in B' by (restriction of) the affine
linear map that crushes to a point each line with slope the
angle 45 degrees = \pi/4. The segment s can be thought of as
being the segment in S half way between the 2 opposite sides of
S that have slope 235 degrees = 3\pi / 4 .


EXAMPLE: The infinite checkerboard and a famous tiling:

Today's resolution B'' of the the union B of the black
squares on the the infinite checkerboard can be described as
follows. In each white square, inscribe a white regular octagon
with the same distance between opposite faces. Then B'' is the
complement of the union of the interiors of all these octagons
It is connected -- i.e. a single blob. The white cells just
mentioned and the and black 2-cells in B'' when taken together
(forgetting color) are the 2-cells of the essentially unique
tesselation of the plane by regular octagons and squares. This
famous tesselation is a widely used floor tiling design.

For any given bw bitmap on the the infinite pixel grid
(whose pixels are all the squares of the infinite
checkerboard), let \B denote the union of its black pixels; its
resolution \B'' (today's) is canonically a quotient of part of
this famous floor tiling, namely the union of those of its
square and octagonal 2-cells (= tessela) that intersect the
(square) pixels in \B. In this sense, the famous floor tiling
design is a 'universal tool' for analysing any bw bitmap.

Cheers

Laurent S.

Laurent S.

unread,
Jul 23, 2009, 10:55:10 PM7/23/09
to

Thanks go to Steve Eddins for showing how a 'oneliner'
in MatLab can eliminate small (=< 4pixel) blobs in bw scans:

> This code removes both black and white blobs:
>
> bw2 = bwareaopen( ~bwareaopen( ~bw, 4 ), 4);
>
> It runs in about 1.67 seconds on my computer (computer
> specs posted previously).

>> CPU: x86 Family 6 Model 14 Stepping 8, GenuineIntel


>> The measured CPU speed is: 1815 MHz
>> Number of processors: 2
>> RAM: 2046 MB
>> Swap space: 4959 MB
>> Microsoft Windows XP

> The result is posted here:
>
> http://blogs.mathworks.com/images/steve/2009/gutter_processed_2.tif

Perfect results from standard commerclal functions!


Jens Dierks' program ScanFilter at

> http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip

as announced Tue, 14 Jul 2009 19:49:51 +0200, is like home brew;
rather special -- but heady stuff. It runs here at (very
approximately) 10 times the speed, on the same file, giving
the same result.

Jens is now improving his nice user interface and batch mode.
That will permit a more accurate speed comparison. And also
massive use; I anticipate that, when the program is stable,
numerous public scientific literature sites will use it to
clean up their freely available legacy literature scans.

Cheers

Laurent S.

Laurent S.

unread,
Aug 1, 2009, 2:27:03 AM8/1/09
to

** 4-connectity blobs **

I have been inspecting results of despeckling bw scans
of scientific works using the remarkable utility
"ScanFilter" of Jens Dierke, that this thread has been fortunate
enough to instigate:

Jens> My update is:
> http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip

Examples have recently led me to recommend elimination of a
'finer' sort of blob that should perhaps be called
"4-connectivity blob".

Here I will just make this notion clear and exhibit some of
the examples I have been looking at.

WHAT ARE 4-CONNECTIVITY BLOBS?

So far I have been discussing elimination small black or
white blobs (or speckles) in a black-white (=bw) bitmap on
the standard (infinite) grid of square pixels.

Let's focus first on white. By definition, a white blob has
so far been understood to be a (path-)connected component of
the union of all (square) white pixels. The path connecting
two pixels of the blob is allowed to pass through the corners
of the square pixels. This connectivity notion is also known
as 8-connectivity since a pixel of the (infinite square) grid
has 8 immediate neighbors. The white pixels are thus
partitioned into disjoint 8-connectivity blobs. Up to now I
have called them simply 'white blobs'. Similarly for black.

But now, there is also a (known!) notion of 4-connectity
blob; it arises if one restricts the connecting paths to
avoid the corners of all pixels. Each pixel has then 4
immediate neighbors rather than 8. There results a partition
of the white pixels into 4-connectity blobs. Every white
8-connectivity blob is thus partitioned into white
4-connectity blobs (one or more).

To be quite clear the original notion of blob I have been
using should now be called an 8-connectivity blob (black or
white).

Example: In the bitmap defined by the 8 x 8 checkerboard
(extended by white to infinity), there is exactly one black
8-connectivity blob consisting of all 32 black pixels. That
one blob breaks up into 32 black 4-connectivity blobs
consisting of one pixel each. There is just one white
8-connectivity blob and it extends to infinity. There are
(look carefully!) 17 white 4-connectivity blobs, one extending
to infinity and 16 consisting of one pixel each.


EXAMPLES FOR ELIMINATION OF SMALL WHITE 4-CONNECTIVITY BLOBS.

I have discussed elimination of 8-connectivity blobs ((black
or white) in bw bitmaps and Jens' ScanFilter is currently
a supremely fast tool for this.

Here are examples suggesting that elimination of white
4-connectivity blobs is desirable in bw scans of scientific
articles; see the directories "4-conn_w_char_bmp"
and "4-conn_w_fig_bmp" at:

http://topo.math.u-psud.fr/~slc/speckle_samples/

These two directories exhibit small white 4-connectivity
blobs that are not eliminated by the 8-connectivity
despeckling so far implemented by Jens' ScanFilter utility. The
first shows blobs in characters, the second shows blobs in
parts of line figures.

Where scans of scientific works without greytones (usually in
photos) are concerned, I have yet to encounter a small white
4-connectivity blob that should NOT be eliminated. In
other words, this new sort of despeckling seems safe.
(So far :)

Such white blobs can often be eliminated using suitable
median filtering, but perhaps not so simply cleanly and safely.

I hope Jens will easily add elimination of (small)
4-connectivity blobs to his ScanFilter utility!

Cheers

Laurent S.

Laurent S.

unread,
Aug 2, 2009, 1:27:04 AM8/2/09
to

** Correction

Concerning the bw bitmap defined by the 8 x
8 standard bw checkerboard (extended by white
to infinity) I carelessly wrote on Sat, 01 Aug
2009 08:27:03 +0200 :

> There are (look carefully!) 17 white
> 4-connectivity blobs, one extending to infinity
> and 16 consisting of one pixel each.

when in fact, there are 19 white 4-connectivity
blobs:- one extending to infinity and 18
consisting of one pixel each. Those 18 are the
18 white pixels of the central 6 x 6 square
region of the 8 x 8 board.

Laurent S.

Laurent S.

unread,
Nov 4, 2009, 6:42:33 PM11/4/09
to

*** ScanFilter version 2009v1.2 ***

Greetings all,

Jens Dierks (= JD) has released a new public
version of his free ScanFilter utility;
it is ScanFilter version 2009v1.2.

I have written an accompanying user guide, and some
background material on the 3x3 weighted median filters and
on the the "Least N" color component filters that
ScanFilter implements.

I will be managing the internet distribution site
for ScanFilter at:

http://lcs98.free.fr/soft/scanfilter/

Here are the opening paragraphs of
"ScanFilter_Guide.txt":

| The program ScanFilter is designed to
| efficiently (albeit incompletely) remove 'speckles'
| from bw (= black and white = bilevel) scanned images
| of print. The term 'speckle' here refers to a small
| collection of pixels with a common color (b or w),
| which the program user considers inapproptiate and
| hence wants to switch the color. Such "despeckling"
| is typically a preliminary step in preparing scanned
| images of print for onscreen viewing. It reduces file
| bulk in all compressed formats; it also increases
| image quality when done with care. Despeckling is
| often urgent when the original scan files were bw
| rather then grayscale, the degree of urgency
| depending on the scanner and its settings.
|
| ScanFilter currently operates under the
| Microsoft Windows operating system only. It combines
| a neat interactive user interface with the capacity
| to process many thousands of pages per hour. Its
| speed comes from occasional use of assembly language,
| and also some 64 bit vector processing features that
| are collectively called "MMX". The latter vector
| features were introduced on Intel's Pentium
| microprocessors, and now exist on many others,
| including AMD's K6 and later. Furthermore, ScanFilter
| has good input-output (= i/o) capabilities thanks to
| the free "LibTiffDelphi" code library that was
| accessed by JD from his Delphi Pascal programming
| environment; thus it can even serve as as a format
| converter.
|
| A last-minute addition to this version is a
| bitmap inspection tool intended to collaborate with
| other bw bitmap filtering utilities. It is a setup
| for visual scrutiny of differences between
| any two bw bitmaps on the same pixel grid. It is
| accessed by "drag-and-drop" onto the icon of the
| ScanFilter program.

After his considerable efforts to perfect
ScanFilter in correspondence with me throughout
August, September, and October, Jens has indicated
that he is now 100% focused on other programming
projects. Thus, this release marks an indefinite pause
in his devellopment of ScanFilter. However Jens has
generously provided the complete Delphi6 Pascal
sourcecode for 2009v1.2 allowing very liberal
conditions on other programmers' use thereof.

My ScanFilter site above will track and/or post both
eventual updates and related devellopments. I hope
members of this newsgroup will be offering comments,
suggestions, and announcements related to ScanFilter and
its documentation.

The present version of ScanFilter merely scatches
the surface of what can be called 'morphological bw image
processing'. But I believe it offers a very good basis
for interesting and useful devellopments in that
direction.

Thanks to all who have contributed to this thread,
and a special salute Jens' for his excellent programming.

Laurent S.

0 new messages