Interesting Programming Problem, Need Some Help

Leonard Howell

unread,

May 21, 1998, 3:00:00 AM5/21/98

to

Here is a interesting problem that perhaps someone has solved
before or has an idea of how to program the solution. Given a
square array (NxN) filled with zeros and ones (in practice, there
are many more zeros than ones), what is the distribution of patterns
of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
1's, either horizontally or /and vertically distributed, and the number
of "singlets" is also of interest To simply the problem, I'm ignoring
diagonal connections. For example, in the following array, there are:
5 ones, 1 two, 1 three, 2 fours, and 1 six.

0 0 0 0 0 0 1 0
1 0 0 1 1 0 1 1
0 1 1 0 1 1 0 0
1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 0 1 1 1 0 0 0
1 0 1 0 0 0 0 1

APL and FORTRAN solutions preferred, but any language appreciated.
Thanks, Leonard Howell

Bill Wade

unread,

May 21, 1998, 3:00:00 AM5/21/98

to

Leonard Howell wrote in message <6k1hi1$27j$1...@hammer.msfc.nasa.gov>...
> [ Get histogram of "groups of adjacent 1s"]

Build a graph which has a node at every '1' and an undirected edge at every
adjacent pair of nodes. O(N*N)

Repeatedly pick a node which hasn't yet been counted, and count the
connected sub-graph using a depth-first search. O(node-count).

You could actually use your original data structure and change 1's to
something else (even zero) to show that they had been counted.

Jonathan Fry

unread,

May 21, 1998, 3:00:00 AM5/21/98

to

--
I can't offer code, but here is a simple approach.

The general idea is to scan all the cells in the matrix and count the
runs of which each cell is the anchor. A cell is an anchor of a run if
it is the leftmost or bottommost cell in the run (those directions are
chosen arbitrarily). Taking a vertical run as an example, a cell is the
bottommost of a run if its value is one and the position below it is
not. Now count ones up the column to find the run length. If the
length exceeds one, increment the corresponding frequency counter. The
same idea works in other directions. Since each run has exactly one
anchor, this algorithm can't count a row more than once. If the code
looks for runs in four different directions, it will find each run. If
a cell has four runs of length one, it's a singleton.

To make the programming easier in FORTRAN, start by appending rows of
zeros above and below the matrix, and appending columns of zeros at its
left and right edges. Then loop over only the original matrix.

I think the algorithm can be executed for all cells in parallel for APL,
but I don't see how to count singletons cheaply in APL.
--------------------
Jonathan Fry
Developer
SPSS Inc.
j...@spss.com (SPSS questions to sup...@spss.com)

Jerry A Green

unread,

May 21, 1998, 3:00:00 AM5/21/98

to Leonard Howell

Leonard Howell wrote:
>
> Here is a interesting problem that perhaps someone has solved
> before or has an idea of how to program the solution. Given a
> square array (NxN) filled with zeros and ones (in practice, there
> are many more zeros than ones), what is the distribution of patterns
> of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> 1's, either horizontally or /and vertically distributed, and the number
> of "singlets" is also of interest To simply the problem, I'm ignoring
> diagonal connections. For example, in the following array, there are:
> 5 ones, 1 two, 1 three, 2 fours, and 1 six.
>
> 0 0 0 0 0 0 1 0
> 1 0 0 1 1 0 1 1
> 0 1 1 0 1 1 0 0
> 1 1 1 0 0 0 1 0
> 0 0 1 0 0 0 0 1
> 0 1 0 0 0 0 0 1
> 0 0 1 1 1 0 0 0
> 1 0 1 0 0 0 0 1
>
> APL and FORTRAN solutions preferred, but any language appreciated.
> Thanks, Leonard Howell

First a question:

Would the following be interpreted as 2 ones and 1 three?

0 0 0 0 0

0 0 1 0 0

0 1 1 1 0

0 0 1 0 0

0 0 0 0 0

If so, then your problem is to find the longest path of all possible
paths ( 6 in your example ), then eliminate this path (by setting the
1's to 0's). Then find the next longest path within the remaining paths
( one of the 2 fours in your example ). Then eliminate this path and so
on. When there are no paths longer than 1 your are done.

I have written algorithms similar to this in the past. For example, a
program to compute the "Knights Tour" (a problem in which you need to
find a path in which a chess knight can move to every square of a 6x6 or
larger grid without landing on the same square more than once.)

Jerry . . .

--
Jerry A Green mailto:sa...@cs-software.com
Custom Solutions http://www.cs-software.com/
7638 Oldridge Road
Charleston, SC 29418 Your source for discounted
Voice: (843) 760 0597 Fortran compilers and
Fax: (843) 552 3239 related software

Jonathan Fry

unread,

May 21, 1998, 3:00:00 AM5/21/98

to

Jonathan Fry wrote:
>
> Leonard Howell wrote:
> >
> > Here is a interesting problem that perhaps someone has solved
> > before or has an idea of how to program the solution. Given a
> > square array (NxN) filled with zeros and ones (in practice, there
> > are many more zeros than ones), what is the distribution of patterns
> > of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> > 1's, either horizontally or /and vertically distributed, and the number
> > of "singlets" is also of interest To simply the problem, I'm ignoring
> > diagonal connections. For example, in the following array, there are:
> > 5 ones, 1 two, 1 three, 2 fours, and 1 six.
> >
> > 0 0 0 0 0 0 1 0
> > 1 0 0 1 1 0 1 1
> > 0 1 1 0 1 1 0 0
> > 1 1 1 0 0 0 1 0
> > 0 0 1 0 0 0 0 1
> > 0 1 0 0 0 0 0 1
> > 0 0 1 1 1 0 0 0
> > 1 0 1 0 0 0 0 1
> >
> > APL and FORTRAN solutions preferred, but any language appreciated.
> > Thanks, Leonard Howell
>

> --
> I can't offer code, but here is a simple approach.
>
> The general idea is to scan all the cells in the matrix and count the
> runs of which each cell is the anchor. A cell is an anchor of a run if
> it is the leftmost or bottommost cell in the run (those directions are
> chosen arbitrarily). Taking a vertical run as an example, a cell is the
> bottommost of a run if its value is one and the position below it is
> not. Now count ones up the column to find the run length. If the
> length exceeds one, increment the corresponding frequency counter. The
> same idea works in other directions. Since each run has exactly one
> anchor, this algorithm can't count a row more than once. If the code
> looks for runs in four different directions, it will find each run. If
> a cell has four runs of length one, it's a singleton.
>
> To make the programming easier in FORTRAN, start by appending rows of
> zeros above and below the matrix, and appending columns of zeros at its
> left and right edges. Then loop over only the original matrix.
>
> I think the algorithm can be executed for all cells in parallel for APL,
> but I don't see how to count singletons cheaply in APL.
> --------------------
> Jonathan Fry
> Developer
> SPSS Inc.
> j...@spss.com (SPSS questions to sup...@spss.com)

--
Kindly ignore my reply above. It solves a different problem.

whuber

unread,

May 21, 1998, 3:00:00 AM5/21/98

to Leonard Howell

Leonard Howell wrote:
>
> Here is a interesting problem that perhaps someone has solved
> before or has an idea of how to program the solution. Given a
> square array (NxN) filled with zeros and ones (in practice, there
> are many more zeros than ones), what is the distribution of patterns
> of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> 1's, either horizontally or /and vertically distributed, and the number
> of "singlets" is also of interest To simply the problem, I'm ignoring
> diagonal connections. For example, in the following array, there are:
> 5 ones, 1 two, 1 three, 2 fours, and 1 six.
>
> 0 0 0 0 0 0 1 0
> 1 0 0 1 1 0 1 1
> 0 1 1 0 1 1 0 0
> 1 1 1 0 0 0 1 0
> 0 0 1 0 0 0 0 1
> 0 1 0 0 0 0 0 1
> 0 0 1 1 1 0 0 0
> 1 0 1 0 0 0 0 1
>
> APL and FORTRAN solutions preferred, but any language appreciated.
> Thanks, Leonard Howell

Let's generalize and suppose there are M rows and N columns, with M >= N
(you can always transpose the array to make this happen). There is a
scanning algorithm that requires O(N) space and O(M*N) time, which is as
good as it gets. The proof is by induction on the number of rows, M.
The algorithm requires a data structure that maintains information about
the distribution of patterns within the M X N array together with
information about the patterns that touch its bottom row. For the
induction step, one updates this structure according to what's in the
next row.

Rather than be formal, I will proceed by example. In the 8 X 8 array
presented, after scanning the first row I will represent the data
structure thus:

0 0 0 0 0 0 [1] 0

where the brackets enclose the tip of the lone identified pattern.
After scanning the second row, the data structure looks like this:

[1] 0 0 [2 2] 0 [3 3]

which echos the pattern of ones on the second row of the array, but also
remembers the size of each pattern in which those ones participate. You
should be able to see that creating this updated structure requires only
the previous structure together with the new row, and that you only have
to scan the new row two or three times (or once, if you're clever with
pointers). At the third step we get:

0 [2 2] 0 [4 4] 0 0 | 1, 3

where now, on the right, we begun to output the complete patterns that
have been identified. The full algorithm for this pattern then looks
like:

0 0 0 0 0 0 [1] 0

[1] 0 0 [2 2] 0 [3 3] |
0 [2 2] 0 [4 4] 0 0 | 1, 3
[5 5 5] 0 0 0 [1] 0 | 4
0 0 [6] 0 0 0 0 [1] | 1
0 [1] 0 0 0 0 0 [2] | 6
0 0 [3 3 3] 0 0 0 | 1, 2
[1] 0 [4] 0 0 0 0 [1] |
| 1, 4, 1

The cumulative output, shown on the right, gives the desired result.
(To get the last bit of output, consider processing an additional row of
zeros.)

Actually, one of the most interesting aspects of the updating is not
used in this example, so let's look at the following array:

1 0 1
1 1 1
1 0 0

The algorithm proceeds as follows:

[1] 0 [1]
[5 5 5] |
[6] 0 0 |
| 6

The interesting thing happened at the second step, where a row of three
ones overlapped two patterns. They were merged. Overall, though, it
should still be easy to see that the updating can occur in one pass,
left to right across each row, and that the data structure can't be any
bigger than a constant times the row size.

Here's another amusing example:

1 1 1 1
1 0 0 1
1 1 0 1
1 0 1 0

which is processed in this way:

[4 4 4 4]
[6 0 0 6] |
[9 9 0 9] |
[10] 0 [1] 0 |
| 10, 1

Here, in the second row, separate runs of 1s were connected to the same
pattern. In general, adjacent pattern tips can merge during the
updating, pattern tips can disappear (and cause some output), and new
patterns can appear.

The trickiest part of all this is demonstrating that the data structure
can be represented succinctly as a disjoint series of intervals: in
other words, that you can't have any interleaving of the tips of two
patterns. But this is clear, by a discrete analog of the Jordan curve
theorem, that states if you find a 1 that is the tip of one pattern, and
another 1 further on in the same row that is the tip of the same
pattern, then any intervening 1s must belong to the same pattern, even
if they are separated along that row by 0s. The previous example
illustrates this.

This scanning algorithm clearly lends itself to a FORTRAN implementation
(which would actually scan left to right) rather than an APL one. I
leave the details to an intrepid programmer. Of course, since I've been
so informal (perhaps cryptic), it's also possible I've been wrong...
there's nothing like the rigor of writing a proof (or a working
program).

--Bill Huber
Quantitative Decisions
Merion, PA

Greg Heil

unread,

May 21, 1998, 3:00:00 AM5/21/98

to Leonard Howell

Leonard Howell wrote:
> ...Given a square array (NxN) filled with zeros and ones

(in practice, there are many more zeros than ones), what
is the distribution of patterns of size 1, 2, 3, ...., where
a pattern is defined any string of adjacent 1's, either
horizontally or /and vertically distributed, and the number
of "singlets" is also of interest

Sounds like a transitive closure problem over the relation
of adjacency of occupied cells. A unique representative for
each connected component could also be found using the UNION/
FIND algorithms of Aho Hopcroft Ullman in almost linear time.

> APL and FORTRAN solutions preferred, but any language appreciated.

Some background in J...

D m
.1.1...1
..11.1..
..111.11
.1....1.
.11...1.
..1.....
11.1....
.1..11.1

To select the cells
adj=: ,# [:i. [:*/ $
adj m
1 3 7 10 11 13 18 19 20 22 23 25 30 33 34 38 42 48 49 51 57 60 61 63
of which these
adj m*.1|.!.0"1 m
10 18 19 22 33 48 60
are connected rowwise to these
1+adj m*.1|.!.0"1 m
11 19 20 23 34 49 61
and similarly these
adj m*.1|.!.0 m
3 10 11 22 25 30 34 49
are connected columnwise to these
8+adj m*.1|.!.0 m
11 18 19 30 33 38 42 57

Give such relations (not forgetting the diagonal) one
just needs to construct the transitive closure then
select and count unique rows.

Several threads in comp.lang.apl have mentioned algorithms
for transitive closure (16 hits in dejanews for "transitive
and closure"). In particular Tom Chwastyk quotes one in
APL (as opposed to the J above;)

greg heil
mailto:gh...@acm.org or remove no-spam from reply-to...
http://www.scn.org/tl/anvil

Greg Heil

unread,

May 23, 1998, 3:00:00 AM5/23/98

to Leonard Howell

Greg Heil wrote:
> Leonard Howell wrote:
> > ...Given a square array (NxN) filled with zeros and ones
(in practice, there are many more zeros than ones), what
is the distribution of patterns of size 1, 2, 3, ...., where
a pattern is defined any string of adjacent 1's, either
horizontally or /and vertically distributed, and the number
of "singlets" is also of interest

> Sounds like a transitive closure problem over the relation
of adjacency of occupied cells. A unique representative for
each connected component could also be found using the UNION/
FIND algorithms of Aho Hopcroft Ullman in almost linear time.

Now that i think of it UNION/FIND is the way to go. Given the
set of equations between cells developed below the UNION algorithm
builds a forest of trees. When two trees are equated it walks up
one (FIND) planting the other (and each node on the path) at the top
of the other. Processing all equations thusly gives one tree for each
connected component. A second pass is required to count the sizes
of the trees. This stage uses FIND which starts at a node (occupied
cell in the original array) and walks to the root (bringing up all
nodes on the path) at the top, one would increment a counter. At
the end of this stage you have a labeling as well.

Timing of the algorithm is _ever so slightly_ super linear, by
a factor related to the inverse of Ackermans function. The tree
walk/lifting is a tad tricky but i did do it in '73 in APL (w/o
AHU's help;) and could probably be induced to do it again if you
find AHU inadequate.

> Some background in J...

Gathering the above together

adj=: ,# [:i. [:*/ $

left=: 13 : '0 1+/adj y.*.1|.!.0"1 y.'
above=: 13 : '(0,#y.)+/ adj y.*.1|.!.0 y.'
(left,.above) m
10 18 19 22 33 48 60 3 10 11 22 25 30 34 49
11 19 20 23 34 49 61 11 18 19 30 33 38 42 57

Which are the equations that would need to be run on the forest
seeded by
adj m

Using more complicated connectivity criteria, this is a common
step in image analysis.

Randy MacDonald

unread,

May 24, 1998, 3:00:00 AM5/24/98

to

In article <6k1hi1$27j$1...@hammer.msfc.nasa.gov>, "Leonard Howell"

<leonard...@msfc.nasa.gov> wrote:
>Here is a interesting problem that perhaps someone has solved

>before or has an idea of how to program the solution. Given a

>square array (NxN) filled with zeros and ones (in practice, there
>are many more zeros than ones), what is the distribution of patterns
>of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
>1's, either horizontally or /and vertically distributed, and the number

>of "singlets" is also of interest To simply the problem, I'm ignoring
>diagonal connections. For example, in the following array, there are:
> 5 ones, 1 two, 1 three, 2 fours, and 1 six.
>
>
>0 0 0 0 0 0 1 0
>1 0 0 1 1 0 1 1
>0 1 1 0 1 1 0 0
>1 1 1 0 0 0 1 0
>0 0 1 0 0 0 0 1
>0 1 0 0 0 0 0 1
>0 0 1 1 1 0 0 0
>1 0 1 0 0 0 0 1
>
>

>APL and FORTRAN solutions preferred, but any language appreciated.

>Thanks, Leonard Howell

What you are looking for, in other words, is the distribution of the sizes of
the cliques in the table, where adjacency is in the horizontal/vertical
direction.

In short, the solution can be expressed as (in a pseudo-J):

solution M : freq {rho}{each} cliques adjacencymatrix M

adjacencymatrix M: ( | (-/~) where M) e. 0,1,({rho}M)[2]

where M: #([:i.#)

cliques M : nub {enclose}[2]transitiveclosure M

nub V: (nubsieve V)/V
or as (~:#) or ~. in J

nubsieve V: (V{iota}V)={iota}{shape}V
or as ~: in J

transitiveclosure M: ({and}.{or})^:_ M

freq: (head,#)/.

--
|\/| Randy A MacDonald | I'm a boy, I want details.
|\\| ra...@godin.on.ca |
BSc(Math) UNBF '83 | APL: If you can say it, it's done.
Natural Born APL'er | *** GLi Info: in...@godin.on.ca ***
| Also http://www.godin.on.ca/randy
------------------------------------------------<-NTP>----{ gnat }-

Jim Weigang

unread,

May 24, 1998, 3:00:00 AM5/24/98

to Leonard Howell

Leonard Howell wrote on May 21:

> Here is a interesting problem that perhaps someone has solved before
> or has an idea of how to program the solution. Given a square array
> (NxN) filled with zeros and ones (in practice, there are many more
> zeros than ones), what is the distribution of patterns of size 1, 2,
> 3, ...., where a pattern is defined any string of adjacent 1's,
> either horizontally or /and vertically distributed, and the number
> of "singlets" is also of interest To simply the problem, I'm
> ignoring diagonal connections. For example, in the following array,
> there are: 5 ones, 1 two, 1 three, 2 fours, and 1 six.

This is a variation/subset of the "blob finder" problem. Once you know
the frequency of different size blobs, it seems likely that a followup
request will be to identify the individual blobs in the matrix.

Here's my algorithm for finding blobs:

* Visit each nonzero cell (top to bottom, left to right)

* If the cell is 1, assign it a new blob index (>= 2)

* Examine the following neighbors of the cell:
. . .
. X n
n n n

* If a neighbor is 1, replace it with X's blob index

* If a neighbor is not 1 or 0, remember that blob n is
equivalent to blob X

* After visiting all the cells, make another pass through the
matrix, recoding the initial blob indices to merge equivalent
blobs and use sequential indices

The blob population counts can be accumulated by bumping a counter
whenever you change a 1 to a blob index.

This algorithm has enough inherent iterativeness that I didn't spend
much time trying to "parallelize" it for fast APL execution. A
straightforward iterative coding (for the APL+ dialect) is given below.
It would make an interesting test for Bob Bernecky's APEX APL compiler,
or it could be translated to a conventional compiled language. If it's
really going to be run using an interpreter, further improvements could
be made. (In particular, the :For K loop could be parallelized easily
if Dyalog APL's P[i]{<-}.+1 feature were available in APL+.)

As written, the program follows links along diagonal lines in addition
to vertical and horizontal lines. Line [16] can be modified if you
want to ignore diagonal neighbors. If you don't want the blob indices
and can't afford the space taken by the integer version of the matrix,
the code could be modified to work with just two rows of the Boolean
matrix at a time (as Bill Huber suggested).

Here are some examples:

B{<-}1=?10 15{rho}4
B
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 1 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 1 0 0 1 1 0 0 0 0 1 1 1 0 1
0 0 1 0 0 0 1 1 1 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0 0 0 0 0

DISP BLOBS B
. . . . . . B . . . . A . . . 3 1 <- three singletons
. . . . B B B . . . . . . . . 2 2 <- two doubles
. . . . . . . . . . . . . . . 2 4
C C . . . D D . . . . . E . E 1 5
. . . . . . . . . . . . . E . 1 13 <- one 13-item blob
. . . . . . . . . F . . E . E
. . . . F . . G . F . . . . .
. H . . F F . . . . F F F . I
. . H . . . F F F F . . . . .
. H H . . . F . . . . . . . .

B8
0 0 0 0 0 0 1 0 Leonard's example

1 0 0 1 1 0 1 1
0 1 1 0 1 1 0 0
1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 0 1 1 1 0 0 0
1 0 1 0 0 0 0 1

DISP BLOBS1 B8 BLOBS1 ignores diagonal neighbors
. . . . . . A . 5 1
B . . C C . A A 1 2
. D D . C C . . 1 3
D D D . . . E . 2 4
. . D . . . . F 1 6
. G . . . . . F
. . H H H . . .
I . H . . . . J

And here's the code:

{del} Z{<-}BLOBS B;A;C;F;I;J;K;P;R;S;T;U
[1] @Finds blobs of 1s in Boolean matrix {omega}, and counts their sizes
[2] @ This version uses both diagonal and vertical/horizontal neighbors
[3] @ Argument:
[4] @ B - a Boolean matrix
[5] @ Results:
[6] @ 1{pick}Z - similar to B, but with each blob replaced by an index
[7] @ 2{pick}Z - frequency matrix
[8] @ [;1] number of blobs
[9] @ [;2] blob size (how many 1s)
[10] @
[11] Z{<-}(-2+{rho}B){take}(1+{rho}B){take}B @add 0s all around B
[12] R{<-}{iota}10 @blob recode vector
[13] @ ^ R[i] gives the value that blob i should be mapped to
[14] P{<-}10{rho}0 @population of each blob
[15] C{<-}1 @current blob index
[16] S{<-}(1 0) (0 1) (1 1) (1 {neg}1) @neighbor offsets
[17] @ ^ To exclude diagonal neighbors, set S{<-}(1 0) (0 1)
[18]
[19] @ Loop for each 1 in the matrix
[20] :For I :in ({or}/Z)/{iota}1{take}{rho}Z @each row containing 1s
[21] :For J :in (Z[I;]{/=}0)/{iota}1{drop}{rho}Z @each nonzero on the row
[22]
[23] @ If the cell is 1, assign it a new blob index
[24] U{<-}Z[I;J] @current cell's blob index
[25] :If U=1 @not yet assigned to a blob
[26] U{<-}Z[I;J]{<-}C{<-}C+1 @new blob index
[27] :If C>{rho}R @R needs more elements
[28] R{<-}R,({rho}R)+{iota}{rho}R @double the size
[29] P{<-}(2{times}{rho}P){take}P @more pop counts, too
[30] :Endif
[31] P[C]{<-}1 @new blob has 1 member
[32] :Endif
[33]
[34] @ Mark neighbors as members of the current blob
[35] :For K :in S @each neighbor
[36] T{<-}{enclose}I J+K
[37] A{<-}Z[T] @the neighbor
[38] :Select A
[39] :Caselist 0 U
[40] :Continue @do nothing if 0 or already in this blob
[41] :Case 1 @unmarked cell
[42] Z[T]{<-}U @mark neighbor as member of our blob
[43] P[U]{<-}P[U]+1 @bump our pop count
[44] :Else @already in another blob
[45] ((R=R[A])/R){<-}U
[46] @ ^ Mark all blobs that are equivalent to neighbor's
[47] @ blob as being equivalent to current blob
[48] :Endselect
[49] :Endfor
[50]
[51] :Endfor @column loop
[52] :Endfor @row loop
[53]
[54] R{<-}C{take}R @discard extra elements
[55] P{<-}C{take}P
[56]
[57] @ Recode the result, merging equivalent blobs and labeling them
[58] @ with consecutive indices
[59] U{<-}T={iota}{rho}T{<-}R
[60] (U/T){<-}{iota}+/U @reduce values to sequential indices
[61] T{<-}0,T[R]-1
[62] @ ^ There are no more 1s in Z (they were used to mark unassigned
[63] @ cells), so map both 0 and 1 to 0, map 2 to 1, 3 to 2, etc.:
[64] Z{<-}T[1+Z]
[65]
[66] @ Combine the populations for the merged blobs
[67] T{<-}1 0{drop}R GROUPEDSUM P @population for each blob
[68] @ ^ Drop first row, which is for unused blob index 1
[69] F{<-}T[;2]GROUPEDSUM(1{take}{rho}T){rho}1 {+
+}@num singletons, doubles, triples, etc.
[70]
[71] Z{<-}(1 1{drop}{neg}1 {neg}1{drop}Z) ({reverse}F) {+
+}@return labeled blobs and frequencies
{del}

{del} Z{<-}G GROUPEDSUM D;L;T
[1] @Sum of {omega}, grouped by {alpha}
[2] @ Arguments:
[3] @ D - data vector
[4] @ G - grouping vector (same shape as D)
[5] @ Result:
[6] @ Z - a matrix whose columns hold:
[7] @ [;1] group value (from G)
[8] @ [;2] data sum (from D)
[9] @ Caution: The result may not have much precision if the
[10] @ data values differ widely in scale.
[11] @
[12] G{<-}G[T{<-}{gradeup}G] @sort by group
[13] D{<-}D[T]
[14] L{<-}G{/=}1{rotate}G @1s mark last of each kind
[15] L[(0<{rho}L)/{rho}L]{<-}1 @correct possible wraparound
[16] T{<-}L/+\D @cumulative group sums
[17] T{<-}T-{neg}1{drop}0,T @individual grouped sums
[18] Z{<-}(L/G),[1.5]T @add distinct group values
{del}

{del} Z{<-}DISP A;F
[1] @Formats a BLOBS result for display
[2] Z F{<-}A
[3] Z{<-}((2{times}2{pick}{rho}Z){rho}1 0)\('.',65{drop}#AV)[1+Z]
[4] Z{<-}Z F
{del}

For more information about the {keywords} used above, see
http://www.chilton.com/~jimw/aplascii.html .

Jim

nhbk...@rrzn-user.uni-hannover.de

unread,

May 26, 1998, 3:00:00 AM5/26/98

to

Leonard Howell (leonard...@msfc.nasa.gov) wrote:
: Here is a interesting problem that perhaps someone has solved

: before or has an idea of how to program the solution. Given a
: square array (NxN) filled with zeros and ones (in practice, there
: are many more zeros than ones), what is the distribution of patterns
: of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
: 1's, either horizontally or /and vertically distributed, and the number
: of "singlets" is also of interest To simply the problem, I'm ignoring
: diagonal connections. For example, in the following array, there are:
: 5 ones, 1 two, 1 three, 2 fours, and 1 six.

: 0 0 0 0 0 0 1 0
: 1 0 0 1 1 0 1 1

: 0 1 1 0 1 1 0 0
: 1 1 1 0 0 0 1 0
: 0 0 1 0 0 0 0 1
: 0 1 0 0 0 0 0 1
: 0 0 1 1 1 0 0 0
: 1 0 1 0 0 0 0 1

: APL and FORTRAN solutions preferred, but any language appreciated.
: Thanks, Leonard Howell

A pretty simple solution using Fortran can be found at

ftp://caver.muk.uni-hannover.de/pub/f90/area2d.f90

It is probably less efficient than those proposed by the previous posters.
But efficiency was not very important for the purpose I have written it
for. (The alternative was counting manually :-)
OTOH it's not too bad, can work both with and without diagonal connections
and it works reliably even for the strangest input.
The algorithm is working on logical arrays and is looking recursively for
all .TRUE. neighbours of a .TRUE. cell and marks them as .FALSE. in a
work-copy of the input array. In other words: it is traversing trees,
each covering one continous area. This is repeated for all "still-.TRUE."
cells in a single pass.
The maximum possible recursion depth can be huge. In worst case it equals
the number of cells of the largest area. Thus, to avoid stack overflows,
the scan function is implemented without using recursion but mimics its own
stack by using a linked list of visited cells.

Regards
Michael
--
*****************************************************************************
Michael Steffens Institut f. Meteorologie u. Klimatologie
Tel.: +49-511-7624413 Universitaet Hannover
email: Michael....@mbox.muk.uni-hannover.de Herrenhaeuser Str. 2
stef...@muk.uni-hannover.de D-30419 Hannover

PGP fingerprint = FA BE 6C 1C F6 C3 EC 33 DD 42 6B 7F DE CF 84 B8

Leonard Howell

unread,

May 26, 1998, 3:00:00 AM5/26/98

to

Jim,

Many thanks for the program. I also downloaded your APLASCII.w3 workspace
and when I tried to run the programs, I got an evolution error from line 2
of DISP which looks like Z F{<-}A. I replaced it with (Z F){<-}A and that
took care of the problem. I'm using APL+WIN Ver. 3 - is there some
"setting" that needs to be set to accept code such as Z F{<-}A ?

Also, I ran BLOBS several times and found a case where the program
apparently needs some tweaking. I haven't dug through the logic yet enough
to know where to make changes, so I thought you'd like to take a look at the
following case and decide best how to fine tune BLOBS.

A

0 0 1 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0

0 1 1 0 0 1 1 0 0 1
1 0 1 1 1 0 0 0 0 0
0 0 1 0 0 1 1 0 1 1
0 1 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 1 1 1
0 0 0 0 0 0 1 1 0 0
0 0 1 0 1 1 0 1 0 0

BLOBS A
0 0 1 0 0 0 0 0 0 0 3 1
8 0 0 0 0 8 0 0 0 0 1 2
0 8 8 0 0 8 8 0 0 2 1 5
8 0 8 8 8 0 0 0 0 0 1 21
0 0 8 0 0 8 8 0 3 3
0 8 0 0 8 0 8 0 0 0
0 0 0 0 0 0 0 8 0 0
0 0 0 0 0 5 0 8 8 8
0 0 0 0 0 0 5 8 0 0
0 0 4 0 5 5 0 5 0 0

note that the 5's should really be 8's.

Thanks, Leonard

Leonard Howell

unread,

May 27, 1998, 3:00:00 AM5/27/98

to

I want to thank everyone for the tremendous response and and help I received
on this problem. I have received programs in C++, FORTRAN90, J, and APL to
solve this problem, as well as important verbal insights on this problem and
its associated programming. Right now APL is the only language I feel very
confortable in programming, and for that reason I chose to first concentrate
on Jim Weigang's provided APL program. However, I am also excited about
trying to learn J in the near future when I can get a little money to buy it
as it looks very powerful too. I've got FORTRAN90 but I still only use
FORTRAN77, so again it's a matter of finding the time to sit down and learn
it. Below is a note I sent to Jim W. this morning regarding this problem
and everyone's help, and I thought I'd post it here for those that are
interested.

Thanks so much for the apl program and all the analysis you did. It will
most likely be an important part of a large Monte Carlo simulation of the
performance characteristics of the Orbiting array of Wide-angle Light
collectors (OWL): a Pair of Earth Orbiting "Eyes" to Study Air Showers
Initiated by 10^20 Electron Volt Quanta. The matrix cells can be thought of
as pixels of the detector and the blobs will be flashes caused by incident
cosmic rays and photon noise in the atmosphere (e.g., star light
reflections). We will probably want to add other features for
discriminating the blobs, e.g., look for straight line patterns, at a later
date. We hope to first fly a proof of concept instrument on a high altitude
balloon with something like a 50 by 50 pixel detector (that should run very
fast !), and then later a spaceborne detector on a satellite with a 1000 by
1000 pixel imager.

For the next few days I will be getting more familiar with the science,
etc., and then resume programming. The apl news group and the Fortran group
has been a tremendous help on this problem and it really underscores the
utility of the Internet, news groups, etc., and I will give appropriate
references to you and others that have contributed when I document (and
hopefully publish) what comes out of all this. We will likely have a
homepage for this project before too long and I'll post things of interest
there too. (you can take a look at one of our other project homepages at
http://science.msfc.nasa.gov/cosmicray/ica/default.htm ).

I'll post this to the apl newsgroup too so everyone can get a feel for the
application. I'm excited to see that apl can solve the large array problem
(1000 by 1000) in realistic time, and I guess from everything I've been
exposed to from the many helpful responses I received, it is time to think
about learning another language. J looks very interesting and powerful, and
then I also received a C++ program and a FORTRAN90 program in response to
this post, so I will need to look at them sometime too, as I do have a C
compiler (came free with my Fortran compiler). I don't have J but might buy
it when I can get some more money here at work. I recently bought a Micron
PC for $2500 and then the APL+WIN 3.0 upgrade was $1200, so I'm at a point
where I need to be productive with what I've got, at least for a while,
before I go asking for more software tools.

Thanks again and I'll *talk* to you soon,

Leonard

Aaron Kulkis

unread,

May 30, 1998, 3:00:00 AM5/30/98

to

Hmmmmm, whether you realize it or not, I think you hit on a really
good solution...

Jonathan Fry wrote:
>
> Jonathan Fry wrote:
> >

> > Leonard Howell wrote:
> > >
> > > Here is a interesting problem that perhaps someone has solved
> > > before or has an idea of how to program the solution. Given a
> > > square array (NxN) filled with zeros and ones (in practice, there
> > > are many more zeros than ones), what is the distribution of patterns
> > > of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> > > 1's, either horizontally or /and vertically distributed, and the number
> > > of "singlets" is also of interest To simply the problem, I'm ignoring
> > > diagonal connections. For example, in the following array, there are:
> > > 5 ones, 1 two, 1 three, 2 fours, and 1 six.
> > >

> > > 0 0 0 0 0 0 1 0

> > > 1 0 0 1 1 0 1 1

> > > 0 1 1 0 1 1 0 0
> > > 1 1 1 0 0 0 1 0
> > > 0 0 1 0 0 0 0 1
> > > 0 1 0 0 0 0 0 1
> > > 0 0 1 1 1 0 0 0
> > > 1 0 1 0 0 0 0 1

> > >
> > > APL and FORTRAN solutions preferred, but any language appreciated.
> > > Thanks, Leonard Howell
> >

--

Aaron R. Kulkis
Unix Systems Administrator

--------------------------------
I speak for me, not my employer.
--------------------------------

Threads: a poor substitute that you implement in an OS
when it can't handle true context switching properly.