Haskell: How do you find most common letter

Varant Aboulian

unread,

Mar 27, 1999, 3:00:00 AM3/27/99

to

I'm just a new user of haskell. I was wondering, and I know it might
seem a little easy, but,
If I have a list of characters (i.e. all letters), how can I find the
most common letter in the list and then output it, assuming there will
be no ties with other letters in a list.
var...@dot.net.au

Adam P. Jenkins

unread,

Mar 27, 1999, 3:00:00 AM3/27/99

to

This is interesting because it looks like a good example of something
that's easier to implement in an imperative language. In C++ I could
write

char most_frequent(std::string const& chars)
{
// create frequency vector, init elements to 0
std::vector<int> freq(256, 0);

for (int i = 0; i < chars.length(); i++)
freq[ static_cast<unsigned char>(chars[i]) ]++;

return distance(freq.begin(),
std::max_element(freq.begin(), freq.end()));
}

This takes O(n) time. In Haskell the best I can think of is something
like the above but using a Map data type with O(log n) insertion time,
and of course recursion. Aside from the fact that there is no
standard Map that comes with Haskell, this would take O(n log n) time.
Is it possible to write an O(n) version of the above algorithm in
Haskell?

--
Adam P. Jenkins
ajen...@netway.com

Martin Norb{ck

unread,

Mar 29, 1999, 3:00:00 AM3/29/99

to

27 Mar 1999 13:23:14 -0500, ajen...@netway.com (Adam P. Jenkins) ->
: > If I have a list of characters (i.e. all letters), how can I find the

: > most common letter in the list and then output it, assuming there will
: > be no ties with other letters in a list.
: > var...@dot.net.au
:
: This is interesting because it looks like a good example of something
: that's easier to implement in an imperative language. In C++ I could
: write
:
: char most_frequent(std::string const& chars)
: {
: // create frequency vector, init elements to 0
: std::vector<int> freq(256, 0);
:
: for (int i = 0; i < chars.length(); i++)
: freq[ static_cast<unsigned char>(chars[i]) ]++;
:
: return distance(freq.begin(),
: std::max_element(freq.begin(), freq.end()));
: }
:
: This takes O(n) time. In Haskell the best I can think of is something
: like the above but using a Map data type with O(log n) insertion time,
: and of course recursion. Aside from the fact that there is no
: standard Map that comes with Haskell, this would take O(n log n) time.
: Is it possible to write an O(n) version of the above algorithm in
: Haskell?

So your solution is not quite a solution to the problem since the
characters were in a list and not in an array, but it's easily fixed.

Generally, algorithms which use destructive update gets a O(log n)
penalty in purely functional languages. In this case you can use a
mutable array, but I don't know if they are part of standard Haskell.
They are in the ST module of hugs and ghc.

So it might look like this. It works despite the ugly syntax.

import ST

inc_by_1 :: (Num elt, Ix ix) => STArray s ix elt -> ix -> ST s ()
inc_by_1 arr ix =
do val <- readSTArray arr ix
writeSTArray arr ix (val+1)

max_index :: (Num elt, Ix ix) => STArray s ix elt -> ST s ix
max_index arr = undefined -- left as exercise to reader

most_frequent :: [Char] -> Char
most_frequent chars = runST (
do freq <- newSTArray (minBound, maxBound) 0
sequence (map (inc_by_1 freq) chars)
max_index freq)

I would very much like to see some nice syntax for arrays in Haskell.

n.

--
--[ www.dtek.chalmers.se/~d95mback ]---[ PGP: 0x453504F1 ]---[ UIN: 4439498 ]--
Opinions expressed above are mine, and not those of my future employees.

Wolfgang Grieskamp

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

d95m...@dtek.chalmers.se (Martin Norb{ck) writes:

> Generally, algorithms which use destructive update gets a O(log n)
> penalty in purely functional languages.

This is one of the rather resistent myths in the FP community. O(1)
array update can be implemented in pure languages in several ways. All
these techniques rely on that the array in question is used
"single-threaded" or "linear", which is typically the case for
iterative array algorithms. (This doesn't means that an array is
forced to be _always_ linear, as e.g. expressed by linear type
systems; it just means that in those execution traces where the array
is _used_ linear, O(1) is guaranteed.)

The techniques can be partitioned in runtime and compile-time methods.
At runtime, one can use reference counting (as the OPAL system does)
or version arrays (as my implementation of immutable aggregates for
Java/Pizza does). At compile-time, a pragmatic approach for detecting
single-threadedness is not a hard problem (easier then strictness
analysis) -- though I don't know of systems where this optimization
made it into the main source tree. For OPAL, we had several student
projects years ago which realized the optimization prototypical.
However, the overhead of runtime checking in the particular
implementation model of OPAL isn't so large, so we didn't tracked on.
Nowadays, as (moderate) constant factors in efficiency aren't a big
issue anymore, the benefit of a compile time analysis seems to me
mostly for generating compiler hints for the user that an aggregate
update may be not O(1).

Regards
Wolfgang

--
w...@cs.tu-berlin.de http://uebb.cs.tu-berlin.de/~wg/

Matt Harden

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

Martin Norb{ck wrote:
>
> 27 Mar 1999 13:23:14 -0500, ajen...@netway.com (Adam P. Jenkins) ->
> : > If I have a list of characters (i.e. all letters), how can I find the
> : > most common letter in the list and then output it, assuming there will
> : > be no ties with other letters in a list.
> : > var...@dot.net.au
> :
> : This is interesting because it looks like a good example of something
> : that's easier to implement in an imperative language. In C++ I could
> : write
> :

<snip>
> :

> : This takes O(n) time. In Haskell the best I can think of is something
> : like the above but using a Map data type with O(log n) insertion time,
> : and of course recursion. Aside from the fact that there is no
> : standard Map that comes with Haskell, this would take O(n log n) time.
> : Is it possible to write an O(n) version of the above algorithm in
> : Haskell?
>

> ... In this case you can use a

> mutable array, but I don't know if they are part of standard Haskell.
> They are in the ST module of hugs and ghc.
>
> So it might look like this. It works despite the ugly syntax.
>

Who needs arrays? This should run in O(n) time, albeit with
a pretty large constant factor:

findmax :: (Enum a, Bounded a, Ord a) => [a] -> (Int,a)
findmax xs = foldl1 max (
map
(\x -> (length (filter (==x) xs),x) )
[minBound..maxBound]
)

Right? It relies on the fact that we're dealing with a bounded
enumerated type, which Char of course is. If you know it's all letters,
you can reduce the constant factor by replacing [minBound..maxBound]
with ['a'..'z','A'..'Z'].

Matt Harden

George Russell

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

Wolfgang Grieskamp wrote:
[snip]

> This is one of the rather resistent myths in the FP community. O(1)
> array update can be implemented in pure languages in several ways. All
> these techniques rely on that the array in question is used
> "single-threaded" or "linear", which is typically the case for
> iterative array algorithms. (This doesn't means that an array is
> forced to be _always_ linear, as e.g. expressed by linear type
> systems; it just means that in those execution traces where the array
> is _used_ linear, O(1) is guaranteed.)

[snip]
I agree with every word of this. It seems to me that detecting linearity
is a special case of region analysis. While I would like to see functional
compilers doing region analysis of some sort, I don't think it's a satisfactory
solution to the original problem, since I don't like relying on the compiler
to spot arbitrary optimisations to turn my algorithm from O(NM) to O(M).
After all, you could write <arbitrarily inefficient solution> and expect
the Haskell compiler to turn it into optimised code.

I think the monadic solution (which uses ST monads, which are not
part of standard Haskell as far as I know) is the best one, but even that
isn't ideal. Suppose the original problem gave us a vector of characters,
rather than a list of characters. Then the ideal solution would allow
parallelisation (different processors doing different bits of the
vector), but the ST monad approach would force everything to
be sequenced.

Martin Norb{ck

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

Tue, 30 Mar 1999 04:03:02 +0000, ma...@ninenet.com (Matt Harden) ->
: Who needs arrays? This should run in O(n) time, albeit with

: a pretty large constant factor:
:
: findmax :: (Enum a, Bounded a, Ord a) => [a] -> (Int,a)
: findmax xs = foldl1 max (
: map
: (\x -> (length (filter (==x) xs),x) )
: [minBound..maxBound]
: )
:
: Right? It relies on the fact that we're dealing with a bounded
: enumerated type, which Char of course is. If you know it's all letters,
: you can reduce the constant factor by replacing [minBound..maxBound]
: with ['a'..'z','A'..'Z'].

Well, you are right about the O(n) time bound, but you also have a O(n)
memory bound whereas my solution only have O(1). (Assuming the list is
created lazily.)

But that is because of your algorithm, not because of Haskell.

Matt Harden

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

Martin Norb{ck wrote:
>
<quotation of me snipped>

>
> Well, you are right about the O(n) time bound, but you also have a O(n)
> memory bound whereas my solution only have O(1). (Assuming the list is
> created lazily.)
>
> But that is because of your algorithm, not because of Haskell.

I don't entirely see why O(n) space is necessary for my algorithm. I
assume it is because the entire list has to be saved to calculate
length(...) for each value of x from minBound to maxBound.

In any case, this one should be O(n) time and O(1) space, is pure
functional, and doesn't need the ST monad. Probably a large constant
factor (in time) again, though.

> import Array
>
> hist :: (Ix a, Num b) => (a,a) -> [a] -> Array a b
> hist bnds xs = accumArray (+) 0 bnds [(x,1) | x<-xs, inRange bnds x]
>
> findmax :: (Ix a, Ord a) => (a,a) -> [a] -> (Int,a)
> findmax bnds xs =
> foldl1 max [(y,i) | (i,y) <- assocs (hist bnds xs)]

The hist function is of course the most important part. Please note
that hist is from the Haskell Library Report, and is not my creation.

Matt Harden

Wolfgang Grieskamp

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

George Russell <g...@informatik.uni-bremen.de> writes:

> [...] It seems to me that detecting linearity is a special case of

> region analysis. While I would like to see functional compilers
> doing region analysis of some sort, I don't think it's a
> satisfactory solution to the original problem, since I don't like
> relying on the compiler to spot arbitrary optimisations to turn my
> algorithm from O(NM) to O(M).

I don't see how, as a functional programmer, one can generally avoid
making such assertions. Another example of this kind is tail-recursion
which is expected to be executed in constant space. Ideally,
complexity in time and space should be be specified in the language
definition (resp. in the library modules), such that it becomes
independent of the language implementation. I think at least SML does
it for tail-recursion. (Don't know of the state of Haskell to this
end.)

George Russell

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

Wolfgang Grieskamp made various reasonable points. But you cannot
for any valid optimisation X, assume that your compiler does X when assessing
the efficiency of an ML program, as then all valid ML programs could be
deemed efficient. You have to decide what optimisations are likely
to be implemented for environments in which the program is likely to be run.
I don't think that using linearity to allow in-place update qualifies.
Maybe it should do, but that's another argument. (A disadvantage would
be that in-place update can clobber incremental garbage collectors.)

Wolfgang Grieskamp

unread,

Mar 30, 1999, 3:00:00 AM3/30/99

to

George Russell <g...@informatik.uni-nospam-bremen.de> writes:

> But you cannot for any valid optimisation X, assume that your
> compiler does X when assessing the efficiency of an ML program, as
> then all valid ML programs could be deemed efficient. You have to
> decide what optimisations are likely to be implemented for
> environments in which the program is likely to be run.

Thats the reason why in the real world I prefer a runtime technique
such as version arrays, which is (mostly) independent of the compiler
implementation. Version arrays can be provided as an modular
extension; the module interface specifies the complexity behaviour
(informally); debugging options may warn the developer at runtime when
storage needs to be copied.

I've made good experiences with version arrays and the like in the
Java/Pizza framework, by using immutable aggregates such as sequences,
sets, and finite mappings which are implemented by these
techniques. The constant factor to expect is around 2-3 compared to
mutable aggregates. The programming style is much more convenient as
with mutable aggregates, since algebraic (relational) expressions on
these fundamental data types can be used. And if one actually needs a
state, then an object can wrap the immutable aggregate as an
attribute.

These arguments even more apply to the case of a functional language
such as Haskell. I regard it as, carefully speaken, strange to use a
monad to program with such a nice mathematical object as e.g. a
sequence, set, or map. (This is not directed against you, George, who
already stated objections against the mutable array monad.) If one
searches for the ultimate efficiency and exact control of a
(sequential) machine, C++ might be a better choice. Right, Haskell can
be "as fast as C++", but this is only an issue for system library
hackers. The real benefits and advantages of functional languages are
found on the application programmers side. At least there is the more
promising market for FP, I suppose.

Fergus Henderson

unread,

Mar 31, 1999, 3:00:00 AM3/31/99

to

George Russell <g...@informatik.uni-bremen.de> writes:

>Wolfgang Grieskamp wrote:
>> This is one of the rather resistent myths in the FP community. O(1)
>> array update can be implemented in pure languages in several ways. All
>> these techniques rely on that the array in question is used
>> "single-threaded" or "linear", which is typically the case for
>> iterative array algorithms. (This doesn't means that an array is
>> forced to be _always_ linear, as e.g. expressed by linear type
>> systems; it just means that in those execution traces where the array
>> is _used_ linear, O(1) is guaranteed.)
>[snip]

>I agree with every word of this. It seems to me that detecting linearity

>is a special case of region analysis. While I would like to see functional
>compilers doing region analysis of some sort, I don't think it's a satisfactory
>solution to the original problem, since I don't like relying on the compiler
>to spot arbitrary optimisations to turn my algorithm from O(NM) to O(M).

Wolfgang Grieskamp mentioned linear type systems. If you have a linear type
system, then you can rely on the implementation doing the linearity analysis.
(And as Wolfgang Greiskamp mentioned in a followup, version arrays are
another solution, albeit one with relatively high constant factors.)

--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh> | of excellence is a lethal habit"
PGP: finger f...@128.250.37.3 | -- the last words of T. S. Garp.