[Haskell-cafe] Markov Text Generator & Randomness


Charles-Pierre Astolfi

Jul 23, 2014, 10:30:32 PM
to Haskell Cafe
Hi -cafe,

I'm coding a Markov Text Generator (of order 1). Basically, you have a
source text, and knowing the frequencies of pairs of consecutive
words, you generate a somewhat syntactically correct text from this.
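
For readers following along, here is a minimal sketch of the pair-frequency table described above. It is not the code from the lpaste link (which is not reproduced in this thread); `buildDb` and `Db` are hypothetical names, and the sketch assumes the text has already been split into words, ignoring punctuation and sentence boundaries.

```haskell
import qualified Data.Map.Strict as M

-- For every word, how often each other word directly follows it.
-- (Hypothetical type; the original makeDb may use a different shape.)
type Db = M.Map String (M.Map String Int)

-- Walk over consecutive pairs (a, b) and count each occurrence.
buildDb :: [String] -> Db
buildDb ws =
  M.fromListWith (M.unionWith (+))
    [ (a, M.singleton b 1) | (a, b) <- zip ws (drop 1 ws) ]

main :: IO ()
main = do
  let db = buildDb (words "the cat sat on the mat the cat ran")
  -- Successors of "the" with their counts.
  print (M.toList <$> M.lookup "the" db)
  -- prints: Just [("cat",2),("mat",1)]
```

An order-1 generator should only ever emit a pair (a, b) for which `b` is a key in the inner map of `a`.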

Here's the link to my code and to a source text you can use as example.

test.txt
http://lpaste.net/raw/4004174907431714816
code
http://lpaste.net/4147715261379641344

The kicker is that this code generates sentences with consecutive
words that never appear next to each other in the source text.
For example, the code generated "They sat over at because old those
the lighted.", but "over at" never occurs in the source text, so it
shouldn't occur in a generated sentence.

The result that the makeDb function gives is correct, so my problem
actually lies in generate and/or in draw.

I think there's something about RVar that I messed up, but I don't see
the problem. Any ideas?

Cheers,
--
Charles-Pierre
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Ben Gamari

Jul 23, 2014, 10:40:12 PM
to Charles-Pierre Astolfi, Haskell Cafe
Charles-Pierre Astolfi <c...@crans.org> writes:

> Hi -cafe,
>
> I'm coding a Markov Text Generator (of order 1). Basically, you have a
> source text, and knowing the frequencies of pairs of consecutive
> words, you generate a somewhat syntactically correct text from this.
>
> Here's the link to my code and to a source text you can use as example.
>
> test.txt
> http://lpaste.net/raw/4004174907431714816
> code
> http://lpaste.net/4147715261379641344
>
> The kicker is that this code generates sentences with consecutive
> words that never appear next to each other in the source text.
> For example, the code generated "They sat over at because old those
> the lighted.", but "over at" never occurs in the source text, so it
> shouldn't occur in a generated sentence.
>
You mean like,

"The old man looked from his glass across the square, then over at the waiters."

Otherwise my cursory look turned up no bugs.

Cheers,

- Ben

Charles-Pierre Astolfi

Jul 24, 2014, 8:25:24 AM
to Ben Gamari, Haskell Cafe
> "The old man looked from his glass across the square, then over at the waiters."
You're embarrassingly right! But then, "those the" definitely never
appears, although it does in my generated text.

> Otherwise my cursory look turned up no bugs.
Unfortunately there is :(

--
Cp

Vo Minh Thu

Jul 24, 2014, 9:20:49 AM
to Charles-Pierre Astolfi, Haskell Cafe
Just a note: in the first "where" clause in `generate`, you don't need
to pass around the `db` variable (it is visible to the "where"
clause).

I have never used the `RVar` monad, but I guess that every time you run
`rword` you might get a different result. So in your `go` function,
once you have executed `word <- rword`, you should not pass `rword`
down to the `draw` function, but instead, say, `return word`.
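
The point about re-running an action can be sketched without `RVar` at all. The example below is an illustration under stated assumptions, not the code from the thread: `IO` with an `IORef` stands in for the random monad so the output is reproducible, and `mkSource`, `successorBuggy` and `successorFixed` are hypothetical names.

```haskell
import Data.IORef

-- Stand-in for a random source: each run of the returned action yields the
-- next word of a fixed cycle. Assumes a non-empty input list.
mkSource :: [String] -> IO (IO String)
mkSource ws = do
  ref <- newIORef (cycle ws)
  pure $ atomicModifyIORef' ref $ \xs -> case xs of
    (y : ys) -> (ys, y)
    []       -> ([], "")

-- Buggy shape: receives the action and runs it again, so it may look up the
-- successor of a different word than the one the caller already emitted.
successorBuggy :: IO String -> IO String
successorBuggy rword = do
  w <- rword
  pure ("successor of " ++ w)

-- Fixed shape: receives the word that was already drawn.
successorFixed :: String -> IO String
successorFixed w = pure ("successor of " ++ w)

main :: IO ()
main = do
  rword <- mkSource ["those", "the", "old"]
  word  <- rword                  -- draws "those"
  s1    <- successorBuggy rword   -- runs rword again and sees "the"
  putStrLn (word ++ " / " ++ s1)  -- prints: those / successor of the
  word2 <- rword                  -- draws "old"
  s2    <- successorFixed word2
  putStrLn (word2 ++ " / " ++ s2) -- prints: old / successor of old
```

In the buggy shape the emitted word and the word used for the lookup can differ, which would explain adjacent pairs that never occur in the source text.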

Ben Gamari

Jul 24, 2014, 9:56:03 AM
to Charles-Pierre Astolfi, Haskell Cafe
Charles-Pierre Astolfi <c...@crans.org> writes:

>> "The old man looked from his glass across the square, then over at the waiters."
> You're embarrassingly right! But then, "those the" definitely never
> appears, although it does in my generated text.
>
>> Otherwise my cursory look turned up no bugs.
> Unfortunately there is :(
>

Ahh yes, looking a bit more closely now I have a few points:

1. In `draw`: The first argument is an action which will return a new
word. Instead of passing `rword :: RVar Word`, you presumably rather want to
pass `word :: Word`. This is likely the cause of your bug.

2. In `draw`: Instead of `weightedSample`, which produces a random shuffling of
the entire list, you really just want to draw a single word. This
is a categorical distribution; use `Data.Random.Distribution.Categorical.fromList`
to construct the distribution and `R.rvar` to draw a variate. Note
that you may want to avoid doing the former more than once, as
construction of the distribution requires sorting and normalizing.

3. `map (\(x,y)->(y,x))` is just `map swap` where `swap` is provided
by `Data.Tuple`.
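
As a rough illustration of points 2 and 3, the sketch below draws a single item from a weighted list and uses `swap` to flip (word, count) pairs. It is a hand-rolled stand-in, not random-fu's categorical distribution: `pickWeighted` is a hypothetical helper, and the `roll` argument is assumed to come from a uniform draw in the range 0 to (total weight - 1).

```haskell
import Data.Tuple (swap)

-- Pick the item whose cumulative weight interval contains the roll.
-- Assumes positive weights; returns Nothing if the roll is out of range.
pickWeighted :: [(Int, a)] -> Int -> Maybe a
pickWeighted [] _ = Nothing
pickWeighted ((w, x) : rest) roll
  | roll < w  = Just x
  | otherwise = pickWeighted rest (roll - w)

main :: IO ()
main = do
  let successors :: [(String, Int)]
      successors = [("cat", 2), ("mat", 1)]   -- (word, count)
      weighted   = map swap successors        -- (count, word)
  -- Enumerate every possible roll instead of sampling, to show the weights.
  print [pickWeighted weighted r | r <- [0 .. 2]]
  -- prints: [Just "cat",Just "cat",Just "mat"]
```

Only one successor is selected per call, rather than shuffling the whole list and taking its head.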

My quick rework of your code can be found here [1].

Cheers,

- Ben


[1] http://lpaste.net/108025

Charles-Pierre Astolfi

Jul 24, 2014, 5:36:55 PM
to Ben Gamari, Haskell Cafe
You're right Ben, changing the signature to Word instead of RVar Word
did the trick. Stupid mistake.

Thanks!
--
Cp