A few corner cases still need ironing out, but on the whole it works.
region_struct is the main data structure. It encodes information on a
per-region basis. The main function is intersect_streams, which does
the approximate stream intersection, and returns a list of regions
which the caller may choose from by applying its own heuristics.
TODO: Display correctly formatted lyrics. How to go about doing that?
If you look at struct word_t, you will see that provision for that has
already been made. :-) So, we just need to get the bounding offsets,
and dump the data between those offsets from the input string to the
output stream.
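The offset idea in the TODO above could look roughly like the sketch below. It is only an illustration: word_span is a hypothetical stand-in for word_t (whose real fields are not shown here), and the names dump_between and text_between are made up for this example. It assumes each word records the half-open range of offsets it occupies in the input string.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for word_t: only the two offsets assumed here.
struct word_span {
    std::size_t begin; // offset of the word's first character in the input
    std::size_t end;   // offset one past the word's last character
};

// Write input[first.begin, last.end) to out. Copying the raw bytes keeps
// the original spacing and line breaks between the two bounding words.
void dump_between(const std::string& input, const word_span& first,
                  const word_span& last, std::ostream& out) {
    assert(first.begin <= last.end && last.end <= input.size());
    out.write(input.data() + first.begin,
              static_cast<std::streamsize>(last.end - first.begin));
}

// Convenience wrapper that returns the same text as a string.
std::string text_between(const std::string& input, const word_span& first,
                         const word_span& last) {
    std::ostringstream os;
    dump_between(input, first, last, os);
    return os.str();
}
```

Because the bytes are copied straight from the input, whatever formatting the lyrics had in the source page survives, which is the point of keeping offsets instead of the words themselves.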
What do you need to compile it?
g++ 3.x or 4.x, etc.
g++ intersection.cpp [should be enough].
How to run:
./a.out 2> /dev/null
[Redirecting stderr stops it from displaying a lot of debug output.]
What is the "Size: 195" that gets displayed? It's the number of matches
that the intersection algorithm found. That is a lot, hence we use one
more filter, which chooses the longest match.
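A minimal sketch of such a longest-match filter, under stated assumptions: region here is a hypothetical, stripped-down stand-in for region_struct that carries only a half-open offset range, and longest_region is a name invented for this example, not a function from intersection.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified stand-in for region_struct.
struct region {
    std::size_t begin; // offset where the matched region starts
    std::size_t end;   // offset one past where it ends
};

std::size_t region_length(const region& r) {
    return r.end - r.begin;
}

// Return the index of the longest region. On ties the earliest one wins.
// Precondition: regions is not empty.
std::size_t longest_region(const std::vector<region>& regions) {
    assert(!regions.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < regions.size(); ++i) {
        if (region_length(regions[i]) > region_length(regions[best])) {
            best = i;
        }
    }
    return best;
}
```

Swapping the comparison from > to < would give a shortest-match filter instead, should that heuristic turn out to work better.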
P.S. There shouldn't be any obvious bugs, because it's been thought
through.
This can also be used to get, say, poems from the web.
The one problem I know of is commented at line 285, which is why we
_sometimes_ get trailing junk characters after the song lyrics are
done. IMHO, this can be taken care of later when we do the choosing of
the best match: instead of choosing the longest match, choose the
shortest one. The place I'm talking about is where we generate the
fully connected graph for each of the search results, and choose the
one with the most overlap. Another possible solution to this problem
is a two-way intersection, since our algorithm is asymmetric:
(A INTERSECT B) is not necessarily the same as (B INTERSECT A). [It's
an approximate algorithm, so that is to be expected.]
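One way to picture the "most overlap" choice while accounting for the asymmetry is sketched below. This is an assumption-laden illustration, not the code in intersection.cpp: overlap_matrix and best_connected are hypothetical names, and overlap[i][j] is assumed to hold the size of (result i INTERSECT result j) for every pair of search results. Each result is scored by adding both directions of every edge it takes part in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: overlap[i][j] = size of (result i INTERSECT result j).
typedef std::vector<std::vector<std::size_t> > overlap_matrix;

// Return the index of the result with the largest total overlap against
// all other results, counting both (i INTERSECT j) and (j INTERSECT i)
// because the two need not be equal. On ties the earliest result wins.
// Precondition: overlap is a non-empty square matrix.
std::size_t best_connected(const overlap_matrix& overlap) {
    const std::size_t n = overlap.size();
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(overlap[i].size() == n);
    }
    std::size_t best = 0;
    std::size_t best_score = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t score = 0;
        for (std::size_t j = 0; j < n; ++j) {
            if (i != j) {
                score += overlap[i][j] + overlap[j][i];
            }
        }
        if (i == 0 || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

Summing both directions is only one possible way to combine them. Taking the smaller of the two per pair would be stricter and might trim more trailing junk, but that is untested speculation.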
A question I keep asking myself: why am I using hash_multimap? I
don't know.
--
-Dhruv Matani.
http://www.geocities.com/dhruvbird/
"Be sure brain is in gear before engaging mouth"
-- Anonymous