Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Best way to search through STL maps?

1 view
Skip to first unread message

Steve

unread,
Nov 29, 2009, 7:46:36 AM11/29/09
to
I have the following STL containers:

(using namespace std;)

typedef vector<string> TokenVector; // e.g [the] [cat] [sat] [on]
[the] [mat]

typedef map<TokenVector, long > NGramMap; // a list of unique n-grams
(as above), and a freq count.

My goal is to search for n-grams that match a pattern with wildcards,
for instance:

[the] [cat] [sat] [?] [the] [mat]

and return a list of all matches with all the possibilities for the
wildcard [?], along with the freq count

Assuming the map is sorted alphabetically, is there some way to search
them - like a regular expression - that return pattern matches?

The only way I can think of at the moment is to search for "[the]
[cat] [sat]", then sift through those matches.
This gets messy, though, when the pattern starts with a wildcard, or
contains 2 or 3 wildcards.

This seems like it might be a common task, so I thought I'd ask here
before re-inventing the wheel.
Does STL have ready-made solutions for this kind of search?

Does it make the problem any easier if it's constrained to _always_
being 6-grams, and that there are never more than 3 wildcards?

Thanks for any clues, even if it's just what topics I need to study in
order to crack it.

Steve


er

unread,
Nov 29, 2009, 2:37:06 PM11/29/09
to
Does the question boil down to finding whether

[the] [cat] [sat] [on] [the] [mat]

matches

[the] [cat] [sat] [?] [the] [mat]


where each of the above n-grams are represented by a collection of
strings and symbol ? is not a question mark but a wildcard?


Steve

unread,
Nov 29, 2009, 5:28:31 PM11/29/09
to


Yes, succinctly put.

er

unread,
Nov 29, 2009, 6:00:11 PM11/29/09
to

I have a feeling (yes, nothing more reliable than that as I don't deal
text processing) that working with strings might be easier than
breaking them into tokens that are held by a STL container. And do so
using C++ I would look into Boost.Regex. But probably you already
thought of this and have a good reason to use tokens.

Steve

unread,
Nov 30, 2009, 4:42:53 AM11/30/09
to

Yes, but having still found no shortcuts, I'm beginning to think that
keeping composite string copies for searching is the way to go.
(I really should investigate Boost, but a quick search reveals people
having nightmares building it for OSX due to the transition to 64-bit,
so I'll have to hold off for a while.)

Thanks

Steve

er

unread,
Nov 30, 2009, 9:18:39 AM11/30/09
to
> (I really should investigate Boost, but a quick search  reveals people
> having nightmares building it for OSX due to the transition to 64-bit,
> so I'll have to hold off for a while.)

FYI I was able to build a couple of the boost libraries (the bulk of
boost does not need installing) on Snow Leopard (64-bit). Some of the
searches you found probably pertain to advanced features that you
probably don't need.

Jeff Flinn

unread,
Nov 30, 2009, 10:46:22 AM11/30/09
to

Did you mean this:

> it was reported that in Snow Leopard, Apple has removed support for
> ppc64, and folks trying to use that receive fairly obscure errors.
> Some of Boost users were already confused by this. Now, this is
> of course more like Apple's problem, but here's a patch, against
> current trunk, to do a nice diagnostic of this in Boost.Build.
> The intention is that if you try to build just ppc64, or a two-way
> fat binary including ppc64 you get an error message, and if you
> try tobuild 4-way fat binary, ppc64 is skipped and you get 3-way
> binary.

Jeff

Steve

unread,
Nov 30, 2009, 11:43:54 AM11/30/09
to

Yes, not that exact message, but the gist of it. Well, it looks like
it's worth persevering then.

er

unread,
Nov 30, 2009, 1:17:25 PM11/30/09
to

> >  > it was reported that in Snow Leopard, Apple has removed support for
> >  > ppc64, and folks trying to use that receive fairly obscure errors.
> >  > Some of Boost users were already confused by this. Now, this is
> >  > of course more like Apple's problem, but here's a patch, against
> > Jeff
>
> Yes, not that exact message, but the gist of it. Well, it looks like
> it's worth persevering then.

Does ppc64 concern you? Not me, so I did not need a patch.

Steve

unread,
Dec 1, 2009, 4:40:23 AM12/1/09
to

No it doesn't, but I thought specifying *just* that omission was
tricky for a non-unix man like myself. I'm just being chicken ;) I'm
sure we've all suffered the scenario when, in the middle of an idea,
you get sidetracked installing something, and then 24 hours disappear,
along with clumps of hair.
I think I read somewhere that aspects of boost will be appearing C+
+0x, so it's a no-brainer that I need to start making good use of it.I
may be gone some time ;)

PS Great to be talking to someone here using OSX, I almost thought it
wasn't worth mentioning.

Message has been deleted
0 new messages