On 09/30/2014 07:00 AM, Malcolm McLean wrote:
> On Tuesday, September 30, 2014 10:08:28 AM UTC+1, JKdr wrote:
>> Input would be:
>>
>> 1) the strings that need to be searched for (or regex patterns) and
>> their substitutions
>>
>> 2) a stream (from a file) and its encoding
>>
>> I spent some time trying to find an implementation of such an
>> algorithm. I think it shouldn't be a big deal, at least when you
>> search for plain strings rather than regexes, but all the implementations
>> I have found touch the input stream more than once.
>>
>> I think the data strategies of such algorithms are based on
>> graphs/decision trees, which are used in state machines and similar
>> algorithms.
>>
> You need to build a suffix tree on your search strings. Usually it's done the other
> way round: you build the suffix tree from the data and search on the terms.
> But the method can work in reverse.
>
> Then it's a bit of a fiddle to hold the characters in a buffer, until they match
> or don't match your end of string character, which is the terminal of your
> suffix tree.
I had something very similar in mind. String search algorithms tend to
be messy (repeatedly looping through both the patterns and the data), and
what looks great in theory is sometimes not practical. Could you point me
to algorithms like the ones you describe, preferably with tests? ;-)
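
To make the idea concrete, here is a rough sketch of the kind of thing I
mean. It is not a suffix tree proper but an Aho-Corasick style automaton (a
trie over the search strings plus failure links), which is the closest
standard technique I know of. The names build_automaton and stream_replace
are made up for illustration. It assumes plain strings (no regex), that the
stream has already been decoded into text chunks (e.g. by opening the file
with the right encoding), and that the earliest-ending match wins, with the
longest pattern chosen when several end at the same character.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, depths and match info."""
    goto, fail, depth, hit = [{}], [0], [0], [None]
    for pat in patterns:
        if not pat:
            raise ValueError("empty search string")
        s = 0
        for ch in pat:
            nxt = goto[s].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                depth.append(depth[s] + 1)
                hit.append(None)
                goto[s][ch] = nxt
            s = nxt
        hit[s] = pat
    # Breadth-first pass: failure links, and inherit the longest pattern
    # that ends at each state when the state itself is not terminal.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            if hit[t] is None:
                hit[t] = hit[fail[t]]
            queue.append(t)
    return goto, fail, depth, hit


def stream_replace(chunks, table):
    """Yield output pieces; each input character is read exactly once.

    table maps search string -> substitution (hypothetical interface).
    """
    goto, fail, depth, hit = build_automaton(table)
    state, pending = 0, []  # pending = characters not yet safe to emit
    for chunk in chunks:
        for ch in chunk:
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            pending.append(ch)
            pat = hit[state]
            if pat is not None:
                # A pattern ends here: emit what precedes it, then the
                # substitution, and restart from the root.
                keep = len(pending) - len(pat)
                yield "".join(pending[:keep]) + table[pat]
                pending.clear()
                state = 0
            else:
                # Only the last depth[state] characters can still be the
                # start of a match; everything before them is final.
                safe = len(pending) - depth[state]
                if safe:
                    yield "".join(pending[:safe])
                    del pending[:safe]
    if pending:
        yield "".join(pending)


if __name__ == "__main__":
    table = {"cat": "dog", "at": "@"}
    print("".join(stream_replace(["a c", "at s", "at"], table)))
```

The buffer never holds more than the longest search string, and a match
split across two chunks is still found. One consequence of the
"earliest-ending match" rule assumed here: with both "ab" and "abcd" in the
table, "ab" always fires first and "abcd" never matches. Leftmost-longest
semantics would need extra lookahead.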
Thank you
lbrtchx