On 2014-10-24, Noob <ro...@127.0.0.1> wrote:
> Hello,
>
> Consider a large number of sed replacement commands, stored in a file.
>
> sed s@aaaa@bbb@
> sed s@cccccc@ddd@
> ...
>
> I have ~3000 such commands in command_file.
> and I'm running sed -f command_file input_file
> (input_file is ~500 KB)
>
> The sed command takes a long time (around 5 minutes).
If you want a blazing fast solution, and the set of replacements does
not have to be easy to modify, you can turn it into a Lex program
(compiled to C with a Lex implementation such as GNU Flex).
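Lex turns all of its pattern rules into what is effectively one giant
regular expression, compiled into a single state machine, so the input
is scanned once no matter how many rules there are.
The pattern rules trigger snippets of C code called actions. These
actions can perform output.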
The action that is triggered is the one whose rule matches the longest
possible text at the current input position. If there is a tie, then
the earlier rule wins.
If none of the rules match, then a default rule kicks in which matches a single
character, and sends it to standard output, then advances the input to the next
character. (If you write a one-character rule explicitly, it shadows that
rule.)
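
To make the selection rule concrete, here is a rough sketch in plain C.
It is not what Flex generates (Flex builds a state machine and does not
loop over the rules like this); it only imitates the behavior described
above, and it assumes the patterns are literal strings. The names
"struct rule", "pick_rule" and "translate" are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule table entry: a literal pattern and its replacement. */
struct rule {
    const char *pattern;
    const char *replacement;
};

/*
 * Return the index of the rule that wins at `input`, or -1 if none match.
 * The longest match wins; on a tie the earlier rule is kept, because a
 * later rule must be strictly longer to displace it.
 */
static int pick_rule(const struct rule *rules, size_t nrules, const char *input)
{
    int best = -1;
    size_t best_len = 0;

    assert(rules != NULL && input != NULL);
    for (size_t i = 0; i < nrules; i++) {
        size_t len = strlen(rules[i].pattern);
        if (len > best_len && strncmp(input, rules[i].pattern, len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}

/*
 * Translate `input` into `out` (capacity `cap`, including the NUL).
 * Characters that no rule matches are copied through one at a time,
 * which imitates the default rule.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int translate(const struct rule *rules, size_t nrules,
                     const char *input, char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    while (*input != '\0') {
        int r = pick_rule(rules, nrules, input);
        const char *src = (r >= 0) ? rules[r].replacement : input;
        size_t n = (r >= 0) ? strlen(src) : 1;

        if (used + n >= cap)
            return -1;
        memcpy(out + used, src, n);
        used += n;
        input += (r >= 0) ? strlen(rules[r].pattern) : 1;
    }
    out[used] = '\0';
    return 0;
}
```

With the rules {"foo", "xyzzy"} and {"bar", "quux"}, translate() turns
"foo bar" into "xyzzy quux". Adding a third rule {"foobar", "LONG"}
makes the input "foobar" come out as "LONG", since the longer match
beats "foo".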
Here is an example Lex program which translates "foo" to "xyzzy" and "bar"
to "quux".
%{
#include <stdio.h>
%}
/* regex "macros" can be defined here: symbolic names for regex fragments */
%%
foo { fputs("xyzzy", stdout); }
bar { fputs("quux", stdout); }
%%
int main(void)
{
    yylex();   /* scan standard input until end of file */
    return 0;
}
I saved the above in a file called "lex.l" and built it on an Ubuntu Linux
system like this:
$ lex lex.l # this produces a file called "lex.yy.c"
$ cc lex.yy.c -ll # we must link in the lex run-time library (here flex's "libfl")
Test run:
$ ./a.out
foo
xyzzy
foo bar
xyzzy quux
Hey foo, I have a bar for you.
Hey xyzzy, I have a quux for you.
The Unix command for compiling a Lex program is "lex"
and the run-time support library is linked with "-ll". These work
fine on the Ubuntu system and other systems that have the right
symbolic links:
$ ls -l /usr/lib/i386-linux-gnu/lib{fl,l}.a
-rw-r--r-- 1 root root 4304 Nov 7 2011 /usr/lib/i386-linux-gnu/libfl.a
lrwxrwxrwx 1 root root 7 Nov 7 2011 /usr/lib/i386-linux-gnu/libl.a -> libfl.a
$ ls -l /usr/bin/*lex
-rwxr-xr-x 1 root root 288384 Nov 7 2011 /usr/bin/flex
lrwxrwxrwx 1 root root 4 Nov 7 2011 /usr/bin/lex -> flex
Without these symlinks, you would have to use "flex" and link with "-lfl".
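
Nobody wants to type ~3000 rules by hand, so the rules section could be
generated from command_file. Below is a minimal sketch of a converter
for one command, under some loud assumptions: each command has the form
s<delim>from<delim>to<delim>, "from" is plain literal text (no regex
metacharacters), and "to" contains no sed specials such as "&" or "\1".
The function name "sed_to_lex_rule" and its helpers are made up for this
post. Also note the result is not exactly equivalent to the sed script:
sed applies each command in turn to the output of the previous ones, and
without the "g" flag replaces only the first occurrence per line, while
the Lex scanner makes one pass and replaces every occurrence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, keeping room for the terminating NUL. */
static int put(char *out, size_t cap, size_t *used, char c)
{
    if (*used + 1 >= cap)
        return -1;
    out[(*used)++] = c;
    return 0;
}

/* Append a NUL-terminated string verbatim. */
static int put_str(char *out, size_t cap, size_t *used, const char *s)
{
    for (; *s != '\0'; s++)
        if (put(out, cap, used, *s) != 0)
            return -1;
    return 0;
}

/* Append text[0..len) in double quotes, escaping '"' and '\\'.
 * The same quoting works for a Lex literal pattern and a C string. */
static int put_quoted(char *out, size_t cap, size_t *used,
                      const char *text, size_t len)
{
    if (put(out, cap, used, '"') != 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if ((text[i] == '"' || text[i] == '\\') &&
            put(out, cap, used, '\\') != 0)
            return -1;
        if (put(out, cap, used, text[i]) != 0)
            return -1;
    }
    return put(out, cap, used, '"');
}

/*
 * Convert one command such as  s@aaaa@bbb@  into a Lex rule line.
 * Anything after the final delimiter (flags) is ignored.
 * Returns 0 on success, -1 on a malformed command or a too-small buffer.
 */
static int sed_to_lex_rule(const char *cmd, char *out, size_t cap)
{
    size_t used = 0;

    assert(cmd != NULL && out != NULL && cap > 0);
    if (cmd[0] != 's' || cmd[1] == '\0')
        return -1;

    char delim = cmd[1];
    const char *from = cmd + 2;
    const char *mid = strchr(from, delim);
    if (mid == NULL || mid == from)      /* missing or empty pattern */
        return -1;
    const char *to = mid + 1;
    const char *end = strchr(to, delim);
    if (end == NULL)                     /* missing closing delimiter */
        return -1;

    if (put_quoted(out, cap, &used, from, (size_t)(mid - from)) != 0 ||
        put_str(out, cap, &used, " { fputs(") != 0 ||
        put_quoted(out, cap, &used, to, (size_t)(end - to)) != 0 ||
        put_str(out, cap, &used, ", stdout); }") != 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

For the command s@aaaa@bbb@ this produces the line

"aaaa" { fputs("bbb", stdout); }

which can go between the two %% lines of a program like the one above.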