
Re: Best way to replace a set of strings in large files?


pk

Dec 20, 2009, 11:07:05 AM
Ryan Chan wrote:

> Consider the case:
>
> You have 200 lines of mapping to replace, in a csv format, e.g.
>
> apple,orange
> boy,girl
> ...
>
> You have a 500MB file and want to apply all 200 lines of mapping;
> what would be the most efficient way to do it?

Not sure about "most efficient", but with awk you can do all of that in a
single pass (almost: the small mapping file is read first) over the data:

awk -F, 'NR==FNR{a[$1]=$2;next}                      # mapfile: store old -> new
{for(i in a)gsub(i,a[i]); print}' mapfile datafile   # datafile: apply every pair to each line

However, that has at least two problems, which may or may not be relevant
for your scenario:

1) It does not know about "words", so if "pineapple" appears in the data, it
will become "pineorange";

2) It assumes that none of the strings contain regex metacharacters; if one
of the words to replace is, say, "a.*b" or similar, it will likely produce
wrong results.
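Both problems can be avoided by not running one gsub() per mapping entry at all: split each line into words and look every word up in the array instead. The following is a minimal sketch of that idea (not the code above), assuming every key in the mapping is a single word made of letters, digits and underscores; the function name replace_words is hypothetical. Keys are compared as plain strings and replacements are inserted by ordinary concatenation, so regex metacharacters and "&" lose their special meaning, and a word that has been replaced is never rewritten again by another entry (with the for (i in a) loop the order of the entries is unspecified, so such chains can cascade). It also does one array lookup per word instead of 200 gsub() calls per line.

```sh
# replace_words MAPFILE DATAFILE
# Hypothetical helper: whole-word, literal replacement in one pass over
# DATAFILE. Assumes each key in MAPFILE is a single word of [A-Za-z0-9_].
replace_words() {
    awk -F, '
    # First file (the map): store old -> new. Testing FILENAME rather than
    # NR==FNR keeps this correct even if the map file is empty.
    FILENAME == ARGV[1] { map[$1] = $2; next }
    {
        out = ""; rest = $0
        # Walk the line one word at a time: copy the text before each word
        # unchanged, then emit the word or its replacement.
        while (match(rest, /[A-Za-z0-9_]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            out = out substr(rest, 1, RSTART - 1) ((word in map) ? map[word] : word)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$1" "$2"
}
```

Usage: replace_words mapfile datafile > datafile.new. With the mapping from the question, "pineapple" is left alone because the whole word "pineapple" is not a key. Keys containing spaces or punctuation will never match with this tokenization and would need a different approach.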

John Hasler

Dec 20, 2009, 11:21:24 AM
man sed
--
John Hasler
jha...@newsguy.com
Dancing Horse Hill
Elmwood, WI USA

unruh

Dec 21, 2009, 2:04:09 PM
On 2009-12-21, Ryan Chan <ryanc...@gmail.com> wrote:

> On Dec 21, 12:21 am, John Hasler <jhas...@newsguy.com> wrote:
>> man sed
>
> Yes, I have tried replacing with sed, and it works quite fast for a
> SINGLE replacement.
> But if I run sed multiple times, it will be slow.

Why run it multiple times? sed or even ed can run as many commands as
you like in a single invocation.
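A minimal sketch of that for the CSV mapping in the question: turn each old,new line into a sed s command, then run sed once with -f. The helper name map_to_sed and the file names are hypothetical, and the sketch assumes, like the awk one-liner earlier in the thread, that neither side of the mapping contains regex metacharacters, "/", "\", "&" or newlines. Lines without a comma are skipped.

```sh
# map_to_sed MAPFILE
# Hypothetical helper: print one sed command "s/old/new/g" for each
# "old,new" line of the CSV mapping (split at the first comma).
map_to_sed() {
    sed -n 's|^\([^,]*\),\(.*\)$|s/\1/\2/g|p' "$1"
}

# Usage (hypothetical file names): build the script once, then make a
# single sed pass over the large file.
#   map_to_sed mapfile > replace.sed
#   sed -f replace.sed datafile > datafile.new
```

This reads the 500MB file only once, but sed still tries every s command on every line, in order, so one rule can rewrite the output of an earlier one. GNU sed (not POSIX sed) also accepts \< and \> around a pattern for whole-word matching.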


>
> So I am asking here whether there is a faster way to apply a mapping
> stored in a file. (I can write some scripts, but I am not sure whether an
> existing tool can do the trick.)
