read ahead or before
Mag Gam  
Newsgroups: comp.lang.awk
From: Mag Gam <magaw...@gmail.com>
Date: Sat, 26 Jul 2008 12:02:48 -0700 (PDT)
Local: Sat, Jul 26 2008 3:02 pm
Subject: read ahead or before
I have been trying to do this instead of placing everything in a hash/
array and compare in the END block.

For example, if I have a file like this

111
2222
333
333
4445
3434

Notice there is a duplicate "333". How can I test if the next line is
the same as the current line? I suppose I can use getline() but is
there another clever way of achieving this?

Also, how can I check for previous line?

TIA


pk  
Newsgroups: comp.lang.awk
From: pk <p...@pk.invalid>
Date: Sat, 26 Jul 2008 21:14:22 +0200
Local: Sat, Jul 26 2008 3:14 pm
Subject: Re: read ahead or before
On Saturday 26 July 2008 21:02, Mag Gam wrote:

> I have been trying to do this instead of placing everything in a hash/
> array and compare in the END block.

> For example, if I have a file like this

> 111
> 2222
> 333
> 333
> 4445
> 3434

> Notice there is a duplicate "333". How can I test if the next line is
> the same as the current line? I suppose I can use getline() but is
> there another clever way of achieving this?

I don't know if that can be considered more clever, however you can just
save the value of the previous line:

awk '{if ($0==prev) {
        # ... this line is the same as the previous line
      }
      prev=$0}' file
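
As a quick check of the idea above, here is a minimal sketch run against the sample data from the original post. The message text and the use of NR to report line numbers are illustrative assumptions, not something from the thread.

```shell
out=$(printf '%s\n' 111 2222 333 333 4445 3434 |
awk '{
    # Compare the current record with the one saved on the previous pass.
    if (NR > 1 && $0 == prev)
        print "line " NR " repeats line " (NR - 1) ": " $0
    prev = $0    # remember this record for the next iteration
}')
printf '%s\n' "$out"
```

Only one line is held in prev at any time, so memory use should not grow with the size of the file.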

What are you trying to do? What's the underlying problem?

If you just want to remove duplicates, you can do

awk '!a[$0]++' file

> Also, how can I check for previous line?

See above.
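
For the "next line" half of the question, one possible sketch is to delay the work by one record, so that by the time a line is handled the line after it is already in $0. This avoids getline altogether. The marker text is an assumption made for illustration, and the END block is needed because the last line has no successor.

```shell
out=$(printf '%s\n' 111 2222 333 333 4445 3434 |
awk 'NR > 1 { print prev ($0 == prev ? "  <- same as next line" : "") }
     { prev = $0 }                 # current line becomes "previous"
     END { if (NR > 0) print prev } # flush the final line
')
printf '%s\n' "$out"
```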

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.


Mag Gam  
Newsgroups: comp.lang.awk
From: Mag Gam <magaw...@gmail.com>
Date: Sun, 27 Jul 2008 07:05:42 -0700 (PDT)
Local: Sun, Jul 27 2008 10:05 am
Subject: Re: read ahead or before
Thanks for the response.

The underlying problem is, the file is huge; it's close to 15g and I
would like to compare.

What I am trying to do is, compare the current line to the next line
(or vice versa, 2nd line to 1st).

With the hash solution, I was able to get the answer. However my
sysadmin is complaining I am taking up too much memory.



Janis Papanagnou  
Newsgroups: comp.lang.awk
From: Janis Papanagnou <Janis_Papanag...@hotmail.com>
Date: Sun, 27 Jul 2008 18:11:17 +0200
Local: Sun, Jul 27 2008 12:11 pm
Subject: Re: read ahead or before

Mag Gam wrote:
> Thanks for the response.

[Please don't top-post!]

> The underlying problem is, the file is huge; it's close to 15g and I
> would like to compare.

(Never measured files in grams, so I can't help you here.)

> What I am trying to do is, compare the current line to the next line
> (or vice versa, 2nd line to 1st).

Have you tried pk's proposal? - Which solves what you've asked for.

> With the hash solution, I was able to get the answer. However my
> sysadmin is complaining I am taking up too much memory.

You already told us that your own hash solution doesn't fit your
needs. So just use pk's solution. What's the problem?

Janis


loki harfagr  
Newsgroups: comp.lang.awk
From: loki harfagr <l...@theDarkDesign.free.fr>
Date: 27 Jul 2008 17:48:06 GMT
Local: Sun, Jul 27 2008 1:48 pm
Subject: Re: read ahead or before

On Sun, 27 Jul 2008 18:11:17 +0200, Janis Papanagnou wrote:
> Mag Gam wrote:
>> Thanks for the response.

> [Please don't top-post!]

>> The underlying problem is, the file is huge; it's close to 15g and I
>> would like to compare.

> (Never measured files in grams, so I can't help you here.)

 Ah Janis, the poor OP wasn't meaning grams but gravitational levels
and under 15 g that's certainly difficult to cure any file ~;O)

>> What I am trying to do is, compare the current line to the next line
>> (or vice versa, 2nd line to 1st).

> Have you tried pk's proposal? - Which solves what you've asked for.

>> With the hash solution, I was able to get the answer. However my
>> sysadmin is complaining I am taking up too much memory.

> You already told us that your own hash solution doesn't fit your needs.
> So just use pk's solution. What's the problem?

 I suspect pk's solution (though very good) may, in the OP case,
still consume a lot of memory in the a[] buffer if by 'chance'
the input overgravitated file has a lot of different lines ;-)
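
To make that concern concrete: the a[] array in '!a[$0]++' gains one entry per distinct line, so its size tracks the number of different lines rather than the number of duplicates. A minimal sketch with the OP's sample data, where the distinct counter is a name introduced here purely for illustration:

```shell
out=$(printf '%s\n' 111 2222 333 333 4445 3434 |
awk '!a[$0]++ { distinct++ }    # true only the first time a line is seen
     END { print NR " lines read, " distinct " keys kept in a[]" }')
printf '%s\n' "$out"
```

On a 15 GB file made mostly of unique lines, nearly every line would end up as a key, which is presumably what the sysadmin noticed.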

 If that's the point, I propose here a possible way to
drastically reduce the memory usage, certainly not the
golf contest winner of the month but quite close to list
in obfuscating style samples ;D)
 Anyway:

$ awk '{n++;n%=1;a[n]=a[n+1];a[n+1]=$0;if(a[n+1]==a[n]){print "Mind the gap";next}}1'

 That way, if the OP's sysadmin has a problem with mem usage, that'd leave us
with a few hypotheses: the server might upgrade from Z80-MSX-16KB towards
power machines like AtariST520 or even a PDP-20, or the sysadmin has to
be seen as human and may need to have some vacation time (like this week I
just had ;-) or maybe the file has extremely looong records...

(to OP: Replace the ``Londoner'' message by whatever you need, other msg or action)

--
have space suit : "VMSBUX:B...@GOHH.GO"
will travel : tr "MLKJHGFDSQNBVCXWPOIUYTREZA"  "a-z"

Ted Davis  
Newsgroups: comp.lang.awk
From: Ted Davis <tda...@umr.edu>
Date: Sun, 27 Jul 2008 15:01:31 -0500
Local: Sun, Jul 27 2008 4:01 pm
Subject: Re: read ahead or before

Functionally, this is the same as PK's suggestion, it's just written out
in a fuller (C-like), and hopefully, clearer, form - since you didn't say
what you want to do with the lines after suppressing adjacent duplicates,
I wrote it to print the non-duplicate lines as it encounters them.  This
should not be sensitive to the file size because it stores only one line
at a time.

  {
        if( $0 != Prev ) print $0
        Prev = $0
  }

In minimalist awk format, that's
  $0 != Prev {print}
  {Prev = $0}

As a command line program that could be (minimalist format)

  awk '$0!=Prev{print}{Prev=$0}' source > target

(tested under Fedora and XP (as a script file - all variations tested
under Linux) with your sample data)
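
Note that comparing against Prev only removes adjacent duplicates (much like uniq), whereas the a[] approach removes every later repeat wherever it occurs. A small sketch with made-up input (not from the thread) shows the difference; the variable names are illustrative.

```shell
input=$(printf '%s\n' 333 333 111 333)

# Adjacent-only: one line of state, so the trailing 333 survives.
adjacent=$(printf '%s\n' "$input" | awk '$0!=Prev{print}{Prev=$0}')

# Global: one array entry per distinct line, so the trailing 333 is dropped.
global=$(printf '%s\n' "$input" | awk '!a[$0]++')

printf 'adjacent-only:\n%s\n\nglobal:\n%s\n' "$adjacent" "$global"
```

If the big file is sorted, or only back-to-back repeats matter, the two give the same result and the Prev version is the cheaper one.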

BTW, "gigabytes" is usually abbreviated GB (Gb would be "gigabits").
Abbreviations for SI prefixes for units larger than kilo are all upper
case - all those smaller than mega are in lower case - the full prefixes
are in lower case unless the language requires initial capitals (k and K
have an unofficial byte/bit context usage: k = 1000; K = 1024).

--

T.E.D. (tda...@mst.edu) MST (Missouri University of Science and Technology)
used to be UMR (University of Missouri - Rolla).
.


Janis Papanagnou  
Newsgroups: comp.lang.awk
From: Janis Papanagnou <Janis_Papanag...@hotmail.com>
Date: Sun, 27 Jul 2008 23:08:17 +0200
Local: Sun, Jul 27 2008 5:08 pm
Subject: Re: read ahead or before

:-)   Frankly, I wasn't sure whether he could have meant gravity ;-)

Oh, I meant his first proposal, the one without a[]...

   awk '{if ($0==prev) {
           # ... this line is the same as the previous line
         }
         prev=$0}' file

Janis


Janis Papanagnou  
Newsgroups: comp.lang.awk
From: Janis Papanagnou <Janis_Papanag...@hotmail.com>
Date: Sun, 27 Jul 2008 23:40:48 +0200
Local: Sun, Jul 27 2008 5:40 pm
Subject: Re: read ahead or before

If we're going to go minimalist, maybe even...

     awk '$0!=prev;{prev=$0}' source > target
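
One edge case worth guarding against: prev starts out as the empty string, so if the very first line of the file is empty it compares equal to prev and is silently dropped. Assuming that matters for the data at hand, a possible guard is to always print record 1. The sketch below counts output lines for an input of two empty lines followed by "x".

```shell
# Input: an empty line, another empty line, then "x".
plain=$(printf '\n\nx\n' | awk '$0!=prev;{prev=$0}' | awk 'END { print NR }')
guarded=$(printf '\n\nx\n' | awk 'NR==1 || $0!=prev;{prev=$0}' | awk 'END { print NR }')
printf 'plain: %s line(s), guarded: %s line(s)\n' "$plain" "$guarded"
```

The unguarded form emits only "x", while the guarded form keeps one empty line ahead of it.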

Janis


Sashi  
Newsgroups: comp.lang.awk
From: Sashi <small...@gmail.com>
Date: Wed, 30 Jul 2008 18:31:41 -0700 (PDT)
Local: Wed, Jul 30 2008 9:31 pm
Subject: Re: read ahead or before

> If you just want to remove duplicates, you can do
> awk '!a[$0]++' file

Typical wizardry in awk.
Can someone please explain why/how this works?

Thanks,
Sashi


Grant  
Newsgroups: comp.lang.awk
From: Grant <g_r_a_n...@dodo.com.au>
Date: Thu, 31 Jul 2008 12:18:09 +1000
Local: Wed, Jul 30 2008 10:18 pm
Subject: Re: read ahead or before

On Wed, 30 Jul 2008 18:31:41 -0700 (PDT), Sashi <small...@gmail.com> wrote:
>> If you just want to remove duplicates, you can do
>> awk '!a[$0]++' file

>Typical wizardry in awk.
>Can someone please explain why/how this works?

awk '!($0 in a) {       # if not seen (parentheses needed: ! binds tighter than in)
        a[$0]++         # add $0 to seen list a[]
        print           # and print $0
}' file
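
Spelled out step by step for the one-liner '!a[$0]++': a[$0] is empty (numerically zero) for a line never seen before, the post-increment hands back that old value and then bumps the counter, and ! turns the zero into true, so the default action (print) fires only on the first occurrence. A sketch that prints the intermediate values, with old as a name introduced here for illustration:

```shell
out=$(printf '%s\n' 333 333 4445 |
awk '{
    old = a[$0]++          # value before the increment: 0 the first time
    print $0, "old=" old, "print=" (!old)
}')
printf '%s\n' "$out"
```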

Grant.
--
http://bugsplatter.mine.nu/

Ed Morton  
Newsgroups: comp.lang.awk
From: Ed Morton <mor...@lsupcaemnt.com>
Date: Thu, 31 Jul 2008 02:17:45 -0500
Local: Thurs, Jul 31 2008 3:17 am
Subject: Re: read ahead or before