Tool for sequence searches
Guillaume Dargaud  
Newsgroups: comp.unix.shell
From: Guillaume Dargaud <use_the_contact_f...@www.gdargaud.net>
Date: Tue, 05 Jun 2012 12:01:59 +0200
Local: Tues, Jun 5 2012 6:01 am
Subject: Tool for sequence searches
Hello all,
What generic command-line tools can I use to search for missteps
in a sequence?
Let me explain better with some examples. Say I have a simple sequence of
numbers and some may be missing:
1
2
3
6
7
8
11
...
I'd like to obtain the gap limits, so here 3, 6, 8, 11...

My first thought is to write a loop in bash that increments a number and simply
checks whether it's present in the expected position, but I'm sure you guys can
come up with a better way.
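
A minimal sketch of that loop idea, assuming one integer per line on standard input and a sequence that starts at 1; the function name find_gaps is made up for illustration and is not from any existing tool:

```shell
# Hypothetical helper: print the two values on either side of each misstep.
find_gaps() {
    expected=0
    while read -r n; do
        expected=$((expected + 1))
        if [ "$n" -ne "$expected" ]; then
            # Last value seen, then the value that broke the sequence.
            echo "$((expected - 1)) $n"
            # Resynchronise the counter on the value just read.
            expected=$n
        fi
    done
}

# Sample data from the question.
printf '%s\n' 1 2 3 6 7 8 11 | find_gaps
```

On the sample data this should report the pairs 3 6 and 8 11. A repeated value shows up as a pair of identical numbers.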

Note that the real problem is more complicated (I should also detect
repetitions) so I'm open to various suggestions.

--
Guillaume Dargaud
http://www.gdargaud.net/


 
Janis Papanagnou  
Newsgroups: comp.unix.shell
From: Janis Papanagnou <janis_papanag...@hotmail.com>
Date: Tue, 05 Jun 2012 13:46:53 +0300
Local: Tues, Jun 5 2012 6:46 am
Subject: Re: Tool for sequence searches
On 05.06.2012 13:01, Guillaume Dargaud wrote:

> Hello all,
> what generic command line tools can I use when dealing with searching
> sequences missteps ?
> Let me explain better with some examples. Say I have a simple sequence of
> numbers and some may be missing:
> 1
> 2
> 3
> 6
> 7
> 8
> 11
> ...
> I'd like to obtain the gap limits, so here 3, 6, 8, 11...

Here's one solution (where I output the corresponding two bounds
in one line which I think is clearer, but you may also write two
print statements instead)...

   awk '++c!=$1 {print c-1,$1; c=$1}'
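
For illustration, here is how that one-liner might be run on the sample data from the question. The wrapper function gap_bounds is only a convenience for this sketch; it assumes one number per line and a sequence that starts at 1:

```shell
# Wrap the one-liner so it can be reused; c is the running counter.
gap_bounds() {
    awk '++c != $1 { print c-1, $1; c = $1 }'
}

# Feed the sample sequence on standard input.
printf '%s\n' 1 2 3 6 7 8 11 | gap_bounds
# prints:
# 3 6
# 8 11
```

The counter c is incremented on every line and compared with the value read; on a mismatch the previous value and the current one are printed, and c is reset to the current value.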

> My first thought is to do a loop in bash that inc a number and simply check
> if it's present in the expected position, but I'm sure you guys can come up
> with a better way.

> Note that the real problem is more complicated (I should also detect
> repetitions) so I'm open to various suggestions.

The program that I suggested will show repetitions as lines with
two identical values, so an extension is straightforward, e.g.

   awk '++c!=$1 {print (c-1!=$1 ? "gap:" : "rep:"),c-1,$1; c=$1}'
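
To see both labels, the same logic can be tried on a short list with one repeat and one gap. This sketch moves the label into a variable purely for readability; the function name label_steps and the sample numbers are assumptions for the example:

```shell
# Same test as above, with the label computed in a separate variable.
label_steps() {
    awk '++c != $1 {
        # Equal bounds mean the value was repeated; otherwise it is a gap.
        label = (c-1 != $1) ? "gap:" : "rep:"
        print label, c-1, $1
        c = $1
    }'
}

printf '%s\n' 1 2 2 3 6 | label_steps
# prints:
# rep: 2 2
# gap: 3 6
```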

Janis


 
Guillaume Dargaud  
Newsgroups: comp.unix.shell
From: Guillaume Dargaud <use_the_contact_f...@www.gdargaud.net>
Date: Tue, 05 Jun 2012 14:33:37 +0200
Local: Tues, Jun 5 2012 8:33 am
Subject: Re: Tool for sequence searches

> The program that I suggested will show repetitions as lines with
> two identical values, so an extension is straightforward, e.g.

Thanks. I always have a hard time understanding awk scripts, but I'll take a
closer look at it.

For simple repetitions, I usually use 'uniq -d', but in my case the lines
can repeat like this:
SomeRandomChars-123-SomeOtherChars
SomeOtherRandomChars-123-SomeOtherChars
So I can't simply use 'uniq'.

I'll see if I can adapt your awk script. Thanks
--
Guillaume Dargaud
http://www.gdargaud.net/


 
Janis Papanagnou  
Newsgroups: comp.unix.shell
From: Janis Papanagnou <janis_papanag...@hotmail.com>
Date: Tue, 05 Jun 2012 15:49:24 +0300
Local: Tues, Jun 5 2012 8:49 am
Subject: Re: Tool for sequence searches
On 05.06.2012 15:33, Guillaume Dargaud wrote:

>> The program that I suggested will show repetitions as lines with
>> two identical values, so an extension is straightforward, e.g.

> Thanks. I always have a hard time understanding awk scripts but I'll look at
> it better.

If you have concrete questions, feel free to ask.

> For simple repetitions, I usually use 'uniq -d', but in my case the lines
> can repeat like this:
> SomeRandomChars-123-SomeOtherChars
> SomeOtherRandomChars-123-SomeOtherChars
> So I can't simply use 'uniq'.

IIUC, repetitions are defined by the suffix?
Then the awk program could be extended in two steps; define the FS="-"
so that you can access the three parts individually as $1, $2, and $3,
and then compare only the relevant parts of the whole line, in your case
probably the concatenation  $2 $3  (or maybe only the mid part  $2 ?).
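
A rough sketch of those two steps, assuming the prefix itself contains no hyphen, the middle field $2 is the sequence number, and numbering starts at 1. The function name check_mid and the sample lines are invented for the example:

```shell
# Split each line on "-" and run the gap/repeat test on the middle field only.
check_mid() {
    awk -F- '++c != $2 {
        label = (c-1 != $2) ? "gap:" : "rep:"
        print label, c-1, $2
        c = $2
    }'
}

printf '%s\n' abc-1-x def-2-x ghi-2-x jkl-5-x | check_mid
# prints:
# rep: 2 2
# gap: 2 5
```

If the suffix should also take part in the comparison, the key would have to be built from $2 and $3 instead, and the counter logic would need rethinking since such a key is no longer a plain number.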

Janis


 
Ed Morton  
Newsgroups: comp.unix.shell
From: Ed Morton <mortons...@gmail.com>
Date: Tue, 05 Jun 2012 10:05:08 -0500
Local: Tues, Jun 5 2012 11:05 am
Subject: Re: Tool for sequence searches
On 6/5/2012 5:01 AM, Guillaume Dargaud wrote:

If you want a real solution, post the real problem. Often the solutions to
simple problems aren't extensible to more complex problems and then everyone
gets frustrated that you didn't just post the real problem in the first place
and so wasted everyone's time helping you solve a non-existent problem.

      Ed.


 
Thomas 'PointedEars' Lahn  
Newsgroups: comp.unix.shell
Followup-To: comp.unix.shell
From: Thomas 'PointedEars' Lahn <PointedE...@web.de>
Date: Tue, 05 Jun 2012 20:54:20 +0200
Local: Tues, Jun 5 2012 2:54 pm
Subject: Re: Tool for sequence searches

Guillaume Dargaud wrote:
> what generic command line tools can I use when dealing with searching
> sequences missteps ?

For example the bash debugger.

--
PointedEars

Please do not Cc: me. / Bitte keine Kopien per E-Mail.


 
bsh  
Newsgroups: comp.unix.shell
From: bsh <brian_hi...@rocketmail.com>
Date: Tue, 5 Jun 2012 19:50:32 -0700 (PDT)
Local: Tues, Jun 5 2012 10:50 pm
Subject: Re: Tool for sequence searches
On Jun 5, 3:01 am, Guillaume Dargaud <use_the_contact_f...@www.gdargaud.net> wrote:
> What generic command line tools can I use when dealing with searching
> sequences missteps ?
> ...
> My first thought is to do a loop in bash that inc a number and simply check
> if it's present in the expected position, but I'm sure you guys can come up
> with a better way.

Does the following help? It finds the _first_ missing number of the
sequence, not all of them, but perhaps it can be put into a
loop to capture each of them in turn.

http://groups.google.com/group/comp.unix.shell/browse_thread/thread/d...

=Brian


 
Ed Morton  
Newsgroups: comp.unix.shell
From: "Ed Morton" <mortons...@gmail.com>
Date: Wed, 06 Jun 2012 16:40:54 GMT
Local: Wed, Jun 6 2012 12:40 pm
Subject: Re: Tool for sequence searches

> http://groups.google.com/group/comp.unix.shell/browse_thread/thread/d...

> =Brian

Seems kinda wordy compared to:

   awk '$0 != ++expected{print expected; exit}'
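
As a small illustration, the one-liner can be wrapped and run on the sample sequence from the start of the thread. The second function is not from the thread: it is one possible variant, under the same assumptions (one number per line, starting at 1), that lists every missing number instead of stopping at the first. Both function names are made up for this sketch:

```shell
# The one-liner as posted: print the first missing number, then stop.
first_missing() {
    awk '$0 != ++expected { print expected; exit }'
}

# Assumed variant: count up to each value read, printing what was skipped.
all_missing() {
    awk '{ while (++e < $1) print e; e = $1 }'
}

printf '%s\n' 1 2 3 6 7 8 11 | first_missing
# prints:
# 4

printf '%s\n' 1 2 3 6 7 8 11 | all_missing
# prints:
# 4
# 5
# 9
# 10
```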

Regards,

   Ed.

Posted using www.webuse.net


 
bsh  
Newsgroups: comp.unix.shell
From: bsh <brian_hi...@rocketmail.com>
Date: Wed, 6 Jun 2012 17:03:42 -0700 (PDT)
Local: Wed, Jun 6 2012 8:03 pm
Subject: Re: Tool for sequence searches
On Jun 6, 9:40 am, "Ed Morton" <mortons...@gmail.com> wrote:

> bsh <brian_hi...@rocketmail.com> wrote:
> > On Jun 5, 3:01 am, Guillaume Dargaud
> > <use_the_contact_f...@www.gdargaud.net> wrote:
> http://groups.google.com/group/comp.unix.shell/browse_thread/thread/d...
> Seems kinda wordy compared to:
> awk '$0 != ++expected{print expected; exit}'

Hmmm. I anticipated this comment, but frankly, I thought
that it had been sufficiently hashed out several years ago....

Wordy? Well, yes and no. It reminds me of the programmer joke that
shows "Hello World" as written by a newbie, an intermediate, a
graduate student, and a professional programmer. The professional's
"Hello World" is a 200-line C++ class....

Scriptarians will presumably preternaturally elevate the
criterion of _character count_ to quantify elegance, er,
wordiness. Well, suum cuique pulchrum est....

(I would have myself made an argument around the criterion
of elegance, not wordiness).

And inasmuch as the "wordiness" is very much there, but
well hidden by use of the abstraction of the VHLL awk
interpreter... _and_ my code is two orders of magnitude more
efficient for small data samples, my solution cannot not be
no not-small exercise in not-wordiness, now can it?

(Or I could have just been snarky, and have said, "Take it to
comp.lang.awk!" -- but I wouldn't do that: the one-liner _is_
kinda nice, but then I knew that, my provided solution coming
after having seen that one).

=Brian


 
Ed Morton  
Newsgroups: comp.unix.shell
From: Ed Morton <mortons...@gmail.com>
Date: Thu, 07 Jun 2012 01:50:41 -0500
Local: Thurs, Jun 7 2012 2:50 am
Subject: Re: Tool for sequence searches
On 6/6/2012 7:03 PM, bsh wrote:

...and he is fired for not getting his product to market before Microsoft and
his job is outsourced to a newbie who will do it next time in one line and a
fraction of the development interval. I get what you're trying to say but
spending the time to write the most robust, extensible, efficient code possible
isn't always the best idea.

So you knew the OP could output the first missing number with a trivial
one-liner and you suggested he do it with a fairly lengthy script instead. Would
you mind sharing why that'd be the preferred approach?

      Ed.


 