Shelling

Occam's Razor

Jul 12, 2012, 8:52:41 PM

How do you create a tab-delimited file with several columns, with integers in a specific column (e.g., the 11th column)? How would this file be verified?

Stephane Chazelas

Jul 13, 2012, 3:05:51 AM

2012-07-12 17:52:41 -0700, Occam's Razor:

> how do you create a tab delimited file with several columns,
> with integers in a specific column (e.g., the 11th column)?

printf '\t\t\t\t\t\t\t\t\t\t%d\t\t\n' 3 14 1 59 265 35 89 > file

> how would this file be verified?

What does that mean?

grep -Evx $'\t\t\t\t\t\t\t\t\t\t[0-9]+\t\t' file &&
echo >&2 'those lines did not have an integer in the 11th
column or did not have 13 fields or the other fields were not
empty'

$'...' is ksh93 syntax also supported by zsh and bash.

The above implies that "089" is a valid integer and that " 1" is
not; you may need to adapt.

See also:

awk -F'\t' 'NF != 13 || $11 !~ /^[0-9]+$/' < file
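
Putting the two halves together, here is a minimal sketch that builds a small sample file and turns the awk test into an exit status. It assumes a POSIX shell with awk and mktemp; the function name `check_file` and the scratch file are made up for illustration.

```shell
# Hypothetical helper: succeeds only if every line of "$1" has 13
# tab-separated fields and an unsigned integer in field 11.
check_file() {
    awk -F'\t' '
        NF != 13 || $11 !~ /^[0-9]+$/ { bad = 1; exit }
        END { exit bad + 0 }
    ' "$1"
}

tmp=$(mktemp)    # scratch file for the demonstration
printf '\t\t\t\t\t\t\t\t\t\t%d\t\t\n' 3 14 1 59 > "$tmp"

if check_file "$tmp"; then
    echo "valid"
else
    echo "invalid" >&2
fi
rm -f "$tmp"
```

Stopping at the first bad line (the `exit` in the main rule) avoids reading the rest of a large file once the answer is known.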

--
Stephane

Occam's Razor

Jul 13, 2012, 5:04:41 PM

X-No-Archive

Thanks Stephane.
Specifically, I have the following questions:

1. If you have a very large file, named 'ColCheck', tab-delimited, that you are asked to process and you know that each line in 'ColCheck' has 7 columns, and that the values in the 5th column are integers. Using shell functions (and standard LINUX/UNIX filters), how would you verify that these conditions were satisfied in 'ColCheck'?

2. In the same file, each value in column 1 is unique. How would you
verify that?

3. How would you write a shell function that counts the number of occurrences of the word “SpecStr” in the file 'ColCheckMe'.

4. How would you do this in PERL?

5. How would you log the output (regular output as well as error messages) of a UNIX program into a file?

Lem Novantotto

Jul 14, 2012, 5:42:12 AM

Occam's Razor wrote:

> 1. If you have a very large file, named 'ColCheck', tab-delimited, that
> you are asked to process and you know that each line in 'ColCheck' has 7
> columns, and that the values in the 5th column are integers. Using shell
> functions (and standard LINUX/UNIX filters), how would you verify that
> these conditions were satisfied in 'ColCheck'?

These are for bash on Linux, with GNU grep.

To check that every line has exactly seven columns, we could check it has
exactly six TABs (that is: at least six TABs, but not seven).
Something like:

$ grep -vP '^([^\011]+\011){6}[^\011]+$' ColCheck &>/dev/null || echo OK

Note that the above also requires that no column is empty.

Then, to check that the fifth column contains nothing but digits
(the pattern is quoted so that the shell cannot expand it as a glob):

$ [[ $(cut -f5 ColCheck | grep '[^0-9]') ]] || echo OK

> 2. In the same file, each value in column 1 is unique. How would you
> verify that?

(( $(wc -l <ColCheck) == $(cut -f1 ColCheck |sort -u |wc -l) )) && echo OK
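
When that comparison fails, it usually helps to see which values repeat. A possible sketch using sort and uniq -d follows; the function names `dup_keys` and `col1_unique` are hypothetical, and the input is assumed to be tab-delimited as in the question.

```shell
# Hypothetical helper: print every column-1 value that occurs more than once.
dup_keys() {
    cut -f1 "$1" | sort | uniq -d
}

# Hypothetical helper: succeeds only when no column-1 value is repeated.
# grep -q '^' succeeds if uniq printed at least one line (even an empty one).
col1_unique() {
    ! dup_keys "$1" | grep -q '^'
}
```

Usage would be along the lines of `col1_unique ColCheck && echo OK`, or `dup_keys ColCheck` to list the offending values.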

> 3. How would you write a shell function that counts the number of
> occurrences of the word “SpecStr” in the file 'ColCheckMe'.

If I understand what you want to do:

CountOcc () { echo $(grep -o "SpecStr" ColCheckMe | wc -l) ; }

Then you can use it as a command, or recall its output with command
substitution this way: $(CountOcc)
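
If the word and the file should not be hard-coded, the same idea can be sketched with arguments. The name `count_occ` is made up here; `-o` is not in POSIX but is available in GNU and BSD grep, and `-F` makes the word a fixed string rather than a regular expression.

```shell
# Hypothetical generalization: count_occ WORD FILE
# Counts every occurrence, including several on the same line.
count_occ() {
    grep -o -F -e "$1" -- "$2" | wc -l
}
```

For example, `count_occ SpecStr ColCheckMe` should print the same number as CountOcc above. Some wc implementations pad the number with leading blanks.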

> 4. How would you do this in PERL?

I don't remember. I give up.

> 5. How would you log the output (regular output as well as error
> messages) of a UNIX program into a file?

$ unix_command >logfile 2>&1

or:

$ unix_command &>logfile

or (not recommended):

$ unix_command >&logfile
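
The order of the two redirections in the first form matters. A small sketch to illustrate, using a made-up function `noisy` as a stand-in for the program and a scratch log file:

```shell
# Hypothetical stand-in for "a UNIX program": one line on each stream.
noisy() {
    echo "regular output"
    echo "error message" >&2
}

log=$(mktemp)    # scratch log file for the demonstration

# Redirections are processed left to right: stdout is pointed at the log
# first, then stderr is made a copy of stdout, so both lines are logged.
noisy > "$log" 2>&1
both=$(wc -l < "$log")

# Reversed order: stderr is copied from the original stdout before stdout
# is redirected, so the error message is not written to the log.
noisy 2>&1 > "$log"
only=$(wc -l < "$log")

echo "correct order logged $both line(s), reversed order logged $only line(s)"
rm -f "$log"
```

To append to an existing log instead of truncating it, `>>logfile 2>&1` works the same way.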
--
Bye, Lem

Janis Papanagnou

Jul 14, 2012, 7:29:37 AM

On 13.07.2012 23:04, Occam's Razor wrote:
> X-No-Archive
>
> Thanks Stephane. Specifically, I have the following questions:
>
> 1. If you have a very large file, named 'ColCheck', tab-delimited, that you
> are asked to process and you know that each line in 'ColCheck' has 7
> columns, and that the values in the 5th column are integers. Using shell
> functions (and standard LINUX/UNIX filters), how would you verify that
> these conditions were satisfied in 'ColCheck'?

This will print the matching lines...

awk -F$'\t' 'NF==7 && $5~/^[0-9]+$/'

Negate it to find non-matching...

awk -F$'\t' '!(NF==7 && $5~/^[0-9]+$/)'

Return an error if you need to have the exit code on shell level...

awk -F$'\t' '!(NF==7 && $5~/^[0-9]+$/) {exit(1)}'
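
Since the question asked for a shell function, the last variant could be wrapped roughly like this. The name `check_cols` and the message text are made up; `-F'\t'` is used instead of `$'\t'` so the sketch also runs in a plain POSIX sh.

```shell
# Hypothetical wrapper: check_cols FILE
# Exits 0 if every line has 7 tab-separated fields with an unsigned
# integer in field 5; otherwise reports the first offending line number
# on stderr and exits 1.
check_cols() {
    awk -F'\t' '
        !(NF == 7 && $5 ~ /^[0-9]+$/) {
            printf "%s: line %d is malformed\n", FILENAME, NR > "/dev/stderr"
            bad = 1
            exit
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage would be something like `check_cols ColCheck && echo OK`. Because awk exits at the first bad line, a very large file is only read in full when it is valid.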

>
> 2. In the same file, each value in column 1 is unique. How would you verify
> that?

To check and get the error on shell level...

awk -F$'\t' '$1 in a {exit 1} {a[$1]}'
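
A variant of the same one-pass idea that also says where the first clash is, sketched as a function. The name `first_dup` and the message format are assumptions, not part of the original answer.

```shell
# Hypothetical wrapper: first_dup FILE
# Remembers the line number of each column-1 value; on the first repeat
# it prints both line numbers and exits 1. Exits 0 if all values are unique.
first_dup() {
    awk -F'\t' '
        $1 in seen {
            printf "duplicate key \"%s\" on lines %d and %d\n", $1, seen[$1], NR
            dup = 1
            exit
        }
        { seen[$1] = NR }
        END { exit dup + 0 }
    ' "$1"
}
```

Note that the array holds every distinct key, so for a very large file memory use grows with the number of unique values in column 1.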

>
> 3. How would you write a shell function that counts the number of
> occurrences of the word “SpecStr” in the file 'ColCheckMe'.

I wouldn't do that on shell level; I'd probably use awk.
But since "ColCheckMe" seems to be different from "ColCheck",
you should first give us some information about how the word
“SpecStr” should be matched; which of these are valid occurrences...?

SpecStr
SpecStr,
SpecStr1
xSpecStr
;SpecStr;

In other words: which delimiters are valid, and is a substring
match also valid?
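
To make the difference concrete, here is a small sketch that runs both interpretations over the five sample lines above. It assumes a grep that supports `-o` and `-w` (GNU and BSD grep do; neither option is in POSIX), and the scratch file is created only for the demonstration.

```shell
sample=$(mktemp)    # scratch file holding the five sample lines
printf '%s\n' 'SpecStr' 'SpecStr,' 'SpecStr1' 'xSpecStr' ';SpecStr;' > "$sample"

# Any substring match counts.
substr=$(grep -o 'SpecStr' "$sample" | wc -l)

# Only whole words count: the match must not touch a letter, digit or
# underscore on either side.
words=$(grep -ow 'SpecStr' "$sample" | wc -l)

echo "substring matches:  $substr"
echo "whole-word matches: $words"
rm -f "$sample"
```

On these five lines the substring count is 5 and the whole-word count is 3, because "SpecStr1" and "xSpecStr" are rejected by `-w`.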

>
> 4. How would you do this in PERL?

(Don't know.)

>
> 5. How would you log the output (regular output as well as error messages)
> of a UNIX program into a file?

a_unix_program >/some/path/to/file 2>&1

Janis


Thomas 'PointedEars' Lahn

Jul 14, 2012, 8:57:01 AM

Janis Papanagnou wrote:

> On 13.07.2012 23:04, Occam's Razor wrote:
>> X-No-Archive

Figures.

>> [1.-4.]

>> 5. How would you log the output (regular output as well as error
>> messages) of a UNIX program into a file?
>
> a_unix_program >/some/path/to/file 2>&1

Do you realize that you just did their homework?

--
PointedEars

Please do not Cc: me.