Reading an irregular ASCII file and extracting the data into two different variables

Madhavan Bomidi

Jul 11, 2013, 11:50:10 AM

Hello everyone,

I am a Linux user, but for data analysis I use either MATLAB or IDL. I have an ASCII file with more than 6-7 million lines in an irregular layout, and I have written MATLAB and IDL scripts to process it. I have several thousand such files to process. Running a single file through the IDL code on our server took 2 days to produce the output I wanted. Someone suggested that a Linux shell script could do the computation much faster.

The input file looks like this:
...
...
...
...
385 827 209 282 56 981 # RECORD1
485 832 209 281 56 983
585 832 209 282 56 981
685 832 210 282 57 982
785 830 210 282 56 983
885 832 210 281 56 983
20130402,170911 0 $GPRMC,170911.000,A,5055.1998,N,00624.5027,E,0.01,0.00,020413,,,A*68 # RECORD2
985 832 210 281 56 983
85 832 210 282 56 982
185 832 210 282 56 982
285 832 210 282 57 983
385 833 209 281 56 981
485 832 209 281 56 983
585 832 210 282 56 981
685 832 210 283 56 983
785 831 210 282 57 983
885 832 210 281 57 983
20130402,170912 0 $GPRMC,170912.000,A,5055.1998,N,00624.5027,E,0.01,0.00,020413,,,A*6B
985 830 209 281 57 983
85 832 210 282 56 981
185 831 210 283 56 983
285 832 210 282 56 983
385 832 210 281 57 983
20130402,203256 0 $GPRMC,203256.000,A,5055.2011,N,00624.5033,E,0.00,0.00,020413,,,A*68
487 789 170 412 0 928
587 793 169 412 0 931
687 791 170 411 0 928
787 793 169 410 0 929
887 794 170 412 0 929
987 793 170 412 0 930
87 792 169 412 0 931
187 793 169 412 0 931
287 793 169 410 0 929
387 794 169 411 0 928
487 794 170 412 0 929
...
...
...

I want to open each file in sequence, read it line by line from the start, and assign each line to RECORD1 or RECORD2 by checking the number of words for RECORD1 (=6) and the total string length for RECORD2 (=80).
From RECORD2, I want 4 quantities (as seen in the example above):
20130402 170911 5055.1998 00624.5027
For RECORD1, I want the average of the 5th element over the 5 lines above and the 5 lines below each RECORD2 line, giving a single value (xxx).
This then gives:
20130402 170911 5055.1998 00624.5027 xxx
Finally, I want to write one file per day, named after the first element (e.g., '20130402.dat'). Data for other days may follow.

I started writing a bash script as follows:
------------------------------------
#!/bin/bash

# Read the files sequentially
for file in *.bin; do
    FILENAME=$file
    echo "Processing $FILENAME ..."

    # Read the file line by line and sort the lines into two arrays
    kount=0
    RECORD1=()
    RECORD2=()
    while IFS= read -r line; do
        kount=$((kount + 1))
        data=$line
        if [ "${#data}" -eq 80 ]; then
            # RECORD2: timestamp/GPS line, identified by its length
            RECORD2+=("$kount $data")
        elif [ "$(echo "$data" | wc -w)" -eq 6 ]; then
            # RECORD1: data line with six words
            RECORD1+=("$kount $data")
        fi
    done < "$FILENAME"

    # Number of lines read from this file
    echo "$kount"

done

----------------------------------------

I know something is wrong in the part that collects the data into RECORD1 and RECORD2. Can anyone suggest how I should proceed? I have looked at commands like grep, sed, and awk, but I could not work out how to correct my mistake.

Appreciate your help in this regard,
Thanks in advance

Joe Rosevear

Aug 7, 2013, 3:07:47 AM

I can't tell from your description exactly what you need.

But it seems to me that this would be a good application for a little C
code. Then write a script or scripts to run the C code.

The C code would extract the data from a single file, and the script(s)
would iterate over your several thousand files. Or something like
that.
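
A minimal sketch of the script half of that idea, assuming the C program
takes one file name as its argument and prints its results on standard
output. The function name process_all and the program name ./extract are
hypothetical; only the *.bin pattern comes from the original post.

```shell
# process_all EXTRACTOR INDIR OUTFILE
# Runs EXTRACTOR once per *.bin file in INDIR and collects its output.
process_all() {
    extractor=$1; indir=$2; outfile=$3
    : > "$outfile"                      # start with an empty output file
    for file in "$indir"/*.bin; do
        [ -e "$file" ] || continue      # no matches: the pattern stays literal
        echo "Processing $file ..." >&2
        # The extractor reads one file and writes result lines to stdout
        "$extractor" "$file" >> "$outfile" || echo "Failed: $file" >&2
    done
}
```

For example, `process_all ./extract . results.dat` would run the program
on every .bin file in the current directory.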

Or... Instead of C code, the extraction might be do-able from a
script. I'm not much of an awk or perl user, but those tools are
available from a script and might do the job. Or maybe a simple
application of grep. Perhaps this is what you were wanting. Sorry I
can't help more directly. Your problem description is too vague.
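
For illustration, here is a rough sketch of how the awk route could look
for a single file, wrapped in a shell function. It rests on several
assumptions that the description does not settle: RECORD2 lines are
recognized by the $GPRMC marker rather than by a length of 80, RECORD1
lines are those with exactly 6 whitespace-separated fields, the average
covers the 5th field of up to 5 RECORD1 lines before and up to 5 after
each RECORD2 line (fewer near the start or end of the file), and the
result is printed with two decimals. The names extract_records and emit
are made up for this example.

```shell
# extract_records INPUT OUTDIR
# Appends one line per RECORD2 to OUTDIR/<date>.dat:
#   date time latitude longitude average
extract_records() {
    mkdir -p "$2" || return 1
    awk -v outdir="$2" '
    function emit(k) {
        # Assumption: print NaN if no RECORD1 lines were found nearby
        avg = (cnt[k] > 0) ? sprintf("%.2f", sum[k] / cnt[k]) : "NaN"
        print head[k], avg >> (outdir "/" day[k] ".dat")
        delete head[k]; delete day[k]; delete sum[k]
        delete cnt[k]; delete need[k]
    }
    BEGIN { first = 1 }          # index of the oldest RECORD2 still waiting
    /\$GPRMC/ {                  # RECORD2: timestamp and GPS sentence
        split($1, dt, ",")       # dt[1] = date, dt[2] = time
        split($3, g, ",")        # g[4] = latitude, g[6] = longitude
        np++
        head[np] = dt[1] " " dt[2] " " g[4] " " g[6]
        day[np] = dt[1]
        sum[np] = 0; cnt[np] = 0; need[np] = 5
        # Add the (up to) 5 most recent RECORD1 values from the ring buffer
        for (i = 0; i < 5 && i < n1; i++) {
            sum[np] += ring[(n1 - 1 - i) % 5]; cnt[np]++
        }
        next
    }
    NF == 6 {                    # RECORD1: six-column data line
        # Feed this value to every RECORD2 still collecting following lines
        for (k = first; k <= np; k++) { sum[k] += $5; cnt[k]++; need[k]-- }
        ring[n1 % 5] = $5; n1++
        while (first <= np && need[first] == 0) emit(first++)
        next
    }
    END { while (first <= np) emit(first++) }   # flush incomplete windows
    ' "$1"
}
```

Calling `extract_records somefile.bin out` (file and directory names are
placeholders) would append lines of the form
"20130402 170911 5055.1998 00624.5027 xxx" to out/20130402.dat, one
output file per date. Only a handful of values are held in memory at any
time, so the size of the input file should not matter much.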

Would you like to tell a little more about the problem?

-Joe

--
http://JosephRosevear.com
http://RosevearSoftware.com
