Madhavan Bomidi
Jul 11, 2013, 11:50:10 AM
Hello everyone,
First, I should say that I am a Linux user, but for data analysis I use either MATLAB or IDL. I have an ASCII file of more than 6-7 million lines with an irregular layout, and I have written MATLAB and IDL scripts to process it. I have several thousand such files to process. When I ran the IDL code on a single file on our server, it took 2 days to create the output I wanted. Someone suggested to me that a Linux shell script could do the computation much faster.
The input file looks like this:
...
...
...
...
385 827 209 282 56 981 # RECORD1
485 832 209 281 56 983
585 832 209 282 56 981
685 832 210 282 57 982
785 830 210 282 56 983
885 832 210 281 56 983
20130402,170911 0 $GPRMC,170911.000,A,5055.1998,N,00624.5027,E,0.01,0.00,020413,,,A*68 # RECORD2
985 832 210 281 56 983
85 832 210 282 56 982
185 832 210 282 56 982
285 832 210 282 57 983
385 833 209 281 56 981
485 832 209 281 56 983
585 832 210 282 56 981
685 832 210 283 56 983
785 831 210 282 57 983
885 832 210 281 57 983
20130402,170912 0 $GPRMC,170912.000,A,5055.1998,N,00624.5027,E,0.01,0.00,020413,,,A*6B
985 830 209 281 57 983
85 832 210 282 56 981
185 831 210 283 56 983
285 832 210 282 56 983
385 832 210 281 57 983
20130402,203256 0 $GPRMC,203256.000,A,5055.2011,N,00624.5033,E,0.00,0.00,020413,,,A*68
487 789 170 412 0 928
587 793 169 412 0 931
687 791 170 411 0 928
787 793 169 410 0 929
887 794 170 412 0 929
987 793 170 412 0 930
87 792 169 412 0 931
187 793 169 412 0 931
287 793 169 410 0 929
387 794 169 411 0 928
487 794 170 412 0 929
...
...
...
I want to open each file in sequence, read it line by line from the start, and sort the lines into RECORD1 and RECORD2 by checking the number of words (6 for RECORD1) and the total string length (80 for RECORD2).
From RECORD2, I want 4 quantities: (as seen from the example above)
20130402 170911 5055.1998 00624.5027
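A minimal sketch of how these four quantities could be pulled out of one RECORD2 line with awk. The helper name `extract_gps` is hypothetical, and the field positions (1, 2, 7 and 9 after splitting on commas and blanks) are an assumption read off the sample line above.

```shell
#!/bin/bash
# extract_gps is a hypothetical helper name, not part of the original script.
# It splits one RECORD2 line on runs of commas and blanks and prints the
# date, time, latitude and longitude (fields 1, 2, 7 and 9).
extract_gps() {
    printf '%s\n' "$1" | awk -F'[ ,]+' '{ print $1, $2, $7, $9 }'
}

# Single quotes keep the shell from expanding $GPRMC.
extract_gps '20130402,170911 0 $GPRMC,170911.000,A,5055.1998,N,00624.5027,E,0.01,0.00,020413,,,A*68'
# prints: 20130402 170911 5055.1998 00624.5027
```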
For RECORD1, I want the average of the 5th element over the 5 lines above and the 5 lines below each RECORD2 line, and to append that average (xxx) to the RECORD2 quantities.
This gives:
20130402 170911 5055.1998 00624.5027 xxx
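The windowed average could be sketched in a single awk pass, so that the file is read only once. Everything here rests on assumptions: the function name `window_average` is hypothetical, a RECORD2 line is recognised by the text `$GPRMC` rather than by its length, a RECORD1 line is any other line with 6 words, windows near the file edges use however many data lines exist, data lines after a later RECORD2 line still count towards an earlier unfinished window, and the average is printed with two decimals.

```shell
#!/bin/bash
# window_average is a hypothetical helper name. It reads one input file and
# prints "date time lat lon avg" for every GPS (RECORD2) line, where avg is
# the mean of column 5 over up to W data lines before and W after it.
window_average() {
    awk -v W=5 '
    # Print one finished GPS record with its average and free its slots.
    function emit(i) {
        printf "%s %.2f\n", key[i], (cnt[i] ? sum[i] / cnt[i] : 0)
        delete key[i]; delete sum[i]; delete cnt[i]; delete need[i]
    }
    BEGIN { head = 1; tail = 0; n = 0 }
    index($0, "$GPRMC") > 0 {
        # RECORD2: split on commas and blanks, open a new pending window
        split($0, f, "[ ,]+")
        tail++
        key[tail] = f[1] " " f[2] " " f[7] " " f[9]
        sum[tail] = 0; cnt[tail] = 0; need[tail] = W
        # add the (up to W) data lines already seen before this GPS line
        for (j = n - W + 1; j <= n; j++)
            if (j >= 1) { sum[tail] += ring[j % W]; cnt[tail]++ }
        next
    }
    NF == 6 {
        # RECORD1: remember column 5 and feed every window still waiting
        n++
        ring[n % W] = $5
        for (i = head; i <= tail; i++) {
            sum[i] += $5; cnt[i]++; need[i]--
        }
        # windows complete in the order they were opened
        while (head <= tail && need[head] == 0) emit(head++)
    }
    # flush windows that never saw W following data lines
    END { while (head <= tail) emit(head++) }
    ' "$1"
}
```

It would be called as `window_average somefile.bin`. Only the last W values and the currently open windows are held in memory, which matters for files of several million lines.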
Finally, I want to write one output file per day, named after the first element (e.g., '20130402.dat'). The input can also contain data for the following days.
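Splitting the result lines into per-day files could look roughly like this. The function name `split_by_day` and the optional output-directory argument are hypothetical; the sketch assumes each input line already has the form "date time lat lon xxx" and that lines for one day arrive together, so only one output file is kept open at a time.

```shell
#!/bin/bash
# split_by_day is a hypothetical helper name. It reads result lines on
# standard input and appends each one to <date>.dat, where <date> is the
# first field. The optional first argument is the output directory.
split_by_day() {
    awk -v dir="${1:-.}" '{
        out = dir "/" $1 ".dat"
        # close the previous file when the day changes
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }
        print >> out
    }'
}
```

Because the lines are appended, results from several input files that cover the same day end up in the same `.dat` file; an old output file has to be removed by hand before a rerun.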
I started making the bash script as below:
------------------------------------
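#!/bin/bash
# Read the files sequentially
for file in *.bin; do
    FILENAME=$file
    echo "Processing $FILENAME ..."
    # Read the file line by line
    kount=0
    RECORD1=()
    RECORD2=()
    while IFS= read -r line; do
        kount=$((kount + 1))
        data=$line
        # Split the line into words so they can be counted without calling wc
        read -ra words <<< "$data"
        if [ "${#data}" -eq 80 ]; then
            # RECORD2: identified by its total string length
            RECORD2+=("$kount $data")
        elif [ "${#words[@]}" -eq 6 ]; then
            # RECORD1: identified by its word count
            RECORD1+=("$kount $data")
        fi
    done < "$FILENAME"
    # Print the number of lines read from this file
    echo "$kount"
done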
----------------------------------------
I am stuck at the step that collects the data into RECORD1 and RECORD2. Can anyone suggest how I should proceed from here? I looked at commands such as grep, sed and awk, but could not work out a solution.
Appreciate your help in this regard,
Thanks in advance