
trying to sort by the X axis value which is variable "col3"


Rabe

Dec 3, 2018, 1:30:57 AM
The script below converts a .str file (string file) into a .csv file. Data entry is via a GUIDO form; the essence of the script is copied below.
What I would like to achieve is to have the output CSV sorted by the X axis value, which is stored in the variable "col3". How do I go about it? Much appreciated.


#Open a string file to read.
set readFile [open "$file" "r"]

#Create and open a new file for writing the selected information to.
set writeFile [open "$outfile$ext" "w"]
#Create headings for your columns in the resultant string file.
puts $writeFile "StringNo, Y, X, Z, Au, col6, col7, col8,"

#Start a loop to get each line of data in the file.
while { ![eof $readFile]} {

#Get the next line of data.
set data [gets $readFile]
#If there are any commas in the text file, split the file at every comma and place data into a TCL list ie turns this: abcd,efg into this: {abcd} {efg} effectively removing any commas
set data1 [split $data ","]
#Join the TCL back together into a single text string. (But this time there won't be any commas) Removing the commas is important, otherwise the macro will split the file at the comma as well as the range
if { [lindex $data1 0] !=0 } {
set rawdata [join $data1]
#This is the if elseif statement that will dictate how many columns to use.
if {"$use1" == "yes" && "$use2" == "no" } {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
puts $writeFile "$col1,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
puts $writeFile "$col1, $col2,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
puts $writeFile "$col1, $col2, $col3,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "yes" && "$use5" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
set col4 [string range $rawdata $colsetupstart4 $colsetupend4]
puts $writeFile "$col1, $col2, $col3, $col4,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "yes" && "$use5" == "yes" && "$use6" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
set col4 [string range $rawdata $colsetupstart4 $colsetupend4]
set col5 [string range $rawdata $colsetupstart5 $colsetupend5]
puts $writeFile "$col1, $col2, $col3, $col4, $col5,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "yes" && "$use5" == "yes" && "$use6" == "yes" && "$use7" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
set col4 [string range $rawdata $colsetupstart4 $colsetupend4]
set col5 [string range $rawdata $colsetupstart5 $colsetupend5]
set col6 [string range $rawdata $colsetupstart6 $colsetupend6]
puts $writeFile "$col1, $col2, $col3, $col4, $col5, $col6,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "yes" && "$use5" == "yes" && "$use6" == "yes" && "$use7" == "yes" && "$use8" == "no"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
set col4 [string range $rawdata $colsetupstart4 $colsetupend4]
set col5 [string range $rawdata $colsetupstart5 $colsetupend5]
set col6 [string range $rawdata $colsetupstart6 $colsetupend6]
set col7 [string range $rawdata $colsetupstart7 $colsetupend7]
puts $writeFile "$col1, $col2, $col3, $col4, $col5, $col6, $col7,"
} elseif {"$use1" == "yes" && "$use2" == "yes" && "$use3" == "yes" && "$use4" == "yes" && "$use5" == "yes" && "$use6" == "yes" && "$use7" == "yes" && "$use8" == "yes"} {
set col1 [string range $rawdata $colsetupstart1 $colsetupend1]
set col2 [string range $rawdata $colsetupstart2 $colsetupend2]
set col3 [string range $rawdata $colsetupstart3 $colsetupend3]
set col4 [string range $rawdata $colsetupstart4 $colsetupend4]
set col5 [string range $rawdata $colsetupstart5 $colsetupend5]
set col6 [string range $rawdata $colsetupstart6 $colsetupend6]
set col7 [string range $rawdata $colsetupstart7 $colsetupend7]
set col8 [string range $rawdata $colsetupstart8 $colsetupend8]
puts $writeFile "$col1, $col2, $col3, $col4, $col5, $col6, $col7, $col8,"
} else {
puts "error"
break
}
}
}

# close the files.
close $readFile
close $writeFile

puts "Finished writing file $outfile$ext"

Ralf Fassel

Dec 3, 2018, 4:25:13 AM
* Rabe <rindral...@gmail.com>
| The script below converts a .str file (string file) into a .csv file.
| data entry is via guido form. the essence of the script is copied
| below. what I would like to achieve is to be able to have the output
| csv sorted by the X axis information which is stored as variable
| "col3". How do I go about it? Much appreciated.

Seems obvious: if you want the output sorted, you need to collect it,
sort it, and only then write the output file. I.e., in your loop, do
not immediately write the output, but store it in a way so you can sort
it after the loop. A list comes to mind, considering that you can
specify -index to the lsort command.
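
A minimal sketch of that collect-then-sort approach, assuming each record has already been read as a comma-separated line. The sample values and the variable names rows and sorted are made up for illustration:

```tcl
# Collect each record as a list of fields instead of writing it right away.
set rows {}
foreach line {
    "30007, 25615.414, 11208.269, 100.000"
    "30007, 25579.883, 11132.600, 100.000"
    "30007, 25548.000, 11148.000, 100.000"
} {
    set fields {}
    foreach f [split $line ","] {
        lappend fields [string trim $f]
    }
    lappend rows $fields
}

# Sort the collected rows numerically on the third field (index 2).
set sorted [lsort -real -index 2 $rows]

# Only now produce the output, one joined line per row.
foreach row $sorted {
    puts [join $row ", "]
}
```

Here lsort -index 2 compares the element at index 2 of each sub-list, and -real makes the comparison numeric rather than textual.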

Other than that, if this is more than a BFI-get-it-done-script, I'd
highly recommend using the csv package from tcllib

https://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/csv/csv.html

for reading the comma-separated input and writing the output csv, it
takes care of all the quoting required.

Some more comments:

| #Start a loop to get each line of data in the file.
| while { ![eof $readFile]} {

The usual idiom here is

while {[gets $readFile data] >= 0} { ... }

since otherwise you'll get an extra line in the output for an empty
input file.
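
For illustration, a small sketch of that idiom wrapped in a helper. The proc name readLines is hypothetical, not something from the original script:

```tcl
# Hypothetical helper: return the lines of a file as a Tcl list.
proc readLines {path} {
    set chan [open $path r]
    set lines {}
    # gets returns the number of characters read, or -1 at end of file,
    # so the loop body never runs for a line that does not exist.
    while {[gets $chan line] >= 0} {
        lappend lines $line
    }
    close $chan
    return $lines
}
```

With an empty input file the loop body runs zero times, whereas the eof-based loop processes one empty string.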

The splitting and joining looks highly suspicious to me (i.e., when are
the 'string range' indices recalculated for each line?). I would rather
operate on the fields returned by the split (or rather csv::split as
recommended above).

My EUR 0.01 & HTH
R'

two...@gmail.com

Dec 4, 2018, 12:15:07 AM
> what I would like to achieve is to be able to have the output csv sorted by the X axis information which is stored as variable "col3". How do I go about it?

As Ralf says, to do it in tcl, you need to get all your data in memory first, then sort it.

But you didn't say how big your data is. Will it fit into memory? Can you make copies of it in memory when splitting on the newlines? And you didn't mention what system this runs on.

If the data is too large to sort in memory, you could use a sort utility program such as *nix "sort". First write to a temp file in any way you want just so it's easy to get a sort program to do the work (data in fixed column positions would make it easy to sort - use say, [format] with formatters like %20s or %10d).

You could then have Tcl [exec] the sort program, then read the sorted data back in and convert it to your chosen output format if the output of sort isn't already what you want.
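
A rough sketch of the external-sort route, assuming a POSIX "sort" is on the PATH and Tcl 8.6 or later (for [file tempfile]). Instead of fixed column positions it uses sort's own field options, and the sample records are made up:

```tcl
# Write the unsorted records to a scratch file.
set chan [file tempfile tmpName]
puts $chan "30007, 25615.414, 11208.269, 100.000, 5.72"
puts $chan "30007, 25579.883, 11132.600, 100.000, 2.28"
puts $chan "30007, 25548.000, 11148.000, 100.000, 18.68"
close $chan

# -t , makes the comma the field separator;
# -k 3,3n sorts numerically on the third field only.
set sortedText [exec sort -t , -k 3,3n $tmpName]
file delete $tmpName

# exec strips the final newline, so split gives exactly one element per record.
foreach line [split $sortedText \n] {
    puts $line
}
```

For very large inputs one would let sort write to a file (its -o option) rather than capture everything through [exec], since the captured result is held in memory.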

If the data is of a small size (easily fits into memory) and you're going to read it all in anyway, then the following idiom is simpler than reading line by line. (I show no error handling here).

Look at the example in the [read] manual, which for you, might be

set readFile [open "$file" "r"]
set entireFile [read $readFile]
set lines [split $entireFile "\n"] ;# split file into lines at the newline

At this point, lines is a list of the lines in the file and you can likely just sort it, though you didn't really say what the .str file looks like. And you can use [foreach] to loop through all the data.



Rich

Dec 4, 2018, 6:51:08 AM
two...@gmail.com wrote:
> set readFile [open "$file" "r"]
> set entireFile [read $readFile]
> set lines [split $entireFile "\n"] ;# split file into lines at the newline

If the file ends in a terminal newline, the above will create one empty
list element at the end of the 'lines' list:

$ rlwrap tclsh
% set lines [split line1\nline2\n \n]
line1 line2 {}
%

Note that final {} in the result.

One solution is to use the -nonewline switch to read, which will
discard a terminal newline when reading the file contents, and then
split will not create a final, empty, entry:

% set lines [split line1\nline2 \n]
line1 line2
%

This second version is, usually, what is actually desired when one
wants to split a file into 'lines'.
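
Applied to an actual file rather than a literal string, that might look like the sketch below. The proc name fileLines is hypothetical:

```tcl
# Hypothetical helper: read a whole file and return its lines as a list.
proc fileLines {path} {
    set chan [open $path r]
    # -nonewline drops a single trailing newline, so the split below
    # does not produce an empty last element.
    set text [read -nonewline $chan]
    close $chan
    return [split $text \n]
}
```

Only one trailing newline is removed, so blank lines elsewhere in the file still show up as empty list elements.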

two...@gmail.com

Dec 4, 2018, 2:39:37 PM
On Tuesday, December 4, 2018 at 3:51:08 AM UTC-8, Rich wrote:

> Note that final {} in the result.
>
> One solution is to use the -nonewline switch to read, which will
> discard a terminal newline when reading the file contents, and then
> split will not create a final, empty, entry:
>

Ahh, good tip. This explains why I always would have to use [string trimright] on the file to remove that last newline (and any other trailing whitespace too, not necessarily desired). Yours is better, hadn't noticed that argument.

rindral...@gmail.com

Dec 5, 2018, 1:09:06 AM
More specific info regarding the above discussion:
Here is the file I am trying to convert into CSV using Tcl. This file is generated by the Surpac 3D modelling software.

high_grade,10-Mar-12,,ssi_styles:styles.ssi
0, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000
30007, 25615.414, 11208.269, 100.000, 5.72
30007, 25579.883, 11132.600, 100.000, 2.28
30007, 25548.000, 11148.000, 100.000, 18.68
30007, 25690.136, 11241.821, 100.000, 1.76
30007, 25694.000, 11196.000, 100.000, 2.04
30007, 25583.000, 11197.000, 100.000, 1.68
30007, 25710.213, 11219.506, 100.000, 52.12
30007, 25613.817, 11278.570, 100.000, 2.44
30007, 25587.201, 11250.446, 100.000, 2.28
30007, 25665.850, 11281.853, 100.000, 10.88
30007, 25660.306, 11232.379, 100.000, 3.76
30007, 25552.242, 11233.201, 100.000, 4.32
30007, 25629.431, 11093.991, 100.000, 2.52
30007, 25638.027, 11133.428, 100.000, 2.56
30007, 25677.410, 11121.594, 100.000, 4.6
30007, 25617.907, 11154.793, 100.000, 1.96
30007, 25698.000, 11149.000, 100.000, 2.64
30007, 25639.000, 11195.000, 100.000, 112.96
30007, 25662.730, 11165.327, 100.000, 1.24
30007, 25637.413, 11256.870, 100.000, 2.88
0, 0.000, 0.000, 0.000,
0, 0.000, 0.000, 0.000, END

The data starts at line #3. The workflow is to read this file, sort the data from line #3 onward by the value at index position 2 when each line is split on "," (11208.269 in the first data row), then write the data to a CSV.
Obviously the easiest way is to do this manually rather than through Tcl, but the point of writing a Tcl script is to automate the process.

I re-did the script as below:

set readFile [open "high_grade.str" "r"]
set data [read $readFile]
set data1 [split $data ","]

set sorted_data [lsort -real -index 2 $data1]

….which didn't work



Rich

Dec 5, 2018, 1:36:09 AM
> ….which didn't work

Of course it did not work. You simply split by commas, which destroyed
the "line by line" nature of your data. You actually have a "matrix"
above (i.e., a table). Each line is a 'record' within the table, and
each line is itself a CSV. So first, you need to split by lines, then
for each line individually, you split by commas.

set readFile [open high_grade.str]

# read in file - but drop trailing newline
set data [read -nonewline $readFile]

# process each line individually - but only for lines 3+
foreach line [lrange [split $data \n] 2 end] {
    lappend matrix [split $line ,]
}

# now sort the matrix that has been built up (a list of lists)
set sorted_matrix [lsort -real -index 2 $matrix]

# and, if you now want a CSV output, just use the Tcllib CSV module.
# This does require installing tcllib, but you should do that anyway
package require csv
puts [csv::joinlist $sorted_matrix]

And, if you do install Tcllib, you could do almost everything above
with the CSV and struct::queue modules:

package require csv
package require struct::queue

# create a 'queue' structure
struct::queue q

set readFile [open high_grade.str]

# read in the CSV data
csv::read2queue $readFile q

# sort rows 3+ by the third column and output a new CSV file
puts [csv::joinlist [lsort -real -index 2 [lrange [q get [q size]] 2 end]]]

q destroy ;# not required if the script would terminate here anyway

two...@gmail.com

Dec 5, 2018, 3:25:13 AM
On Tuesday, December 4, 2018 at 10:09:06 PM UTC-8, rindral...@gmail.com wrote:

> 30007, 25662.730, 11165.327, 100.000, 1.24
> 30007, 25637.413, 11256.870, 100.000, 2.88
> 0, 0.000, 0.000, 0.000,
> 0, 0.000, 0.000, 0.000, END
>
> The data starts from line #3.
>

Looking at this data, it seems that at least the last line is not actual data, and probably the next-to-last line as well, since it is missing the 5th item. If they are not removed from the list to sort, they will sort to the top because their values are 0, and it seems that a line with the word END should remain at the end.

If I'm right here, then Rich's example, with

foreach line [lrange [split $data \n] 2 end] {

could eliminate the last line with end changed to end-1 or end-2 to trim both of them.

You didn't mention if the output would need to include these lines and the first 2 lines. If so, then you should save a list of lines as well, into another variable so you can retrieve them later.

So, here’s my take at it with a proc to format each output line from a list of line items, in case you might want to do it by hand. Or you could use the csv package as Rich suggests.

proc line2csv {line} {
    set out {}
    foreach item $line {
        append out "$item,"
    }
    return [string range $out 0 end-1] ;# trim the trailing comma
}


# .... at this point $data has all of the file sans the last newline


set lines [split $data \n]
set range [lrange $lines 2 end-2]
# process each line individually - but only for lines 3+ but not last 2 lines
foreach line $range {
    lappend matrix [split $line ,]
}

# now sort the matrix that has been built up (a list of lists)
set sorted_matrix [lsort -real -index 2 $matrix]

# now dump out the 2 header lines (in csv), the sorted matrix, and finally the last 2 lines, also csv

# of course, you might have to open an output channel and do [puts $outchan ….] or else
# redirect stdout to your file (e.g. from a shell using >file)

puts [lindex $lines 0]
puts [lindex $lines 1]
foreach line $sorted_matrix {
    puts [line2csv $line]
}
puts [lindex $lines end-1]
puts [lindex $lines end]
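
Pulling the pieces together, one possible self-contained sketch that writes to a file instead of stdout. The proc name sortStrFile and its arguments are hypothetical, and it assumes the layout shown earlier (2 header lines, 2 trailer lines, at least 4 lines in total). As a variation on the code above, it pairs each line with its trimmed sort key rather than splitting and rejoining the fields, so the original line text is written back unchanged:

```tcl
# Hypothetical proc: sort the data rows of a .str file on the third field
# and write header lines, sorted rows and trailer lines to outPath.
proc sortStrFile {inPath outPath} {
    set in [open $inPath r]
    set lines [split [read -nonewline $in] \n]
    close $in

    # Pair every data row (between the 2 header and 2 trailer lines)
    # with its sort key: the third comma-separated field, trimmed.
    set pairs {}
    foreach line [lrange $lines 2 end-2] {
        set key [string trim [lindex [split $line ,] 2]]
        lappend pairs [list $key $line]
    }
    set sorted [lsort -real -index 0 $pairs]

    set out [open $outPath w]
    foreach line [lrange $lines 0 1] {
        puts $out $line
    }
    foreach pair $sorted {
        puts $out [lindex $pair 1]
    }
    foreach line [lrange $lines end-1 end] {
        puts $out $line
    }
    close $out
}
```

A call such as sortStrFile high_grade.str high_grade.csv would then produce the sorted file; a data row whose third field is missing or not numeric would make lsort -real raise an error, which this sketch does not handle.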

