Regards,
/iaw
Still not obvious. readcsv does have a method that accepts a stream (good),
but I really need a popen function.
x=readcsv(open(`gzcat myfile.csv.gz`, "r"))
fails. x=run(`gzcat myfile.csv.gz`) doesn't send the output to x
for further piping, as far as I can see, so readcsv(x) doesn't work either.
From the documentation of open(command): "Start running command asynchronously, and return a tuple (stream,process)."
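A minimal sketch of the difference between run and read, written for Julia 1.x rather than the older Julia in this thread. In 1.x, open(command) returns a Process that is itself a readable stream instead of a tuple, and readcsv has been replaced by readdlm in the DelimitedFiles package. Here `echo 1,2` stands in for the gzcat command, and the variable names are illustrative only.

```julia
# run() executes a command; its stdout goes to the terminal, and the
# return value is a Process object, not the command's output.
p = run(`echo 1,2`)

# read(cmd, String) runs the command and captures its stdout as a String,
# which is the closest thing to a popen() that hands back the data.
out = read(`echo 1,2`, String)

# Anything that accepts an IO can then consume the captured text.
io = IOBuffer(out)
fields = split(readline(io), ',')   # readline drops the trailing newline
println(fields)
```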
Dear Tim, Lex, Todd (& others): thanks for responding. I really want
to learn how to preprocess input from somewhere else into the
readcsv() function. It's a good starting exercise for learning how
to accomplish tasks in general; there is so much to learn. [I did
not experiment with GZip.jl. Modules are new to me, and this one is
not included with Julia, so I could make too many errors in the process,
although it would probably make this specific task easier.]
Now, the first mistake, which tripped me up for a while, is that I did
not grasp the difference between a string and a command. That is, I
should not have used " for my command; I needed backticks (`). This
is why open("echo hi") did not work, but open(`echo hi`) does.
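The string/command distinction can be checked directly. This sketch (Julia 1.x) assumes no file literally named "echo hi" exists in the working directory; `s`, `c`, and `opened_as_file` are illustrative names.

```julia
s = "echo hi"   # a String: open(s) tries to open a *file* named "echo hi"
c = `echo hi`   # a Cmd: the program echo with one argument, hi

println(typeof(s))   # String
println(typeof(c))   # Cmd

# Opening the string as a file fails with a SystemError (no such file).
opened_as_file = try
    open(s); true
catch err
    err isa SystemError || rethrow()
    false
end

# Running the Cmd works and captures what it prints.
out = read(c, String)
```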
x=open(`gzcat myfile.csv.gz`)
is a good start. It returns a tuple of a Pipe and a Process,
which the REPL prints by default. I learned I can make
this work with
d=readcsv(x[1])
but now I have a whole bunch of new questions beyond my original one.
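For comparison, a sketch of the same "parse a command's output as CSV" step in Julia 1.x, where there is no tuple to index: the Process returned by open is passed straight to a reader. `parse_numeric_csv` is a hypothetical helper that only handles purely numeric CSV; in 1.x the standard tool would be readdlm(io, ',') from DelimitedFiles. `echo 1,2,3` again stands in for gzcat.

```julia
# Hypothetical helper: parse a purely numeric CSV from any IO into a matrix.
function parse_numeric_csv(io::IO)
    rows = Vector{Vector{Float64}}()
    for line in eachline(io)
        isempty(strip(line)) && continue                  # skip blank lines
        push!(rows, [parse(Float64, strip(f)) for f in split(line, ',')])
    end
    isempty(rows) && return Matrix{Float64}(undef, 0, 0)
    # hcat puts each row in a column; permutedims turns them back into rows
    return permutedims(reduce(hcat, rows))
end

# open(f, cmd) calls f on the running Process (which is an IO),
# then waits for the command to finish.
m = open(parse_numeric_csv, `echo 1,2,3`)
```

The same helper works on an in-memory stream, e.g. parse_numeric_csv(IOBuffer("1,2\n3,4\n")), which is convenient for testing without an external command.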
First, try this:
julia> x1=open(`gzcat d.csv.gz`)
(Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))
julia> x2=open(`gzcat d.csv.gz`)
(Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))
How strange: the reported states are different.
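The printed states are consistent with each display being a snapshot taken when the value is shown: the first process had already exited (ProcessExited(0)) while the second was still running. A sketch in Julia 1.x that queries a process before and after it finishes; `sleep 1` is just a command that takes a moment, and the variable names are illustrative.

```julia
p = open(`sleep 1`)                 # starts the command and returns immediately
was_running = process_running(p)    # almost certainly true right after launch
wait(p)                             # block until the command finishes
now_exited = process_exited(p)      # true once it has terminated
ok = success(p)                     # true if it exited with status 0
close(p)
println((was_running, now_exited, ok))
```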
Even stranger, the first call, readcsv(x2[1]), is very slow (I am talking
3 seconds on a 3-by-4 data file!), but following it with readcsv(x1[1]) is
fast. I can't imagine readcsv has built-in intelligence to cache past
conversions.
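One likely explanation, offered as an assumption rather than something established in this thread: Julia compiles a function the first time it is called with a given combination of argument types, so the first call pays the compilation cost and later calls reuse the compiled code. A sketch in Julia 1.x; `sum_csv_line` is a hypothetical stand-in for readcsv, and the two timings will vary by machine (the first is typically much larger).

```julia
# Hypothetical stand-in for readcsv: sum the integers on one CSV line.
sum_csv_line(io::IO) = sum(parse(Int, f) for f in split(readline(io), ','))

t_first  = @elapsed sum_csv_line(IOBuffer("1,2,3"))   # includes compilation
t_second = @elapsed sum_csv_line(IOBuffer("1,2,3"))   # reuses compiled code
r = sum_csv_line(IOBuffer("1,2,3"))
println("first: $(t_first) s, second: $(t_second) s")
```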
Another strange definition from a novice perspective: close(x1) is
not defined, but close(x1[1]) is. Julia is the first language I have
seen where close(open("file")) is wrong. This is especially surprising
because Julia has the dispatch ability to understand what it could do
with a (Pipe, Process) tuple.
close() is defined for a stream, not for a (stream, process) tuple.
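In Julia 1.x the do-block form of open on a command sidesteps manual closing: it runs the body with the process's output stream, then closes it, waits for the process, and throws if the command failed. This sketch assumes a POSIX system (for the `false` utility) and a recent 1.x release, where the failure is reported as a ProcessFailedException. The body must read all of the output.

```julia
# The body gets a readable stream; closing and waiting happen automatically.
text = open(`echo hi`) do io
    read(io, String)       # consume all of the command's output
end

# A command that exits with a non-zero status makes the do-block form throw.
failed = try
    open(io -> read(io, String), `false`)
    false
catch err
    err isa Base.ProcessFailedException || rethrow()
    true
end
```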
This is how I used GZip.jl in the tests for the MatrixMarket package:
data = GZip.open(fname) do g
    readcsv(g)
end
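If you would rather not install GZip.jl, the same do-block pattern works with an external decompressor instead. This is a swapped-in technique, not how GZip.jl works: a Julia 1.x sketch that assumes the `gzip` program is on PATH (`gzip -dc` writes decompressed data to stdout, like gzcat). The file name and contents are made up for the example.

```julia
dir = mktempdir()
csv = joinpath(dir, "d.csv")         # hypothetical file name
write(csv, "1,2,3\n4,5,6\n")
run(`gzip $csv`)                      # replaces d.csv with d.csv.gz
gz = csv * ".gz"

# Decompress through a pipe and parse each line into a vector of Ints.
rows = open(`gzip -dc $gz`) do io
    [parse.(Int, split(line, ',')) for line in eachline(io)]
end
```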
FWIW, I believe that there was concern that the behavior of open(process) might cause confusion when it was defined in this way. (A quick search didn't locate the issue.)
For comparison, in R:
d <- read.csv(pipe("gzcat mygzippedfile.gz"))