Reading data files using Fortran's D scientific notation


Andrei Berceanu

Jul 18, 2014, 5:41:12 AM
to julia...@googlegroups.com
Hi all,

I have a lot of datafiles containing numbers in Fortran's double precision notation (http://math.hawaii.edu/wordpress/fortran-3/#double), i.e. 1.23D-3, instead of the usual E scientific notation.
Is there a simple way to import the data as Float64?

Tnx!
Andrei

Ivar Nesje

Jul 18, 2014, 6:33:23 AM
to julia...@googlegroups.com
If you have a reasonable editor, you should be able to open all your files and issue a global search and replace operation that changes D to E. If you have many files you can use sed to automate the process.

Andrei Berceanu

Jul 18, 2014, 6:51:32 AM
to julia...@googlegroups.com
I would prefer to keep my original files intact.
In Python I can do

import string
import numpy as np

rule = string.maketrans('D', 'E')
data = np.loadtxt(fname, usecols=(2,3),\
                  converters = {2: lambda val: float(val.translate(rule)),\
                                3: lambda val: float(val.translate(rule))})

By the way, is there a Julia equivalent to numpy's loadtxt? Because the files are in 3-column format with multiple spaces separating the 3 entries on each line and eol chars to separate the lines.
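
For reference, `string.maketrans` as used above exists only in Python 2. A minimal Python 3 sketch of the same D-to-E conversion, using only the standard library rather than NumPy, might look like this. The helper name `fortran_float` and the sample line are hypothetical, chosen for illustration only.

```python
# Hypothetical helper: convert a Fortran double-precision literal to a float.
# str.maketrans is the Python 3 counterpart of Python 2's string.maketrans.
RULE = str.maketrans("Dd", "Ee")


def fortran_float(token):
    """Parse text such as '1.23D-3' by mapping the D exponent marker to E."""
    return float(token.translate(RULE))


# Illustrative line: three columns separated by runs of spaces.
line = "  -70.0000000000000       -70.0000000000000       3.098203380460164D-010"

# split() with no argument collapses runs of spaces and ignores leading ones.
values = [fortran_float(tok) for tok in line.split()]
print(values)
```

Translating each token instead of the whole file keeps the conversion next to the parsing step, which is what the `converters` argument does in the snippet above.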

Tomas Lycken

Jul 18, 2014, 7:01:55 AM
to julia...@googlegroups.com

Although you could, probably, do this in pure Julia, is there a reason to not write a small preprocessing script using sed (or your OS’s equivalent) to create altered copies of the data before reading it? If HDD space is an issue, you could do this one file at a time using Julia’s run method, i.e. something like

dfile = "your-data-file.dat"
run(`copy_and_replace.sh $dfile`)
data = readdlm("$dfile.copy")
run(`rm $dfile.copy`)

where the shell script copy_and_replace.sh copies the raw data (with the file name given as the first argument) into a new file with the string “.copy” appended to the name, and then replaces all occurrences of D with E (or e, or whatever you need) using sed.

// T

Andrei Berceanu

Jul 18, 2014, 7:12:32 AM
to julia...@googlegroups.com
First, thank you both for your time.

Space is an issue, yes, but I agree, I can process them one by one using some sed scripting. I just thought there is a simple idiom corresponding to Python's 2-liner above.
In fact, I am wondering, how difficult would it be to make julia accept the Fortran double precision format natively - is that a big change in Base?

On a separate issue, is there an equivalent to numpy's loadtxt?

ps: Tomas, how do you get the nice code-boxes?

Mauro

Jul 18, 2014, 7:57:00 AM
to julia...@googlegroups.com
> Space is an issue, yes, but I agree, I can process them one by one using
> some sed scripting. I just thought there is a simple idiom corresponding to
> Python's 2-liner above.
> In fact, I am wondering, how difficult would it be to make julia accept the
> Fortran double precision format natively - is that a big change in Base?

I had a look: base/datafmt.jl does the file reading but it is quite
cryptic and I didn't quite figure out where the conversion from string
to float occurs. But probably it's done with the float64 function in
base/string.jl which calls into C: src/builtins.c function jl_strtod.
So, if my digging is right then it's not so easy to change and would
change how strings are parsed into floats everywhere.

Thus it is probably easiest to write a function which does the parsing.

> On a separate issue, is there an equivalent to numpy's *loadtxt*?

readdlm or readcsv do this. How did you do it?


Andrei Berceanu

Jul 18, 2014, 8:17:48 AM
to julia...@googlegroups.com
Here are a few lines from one of my files, after sed preprocessing:

  -70.0000000000000       -70.0000000000000       3.098203380460164E-010
  -69.4531250000000       -70.0000000000000       2.548160684589544E-010
  -68.9062500000000       -70.0000000000000       2.234061987906998E-010

There are 2 spaces at the start of each line and then the columns are separated by spaces as well.
I tried

readdlm(pumppath, ' ', Float64, '\n')

and get

file entry "" cannot be converted to Float64
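
One possible source of an empty entry, offered as an illustration and not as a diagnosis of `readdlm` itself: splitting on a single explicit space produces an empty field for every extra space, whereas splitting on runs of whitespace does not. The Python sketch below shows the difference on the first sample line. It makes no claim about how Julia parses the file internally.

```python
line = "  -70.0000000000000       -70.0000000000000       3.098203380460164E-010"

# Splitting on one literal space yields "" for every additional space,
# including the two leading ones.
explicit = line.split(" ")

# Splitting on runs of whitespace skips leading and repeated spaces.
collapsed = line.split()

print(explicit.count(""))  # how many empty fields the explicit split produced
print(collapsed)
```

An empty string cannot be converted to a float, so only the second form gives three usable fields per line.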

Mauro

Jul 18, 2014, 8:35:18 AM
to julia...@googlegroups.com
On Julia 0.3 this works:

julia> readdlm("fl")
3x3 Array{Float64,2}:
 -70.0     -70.0  3.0982e-10
 -69.4531  -70.0  2.54816e-10
 -68.9063  -70.0  2.23406e-10

julia> readdlm("fl", Float64)
3x3 Array{Float64,2}:
 -70.0     -70.0  3.0982e-10
 -69.4531  -70.0  2.54816e-10
 -68.9063  -70.0  2.23406e-10


Andrei Berceanu

Jul 18, 2014, 9:39:54 AM
to julia...@googlegroups.com
Seems like my files also contained blank lines periodically. I added blank line removal to the sed script and now it all works.
Thanks a lot!
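
The same clean-up can also be done in memory, which leaves the original files untouched. A rough Python sketch, assuming whitespace-separated columns, occasional blank lines, and D exponents as described in this thread. The function name `read_fortran_table` and the sample text are made up for illustration.

```python
import io


def read_fortran_table(stream):
    """Return rows of floats from whitespace-separated text with D exponents."""
    rows = []
    for line in stream:
        if not line.strip():
            continue  # skip blank lines instead of editing the file
        # Map the Fortran exponent marker to E before converting each field.
        rows.append([float(tok.replace("D", "E").replace("d", "e"))
                     for tok in line.split()])
    return rows


# Made-up sample in the shape of the data files discussed here.
sample = io.StringIO(
    "  -70.0000000000000       -70.0000000000000       3.098203380460164D-010\n"
    "\n"
    "  -69.4531250000000       -70.0000000000000       2.548160684589544D-010\n"
)
table = read_fortran_table(sample)
print(len(table), table[0])
```

For a file on disk, the same function can be called on an open file object, e.g. `with open(path) as f: table = read_fortran_table(f)`, where `path` is whatever file name applies.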

Steven G. Johnson

Jul 18, 2014, 2:20:51 PM
to julia...@googlegroups.com


On Friday, July 18, 2014 7:12:32 AM UTC-4, Andrei Berceanu wrote:
> Space is an issue, yes, but I agree, I can process them one by one using some sed scripting. I just thought there is a simple idiom corresponding to Python's 2-liner above.

Sure, you can do:

datastring = replace(readall("foo.dat"), "D", "e")
data = readdlm(IOBuffer(datastring))

Tomas Lycken

Jul 19, 2014, 4:06:56 AM
to julia...@googlegroups.com

OT: I get the code boxes using an extension to Chrome called Markdown Here - it works real well, but it has the disadvantage that you have to manually do the conversion before sending, and I keep forgetting :P

// T
