Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

unable to import csv-Data

119 views
Skip to first unread message

Andre Koppel

unread,
May 18, 2011, 7:17:55 AM5/18/11
to
Hello to all,

I am trying to import some data from a csv-file. But I am absolutely
unable to get any usefull result.
I have tried several options to do formating during input, but in every
case Mathematica 8 puts several csv-columns
into one result-column.
Because the csv-data contains germany encoding, I have tried several
conversion options, but nothing helps.
Here is a snapshot of the csv-data (one headline two datalines):
--------------------- cut here ------------------------
ID;KONTO_NR;KONTO_BEZ;BELEG_DAT;BELEG_NR;GKTO_NR;GKTO_BEZ;BU_TEXT;SOLL;HABEN;Buchsaldo;WAEHRUNG;Faelligkeit;Anfangsbestand;Ausgeblendet;Changed;InsoBaseUser;BuJahr
1;;;;;;;;7807477,41;6986382,79;,00;;;False;False;2010-11-01
10:24:09.997;KDLB\Conrad;
2;D_60004;Jeske, Norbert 23966 Hof
Triwalk;2008-01-01;;S_09008;Vortrag;EB-Werte durch AIS TaxAudit
berechnet und erstellt;387,37;,00;387,37;EUR;;True;True;2010-11-01
10:24:09.997;KDLB\Conrad;
--------------------- cut here ------------------------
I have tried the following import-command (and several versions of it),
but did not get useful import-result:
imp = Import["test.csv", "Table", "FieldSeparators" -> ";",
"DateStringFormat" -> { "Year", "-", "Month", "-", "Day"},
"CharacterEncoding" -> "ASCII", "HeaderLines" -> 1] ;

For me it looks like the CSV-Importer is unable to detect NULL-Values (;;)?!?

By the way reading the data into excel, writing a resulting xls-file and
importing the xls-file into mathematica works out of the box,
but I can't go this way because there are more than 200000 datalines,
and Excel did not support such a great amount and Mathematica
was unable to import Excel-2010-Formated xlsx-Data (ods didn't works
also because of the great amount of data).

Any help would be highly appreciated
Kind regards
Andre

--
Andre Koppel Software GmbH
Prinz-Handjery-Str. 38
14167 Berlin
Tel.: (+4930) 810 09 190
Fax: (+4930) 326 01 046
www.invep.de
www.akso.de

Eingetragen beim Amtsgericht
Berlin Charlottenburg HRB92600
Geschäftsführer Andre Koppel

Albert Retey

unread,
May 19, 2011, 7:44:52 AM5/19/11
to
Hi,

> I am trying to import some data from a csv-file. But I am absolutely
> unable to get any usefull result. I have tried several options to do
> formating during input, but in every case Mathematica 8 puts several
> csv-columns into one result-column. Because the csv-data contains
> germany encoding, I have tried several conversion options, but
> nothing helps. Here is a snapshot of the csv-data (one headline two
> datalines):

> For me it looks like the CSV-Importer is unable to detect
> NULL-Values (;;)?!?

I think that the option:

"RepeatedSeparators" -> False

would get rid of that problem, but I'm not sure whether it will make
everything go right. Alternatively you could also try to StringSplit bz
hand where a ; appears. This would also have the advantage that you
could read and process the data in junks with ReadList, which might or
might not be necessary with 200000 lines of data.

hth,

albert

PS: considering the high standard of data privacy in Europe and
especially Germany I would be very careful about sending personal data
of any kind to a newsgroup...

David Bailey

unread,
May 19, 2011, 7:42:55 AM5/19/11
to
I think one approach may be to:

1) Read the file into Mathematica as a list of strings with
ReadList[file,String]

2) Merge in "\n" newlines into the list with Riffle.

3) Use StringReplace to "Anglicise" the file. You may need to use
several nested calls to this function to make sure semicolons don't get
replaced by commas before the original commas get replaced.

4) Use ImportString to import the end result.

String operations are remarkably fast, so this may work more efficiently
than you expect!

David Bailey
http://www.dbaileyconsultancy.co.uk


David Annetts

unread,
May 19, 2011, 7:42:34 AM5/19/11
to
Hi Andre,

You might be better off with something older ....

inpu = OpenRead["test.csv"];
rawd = ReadList[inpu, Record]; (* read everything into single records *)
Close[inpu];

rawd = StringReplace[#, {"," -> "."}] & /@ rawd; (* account for euro
decimals & other stuff *)
spl = StringSplit[#, ";"] & /@ rawd (* ... before splitting on ;'s *)
TableForm[spl] (* just everything aligns .. *)

which looks OK, although I suspect an error in the second record after
the EUR.

So now you're left with the task of converting dates & times, but that's
straightforward.

D.

Hans Michel

unread,
May 20, 2011, 6:35:41 AM5/20/11
to
Andre:
The following does work

Import["D:\\akk\\test.csv", "Table", {"FieldSeparators" -> ";",
"CharacterEncoding" -> "ASCII", "HeaderLines" -> 1,
"EmptyField" -> "", "RepeatedSeparators" -> False,
"DateStringFormat" -> {"Year", "-", "Month", "-", "Day"}}]

See the Options for "Table" import

ref/format/Table

Hans

-----Original Message-----
From: Andre Koppel [mailto:ako...@akso.de]
Sent: Wednesday, May 18, 2011 6:18 AM
Subject: unable to import csv-Data

Hello to all,

I am trying to import some data from a csv-file. But I am absolutely
unable to get any usefull result.
I have tried several options to do formating during input, but in every
case Mathematica 8 puts several csv-columns
into one result-column.
Because the csv-data contains germany encoding, I have tried several
conversion options, but nothing helps.
Here is a snapshot of the csv-data (one headline two datalines):
--------------------- cut here ------------------------

ID;KONTO_NR;KONTO_BEZ;BELEG_DAT;BELEG_NR;GKTO_NR;GKTO_BEZ;BU_TEXT;SOLL;HABE=
N
;Buchsaldo;WAEHRUNG;Faelligkeit;Anfangsbestand;Ausgeblendet;Changed;InsoBas=
e


User;BuJahr
1;;;;;;;;7807477,41;6986382,79;,00;;;False;False;2010-11-01
10:24:09.997;KDLB\Conrad;
2;D_60004;Jeske, Norbert 23966 Hof
Triwalk;2008-01-01;;S_09008;Vortrag;EB-Werte durch AIS TaxAudit
berechnet und erstellt;387,37;,00;387,37;EUR;;True;True;2010-11-01
10:24:09.997;KDLB\Conrad;
--------------------- cut here ------------------------
I have tried the following import-command (and several versions of it),
but did not get useful import-result:

imp == Import["test.csv", "Table", "FieldSeparators" -> ";",


"DateStringFormat" -> { "Year", "-", "Month", "-", "Day"},
"CharacterEncoding" -> "ASCII", "HeaderLines" -> 1] ;

For me it looks like the CSV-Importer is unable to detect NULL-Values
(;;)?!?

By the way reading the data into excel, writing a resulting xls-file and
importing the xls-file into mathematica works out of the box,
but I can't go this way because there are more than 200000 datalines,
and Excel did not support such a great amount and Mathematica
was unable to import Excel-2010-Formated xlsx-Data (ods didn't works
also because of the great amount of data).

Any help would be highly appreciated
Kind regards
Andre

--


Andre Koppel Software GmbH
Prinz-Handjery-Str. 38
14167 Berlin
Tel.: (+4930) 810 09 190
Fax: (+4930) 326 01 046
www.invep.de
www.akso.de

Eingetragen beim Amtsgericht
Berlin Charlottenburg HRB92600

Gesch=E4ftsf=FChrer Andre Koppel

Andre Koppel

unread,
May 20, 2011, 7:18:43 AM5/20/11
to
Hi Hans,
I have tried your Import-Parameters. With my complete dataset it didn't work.
The result was a strange mix of cells.
Currently I am reading the file completely doing the convertion manually
by using resgular expressions.
Even if the csv-import would work, it's horribly slow (more than 1000
Seconds with you Import-Parameters).
It would be nice, if there would exist a csv-Import-Function where there
could be given a list that
describes the contens of the columns. In such case Import should run
much more faster producing better results.
Thank you for your suggestion.
Andre

> ID;KONTO_NR;KONTO_BEZ;BELEG_DAT;BELEG_NR;GKTO_NR;GKTO_BEZ;BU_TEXT;SOLL;HABEN
> ;Buchsaldo;WAEHRUNG;Faelligkeit;Anfangsbestand;Ausgeblendet;Changed;InsoBase


> User;BuJahr
> 1;;;;;;;;7807477,41;6986382,79;,00;;;False;False;2010-11-01
> 10:24:09.997;KDLB\Conrad;
> 2;D_60004;Jeske, Norbert 23966 Hof
> Triwalk;2008-01-01;;S_09008;Vortrag;EB-Werte durch AIS TaxAudit
> berechnet und erstellt;387,37;,00;387,37;EUR;;True;True;2010-11-01
> 10:24:09.997;KDLB\Conrad;
> --------------------- cut here ------------------------
> I have tried the following import-command (and several versions of it),
> but did not get useful import-result:

> imp = Import["test.csv", "Table", "FieldSeparators" -> ";",


> "DateStringFormat" -> { "Year", "-", "Month", "-", "Day"},
> "CharacterEncoding" -> "ASCII", "HeaderLines" -> 1] ;
>
> For me it looks like the CSV-Importer is unable to detect NULL-Values
> (;;)?!?
>
> By the way reading the data into excel, writing a resulting xls-file and
> importing the xls-file into mathematica works out of the box,
> but I can't go this way because there are more than 200000 datalines,
> and Excel did not support such a great amount and Mathematica
> was unable to import Excel-2010-Formated xlsx-Data (ods didn't works
> also because of the great amount of data).
>
> Any help would be highly appreciated
> Kind regards
> Andre
>


--
Andre Koppel Software GmbH
Prinz-Handjery-Str. 38
14167 Berlin
Tel.: (+4930) 810 09 190
Fax: (+4930) 326 01 046
www.invep.de
www.akso.de

Eingetragen beim Amtsgericht
Berlin Charlottenburg HRB92600

Geschäftsführer Andre Koppel


Hans Michel

unread,
May 21, 2011, 6:45:35 AM5/21/11
to
Andre:

I missed an item in your file.

Import["D:\\akk\\test.csv", "Table", {"FieldSeparators" -> ";",
"CharacterEncoding" -> "ASCII", "HeaderLines" -> 1,
"EmptyField" -> "", "RepeatedSeparators" -> False,

"DateStringFormat" -> {"Year", "-", "Month", "-", "Day"},
"NumberPoint" -> ","}]

The NumberPoint option tells how to treat "," which are the equivalent to
"." in US.

That too can explain the bad alignment. I was focusing on your original need
to include nulls, but you also have a need to treat decimal points also.

Gesch=E4ftsf=FChrer Andre Koppel


Hans Michel

unread,
May 21, 2011, 6:45:46 AM5/21/11
to
Andre:

I believe the reason for the slower speed is the DateStringFormat option. If
you have time remove it and see if there is any speed gains (it will still
be slow).
The date string format option is being applied to every entry in the larger
file. That was the reason I moved that format to the end, for easy removal.
Your dates are already near ISO8601 format so they can be well ordered with
simple a removal of the "-". Also the EmptyField should be used to transform
empty field to "Null".

I would place such a large dataset in a database. Mathematica can connect
with Access database or HSQL. The import of Excel from flat file works in a
similar fashion to Access.

E. Martin-Serrano

unread,
May 21, 2011, 6:46:07 AM5/21/11
to
Right now I am importing CSV (Excel) files as it follows in Mathematica
V8.01. Windows7.

(* import data form ECB European Central Bank currency files *)

NotebookDirectory[];
ratesfile = ToFileName[NotebookDirectory[], "eurofxref-hist.csv"];
Print["rates file name - > ", ratesfile];
cdata = Import[ratesfile, "Data"];

The file "eurofxref-hist.csv" can be found at

http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html

Where you can download a zipped CSV file. After unzipped the file
"eurofxref-hist.csv" can be imported easily as above.

Regards.

-----Mensaje original-----
De: Andre Koppel [mailto:ako...@akso.de]
Enviado el: viernes, 20 de mayo de 2011 13:19
Para: math...@smc.vnet.net
Asunto: Re: unable to import csv-Data

GeschE4ftsf=FChrer Andre Koppel


Hans Michel

unread,
May 22, 2011, 6:53:47 AM5/22/11
to
Martin-Serrano:

It would have been nicer if you supplied the actual URI to the historical
file instead of having me hunt for it
http://www.ecb.int/stats/eurofxref/eurofxref-hist.zip?7864d71f16dd2ab9c90332
18b84780a7

More generally:
http://www.ecb.int/stats/eurofxref/eurofxref-hist.zip

Seems the file will change every day. I would think the GUID may be a way to
get back that date range, but without it you get most current available
date. This is of course a guess. I have used similar methods for clients but
used simple number to delineate past dates [?10 means ten days from current
date server time not UTC].

This will work in Mathematica.

Import["http://www.ecb.int/stats/eurofxref/eurofxref-hist.zip",
"eurofxref-hist.csv"]
(Would not use it for very large zip files)

It's all good when the data provider has provided a format in a comma
separated value, and the values are clean and all the data is well formatted
and aligned so null value are "N/A" etc.

If you have time go into your csv file and replace all the commas with
semi-colons, change the "." to "," and then see what happens when you try
Import with the method you demonstrate. How would you import your data
without the header and you can't drop First after import?

CSV files are not Excel files, they did not invent this format (I don't
think so, can't find support for this statement). Microsoft may have
hijacked this mime type to show an Excel icon. See

http://tools.ietf.org/html/rfc4180

http://en.wikipedia.org/wiki/Comma-separated_values

I prefer a quoted csv file with escapes. It is more portable.

Import["D:\\akk\\test.csv", "Table", {"FieldSeparators" -> ";",
"CharacterEncoding" -> "ASCII", "HeaderLines" -> 1,
"EmptyField" -> "", "RepeatedSeparators" -> False,

"DateStringFormat" -> {"Year", "-", "Month", "-", "Day"},
"NumberPoint" -> ","}]

The above may look very long and foreboding but Mathematica is giving us an
opportunity to define the (E)BNF for the parser (DateStringFormat excepted).

Hans

E. Martin-Serrano

unread,
May 22, 2011, 6:54:51 AM5/22/11
to
Well Hans,

I am not sure whether I understand your post.

I supplied you the link to the ECB site, which contains itself several
links to zipped "CSV", "PDF", "XML" and other file formats for EUR/USD and
many other currency exchange rates within the same files. The only thing to
do is just select programmatically the currency pair you need and the
period of time you could be interested in. Of course by using my approach
the file must be unzipped before treated, as you say.

It is true that the ECB updates the file every day (at 2 PM GMT) since
every day the market generates a new rate for each pair of currencies, and
the market in Europe closes at that time. Note that in the link (the site) I
posted we have the choice of downloading the whole file (since 1999) or just
the new rates (for all the currencies for the current day; this one is in
the file 'eurorefxref.zip'. Obviously, either, one has to insert the new
rate in the general file, or download the complete one with the new pairs
included.

I wrote myself the charting program I am using to process currency data
based on the technic 'Ichimoku Kinko Hyo', and I wrote the program before
Wolfram released its charting set of tools in v8 (I am reviewing them by
now, and as far as I know it is not included by Wolfram, am I wrong?). Back
to the ECB files, I had preferred download the file the way I do instead of
accessing the internet each time I run the system. So, I tried to make
independent the daily sessions with the available past data, from the
actual availability of Internet connectivity (and traffic), though, it is
true that I could save the unloaded file for the rest of the day. But I
never try to write perfect programs and I am not good at importing (and
exporting) files with Mathematica.

On the other hand, I do not have any problem in dropping the superfluous
items in the files, (either headers or others) . It is worse to cope with
other inconveniences related to the way the ECB provides data as it lacks
the "Open", "High" and "Low" properties, since only provides the "Close"
value; still, so does the FED and never in CSV format as far as I know.
Bloomberg and Forex data poses another kind of problems.

Anyway, I will have a look at your proposal.

Many thanks

Emilio.

P.S. I would be interested in modifying some aspects of the Candlestick
tools to include some of my stuff. But before I might go into the details of
these new tools already available.

-----Mensaje original-----
De: Hans Michel [mailto:hmi...@cox.net]
Enviado el: s=E1bado, 21 de mayo de 2011 17:39
Para: 'E. Martin-Serrano'; math...@smc.vnet.net
Asunto: RE: Re: unable to import csv-Data

Martin-Serrano:

More generally:
http://www.ecb.int/stats/eurofxref/eurofxref-hist.zip

http://tools.ietf.org/html/rfc4180

http://en.wikipedia.org/wiki/Comma-separated_values

Hans

http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html

Regards.

GeschE4ftsf==FChrer Andre Koppel


Hans Michel

unread,
May 23, 2011, 6:24:19 AM5/23/11
to
Emilio:

I did not understand your original post. Since it was tied to my thread I
took it as a comment on "why all these options I just do this and it works".
As am not familiar with the ECB data I had to actually go open it and view
what you were indicating. I thought hey European they may be saving "CSV"
files using ";" for separators and "," for decimal delimiters. Simply
putting in "Data" as an Import parameter would import this type of file into
Mathematica with no other needs.

It may be the case that depending on a users IP that the ECB provides
regional "CSV" files but I got "US/UK" "CSV" files which Mathematica has no
problems parsing. The original users post was asking help with "CSV" files
with "European" "CSV" format. If just simply adding "Data" and Mathematica
would parse "European" "CSV" format it would be a learning point.

I was under the impression that most of the "locale and
internationalization" in Mathematica were generally for output not input. So
say if you are in a EU region your OS sets decimal to "," you go to the ECB
and they produce a file for your region even though the call it "CSV" by
just using "Data" in Import all will go well in Mathematica.

All this can be answered simply by you. Open the "CSV" file from ECB that
you unzip does it contain ";" instead of "," and "," instead of ".". If it
does then Mathematica is working in a smooth way. If it does not contain
those transforms then this is "not" the issue of the original poster.

Please excuse any misleading comments. I simply meant to say that some time
was invested figuring how to use a web site. Using the ECB web site may be
obvious to you, but it was not to me, until time was invested. The point of
the original question was not to learn to use a website but how to parse
files that are EU formatted "CSV" files with null entries.

I do not try to write perfect programs but I do try to automate as much as
possible, alert when things fail, and fix by hand when necessary. This
allows me to move on to other projects. Even if you can do all the ECB work
at 10 minutes per day, if you add 10 more projects to your plate that use
the same method of needing the human element to download each file and
process that is now 100 minutes each day. I always hear the argument that it
took you 1000 minutes to code a process that will save you 10 minutes per
day. At that rate it would take 100 days before one breaks even. But the
point is to not be tied down by task that can be automated. The trivial
solution to do some of the process by hand is always going to be available
as a solution path.

Mathematica 8 has many new features that help in automation "Timed
Evaluations".

Regards

BernieTheJet

unread,
May 25, 2011, 5:56:42 AM5/25/11
to
I find it is easiest to change the CSV extension to DAT and just use
Import.

B

0 new messages