get data into dataframe with read_table


Fabian Braennstroem

Jan 18, 2017, 3:25:41 AM
to pyd...@googlegroups.com
Hello,

I am struggling to read the attached file into a dataframe.

My current attempt:

df = pd.read_table(datFil, delim_whitespace=True)

does not work and gives me the following error:

  File "/home/fbraenns/CFD/Programme/CONDA_2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1314, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
  File "pandas/parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)
  File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
  File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 11 fields in line 33, saw 12


It seems that the number of columns is not consistent from row to row.

Do you have any advice on how to get this into a dataframe?

Thank you!

Best Regards

Fabian

table_old_extraction_part.dat

Joris Van den Bossche

Jan 18, 2017, 3:55:24 AM
to PyData
You get this error because there is also whitespace within your columns, so whitespace alone cannot serve as the delimiter.
But your file looks like it uses fixed-width formatting; in that case you should try pd.read_fwf.

Regards,
Joris
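
To illustrate the pd.read_fwf suggestion, here is a minimal sketch. The attachment is not reproduced in this thread, so the sample rows, the column widths (18, 10 and 5 characters) and the column names are all assumptions made up for the example, not the layout of the real file.

```python
import io

import pandas as pd

# Hypothetical rows, padded to fixed widths of 18, 10 and 5 characters.
# The real file's layout is not shown in the thread.
rows = [("Apple Inc/aapl", "1234.50", "10"), ("Bank Of America", "987.00", "7")]
sample = "".join(f"{name:<18}{price:>10}{qty:>5}\n" for name, price, qty in rows)

# widths gives the character width of each column, so spaces inside a
# name no longer matter: fields are cut by position, not by delimiter.
df = pd.read_fwf(
    io.StringIO(sample),
    widths=[18, 10, 5],
    header=None,
    names=["name", "price", "qty"],  # assumed column names
)
print(df)
```

If the widths are not known in advance, read_fwf can also try to infer the column boundaries from the first rows when neither widths nor colspecs is given, though that inference may misplace a boundary when a column has embedded spaces.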




Pietro Battiston

Jan 18, 2017, 4:06:10 AM
to pyd...@googlegroups.com
On Wed, 18/01/2017 at 09:25 +0100, Fabian Braennstroem wrote:
> Hello,
>
> I am struggling reading the attached file into a dataframe.
>
> My current try with:
>
> df = pd.read_table(datFil, delim_whitespace=True)
>

"delim_whitespace" won't work because you also have whitespaces in the
data (and not always the same number, e.g. "Apple Inc/aapl" is read as
two fields and "Bank Of America" as three).

I think the following should work:
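df = pd.read_table(datFil, sep=r'\s\s+', engine='python')

(that is, a separator consists of two or more whitespace characters; a regex separator like this is handled by the Python parsing engine).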

Pietro
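
A small sketch of the regex-separator approach, using made-up sample rows since the attachment is not shown here (the data and the column names are assumptions). It relies on every column gap being at least two whitespace characters wide while the gaps inside a name are only one.

```python
import io

import pandas as pd

# Hypothetical sample: single spaces inside names, two or more between columns.
sample = (
    "Apple Inc/aapl    1234.50   10\n"
    "Bank Of America    987.00    7\n"
)

# r"\s{2,}" is equivalent to r"\s\s+": two or more whitespace characters.
# Regex separators need the Python engine, so it is requested explicitly.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{2,}",
    engine="python",
    header=None,
    names=["name", "price", "qty"],  # assumed column names
)
print(df)
```

This breaks down if a row starts with leading spaces or if two columns are ever separated by a single space, in which case the fixed-width reader is probably the safer choice.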

Fabian Braennstroem

Jan 18, 2017, 6:51:04 AM
to pyd...@googlegroups.com
Hello to you both,

thank you for your quick help. The option below works well.

Thank you!
Fabian

Julien Marrec

Jan 21, 2017, 1:20:45 PM
to pyd...@googlegroups.com
This should work:

df = pd.read_csv('table_old_extraction_part.dat', sep=r'\s\s+', engine='python', thousands=',', header=None)
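
As a rough illustration of what thousands=',' and header=None add, here is a sketch on invented sample rows (the real file's contents are not shown in the thread, so the values are assumptions). With header=None the first row is treated as data and the columns are labelled 0, 1, 2.

```python
import io

import pandas as pd

# Hypothetical sample with a thousands separator in the second column.
sample = (
    "Apple Inc/aapl    1,234.50   10\n"
    "Bank Of America     987.00    7\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s\s+",      # two or more whitespace characters
    engine="python",   # regex separators are handled by the Python engine
    thousands=",",     # parse "1,234.50" as the number 1234.5
    header=None,       # no header row: columns are numbered from 0
)
print(df.dtypes)
```

Without thousands=',' the second column would be read as strings, because "1,234.50" is not a valid number on its own.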
