pandas read_html with German decimal


Fabian Braennstroem

Sep 8, 2014, 2:01:51 PM
to pyd...@googlegroups.com
Hello,

Does anyone have an idea how to read HTML tables from German sites, which
write numbers with a decimal comma? An example is the tables at the bottom of
http://www.finanzen.net/bilanz_guv/SAP

Right now, I try to read them with:
df = pd.read_html(url, infer_types=False, parse_dates=False, header=0, skiprows=0, thousands=".", match=pattern, index_col=0)
dfR = df[0].replace(",", ".")
dfR = pd.DataFrame(dfR, dtype='float')

But this is not really working.

Thanks in advance!
Best Regards
Fabian

Fabian Braennstroem

Sep 9, 2014, 1:46:28 PM
to pyd...@googlegroups.com
Hello,

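With this it works:
for col in dfR.columns:
    # drop the thousands separator first, then turn the decimal comma into a point
    dfR[col] = dfR[col].str.replace(".", "", regex=False)
    dfR[col] = dfR[col].str.replace(",", ".", regex=False).astype("float")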

Best Regards
Fabian
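
For readers who want to see the conversion in isolation, here is a minimal sketch in plain Python. The helper name `parse_german_number` and the sample cells are made up for illustration and do not come from the thread. It assumes that "." is only ever a thousands separator and "," only ever the decimal separator.

```python
from typing import Optional


def parse_german_number(text: str) -> Optional[float]:
    """Convert a German-formatted number such as '1.234,56' to a float.

    Hypothetical helper: returns None for cells that do not hold a
    number (for example '-' or an empty string).
    """
    # Order matters: remove the thousands separators first, then turn
    # the decimal comma into a decimal point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


# Made-up sample cells, not values taken from the linked page.
cells = ["16.815,00", "1.234,56", "3,35", "-"]
values = [parse_german_number(cell) for cell in cells]
print(values)
```

Swapping the two replacements would first turn "1.234,56" into "1.234.56" and then strip every point, which silently yields a wrong number. With pandas, a helper like this could probably be applied per column with something like `dfR[col].map(parse_german_number)`.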

Andy Hayden

Sep 9, 2014, 1:54:52 PM
to pyd...@googlegroups.com
read_csv has a decimal kwarg; perhaps we should add it to read_html. In fact, I think it's related to (or could be incorporated into) this recent issue: https://github.com/pydata/pandas/issues/8200.
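
To illustrate the `read_csv` behaviour mentioned here, a small sketch follows. The semicolon-separated sample text, its column names, and its values are invented for the example. Newer pandas releases also document a `decimal` parameter on `read_html`, but whether it is available depends on the installed version, so it is worth checking the documentation first.

```python
import io

import pandas as pd

# Invented sample data in German number formatting (not real figures).
csv_text = (
    "Jahr;Umsatz;Ergebnis\n"
    "2012;1.234,50;2,5\n"
    "2013;2.345,75;0,25\n"
)

# decimal="," and thousands="." tell the parser how to read the numbers.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df.dtypes)
print(df)
```

With these two keyword arguments the numeric columns should arrive as floats directly, so no string replacement is needed afterwards.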

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Fabian Braennstroem

Sep 24, 2014, 11:43:59 AM
to pyd...@googlegroups.com
Thanks for the info!