Getting xml in R using httr and/or xml2

ArjunaCap

Dec 5, 2015, 8:21:32 PM
to manipulatr
I'm trying to do something unfamiliar, but here's the gist:

I want to download XML from a page requiring authentication (simple login/password).

I can sort of get started with the following pseudocode:

library(httr)

my_url <- "http://myurl/somexml.xml"
x <- GET(my_url, authenticate("user", "password"))
my_xml <- content(x)


This connects successfully.

str(my_xml)

yields the following:

Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>

If I then try to use an xml2 command, such as:

library(xml2)

y <- read_xml(my_xml)



I end up with the following error:

  no applicable method for 'read_xml' applied to an object of class "c('HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument')" 

Finally, if I try:

y <- read_xml(my_url, authenticate("user", "password"))


It seems to fail to connect, I believe because 'authenticate' doesn't work with read_xml.

Finally, finally: in Hadley's fine webinar on Getting Data in R, he mentions that XML is painful generally, but I'd love any direction on my specific problem above, or anything more general (like a tutorial and/or best practices) about going from XML to a data frame. Google has yielded little.

Thanks in advance.

jim holtman

Dec 5, 2015, 8:48:45 PM
to ArjunaCap, manipulatr
Have you tried the XML package? I have used it to parse both XML and HTML. If you plan to make heavy use of it, I recommend the book "XML and Web Technologies for Data Sciences with R".




Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at http://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

Hadley Wickham

Dec 5, 2015, 9:09:55 PM
to ArjunaCap, manipulatr
The problem is that httr is giving you xml from the XML package, which I no longer recommend using. I think you can call xml2::read_xml on the httr object instead. 

Hadley


--
http://had.co.nz/

Michael Cawthon

Dec 6, 2015, 11:18:04 AM
to Hadley Wickham, manipulatr
Thank you, but the recommended solution doesn't work; i.e.,

my_url <- "http://myurl/somexml.xml"
x <- GET(my_url, authenticate("user", "password"))
xml2::read_xml(x)


yields the following:

Error in UseMethod("read_xml") :
  no applicable method for 'read_xml' applied to an object of class "response".


and

xml2::read_xml(content(x))

gives the same XML-related error as before.

Is there a way to use authenticate with xml2, and simply bypass httr for my case?
-- 

Michael Cawthon
Chief Investment Officer
Green Street Energy LLC
mcaw...@greenstenergy.com
p: 479-442-1407

Hadley Wickham

Dec 6, 2015, 11:52:44 AM
to Michael Cawthon, manipulatr
No, you'll need to explore xml2 more to figure it out. I'm on my phone on a plane but I can look later in the week. 

Hadley


--
http://had.co.nz/

Hadley Wickham

Dec 10, 2015, 10:16:20 AM
to Michael Cawthon, manipulatr
Ok, currently the best way would be:

read_xml(content(x, "raw"))

but I've filed an issue so that read_xml(x) can work directly -
https://github.com/hadley/xml2/issues/63.

Hadley
--
http://had.co.nz/
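
Putting the thread together, a minimal sketch of the whole flow might look like the following. It relies on the read_xml(content(x, "raw")) approach suggested above and then flattens the result into a data frame, which was the other half of the original question. The helper names fetch_xml and records_to_df, the sample XML, and its <trade> layout are all made up for illustration; the flattening step also assumes every record has the same, non-nested fields, and it uses xml_find_first(), which needs xml2 1.0.0 or later (older versions called it xml_find_one()).

```r
library(xml2)

# Hypothetical helper: fetch a password-protected XML document with httr
# and parse it with xml2. url, user and password are supplied by the caller.
fetch_xml <- function(url, user, password) {
  resp <- httr::GET(url, httr::authenticate(user, password))
  httr::stop_for_status(resp)            # turn HTTP errors (401, 404, ...) into R errors
  # Ask httr for the unparsed body (a raw vector) so it does not auto-parse
  # the response itself, then hand the bytes to xml2.
  read_xml(httr::content(resp, "raw"))
}

# Hypothetical helper: flatten repeated elements into a data frame.
# Assumes each record matched by record_xpath has the same simple child fields.
records_to_df <- function(doc, record_xpath, fields) {
  records <- xml_find_all(doc, record_xpath)
  cols <- lapply(fields, function(f) xml_text(xml_find_first(records, f)))
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Offline demonstration of the same raw-vector path, using made-up XML
# in place of a real authenticated download.
sample_raw <- charToRaw(paste0(
  "<trades>",
  "<trade><id>1</id><price>10.5</price></trade>",
  "<trade><id>2</id><price>11</price></trade>",
  "</trades>"
))
doc <- read_xml(sample_raw)
df <- records_to_df(doc, "//trade", c("id", "price"))
df$price <- as.numeric(df$price)         # xml_text() always returns character
print(df)
```

With a real server, doc <- fetch_xml("http://myurl/somexml.xml", "user", "password") would replace the sample_raw lines. Every column comes back as character, so numeric fields need an explicit conversion as shown for price.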