need help reading web response with content-type: text/xml


Ramakrishna T

Jun 23, 2013, 8:34:04 AM
to phan...@googlegroups.com
Hi,
Could someone help me read a web response whose content-type is text/xml? I have the following program, which works correctly when the data sent from the server is HTML. In some cases, however, I receive a plain string of text with content-type set to text/xml, and when that happens the page.content property contains HTML markup with the words 'Parse Error'.

var page = require('webpage').create();
var system = require('system');
var chapterinfo = system.args[1];
var urlproviderurl = 'http://www.example.com/somepage';

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36';
page.injectJs('jquery.min.js');

page.customHeaders = {
    "Content-Type": "application/json;charset=UTF-8",
    "Accept": "application/json, text/plain, */*"
};

page.open(urlproviderurl, 'post', chapterinfo, function (status) {
    if (status !== 'success') {
        console.log('Network failure.');
        phantom.exit(1);
        return; // phantom.exit() may not halt the script immediately, so leave the callback here
    }

    console.log(page.content);
    phantom.exit();
});
 
Any help is much appreciated.
Ram

Darren Cook

Jun 23, 2013, 8:55:39 AM
to phan...@googlegroups.com
> Could someone help me with reading the web response with content-type:
> text/xml. I have the following program that works correctly when the data
> sent from the server is HTML. But, in some cases, I receive a plain string
> of text with content-type set to text/xml and when that happens I see some
> HTML markup that has the words 'Parse Error' in page.content property.

Have you tried using page.plainText instead of page.content?

https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#wiki-webpage-plainText

Though, really, PhantomJS is the wrong tool for this job; curl would be
a better choice. (Unless this is just one step in a more complicated
screen-scraping or functional test system, of course.)

Darren




--
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

Ramakrishna T

Jun 23, 2013, 9:32:33 AM
to phan...@googlegroups.com
Darren,
I did try using page.plainText, and it returned essentially the same content as page.content, just without the markup. As for your suggestion to use curl, it isn't a good fit for what I am attempting to do.

Ram.

James Greene

Jun 23, 2013, 9:58:34 AM
to phan...@googlegroups.com

Does this response come back the same when you use cURL to send the request? I'm curious if the post data you are sending to the server is what contains a parse error, thus causing the server to respond with an XML error page.

Sincerely,
   James Greene


Ramakrishna T

Jun 23, 2013, 10:58:21 AM
to phan...@googlegroups.com
James,
There is no problem with the response sent by the server; I get the correct response, which I verified with the help of a proxy server. The following is what I get inside PhantomJS.

console.log(page.content) output
--------------------------------
<html xmlns="http://www.w3.org/1999/xhtml"><body><parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black"><h3>This page contains the following errors:</h3><div style="font-family:monospace;font-size:12px">error on line 1 at column 1: Start tag expected.</div><h3>Below is a rendering of the page up to the first error.</h3></parsererror></body></html>

console.log(page.plainText) output
----------------------------------
This page contains the following errors:

error on line 1 at column 1: Start tag expected.
Below is a rendering of the page up to the first error.

Thanks,
Ram

James Greene

Jun 23, 2013, 11:38:37 AM
to phan...@googlegroups.com

What does the actual response data look like, though?

Sincerely,
   James Greene

Ramakrishna T

Jun 23, 2013, 11:54:57 AM
to phan...@googlegroups.com
James,
The response contains an HTTP URL as a plain string. Basically, I post some data to a URL, and the server returns another URL as a plain string with content-type set to text/xml.

Thanks,
Ram

Darren Cook

Jun 23, 2013, 6:45:51 PM
to phan...@googlegroups.com
> The response contains an http URL in a plain string. Basically, I post some
> data to a URL and the URL returns another URL in a plain string with
> content-type set to text/xml.
> Sample response : http://www.example.com/xxxx/yyyy/xyz.dat

(I'm assuming you mean it is sending that literal string, i.e. a URL,
and therefore not sending any xml tags.)

In that case I guess your question could be changed to: can PhantomJS
change the mime-type of a response when the server sends the wrong one?

I don't know the answer to that. If the obvious fixes [1][2] are not
possible, perhaps you could use a proxy server running on localhost to
rewrite the response?

Darren

[1]: Obvious fix #1: Have the server send the correct content-type of
text/plain.

[2]: Obvious fix #2: Have the server send XML. E.g.
<url>http://www.example.com/xxxx/yyyy/xyz.dat</url>
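
One possibility that nobody in this thread has confirmed is to skip page loading for this request and fetch the response with an XMLHttpRequest from inside the page, calling overrideMimeType so the body is treated as plain text. The following is only a sketch under assumptions: fetchRawText is a hypothetical helper name, http://www.example.com/ stands in for any page on the same origin as the target URL (needed so the XHR is same-origin), and the request headers are copied from the original script.

```javascript
// Runs in the page context when passed to page.evaluate().
// Assumption: the target URL is same-origin with the currently loaded page.
function fetchRawText(url, body) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', url, false);        // synchronous, so the result can be returned directly
    xhr.overrideMimeType('text/plain');  // treat the body as plain text, whatever the server says
    xhr.setRequestHeader('Content-Type', 'application/json;charset=UTF-8');
    xhr.send(body);
    return { status: xhr.status, text: xhr.responseText };
}

// PhantomJS-only part; the guard lets the helper above be loaded elsewhere.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var system = require('system');
    var chapterinfo = system.args[1];
    var urlproviderurl = 'http://www.example.com/somepage';

    // Assumption: this placeholder page shares an origin with urlproviderurl.
    page.open('http://www.example.com/', function (status) {
        if (status !== 'success') {
            console.log('Network failure.');
            phantom.exit(1);
            return;
        }
        var result = page.evaluate(fetchRawText, urlproviderurl, chapterinfo);
        console.log(result.text);
        phantom.exit();
    });
}
```

Because responseText is the raw body, the XML parser that produces the 'Start tag expected' error page should never be involved. Whether the server accepts a request made this way (cookies, referrer, and so on) would need to be checked.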

Ramakrishna T

Jun 23, 2013, 7:56:27 PM
to phan...@googlegroups.com
Darren,
Thanks for your suggestions, but they won't work for me, as the server is not under my control.

Ram

James Greene

Jun 25, 2013, 3:28:56 PM
to phan...@googlegroups.com
Darren's suggestion of using a local proxy server, however, would work very well for you. You should look into that.

Sincerely,
    James Greene
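
The local-proxy idea could look roughly like the following. This is a minimal sketch, assuming Node.js is available, the upstream server speaks plain HTTP (not HTTPS), and the only change needed is relabelling text/xml as text/plain. The names fixContentType and createProxy are made up for illustration.

```javascript
// Minimal forward-proxy sketch using only the Node.js standard library.
var http = require('http');

// Return a copy of the upstream headers with text/xml relabelled as text/plain.
function fixContentType(headers) {
    var fixed = {};
    Object.keys(headers).forEach(function (name) {
        fixed[name] = headers[name];
    });
    var type = fixed['content-type'];
    if (typeof type === 'string') {
        // Keep any parameters such as "; charset=utf-8".
        fixed['content-type'] = type.replace(/^text\/xml(?=\s*(;|$))/i, 'text/plain');
    }
    return fixed;
}

// Create (but do not start) a proxy that applies fixContentType to every response.
function createProxy() {
    return http.createServer(function (clientReq, clientRes) {
        var target;
        try {
            // A client configured to use an HTTP proxy sends the absolute URL.
            target = new URL(clientReq.url);
        } catch (e) {
            clientRes.writeHead(400, { 'Content-Type': 'text/plain' });
            clientRes.end('Expected an absolute URL in the request line.');
            return;
        }
        var upstreamReq = http.request({
            hostname: target.hostname,
            port: target.port || 80,
            path: target.pathname + target.search,
            method: clientReq.method,
            headers: clientReq.headers
        }, function (upstreamRes) {
            clientRes.writeHead(upstreamRes.statusCode, fixContentType(upstreamRes.headers));
            upstreamRes.pipe(clientRes);
        });
        upstreamReq.on('error', function (err) {
            clientRes.writeHead(502, { 'Content-Type': 'text/plain' });
            clientRes.end('Proxy error: ' + err.message);
        });
        clientReq.pipe(upstreamReq);
    });
}
```

To try it, one could start the proxy with createProxy().listen(8080, '127.0.0.1') (the port is arbitrary) and run the script with phantomjs --proxy=127.0.0.1:8080 followed by the script name and its arguments. With the body labelled text/plain, page.plainText should then contain the returned URL rather than the parser error.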


