[erlang-questions] kick start a newb!

Dustin Whitney

Feb 6, 2008, 11:57:39 PM
to erlang-q...@erlang.org
I'd really like to get into Erlang, but I'm stumbling over some UTF-8 troubles. Could someone please explain why the code in Listing #1 below returns a list of integers instead of the XML document I'm supposed to get? I think it has something to do with UTF-8. The code in Listing #2 works just fine, and I believe that's because Slashdot encodes its data in ISO-8859-1. How can I output UTF-8 in a readable format? Ultimately I want to run an XPath query against the XML document I get from the GET request... baby steps, I suppose. Any help would be great, especially links to documentation!

Thanks,
Dustin

Listing #1

-module(tmp).
-export([get_url/0]).

get_url() ->
    {_,{_, Header, Body}} = http:request("http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&should-sponge=&query=PREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+dbpedia2%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0A%0D%0ASELECT+*+WHERE+%7B%0D%0A%3Fsubject+rdf%3Atype+%3Chttp%3A%2F%2Fdbpedia.org%2Fclass%2Fyago%2FCity108524735%3E.%0D%0A%3Fsubject+rdfs%3Alabel+%3Flabel.%0D%0A%3Fsubject+dbpedia2%3Apopulation+%3Fpopulation.%0D%0AFILTER+%28lang%28%3Flabel%29+%3D+%22en%22+%26%26+xsd%3Ainteger%28%3Fpopulation%29+%3E+200000%29%0D%0A%7D&format=application%2Fsparql-results+xml&debug=on"),
    Body.



Listing #2

-module(tmp).
-export([get_url/0]).

get_url() ->
    {_,{_, Header, Body}} = http:request("http://slashdot.org"),
    Body.

Lev Walkin

Feb 7, 2008, 4:02:55 PM
to Dustin Whitney, erlang-q...@erlang.org

Try printing out [65, 65, 65], the rest should be obvious.


Samuel Tesla

Feb 7, 2008, 5:42:30 PM
to Dustin Whitney, erlang-q...@erlang.org
Dustin:

Looking at the URL you included in your post, I'd agree it is probably UTF-8 related. As Lev was alluding to in his response, a string in Erlang is just a list of integers. When the shell prints a list of integers, it displays the list as a string only if every element is a printable character. Hence [65,65,65] is printed as "AAA", whereas [1,2,3] is printed as [1,2,3].
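
To see that rule in action outside the shell, here is a minimal sketch. The module name printable_demo and its functions are made up for illustration; the only library behavior it relies on is that the ~p control sequence chooses string notation for printable lists, while ~w always prints the raw integers.

```erlang
-module(printable_demo).
-export([describe/1, show/0]).

%% Render a list with ~w (always raw integers) and with ~p (string
%% notation when every element is printable, as the shell does).
describe(List) ->
    lists:flatten(io_lib:format("~w prints as ~p", [List, List])).

%% Print the description for the two lists mentioned in this thread.
show() ->
    lists:foreach(fun(L) -> io:format("~s~n", [describe(L)]) end,
                  [[65, 65, 65], [1, 2, 3]]).
```

Calling printable_demo:show() should print `[65,65,65] prints as "AAA"` followed by `[1,2,3] prints as [1,2,3]`. A UTF-8 encoded body falls into the second case as soon as it contains a byte that is not printable.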

Not all of the bytes in a UTF-8 encoded string are necessarily printable ASCII. So, it shows up as a list of integers. What you may want to do is look at xmerl (http://erlang.org/doc/apps/xmerl/index.html) to parse the XML. It will handle the character set conversions. If, for some reason, you want to do it on your own, I believe xmerl_ucs:from_utf8/1 may serve you well. But, that module is undocumented and thus not guaranteed to remain the same from release to release.
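
As a rough sketch of both routes, assuming a current Erlang/OTP release: the documented unicode module (which postdates this thread) can stand in for the undocumented xmerl_ucs:from_utf8/1, and xmerl_scan plus xmerl_xpath cover the parsing and XPath steps. The module name utf8_xml_demo, the function names, and the `//label/text()` query are assumptions for illustration. The real SPARQL result document has its own element names and namespace, so the query would need adapting.

```erlang
-module(utf8_xml_demo).
-export([decode/1, labels/1]).

%% Record definitions such as #xmlText{} ship with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Turn a list of UTF-8 bytes (the shape of Body in the listings above)
%% into a list of Unicode code points. On invalid input,
%% unicode:characters_to_list/2 returns an error tuple instead of a list.
decode(Bytes) when is_list(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% Parse an XML string and return the text of every <label> element.
%% The element name "label" is hypothetical.
labels(Xml) ->
    {Doc, _Rest} = xmerl_scan:string(Xml),
    [T#xmlText.value || T <- xmerl_xpath:string("//label/text()", Doc)].
```

For example, utf8_xml_demo:decode([195, 169]) returns [233], the single code point for "é", and utf8_xml_demo:labels("<r><label>Paris</label><label>Rome</label></r>") returns ["Paris","Rome"].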

I hope that helps!

-- Samuel

Dustin Whitney

Feb 7, 2008, 5:57:12 PM
to Samuel Tesla, erlang-q...@erlang.org
I modified the last line of my script to io:fwrite(Body). and that printed it out for me in a human-readable format. I will check out xmerl_ucs:from_utf8/1 and see what it does. I want to run some XPath against the document, so I'll be looking at xmerl anyway. I really appreciate the help.
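
One caveat with io:fwrite(Body): the single argument is treated as a format string, so a body that happens to contain a tilde may be read as a control sequence and raise an error. A small sketch of a safer variant, passing the body as data instead; the module name print_demo and its functions are made up for illustration.

```erlang
-module(print_demo).
-export([render/1, print/1]).

%% Format Body as an argument to ~s rather than as the format string,
%% so a literal "~" in the document is copied through unchanged.
render(Body) ->
    lists:flatten(io_lib:format("~s~n", [Body])).

%% Same idea, but write straight to standard output.
print(Body) ->
    io:format("~s~n", [Body]).
```

If Body has first been converted to Unicode code points, the ~ts control sequence is the one to use in place of ~s.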

-Dustin