Cannot parse directory listing as XML

Zac

Aug 7, 2011, 10:59:14 AM
to Home Media Server
Hi everyone. I am having a hard time getting Brian's code to work with
my server. I have figured out where the code is failing, but I am not
sure why or how to correct the error.

Here is the dump from the debugger.

------ Running ------
HTTP result:
str: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /media</title>
</head>
<body>
<h1>Index of /media</h1>
<table><tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr><tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/back.gif" alt="[DIR]"></td><td><a href="/">Parent Directory</a></td><td>&nbsp;</td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="episodes/">episodes/</a></td><td align="right">06-Aug-2011 16:48 </td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="movies/">movies/</a></td><td align="right">06-Aug-2011 10:30 </td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.2.17 (Fedora) Server at 192.168.0.14 Port 80</address>
</body></html>

response: 0
reason:
error: false

Server set to 192.168.0.14/media
HTTP result:
str: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /media</title>
</head>
<body>
<h1>Index of /media</h1>
<table><tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr><tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/back.gif" alt="[DIR]"></td><td><a href="/">Parent Directory</a></td><td>&nbsp;</td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="episodes/">episodes/</a></td><td align="right">06-Aug-2011 16:48 </td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="movies/">movies/</a></td><td align="right">06-Aug-2011 10:30 </td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.2.17 (Fedora) Server at 192.168.0.14 Port 80</address>
</body></html>

response: 0
reason:
error: false

Cannot parse directory listing as XML


Here is the code that is failing.
Function getDirectoryListing(url As String) As Object
    result = getHTMLWithTimeout(url, 60)

    if result.error then
        print "ERROR: Could not get directory listing"
        return invalid
    end if
    dir = result.str
    ' Try parsing the html as if it is XML
    xml = CreateObject("roXMLElement")
    if not xml.Parse(dir) then
        print "Cannot parse directory listing as XML"
        return invalid
    end if

    ' grab all the <a href /> elements
    urls = getUrls({}, xml)

    return urls
End Function

Function getUrls(array as Object, element as Object) As Object
    if element.GetName() = "a" and element.HasAttribute("href") then
        array.AddReplace(element.GetAttributes()["href"], "")
    end if
    if element.GetChildElements() <> invalid then
        for each e in element.GetChildElements()
            getUrls(array, e)
        end for
    end if
    return array
End Function

It's failing when it tries xml.Parse(dir), but as I mentioned, I am not
sure how to correct that. I am thinking it may be
something on my server, but I am honestly not sure.

Any help would be greatly appreciated. Thank you.
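
If the XML parser keeps rejecting the page, one alternative is to skip XML parsing and pull the href values out with a regular expression instead. The following is only a minimal sketch and is not part of Brian's code: getUrlsByRegex is a hypothetical name, and it assumes roRegex is available on the Roku firmware in use and that every link uses a double-quoted href, as in the Apache listing above.

```brightscript
' Hypothetical alternative to getUrls(); not part of the original channel code.
' Assumes roRegex is available and that href values are double-quoted.
Function getUrlsByRegex(html As String) As Object
    urls = {}
    q = Chr(34) ' double-quote character
    pattern = "<a\s[^>]*?href=" + q + "([^" + q + "]*)" + q
    hrefRegex = CreateObject("roRegex", pattern, "i")

    remaining = html
    while true
        m = hrefRegex.Match(remaining)
        ' m[0] is the whole match, m[1] is the captured href value
        if m.Count() < 2 then exit while
        urls.AddReplace(m[1], "")
        ' Continue searching after the end of this match
        matchStart = Instr(1, remaining, m[0])
        remaining = Mid(remaining, matchStart + Len(m[0]))
    end while
    return urls
End Function
```

Like getUrls, this returns every link on the page, so the column-sorting links and the Parent Directory entry come back too and would still need filtering.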

Zac

Aug 7, 2011, 6:35:02 PM
to Home Media Server
OK, so I think I have resolved my problem. Here is what I did.

I replaced the auto-generated "Index of" page with this hand-written one:
<html>
<head>
<title>
Home-Media-Server
</title>
</head>
<body>
<a href="episodes/">episodes</a>
<a href="movies/">movies</a>
</body>

</html>


That worked out great: the channel loaded and shows the two
categories, episodes and movies. The only problem now is that
when I select the movies category, it tries to parse this page and
fails. Here is the page source.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /movies</title>
</head>
<body>
<h1>Index of /movies</h1>
<table><tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr><tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/back.gif" alt="[DIR]"></td><td><a href="/">Parent Directory</a></td><td>&nbsp;</td><td align="right"> - </td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="300.mp4">300.mp4</a></td><td align="right">02-Aug-2011 22:00 </td><td align="right">1.7G</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="American%20Psycho.mp4">American Psycho.mp4</a></td><td align="right">01-Aug-2011 23:27 </td><td align="right">907M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Curious%20George.mp4">Curious George.mp4</a></td><td align="right">01-Aug-2011 19:10 </td><td align="right">867M</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Finding%20Nemo.mp4">Finding Nemo.mp4</a></td><td align="right">01-Aug-2011 15:49 </td><td align="right">1.8G</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Gnomeo%20and%20Juliet.mp4">Gnomeo and Juliet.mp4</a></td><td align="right">05-Aug-2011 17:58 </td><td align="right">1.0G</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Happy%20Feet.mp4">Happy Feet.mp4</a></td><td align="right">01-Aug-2011 09:31 </td><td align="right">1.8G</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Pulp%20Fiction.mp4">Pulp Fiction.mp4</a></td><td align="right">01-Aug-2011 08:35 </td><td align="right">1.5G</td><td>&nbsp;</td></tr>
<tr><td valign="top"><img src="/icons/movie.gif" alt="[VID]"></td><td><a href="Yellow%20Handkerchief.mp4">Yellow Handkerchief.mp4</a></td><td align="right">01-Aug-2011 08:37 </td><td align="right">1.1G</td><td>&nbsp;</td></tr>
<tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.2.17 (Fedora) Server at 192.168.0.14 Port 80</address>
</body></html>


So... do I need to create a simplified page for each folder, or
should it pull the data automatically? I was thinking that I
might need to modify mod_autoindex somehow, but I was not sure whether that was
necessary or how to do it.

Does anyone have any thoughts on all this?

Thanks for the input.

coderpunk

Aug 7, 2011, 11:31:33 PM
to Home Media Server
Try turning off Apache's FancyIndexing option and see if that helps.
I think something in the extra markup it embeds must be
making the Roku's XML parser choke.

Zac

Aug 9, 2011, 12:55:58 AM
to Home Media Server
So after reviewing everything, I tried turning off FancyIndexing and
that didn't help. What I figured out is that the
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> line is causing the problem. To the best
of my knowledge, it causes a problem because it has no closing
tag, such as </!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">,
the way other XML/HTML elements do. Does anyone know how to modify
autoindex to remove that line? Did anyone else have this problem?

That being said, I got the channel to parse the first layer (the
folders). I also don't get an error when I click on movies, but I always
end up right back at the parent folder. I think it has
something to do with the blank file mentioned in the readme, which
tells the Roku how playback should work. I have a file named
movies with no extension on it. It is blank inside. Other than that,
here is the HTML of my movies directory on the server.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /media/movies</title>
</head>
<body>
<h1>Index of /media/movies</h1>
<ul><li><a href="/media/"> Parent Directory</a></li>
<li><a href="300-SD.mp4"> 300-SD.mp4</a></li>
<li><a href="American%20Psycho.mp4"> American Psycho.mp4</a></li>
<li><a href="Curious%20George.mp4"> Curious George.mp4</a></li>
<li><a href="Finding%20Nemo.mp4"> Finding Nemo.mp4</a></li>
<li><a href="Gnomeo%20and%20Juliet.mp4"> Gnomeo and Juliet.mp4</a></li>
<li><a href="Happy%20Feet.mp4"> Happy Feet.mp4</a></li>
<li><a href="Pulp%20Fiction.mp4"> Pulp Fiction.mp4</a></li>
<li><a href="Yellow%20Handkerchief.mp4"> Yellow Handkerchief.mp4</a></li>
</ul>
<address>Apache/2.2.17 (Fedora) Server at 192.168.0.14 Port 80</address>
</body></html>

Having FancyIndexing off cleans up the HTML a lot. Of course, it sure
isn't as fancy, but whatever.


Let me know what you think.

Talk to you soon.

--Zac

Zac

Aug 10, 2011, 1:06:35 AM
to Home Media Server
So after a ton of trial and error and research into
how the Parse command works for the Roku and XML, I think I figured out
everything I wanted to.

Here is what I figured out.
1. Every tag needs to have a closing tag, such as <HTML></HTML>.
Tags such as <img> and <hr>, and the <!DOCTYPE ...> line, break the parse.
That being said, I am going to see about writing a function
to remove all the tags that can halt the parse.
2. I have turned off FancyIndexing and SuppressIcon so the data can be
parsed as is. I also added this code to UrlUtils.brs at
line 245:
    startLocation = Instr(1,result.str,"<html>")
    if startLocation>1 then
        result.str = right(result.str,len(result.str)-startLocation+1)
    endif
It removes the <!DOCTYPE ...> line from the beginning of the document, since
I couldn't figure out how to remove it through Apache. I find it odd
that I was the only one with this issue, but nevertheless I think
this should fix it. I will keep you posted on the function to handle
<img>, <hr>, and any other tags without a closing tag.
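
The cleanup function described in item 1 could look something like the sketch below. It is a rough sketch under stated assumptions, not a tested fix: sanitizeListing is a hypothetical name, roRegex is assumed to be available on the firmware in use, and only the constructs seen in these Apache listings are handled (the DOCTYPE line, the <img>, <hr> and <br> tags, and the &nbsp; entity, which XML does not predefine). One possible reason the DOCTYPE line itself fails is that it has a public identifier but no system identifier, which XML does not allow.

```brightscript
' Hypothetical helper; not part of the original channel code.
' Assumes roRegex is available on the Roku firmware in use.
Function sanitizeListing(html As String) As String
    ' Drop the <!DOCTYPE ...> declaration, case-insensitively
    doctype = CreateObject("roRegex", "<!DOCTYPE[^>]*>", "i")
    html = doctype.ReplaceAll(html, "")

    ' Rewrite tags that never get a closing tag as self-closing,
    ' for example <hr> becomes <hr/>
    voidTags = CreateObject("roRegex", "<(img|hr|br)\b([^>]*?)/?>", "i")
    html = voidTags.ReplaceAll(html, "<\1\2/>")

    ' &nbsp; is an HTML entity that plain XML does not define
    nbsp = CreateObject("roRegex", "&nbsp;", "")
    html = nbsp.ReplaceAll(html, " ")

    return html
End Function
```

It would be called on result.str before xml.Parse. Other HTML habits, such as unquoted attribute values, would still need their own handling.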

Does anyone have any tips on my code? Should I place it further up, or
is there an easier way to write it?
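
One possibly simpler way to write the same trim is sketched below. stripBeforeHtmlTag is a hypothetical name, and the sketch assumes the built-in LCase and the two-argument form of Mid, which returns the rest of the string from the given position.

```brightscript
' Hypothetical helper name; not from the original UrlUtils.brs.
Function stripBeforeHtmlTag(text As String) As String
    ' Search a lower-cased copy so <HTML> and <html> both match;
    ' leaving off the ">" also matches an <html> tag with attributes
    startLocation = Instr(1, LCase(text), "<html")
    if startLocation > 1 then
        ' Mid with two arguments keeps everything from that position on
        return Mid(text, startLocation)
    end if
    return text
End Function
```

Usage would be result.str = stripBeforeHtmlTag(result.str), which leaves the string untouched when no <html tag is found or when it is already at the start.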

Thanks, everyone.