Is there any quick way to list the directories under a specified
URL without downloading any actual files?
For example, if I want the directories under http://127.0.0.1/,
I hope to get something like this:
http://127.0.0.1/
|-- a
|   `-- a.txt
|-- b
`-- c.txt
like what 'tree' does for local directories.
I checked the man pages of wget and curl. It seems that
'wget -r --delete-after' is close to what I want, but I don't
want any actual downloads at all.
Thanks!
> Is there any quick way to list the directories under a specified
> URL without downloading any actual files?
URLs lead to documents. Those documents may or may not be HTML
documents, which may or may not contain links to other URLs.
In other words, the web is a *web*, not a hierarchy. Sometimes it is
used to represent a hierarchy, but there's nothing in the web that
implies a hierarchy of documents, and nothing to say that any given URL
will or will not lead to such a hierarchy.
So, if you *suspect* that a URL will lead you to a document that will
lead to other documents via links, then the only way to discover that
structure (via HTTP) is to download the documents and follow the links
in any HTML documents you find.
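
As a rough illustration of "follow the links in any HTML documents you
find", here is a minimal Python sketch that pulls the link targets out of
an HTML page that is already in memory. The names LinkCollector and
extract_links are made up for this example, and it assumes the links of
interest are plain <a href="..."> tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without a usable href (e.g. <a name="...">).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return absolute URLs for every <a href> found in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links
```

Whether those links form anything like a tree is still up to the site;
the sketch only reports what the document says.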
> I checked the man pages of wget and curl, it seems that 'wget -r
> --delete-after' is near what I want, but I even don't want any actual
> downloads.
You can't avoid that if HTTP is your method of interacting with the
site. Only the documents contain the information you need to traverse
the “tree”, or even to discover whether it's a tree at all.
--
\ “The optimist thinks this is the best of all possible worlds. |
`\ The pessimist fears it is true.” —J. Robert Oppenheimer |
_o__) |
Ben Finney
WebDAV gets it a little closer to being a hierarchy. But yeah, if you like
hierarchies, you're gonna find that the web is not very likeable.
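
If the server does speak WebDAV (many plain HTTP servers do not), a
PROPFIND request with "Depth: 1" asks for the members of a collection
directly. A minimal sketch, assuming the server answers with a standard
207 Multi-Status body; parse_multistatus and list_collection are names
invented for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Ask only for the resourcetype property, which marks collections.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/></D:prop></D:propfind>'
)


def parse_multistatus(xml_bytes):
    """Return (href, is_collection) pairs from a 207 Multi-Status body."""
    ns = {"D": "DAV:"}
    entries = []
    for response in ET.fromstring(xml_bytes).findall("D:response", ns):
        href = response.findtext("D:href", default="", namespaces=ns)
        collection = response.find(".//D:resourcetype/D:collection", ns)
        entries.append((href, collection is not None))
    return entries


def list_collection(url):
    """Send PROPFIND (Depth: 1) to url and list its immediate members."""
    request = urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": "1", "Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_multistatus(response.read())
```

Calling list_collection("http://127.0.0.1/") would only work against a
WebDAV-enabled server; otherwise expect an HTTP error such as 405.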
--
Alan Curry
> WANG Cong <xiyou.w...@gmail.com> writes:
>
>> Is there any quick way to list the directories under a specified URL
>> without downloading any actual files?
>
> URLs lead to documents. Those documents may or may not be HTML
> documents, which may or may not contain links to other URLs.
>
> In other words, the web is a *web*, not a hierarchy. Sometimes it is
> used to represent a hierarchy, but there's nothing in the web that
> implies a hierarchy of documents, and nothing to say that any given URL
> will or will not lead to such a hierarchy.
Yeah, but if a URL does not point to HTML, say a txt file, we can stop following it there.
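
One way to tell HTML from non-HTML without fetching the body is a HEAD
request followed by a look at the Content-Type header. A minimal sketch
in Python; is_html and head_content_type are hypothetical helper names,
and it assumes the server answers HEAD requests with an accurate
Content-Type.

```python
import urllib.request


def is_html(content_type):
    """True if a Content-Type header value names an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def head_content_type(url):
    """Send a HEAD request and return the Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type")
```

So is_html(head_content_type(url)) would decide whether a URL is worth
fetching and parsing, at the cost of one extra request per URL.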
>
> So, if you *suspect* that a URL will lead you to a document that will
> lead to other documents via links, then the only way to discover that
> structure (via HTTP) is to download the documents and follow the links
> in any HTML documents you find.
>
>> I checked the man pages of wget and curl, it seems that 'wget -r
>> --delete-after' is near what I want, but I even don't want any actual
>> downloads.
>
> You can't avoid that if HTTP is your method of interacting with the
> site. Only the documents contain the information you need to traverse
> the “tree”, or even to discover whether it's a tree at all.
Right, I think I need to clarify this: I know I must retrieve the document,
but I neither need nor want to save it to disk. I suppose wget
or something similar could keep it in a buffer and then parse it... Can't it?
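
Something along these lines is what I have in mind: a rough Python
sketch that reads each response into memory, parses it there, and never
writes to disk. The names fetch and crawl are made up for this example.
It assumes pages are UTF-8 HTML with plain <a href> links, stays under
the starting URL, and skips links with a query string (such as the sort
links in generated index pages) as a simplification. It still transfers
every body once, since that is the only way to see the links.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _Links(HTMLParser):
    """Gather raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def fetch(url):
    """Read the response into memory; return (content_type, body_bytes)."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get("Content-Type", ""), response.read()


def crawl(root, fetcher=fetch):
    """Return the sorted URLs reachable under root, using memory only."""
    seen, queue = set(), [root]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        content_type, body = fetcher(url)
        if not content_type.startswith("text/html"):
            continue  # not HTML, so there are no links to follow
        parser = _Links()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.hrefs:
            target = urldefrag(urljoin(url, href)).url
            # Stay under the starting URL and ignore query-string links.
            if target.startswith(root) and "?" not in target:
                queue.append(target)
    return sorted(seen)
```

For example, crawl("http://127.0.0.1/") would return a flat, sorted list
of URLs, which could then be printed in a 'tree'-like layout.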
Thanks!