Get URL tree

WANG Cong

Nov 23, 2009, 1:03:50 AM

Hello, experts,

Is there any quick way to list the directories under a specified
URL without downloading any actual files?

For example, to get the directories under http://127.0.0.1/,
I would hope to get something like this:

http://127.0.0.1/
|-- a
|   `-- a.txt
|-- b
`-- c.txt

much like what 'tree' does for local directories.
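Once the URLs have been discovered (the hard part, as the replies explain), printing them in tree's style is mechanical. A minimal Python sketch, assuming you already hold a flat list of full URLs under the root; `render_tree` is a hypothetical helper name, not part of any tool mentioned here:

```python
def render_tree(root, urls):
    """Render a flat list of URLs under `root` as tree(1)-style text."""
    # Build a nested dict: each path segment maps to a dict of its children.
    tree = {}
    for url in urls:
        if not url.startswith(root):
            continue  # ignore anything outside the root
        node = tree
        for part in (p for p in url[len(root):].split("/") if p):
            node = node.setdefault(part, {})

    lines = [root]

    def walk(node, prefix):
        names = sorted(node)
        for i, name in enumerate(names):
            last = i == len(names) - 1
            lines.append(prefix + ("`-- " if last else "|-- ") + name)
            # Children of the last entry get blank padding instead of a bar.
            walk(node[name], prefix + ("    " if last else "|   "))

    walk(tree, "")
    return "\n".join(lines)


print(render_tree("http://127.0.0.1/", [
    "http://127.0.0.1/a/",
    "http://127.0.0.1/a/a.txt",
    "http://127.0.0.1/b/",
    "http://127.0.0.1/c.txt",
]))
# http://127.0.0.1/
# |-- a
# |   `-- a.txt
# |-- b
# `-- c.txt
```

Note that an empty directory such as `b` prints the same as a file; telling them apart needs information the URL list alone does not carry.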

I checked the man pages of wget and curl; it seems that
'wget -r --delete-after' is close to what I want, but I don't
even want any actual downloads.

Thanks!

Ben Finney

Nov 23, 2009, 1:26:35 AM

WANG Cong <xiyou.w...@gmail.com> writes:

> Is there any quick way to list the directories under a specified
> URL without downloading any actual files?

URLs lead to documents. Those documents may or may not be HTML
documents, which may or may not contain links to other URLs.

In other words, the web is a *web*, not a hierarchy. Sometimes it is
used to represent a hierarchy, but there's nothing in the web that
implies a hierarchy of documents, and nothing to say that any given URL
will or will not lead to such a hierarchy.

So, if you *suspect* that a URL will lead you to a document that will
lead to other documents via links, then the only way to discover that
structure (via HTTP) is to download the documents and follow the links
in any HTML documents you find.
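Following the links in an HTML document amounts to collecting the `href` attributes of its `<a>` tags and resolving them against the page's own URL. A minimal sketch using only Python's standard library (`html.parser`, `urllib.parse`); `LinkCollector` and `links_under` are hypothetical names, and treating "under a URL" as "starts with that URL" is an assumption:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL, drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def links_under(base_url, html):
    """Return links from `html` that lie strictly below `base_url`, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    result = []
    for link in parser.links:
        if link.startswith(base_url) and link != base_url and link not in result:
            result.append(link)
    return result


page = ('<a href="a/">a/</a> <a href="c.txt">c.txt</a> '
        '<a href="../">Parent</a> <a href="http://example.com/">elsewhere</a>')
print(links_under("http://127.0.0.1/", page))
# ['http://127.0.0.1/a/', 'http://127.0.0.1/c.txt']
```

The parent link and the external link are dropped. Apache-style index pages also carry column-sorting links such as `?C=N;O=D`, which a real crawler would probably want to filter out as well.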

> I checked the man pages of wget and curl; it seems that 'wget -r
> --delete-after' is close to what I want, but I don't even want any
> actual downloads.

You can't avoid that if HTTP is your method of interacting with the
site. Only the documents contain the information you need to traverse
the “tree”, or even to discover whether it's a tree at all.

--
\ “The optimist thinks this is the best of all possible worlds. |
`\ The pessimist fears it is true.” —J. Robert Oppenheimer |
_o__) |
Ben Finney

Alan Curry

Nov 23, 2009, 2:35:12 AM

In article <87bpita...@benfinney.id.au>,
Ben Finney <bignose+h...@benfinney.id.au> wrote:
>WANG Cong <xiyou.w...@gmail.com> writes:
>
>> Is there any quick way to list the directories under a specified
>> URL without downloading any actual files?
>
>URLs lead to documents. Those documents may or may not be HTML
>documents, which may or may not contain links to other URLs.
>
>In other words, the web is a *web*, not a hierarchy. Sometimes it is

WebDAV gets it a little closer to being a hierarchy. But yeah, if you like
hierarchies, you're gonna find that the web is not very likeable.
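With WebDAV the server itself reports which members of a collection are sub-collections, so no HTML has to be parsed. A minimal sketch, assuming the server has WebDAV enabled and is reachable over plain HTTP: it sends a `PROPFIND` request with `Depth: 1` (one level) asking only for `DAV:resourcetype`, then reads the `207 Multi-Status` reply with `xml.etree.ElementTree`. The function names are hypothetical:

```python
import http.client
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

DAV_NS = "{DAV:}"  # ElementTree's Clark notation for the DAV: namespace

# Ask only for the resource type of the collection and its direct members.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/></D:prop></D:propfind>'
)


def propfind(url):
    """Send PROPFIND with Depth: 1 and return the raw multistatus XML (bytes)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        conn.request("PROPFIND", parts.path or "/", body=PROPFIND_BODY,
                     headers={"Depth": "1", "Content-Type": "application/xml"})
        resp = conn.getresponse()
        if resp.status != 207:  # 207 Multi-Status is the expected reply
            raise RuntimeError(f"PROPFIND failed: {resp.status} {resp.reason}")
        return resp.read()
    finally:
        conn.close()


def parse_multistatus(xml_bytes):
    """Return (href, is_collection) pairs from a DAV:multistatus body."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for response in root.findall(DAV_NS + "response"):
        href = response.findtext(DAV_NS + "href")
        # A collection (directory) has <D:resourcetype><D:collection/></D:resourcetype>.
        is_collection = response.find(".//" + DAV_NS + "collection") is not None
        entries.append((href, is_collection))
    return entries
```

The reply normally lists the requested collection itself as well as its members. Building a full tree means repeating `propfind` for each href marked as a collection; servers are allowed to refuse `Depth: infinity`, so going one level at a time is the safer assumption.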

--
Alan Curry

WANG Cong

Nov 24, 2009, 8:17:22 AM

On Mon, 23 Nov 2009 17:26:35 +1100, Ben Finney wrote:

> WANG Cong <xiyou.w...@gmail.com> writes:
>
>> Is there any quick way to list the directories under a specified URL
>> without downloading any actual files?
>
> URLs lead to documents. Those documents may or may not be HTML
> documents, which may or may not contain links to other URLs.
>
> In other words, the web is a *web*, not a hierarchy. Sometimes it is
> used to represent a hierarchy, but there's nothing in the web that
> implies a hierarchy of documents, and nothing to say that any given URL
> will or will not lead to such a hierarchy.

Yeah, but if a URL does not point to HTML (say, a .txt file), we can stop following it there.
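One way to make that check before downloading a body is to ask the server for headers only. A minimal sketch, assuming the server answers `HEAD` requests and reports an accurate `Content-Type` (neither is guaranteed); `content_type`, `worth_following` and the `LINKABLE_TYPES` set are hypothetical names chosen for illustration:

```python
import urllib.request

# Media types assumed to be able to contain links worth following.
LINKABLE_TYPES = {"text/html", "application/xhtml+xml"}


def content_type(url, timeout=10):
    """Send a HEAD request (headers only, no body) and return the media type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # get_content_type() strips parameters such as "; charset=utf-8"
        # and falls back to "text/plain" when the header is missing.
        return response.headers.get_content_type()


def worth_following(media_type):
    """Only HTML-like documents can lead to further URLs."""
    return media_type in LINKABLE_TYPES
```

For example, `worth_following(content_type("http://127.0.0.1/c.txt"))` would be `False` if the server labels that file `text/plain`, so the crawl could stop there without fetching it.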

>
> So, if you *suspect* that a URL will lead you to a document that will
> lead to other documents via links, then the only way to discover that
> structure (via HTTP) is to download the documents and follow the links
> in any HTML documents you find.
>
>> I checked the man pages of wget and curl; it seems that 'wget -r
>> --delete-after' is close to what I want, but I don't even want any
>> actual downloads.
>
> You can't avoid that if HTTP is your method of interacting with the
> site. Only the documents contain the information you need to traverse
> the “tree”, or even to discover whether it's a tree at all.

Right, I think I need to clarify this: I know I must retrieve the documents,
but I neither need nor want to save them to my disk. I suppose wget
or something similar could keep them in a memory buffer and parse them there... Can't it?

Thanks!
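Keeping each retrieved document in a memory buffer is straightforward with most HTTP libraries. A minimal Python sketch using only the standard library: `fetch_into_memory` reads each response into a bytes buffer with `urlopen().read()`, and `crawl` does a breadth-first walk that keeps only links below the root. It assumes Apache-style index pages where sub-directory links end in `/`, so only those are fetched; every other link is recorded as a leaf without being downloaded. All names are hypothetical, and `max_pages` is an arbitrary safety limit:

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _HrefParser(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def fetch_into_memory(url, timeout=10):
    """GET one document into a bytes buffer; nothing is written to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        media_type = response.headers.get_content_type()
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read()
    return media_type, body.decode(charset, errors="replace")


def crawl(root, fetch=fetch_into_memory, max_pages=500):
    """Breadth-first walk below `root`; returns every URL found, sorted."""
    seen = {root}
    queue = deque([root])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        media_type, text = fetch(url)
        fetched += 1
        if media_type != "text/html":
            continue  # not HTML, so there are no links to follow
        parser = _HrefParser()
        parser.feed(text)
        for href in parser.hrefs:
            link, _fragment = urldefrag(urljoin(url, href))
            if not link.startswith(root) or "?" in link or link in seen:
                continue  # outside the root, a sort/query link, or already known
            seen.add(link)
            if link.endswith("/"):
                queue.append(link)  # looks like a sub-directory listing: fetch it
            # anything else is recorded as a leaf and never downloaded
    return sorted(seen)
```

Calling `crawl("http://127.0.0.1/")` against a server with directory indexes enabled would return a flat, sorted list of URLs. The `fetch` parameter lets the walk run against canned pages instead of a live server.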