Path without filename for a relative URL

Jack Marks

Oct 15, 2001, 2:35:33 PM
I need to construct an absolute URL
(protocol://host[:port][/path][/file]) given a relative URL in an
HTML page (<a href="somepage.html">). I am using java.net.URL to
retrieve the content, then I walk through the document and use regular
expressions to identify the links. The problem is that I cannot
separate out the path from the file, as the URL class returns the same
value from getPath() and getFile(). Is there a clean way to get only
the path and not the file? How does my browser do it?
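The two accessors are not quite identical: getFile() is the path plus any query string, while getPath() stops at the '?'. A small sketch with a made-up URL shows the difference. Neither accessor drops the file name, so splitting the directory from the file still has to be done separately.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class PathVsFile {
    public static void main(String[] args) throws MalformedURLException {
        // Made-up URL with a query string
        URL u = new URL("http://www.myhost.com/news/story.html?id=42");

        // getPath() stops before the '?'
        System.out.println(u.getPath()); // /news/story.html

        // getFile() is the path followed by "?" and the query
        System.out.println(u.getFile()); // /news/story.html?id=42

        // Without a query the two are equal, and both still end in the file name
        URL noQuery = new URL("http://www.myhost.com/news/story.html");
        System.out.println(noQuery.getPath().equals(noQuery.getFile())); // true
    }
}
```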

An example may also help:

1st page is http://www.myhost.com. On that page is a link, <a
href="news">News</a>, which dynamically displays a list of links to
news stories. These links are relative to news, e.g. <a
href="story.html">Story</a>. Therefore, the absolute URL for story.html is
http://www.myhost.com/news/story.html. Clearly I could use getHost() and
getPath() to build this link, but the problem comes when story.html
contains links that are also relative to news. Now getPath()
returns "/news/story.html", not just "/news".

Thanks,

Jack
j_m...@yahoo.com

Paul Lutus

Oct 15, 2001, 3:55:09 PM
"Jack Marks" <j_m...@yahoo.com> wrote in message
news:76d9084.01101...@posting.google.com...

Have you tried String.lastIndexOf('/')? That is the usual approach to
splitting a file name from a path.
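A minimal sketch of the lastIndexOf('/') idea applied to the example in the question, assuming the current page is held as a java.net.URL. The helper name baseDirectory is hypothetical, and the naive string join at the end only works for plain file-name links; it does not handle "../", links starting with "/", query-only links, or user info in the URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class BaseDir {
    // Hypothetical helper: protocol, host, optional port and the directory part of
    // the page's path, i.e. everything up to and including the last '/'
    static String baseDirectory(URL page) {
        String path = page.getPath();
        int slash = path.lastIndexOf('/');
        // "http://www.myhost.com" has an empty path, which means the root directory
        String dir = (slash < 0) ? "/" : path.substring(0, slash + 1);
        // getPort() returns -1 when the URL did not specify a port
        String port = (page.getPort() == -1) ? "" : ":" + page.getPort();
        return page.getProtocol() + "://" + page.getHost() + port + dir;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL page = new URL("http://www.myhost.com/news/story.html");
        System.out.println(baseDirectory(page));                // http://www.myhost.com/news/
        // Naive join: only right for a plain file name such as "other.html"
        System.out.println(baseDirectory(page) + "other.html"); // http://www.myhost.com/news/other.html
    }
}
```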

--
Paul Lutus
www.arachnoid.com


Ossie J. H. Moore

Oct 16, 2001, 8:21:59 AM
I think this is what you are trying to do...

[omoore@okmoore tmp]$ java Test http://www.seg.org/index.html
http://www.seg.org

[omoore@okmoore tmp]$ cat Test.java
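public class Test
{
    public static void main(String[] args)
    {
        // in your case, this would set 's' to the full URL of the page
        String s = args[0];
        // position of the last '/', i.e. where the file name starts
        int slash = s.lastIndexOf('/');
        // a bare "http://host" has no path: its last '/' is part of "//"
        boolean hasPath = slash > s.indexOf("//") + 1;
        // store everything before the file name (protocol, host, directories)
        String path = hasPath ? s.substring(0, slash) : s;
        // print out the path
        System.out.println(path);
    }
}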


--

Ossie J. H. Moore
ossie...@home.com

Online JDK documentation:
http://java.sun.com/j2se/1.3/docs/api/index.html

Give Star Office 6.0 Beta a try...
http://www.sun.com/staroffice/6.0beta/

Babu Kalakrishnan

Oct 17, 2001, 4:58:17 AM
On 15 Oct 2001 11:35:33 -0700, Jack Marks <j_m...@yahoo.com> wrote:
>I need to construct an absolute URL
>(protocol://host[:port][/path][/file]) given a relative URL in an
>HTML page (<a href="somepage.html">). I am using java.net.URL to
>retrieve the content, then I walk through the document and use regular
>expressions to identify the links. The problem is that I cannot
>separate out the path from the file, as the URL class returns the same
>value from getPath() and getFile(). Is there a clean way to get only
>the path and not the file? How does my browser do it?

How about using the constructor

new URL(documentURL,"somepage.html");

?
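A sketch of that constructor applied to the example from the question, where documentURL is the URL of the page the link was found on. The absolutize helper is a hypothetical wrapper; the resolution itself is done by java.net.URL(URL context, String spec), which also takes care of "../", root-relative and fully qualified links. Note that the base matters: without a trailing slash, "news" counts as a file name, not a directory.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class Resolve {
    // Hypothetical helper: turn an href found on the page at pageUrl into an absolute URL
    static String absolutize(String pageUrl, String href) throws MalformedURLException {
        URL documentURL = new URL(pageUrl);
        // URL(URL context, String spec) resolves spec relative to context
        return new URL(documentURL, href).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://www.myhost.com/news/story.html";
        System.out.println(absolutize(page, "other.html"));    // http://www.myhost.com/news/other.html
        System.out.println(absolutize(page, "../index.html")); // http://www.myhost.com/index.html
        System.out.println(absolutize(page, "/about.html"));   // http://www.myhost.com/about.html
        System.out.println(absolutize(page, "http://example.com/x.html")); // http://example.com/x.html

        // Without a trailing slash, "news" is treated as a file name, not a directory
        System.out.println(absolutize("http://www.myhost.com/news", "story.html"));  // http://www.myhost.com/story.html
        System.out.println(absolutize("http://www.myhost.com/news/", "story.html")); // http://www.myhost.com/news/story.html
    }
}
```

Because the constructor does the resolution, there is no need to split the path from the file by hand.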

BK
