'content' and showing URL

Stan Brown

Dec 23, 2012, 7:13:27 PM

Hi, folks! I'm trying to show the actual URLs of selected links when
someone prints a page. Here's my CSS, and it works fine for external
links:

@media print {
a.showURL { text-decoration:none }
a.showURL:after { content: "[URL: " attr(href) " " attr(title) "]";
padding:0 0.25em; text-decoration:underline }
}

So where's the problem, you say? It's with internal links, all of
which are relative links. If someone looks at a printed page and
sees a "URL" of ../ti83dir/math200a.htm, they won't know what to do
with it.

Is there a way to display the full URL of the target through CSS, or
do I need to give up using relative links in those cases? (I really
like relative links, because I edit and test my site locally and then
upload pages without altering them in any way.)

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/2003/05/05/why_we_wont_help_you

dorayme

Dec 23, 2012, 8:23:08 PM

In article <MPG.2b4168301...@news.individual.net>,

Stan Brown <the_sta...@fastmail.fm> wrote:

> Hi, folks! I'm trying to show the actual URLs of selected links when
> someone prints a page. Here's my CSS, and it works fine for external
> links:
>
> @media print {
> a.showURL { text-decoration:none }
> a.showURL:after { content: "[URL: " attr(href) " " attr(title) "]";
> padding:0 0.25em; text-decoration:underline }
> }
>
> So where's the problem, you say? It's with internal links, all of
> which are relative links. If someone looks at a printed page and
> sees a "URL" of ../ti83dir/math200a.htm, they won't know what to do
> with it.
>
> Is there a way to display the full URL of the target through CSS, or
> do I need to give up using relative links in those cases? (I really
> like relative links, because I edit and test my site locally and then
> upload pages without altering them in any way.)

If you could settle for a base from which all links go, you could try
adding:

a.showURL:after { content: "[URL: "
"http://yourserver.com/"attr(href) attr(title) "]";

padding:0 0.25em; text-decoration:underline }
}

--
dorayme

Jukka K. Korpela

Dec 24, 2012, 4:30:09 AM

2012-12-24 3:23, dorayme wrote:

> If you could settle for a base from which all links go, you could try
> adding:
>
> a.showURL:after { content: "[URL: "
> "http://yourserver.com/"attr(href) attr(title) "]";
> padding:0 0.25em; text-decoration:underline }
> }

Yes, but this would mean that *all* links have to be relative, and if
you ever add a link that isn't, its URL will be displayed wrong.

The attr(...) notation is simplistic: it denotes the attribute value as
such, as a string, and there is no processing of URLs possible in CSS.

In CSS4 Selectors, an addition to this has been suggested, indirectly,
as :local-link(...) pseudo-classes. They could be used to distinguish
between links by their URLs - but it's questionable whether this would
really be useful here, and in any case it's just a sketchy draft, probably
with no attempted implementations.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Stan Brown

Dec 24, 2012, 8:44:58 AM

On Mon, 24 Dec 2012 12:23:08 +1100, dorayme wrote:
>
> In article <MPG.2b4168301...@news.individual.net>,
> Stan Brown <the_sta...@fastmail.fm> wrote:
> > @media print {
> > a.showURL { text-decoration:none }
> > a.showURL:after { content: "[URL: " attr(href) " " attr(title) "]";
> > padding:0 0.25em; text-decoration:underline }
> > }
> >

> > Is there a way to display the full URL of the target through CSS,
> > or do I need to give up using relative links in those cases?
>

> If you could settle for a base from which all links go, you could try
> adding:
>
> a.showURL:after { content: "[URL: "
> "http://yourserver.com/"attr(href) attr(title) "]";
> padding:0 0.25em; text-decoration:underline }
> }

Thanks to you and Jukka for your speedy responses.

I think this has promise. It would mean having two show-URL-type
styles, one for external links (my original) and one for internal
(your suggestion), but that doesn't seem like much of a burden. And
in fact on internal links I don't really want the title attribute
anyway; see below. Thanks!

Follow-up question: I'm using the title="..." attribute in the
<a href> tag to show the last date I accessed an external link. I
confess I've never understood the _purpose_ of title="...", since the
link text in <a>...</a> serves that purpose, but its _function_ seems
to be to pop up a tool tip in graphical browsers, and I like the idea
of showing how old a link is.

Does title="..." serve any useful purpose? Am I grossly misusing it?
Is there a better way to show when an external link was last
verified, or is that not even worth doing?

Jukka K. Korpela

Dec 24, 2012, 9:49:23 AM

2012-12-24 15:44, Stan Brown wrote:

>> a.showURL:after { content: "[URL: "
>> "http://yourserver.com/"attr(href) attr(title) "]";
>> padding:0 0.25em; text-decoration:underline }
>> }

[...]

> I think this has promise. It would mean having two show-URL-type
> styles, one for external links (my original) and one for internal
> (your suggestion), but that doesn't seem like much of a burden.

It's possible to distinguish between internal and external links using
classes, of course, but you can avoid the need for extra classes if it
is ok to distinguish between relative and absolute URLs in links.
Usually internal links are relative, external aren't. This means you
could use something like the following:

a[href]:after {
content: "[URL: http://www.example.com/" attr(href) "]";
padding:0 0.25em;
}
a[href ^= "http"]:after {
content: "[URL: " attr(href) attr(title) "]";
}

The selector [href ^= "http"] matches an element with an href attribute
that has a value starting with "http".

This is a bit simplistic, and it postulates that no relative link starts
with "http". (I'm using "http" and not "http://" to cover "https://",
too.) More seriously, it fails for URLs like "//www.example.com",
"/foo/bar", and "../foo", because the URLs won't be resolved in CSS -
they are just treated as strings. And naturally e.g. "ftp://" links
would be incorrectly treated as relative in this setup.
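
A sketch of how the prefix test could be extended to some of the cases
listed above, using only attribute selectors. The host
http://www.example.com is a placeholder, the http: scheme for "//" links
is an assumption, and the bracketed output format is simply carried over
from the rules above. It is not a complete solution: a value like
"../foo" is still appended as a plain string, since CSS cannot resolve it.

```css
/* Sketch only: http://www.example.com is a placeholder base URL. */
@media print {
  /* Fallback: treat any other href as relative to the site root.
     "../foo" is still appended verbatim; CSS cannot resolve it. */
  a[href]:after {
    content: " [URL: http://www.example.com/" attr(href) "]";
  }
  /* Root-relative, e.g. "/foo/bar": prepend scheme and host only. */
  a[href^="/"]:after {
    content: " [URL: http://www.example.com" attr(href) "]";
  }
  /* Protocol-relative, e.g. "//www.example.com": prepend the scheme only.
     Listed after the "/" rule so that it wins at equal specificity. */
  a[href^="//"]:after {
    content: " [URL: http:" attr(href) "]";
  }
  /* Already absolute: print the value as it stands. */
  a[href^="http://"]:after,
  a[href^="https://"]:after,
  a[href^="ftp://"]:after {
    content: " [URL: " attr(href) "]";
  }
  /* Nothing useful to print for same-page fragments or mail links. */
  a[href^="#"]:after,
  a[href^="mailto:"]:after {
    content: none;
  }
}
```

All of these rules have the same specificity, so their order matters:
the later, more specific prefixes override the fallback.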

Regarding text-decoration: underline, I don't see why you would
underline a URL that is clearly indicated as a URL with the "[URL: " and
"]" wrappers. Besides, the :after pseudo-element content will be
underlined anyway if the link itself is. It's really a problem how to
prevent that, but on the other hand, is there any reason to underline
links in print media, especially if they are followed by their URLs in
brackets?

I might also consider adding word-break: break-all to allow free line
breaking within the URL, but that's debatable. Long URLs might otherwise
cause messy appearance, but on the other hand, URLs shouldn't really be
*freely* wrapped even when inside wrappers.
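
A milder alternative, offered only as an untested sketch: word-wrap:
break-word (standardized under the name overflow-wrap) permits a break
inside the URL only when the URL would not otherwise fit on a line. The
a[href] selector is an assumption carried over from the rules above.

```css
@media print {
  a[href]:after {
    word-wrap: break-word;     /* older name for the property */
    overflow-wrap: break-word; /* standardized name, same effect */
  }
}
```

Unlike word-break: break-all, this leaves short URLs intact and breaks
only the ones that would overflow the line.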

> Follow-up question: I'm using the title="..." attribute in the
> <a href> tag to show the last date I accessed an external link.

It's a bit odd.

> I confess I've never understood the _purpose_ of title="...", since the
> link text in <a>...</a> serves that purpose,

The idea is that the title attribute may provide an advisory title that
is not essential for the page content but helps the user to see, without
following the link, additional information about the destination of the
link. Jakob Nielsen, the usability expert, wrote about this as early as
in 1998: "Using Link Titles to Help Users Predict Where They Are Going",
http://www.useit.com/alertbox/980111.html

For example, your link text might be something as short as "WCAG 2.0"
and the advisory title might be "Web Content Accessibility Guidelines
(WCAG) 2.0 - W3C Recommendation 11 December 2008". This should not be an
excuse for using cryptic abbreviations as link texts; but it might be OK
in a context where the abbreviation has been explained or can otherwise
be assumed to be known - the advisory title would then be just a
friendly reminder.

However, although the basic idea is good, the implementation of title
attributes as tooltips in browsers is lousy: tiny font (unaffected by
CSS) in box that vanishes after a few seconds. The progress of CSS has
made "CSS tooltips" a better idea: write the advisory title as normal
content after the link, wrap it in an element, make it initially hidden,
and make it appear (in a positioned element) on mouseover.
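
A minimal sketch of such a "CSS tooltip". The class name tip and the
markup shown in the comment are assumptions made for illustration, and
the colors and sizes are arbitrary.

```css
/* Hypothetical markup, shown here as a comment:
   <a href="...">WCAG 2.0</a><span class="tip">Web Content Accessibility
   Guidelines (WCAG) 2.0</span> */
.tip {
  display: none;            /* hidden until the link is hovered or focused */
}
a:hover + .tip,
a:focus + .tip {
  display: block;
  position: absolute;       /* out of the flow, so surrounding text stays put */
  max-width: 20em;
  padding: 0.25em 0.5em;
  border: 1px solid #666;
  background: #ffc;
  color: #000;
  font-size: 100%;          /* the size is under the author's control */
}
```

The a:focus selector is included so that keyboard users can get the same
text; the tip disappears again as soon as the link loses hover or focus.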

> but its _function_ seems
> to be to pop up a tool tip in graphical browsers,

That's one of its functions. It may also matter to search engines, and
speech browsers may give the user optional access to it. But although
these are useful features, the most important feature is the tooltip
effect, which is not that good an idea. (And if you use a "CSS tooltip",
you probably should not use the title attribute, since the effect would
be annoying.)

> and I like the idea
> of showing how old a link is.

I would say that the date would be a suitable part of an advisory title
but not an advisory title as such. When displayed as a title tooltip or
in a CSS tooltip, a mere date or a date preceded by e.g. "Last verified"
might look a bit odd.

> Is there a better way to show when an external link was last
> verified, or is that not even worth doing?

Well, it could be displayed as part of the information in a CSS tooltip.

But I would not do it. No matter how you formulate the explanation of
the date, people will often take it as indicating the last update time
of the linked document.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

tlvp

Dec 24, 2012, 4:09:08 PM

On Mon, 24 Dec 2012 08:44:58 -0500, Stan Brown wrote:

> ... I'm using the title="..." attribute in the

> <a href> tag to show the last date I accessed an external link. I
> confess I've never understood the _purpose_ of title="...", since the
> link text in <a>...</a> serves that purpose, but its _function_ seems
> to be to pop up a tool tip in graphical browsers, and I like the idea
> of showing how old a link is.
>
> Does title="..." serve any useful purpose? Am I grossly misusing it?
> Is there a better way to show when an external link was last
> verified, or is that not even worth doing?

Since you seem to be asking not for normative recommendations but for ideas
how to put the title="..." attribute to good use in an A href="..." tag,
I'll tell you what all I've occasionally used as values for it:

. ISO file-date of item being linked to (perhaps prefixed with a "New!");
. likely loading time of item being linked to;
. width and height of (graphical) item being linked to;
. file-size of item being linked to;
. rough indication of content of item being linked to;
. language used in text-item being linked to;
. full title of numbered adventure-episode being linked to.

Your "Link last checked on ..." seems like a valid use of title="..." ; and
I'm sure title= values can be put to many other good uses as well.

Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.

Stan Brown

Dec 24, 2012, 6:44:07 PM

On Mon, 24 Dec 2012 16:49:23 +0200, Jukka K. Korpela wrote:
> a[href ^= "http"]:after {
> content: "[URL: " attr(href) attr(title) "]";
> }
>
> The selector [href ^= "http"] matches an element with an href attribute
> that has a value starting with "http".
>

Thanks for responding to my follow-ups, Jukka. I want to read your
whole article with more care, but I have an initial question: how
good is browser support for this ^= construct?

(I suppose it's not terribly critical if a browser ignores it, since
all that means is that printed copies won't get the URL, but still
I'd like to have some idea how well supported it is.)

Jukka K. Korpela

Dec 25, 2012, 4:29:42 AM

2012-12-25 1:44, Stan Brown wrote:

> how good is browser support for this ^= construct?

Pretty good: virtually all browsers except IE 6 (which still has 6.0%
share worldwide, mostly due to continued use in Asia, especially in
China). According to
http://reference.sitepoint.com/css/css3attributeselectors
support is "buggy" in IE 7+. However, the problems are mainly in IE 7,
and the more serious support issue is that IE 7 does not support
generated content.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Stan Brown

Dec 25, 2012, 5:55:30 PM

On Mon, 24 Dec 2012 16:49:23 +0200, Jukka K. Korpela wrote:
>

> It's possible to distinguish between internal and external links using
> classes, of course, but you can avoid the need for extra classes if it
> is ok to distinguish between relative and absolute URLs in links.
> Usually internal links are relative, external aren't. This means you
> could use something like the following:
>
> a[href]:after {
> content: "[URL: http://www.example.com/" attr(href) "]";
> padding:0 0.25em;
> }
> a[href ^= "http"]:after {
> content: "[URL: " attr(href) attr(title) "]";
> }

Yes, all my external links are absolute and all my internal ones are
relative. But I think I have a further problem that does force me to
use extra classes:

The root of "my" site is http://www.tc3.edu/instruct/sbrown/ and the
HTML files are in subdirectories under that, such as
http://www.tc3.edu/instruct/sbrown/stat/ and
http://www.tc3.edu/instruct/sbrown/ti83/ .
(Those two account for all or nearly all of the pages where I'd want
the showURL feature for internal links.)

If I have a showURL-type link from a page in stat to another page in
stat, it will be like <a class="showURL" href="gof_ht.htm"> but the
generated content needs to include "stat/" and not just the root URL.
Maybe it's a failure of imagination, but I can't think of a way to
write this such that the printed URL and the href="..." are both
correct.

What I've done is visible at
http://www.tc3.edu/instruct/sbrown/tc3.css
with one within-directory link just below
http://www.tc3.edu/instruct/sbrown/stat/twoway.htm#Summary
and one between-directory link just below
http://www.tc3.edu/instruct/sbrown/stat/twoway.htm#TW_HT34

I've created a "statURL" class that knows to add the home URL plus
"stat/". When I start doing this in the "ti83" documents I'll create
another "ti83URL" class that knows to add the home URL plus "ti83/".

Again, if I'm missing something and the extra classes truly aren't
necessary, please let me know. But if they _are_ necessary, I'm not
unduly concerned because it's only a couple of extra lines in the
CSS.
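
Per-directory classes of the kind described above might look roughly
like this. It is a sketch, not a copy of the rules in tc3.css; the
bracketed output format is assumed from the earlier examples in this
thread.

```css
@media print {
  /* Links between pages inside stat/ */
  a.statURL:after {
    content: " [URL: http://www.tc3.edu/instruct/sbrown/stat/" attr(href) "]";
  }
  /* Links between pages inside ti83/ */
  a.ti83URL:after {
    content: " [URL: http://www.tc3.edu/instruct/sbrown/ti83/" attr(href) "]";
  }
}
```

With href="gof_ht.htm" the first rule prints
http://www.tc3.edu/instruct/sbrown/stat/gof_ht.htm. With
href="../ti83/math200a.htm" it prints the valid but clumsy
http://www.tc3.edu/instruct/sbrown/stat/../ti83/math200a.htm, because
the attribute value is appended as a plain string.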

> The selector [href ^= "http"] matches an element with an href attribute
> that has a value starting with "http".

Thanks for your answer about browser support. I think this may be a
moot point because of my directory structure, but it did cause me to
look at CSS3 selectors for the first time. There's a lot there!

> This is a bit simplistic, and it postulates that no relative link starts
> with "http". (I'm using "http" and not "http://" to cover "https://",
> too.) More seriously, it fails for URLs like "//www.example.com",
> "/foo/bar", and "../foo", because the URLs won't be resolved in CSS -
> they are just treated as strings. And naturally e.g. "ftp://" links
> would be incorrectly treated as relative in this setup.

All noted. I don't have any pages whose internal links would look
like any of the patterns you mention, and I never omit the http[s]://
from external links. I don't think I have any external ftp links,
but I'll bear that restriction in mind.

> Regarding text-decoration: underline, I don't see why you would
> underline a URL that is clearly indicated as a URL

You're right about this. I originally had rather different content,
and I modified it piece by piece without really stepping back and
looking at it overall. I've removed the underline and also the
padding.

> I might also consider adding word-break: break-all to allow free
> line breaking within the URL, but that's debatable. Long URLs might
> otherwise cause messy appearance, but on the other hand, URLs
> shouldn't really be *freely* wrapped even when inside wrappers.

That's my feeling too.

> > Follow-up question: I'm using the title="..." attribute in the
> > <a href> tag to show the last date I accessed an external link.
>

> But I would not do it. No matter how you formulate the explanation
> of the date, people will often take it as indicating the last
> update time of the linked document.

Thanks for the advice. I will have to think on this a bit more. The
standard bibliographic citation of Web pages that we follow includes
"accessed" and the date, not only because URLs tend to disappear but
because there's no way to know when many Web pages were updated.

I think people understand "accessed", but I understand you're saying
they don't. I'll give it some more thought. One nice thing for me is
that I can do say

grep showURL | grep -v showURL.*2012

to find all the links I haven't verified this year. I try to prune
dead links fairly regularly. But maybe there's a way I can save this
information without putting it in title="..."; an HTML comment,
for instance.

dorayme

Dec 25, 2012, 7:45:37 PM

In article <MPG.2b4226608...@news.individual.net>,

Stan Brown <the_sta...@fastmail.fm> wrote:

> On Mon, 24 Dec 2012 12:23:08 +1100, dorayme wrote:
> >
> > In article <MPG.2b4168301...@news.individual.net>,
> > Stan Brown <the_sta...@fastmail.fm> wrote:
> > > @media print {
> > > a.showURL { text-decoration:none }
> > > a.showURL:after { content: "[URL: " attr(href) " " attr(title) "]";
> > > padding:0 0.25em; text-decoration:underline }
> > > }
> > >
> > > Is there a way to display the full URL of the target through CSS,
> > > or do I need to give up using relative links in those cases?
> >
> > If you could settle for a base from which all links go, you could try
> > adding:
> >
> > a.showURL:after { content: "[URL: "

> > "http://stanbrownserver.com/"attr(href) attr(title) "]";

> > padding:0 0.25em; text-decoration:underline }
> > }
>
> Thanks to you and Jukka for your speedy responses.
>
> I think this has promise. It would mean having two show-URL-type
> styles, one for external links (my original) and one for internal
> (your suggestion), but that doesn't seem like much of a burden. And
> in fact on internal links I don't really want the title attribute
> anyway; see below. Thanks!
>

You could use the following plan, which uses CSS not to generate text
but to suppress it. Write the address into every link, but let it
show up only on a printed page, not on a normal visual browser page
that has CSS working.

Suppose you wish to link to another page on your site when directing
someone on your home page to a page with your personal contact
details. You might normally have something like:

See <a href="contact.html">contact me</a>, etc

and want for the webpage on a screen to look like just

See "contact me" (underlined and in blue - plainly a link and nothing
out of the ordinary), etc

But, in case it is printed, you complicate it on your html page to:

See <a href="contact.html"><span class="foronlinepageonly">contact
details</span><span class="forprintedpageonly">
stanbrownserver.com/contact.html</span></a>, feel free to get in touch
...

And have, in the CSS for the screen view:

a .forprintedpageonly {display: none}

and, in the CSS for the print view:

a .foronlinepageonly {display: none}

Vary or simplify to suit your needs, if you word things differently
you might not need a second class for the print styling.
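
If both views are served from a single stylesheet, the same plan could
be sketched with a media block. The class names are the ones used above;
putting everything in one file is an assumption.

```css
/* Default (screen and anything else): hide the print-only text. */
a .forprintedpageonly { display: none; }

@media print {
  /* On paper, swap the two spans. */
  a .foronlinepageonly { display: none; }
  a .forprintedpageonly { display: inline; }
}
```

The print rules come later and have the same specificity as the default
rule, so they take over when the page is printed.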

> Follow-up question: I'm using the title="..." attribute in the
> <a href> tag to show the last date I accessed an external link. I
> confess I've never understood the _purpose_ of title="...", since the
> link text in <a>...</a> serves that purpose, but its _function_ seems
> to be to pop up a tool tip in graphical browsers, and I like the idea
> of showing how old a link is.
>
> Does title="..." serve any useful purpose? Am I grossly misusing it?
> Is there a better way to show when an external link was last
> verified, or is that not even worth doing?

About title="" in links, I have used them as additional help to
explain the nature of the destination (the permanent link text might
be too long winded and ugly otherwise). JK has listed the advantages
and disadvantages of this and some modern improvements to this concept.

--
dorayme

Martin Leese

Dec 25, 2012, 10:16:25 PM

Stan Brown wrote:
...

> I think people understand "accessed", but I understand you're saying
> they don't. I'll give it some more thought.

Wikipedia uses "accessdate" in its citation
templates. You might want to use that or
"access date".

--
Regards,
Martin Leese
E-mail: ple...@see.Web.for.e-mail.INVALID
Web: http://members.tripod.com/martin_leese/

Evertjan.

Dec 26, 2012, 4:40:05 AM

Stan Brown wrote on 25 dec 2012 in
comp.infosystems.www.authoring.stylesheets:

> If I have a showURL-type link from a page in stat to another page in
> stat, it will be like <a class="showURL" href="gof_ht.htm"> but the
> generated content needs to include "stat/" and not just the root URL.
> Maybe it's a failure of imagination, but I can't think of a way to
> write this such that the printed URL and the href="..." are both
> correct.

That's where serverside scripting comes in.
You can just script into the css-code.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

Martin Leese

Dec 26, 2012, 1:09:47 PM

Martin Leese wrote:
> Stan Brown wrote:
> ...
>> I think people understand "accessed", but I understand you're saying
>> they don't. I'll give it some more thought.
>
> Wikipedia uses "accessdate" in its citation
> templates. You might want to use that or
> "access date".

Even better, use "Retrieved", which is what
Wikipedia citation templates actually
display to the reader.

Ed Mullen

Dec 26, 2012, 1:42:21 PM

Stan Brown wrote:

>
> Thanks for the advice. I will have to think on this a bit more. The
> standard bibliographic citation of Web pages that we follow includes
> "accessed" and the date, not only because URLs tend to disappear but
> because there's no way to know when many Web pages were updated.
>
> I think people understand "accessed", but I understand you're saying
> they don't. I'll give it some more thought. One nice thing for me is
> that I can do say
>
> grep showURL | grep -v showURL.*2012
>
> to find all the links I haven't verified this year. I try to prune
> dead links fairly regularly. But maybe there's a way I can save this
> information without putting it in title="..."; an HTML comment,
> for instance.
>

Stan, are you manually verifying links? If so you might want to try
Xenu's Link Sleuth:

http://home.snafu.de/tilman/xenulink.html

I've been using it for years to verify links on the sites I manage and
it works very well. Good reporting features as well.

--
Ed Mullen
http://edmullen.net/
"Who the hell wants to hear actors talk?" - H.M. Warner, Warner
Brothers, 1927.

Martin Leese

Dec 26, 2012, 7:24:32 PM

Ed Mullen wrote:

> Stan, are you manually verifying links? If so you might want to try
> Xenu's Link Sleuth:
>
> http://home.snafu.de/tilman/xenulink.html
>
> I've been using it for years to verify links on the sites I manage and
> it works very well. Good reporting features as well.

Also Link Checker from W3:
http://validator.w3.org/checklink

I have used Xenu's Link Sleuth but, for me,
it timed out on a number of valid links. I
guess my ISP is not that well connected.
The Link Checker runs on the W3 server, and
that seems well connected.

Stan Brown

Dec 26, 2012, 8:26:36 PM

On Wed, 26 Dec 2012 17:24:32 -0700, Martin Leese wrote:
>
> Ed Mullen wrote:
>
> > Stan, are you manually verifying links? If so you might want to try
> > Xenu's Link Sleuth:
> >
> > http://home.snafu.de/tilman/xenulink.html
> >
> > I've been using it for years to verify links on the sites I manage and
> > it works very well. Good reporting features as well.
>
> Also Link Checker from W3:
> http://validator.w3.org/checklink
>
> I have used Xenu's Link Sleuth but, for me,
> it timed out on a number of valid links. I
> guess my ISP is not that well connected.
> The Link Checker runs on the W3 server, and
> that seems well connected.

Thanks for both suggestions.

I do verify external links manually, as it happens. (Internal links
are verified by a couple of AWK scripts that I wrote. I use the same
scripts to verify a few private collections that are never intended
for a public server.)

I actually had Xenu on my previous computer, but I don't think I used
it. I'll give one or both of these a try.

Stan Brown

Dec 26, 2012, 8:33:57 PM

On Wed, 26 Dec 2012 11:45:37 +1100, dorayme wrote:
> You could have the following plan, using CSS but not to generate text,
> rather to suppress it. Style all links to include the address but it
> only showing up in a printed page, not a normal visual browser page
> that has CSS working.
>

I hadn't thought of doing this, but actually it has potential.
Without going into a big long explanation, I'm already generating
links of the type we talk about via a macro that inserts the
appropriate text into the HTML while I'm building the page, so this
bit of extra wouldn't be a problem.

It would also get around one problem I have with JK's scheme.
Namely, some links when printed look like
http://www.tc3.edu/instruct/sbrown/stat/../ti83/math200a.htm
which of course is valid but unattractive. My macro at page-
generation time can easily tidy that up.

BootNic

Dec 27, 2012, 3:31:02 AM

In article <MPG.2b456f948...@news.individual.net>, Stan Brown
<the_sta...@fastmail.fm> wrote:

[snip]

> It would also get around one problem I have with JK's scheme. Namely,
> some links when printed look like
> http://www.tc3.edu/instruct/sbrown/stat/../ti83/math200a.htm which of
> course is valid but unattractive. My macro at page- generation time can
> easily tidy that up.

From a printed page, someone would have to type that. Would be much easier to
type: http://www.tc3.edu/instruct/sbrown/ti83/math200a.htm

Not sure why you are using a relative path instead of an absolute path.

url: http://www.tc3.edu/instruct/sbrown/stat/../ti83/math200a.htm
relative path: ../ti83/math200a.htm
absolute path: /instruct/sbrown/ti83/math200a.htm

If you were to convert your links to an absolute path the css could be something
like:

a[href].showURL:after {
content:" "attr(href);
}
a[href^='/'].showURL:after {
content:" http://example.org"attr(href);
}

Javascript solution? Extremely simple but limited to users with javascript
enabled. Convert all links to absolute url.

Server side convert relative path to absolute path or absolute url.

--
BootNic Thu Dec 27, 2012 03:30 am
The human mind treats a new idea the same way the body treats a strange
protein; it rejects it.
*P. B. Medawar*

Stan Brown

Jan 1, 2013, 8:53:22 PM

On Wed, 26 Dec 2012 17:24:32 -0700, Martin Leese wrote:
>

> Ed Mullen wrote:
>
> > Stan, are you manually verifying links? If so you might want to try
> > Xenu's Link Sleuth:
> >
> > http://home.snafu.de/tilman/xenulink.html
> >
> > I've been using it for years to verify links on the sites I manage and
> > it works very well. Good reporting features as well.
>
> Also Link Checker from W3:
> http://validator.w3.org/checklink
>
> I have used Xenu's Link Sleuth but, for me,
> it timed out on a number of valid links. I
> guess my ISP is not that well connected.
> The Link Checker runs on the W3 server, and
> that seems well connected.

Thanks again to you both for the suggestions. I tried the validator,
but found the output was not as well set up for usability as it might
be. Also the limit of 150 pages was a problem. Xenu did a better
job on both counts for me.

But both of them shared a problem: when I give a URL like
http://www.tc3.edu/instruct/sbrown/ there seems to be no setting that
will cause them to examine pages under that URL, including verifying
external links, *without* going into the external pages and
complaining about bad links included in those external pages. Xenu
has a check box, and I tried both settings.

So I think both are better than doing it strictly manually, but both
have problems.

Osmo Saarikumpu

Jan 3, 2013, 3:39:39 AM

On 2.1.2013 3:53, Stan Brown wrote:

[About Xenu and W3C Link Checker]

> But both of them shared a problem: when I give a URL like
> http://www.tc3.edu/instruct/sbrown/ there seems to be no setting that
> will cause them to examine pages under that URL, including verifying
> external links, *without* going into the external pages and
> complaining about bad links included in those external pages. Xenu
> has a check box, and I tried both settings.

My guess is that you are misreading the results. For example, this page:

http://www.tc3.edu/instruct/sbrown/calc/

has three broken links, as reported by both checkers:

http://mat.gsia.cmu.edu/QUANT/NOTES/chap4/node3.html
http://mat.gsia.cmu.edu/QUANT/NOTES/chap4/node4.html
http://mat.gsia.cmu.edu/QUANT/NOTES/chap4/node5.html

Perhaps you assumed that the checkers were reporting that those pages at
mat.gsia.cmu.edu had broken links, instead of referring to your page
with the three broken links?

Follow up set to: comp.infosystems.www.authoring.misc

--
Best wishes, Osmo

Ed Mullen

Jan 3, 2013, 1:41:00 PM

Stan, you might try changing the "Maximum depth" setting in Xenu. Click
the Check URL (first button on left of toolbar). Click the More Options
button.

When I run your URL with Maximum depth set to 2 it checks 323 URLs and
returns only 3 "no connection" and 3 "not found".

Also, if you need more info there is a Yahoo Group for Xenu.

http://groups.yahoo.com/group/xenu-usergroup/

--
Ed Mullen
http://edmullen.net/

Ever wonder what the speed of lightning would be if it didn't zigzag?

Dr J R Stockton

Jan 3, 2013, 2:02:18 PM

In comp.infosystems.www.authoring.stylesheets message <MPG.2b4d5d20e6bc9
f639...@news.individual.net>, Tue, 1 Jan 2013 20:53:22, Stan Brown
<the_sta...@fastmail.fm> posted:

>On Wed, 26 Dec 2012 17:24:32 -0700, Martin Leese wrote:

>>
>> Also Link Checker from W3:
>> http://validator.w3.org/checklink
>>
>> I have used Xenu's Link Sleuth but, for me,
>> it timed out on a number of valid links. I
>> guess my ISP is not that well connected.
>> The Link Checker runs on the W3 server, and
>> that seems well connected.
>
>Thanks again to you both for the suggestions. I tried the validator,
>but found the output was not as well set up for usability as it might
>be. Also the limit of 150 pages was a problem. Xenu did a better
>job on both counts for me.
>
>But both of them shared a problem: when I give a URL like
>http://www.tc3.edu/instruct/sbrown/ there seems to be no setting that
>will cause them to examine pages under that URL, including verifying
>external links, *without* going into the external pages and
>complaining about bad links included in those external pages. Xenu
>has a check box, and I tried both settings.
>
>So I think both are better than doing it strictly manually, but both
>have problems.

I have a checker which tests internal links and anchors in the master
copy of a Web site, held locally :
<http://www.merlyn.demon.co.uk/linxchek.htm>; I use it a lot.

Clearly that is not what you want; but you might be able to adapt some
of its methods.

A queue "to be scanned" is initialised with the initial URL, and the
queue is scanned until empty.

The queued URL is read into an iframe, which makes the browser's HTML
engine parse it; the checker then scans the DOM tree as a tree of Objects
for links and anchors and copies what is needed to represent only the
parts of the DOM tree that are of interest. Any URL
linked to is queued (if not already queued) for future scanning. When
it is reached in the queue, its existence or otherwise is detected.
Anything else of interest is found, from the stored data, after the
queue is empty.
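
As a rough illustration of the queue described above (not the actual LINXCHEK code), here is a minimal sketch. It assumes the links of each page are already known and held in a Map; the names `site` and `checkLinks` and the sample file names are made up for the example. A real checker would get the links by loading each page and walking its DOM.

```javascript
// Hypothetical in-memory site: page URL -> hrefs found on that page.
const site = new Map([
  ["index.htm", ["a.htm", "b.htm#top"]],
  ["a.htm", ["index.htm", "missing.htm"]],
  ["b.htm", ["a.htm"]],
]);

function checkLinks(pages, startUrl) {
  const queue = [startUrl];            // URLs waiting to be scanned
  const queued = new Set([startUrl]);  // every URL ever queued, to avoid repeats
  const scanned = [];                  // pages that existed and were scanned
  const missing = [];                  // URLs that were linked to but do not exist

  while (queue.length > 0) {
    const url = queue.shift();
    if (!pages.has(url)) {
      // Existence is only detected when the URL is reached in the queue.
      missing.push(url);
      continue;
    }
    scanned.push(url);
    for (const href of pages.get(url)) {
      const target = href.split("#")[0]; // drop the fragment before queueing
      if (target !== "" && !queued.has(target)) {
        queued.add(target);
        queue.push(target);
      }
    }
  }
  return { scanned, missing };
}

const result = checkLinks(site, "index.htm");
console.log(result.missing);
```

Checking that each "#anchor" exists in its target page would be done from the stored data after the queue is empty; that step is left out here.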

That runs in an ordinary browser, at a great rate - 40s for my whole
site.

Now you would have the cross-domain problem, but that MIGHT not apply if
you were running the page as an HTA in MSIE (caveat - MSIE versions?
maybe not IE10).

Alternatively, ISTM possible that the above could be done in non-GUI
JScript running in Windows Scripting Host, or Powershell, ... ???

Feel free to test LINXCHEK on the master copy of the Oak Road site -
adaptation for UNIX might be needed, but the instructions have something
to say about that.

--
(c) John Stockton, nr London UK Reply address via Home Page.
news:comp.lang.javascript FAQ <http://www.jibbering.com/faq/index.html>.
<http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

Michael Stemper

Jan 15, 2013, 1:45:57 PM

In article <MPG.2b4168301...@news.individual.net>, Stan Brown <the_sta...@fastmail.fm> writes:

>Hi, folks! I'm trying to show the actual URLs of selected links when
>someone prints a page.

If you don't mind a nosy question, what's the use case for this? I'm
trying to figure out why somebody would want to see URLs in hard copy.
Do people really retype stuff from paper into a browser?

--
Michael F. Stemper
#include <Standard_Disclaimer>
Visualize whirled peas!

David Stone

Jan 15, 2013, 2:22:56 PM

In article <kd4855$28n$2...@dont-email.me>,

mste...@walkabout.empros.com (Michael Stemper) wrote:

> In article <MPG.2b4168301...@news.individual.net>, Stan Brown
> <the_sta...@fastmail.fm> writes:
>
> >Hi, folks! I'm trying to show the actual URLs of selected links when
> >someone prints a page.
>
> If you don't mind a nosy question, what's the use case for this? I'm
> trying to figure out why somebody would want to see URLs in hard copy.
> Do people really retype stuff from paper into a browser?

If the link text is not actually the URL. Example:

<div>Please consider
<a href="http://www.example.com/subscribe">subscribing to
our mailing list</a> to keep informed about HTML and CSS
issues.</div>

dorayme

Jan 15, 2013, 5:09:27 PM

In article <kd4855$28n$2...@dont-email.me>,
mste...@walkabout.empros.com (Michael Stemper) wrote:

> I'm
> trying to figure out why somebody would want to see URLs in hard copy.
> Do people really retype stuff from paper into a browser?

Yes, sometimes they do. Or at least, I assume they do now and then
(not that I am quite a person).

--
dorayme

tlvp

Jan 16, 2013, 1:15:42 AM

On Tue, 15 Jan 2013 18:45:57 +0000 (UTC), Michael Stemper wrote:

> Do people really retype stuff from paper into a browser?

Yup. Cheers, -- tlvp

Jukka K. Korpela

Jan 16, 2013, 1:43:41 AM

2013-01-16 8:15, tlvp wrote:

> On Tue, 15 Jan 2013 18:45:57 +0000 (UTC), Michael Stemper wrote:
>
>> Do people really retype stuff from paper into a browser?
>
> Yup. Cheers, -- tlvp

Well, yes we do... but in writing books, I have adopted the googlability
principle: I include just the title of a page or site I refer to, if
this lets the user access it more conveniently than using the URL. This
requires that the title is unique enough, so that typing it in Google
probably yields the right page as the first hit, or among the first few
at least.

A page that I recently referred to when writing a manuscript is
http://www.khronos.org/registry/webgl/specs/latest/
but I include just its title "WebGL Specification". These words, even
without the quotes, give the right page as the first hit in Google.

Previously, I sometimes included the URL after a link when authoring web
pages, using CSS to hide the URL except for print media. With generated
content, this would now be essentially easier to do. But by the
googlability principle, I don't usually do that.

In fact, the title of a document is often more permanent than its URL.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Stan Brown

Jan 16, 2013, 9:13:37 AM

On Tue, 15 Jan 2013 18:45:57 +0000 (UTC), Michael Stemper wrote:
>

> In article <MPG.2b4168301...@news.individual.net>, Stan Brown <the_sta...@fastmail.fm> writes:
>
> >Hi, folks! I'm trying to show the actual URLs of selected links when
> >someone prints a page.
>
> If you don't mind a nosy question, what's the use case for this? I'm
> trying to figure out why somebody would want to see URLs in hard copy.
> Do people really retype stuff from paper into a browser?

They may, if they've printed it or if someone has printed it and
handed it to them.

Have you never printed a Web page and stuck it in a file till you
had time to read it, then wished you knew where the links went to?
Sure, you might be able to make a stab at them by Googling, but it
would be a lot more certain and less work if you just had the URLs in
front of you.

Stan Brown

Jan 16, 2013, 9:15:07 AM

On Wed, 16 Jan 2013 08:43:41 +0200, Jukka K. Korpela wrote:
> A page that I recently referred to when writing a manuscript is
> http://www.khronos.org/registry/webgl/specs/latest/
> but I include just its title "WebGL Specification". These words, even
> without the quotes, give the right page as the first hit in Google.
>

Today, they do. But you can't know what will happen in the future.
Newer pages may crowd out the one you actually referred to.

Jukka K. Korpela

Jan 16, 2013, 9:31:54 AM

2013-01-16 16:15, Stan Brown wrote:

> On Wed, 16 Jan 2013 08:43:41 +0200, Jukka K. Korpela wrote:
>> A page that I recently referred to when writing a manuscript is
>> http://www.khronos.org/registry/webgl/specs/latest/
>> but I include just its title "WebGL Specification". These words, even
>> without the quotes, give the right page as the first hit in Google.
>
> Today, they do. But you can't know what will happen in the future.

That's an inconvenience indeed.

> Newer pages may crowd out the one you actually referred to.

Indeed, but it is more probable that the URL changes. For a technical
specification with a distinctive name, for example, the title tends to
be more stable.

I'm not saying that printing URLs would always be pointless. But they
tend to mess up rather than help. It might be different if it were
possible, in practice, to use CSS to generate URLs as footnotes or
endnotes. It isn't yet, so if you think URLs are needed - as they might
be e.g. in a scholarly work, where citations need to be exact - I think
you should put them, as content, in an endnotes section. There you might
have just the title as a link, with the URL as generated content from
the href value in print only - or maybe even on screen.

There's a tricky problem with URLs: division into lines. Browsers behave
rather differently (see http://www.cs.tut.fi/~jkorpela/html/nobr.html ),
and the problems need to be addressed by controlling line breaks
somehow. You can't do this with generated content, since you need added
markup or special characters for the purpose. So this would boil down to
having URLs as content (possibly generated by a script, not as
generated content in CSS sense).
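
A minimal sketch of what such a script might do, under some assumptions: the function name `printableUrl` and the page address used as the base are made up for the example, and the zero-width space (U+200B) is just one possible "special character" for allowing line breaks. The function resolves a relative href to a full URL and then marks break opportunities after the slashes in the path.

```javascript
// Resolve a possibly relative href against the address of the page it
// appears on, then allow line breaks after single slashes in the result.
function printableUrl(href, pageUrl) {
  // The standard URL constructor resolves relative references.
  const absolute = new URL(href, pageUrl).href;
  // Insert U+200B after each "/" that is not part of "//" and is
  // followed by more text, so "http://" itself is never split.
  return absolute.replace(/(?<!\/)\/(?=[^\/])/g, "/\u200B");
}

// Assumed page address, for illustration only.
const shown = printableUrl(
  "../ti83dir/math200a.htm",
  "http://www.tc3.edu/instruct/sbrown/stat/index.htm"
);
// Remove the invisible characters again to display the plain URL.
console.log(shown.split("\u200B").join(""));
```

The result would be inserted into the page as ordinary text, shown in print only. One drawback is that anyone copying the text from the screen gets the invisible characters too, so it is probably safer to restrict this to the print style sheet.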

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Michael Stemper

Jan 16, 2013, 1:53:30 PM

In article <MPG.2b607f9d1...@news.individual.net>, Stan Brown <the_sta...@fastmail.fm> writes:
>On Tue, 15 Jan 2013 18:45:57 +0000 (UTC), Michael Stemper wrote:
>> In article <MPG.2b4168301...@news.individual.net>, Stan Brown <the_sta...@fastmail.fm> writes:

>> >Hi, folks! I'm trying to show the actual URLs of selected links when
>> >someone prints a page.
>>
>> If you don't mind a nosy question, what's the use case for this? I'm
>> trying to figure out why somebody would want to see URLs in hard copy.
>> Do people really retype stuff from paper into a browser?
>
>They may, if they've printed it or if someone has printed it and
>handed it to them.

Fortunately, nobody's ever printed a web page and handed it to me. If
they did, I'd politely (or possibly snippily) ask them to send me a
URL. My eyes are middle-aged, and when I look at something in a browser,
I can hit <CTRL>-<+> until it's readable. That's hard to do with paper.

>Have you never printed a Web page and stuck it in a file till you
>had time to read it,

No. I bookmark a page until I have time to read it.

I guess this is proof of the common adage that people's methods of
experiencing the Web vary all over the map. I always knew this, but
I didn't realize before now how wide the map is.

--
Michael F. Stemper
#include <Standard_Disclaimer>

"Writing about jazz is like dancing about architecture" - Thelonious Monk

tlvp

Jan 16, 2013, 11:14:44 PM

On Wed, 16 Jan 2013 01:15:42 -0500, tlvp answered Michael Stemper:

> On Tue, 15 Jan 2013 18:45:57 +0000 (UTC), Michael Stemper wrote:
>
>> Do people really retype stuff from paper into a browser?
>
> Yup. Cheers, -- tlvp

Perhaps that was too flippant. So let me illustrate. Just today I opened a
printed technical newsweekly to find, on the inside front cover, an HP ad
with the following suggestion:

> ... Learn more
> at convergedinfrastructure.com
> or scan the QR code.

Clearly HP expects some readers to retype that URL from paper :-) . And I'd
be very hesitant to suggest HP is unique in that regard.