Re: HTML and PDF for reading large documents (bookmarks)

dorayme

May 12, 2016, 6:58:36 AM

In article <HTML-2016...@ram.dialup.fu-berlin.de>,
r...@zedat.fu-berlin.de (Stefan Ram) wrote:
...

> So it seems that browser manufacturers do not think of
> people who want to use an HTML browser to read a large
> file in more than one session and thus need a way to
> later find where they left off in the previous session!

I am reminded of the convenience of a Kindle: after an interruption,
one is always returned to where one left off in the book. Even when
you change the device you are reading on (to an iPad with a Kindle
app, say), if you are on Wi-Fi you will be invited to go to where you
last read. All this is coordinated on remote servers.

Browsers are usually not used in the way you use them. They might
acquire this capability if there were more demand for it. Certainly an
*author*, independently of a browser manufacturer, could put in
facilities to make this easier for a reader, in all sorts of ways.

In the meantime, how about simply copying a phrase or sentence you
are up to? Next time you return to the *very long* document, you
search for that phrase or sentence. This is where you, not the browser
or the author, are in control.

--
dorayme

David E. Ross

May 12, 2016, 10:39:30 AM

On 5/12/2016 3:02 AM, Stefan Ram wrote:
> When HTML was new, I thought that I should prefer it over
> PS/PDF for my own internal usage, because HTML was »open« and
> text-file based, while PS/PDF was »closed« or at least it felt
> this way because it was binary-file based and came from a
> single company.
>
> But today I observe that I convert more and more larger
> documents into PDF for my own convenience!
>
> Why?
>
> A minor reason was that I observed that one browser was
> very slow when it had to display large HTML pages.
>
> But the major point is, when I read large documents,
> I want to take a note of some bookmark within the document
> to continue to read there after an interruption.
>
> When a document is split into pages, this is easy. I just
> write down the /page number/. (I do not want to rely on
> in-program bookmarking tools, because I might continue to
> read the same file with a different reader later.)
>
> But when I have a large HTML file, it is impossible for
> me to find any reference that I could write down as an
> indication of where exactly in this large document I left
> off so that I then can continue to read it at the same
> position later (possibly even with another browser).
>
> Actually this is not a deficiency of HTML, but of browsers.
> A browser simply could display that the first character
> fully displayed on the screen is at »position 23.3111841427 %«
> of the document (the percent indication should refer to the
> HTML source code so as to be independent of display settings
> and CSS). But browsers usually do /not/ do this.
>
> Actually HTML makes /more/ sense when one is reading on a
> screen, because an artificial pagination has no use on a
> screen, where one is scrolling the document. But a position
> indicator would still help.
>
> So it seems that browser manufacturers do not think of
> people who want to use an HTML browser to read a large
> file in more than one session and thus need a way to
> later find where they left off in the previous session!
>
> Newsgroups: comp.infosystems.www.authoring.html,comp.infosystems.www.browsers.misc
>

I often read very large Web pages. If I need to interrupt, I open a new
plain-text file (.txt on Windows). I copy the URI from my browser and
paste it into the file. Then I copy part of a line of text and paste
that into the file. I save the file, using the title of the Web page as
the file name. Then I can terminate my browser and shut down my PC.
The next day, I can return directly to where I left off.
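
The routine above can be sketched in Python roughly as follows. This is
only an illustration under assumptions: the function names, the note
layout (URI on the first line, phrase on the second) and the example
URL are made up here, and the page text is passed in as a string rather
than fetched from the Web.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def save_reading_note(note_path: Path, uri: str, phrase: str) -> None:
    """Write the page URI and a remembered phrase to a plain-text note."""
    note_path.write_text(f"{uri}\n{phrase}\n", encoding="utf-8")


def load_reading_note(note_path: Path) -> Tuple[str, str]:
    """Read back the (uri, phrase) pair stored by save_reading_note."""
    uri, phrase = note_path.read_text(encoding="utf-8").splitlines()[:2]
    return uri, phrase


def locate_phrase(document_text: str, phrase: str) -> Optional[float]:
    """Return how far into the text the phrase starts, as a percentage.

    Returns None when the phrase is not found, for example because the
    page has changed since the note was taken.
    """
    offset = document_text.find(phrase)
    if offset < 0 or not document_text:
        return None
    return 100.0 * offset / len(document_text)


if __name__ == "__main__":
    # Hypothetical page text standing in for a very large Web page.
    page_text = "Intro. " + "filler " * 50 + "a memorable phrase" + " more" * 50
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "Example page.txt"
        save_reading_note(note, "http://www.example.com/long.html",
                          "a memorable phrase")
        uri, phrase = load_reading_note(note)
        print(uri, locate_phrase(page_text, phrase))
```

The percentage is computed against whatever text is passed in, so it is
only comparable between sessions if the same text (for instance the
HTML source) is used each time.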

--
David E. Ross
<http://www.rossde.com/>.

Donald Trump claims everyone likes him. Does that
include his ex-wives? How about the students who
discovered that their education at Trump University
was worthless?

Jukka K. Korpela

May 12, 2016, 3:27:30 PM

12.5.2016, 13:02, Stefan Ram wrote:

> When HTML was new, I thought that I should prefer it over
> PS/PDF for my own internal usage, because HTML was »open« and
> text-file based, while PS/PDF was »closed«

If you are referring to documents for your private use, or for internal
use in a company, then this is, strictly speaking, off-topic in all
comp.infosystems.www groups. However, over the years, non-WWW use of WWW
technologies has been discussed in these groups, though care should be
taken to distinguish WWW use from non-WWW use when relevant.

> But today I observe that I convert more and more larger
> documents into PDF for my own convenience!

Convert what type(s) of documents?

> A minor reason was that I observed that one browser was
> very slow when it had to display large HTML pages.

It depends. If you use very old-style table layout for a large document,
then you have that problem. For a simple-structure HTML document, there
is no reason a browser cannot start rendering its content fast. Setting
dimensions for images may speed things up.

> But the major point is, when I read large documents,
> I want to take a note of some bookmark within the document
> to continue to read there after an interruption.

This is a user agent issue. There is nothing in HTML that requires or
forbids such behavior.

> Actually this is not a deficiency of HTML, but of browsers.

Indeed.

On the practical side, you might consider making your document (if you
convert it anyway, from some format) an e-book in the EPUB format, which
is really just a zipped file containing an XHTML document and associated
files. This is often very easy when using the free Calibre software.
E-book readers typically have features like “remembering” your location,
or at least marking a location.
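
To illustrate the "just a zipped file" point, here is a minimal Python
sketch that writes a tiny EPUB 3 container by hand. It is an assumption-
laden illustration, not what Calibre does internally: the file names
(OEBPS/content.opf, nav.xhtml, chapter1.xhtml), the identifier and the
title are placeholders, and the result has not been checked with an
EPUB validator.

```python
import os
import tempfile
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, list of files (manifest), reading order (spine).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example notes</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2016-05-12T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <h1>Contents</h1>
    <ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol>
  </nav>
</body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body><h1 id="ch1">Chapter 1</h1><p>Placeholder text.</p></body>
</html>
"""


def write_minimal_epub(path):
    """Write a small EPUB-style zip archive to the given path."""
    with zipfile.ZipFile(path, "w") as epub:
        # The mimetype entry comes first and is stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for name, text in (
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/nav.xhtml", NAV_XHTML),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
        ):
            epub.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.epub")
        write_minimal_epub(target)
        with zipfile.ZipFile(target) as epub:
            print(epub.namelist())
```

In practice a tool such as Calibre takes care of these files, which is
why the conversion is usually easy.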

I am currently working on a document that might eventually become an
e-book distributed commercially, or a free e-book, or just a web page.
Using Calibre and the EPUB format keeps all options open. (Although
there is no benefit from using XHTML syntax for web pages, there are no
real drawbacks either. For the EPUB format, XHTML syntax must be used,
but that’s almost a triviality when using suitable software.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Stan Brown

May 13, 2016, 6:25:20 AM

On 12 May 2016 10:02:40 GMT, Stefan Ram wrote:
> But when I have a large HTML file, it is impossible for
> me to find any reference that I could write down as an
> indication of where exactly in this large document I left
> off so that I then can continue to read it at the same
> position later (possibly even with another browser).
>
> Actually this is not a deficiency of HTML, but of browsers.

You're straining to find a problem, and overlooking at least two
solutions (often three) already available to you.

1. Use a word or phrase that's unique or nearly so, then when you
return to the document press Ctrl+F to find the word or phrase.

2. Or just notice about what percentage of the document you've
scrolled down.

3. Authors should put id= attributes on all section headings, for
this purpose among others. In Firefox, you can highlight an element,
then right-click and choose Inspect Element to see its markup. I assume
most other browsers have something similar. This isn't guaranteed to
work, but it's the easiest way when the author has done his or her job,
because you can include the value of the id attribute right in your
bookmark, as a #fragment at the end of the URL.
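
As a rough sketch of the third option, the following Python uses the
standard html.parser module to collect the id attributes of headings
and turn them into bookmarkable URLs. The class and function names and
the example URL are invented for illustration; a real page may put its
ids on other elements, which this sketch ignores.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingIdCollector(HTMLParser):
    """Collect the id attribute of every heading element that has one."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag in HEADING_TAGS:
            for name, value in attrs:
                if name == "id" and value:
                    self.ids.append(value)


def fragment_urls(page_url, html_source):
    """Return one URL per heading id, in document order."""
    collector = HeadingIdCollector()
    collector.feed(html_source)
    collector.close()
    base = page_url.split("#", 1)[0]  # drop any existing fragment
    return [f"{base}#{heading_id}" for heading_id in collector.ids]


if __name__ == "__main__":
    sample = '<h1 id="intro">Intro</h1><h2 id="part-2">Part 2</h2><h3>No id</h3>'
    for url in fragment_urls("http://www.example.com/long.html", sample):
        print(url)
```

Any of the resulting URLs can be stored as an ordinary bookmark, and it
works in any browser, not only the one used to create it.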

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://BrownMath.com/
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont

Thomas 'PointedEars' Lahn

May 13, 2016, 6:46:07 PM

Stefan Ram wrote:

> I am offering my course notes on web pages. A course
> page has links to dozens of lesson pages. But that's it.
> No deeper nesting is involved. There are no sublesson pages.
>
> The participants of my courses often are annoyed by this,
> slightly annoyed, that is. They would prefer to have it all
> within a single large file and sometimes manually copy each
> lesson into a word document. I can understand them. I would love
> to offer the course notes as a single file myself, but I did not
> yet have the time to prepare this, because - of course - I do
> not want to do this /once/, but I need to establish an /automatic
> process/ that can generate all versions of a document (multi-page
> and single-page) from a single source, so that changes to the
> single source propagate to all document versions without manual
> interaction.

You can use XML for the content, and XSLT to extract from it and present
that in any format that you like.
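
A small sketch of the single-source idea, with one caveat stated up
front: Python's standard library has no XSLT processor, so this example
swaps in xml.etree.ElementTree to do by hand what an XSLT stylesheet
would do. The element names (course, lesson, para) and all function
names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source for the course notes.
COURSE_XML = """<course title="Example course">
  <lesson id="lesson1" title="First lesson"><para>Text of lesson one.</para></lesson>
  <lesson id="lesson2" title="Second lesson"><para>Text of lesson two.</para></lesson>
</course>"""


def render_lesson(lesson):
    """Render one lesson element as an HTML fragment."""
    lesson_id = escape(lesson.get("id", ""), quote=True)
    title = escape(lesson.get("title", ""))
    paragraphs = "".join(
        f"<p>{escape(para.text or '')}</p>" for para in lesson.findall("para")
    )
    return f'<h2 id="{lesson_id}">{title}</h2>{paragraphs}'


def wrap_page(title, body):
    """Wrap an HTML fragment in a minimal page skeleton."""
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{escape(title)}</title></head><body>{body}</body></html>"
    )


def build_single_page(course):
    """All lessons in one large HTML document."""
    body = "".join(render_lesson(lesson) for lesson in course.findall("lesson"))
    return wrap_page(course.get("title", ""), body)


def build_multi_page(course):
    """One HTML document per lesson, keyed by output file name."""
    return {
        f"{lesson.get('id')}.html": wrap_page(lesson.get("title", ""),
                                              render_lesson(lesson))
        for lesson in course.findall("lesson")
    }


if __name__ == "__main__":
    course = ET.fromstring(COURSE_XML)
    print(len(build_single_page(course)), sorted(build_multi_page(course)))
```

Both outputs come from the same XML, so a change to the source shows up
in the single-page and multi-page versions on the next run.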

F'up2 <news:comp.infosystems.www.authoring.misc>

--
When all you know is jQuery, every problem looks $(olvable).

Jukka K. Korpela

May 16, 2016, 12:58:51 AM

12.5.2016, 23:08, Stefan Ram wrote:

> For example, someone has published 10 HTML documents,
> each of which would have about 10 pages when printed,
> and I want to read those with a handheld device that
> I use like an ebook reader. So, I manually join the
> 10 pages to a single file and then convert this to PDF
> and copy this single PDF file to the "ebook reader" device.

Wouldn’t it be more natural to create an EPUB ebook from them and use
any normal ebook reader on any device?

>> On the practical side, you might consider making your document (if you
>> convert it anyway, from some format) an e-book in the EPUB format, which
>> is really just a zipped file containing an XHTML document and associated
>> files. This is often very easy when using the free Calibre software.
>> E-book readers typically have features like ''remembering'' your location,
>> or at least marking a location.
>
> I tried this, but I did not like it, because the
> pagination of an EPUB is not fixed.

Just like pagination of an HTML document is not fixed. This should be
regarded as a useful property, not a problem.

> Changing the
> font size or even just the device orientation or
> the reader program might give some point in the
> document a new page number.

So? What do you need page numbers for? Do you also need line numbers and
numbers of characters on a line?

> Another problem with using HTML for the kind of
> documents that I write is that browser manufacturers
> - after a phase of increasing support for MathML -
> now again seem to be decreasing MathML support.

I’m not sure I see what you are talking about. First you mentioned that
you want to read some material that someone else has produced as a set
of HTML documents. Now you seem to be discussing something completely
different, namely authoring documents with mathematical content. The
answers to those complicated questions depend, among other things, on the
intended use of such documents. For such documents, EPUB format (as
currently defined and supported) is inadequate; it can handle reasonably
only rather simple math expressions. HTML with MathML is currently also
limited in practice, though I cannot see what you mean by decreasing
support. But tools like MathJax can produce very satisfactory results
for online use. For offline use, PDF appears to be the only feasible
solution, unless the users can be expected to have Microsoft Word 2007
or newer.

> I am offering my course notes on web pages.

Well, maybe you should have started by saying this and illustrated it
with a URL. It seems that your real problem is how to produce some
material in different formats for different types of use, automatically
generating them from some base format. This is a very broad question and
can hardly be discussed in a useful way without knowing much more about
the type of content, intended uses, etc. (Well, it *could* be discussed
at a general level, but that would mean something like writing a
voluminous book.)

> Since this post still is crossposted into the (nearly empty)
> browser newsgroup:

I don’t see why it was included initially, but thought you might have a
reason.

> The web browser "Amaya" has a menu entry "make book" (or some such)
> that reportedly can combine several HTML pages into a single
> document. I tried it and even changed all my links to the kind
> of links that were prescribed by Amaya, but it did not work.

Last time I seriously tried Amaya (was it ten years ago?), it looked like
experimental software created for testing something or proving some
point, rather than software for production work. I have not heard any
news that would give a reason to give it another chance.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

May 16, 2016, 2:13:47 PM

16.5.2016, 20:52, Stefan Ram wrote:

> "Jukka K. Korpela" <jkor...@cs.tut.fi> writes:
>> So? What do you need page numbers for? Do you also need line numbers and
>> numbers of characters on a line?
>
> When I have read only a part of a document, I would like to
> write down the number of the page where I should continue to
> read it later. Line numbers are not needed for this.

What you are really talking about is setting up a location in a document
so that you continue from it. Page numbers are a blunt tool here. Page
*and* line numbers would be much more specific, but unnecessarily technical.

This is not about data formats (HTML vs. PDF), but about software used
to display data. E-book readers generally let you mark the current
location (or do it automatically); web browsers don’t. Using EPUB, which
is really just XHTML bundled with associated image, style, and other
files, you make your data accessible on e-book readers.

> The topic intended was to compare HTML with PDF.

Was it? Are data formats the issue? It sounds like your issue is with
software.

> I remember having read something about Chrome removing
> support for MathML to make the browser faster.

I don’t. And I think it would be odd indeed. Code that is not executed
can hardly affect the speed. Code for processing MathML is relevant only
when the document contains MathML. Removing support for MathML would
admittedly affect the *size* of browser code; this might have been
important in the early 1990s. ☺

--
Yucca, http://www.cs.tut.fi/~jkorpela/