I've also been interested in this issue.
One of my conclusions is that the only application that does a really
good job of preserving content, images, and format is one built for
exactly that purpose (mainly Surfulater). SQLnotes does a decent job,
though not quite as nice as Surfulater. Some of the other dedicated
web-clip applications only do a so-so job.
You can print out to a PDF file, and the look of the site is preserved
perfectly, but you're missing all the links and real text (unless your
PDF print engine can also do OCR).
With TW, one approach is to cut and paste the source of the target web
page into your tiddler, taking everything that you want from within
the <body> tags of the target page and pasting it between
<html></html> tags in the tiddler. But then the formatting will be
that of TW, not of the target page. You might be able to work around
this by copying the style rules for the target page into the
StyleSheet tiddler, giving them an extra outer class (e.g. .mypage h
{...} ) so that they don't clash with TW's own styles, and then
enclosing your pasted text in a TW class wrapper (e.g.
{{mypage{<html>...</html>}}} ).
Whew! What a lot of work. And it won't be portable offline if the page
depends on remote images for its appearance.
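
If you want to take some of the drudgery out of the prefixing step, something like the following Python sketch might help. It is only an illustration: the function names (scope_css, wrap_tiddler) and the class name "mypage" are made up for this example, and it only handles simple, un-nested rules (no @media blocks, no CSS comments).

```python
import re

def scope_css(css, class_name="mypage"):
    """Prefix every selector in simple CSS rules with .class_name.

    Assumption: the stylesheet has no nested blocks (@media etc.)
    and no comments; anything fancier needs a real CSS parser.
    """
    scope = "." + class_name

    def fix_rule(match):
        selectors, body = match.group(1), match.group(2)
        scoped = []
        for sel in selectors.split(","):
            sel = sel.strip()
            if not sel:
                continue
            # There is no html/body element inside a tiddler, so point
            # those rules at the wrapper class itself.
            if sel in ("html", "body"):
                scoped.append(scope)
            else:
                scoped.append(scope + " " + sel)
        return ", ".join(scoped) + " {" + body + "}\n"

    return re.sub(r"([^{}]+)\{([^{}]*)\}", fix_rule, css).strip()

def wrap_tiddler(body_html, class_name="mypage"):
    """Wrap pasted <body> content in TW's {{class{...}}} syntax."""
    return "{{" + class_name + "{<html>" + body_html + "</html>}}}"

if __name__ == "__main__":
    print(scope_css("body { margin: 0; }\nh1, p.note { color: red; }"))
    print(wrap_tiddler("<p>Clipped text</p>"))
```

The first output would go into the StyleSheet tiddler and the second into the tiddler holding the clipped page. It doesn't do anything about images, so the offline problem remains.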
Thinking about it from a different view, why do I need to preserve
*everything* on a web page? What I usually want is the text, one or
two images, the original url, and maybe one or two useful links.
Frequently pages are cluttered with banner ads, links to unrelated
information, etc. Why not just capture the useful stuff, and ignore
the rest?
One (1) way to do this is to take a screen capture and copy the
captured file to a subdirectory below your TW file. Use an image link
(with the sizing feature if necessary) to view the page. Use
TiddlySnip to copy the essential text you want and paste it below the
image. If you keep the images in a directory below your TW file, then
you only have to copy one directory to your USB drive when it's time
to hit the road.
Or (2), copy the text you want with TiddlySnip, save the images you
want in a TW subdirectory, link up the images, and apply any
formatting you want. Insert any links from the source document that
are useful. TiddlySnip will already have captured the source url. A
little more work, but the result is a reference page that is probably
more useful than the original, with information presented
consistently across tiddlers.
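
To make the layout of options (1) and (2) concrete, here is a rough Python sketch that assembles the tiddler text: images first, then the clipped text, then any useful links and the source url. The function name clip_tiddler, its parameters, and the "images" directory name are all assumptions for illustration. It only builds the text; it doesn't save images or talk to TiddlySnip.

```python
def clip_tiddler(text, source_url, images=(), links=(), image_dir="images"):
    """Build tiddler text for a trimmed-down web clip.

    images: file names already saved under image_dir (a directory
            next to the TW file, so the paths stay relative)
    links:  (label, url) pairs worth keeping from the original page
    """
    parts = []
    for name in images:
        # TW image link with a relative path
        parts.append("[img[" + image_dir + "/" + name + "]]")
    parts.append(text.strip())
    for label, url in links:
        # TW "pretty link" to an external url
        parts.append("[[" + label + "|" + url + "]]")
    parts.append("Source: " + source_url)
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(clip_tiddler(
        "The useful paragraph from the page.",
        "http://example.com/article",
        images=["capture.png"],
        links=[("Related docs", "http://example.com/docs")],
    ))
```

Because the image paths are relative, the tiddler keeps working as long as the images directory travels with the TW file.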
Another idea (3) that comes to mind is that you could save an entire
website in a directory below TW. There's a TiddlyWiki plugin
(MiniBrowser?) that will allow you to display a web page or url
inside of a tiddler. You could capture text information and insert it
into your tiddler, possibly hidden so that it can be searched. The
tiddler would use the plugin to display the site you have saved.
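
I don't remember the plugin's syntax well enough to quote it, so as a stand-in here is a Python sketch of idea (3) that uses a plain iframe inside <html> tags instead of the plugin. The function name mirror_tiddler, the default sizes, and the example path are made up. Putting the searchable text inside a /% ... %/ comment assumes that TW's search looks at the raw tiddler text, which I haven't verified.

```python
from html import escape

def mirror_tiddler(local_path, searchable_text="", width="100%", height="600"):
    """Build tiddler text that shows a locally saved page in an iframe.

    local_path:      path to the saved page, relative to the TW file
    searchable_text: optional plain text to tuck into a TW comment so
                     it is hidden when the tiddler is displayed
    """
    iframe = (
        '<html><iframe src="' + escape(local_path, quote=True) + '"'
        ' width="' + width + '" height="' + height + '"></iframe></html>'
    )
    if not searchable_text:
        return iframe
    # Keep the text from closing the comment early.
    hidden = searchable_text.replace("%/", "% /")
    return iframe + "\n/%\n" + hidden + "\n%/"

if __name__ == "__main__":
    print(mirror_tiddler("sites/example/index.html",
                         "Key phrases copied from the saved page"))
```

As with the images, the saved site has to sit in a directory that travels with the TW file for this to work offline.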
Like I said, I haven't settled on one solution, but these are the ones
I've been experimenting with.
-- Mark