binary content in tiddlyweb


chris...@gmail.com

May 1, 2012, 12:29:19 PM
to tidd...@googlegroups.com

In the early days of TiddlyWeb the concept of the tiddler was based on
tiddlers as commonly found in a TiddlyWiki: small pieces of text
operating as content, code or styling.

Only rarely would you see a single tiddler that was very large, such
as an image in a tiddler or some other kind of binary content.

Therefore TiddlyWeb was written to have optimal throughput for small
tiddlers. Support for binary tiddlers was thrown in as a useful tool
when hosting content elsewhere was either unavailable or not a great
option. Efficient handling of that content was left until it became a
problem.

With the rise of complex aggregations of tiddlers, including binary
ones, for web apps and relations, especially on TiddlySpace, the
lack of efficiency is becoming more of a problem.

My questions for this post are:

1 How important is it that binary uploads and downloads in tiddlyweb
are efficient?
2 If that is important, how important is it that store and serialization
code changes made to accommodate binaries take account of existing
tiddlyweb plugin code and maintain backward compatibility?
3 How important is it that binary tiddlers remain raw tiddlers, or could
binary tiddlers be pointers to binary content?

The changes I have in mind for #2 would allow the serialization method
as_tiddler to accept either a string or a file handle (currently just a
string) and allow stores to provide or use tiddler objects (on
tiddler_get and tiddler_put) that have a text field which can be a file
handle or a string (currently a string). The file handle would be used
when content matches certain types or the incoming content length is
over some configurable length.
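A minimal Python sketch of the #2 idea, assuming a simplified tiddler class and a hypothetical `as_tiddler(tiddler, input_data, content_type, content_length)` signature (TiddlyWeb's real serialization API differs). `SPOOL_THRESHOLD` and `STREAMED_TYPES` are made-up stand-ins for "certain types" and "some configurable length":

```python
from typing import BinaryIO, Union

# Hypothetical values: the post only says "certain types" and
# "some configurable length".
SPOOL_THRESHOLD = 1024 * 1024  # bytes
STREAMED_TYPES = ("image/", "video/", "application/pdf")

TextOrHandle = Union[str, bytes, BinaryIO]


class Tiddler:
    """Minimal stand-in for a TiddlyWeb tiddler object."""

    def __init__(self, title: str):
        self.title = title
        self.type = None
        self.text: TextOrHandle = ""


def wants_file_handle(content_type: str, content_length: int) -> bool:
    """True when content should stay a file handle rather than be read in."""
    return (content_type.startswith(STREAMED_TYPES)
            or content_length > SPOOL_THRESHOLD)


def as_tiddler(tiddler: Tiddler, input_data: Union[str, BinaryIO],
               content_type: str, content_length: int) -> Tiddler:
    """Accept either a string (as today) or a file handle."""
    tiddler.type = content_type
    if isinstance(input_data, str):
        tiddler.text = input_data  # existing behaviour, unchanged
    elif wants_file_handle(content_type, content_length):
        # Leave the handle for the store to stream in tiddler_put.
        tiddler.text = input_data
    else:
        # Small content: read it into memory as before.
        data = input_data.read()
        tiddler.text = (data.decode("utf-8")
                        if content_type.startswith("text/") else data)
    return tiddler
```

A store's tiddler_put would then have to check whether tiddler.text is a handle and stream it rather than assume a string; that check is where the backward-compatibility cost for existing store plugins would land.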

A different option (#3) would add a layer of indirection[1] for the
creation of a binary tiddler:

* Do a raw PUT of the binary content to a separate store optimized for
binaries (direct fd -> file streaming, no in memory middle step).
* Get back a URI for the binary content.
* Create a JSON tiddler that references that URI, and does tagging and
fields.
* When doing a GET for the tiddler, the binary URI is dereferenced in
different ways (depending on the Accept header).

For tiddlers on which tags and such were not needed, just PUTting the
tiddler as a binary content type would do the steps above by proxy.
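The first three steps could look roughly like this in Python. `BinaryStore`, its on-disk layout and the base URI are hypothetical; the "_uri" field name is the one floated later in this thread. Content is copied in fixed-size chunks, so an upload never sits in memory whole:

```python
import hashlib
import json
import os
import tempfile
from typing import BinaryIO, Iterable

CHUNK_SIZE = 64 * 1024  # bytes per read/write; the body is never held whole


class BinaryStore:
    """Hypothetical side store that streams uploads straight to files."""

    def __init__(self, root: str, base_uri: str):
        self.root = root
        self.base_uri = base_uri.rstrip("/")
        os.makedirs(root, exist_ok=True)

    def put(self, stream: BinaryIO) -> str:
        """Copy stream to disk chunk by chunk and return a URI for it."""
        digest = hashlib.sha256()
        fd, tmp_path = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as out:
            for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
                digest.update(chunk)
                out.write(chunk)
        # Assumption: name content by its SHA-256 hash, one possible answer
        # to "how is content identified"; identical uploads share one file.
        key = digest.hexdigest()
        os.replace(tmp_path, os.path.join(self.root, key))
        return f"{self.base_uri}/{key}"


def pointer_tiddler(title: str, bag: str, uri: str, content_type: str,
                    tags: Iterable[str] = ()) -> str:
    """JSON tiddler that carries tags and fields and references the binary."""
    return json.dumps({
        "title": title,
        "bag": bag,
        "type": content_type,
        "tags": list(tags),
        "fields": {"_uri": uri},
        "text": "",
    })
```

Hashing gives a stable identifier but makes DELETE harder: one file may back several pointer tiddlers, so it can only be removed once nothing references it.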

This option has a variety of problems, including questions on how
content is effectively identified, made secure, and cleaned up
appropriately when DELETEd.

This option 3 may actually just be a special kind of tiddler.type that
we want to consider, separate from the binary handling issue. A sort
of redirecting-tiddler that is useful in itself.

If binary handling needs to be hyper efficient, then something like
option 3 is needed. If what's needed is simply something that is better
than now, the earlier option ought to work quite well, alongside some
tweaking of how stores are configured (today I did some tweaking on
tiddlyspace.com which at least reduces the impact of a big binary PUT
on users who are not doing the PUT).

So. To restate the question: Who cares about binary tiddlers and in
what fashion do you care?

Thanks.

[1] All problems with computers can be solved with another layer of
indirection.
--
Chris Dent http://burningchrome.com/

Jeremy Ruston

May 1, 2012, 1:17:27 PM
to tidd...@googlegroups.com
Hi Chris

Good questions, and timely for me.

> 1 How important is it that binary uploads and downloads in tiddlyweb
>  are efficient?

I've found that users of TiddlySpace very much value being able to
treat binary content in the same way as the rest of their content, and
with the same permission model.

To meet those users' expectations, if we have the feature at all then I
think it has to be reasonably efficient. Perhaps that means optimising
for binaries of at most 1MB, with 10MB stretch capacity.

Maybe the server could progressively throttle transfers of large
attachments to discourage their use.
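One way progressive throttling could work, sketched in Python under assumed numbers (`FREE_BYTES`, `BASE_RATE` and the curve are illustrative, not anything TiddlySpace does): the first megabyte goes at full speed, after which the permitted rate falls in proportion to how much has already been sent. The `sleep` function is passed in so it can be swapped out:

```python
import time
from typing import BinaryIO, Callable, Iterator

# Assumed numbers, loosely following the 1MB figure above.
FREE_BYTES = 1024 * 1024   # sent at full speed
BASE_RATE = 1024 * 1024    # bytes/second once FREE_BYTES have gone out
CHUNK_SIZE = 64 * 1024


def allowed_rate(bytes_sent: int) -> float:
    """Permitted bytes/second; shrinks in proportion to what has been sent."""
    if bytes_sent < FREE_BYTES:
        return float("inf")
    return BASE_RATE * FREE_BYTES / bytes_sent


def throttled(stream: BinaryIO,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield stream in chunks, pausing longer the further the transfer goes."""
    sent = 0
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        rate = allowed_rate(sent)
        if rate != float("inf"):
            sleep(len(chunk) / rate)  # time this chunk costs at the current rate
        sent += len(chunk)
        yield chunk
```

With these numbers a 10MB attachment ends up at a tenth of the base rate, which discourages very large files without refusing them outright.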

> 2 If that is important, how important is it that store and serialization
>  code changes made to accomodate binaries take account of existing
>  tiddlyweb plugin code and maintain backward compatibility?

Difficult to say; I don't have enough direct knowledge to venture an opinion.

> 3 How important is it that binary tiddlers remain raw tiddlers or could
>  binary tiddlers be pointers to binary content?

I'm very keen on the pointer idea as it enables sane treatment of big
video files, which are now quite well supported on the web and in
HTML5. It's something I want to do anyway for TW5, and it would be
terrific to do it in step with TiddlyWeb.

I'm with you in thinking of the feature as potentially a general
purpose federated content aliasing mechanism.

I think it needs to be separate from the existing tiddler type
mechanism, because it would be better if the type of an alias tiddler
could be used to reflect the type of the remote resource. A simple
possibility is the use of a "src" field that contains the URL of the
content, with the "text" field perhaps containing a placeholder to be
used while the real content is loading. Is that the kind of thing you
were thinking?
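A sketch of the "src" field idea as plain Python dicts, assuming the tiddler's own type names the remote resource and an alias is recognised by the presence of "src" rather than by a special type. The URL and helper names are hypothetical:

```python
from typing import Optional, Tuple, Union

Content = Union[str, bytes]

video = {
    "title": "Launch video",
    "type": "video/mp4",  # type of the remote resource, not a special alias type
    "fields": {"src": "http://media.example.com/launch.mp4"},  # hypothetical URL
    "text": "Loading video...",  # placeholder shown until the real content arrives
}


def is_alias(tiddler: dict) -> bool:
    """An alias is marked by a non-empty "src" field."""
    return bool(tiddler.get("fields", {}).get("src"))


def content_to_show(tiddler: dict,
                    loaded: Optional[bytes] = None) -> Tuple[str, Content]:
    """Return (type, content) to render right now."""
    if is_alias(tiddler) and loaded is None:
        # Remote content not fetched yet: show the placeholder text.
        return "text/plain", tiddler.get("text", "")
    if is_alias(tiddler):
        # Fetched: the tiddler's own type describes the remote bytes.
        return tiddler["type"], loaded
    # Ordinary tiddler: nothing to fetch.
    return tiddler.get("type") or "text/plain", tiddler.get("text", "")
```

Keeping "type" free for the remote resource is what lets a renderer treat the alias exactly like a local video tiddler once the bytes arrive.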

Best wishes

Jeremy



--
Jeremy Ruston
mailto:jeremy...@gmail.com

PMario

May 3, 2012, 5:10:11 AM
to TiddlyWeb
On May 1, 6:29 pm, chris.d...@gmail.com wrote:
> So. To restate the question: Who cares about binary tiddlers and in
> what fashion do you care?
I care in the sense that my personal upload/download experience
should be good (the same as or better than now), even if someone else
throws tons of unoptimized pictures and videos at TS.

> * Do a raw PUT of the binary content to a separate store optimized for
>   binaries (direct fd -> file streaming, no in memory middle step).
> * Get back a URI for the binary content.
> * Create a JSON tiddler that references that URI, and does tagging and
>   fields.
> * When doing a GET for the tiddler, the binary URI is dereferenced in
>   different ways (depending on the Accept header).
>
> For tiddlers on which tags and such were not needed, just PUTting the
> tiddler as a binary content type would do the steps above by proxy.
This looks good to me. I'm just not sure what an offline version would
look like.

-m

EduardWagner

May 3, 2012, 12:07:10 PM
to tidd...@googlegroups.com
Hi Chris,

Very good question, and it would be really nice to have this improved.

> 1 How important is it that binary uploads and downloads in tiddlyweb
>   are efficient?

We use a lot of binary tiddlers, like images and PDF documents, in our tiddlywebs.
A client-side option to upload multiple files at once would be great!
And sometimes I dream of taking a screenshot and posting it directly to tiddlyweb!!!
 
> 2 If that is important, how important is it that store and serialization
>   code changes made to accommodate binaries take account of existing
>   tiddlyweb plugin code and maintain backward compatibility?

Very important, as we store thousands of binary tiddlers!!!
 
> 3 How important is it that binary tiddlers remain raw tiddlers or could
>   binary tiddlers be pointers to binary content?

 
A pointer to binary content seems to be enough, especially if it helps to reduce loading times even more!
 
> A different option (#3) would add a layer of indirection[1] for the
> creation of a binary tiddler:
>
> * Do a raw PUT of the binary content to a separate store optimized for
>   binaries (direct fd -> file streaming, no in memory middle step).
> * Get back a URI for the binary content.
> * Create a JSON tiddler that references that URI, and does tagging and
>   fields.
> * When doing a GET for the tiddler, the binary URI is dereferenced in
>   different ways (depending on the Accept header).

That sounds GREAT!!!
A referencing tiddler for binary content would allow tagging, custom fields and aliases.
And if I publish this reference to a different bag, I can use the same binary content if permission is given.
Rendering binary content as embedded images or linked files should be possible.

It would give us great pleasure!
Bye Edi

Ben Gillies

May 3, 2012, 1:34:00 PM
to tidd...@googlegroups.com
I meant to reply to this a while ago but never got round to it, so...


> 1 How important is it that binary uploads and downloads in tiddlyweb
>  are efficient?

How much are binary tiddlers slowing things down on different
TiddlyWeb instances (e.g. tiddlyspace.com) in general? If large binary
tiddlers slow things down a lot but aren't PUT or GET very often, then
it's likely not that important; on the other hand, if they are, then
it is.

> 2 If that is important, how important is it that store and serialization
>  code changes made to accommodate binaries take account of existing
>  tiddlyweb plugin code and maintain backward compatibility?

I'm not sure there's _that_ much code that exists that deals with
binary tiddlers directly, so I'm not sure it would be _that_ much
effort to patch things up. From my point of view, I'm quite happy to
make any necessary changes to my various plugins and push them out at
the same time to keep things working if it's needed.

> 3 How important is it that binary tiddlers remain raw tiddlers or could
>  binary tiddlers be pointers to binary content?

They could be; we'd likely have to make several changes to the various
client-side plugins that deal with them (not _that_ big a deal). Though
IIRC, large binary tiddlers already appear as links anyway.

> The changes I have in mind for #2 would allow serialization as_tiddler
> to accept either a string or a file handle (currently just a string) and
> allow stores to provide or use tiddler objects (on tiddler_get and
> tiddler_put) that have a text field which can be a file handle or string
> (currently a string). The file handle would kick in when content
> matches certain types or incoming content length is over some
> configurable length.
>
> A different option (#3) would add a layer of indirection[1] for the
> creation of a binary tiddler:
>
> * Do a raw PUT of the binary content to a separate store optimized for
>  binaries (direct fd -> file streaming, no in memory middle step).
> * Get back a URI for the binary content.
> * Create a JSON tiddler that references that URI, and does tagging and
>  fields.
> * When doing a GET for the tiddler, the binary URI is dereferenced in
>  different ways (depending on the Accept header).

What do you mean by binary URI? Are you talking about a separate URI, or
just treating the standard URI in different ways depending on the
Accept header (e.g. request as JSON, txt, etc -> retrieve that
standard tiddler, request as default, image/jpeg, etc -> retrieve from
the binary store). If the latter, that seems fairly sensible.

> So. To restate the question: Who cares about binary tiddlers and in
> what fashion do you care?

I care to the extent that I want to be able to upload images to
TiddlySpace without abusing things too much (that is, it's possible
now, but better to use something else and hotlink (assuming the
something else permits that of course)). I don't really do video, pdf,
doc, etc, but if TiddlyWeb is supposed to hold notes, ideas, whatever,
then I think it's important not to restrict the format that the notes
come in.


Ben

chris...@gmail.com

May 4, 2012, 9:16:51 AM
to tidd...@googlegroups.com
On Thu, 3 May 2012, EduardWagner wrote:

> Very good question, and it would be really nice to have this improved.

If you could provide more detail about how things are not good, that
would be useful.

> We use a lot of binary tiddlers like images and pdfs-documents in our
> tiddlywebs.
> A client side option to upload multiple files at once would be great!
> And sometimes i dream of taking a screenshot and post it directly to
> tiddlyweb!!!

If you have a modern browser, it turns out that multi-file
drag-and-drop upload is not too hard; we started a bit of test work in
that area last week:

http://cpie.tiddlyspace.com/filedrop.js

chris...@gmail.com

May 4, 2012, 9:41:18 AM
to tidd...@googlegroups.com
On Thu, 3 May 2012, Ben Gillies wrote:

> How much are binary tiddlers slowing things down by on different
> TiddlyWeb instances (e.g. tiddlyspace.com) in general? If large binary
> tiddlers slow things down a lot, but aren't generally PUT/GET very
> often, then it's likely not that important, on the other hand, if they
> are, then it is.

It's sort of a matter of what people want to be able to do. The
current situation on tiddlyspace.com is that the inefficiency is noise
in the analysis done to understand usage patterns. The slowdowns we
saw while at the hothouse should be less of a problem now that I've
made some changes to how the content is saved into the database. It still
causes load, but not load that the current user agent should notice.

>> 2 If that is important, how important is it that store and serialization
>> code changes made to accommodate binaries take account of existing
>> tiddlyweb plugin code and maintain backward compatibility?
>
> I'm not sure there's _that_ much code that exists that deals with
> binary tiddlers directly, so I'm not sure it would be _that_ much
> effort to patch things up. From my point of view, I'm quite happy to
> make any necessary changes to my various plugins and push them out at
> the same time to keep things working if it's needed.

The issue is that it might be useful to change the serialization api
so that the as_tiddler method takes either a string or a filehandle,
and that change might need to be reflected across various
serializations.

It's not quite correct for me to say binary tiddlers. What I really
mean is "tiddlers which are really large". That these are usually
binary is coincidence.

>> 3 How important is it that binary tiddlers remain raw tiddlers or could
>> binary tiddlers be pointers to binary content?
>
> They could be, we'd likely have to make several changes to the various
> clientside plugins that deal with them (not _that_ big a deal). Though
> IIRC, large binary tiddlers already appear as links anyway.

That's only in TiddlyWiki. An important aspect of this discussion is
that binary tiddler handling really only matters _outside_ of
TiddlyWiki, as that's the only place (currently) where you'd want to
mess with them directly (in TiddlyWiki, links are better).

The main issue is that a large tiddler (greater than 10s of MB) gets
read into memory during a PUT or a GET. That's not ideal. Coming up
with ways to avoid that is _required_ if (and only if) people want to
store large tiddlers _in_ TiddlyWeb.
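For the GET half, a minimal WSGI sketch (TiddlyWeb is a WSGI application, but this app and its names are illustrative, not TiddlyWeb code) that hands back a stored file in chunks instead of one big string. It uses the server's `wsgi.file_wrapper` from PEP 3333 when one is offered and falls back to a generator otherwise:

```python
import os
from typing import Callable, Iterator

CHUNK_SIZE = 64 * 1024


def iter_file(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks; the file closes when iteration ends."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")


def make_binary_app(path: str, content_type: str) -> Callable:
    """WSGI app serving one stored binary without reading it all into memory."""
    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(os.path.getsize(path))),
        ])
        # Prefer the server's optimised file transfer (PEP 3333) when offered.
        wrapper = environ.get("wsgi.file_wrapper")
        if wrapper is not None:
            return wrapper(open(path, "rb"), CHUNK_SIZE)
        return iter_file(path)
    return app
```

Memory use stays at roughly one chunk per request however large the tiddler is; the PUT side needs the same treatment when reading `wsgi.input`.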

My read on what's been said so far is that people are fairly keen on
being able to have them in the same place, with the same API, but
would be willing to accept a layer of indirection if that was
necessary.

> What do you mean binary URI? Are you talking about a separate URI, or
> just treating the standard URI in different ways depending on the
> Accept header (e.g. request as JSON, txt, etc -> retrieve that
> standard tiddler, request as default, image/jpeg, etc -> retrieve from
> the binary store). If the latter, that seems fairly sensible.

I mean that binaries are stored in some kind of auxiliary storage,
which provides a URI for the content. The tiddler which operates as a
pointer to that content has a field of "_uri" or something like that
which points into the aux storage.

What gets sent to the client depends on the Accept header. If the
default is requested, the request redirects to the binary URI, so the
aux storage can deliver the content efficiently (i.e. without a big use
of memory). If something else, like JSON, is requested, you'd get a
tiddler which has a pointer field to the binary content.

The details are still a bit fuzzy.
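A rough Python sketch of the Accept-header dispatch described above, assuming tiddlers as dicts with the pointer stored in a "_uri" field. The Accept parsing is deliberately naive (q-values are ignored and JSON simply wins if it is listed at all):

```python
import json
from typing import List, Tuple

Response = Tuple[str, List[Tuple[str, str]], bytes]


def accepted_types(accept: str) -> List[str]:
    """Media types from an Accept header, ignoring q-values and parameters."""
    return [part.split(";")[0].strip()
            for part in accept.split(",") if part.strip()]


def respond(tiddler: dict, accept: str) -> Response:
    """Return (status, headers, body) for a GET on a possibly-pointer tiddler."""
    binary_uri = tiddler.get("fields", {}).get("_uri")
    if "application/json" in accepted_types(accept):
        # JSON request: the tiddler itself, with its pointer field intact.
        body = json.dumps(tiddler).encode("utf-8")
        return "200 OK", [("Content-Type", "application/json")], body
    if binary_uri:
        # Default request: send the client to the aux storage for the bytes.
        return "302 Found", [("Location", binary_uri)], b""
    # Ordinary tiddler: serve its text as before.
    body = tiddler.get("text", "").encode("utf-8")
    return "200 OK", [("Content-Type", tiddler.get("type") or "text/plain")], body
```

The existing JSON representation is unchanged apart from one extra field, which fits the wish not to disturb external data structures.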

I'd prefer not to mess with the existing external data structures if
at all possible.

> I care to the extent that I want to be able to upload images to
> TiddlySpace without abusing things too much (that is, it's possible
> now, but better to use something else and hotlink (assuming the
> something else permits that of course)). I don't really do video, pdf,
> doc, etc, but if TiddlyWeb is supposed to hold notes, ideas, whatever,
> then I think it's important not to restrict the format that the notes
> come in.

This is pretty much what I think too.

chris...@gmail.com

May 4, 2012, 10:01:13 AM
to tidd...@googlegroups.com
On Tue, 1 May 2012, Jeremy Ruston wrote:

> To meet those users expectations, if we have the feature at all then I
> think it has to be reasonably efficient. Perhaps that might mean to be
> optimised for max 1MB binaries, with 10MB stretch capacity.

This is one of those situations where fixing the problem for medium
sized things pretty much fixes it for big things too.

In addition to what I said to Ben about reducing memory use, the
other fix is making sure that big tiddlers, however they are being
written, are written in a way that doesn't block.

In TiddlySpace right now, it blocks because the tiddler's text field
is written to a table that is still MyISAM (in order to support the
search index). If binary content was written somewhere else, this
wouldn't be as much of an issue.

However, fixing that one small issue means yet _another_ database
migration, and those are getting rather tiresome, so if it's gonna
happen I want to make sure that lots of bases are covered.

> I think it needs to be separate from the existing tiddler type
> mechanism, because it would be better if the type of an alias tiddler
> could be used to reflect the type of the remote resource. A simple
> possibility is the use of a "src" field that contains the URL of the
> content, with the "text" field perhaps containing a placeholder to be
> used while the real content is loading. Is that the kind of thing you
> were thinking?

I reckon these sorts of things would emerge out of implementation. As
with so many of these things what 'type' means at the various levels
is a bit messy:

* in the serializations
* in the stores
* in HTTP request and response headers
* in the JSON representation
* in TiddlyWiki

Ben Gillies

May 4, 2012, 10:05:51 AM
to tidd...@googlegroups.com
>> I'm not sure there's _that_ much code that exists that deals with
>> binary tiddlers directly, so I'm not sure it would be _that_ much
>> effort to patch things up. From my point of view, I'm quite happy to
>> make any necessary changes to my various plugins and push them out at
>> the same time to keep things working if it's needed.
>
> The issue is that it might be useful to change the serialization api
> so that the as_tiddler method takes either a string or a filehandle,
> and that change might need to be reflected across various
> serializations.

Sure, that's pretty much what I thought. From my point of view,
tiddlywebplugins.form would likely be somewhat simpler if file handles
could be passed into the serialization directly, as it already needs
to reference one. As to other serializations, how many are there?

>> What do you mean binary URI? Are you talking about a separate URI, or
>> just treating the standard URI in different ways depending on the
>> Accept header (e.g. request as JSON, txt, etc -> retrieve that
>> standard tiddler, request as default, image/jpeg, etc -> retrieve from
>> the binary store). If the latter, that seems fairly sensible.
>
>
> I mean that binaries are stored in some kind of auxilliary storage,
> which provides a URI for the content. The tiddler which operates as a
> pointer to that content has a field of "_uri" or something like that
> which points into the aux storage.
>
> What gets sent to the client depends on the accept header. If default
> is request, then the request redirects to the binary URI (so the aux
> storage can delivery the content efficiently (i.e. without a big use
> of memory)). If something else, like JSON, you'd get a tiddler which
> has a pointer field to the binary content.

That sounds reasonable to me.