remoteStorage.js v0.7 developer preview


Michiel de Jong

Jun 27, 2012, 12:38:09 PM
to unhosted
Hi!

Since the changes in v0.7 of remoteStorage.js are quite drastic, it's
taking me a while to finish, and it also makes it basically pointless
for you guys to keep putting a lot of time into developing your apps
against the v0.6 library.

That's why i compiled a developer preview of v0.7. It already has the
API changes in there, and it will work in itself (as far as i have
tested on todomvc), but connecting to remoteStorage is disabled.

The syncing to remoteStorage is now entirely independent from how your
app interacts with the library. That's why you can already develop
your apps against the developer preview, and then when we have the
proper version of v0.7 ready, you can drop it in and everything should
work.

The way it works now is as follows:
https://github.com/unhosted/remoteStorage.js/tree/v0.7/example/todomvc

Let me know what you think! In the meantime, i'll keep working on the
asynchronous synchronization.


Cheers,
Michiel

shybyte

Jun 28, 2012, 2:26:52 PM
to unho...@googlegroups.com
It's great to see such high-level APIs and automatic synchronization.
But I wonder if it will still be possible to use the raw low-level API from version 0.6.x, or at least something similar?
Will the new API be backward compatible with the old one?

Because even if the automatic synchronization works great for most use cases, there will always be cases where you need maximal control in order to create innovative, fast applications.

(Actually I started adding some self-made localStorage caching to my app Shared Stuff last week... So maybe I'm just afraid of having to throw it away and rework everything.)


Cheers, Marco

Michiel de Jong

Jun 28, 2012, 3:27:45 PM
to unho...@googlegroups.com
the idea is that you would then move your code into a 'stuff' module,
and publish it as a module for other apps to use.

Modules expose a high-level API, but they use the baseClient API
themselves, which is still not very settled, but it contains
synchronous get and set methods, a 'change' event, and an asynchronous
get method, which should do most of the things you need to do.
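
As an illustration only, here is a minimal sketch of a module sitting on top of such a baseClient. Since the baseClient API is not settled, every name in it (`createBaseClient`, `get`, `set`, `getAsync`, `on`, `createStuffModule`) is an assumption made up for this example, and the cache is a plain in-memory object rather than the real thing:

```javascript
// Hypothetical stand-in for the baseClient; NOT the real remoteStorage.js API.
function createBaseClient() {
  const cache = {};     // path -> value, standing in for the local cache
  const listeners = []; // callbacks registered for the 'change' event

  // Synchronous read from the cache.
  function get(path) {
    return Object.prototype.hasOwnProperty.call(cache, path) ? cache[path] : undefined;
  }

  // Synchronous write to the cache; notifies 'change' listeners.
  function set(path, value) {
    const oldValue = get(path);
    cache[path] = value;
    listeners.forEach(function (cb) {
      cb({ path: path, oldValue: oldValue, newValue: value });
    });
  }

  // Asynchronous read; a real client could fall back to the remote here.
  function getAsync(path, cb) {
    setTimeout(function () { cb(null, get(path)); }, 0);
  }

  function on(eventName, cb) {
    if (eventName === 'change') listeners.push(cb);
  }

  return { get: get, set: set, getAsync: getAsync, on: on };
}

// A 'stuff' module: high-level API on top, baseClient underneath.
function createStuffModule(baseClient) {
  return {
    addItem: function (id, title) {
      baseClient.set('stuff/' + id, { title: title });
    },
    getTitle: function (id) {
      const item = baseClient.get('stuff/' + id);
      return item ? item.title : undefined;
    }
  };
}

const baseClient = createBaseClient();
const changedPaths = [];
baseClient.on('change', function (e) { changedPaths.push(e.path); });

const stuff = createStuffModule(baseClient);
stuff.addItem('bike', 'Old bicycle');
console.log(stuff.getTitle('bike')); // title read back synchronously from the cache
```

The app only ever sees `addItem` and `getTitle`; whether and when the data reaches the remote is left to the layer underneath.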

The idea is that modules should only store data in the baseClient, and
never in localStorage. The baseClient also takes care of mime types
and of json-ld (although most of this is not implemented yet).

If you look at the 'tasks' and 'documents' modules, you see they get by
without using localStorage. They only store private data, but storing
public data should be easy to add.

What functionality are you missing in the baseClient that makes you
want to do a Stuff-specific sync? Maybe it's syncing just one element
in a directory? or choosing which elements should be expunged from the
cache? (right now that's not implemented, but i was thinking of doing LRU
on implicitly synced items, and then throwing an error if the explicitly
synced items already fill up the localStorage capacity).
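
A rough sketch of that expunge policy, assuming a hypothetical `createCache` helper (nothing like it exists in the library yet). Sizes are measured as string length, as a crude stand-in for real storage usage, and a `Map` is used because it iterates in insertion order, which makes it easy to keep least-recently-used entries at the front:

```javascript
// Hypothetical cache: explicit items are pinned, implicit items are evicted LRU-first.
function createCache(capacity) {
  const items = new Map(); // path -> { value, size, explicit }, oldest use first
  let used = 0;

  function remove(path) {
    const item = items.get(path);
    if (item) {
      used -= item.size;
      items.delete(path);
    }
  }

  // Evict implicit items, least recently used first, until `needed` fits.
  function makeRoom(needed) {
    for (const [path, item] of items) {
      if (used + needed <= capacity) break;
      if (!item.explicit) remove(path);
    }
    if (used + needed > capacity) {
      throw new Error('cache full with explicitly synced items');
    }
  }

  return {
    put: function (path, value, explicit) {
      remove(path); // replacing an entry frees its old space first
      makeRoom(value.length);
      items.set(path, { value: value, size: value.length, explicit: Boolean(explicit) });
      used += value.length;
    },
    get: function (path) {
      const item = items.get(path);
      if (!item) return undefined;
      items.delete(path); // re-insert to mark as most recently used
      items.set(path, item);
      return item.value;
    },
    has: function (path) { return items.has(path); }
  };
}

const cache = createCache(10);
cache.put('a', 'aaaa', false); // implicitly cached
cache.put('b', 'bbbb', false); // implicitly cached
cache.get('a');                // 'a' is now more recently used than 'b'
cache.put('c', 'cccc', true);  // explicitly synced; 'b' is expunged to make room
```

Note that in this sketch a failed `put` may already have expunged some implicit items before the error is thrown; a real implementation would have to decide whether that is acceptable.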

IMO, cloud sync can be done generically, and we don't have to write
custom sync logic into each module, but maybe you have some specific
needs that i hadn't considered. Even then, i think it's better to add
them to the baseClient than adding them to a specific module.

Cheers,
Michiel

shybyte

Jun 28, 2012, 5:32:05 PM
to unho...@googlegroups.com
I completely appreciate the benefits of the module concept and automatic sync, and I'm looking forward to using it.
But my experience with frameworks is that if they try to save you from some hassle by guiding your path and abstracting things away, they can actually cause big headaches when you try to do something that the framework's creator has not foreseen (and nobody can foresee everything, or should even try to).
This is especially a real pain in the ass if the thing you want to achieve is something that would be totally easy without the framework. I really prefer frameworks/libraries that give me access to the metal if I need it.

What I like about remoteStorage 0.6.x is that it's so stupid simple. Just KISS. It's absolutely easy to understand what's going on in the background. It's like a simple stone from which you can build a little hut or a pyramid, just by knowing it and some JavaScript.

Here are just some ideas of what crazy people might want to do:
* Apps with lots of big data that want to work partially offline could need fine-grained control over what is saved in the limited localStorage (e.g. image apps).
* If I want to collect lots of data from several other public remoteStorages, I want to prioritize what is cached, for how long, and what needs to be updated when. For example, stuff from close friends who update their stuff frequently must be handled differently from the stuff of other people who never update theirs.
* The optimal cache/refresh time can differ between pages in the app. For example, if I want to see the profile and stuff of one friend, it's no problem to update it with a very short cache time, but on a page that shows aggregated stuff from hundreds of people I might want a different cache time.
* Some stuff might be more secret than other stuff and should not stay in localStorage after a logout, while public stuff can be kept forever.

These are just a few examples (that I implemented or might want to implement later), and sure ... it might be that some of this is already possible with the new API, or could be added somehow. But in the end these are just examples (that don't need to be discussed deeply here) showing that there is always someone who needs more direct control.

I guess it's cheap to add this raw access to the official 0.7 API. So why not?

Cheers, Marco

Michiel de Jong

Jun 29, 2012, 2:10:06 AM
to unho...@googlegroups.com
ok, these are good points. let's try to break it down.

first of all, we need to be clear about what the library does, and
what it exposes; then, point by point, we can see what should be
configurable and what not:

- widget:
- displayWidget - we probably want to offer two choices: just
indicate which div should display the widget, or get direct access to
the session so you can design your own 'connect your remote storage'
widget
- module management:
- defineModule and loadModule - these are quite basic; maybe we want
to compile versions of the library that have modules loaded in by
default (right now 'tasks-0.1' and 'documents-0.1' are both included
by default). but that's an option of the build script, not of the
library at runtime.

- webfinger+hardcoded discovery:
- i don't think it's desirable that different apps respond
differently to certain user addresses, so i don't think there should
be any configurable parts here. Maybe the xhr timeout should be
controlled in response to the device, but that's not app-specific, so
if that's necessary, the library should do that itself. one exception
i can think of is during testing, you might want to 'spike' the
discovery with a test account, but then you might as well just spike
the session so that you're already logged in with that test account
(and don't need to do the discovery)

- wireClient:
- the wireClient takes care of abstracting all the differences
between webdav, couchdb and getputdelete. This is, i think, the part we
most agree should be in the library and hidden from the view of the
modules and the apps.

- what to sync explicitly:
- the module can tell the baseClient which directories or items
should be kept in sync proactively. The baseClient will start fetching
data into the cache (localStorage) so that it's ready for synchronous
access. we can make this control as detailed as we want, since we are
only exposing it to modules, not to apps.

- what to cache implicitly:
- whenever the module gets or sets some data, by default it is added
to the cache implicitly. we can of course make this optional with an
extra parameter.

- what to expunge from cache:
- i was thinking by default we want to expunge implicitly cached
data on a LeastRecentlyUsed basis (maybe influenced by item size), and
fire an error event if the cache (localStorage) is full with
explicitly cached data.

- when to pull:
- i was thinking by default once a minute while the window has
focus, and once every 10 minutes when not. but we can of course make
this as configurable as we want, per key, and separately for in-focus
and off-focus.

- when to push:
- there are 3 caching levels: window (in js globally scoped memory),
device (localStorage), cloud (remoteStorage if connected). Pushing
from memory to device is instant (actually, even synchronous) right
now. I was thinking we probably want to do this once a second by
default, but we can allow the module to set this per path if we want.
pushing from device cache to server i was thinking of once every 10
seconds by default whenever there is something to push, but again, we
can make this configurable per path

- read/write access to the user's and other users' storage while
totally bypassing all caching layers:
- the way i had it in my head right now, you can do this for read by
using the asynchronous get function and specifying the dont-cache
option. For writing it's a bit more complicated, since all writes are
queued. but we can allow a module to specify 'expungeAfterWrite', that
actually sounds useful. but as you say, it would be super easy to also
expose an asynchronous non-cached write.

- json-ld and mimetypes:
- this is still under construction, but i want the baseClient to
automatically add @context fields into data objects, pointing to
per-module data format specs that are automatically published using a
sort of docgen/docco-like system. i think it's important that modules
document how they store data, and i think we don't want to have any
modules with undocumented data formats. That's why right now the
baseClient exposes storeObject and storeMedia. But of
course if you don't want to use these things, just stringify your
objects in the module, and then use storeMedia with mime-type
application/json or even application/octet-stream whenever you want
raw access to the wire data.

- directory listings:
- it's quite tricky how directory listings work on webdav, couchdb
and getputdelete, and i haven't even started properly coding all of
that. right now, what the baseClient exposes is only the cached data
(which includes directory listings with modified-times), and it will
tell you if there are pending outgoing changes on a certain path, and
how long ago the last check for incoming changes took place on a
certain path (this is not implemented yet). What i don't think we want
to expose to the module is access to tinkering with modified times and
directory listings. i think those should "just work", or if someone
has a use case for tinkering with them (hidden files? unsynchronized
files?) then we should maybe look at how we can add that functionality
generically.

- other couchdb and webdav functionality:
- afaik we're exposing most of webdav's functionality, but couchdb
has some specific things that the other api's don't have, like
server-side map-reduce and bulk updates. i think where and when these
are useful, the baseClient should form an abstraction layer for those
with fallbacks so that each functionality always works either for
everybody or for nobody. i can think of no case where exposing this to
the modules would make sense.

- exposing direct access to the wire data:
this is basically covered in the 'json-ld and mimetypes' and
'directory listings' points above.
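
The default pull and push timings from the 'when to pull' and 'when to push' points could be captured in a small per-path lookup, roughly like this. The defaults are the numbers proposed above; the override format (a map from path prefix to partial settings) and the function names are assumptions for the sake of the example:

```javascript
// Defaults from the proposal above, in milliseconds.
const DEFAULTS = {
  pullFocused: 60 * 1000,        // once a minute while the window has focus
  pullUnfocused: 10 * 60 * 1000, // once every 10 minutes when it does not
  pushToDevice: 1000,            // memory -> localStorage, once a second
  pushToCloud: 10 * 1000         // localStorage -> remoteStorage, every 10 seconds
};

// overrides: { 'some/path/prefix/': { pullFocused: ... } } (hypothetical format).
// The longest matching prefix wins; anything it does not set falls back to DEFAULTS.
function intervalsFor(path, overrides) {
  let best = null;
  Object.keys(overrides || {}).forEach(function (prefix) {
    if (path.startsWith(prefix) && (best === null || prefix.length > best.length)) {
      best = prefix;
    }
  });
  return Object.assign({}, DEFAULTS, best !== null ? overrides[best] : {});
}

function pullInterval(path, hasFocus, overrides) {
  const cfg = intervalsFor(path, overrides);
  return hasFocus ? cfg.pullFocused : cfg.pullUnfocused;
}

// Example: a module that wants photos pulled only once an hour in the background.
const photoOverrides = { 'photos/': { pullUnfocused: 60 * 60 * 1000 } };
console.log(pullInterval('photos/2012/05/26/sunset', false, photoOverrides));
console.log(pullInterval('tasks/todos/1', true, photoOverrides));
```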


i think what you're mainly after is asynchronous read and asynchronous
write without caching, right? I'll make sure i'll add both of those to
the baseClient. about the caching strategy for things you do cache,
i'll also make that overridable. and then maybe the ability to
override the widget design would be the third important thing? let me
know if you need anything else, and i'll simply add it.

again, this is talking about functionality exposed from the baseClient
to the modules, each module makes its own decisions about what it
exposes to apps. But my idea is that when you implement for instance
SharedStuff from scratch, 50% of your code would be writing the
reusable 'stuff' module, and 50% would be the actual app, which would
by then be only a presentation layer on top of the reusable module.
does that make sense?

cheers,
Michiel

shybyte

Jun 29, 2012, 4:23:25 AM
to unho...@googlegroups.com
>i think what you're mainly after is asynchronous read and asynchronous 
>write without caching, right? I'll make sure i'll add both of those to 
>the baseClient. about the caching strategy for things you do cache, 
>i'll also make that overridable. and then maybe the ability to 
>override the widget design would be the third important thing? let me 
>know if you need anything else, and i'll simply add it. 

Yes, Yes and Yes! Absolutely great!

If I understand right, you proposed 3 values for caching: DONT_CACHE, IMPLICIT (default), EXPLICIT. I really like that.
On expunging from the cache:

I think it additionally makes sense to have the option to explicitly remove things from the cache.
For example when the cache-is-full exception occurs, or when the application ends.
It might also make sense to be able to configure a limit for the cache size,
if the app needs a fraction of the localStorage for other purposes.
Another point that came to my mind while thinking about caching and the pull timer:

I would hate a mobile picture app which synchronizes 5MB of cached images every minute
while I'm on holiday where I pay 1 Euro per MB. Of course we could try to integrate all such stuff into the lib, but we should give the app an easy option to configure it by itself.
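
To sketch how the three cache modes, explicit removal and a configurable limit could look from the caller's side: the names `CacheMode`, `createCachingClient`, `expunge` and `maxEntries` below are invented for this example, and `fetchRemote` is a synchronous placeholder for whatever actually talks to the storage.

```javascript
// Hypothetical cache modes, as named in the discussion above.
const CacheMode = Object.freeze({
  DONT_CACHE: 'DONT_CACHE',
  IMPLICIT: 'IMPLICIT',
  EXPLICIT: 'EXPLICIT'
});

// fetchRemote(path) is a placeholder for the real (asynchronous) remote read.
function createCachingClient(fetchRemote, options) {
  const maxEntries = (options && options.maxEntries) || Infinity; // app-configured limit
  const cache = new Map(); // path -> { value, mode }

  return {
    get: function (path, mode) {
      const effectiveMode = mode || CacheMode.IMPLICIT; // IMPLICIT is the default
      const bypass = effectiveMode === CacheMode.DONT_CACHE;
      if (!bypass && cache.has(path)) return cache.get(path).value;
      const value = fetchRemote(path);
      if (!bypass) {
        if (cache.size >= maxEntries) throw new Error('cache is full');
        cache.set(path, { value: value, mode: effectiveMode });
      }
      return value;
    },
    // Explicit removal, e.g. after a cache-is-full error or on logout.
    expunge: function (path) { return cache.delete(path); },
    expungeAll: function () { cache.clear(); },
    isCached: function (path) { return cache.has(path); }
  };
}

let fetchCount = 0;
const client = createCachingClient(function (path) {
  fetchCount += 1;
  return 'data for ' + path;
}, { maxEntries: 2 });

client.get('public/profile');                       // fetched, then cached implicitly
client.get('public/profile');                       // served from the cache
client.get('private/diary', CacheMode.DONT_CACHE);  // fetched, never cached
```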


>again, this is talking about functionality exposed from the baseClient 
>to the modules, each module makes its own decisions about what it 
>exposes to apps. But my idea is that when you implement for instance 
>SharedStuff from scratch, 50% of your code would be writing the 
>reusable 'stuff' module, and 50% would be the actual app, which would 
>by then be only a presentation layer on top of the reusable module. 
>does that make sense? 

This absolutely makes sense, but IMHO we should not force the developer to create such a reusable module.
If someone wants to write a simple app or just doesn't care about architecture, he should have the option to grab the basic client and call its get/store methods directly. I think this is important for easy adoption of remoteStorage by developers.
I could imagine for example 3 tutorials on the website:

* Use BasicClient directly 
* Use an existing module
* Create your own module


Cheers, Marco

Michiel de Jong

Jun 29, 2012, 5:50:56 AM
to unho...@googlegroups.com
On Fri, Jun 29, 2012 at 11:23 AM, shybyte <shy...@googlemail.com> wrote:
> This absolutely makes sense, but IMHO we should not force the developer to
> create such a reusable module.
> If someone wants to write a simple app or just doesn't care about
> architecture, he should have the option to grab the basic client and call
> its get/store methods directly. I think this is important for easy adoption
> of remoteStorage by developers.
> I could imagine for example 3 tutorials on the website:
>
> * Use BasicClient directly
> * Use an existing module
> * Create your own module
>

ok, i'll keep that in mind. have to work out how the data versioning
would work, we probably never want two different module versions
writing on the same data (but reading should be allowed). so maybe we
should have tasks/0.1/todos/ for the todos list app that uses
tasks module version 0.1. and then you could define tasks/0.2/... and
tasks/custom-whatever/... if people do

remoteStorage.getBaseClient('tasks', 'whatever', function(baseClient) {
  // have a baseClient available here that gives access to the
  // tasks/custom-whatever/... subdirectory
});

to store tasks using the base client directly, and

remoteStorage.loadModule('tasks', '0.1', 'rw');

to just load the existing module and access tasks/0.1/..., and

remoteStorage.defineModule('tasks', '0.2', function(baseClient) { ...});
remoteStorage.loadModule('tasks', '0.2', 'rw');

to define your own module and access tasks/0.2/..., but then we should
make sure only one person is in control of publishing versions of any
given module, so we don't get multiple clashing versions 0.2 of the
same module
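
The path layout and the 'one version writes, everyone may read' rule described above could be expressed roughly as follows. These helper names are hypothetical and only illustrate the idea; they are not part of the library:

```javascript
// Directory owned by a published module version, e.g. 'tasks/0.1/'.
function versionRoot(moduleName, version) {
  return moduleName + '/' + version + '/';
}

// Directory owned by an app using the base client directly, e.g. 'tasks/custom-whatever/'.
function customRoot(moduleName, label) {
  return moduleName + '/custom-' + label + '/';
}

// Writing is only allowed inside your own root ...
function canWrite(ownRoot, path) {
  return path.startsWith(ownRoot);
}

// ... but reading is allowed anywhere under the module's name.
function canRead(moduleName, path) {
  return path.startsWith(moduleName + '/');
}

const v02 = versionRoot('tasks', '0.2');
console.log(canWrite(v02, 'tasks/0.2/todos/1')); // own data
console.log(canWrite(v02, 'tasks/0.1/todos/1')); // another version's data
console.log(canRead('tasks', 'tasks/0.1/todos/1'));
```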

shybyte

Jun 29, 2012, 10:13:35 AM
to unho...@googlegroups.com
Data versioning. OMG. It's always a potential nightmare.

Actually, as a user I don't want to use an app that says:

I can import tasks with version 0.3, 0.4 and maybe v.super-duper.0815. If you have tasks with different versions you are out of luck. If you want to use the tasks created by this app in other apps, you are probably also out of luck, as I use a super improved task format that is not very common yet.

As a developer I would be afraid that I have to support X tasks versions to import and to export, if users want to use my app. 

Unfortunately I don't have a real solution for this problem.

My "solution" for Shared Stuff is as follows for now:
* The program is able to deal with nearly all kinds of missing data/attributes.
* The program tries to keep all data/attributes which it does not know, and saves them again when stuff is edited.

I think with these 2 rules you can go very far.

For example, if someone made a fork and added a "price" attribute to stuff, then he should be prepared for the case that this price is missing.
On the other hand, it's still possible to edit this stuff with an older/different Shared Stuff version, as this additional attribute is preserved and saved.
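
As a minimal sketch of those two rules: the `price` attribute is the example from above, while the function names and the `title`/`description` fields with their defaults are assumptions made up for illustration.

```javascript
// Attributes this (older) app version knows about, with fallbacks.
const KNOWN_DEFAULTS = { title: '(untitled)', description: '' };

// Rule 1: cope with missing attributes by filling in defaults when reading.
function readStuff(raw) {
  const data = raw && typeof raw === 'object' ? raw : {};
  return Object.assign({}, KNOWN_DEFAULTS, data);
}

// Rule 2: when editing, start from the raw stored object so that
// attributes this version does not know about are saved again untouched.
function editStuff(raw, changes) {
  return Object.assign({}, raw, changes);
}

// An item written by a fork that added a "price" attribute but no description.
const stored = { title: 'Old bicycle', price: 12 };

const shown = readStuff(stored);                          // description falls back to ''
const edited = editStuff(stored, { title: 'Old bike' });  // price survives the edit
console.log(JSON.stringify(edited));
```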

It should be possible to design the data format for tasks in such a flexible way that, by keeping the 2 rules in mind, we get no problems in the near future.

Obviously this approach does not work for heavy data format changes. In this case I would prefer to have a version marker directly in the data, and the app has to deal with it. But I would not like to keep the different versions in different directories. I mean: Do you store different versions of documents on your local file system in different folders named after the version of the document format?

It could easily destroy all advantages of having a common "task" module if every version stores their data in different directories/files.

I also wonder how your proposed version system should work together with general modules like "documents". 

These all are just thoughts. It's really a tough problem. 

Cheers, Marco

Melvin Carvalho

Jun 29, 2012, 10:34:31 AM
to unho...@googlegroups.com
It's a good point ... data versioning can be tricky.  I wonder if it's worth recording the version of remoteStorage used (e.g. v0.7) when saving data?
 

shybyte

Jun 29, 2012, 12:30:27 PM
to unho...@googlegroups.com
> I wonder if it's worth recording the version of remoteStorage used (e.g. v0.7) when saving data?

I would say no. For me remoteStorage is somehow orthogonal to the saved data. If we need to save the remoteStorage version in order to deal with older data later, then this would be a sign that something went conceptually wrong.
If something needs to store version markers, then it's the modules (maybe). But this is different from the remoteStorage version.

I wonder if something quite specific like tasks should be baked into the remoteStorage library ? Or is it just meant as an example?
If there is the need to bake something in the library I would expect something more generic like e.g.: documents, images or contacts.

Another question for the documents module:
Is a document anything in the documents folder? Independent of the file format. Would sound logical for me.
If yes:
 It would be great if it would be possible to search by file type.

Another API question:
Will the API allow building a generic "file browser" or a "file selector dialog"?

Cheers, Marco

Michiel de Jong

Jul 2, 2012, 8:09:40 AM
to unho...@googlegroups.com
hi!

On Fri, Jun 29, 2012 at 7:30 PM, shybyte <shy...@googlemail.com> wrote:
> I wonder if something quite specific like tasks should be baked into the
> remoteStorage library ? Or is it just meant as an example?

the idea is that tasks are actually quite an important generic type of
data in productivity tools. so it would potentially include bug
tickets, purchase orders, all that kind of things.

i think for example 10 modules that we want to do at first could be
contacts, calendar, photos, documents, tasks, stuff (that one is
actually rather specific but SharedStuff needs it), money (for
Opentabs as well as for our own in-house project expenses app), music,
bookmarks, apps (list of which app icons are on your launch screen,
see Mozilla AITC), and that would already cover a lot of things IMO,
and tasks would not be out of place in that list i think.

> Another question for the documents module:
> Is a document anything in the documents folder? Independent of the file
> format. Would sound logical for me.
> If yes:
> It would be great if it would be possible to search by file type.

yes, sure. in particular, i want to add remoteStorage to unhosted
svg-edit, and those would be stored in documents as drawings.

>
> Another API question:
> Will the API allow to build a generic "file browser" or a "file selector
> dialog"?

yes, i just added a 'root' module to the design for that:

https://github.com/unhosted/website/wiki/Api

i also pre-emptively removed all references to user addresses from
there, because to me it looks more and more like remoteStorage should
have per-domain discovery and not per-user discovery.

About data versioning, i think there are four ways in which a module
attaches semantics to the data it stores:

- file path and file name. for instance, you may have a folder
photos/2012/05/26/ containing photos made on 26 May 2012. And if one
of them is called photos/2012/05/26/Sunset\ by\ the\ Lake, then the
module could interpret that as 'Sunset by the Lake' being the title of
the photo.

- mime type: like we just discussed, the module could treat a
text/plain document differently from an image/svg+xml document.

- guessing: if a text document starts with a line "<html>" then the
module might treat it as an html document, even though in theory it
could be just a text file that by accident starts with those
characters.

- json-ld type: i'm baking this into the base client, so that all
modules are forced to use it. so on storage, each object has an @type
field which tells us how to interpret its fields.
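
Put together, a module reading an item might collect hints from all four sources, along these lines. The item shape (`path`, `mimeType`, `body`) and the function name are assumptions made up for this sketch:

```javascript
// Collect what each of the four mechanisms says about a stored item.
// item: { path: string, mimeType?: string, body: string | object } (hypothetical shape)
function describeItem(item) {
  const hints = {};

  // 1. File path and file name: photos/YYYY/MM/DD/<title>
  const match = /^photos\/(\d{4})\/(\d{2})\/(\d{2})\/(.+)$/.exec(item.path);
  if (match) {
    hints.date = match[1] + '-' + match[2] + '-' + match[3];
    hints.title = match[4];
  }

  // 2. Mime type, as stored alongside the item.
  if (item.mimeType) hints.mimeType = item.mimeType;

  // 3. Guessing from the content; may be wrong, so it is kept as a separate hint.
  if (typeof item.body === 'string' && item.body.trimStart().startsWith('<html>')) {
    hints.guessedFormat = 'html';
  }

  // 4. json-ld: objects carry an @type field that says how to read their fields.
  if (item.body && typeof item.body === 'object' && item.body['@type']) {
    hints.type = item.body['@type'];
  }

  return hints;
}

const photoHints = describeItem({
  path: 'photos/2012/05/26/Sunset by the Lake',
  mimeType: 'image/jpeg',
  body: ''
});
console.log(photoHints.title + ' (' + photoHints.date + ')');
```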

Mime types are themselves already defined by a registry; for assigning
meaning to file paths and file names, and defining its own object
types, i'll add special fields to the defineModule() method. In fact,
the storeObject method has an obligatory parameter 'type' which can be
used for data versioning however each module sees fit.
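
A sketch of what an obligatory 'type' parameter could mean in practice. The real storeObject signature is not fixed in this thread, so the argument order, the `createObjectStore` wrapper, and the way the `@type` value is built from module name and type are all assumptions here:

```javascript
// Hypothetical in-memory store that refuses objects without a type
// and stamps every stored object with an @type field.
function createObjectStore(moduleName) {
  const stored = {}; // path -> JSON string, standing in for the wire data

  return {
    storeObject: function (type, path, object) {
      if (typeof type !== 'string' || type === '') {
        throw new Error('type is required');
      }
      // Assumed naming scheme: '<module>/<type>'; a module could fold a
      // version into the type name, e.g. 'task-v2'.
      const stamped = Object.assign({}, object, { '@type': moduleName + '/' + type });
      stored[path] = JSON.stringify(stamped);
      return stamped;
    },
    getObject: function (path) {
      return Object.prototype.hasOwnProperty.call(stored, path)
        ? JSON.parse(stored[path])
        : undefined;
    }
  };
}

const tasksStore = createObjectStore('tasks');
tasksStore.storeObject('task', 'todos/1', { title: 'write docs', completed: false });
console.log(tasksStore.getObject('todos/1')['@type']);
```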

As long as all modules choose their special file paths, file names and
object type names sensibly, i think we can have just one overall data
version that can last forever. If ever a change in remoteStorage.js or
in any specific modules makes a change in data formats necessary, then
the new version will have to use non-clashing file paths and object
types.