Status of droplet VFS abstraction

Marco van Wieringen

Mar 1, 2014, 4:00:33 AM
to scalit...@googlegroups.com
Hi,

First some background: we (Bareos) are developing a fork of the Bacula backup project, and as part of that we want to add
a cloud storage backend for storing backup data in the cloud using libdroplet. We published the initial code
this week; it performs the most basic operations using the VFS API of libdroplet.

While working on the libdroplet support in Bareos we mostly tested against the POSIX backend of
libdroplet, and we forked the original droplet library on GitHub to make some changes, as we ran into
problems with at least the POSIX backend. It reopens the file every time using a creat() call, which
truncates the file on every reopen and is kind of deadly for backup data; we also had some problems
reading data back. We plan to upstream these patches via a pull request once we have discussed a bit
more whether the changes make sense and fit the direction the droplet project wants to take.

The code can be found at https://github.com/bareos/Droplet

We are currently testing the S3 backend against both real Amazon storage and Riak CS S3 storage, but
we are seeing some strange things with the VFS API there. As far as we can see we are doing things
right, e.g. dpl_open the file in S3 and dpl_pwrite at the right offset, but what seems to happen is that
the first block of data on S3 gets overwritten every time. So we were wondering what state the VFS
abstraction is currently in, as it seems things have been refactored quite a bit over the last months. Is
it supposed to work and are we doing something wrong, or is dpl_pwrite on S3 known not to work, and
if so, is there any work going on to fix it?

-- 
Marco van Wieringen
Bareos GmbH & Co. KG
http://www.bareos.com

Vianney Rancurel

Mar 3, 2014, 4:44:45 AM
to marco.van...@gmail.com, scalit...@googlegroups.com

Hi Marco,

I'm glad to know you're using Droplet.

My answers below:

----- Original Message -----
> From: "Marco van Wieringen" <marco.van...@gmail.com>
> To: scalit...@googlegroups.com
> Sent: Saturday, March 1, 2014 10:00:33 AM
> Subject: [Scality-SCOP] Status of droplet VFS abstraction
>
>
> Hi,
>
> First some background: we (Bareos) are developing a fork of the Bacula
> backup project, and as part of that we want to add a cloud storage
> backend for storing backup data in the cloud using libdroplet. We
> published the initial code this week; it performs the most basic
> operations using the VFS API of libdroplet.
>
> While working on the libdroplet support in Bareos we mostly tested
> against the POSIX backend of libdroplet, and we forked the original
> droplet library on GitHub to make some changes, as we ran into
> problems with at least the POSIX backend. It reopens the file every
> time using a creat() call, which truncates the file on every reopen
> and is kind of deadly for backup data; we also had some problems
> reading data back. We plan to upstream these patches via a pull
> request once we have discussed a bit more whether the changes make
> sense and fit the direction the droplet project wants to take.
>

You can simply replace the creat() with a non-destructive open().
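
At the POSIX level (standard C, nothing libdroplet-specific; the function name below is made up for this example), creat(path, mode) is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode), and it is the implicit O_TRUNC that wipes the existing data, so a minimal sketch of the non-destructive variant is just:

    #include <fcntl.h>

    /* Illustrative only: the function name is made up for this example. */
    int open_without_truncating(const char *path)
    {
        /* creat(path, 0640) == open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
         * the implicit O_TRUNC discards the existing contents on every reopen. */

        /* The same call without O_TRUNC creates the file if it is missing,
         * but keeps whatever data is already there. */
        return open(path, O_WRONLY | O_CREAT, 0640);
    }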


> The code can be found at https://github.com/bareos/Droplet
>
> We are currently testing the S3 backend against both real Amazon
> storage and Riak CS S3 storage, but we are seeing some strange things
> with the VFS API there. As far as we can see we are doing things
> right, e.g. dpl_open the file in S3 and dpl_pwrite at the right
> offset, but what seems to happen is that the first block of data on
> S3 gets overwritten every time. So we were wondering what state the
> VFS abstraction is currently in, as it seems things have been
> refactored quite a bit over the last months. Is it supposed to work
> and are we doing something wrong, or is dpl_pwrite on S3 known not to
> work, and if so, is there any work going on to fix it?
>

You are right: the VFS API was refactored to implement writes as "PUT range" operations.
The only way S3 can support "PUT ranges" is via the multipart-upload API, which is not
yet implemented.

The other thing is that multipart upload is a destructive operation (the whole file has to be replaced).
Anyway, if you are putting the whole file every time, this is OK.

It should not be too hard to implement: in the dpl_s3_put() backend function, if the range is not NULL, it should be uploaded as a "part".
See the specification here: http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html
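
For what it is worth, a rough sketch of the branch that implies, under stated assumptions: the part_upload_t type and the s3_* helpers below are hypothetical placeholders rather than existing libdroplet code, and only the three REST operations (initiate, upload part, complete) come from the S3 multipart specification:

    #include <stddef.h>

    /* Hypothetical wrappers, NOT existing libdroplet functions, one per
     * S3 multipart REST call. */
    int s3_initiate_multipart(const char *bucket, const char *key,
                              char *upload_id_out);     /* POST ?uploads */
    int s3_upload_part(const char *bucket, const char *key,
                       const char *upload_id, int part_number,
                       const char *buf, size_t len);    /* PUT ?partNumber=N&uploadId=... */
    int s3_complete_multipart(const char *bucket, const char *key,
                              const char *upload_id);   /* POST ?uploadId=... */

    typedef struct {
        char upload_id[128];  /* empty until the upload has been initiated */
        int  next_part;       /* S3 part numbers start at 1 */
    } part_upload_t;

    /* Zero-initialize a part_upload_t, then call this once per range.
     * Note that S3 requires every part except the last to be at least 5 MB. */
    int put_range_as_part(part_upload_t *u, const char *bucket, const char *key,
                          const char *buf, size_t len)
    {
        if (u->upload_id[0] == '\0') {
            /* First range: initiate the multipart upload. */
            if (s3_initiate_multipart(bucket, key, u->upload_id) < 0)
                return -1;
            u->next_part = 1;
        }
        /* Each range becomes one part; the real completion request must list
         * every part number with the ETag returned by the part upload, which
         * the hypothetical wrappers are assumed to track. */
        return s3_upload_part(bucket, key, u->upload_id, u->next_part++, buf, len);
    }

    /* On close, s3_complete_multipart() asks S3 to assemble the parts into
     * the final object. */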

We would be interested if you implement it.
Regards
Vianney




> --
> Marco van Wieringen
> Bareos GmbH & Co. KG
> http://www.bareos.com

Marco van Wieringen

Mar 3, 2014, 10:14:15 AM
to scalit...@googlegroups.com, marco.van...@gmail.com, vianney....@scality.com
On Monday, March 3, 2014 at 10:44:45 AM UTC+1, Vianney Rancurel wrote:
Replacing the creat() with a non-destructive open() is something we also did, but reading the rest of
your answer it may be that things now work as designed, since you can only write full files.
 

> > The code can be found at https://github.com/bareos/Droplet
> >
> > We are currently testing the S3 backend against both real Amazon
> > storage and Riak CS S3 storage, but we are seeing some strange things
> > with the VFS API there. As far as we can see we are doing things
> > right, e.g. dpl_open the file in S3 and dpl_pwrite at the right
> > offset, but what seems to happen is that the first block of data on
> > S3 gets overwritten every time. So we were wondering what state the
> > VFS abstraction is currently in, as it seems things have been
> > refactored quite a bit over the last months. Is it supposed to work
> > and are we doing something wrong, or is dpl_pwrite on S3 known not to
> > work, and if so, is there any work going on to fix it?
> >

> You are right: the VFS API was refactored to implement writes as "PUT range" operations.
> The only way S3 can support "PUT ranges" is via the multipart-upload API, which is not
> yet implemented.

Ok.
 
> The other thing is that multipart upload is a destructive operation (the whole file has to be replaced).
> Anyway, if you are putting the whole file every time, this is OK.

Ah, I was already afraid of that, which makes S3 kind of useless for an append-style file unless you chunk your
file into multiple parts, e.g. maybe create a directory instead of a file and write the different pieces as
separate files. Is this true for all backends libdroplet supports? If so, I think it would make sense to create
something like a cache layer that fetches the current content of the file on open, puts the file back on close,
and serves dpl_pwrite and dpl_pread from the cache file.
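
A minimal sketch of that cache-layer idea, just to pin down the shape; remote_fetch() and remote_put() are hypothetical placeholders for whatever transfers the whole object, not libdroplet functions, and only the local file handling is standard POSIX:

    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical helpers, NOT libdroplet API: copy the whole remote object
     * to/from a local cache file. */
    int remote_fetch(const char *object, const char *cache_path);
    int remote_put(const char *object, const char *cache_path);

    /* Open: pull the current object into a local cache file (if it exists)
     * and return a plain fd, so pread()/pwrite() work at any offset. */
    int cache_open(const char *object, const char *cache_path)
    {
        remote_fetch(object, cache_path);   /* a missing object is not fatal */
        return open(cache_path, O_RDWR | O_CREAT, 0600);
    }

    /* Close: push the whole cache file back as a single PUT, then clean up. */
    int cache_close(int fd, const char *object, const char *cache_path)
    {
        close(fd);
        int rc = remote_put(object, cache_path);
        unlink(cache_path);
        return rc;
    }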
 
> It should not be too hard to implement: in the dpl_s3_put() backend function, if the range is not NULL, it should be uploaded as a "part".
> See the specification here: http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html

> We would be interested if you implement it.

I first have to get a proper design in my head on how to append to a file using libdroplet.