Hello all,
I am wondering whether you guys have any plans to support relaxed durability. Would it be a good feature to have in BookKeeper (and also for DistributedLog)?
I am thinking of adding a new flag to bookkeeper#addEntry(..., Boolean sync), so the application can control whether or not to sync individual entries.
- On the write protocol, add a flag to indicate whether this write should be synced to disk.
- On the bookie side, if the addEntry request is sync, go through the original pipeline. If the addEntry disables sync, complete the add callbacks after writing to the journal file and before flushing the journal.
- Those add entries (with sync disabled) will be flushed to disk together with subsequent sync add entries.
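A minimal Java sketch of the bookie-side semantics proposed above. `ToyJournal`, `AddCallback` and the hand-called `flush()` are hypothetical names for illustration only, not BookKeeper classes; in a real bookie the flush would be driven by the journal's own flush logic rather than called by hand. It shows sync=false adds being acknowledged right after the journal write, while the sync=true add is acknowledged only after the flush, which also makes the earlier relaxed entries durable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the proposed bookie-side behaviour; not BookKeeper code.
class ToyJournal {
    interface AddCallback {
        void addComplete(long entryId);
    }

    private final List<Long> buffered = new ArrayList<>(); // written, not yet flushed
    private final List<Long> durable = new ArrayList<>();  // covered by a flush (fsync)
    private final List<Long> pendingIds = new ArrayList<>();
    private final List<AddCallback> pendingCallbacks = new ArrayList<>();

    void addEntry(long entryId, boolean sync, AddCallback cb) {
        buffered.add(entryId); // write to the journal file (not yet on stable storage)
        if (sync) {
            // original pipeline: acknowledge only after the journal flush
            pendingIds.add(entryId);
            pendingCallbacks.add(cb);
        } else {
            // relaxed: acknowledge after the write, before the flush
            cb.addComplete(entryId);
        }
    }

    // Hypothetical stand-in for the journal flush.
    void flush() {
        durable.addAll(buffered); // the flush covers everything written so far
        buffered.clear();
        for (int i = 0; i < pendingIds.size(); i++) {
            pendingCallbacks.get(i).addComplete(pendingIds.get(i));
        }
        pendingIds.clear();
        pendingCallbacks.clear();
    }

    List<Long> durableEntries() {
        return durable;
    }
}

public class RelaxedDurabilityDemo {
    public static void main(String[] args) {
        ToyJournal journal = new ToyJournal();
        List<Long> acked = new ArrayList<>();
        journal.addEntry(0, false, id -> acked.add(id));
        journal.addEntry(1, false, id -> acked.add(id));
        journal.addEntry(2, true, id -> acked.add(id));
        System.out.println(acked);                    // [0, 1]: entry 2 waits for the flush
        journal.flush();
        System.out.println(acked);                    // [0, 1, 2]
        System.out.println(journal.durableEntries()); // [0, 1, 2]
    }
}
```

The relaxed entries are acknowledged before they are durable, so a crash before the next flush can lose them on this bookie; only the sync add carries a durability guarantee for itself and everything written before it.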
For my use cases in DistributedLog, this feature can be used to support streams that don't have strong durability requirements.
What do you guys think? Shall I create a JIRA to implement this?
Thanks a lot
-Jia
--
You received this message because you are subscribed to the Google Groups "distributedlog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distributedlog-...@googlegroups.com.
To post to this group, send email to distribut...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distributedlog-user/CALsc%2BXpJj3YT47bognhmEhHmahJkCgJUUY6Un4HVczfK_1MxPQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
I was interested in trying something in this area, but never actually got to do it.
A few random notes:
1. My suspicion, with no backing data at this point, is that simply skipping the fsync for "non-durable" ledgers might not give a big improvement: just a bit less latency for non-fsynced writes, but roughly the same throughput. Imagine a bookie receiving writes for 2 ledgers, 1 durable and the other non-durable. Since the entries are appended to the journal as they come in, the fsync() for the durable ledger write will also carry the data for the previous non-durable ledger write, causing more IOPS if that data spans a different disk block. Given that bookie throughput is typically limited by the IOPS capacity of the journal device, having non-durable writes might not help that much.
2. The other options I was thinking of were:
- Do not append the non-durable entries to the journal (redundancy is provided anyway by writing to multiple bookies). In this case, though, a single bookie could lose more entries depending on flushTime, and could also lose entries in case of a process crash, not just a kernel panic or power outage.
- Use a separate journal for non-durable writes, which will not be fsync()ed.
- Configure the durability at the bookie level and then use a placement/isolation policy to choose the appropriate set of bookies for a non-durable ledger.
3. How will bookie replication operate when getting read errors?
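The mechanism in point 1 can be made concrete with a toy model in Java. It assumes a single shared journal, fixed 4 KB blocks, and an fsync after every durable write (no group commit); `SharedJournalModel` and `simulate` are hypothetical names. Every write, durable or not, advances the journal offset, and each fsync must write every block touched since the previous fsync, including a partially filled block that gets rewritten.

```java
import java.util.Arrays;

// Hypothetical toy model: one shared journal, fsync after each durable write, no group commit.
public class SharedJournalModel {

    /** Returns {fsyncCount, blocksWritten}. Assumes every write size is > 0. */
    static long[] simulate(int[] sizes, boolean[] durable, int blockSize) {
        long offset = 0; // end of the journal after the current append
        long synced = 0; // journal offset covered by the last fsync
        long fsyncs = 0;
        long blocks = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset += sizes[i]; // every write is appended, durable or not
            if (durable[i]) {
                // blocks from the one holding 'synced' up to the one holding the last byte
                blocks += (offset - 1) / blockSize - synced / blockSize + 1;
                synced = offset;
                fsyncs++;
            }
        }
        return new long[] {fsyncs, blocks};
    }

    public static void main(String[] args) {
        int block = 4096;
        int[] eight = new int[8];
        Arrays.fill(eight, 3000);
        int[] four = new int[4];
        Arrays.fill(four, 3000);

        boolean[] durableOnly = new boolean[4]; // the durable ledger on its own
        Arrays.fill(durableOnly, true);
        boolean[] alternating = new boolean[8]; // non-durable, durable, non-durable, ...
        for (int i = 1; i < 8; i += 2) {
            alternating[i] = true;
        }
        boolean[] allDurable = new boolean[8];  // both ledgers durable
        Arrays.fill(allDurable, true);

        System.out.println(Arrays.toString(simulate(four, durableOnly, block)));  // [4, 6]
        System.out.println(Arrays.toString(simulate(eight, alternating, block))); // [4, 9]
        System.out.println(Arrays.toString(simulate(eight, allDurable, block)));  // [8, 13]
    }
}
```

With 3000-byte writes, interleaving four non-durable writes into the durable ledger's stream keeps the fsync count at 4 but raises the blocks written from 6 to 9, while making all eight writes durable costs 8 fsyncs and 13 blocks. These figures come only from the toy model and say nothing definitive about real bookie throughput.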
Matteo
On Thu, Jun 2, 2016 at 11:09 PM Sijie Guo <si...@apache.org> wrote:
> I think if a ledger is configured to be non-durable, it is kind of the
> application's responsibility to tolerate the data loss.
> So I don't think it will actually require any changes on the BookKeeper
> client side.
>
> - Sijie
>
> On Thu, Jun 2, 2016 at 7:29 AM, Venkateswara Rao Jujjuri <juj...@gmail.com>
On Wed, Aug 3, 2016 at 12:51 PM, Enrico Olivelli <eoli...@gmail.com> wrote:
> Hi Jia,
> I have another similar use case for this feature.
> Suppose a ledger is used as a DB transaction log.
> The client issues a sequence of data manipulation instructions inside the
> scope of the transaction; if everything goes well, a commit is finally added
> to the sequence. From the client's perspective, it is important to wait for
> sync only for the last entry, that is, the 'commit'.
> In my case all the entries will be added with sync=false and then the last
> one with sync=true. But it is important that the addEntry with sync returns
> only once all the previous entries of the same sequence (or of the same
> ledger) have been written to stable storage.
>
Yup, I think that's a common usage pattern.
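A sketch of the transaction-log pattern described above, in Java, assuming a client API with a per-entry sync flag. `RelaxedLedgerWriter`, `InMemoryLedger` and `TxnLog` are hypothetical; the in-memory fake simply treats a sync add as flushing every entry appended before it, which is the guarantee Enrico asks for. The commit future also fails if any of the relaxed adds failed, so the caller never treats a transaction with a lost statement as committed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical client API with a per-entry sync flag; not an existing BookKeeper interface.
interface RelaxedLedgerWriter {
    CompletableFuture<Long> addEntry(byte[] data, boolean sync);
}

// Hypothetical in-memory fake: a sync add makes itself and every earlier entry durable.
class InMemoryLedger implements RelaxedLedgerWriter {
    final List<byte[]> entries = new ArrayList<>();
    long durableUpTo = -1; // highest entry id known to be on stable storage

    @Override
    public CompletableFuture<Long> addEntry(byte[] data, boolean sync) {
        entries.add(data);
        long entryId = entries.size() - 1;
        if (sync) {
            durableUpTo = entryId; // the flush covers all entries appended before this one
        }
        return CompletableFuture.completedFuture(entryId);
    }
}

// Hypothetical transaction log: statements are written relaxed, only the commit is synced.
class TxnLog {
    private final RelaxedLedgerWriter ledger;

    TxnLog(RelaxedLedgerWriter ledger) {
        this.ledger = ledger;
    }

    CompletableFuture<Long> commit(List<String> statements) {
        List<CompletableFuture<Long>> all = new ArrayList<>();
        for (String stmt : statements) {
            all.add(ledger.addEntry(stmt.getBytes(StandardCharsets.UTF_8), false));
        }
        CompletableFuture<Long> commitFuture =
                ledger.addEntry("COMMIT".getBytes(StandardCharsets.UTF_8), true);
        all.add(commitFuture);
        // Completes exceptionally if any add failed; otherwise yields the commit entry id.
        return CompletableFuture.allOf(all.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> commitFuture.join());
    }
}

public class TxnLogDemo {
    public static void main(String[] args) {
        InMemoryLedger ledger = new InMemoryLedger();
        long commitId = new TxnLog(ledger).commit(Arrays.asList("INSERT a", "UPDATE b")).join();
        System.out.println(commitId + " " + ledger.durableUpTo); // 2 2
    }
}
```

The relaxed adds are issued without waiting, so the statements and the commit are pipelined; only the final future is the one the caller blocks on.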
> In this case, I see the real challenge: entries span multiple
> bookies, and it will be very hard to coordinate such a sync.
>
Does making ensemble size equal to ack quorum size work here?
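One way to check this suggestion, assuming BookKeeper's round-robin striping in which entry e is written to bookies (e + i) mod E for i = 0 .. Qw - 1 (E = ensemble size, Qw = write quorum, Qa = ack quorum). Since Qa <= Qw <= E, setting E = Qa forces Qw = E, so every bookie stores every entry and must acknowledge the commit. Under the further assumption that each bookie appends a ledger's entries to its journal in order, the commit's flush on each bookie then covers all earlier entries. `StripingCheck` is a hypothetical helper, and the model ignores ensemble changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes round-robin striping: entry e -> bookies (e + i) % E, i < Qw.
public class StripingCheck {

    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> set = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            set.add((int) ((entryId + i) % ensembleSize));
        }
        return set;
    }

    /** True if every bookie that stores the commit entry also stores every earlier entry. */
    static boolean commitCoversHistory(long commitId, int ensembleSize, int writeQuorum) {
        for (int bookie : writeSet(commitId, ensembleSize, writeQuorum)) {
            for (long e = 0; e < commitId; e++) {
                if (!writeSet(e, ensembleSize, writeQuorum).contains(bookie)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(writeSet(4, 3, 2));            // [1, 2]
        System.out.println(commitCoversHistory(4, 3, 2)); // false: striped ledger
        System.out.println(commitCoversHistory(4, 3, 3)); // true: every bookie has every entry
    }
}
```

With E = 3 and Qw = 2, the bookies holding entry 4 do not hold every earlier entry (bookie 1 never receives entry 2), so a sync on those bookies alone cannot vouch for the whole ledger; with E = Qw = 3 the check passes.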
> At the moment it is not very urgent for my projects, but I think it could
> be a useful feature.
>
> Enrico
> -- Enrico Olivelli