[RFC] Merge-friendly data layout

Andrey Utkin

Aug 16, 2018, 12:40:06 AM
to taskwar...@googlegroups.com
I am motivated to switch away from taskserver to git for syncing,
because of various data integrity issues and concerns.

Below is a list of my issues and concerns with the current git and
taskserver approaches, some criticism of them, and a vague idea of how
to improve the state of affairs. I'd like to hear opinions, criticism,
suggestions, any honest feedback. I'd also like to hear who would use
such an implementation, and who would like to help develop it.

I hope my criticism doesn't upset anybody in the developer community; I
am very grateful to every contributor to Taskwarrior. I have been using
it for more than 2 years now to organize my life, and I can't find a
better solution for me among either FOSS or proprietary/paid software.

So here are some data integrity issues I experienced in the last few
months.

Syncing on a flaky mobile connection, I got a complete blockage several
times, requiring me to reset the task data on taskserver and do "task
sync init". Really, I synced on my mobile phone, and after that I
couldn't sync on my laptop anymore; it said "no common ancestor" or
something like that. For the record, I am using the inthe.am taskserver
service, and it runs the up-to-date stable release, 1.1.0.

Also, this has happened to me before and I have it again right now:
after syncing everything, one host still has some tasks which I have
already deleted on another host.

On my different hosts, after syncing everything, the pending.data
checksums are different. The differences are:
* attribute formatting ('imask:"1.000000"' vs 'imask:"1"');
* ordering of entries;
* some tasks are present on one host even though I removed them on
  another host.
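
To tell these three kinds of difference apart, one could compare the
task lists of two hosts after normalizing them. Below is a minimal
sketch, assuming each host's tasks are available as a list of
dictionaries (for example, loaded from the JSON that `task export`
prints). The function names and the NUMERIC_FIELDS set are hypothetical,
chosen only for this illustration.

```python
# Hypothetical: attributes whose values are numbers stored as text.
NUMERIC_FIELDS = {"imask"}


def canonical_value(key, value):
    """Make '1.000000' and '1' compare equal for numeric attributes."""
    if key in NUMERIC_FIELDS and isinstance(value, (str, int, float)):
        try:
            number = float(value)
        except ValueError:
            return value
        if number.is_integer():
            return str(int(number))
        return repr(number)
    return value


def canonical_tasks(tasks):
    """Index tasks by uuid so that the ordering of entries is ignored."""
    return {
        task["uuid"]: {k: canonical_value(k, v) for k, v in task.items()}
        for task in tasks
    }


def diff_hosts(tasks_a, tasks_b):
    """Return (uuids only on A, uuids only on B, uuids whose content differs)."""
    a, b = canonical_tasks(tasks_a), canonical_tasks(tasks_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(u for u in set(a) & set(b) if a[u] != b[u])
    return only_a, only_b, changed
```

Anything reported in the first two lists is a task that exists on one
host only; the third list holds real content differences, as opposed to
formatting or ordering noise.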

The current manual says that syncing through taskserver is superior to
the git approach because taskserver can resolve conflicting changes,
whereas with git you must resolve them manually, and they happen often.
This is true, but it is no surprise given the data layout employed,
where, for example, new tasks are just appended to a single JSON file.
If https://www.passwordstore.org/ used such a layout instead of its
tree-like, file-per-entry layout, it would require a purpose-built
password store syncing server, too. But in fact it was built with
consideration of how git can merge data with less burden, and in
practice you don't get conflicts.

I have tried many taskwarrior + git solutions - both self-made and those
shown at https://taskwarrior.org/tools/ .

https://github.com/mrschyte/taskwarrior-hooks/blob/master/mergetool/taskmerge.py
is the only thing I found which addresses the problem of merging, and it
takes 10 seconds to resolve a trivial conflict in my task data on my
powerful laptop, which is a bit too long for something I'm doomed to do
a lot. Too bad for me.

But I see some technical steps which would combine the best traits of
the current git way and the taskserver way. Specifically, the rock-solid
data integrity and the ability to sync peer to peer (offline) of the git
way, and the awareness of what the data means of the taskserver way.

Steps to be taken are the following.

* Implement an alternative data backend - currently the only one is in
  src/TDB2.cpp (to my superficial understanding). This will allow for
  native git merging of data files in unproblematic cases. Goals:
  * store tasks in separate files;
  * store attributes on separate lines, in sorted order;
    `git -c diff.context=0 merge`, I imagine, should successfully merge
    changes in adjacent lines.
* Implement a client-side conflict resolution tool (mergetool) using the
  same rules that taskserver employs.
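
To make the idea more concrete, here is a rough sketch of such a layout
and of an attribute-wise three-way merge. Everything in it is an
assumption made for illustration: the "<uuid>.task" file name, the
"name: JSON value" line format, and the function names are hypothetical,
and the conflict rule (the side with the later "modified" timestamp
wins) is only a placeholder, not the rule set taskserver actually uses.

```python
import json
from pathlib import Path


def serialize_task(task):
    """One attribute per line, sorted by name, each value JSON-encoded."""
    lines = [f"{key}: {json.dumps(task[key], sort_keys=True)}" for key in sorted(task)]
    return "\n".join(lines) + "\n"


def parse_task(text):
    """Inverse of serialize_task."""
    task = {}
    for line in text.splitlines():
        if line.strip():
            key, _, raw = line.partition(": ")
            task[key] = json.loads(raw)
    return task


def write_task(root, task):
    """Store the task in its own file, named after its uuid (hypothetical)."""
    path = Path(root) / f"{task['uuid']}.task"
    path.write_text(serialize_task(task), encoding="utf-8")
    return path


def merge_task(base, ours, theirs):
    """Three-way merge of one task, attribute by attribute."""
    # Placeholder conflict rule: compare the "modified" timestamps as text.
    ours_newer = ours.get("modified", "") >= theirs.get("modified", "")
    merged = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o          # both sides agree
        elif o == b:
            value = t          # only theirs changed it
        elif t == b:
            value = o          # only ours changed it
        else:
            value = o if ours_newer else t  # real conflict
        if value is not None:  # None means the attribute was removed
            merged[key] = value
    return merged
```

With one file per task, two hosts adding different tasks never touch the
same file, and with one attribute per line in sorted order, a change to
"status" on one host and to "description" on another shows up as edits
to different lines of that task's file.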

Thanks in advance for any replies.