[ANN] Mutagen - A unique file sync utility in Go inspired by Unison


Jacob Howard

Jan 5, 2017, 10:28:29 AM
to golang-nuts
Greetings fellow Gophers!

I wanted to share and get feedback on a Go project that I've been working on for a little while called Mutagen:


It's a continuous, bidirectional file synchronization utility similar to Unison (think SSHFS, but far more robust and not requiring kernel extensions). Its main goal is simplicity - no configuration, no flags, just usage (a minimal number of knobs to turn, in Go parlance).

It uses Go's unique abilities to enable what I consider to be its most interesting feature: It only needs to be installed on the system where you want to control synchronization. It uses Go's cross-compiling support and syscall-only binaries to create small "agent" binaries that it copies to remote hosts automatically after probing for their OS/architecture. It supports a broad range of platforms - pretty much anything Go supports except Plan 9 and mobile (mostly just because those ports can be a bit flakey).
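As a rough illustration of the probing step described above, the sketch below maps the output of `uname -s` and `uname -m` to the GOOS/GOARCH names that Go's cross-compiler uses. It is not Mutagen's actual probing code: the lookup tables, the `goTarget` function, and the agent file naming are assumptions made for the example, and a Windows remote would need a different probe since it has no `uname`.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical lookup tables from `uname` output to Go's GOOS/GOARCH names.
var (
	osNames = map[string]string{
		"linux": "linux", "darwin": "darwin", "freebsd": "freebsd",
		"openbsd": "openbsd", "netbsd": "netbsd",
	}
	archNames = map[string]string{
		"x86_64": "amd64", "amd64": "amd64", "i386": "386", "i686": "386",
		"aarch64": "arm64", "arm64": "arm64", "armv6l": "arm", "armv7l": "arm",
	}
)

// goTarget maps the output of `uname -s` and `uname -m` to a GOOS/GOARCH
// pair. The final result is false when either value is not recognized.
func goTarget(unameS, unameM string) (string, string, bool) {
	goos, okOS := osNames[strings.ToLower(strings.TrimSpace(unameS))]
	goarch, okArch := archNames[strings.ToLower(strings.TrimSpace(unameM))]
	return goos, goarch, okOS && okArch
}

func main() {
	// Pretend these strings came back from running uname over SSH.
	goos, goarch, ok := goTarget("Linux\n", "armv7l\n")
	if !ok {
		fmt.Println("unsupported remote platform")
		return
	}
	// The name below is a made-up convention for locating a prebuilt agent.
	fmt.Printf("mutagen-agent_%s_%s\n", goos, goarch)
}
```

A matching agent could then be built ahead of time with something like `GOOS=linux GOARCH=arm go build`, which is the standard way to cross-compile with the Go toolchain.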

The original goal was to mirror and work with code on smaller platforms (e.g. a Raspberry Pi) on which you might want to compile and run (but not write) code, thus saving you the need to either transfer your vim/emacs configuration over to remote systems or push/pull from your local system every time you want to test a change. But you can use it for general synchronization as well.

You can find a full accounting of the features here: https://github.com/havoc-io/mutagen#unique-features

The tl;dr is: user-space (no kernel extensions or admin privileges needed), syncs locally or over SSH (can even sync SSH-to-SSH without a local copy of files), auto-reconnects, uses a pipelined rsync algorithm for any significant data transfer to make effective use of bandwidth and reduce latency, can handle very large directories (the Linux source tree was the development case), identifies conflicts, handles platform and filesystem quirks, and has a dynamic status display.

I'd be particularly interested in feedback regarding the idea, design, and usefulness to people. The code is clean, but not all exported members are currently documented because I was sort of treating it like one giant internal package. This will be fixed soon though. I'll also provide full documentation of the algorithm in the near future, but it's very similar to Unison, so the design documents for that will give you 95% of the idea.

It's still an early beta, though essentially feature complete. I've only tested it personally between macOS, Windows (XP/7), and Linux (386/amd64/ARM). I'd be interested in tests on more obscure systems (non-x86 architectures and some of the BSDs), but please be aware that it is a beta that has the power to create and delete files, so it's not to be used on production or mission-critical systems, and could be dangerous on any system.

This is my first medium-to-large Go project.  On the whole it was a positive experience, and I'm fairly certain that this exact design would be almost impossible to reproduce with any other language on the market right now.  It also uses a fair number of excellent Go libraries (see legal.go in the source), so thanks to all those authors and the Go team for making it possible.

-Jacob

Shawn Milochik

Jan 5, 2017, 11:03:10 AM
to golang-nuts
I really like the idea of what you have here. I'm currently using SyncThing for this purpose. SyncThing seems to fit all your requirements with the exception of only needing to be installed on one of the machines. However, in return SyncThing allows you to select which folders are shared from each machine to each other machine, making it really useful for sharing only a subset of your data with other people. https://syncthing.net/

If you're familiar with SyncThing (or if not, it's kind of like Dropbox/btsync but without a middleman or closed-source protocols), what's the case for using mutagen instead of (or as well as) SyncThing?

Thanks for sharing!

Shawn


wilk

Jan 5, 2017, 12:13:06 PM
to golan...@googlegroups.com
On 05-01-2017, Shawn Milochik wrote:
> I really like the idea of what you have here. I'm currently using SyncThing
> for this purpose. SyncThing seems to fit all your requirements with the
> exception of only needing to be installed on one of the machines. However,
> in return SyncThing allows you to select which folders are shared from each
> machine to each other machine, making it really useful for sharing only a
> subset of your data with other people. https://syncthing.net/

Another bonus of Syncthing is deduplication of whole files, right?
Will Mutagen do that as well?



--
William

Jacob Howard

Jan 5, 2017, 5:28:42 PM
to golang-nuts, Sh...@milochik.com
Thanks for the questions!

Syncthing is, as you say, aimed at being a Dropbox replacement.  It is designed to be set up for long-term synchronization sessions.  Mutagen was designed for more ephemeral syncing, particularly for remote code editing.  Its ancestry really lies closer to something like Unison, rmate, or Sublime SFTP.  It just so happened that the final design of Mutagen worked decently for long-term synchronization as well.

I also wanted to make Mutagen simpler to use than Syncthing.  When you open up the Syncthing docs, it can take some time to get your bearings, and I suspect the setup is a barrier to entry for a lot of people who aren't experts and just want to sync files.  I wanted anyone who knew how to use SSH to be able to use this instantly.

I also wanted this to operate over SSH rather than a custom TCP port.  Most remote systems that you might want to edit code on aren't going to let you forward a TCP port through their firewalls like you'd need to do for Syncthing, so then you're tunneling over SSH and it's kind of complicated.

Finally, as you mention, you don't need to install on the remote.  This sounds trivial, but it helps solve a lot of issues with systems that you don't control.

So to answer your question, basically it's short-term and simple vs long-term and more complex.  Syncthing has a lot of nice configuration options and can do a lot of cool things, and this doesn't replace that.  For people who don't know what SSH is, Syncthing is also probably still the way to go, especially on home networks.  For people who need to edit a directory on a remote system that they don't necessarily have extensive control over and don't want to deal with the flakiness of SSHFS or SFTP, Mutagen is hopefully going to be the way to go.

-Jacob

Jacob Howard

Jan 5, 2017, 5:35:59 PM
to golang-nuts, wi...@flibuste.net
Syncthing's transfer model is a bit different with its block-exchange protocol, so it's a bit difficult to compare the exact behavior.  For most use cases, I think you'd see comparable performance in terms of data transfer required for change propagation, though Mutagen could perform a few additional optimizations that it doesn't currently do, e.g. optimizing rsync block size based on file size, but these would be very marginal gains.

-Jacob

wilk

Jan 6, 2017, 5:12:16 AM
to golan...@googlegroups.com
On 05-01-2017, Jacob Howard wrote:
> Syncthing's transfer model is a bit different with its block-exchange
> protocol, so it's a bit difficult to compare the exact behavior. For most
> use cases, I think you'd see comparable performance in terms of data
> transfer required for change propagation, though Mutagen could perform a
> few additional optimizations that it doesn't currently do, e.g. optimizing
> rsync block size based on file size, but these would be very marginal gains.

By deduplication, I mean across different files.
If different files share a common part, will that part be transferred
twice?



--
William

Jacob Howard

Jan 6, 2017, 5:20:55 AM
to golang-nuts, wi...@flibuste.net
Ah, sorry.  No, it won't re-use blocks across files, only within a single file.

-Jacob

Pierre Neidhardt

Jan 15, 2017, 4:11:13 AM
to golang-nuts
I've developed a related but complementary tool, also in pure Go:


It synchronizes hierarchies, in the sense that it will detect renames and move files on target accordingly, *without* transferring any file.
It can be a huge time saver in scenarios where renamings have occurred.
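
The general idea of rename detection can be sketched as follows, assuming each tree has already been reduced to a map from path to content hash. This is only an illustration of the technique, not hsync's actual algorithm: `invert`, `renamePlan`, and the placeholder hashes are hypothetical, and the sketch simply skips any content that appears more than once on either side.

```go
package main

import (
	"fmt"
	"sort"
)

// invert groups the paths of a tree (path -> content hash) by hash.
func invert(tree map[string]string) map[string][]string {
	byHash := make(map[string][]string)
	for path, hash := range tree {
		byHash[hash] = append(byHash[hash], path)
	}
	return byHash
}

// renamePlan returns the moves (old path -> new path) to perform on the
// target so that its layout matches the source without copying content.
func renamePlan(source, target map[string]string) map[string]string {
	src, dst := invert(source), invert(target)
	plan := make(map[string]string)
	for hash, dstPaths := range dst {
		srcPaths := src[hash]
		// Act only on unambiguous one-to-one matches; duplicates are skipped.
		if len(srcPaths) != 1 || len(dstPaths) != 1 {
			continue
		}
		from, to := dstPaths[0], srcPaths[0]
		if from == to {
			continue
		}
		// Never overwrite a path that already exists on the target.
		if _, taken := target[to]; taken {
			continue
		}
		plan[from] = to
	}
	return plan
}

func main() {
	// Hashes are placeholders; a real run would hash file contents.
	source := map[string]string{"docs/new.txt": "h1", "main.go": "h2"}
	target := map[string]string{"docs/old.txt": "h1", "main.go": "h2"}
	plan := renamePlan(source, target)
	olds := make([]string, 0, len(plan))
	for from := range plan {
		olds = append(olds, from)
	}
	sort.Strings(olds)
	for _, from := range olds {
		fmt.Printf("mv %s %s\n", from, plan[from])
	}
}
```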

It is complementary to rsync, unison, or mutagen in the sense that you can run hsync before those tools to save lots of byte transfers. The end result will remain the same once the second tool has been run.

As a bonus, hsync will print out the list of duplicates (comes for free).