Towards Iris 0.2.0, suggestions for the Book of Iris and plans for 0.3.0

Péter Szilágyi

Mar 15, 2014, 7:39:02 AM

to projec...@googlegroups.com, Sander van Harmelen

Hi all,

I just wanted to give a heads-up that I've finished a very heavy refactor of Iris, from the ground up through the whole system [1]. The goal was not to add new stuff, but to begin graduating Iris from an academic system into an industrial one. Of course, there will always be plenty more to do, but I thought these modifications were significant enough to warrant a new release.

Besides the countless bug and stabilization fixes, a few notable extras are:

- Separate data and control connections between Iris peers, which should make the system much more resilient to overload.
- A completely redesigned tunnel based on direct TCP connections, as opposed to the previous simulated-stream-over-scribe implementation.

I've been looping tests for a few days on a single machine running about 10-20 nodes, with no unexpected failures. The only failures I see are message losses caused by high churn and CPU overload. So at this point I'd like to enter a feature freeze for 0.2.0, correcting only showstopper bugs. I will try to do some distributed testing too (although I'll be preoccupied with other commitments for about a week), and thought I'd call out to anyone interested in Iris to give it a spin and report anything unexpected or ugly, or pretty much any usage feedback (positive is welcome too, it boosts morale) :). At the moment Iris spits out quite a lot of debugging log entries, which I will remove before going ahead with the release, but which can be very useful for tracking down potential problems (they come mainly from proto/scribe/topic/topic.go, if you want to silence them).

---

On a different note, I'd like to continue writing up the Book of Iris [2], adding at least one chapter about getting Iris up and running. I plan to include sections on:

- The target environment Iris was meant to run in (firewalls, security expectations, etc.).
- The basic architecture of Iris (i.e. a single node serving many clients per machine, language bindings, the rationale behind it, etc.).
- A short tutorial-like section on getting a few nodes up and passing messages between them.

These should preferably hit the website simultaneously with the 0.2.0 release. If you have any suggestions, comments or things you'd like to see in the book, now's the time to make yourself heard. The goal of the "book" is to be a valuable resource for the community, so let it be driven by the community :).

---

Finally, I have a few features planned for Iris, which I'll be implementing in the foreseeable future. Some will land in 0.3.0, some only after that. I'm open to all feedback on these, so that I can get a clear idea of which features are essential and awaited, and which are non-essential and can be delayed.

- Federation support
  This is one that has been requested by a few people privately and on Twitter too, so I think it is a good candidate to make the 0.3.0 release about. Since this email is getting long as is, I'll write a follow-up with my ideas on the planned levels of federation support. That would be the ideal thread to discuss any propositions or desires.
- Package scope leveled logging
  Even when logging only potential error messages (many of which are caused by distributed system concurrency and are not real errors), the logs can grow fast. Adding debugging output to see what the system does results in an enormous amount of log output. It would be nice to have a package scope leveled logger: a classical leveled logger, but separately configurable for the individual overlay layers (e.g. to debug only scribe, or iris, or pastry).
- Statistics gathering
  Iris is kind of like a black box with a lot going on inside, and it is extremely hard to debug even with full detailed logs, due to the sheer number and complexity of the messages. It could prove invaluable to have a statistics gathering mechanism that would act somewhat like a profiler, collecting event information from the whole system (e.g. network ingress/egress at different layers, system resource consumption, maybe network topology).
- Synchronized ops
  Certain operations are not synchronized now (most importantly subscribing and tunnel closing), which could increase message loss if the timings are not right. These are also worth thinking about, but they are not trivial, as they need to be solved at both the overlay and the relay level.

I have some ideas on all of the above topics, but the goal is to make them as simple and lightweight as possible, which is why I haven't added them yet. If you're missing anything else, fire away with proposals.

Have a happy weekend,

Peter

PS: Try it, code it, share it :)

Refs:

[1] https://github.com/karalabe/iris/compare/dc8b7e204889d9e235d87fbca2923ebb2b20adf6...master

[2] http://iris.karalabe.com/book