Announcement: Implementation of Phantom as Master Thesis

Michael

May 18, 2009, 10:48:01 AM
to Phantom Protocol, la...@smart-link.de, Michael...@informatik.uni-erlangen.de
To all people involved or interested in this project:
While I was searching for a topic for my Diplomarbeit (~ Master Thesis), a
friend (Leslie) told me about this project, and after reading through the
white paper, I felt it would be great if Magnus' concept could become
reality.
So I formulated a thesis covering the basics of the protocol and looked
for a thesis supervisor, whom I found in Lars from FoeBud (a German
civil rights and data privacy organization, similar to the Electronic
Frontier Foundation).

So what does that mean for this project?

It means that for the next six months I will spend most of my time
working on this project, hopefully getting it to a point where the
remaining implementation steps can easily be distributed among other
project members.

Since it is hard to estimate the required time for the thesis, I will
first focus on the "setting up of the routing path" and then see how
much further I can get from there within the time limit.
Since I will be focusing my time on this project, I can also help
Magnus coordinate individual efforts to further this project, so
please, everyone who wants to help this project, contact me and Magnus
now! As long as I mark which parts of the code originate from me, and
which originate from other people's efforts, this will be no problem
(concerning the "no help allowed" clause in master theses). This clause
also doesn't mean you can't give me valuable advice, and help me with
problems I'll definitely run into! ;)

Where to start from?

I will start by looking a bit into cross-platform development and
network application programming (TCP, OpenSSL stuff), before getting
started on the design. If any of you know good resources for these
fields, please forward them to me on the mailing list or via blog
comments. Afterwards I will begin defining interfaces and creating a UML
diagram / a general implementation map of the protocol. And then I guess
I can start by implementing base classes.

Magnus and I agreed to use C++ as the programming language.

Let's bring anonymity to a new level!

michael

Magnus Bråding

May 18, 2009, 1:18:22 PM
to phantom-...@googlegroups.com
This is of course very good news for the project!

And to everyone who has offered their help previously, but not been able
to contribute much yet for one reason or another (a big one being my lack
of time to coordinate the development), this is a great chance! Let's
hear from all of you here on the list (or in private to me and Michael
if you'd rather that), so we can split up the development between all of
us in the most optimal way!

Regarding the programming language, the reasoning went something like this:

I really think that the application should be able to be run as a native
application in Windows (although not ONLY in Windows), due to the huge
userbase potential. Thus, languages like Python and Java are out the
Window. Not counting some more obscure languages, that practically only
leaves us with C and C++. On one hand, C is often cleaner/less messy,
and more people can also code in it. On the other hand, I guess C++ has
the potential to create more elegant solutions, if you just handle it
correctly and carefully stay away from the luring messy side of the
language.

It's practically down to a "religious" discussion from there I think,
but since Michael is the one offering to do the most actual work on it
so far, I think that his preference for any of these languages should
weigh in relatively heavily.

Also, C++ applications can quite easily use modules/subsets that are
written in C, as long as we keep a good abstraction, which we absolutely
intend to do, and that in turn takes away even more of the possible
"bad" properties of using C++, if that's what Michael feels most
comfortable with.

And again, everyone, if you want to help, this is the time to make your
voice heard!

Regards,
Magnus

Leslie P. Polzer

May 19, 2009, 4:12:09 AM
to phantom-...@googlegroups.com

Magnus Bråding wrote:

> Let's hear from all of you here on the list (or in private to me and Michael
> if you'd rather that), so we can split up the development between all of
> us in the most optimal way!

I'm interested in helping but am tied up too much right now with other
things, so I will content myself with helping Michael every now and then
as needed.


> Regarding the programming language, the reasoning went something like this:

I agree that this topic has an enormous potential to spark off flames
and religious discussions. It's therefore important that we keep ourselves
to facts, both positive and negative.


> I really think that the application should be able to be run as a native
> application in Windows (although not ONLY in Windows), due to the huge
> userbase potential. Thus, languages like Python and Java are out the
> Window. Not counting some more obscure languages, that practically only
> leaves us with C and C++.

Yes.

I heavily tend towards the C option right now but I'm open
to be convinced otherwise. :)


> On the other hand, I guess C++ has the potential to create more elegant
> solutions,

Would you elaborate on that?


> if you just handle it correctly and carefully stay away from the
> luring messy side of the language.

And that?


> It's practically down to a "religious" discussion from there I think,
> but since Michael is the one offering to do the most actual work on it
> so far, I think that his preference for any of these languages should
> weigh in relatively heavily.

Yes, it's an important point but I don't think it necessarily has
to be emphasized that heavily. Reasons:

* C isn't that hard when you know C++

* Exploration is part of a Master's Thesis

* Choices affecting the project as a whole must be balanced
against single developer experience and preferences, even
if the developer in question takes on a huge chunk of it.


> Also, C++ applications can quite easily use modules/subsets that are
> written in C, as long as we keep a good abstraction, which we absolutely
> intend to do, and that in turn takes away even more of the possible
> "bad" properties of using C++,

Unfortunately there's another side to this and an important argument
for me: the option to call Phantom functions via a foreign function
interface. It's simple with C but can be quite cumbersome with C++
due to non-standardized name mangling/ABI and private semantics
(e.g. the object system).

This -- i.e. the ability to easily code modules in another language
-- is a large advantage of C.
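
To make the FFI point concrete, here is a minimal sketch of the usual workaround when the core is written in C++: a thin `extern "C"` layer with an opaque handle, so that no mangled names or class layouts cross the boundary. All names (`Node`, `phantom_node_create`, and so on) are hypothetical and not part of any existing Phantom code.

```cpp
#include <cstddef>
#include <string>

// Internal C++ class (hypothetical). Its layout and mangled symbol names
// are never exposed to foreign callers.
class Node {
public:
    explicit Node(const std::string& name) : name_(name) {}
    std::size_t name_length() const { return name_.size(); }
private:
    std::string name_;
};

// C-callable surface: unmangled symbols, plain C types, an opaque handle.
extern "C" {

struct phantom_node;  // opaque; callers only ever hold a pointer to it

phantom_node* phantom_node_create(const char* name) {
    if (name == nullptr) return nullptr;
    try {
        return reinterpret_cast<phantom_node*>(new Node(name));
    } catch (...) {
        return nullptr;  // exceptions must not propagate into foreign code
    }
}

std::size_t phantom_node_name_length(const phantom_node* handle) {
    if (handle == nullptr) return 0;
    return reinterpret_cast<const Node*>(handle)->name_length();
}

void phantom_node_destroy(phantom_node* handle) {
    delete reinterpret_cast<Node*>(handle);  // deleting a null pointer is a no-op
}

}  // extern "C"
```

The cost is visible here: every operation that should be reachable from another language needs such a hand-written wrapper, which is the cumbersome part compared to a library that is plain C to begin with.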

I hope others will chime in and state their analysis so we can
solve this issue quickly.

Leslie

Magnus Bråding

May 19, 2009, 4:33:16 AM
to phantom-...@googlegroups.com
>> On the other hand, I guess C++ has the potential to create more elegant
>> solutions,
>
> Would you elaborate on that?

The object orientation often results in somewhat more intuitive or
easily read code (again, if used the right way), and the functionality
you get "for free" along with the object orientation and its built-in
aspects and features might somewhat reduce the size and complexity of
the code you have to write yourself.


>> if you just handle it correctly and carefully stay away from the
>> luring messy side of the language.
>
> And that?

I'm not sure if you mean "And that is?", referring to which are the
messy sides, but if you do, well, C++ has a _lot_ of features that you
can use to make the code extremely messy if you want to (or don't know
how to do the opposite), that's what I meant.


> * C isn't that hard when you know C++
>
> * Exploration is part of a Master's Thesis

Exploration and learning is one thing, but if Michael finds the entire
development process to be more tedious during the length of the work, it
will result in less being done, that's what I'm mostly concerned about.
Also, the learning and exploration at a master thesis level should
normally be aimed at a little higher level than learning or exploring a
programming language (I have a Master's degree in computer science
myself), i.e. it will be better invested in exploring/completing the
protocol itself.


>> Also, C++ applications can quite easily use modules/subsets that are
>> written in C, as long as we keep a good abstraction, which we absolutely
>> intend to do, and that in turn takes away even more of the possible
>> "bad" properties of using C++,
>
> Unfortunately there's another side to this and an important argument
> for me: the option to call Phantom functions via a foreign function
> interface. It's simple with C but can be quite cumbersome with C++
> due to non-standardized name mangling/ABI and private semantics
> (e.g. the object system).
>
> This -- i.e. the ability to easily code modules in another language
> -- is a large advantage of C.

I'm not extremely familiar with exactly _how_ cumbersome C++ would make
this, but I'm quite sure it won't be unsurmountable, not to mention all
the juicy master thesis "exploration" it will involve. ;-)

Again, I agree that C is "cleaner", but with clean you also get to do
more of the work yourself. I can only come to the same conclusion as
before, that I trust that the best results will come out of the main
developers (i.e. Michael so far) making this choice, after their own
careful consideration of the arguments presented here.

Regards,
Magnus

Walter

May 19, 2009, 4:34:59 AM
to phantom-...@googlegroups.com
>> Let's hear from all of you here on the list (or in private to me and Michael
>> if you'd rather that), so we can split up the development between all of
>> us in the most optimal way!
>
> I'm interested in helping but am tied up too much right now with other
> things, so I will content myself with helping Michael every now and then
> as needed.

I would like to get involved later on, maybe with some tangents like
connecting any string resources with a web based translation interface
to facilitate internationalisation, and/or writing documentation.

I've just moved internationally and am starting a new job so I'm really
too busy to dedicate much time right now (and though I can follow
code, my C/C++ skillset is 'inexperienced' at best).

> I agree that this topic has an enormous potential to spark off flames
> and religious discussions. It's therefore important that we keep ourselves
> to facts, both positive and negative.

My input here is limited to 'common courtesy' - whoever's writing it
should have the final choice themselves, and everyone else should be
thankful they're writing it at all!

- Walter

Leslie P. Polzer

May 19, 2009, 4:46:46 AM
to phantom-...@googlegroups.com

Magnus Bråding wrote:

> The object orientation often results in somewhat more intuitive or
> easily read code (again, if used the right way), and the functionality
> you get "for free" along with the object orientation and its built-in
> aspects and features might somewhat reduce the size and complexity of
> the code you have to write yourself.

There are nice, clean and mature object systems for C like GObject.


> I'm not sure if what you mean "And that is?", referring to which are the
> messy sides, but if you do, well, C++ has a _lot_ of features that you
> can use to make the code extremely messy if you want to (or don't know
> how to do the opposite), that's what I meant.

I would have liked a more detailed explanation of which parts of C++
you'd consider messy and which not.


> Exploration and learning is one thing, but if Michael finds the entire
> development process to be more tedious during the length of the work, it
> will result in less being done, that's what I'm mostly concerned about.
> Also, the learning and exploration at a master thesis level should
> normally be aimed at a little higher level than learning or exploring a
> programming language (I have a Master's degree in computer science
> myself), i.e. it will be better invested in exploring/completing the
> protocol itself.

I agree, it's mainly a scientific exercise after all.


> Again, I agree that C is "cleaner", but with clean you also get to do
> more of the work yourself.

Alright, but for most of the functionality (e.g. array/buffer/string
management, object orientation) there are good libraries.


> I can only come to the same conclusion as
> before, that I trust that the best results will come out of the main
> developers (i.e. Michael so far) making this choice, after their own
> careful consideration of the arguments presented here.

Again I agree. It's our job to present the options as best as possible
so Michael (and others who decide to invest a large amount of time right
away) can make an informed decision.

Leslie

Michael

May 26, 2009, 8:04:23 AM
to Phantom Protocol
Dear Phantoms,

Since you have left the final decision regarding the programming
language up to me, I have decided to use C++ in general
and C where appropriate.
I have talked to some more CS students at my university and they gave
me good arguments against using Java, Python, or other scripting
languages. The main reason is that they require a runtime
environment, which leads to a couple of problems: if we want
Phantom to be adopted by the masses, even something as simple as
installing Java or Python on their computer would drastically limit
our target group. Even if we somehow included everything we need in
the binary / install package, this might lead to problems when runtime
environments are already installed.

We ruled out languages like Prolog (which I am very good at) and Lisp,
because there are too few open-source programmers out there who are
proficient in them and could continue and maintain the project after
me.

That brings us down to C and C++.
For some tasks that might lie ahead of us, like low level hardware /
kernel programming and efficiency critical methods/classes, C seems
the best choice.
For modeling, object orientation, structuring and code readability, C++
seems the best choice (at least for me, since I'll be doing the
structuring, modeling etc.).

So for me it seems a good choice at this point to use C++ as the main
language for the project, but to use C where it makes sense. According to
the people I talked to, using C in a C++ project is very
easy and basically boils down to some casts of objects to
structures. Also, someone suggested producing a generic library that
would allow people to program their own high-level GUI and so on in
any language they like.
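
As a rough illustration of mixing the two languages, the sketch below shows a C-style routine operating on a plain struct, called from a C++ class that owns the data. The names (`phantom_buffer`, `phantom_byte_sum`, `Packet`) are made up for this example; in a real tree the C part would sit in its own header and source file and use `<stdint.h>` types. In this simple case no cast is needed at all, only a struct filled in on the C++ side.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// C-style module (hypothetical): a plain struct and a free function with
// C linkage, so the same declarations would also compile as C.
extern "C" {

struct phantom_buffer {
    const std::uint8_t* data;
    std::size_t length;
};

// Sums all bytes modulo 2^32; stands in for any low-level C routine.
std::uint32_t phantom_byte_sum(const phantom_buffer* buf) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < buf->length; ++i) {
        sum += buf->data[i];
    }
    return sum;
}

}  // extern "C"

// C++ side: owns the storage and hands the C routine a view of it.
class Packet {
public:
    explicit Packet(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    std::uint32_t byte_sum() const {
        phantom_buffer view{bytes_.data(), bytes_.size()};
        return phantom_byte_sum(&view);
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```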

I hope this decision is okay for everyone in the project.
I will soon send a second mail with some new recon I gathered.

Michael


Magnus Bråding

May 26, 2009, 8:10:13 AM
to phantom-...@googlegroups.com
Hi Michael,

Sounds great, looking forward to your further progress.

Regards,
Magnus

Leslie P. Polzer

May 26, 2009, 10:00:25 AM
to phantom-...@googlegroups.com

Michael wrote:

> I have talked to some more CS students at my university and they gave
> me good arguments against using Java, Python, or other scripting
> languages. The main reason is that they require a runtime
> environment, which leads to a couple of problems: if we want
> Phantom to be adopted by the masses, even something as simple as
> installing Java or Python on their computer would drastically limit
> our target group. Even if we somehow included everything we need in
> the binary / install package, this might lead to problems when runtime
> environments are already installed.

Then you seem to have arrived at the same conclusion as we in this
group. ;)


> We ruled out languages like Prolog (which I am very good at) and Lisp,
> because there are too few open-source programmers out there who are
> proficient in them and could continue and maintain the project after
> me.

Sorry for being slightly off-topic, but I must disagree here for the
sake of truth. I don't know about Prolog, but Lisp (Scheme and
Common Lisp) is disqualified for the same reasons as Python,
surely not for lack of popularity among Open Source
programmers -- although it has to be admitted that Lisp
communities haven't yet attracted as many followers as Python
or Perl.


> That brings us down to C and C++.
> For some tasks that might lie ahead of us, like low level hardware /
> kernel programming and efficiency critical methods/classes, C seems
> the best choice.
> For modeling, object orientation, structuring and code readability, C++
> seems the best choice (at least for me, since I'll be doing the
> structuring, modeling etc.).

For this to be really useful it would be nice to have a clean
separation between the two parts -- maybe two libraries?

In what way do you think hardware programming is necessary?


> So for me it seems a good choice at this point to use C++ as the main
> language for the project, but to use C where it makes sense. According to
> the people I talked to, using C in a C++ project is very
> easy and basically boils down to some casts of objects to
> structures. Also, someone suggested producing a generic library that
> would allow people to program their own high-level GUI and so on in
> any language they like.

Yes, I strongly agree here.

Leslie

wernerd

May 29, 2009, 1:19:37 PM
to Phantom Protocol
Michael,

good to hear that your thesis work is in full swing now. As promised
in another group, I had a look into Phantom and just want to say hello
here as well ;-) .

While reading Magnus's Phantom WP and protocol description I got some
ideas and some comments.

On 18 May, 16:48, Michael <tay...@gmail.com> wrote:
> To all people involved or interested in this project:
> Searching for a topic for my Diplomarbeit (~ Master Thesis) a friend
> (Leslie) told me about this project and after reading through the
> white paper, I felt it would be great if Magnus' concept could become
> reality.
....

> Where to start from?
>
> I will start by looking a bit into cross platform development and
> network application programming (TCP, OpenSSL stuff), before getting
> started on the design. If anyone of you knows good resources for these
> fields, please forward them to me in the Mailing List or using Blog
> Comments. Afterwards I will begin defining Interfaces and creating a UML
> diagram / a general implementation map of the protocol. And then I guess

Even before doing any interfaces (C++) I would propose to lay out the
protocol itself: data structures, wire format, little/big endian format,
and whether it is a binary protocol or a more "human readable" one (like XML ;-) ).
There should also be some definitions of the crypto algorithms and
hash algorithms (key length, for example, and which streaming crypto,
e.g. AES in CFB mode?), etc.
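
To illustrate why the byte-order question is worth writing down: the sketch below encodes a 32-bit field byte by byte, so the bytes on the wire are the same regardless of the host CPU. It assumes a big-endian ("network order") layout purely for the sake of the example; nothing in the white paper or this thread fixes that choice, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in big-endian byte order, most significant byte first.
void put_u32_be(std::vector<std::uint8_t>& out, std::uint32_t value) {
    out.push_back(static_cast<std::uint8_t>(value >> 24));
    out.push_back(static_cast<std::uint8_t>(value >> 16));
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value));
}

// Reads a 32-bit big-endian value starting at `offset`.
// The caller must guarantee that at least 4 bytes are available there.
std::uint32_t get_u32_be(const std::vector<std::uint8_t>& in, std::size_t offset) {
    return (static_cast<std::uint32_t>(in[offset]) << 24) |
           (static_cast<std::uint32_t>(in[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(in[offset + 2]) << 8) |
           static_cast<std::uint32_t>(in[offset + 3]);
}
```

Two independent implementations can only interoperate if they agree on details like this, which is the main argument for specifying them early.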

Also define some schematic protocol flows (similar to SIP call flows,
for example) that show which information is transferred, and which
information each node uses.

These more formal and technical definitions and structures help to
create protocol state diagrams and other diagrams that can help to
understand and clarify problems and to start a more formal evaluation
of the protocol.

Defining this stuff first also helps to eliminate ambiguities and
areas of interpretation (both can be found) in the WP. In addition this
supports the design of "test drivers" that should be done by other
parties to verify the correct implementation and protocol flow.

During development, test and interop-tests of ZRTP this was extremely
important and uncovered quite some problems.

What do you think?

Best Regards,
Werner

Leslie P. Polzer

May 31, 2009, 8:36:09 AM
to phantom-...@googlegroups.com

wernerd wrote:

> Even before doing any interfaces (C++) I would propose to lay out the
> protocol itself:
> such as data structures,

Yes. It's helpful to do data-driven development but it's important
not to get carried away with it.


> wire format

Why would we care about the protocol at the wire level?
Even if we'd care, how would we influence this?


> little/big endian format,
> is it a binary protocol or a more "human readable" (like XML ;-) ).

Those are implementation details and I advise against caring
about them at this stage.


> Also there should be some definitions about the crypto algorithms,
> hash algorithms (key length for example, which streaming crypto (AES
> in CFB mode?) ) etc etc.

Again I'd advise against this.

Write stubs for make_hash(octets) and encrypt(octets)/decrypt(octets)
and that's it.
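
A minimal sketch of what such stubs could look like in C++. The `Octets` alias, the 32-byte digest length, and the exact signatures are assumptions for illustration only; the bodies deliberately do nothing useful and would be swapped for calls into a real crypto library later.

```cpp
#include <cstdint>
#include <vector>

using Octets = std::vector<std::uint8_t>;  // hypothetical alias for raw bytes

// Stub: returns a fixed-size all-zero "digest". NOT a real hash function.
// The length of 32 bytes is an arbitrary placeholder.
Octets make_hash(const Octets& /*data*/) {
    return Octets(32, 0);
}

// Stub: identity transform. Provides the call sites, not any secrecy.
Octets encrypt(const Octets& plaintext) {
    return plaintext;
}

// Stub: identity transform, the inverse of the stub above.
Octets decrypt(const Octets& ciphertext) {
    return ciphertext;
}
```

The rest of the code can then be written and tested against these three functions without committing to an algorithm or key length.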


> Also define some schematic protocol flows (similar to SIP call flows,
> for example) that also show which information is transferred, and which
> information each node uses.

Yes, very good.


> During development, test and interop-tests of ZRTP this was extremely
> important and uncovered quite some problems.

I agree that tests are important; we'll see whether there are enough
interested parties and effort to make shared testing possible before
the end of Michael's work.

Leslie

Michael Prinzinger

Jun 1, 2009, 8:18:32 AM
to phantom-...@googlegroups.com
Hey Werner!

thanks for looking into this project and thanks for your good advice!

Yesterday I talked to Magnus to plan the next steps, and we agreed that I would start on the implementation
while working out, in parallel, design decisions and outlines like the ones you recommend in your mail.

For the implementation,
I will start with a local unencrypted model using different TCP ports and different processes, as well as a hardcoded list of the ports of these processes.
For the serialization of data, I will start with a human-readable format (probably XML) for debugging purposes; later on, a binary format (raw bytes) will be used.
Then I will try to provide the infrastructure for computing hashes (some simple hash algorithm at first; real cryptographically secure algorithms will be added later by using a crypto library) and include the unencrypted hash of the SetupPackage.
The next step will be to get started on encryption. Again I will start by providing the infrastructure and an interface to the (Open)SSL library, which is to be added later. At this point I will just use some XOR to "encrypt" and "decrypt" the transmitted data.
When all this works satisfactorily, I will try running it between different computers.
As a next step I will look into including OpenSSL for real encryption.
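
The placeholder stage of this plan could be sketched roughly as follows: an abstract cipher interface that the rest of the code uses, an XOR implementation behind it for development, and a simple non-cryptographic hash (32-bit FNV-1a is used here only as one possible "simple hash"; the plan above does not name one). The class and function names are hypothetical. Neither piece offers any security; the point is only that a real OpenSSL-backed class can later replace `XorCipher` without touching the callers.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Interface the rest of the code programs against.
class Cipher {
public:
    virtual ~Cipher() = default;
    virtual Bytes encrypt(const Bytes& plaintext) const = 0;
    virtual Bytes decrypt(const Bytes& ciphertext) const = 0;
};

// Development placeholder: XOR with a repeating key. No security at all.
class XorCipher : public Cipher {
public:
    explicit XorCipher(Bytes key) : key_(std::move(key)) {}

    Bytes encrypt(const Bytes& plaintext) const override { return apply(plaintext); }
    Bytes decrypt(const Bytes& ciphertext) const override { return apply(ciphertext); }

private:
    // XOR is its own inverse, so one routine serves both directions.
    Bytes apply(const Bytes& in) const {
        if (key_.empty()) return in;  // empty key: leave the data unchanged
        Bytes out(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = static_cast<std::uint8_t>(in[i] ^ key_[i % key_.size()]);
        }
        return out;
    }

    Bytes key_;
};

// 32-bit FNV-1a: a simple, fast, non-cryptographic hash.
std::uint32_t fnv1a_32(const Bytes& data) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::uint8_t byte : data) {
        hash ^= byte;
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}
```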

For the design, I want to pick up your suggestions
and discuss the transfer format = serialization (XML for debugging/development, binary for real usage <- the little/big endian question goes here).
I will try to "draw" a UML diagram covering all the base classes and structures, and their interrelations.
Also I will work out some sample schematic protocol flows. I have done this for my last work as well, and I agree that it helps a lot in understanding what's going on, as well as clarifying how you want to do things yourself.
When that is done, and the parallel implementation is at the same step, I will read more about crypto algorithms (to refresh my memory), and then discuss which algorithms to use, their safety and so on.
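
As a sketch of the two transfer formats side by side: one message struct with a human-readable XML-like dump for debugging and a compact binary encoding for real use. The fields `path_id` and `hop_count` are invented for this example (the actual contents of a SetupPackage are not defined in this thread), and big-endian is again only an assumed answer to the byte-order question.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message layout, for illustration only.
struct SetupPackage {
    std::uint32_t path_id;
    std::uint16_t hop_count;
};

// Debug form: easy to read in logs, larger on the wire.
std::string to_debug_xml(const SetupPackage& p) {
    std::ostringstream out;
    out << "<SetupPackage>"
        << "<path_id>" << p.path_id << "</path_id>"
        << "<hop_count>" << p.hop_count << "</hop_count>"
        << "</SetupPackage>";
    return out.str();
}

// Binary form: fixed 6-byte layout, each field big-endian.
std::vector<std::uint8_t> to_binary(const SetupPackage& p) {
    return {
        static_cast<std::uint8_t>(p.path_id >> 24),
        static_cast<std::uint8_t>(p.path_id >> 16),
        static_cast<std::uint8_t>(p.path_id >> 8),
        static_cast<std::uint8_t>(p.path_id),
        static_cast<std::uint8_t>(p.hop_count >> 8),
        static_cast<std::uint8_t>(p.hop_count),
    };
}
```

Keeping both encoders next to the struct makes it easy to switch formats with a single call-site change once the debugging phase is over.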

Only at this point do real test cases make sense, I think. Before that point, too much will change too quickly to think about tests going along with the project.


I agree with Leslie that I could leave out some of the design stuff for now, especially since right now I am the only/main programmer and can very well cover things in my head.
But, apart from clarifying things and making things easier for later programmers, the design part (especially discussions about which algorithm/format/etc. to choose) makes for an important part of a good master thesis as well. Also I kinda like this stuff, so it's no nuisance for me to do it :)

Your mail suggested to do the design before the actual implementation. That is also what we learn in school and it usually makes sense, especially when having a big team / big project and when needing to coordinate things.
However since I will be mostly alone with hopefully some help for special tasks, it is not necessary to complete the design before starting to code. I'd rather try a parallel approach and let both things grow together.
After all a big problem of design models like the waterfall model is that you will often have to go back and change the design according to new realizations.

I'd be grateful for more input at any time, especially since you have a lot of experience in related fields (network programming, encryption) :)

Michael

P.S. I agree with Leslie that I don't see how talking about "wire format" would be relevant for this project. Could you explain what you mean there?