Implementing a different routing protocol

275 views
Skip to first unread message

Fraser Cadger

unread,
Nov 27, 2012, 3:00:59 PM11/27/12
to serval-proje...@googlegroups.com

Hi all,


First of all, let me start by stating that I am very impressed with the work of the Serval Project and the Serval app. I appreciate that it is still under development, but having experimented on several Android phones I have found it really easy to use and effective.


My name is Fraser Cadger and I am a third year PhD student at the University of Ulster in Northern Ireland. My project is concerned with developing a framework to allow the streaming of multimedia content both live (i.e. video call) and on-demand (recorded videos) in disaster recovery scenarios using a mesh network of WiFi-enabled devices (currently this entails a testbed of six Android phones). As I am working with Android devices this obviously adds a layer of difficult when trying to implement ad-hoc networking. After doing some searching I came across several different implementations of ad-hoc routing on Android, and after some experimentation the two I was most interested in were Serval and Commotion (who I believe the Serval Project collaborates with). In the end I decided to work with the Serval app because I felt that was the closest to what I was doing, and I also liked how it worked on the phones.


Currently what I am interested in doing is implementing my own routing protocol (which is still under development) on the phones using Serval as a base. That is to say, that I want to replace the modified BATMAN code Serval uses for routing with the current version of my routing algorithm (originally written in C++ for ns-2 but re-writing in C should not be a huge problem). Obviously I realise this will not be a simple as copying and pasting my code in and that is why I am sending this message. From reading various comments in the code I understand that one of the main modifications to Serval is to restrict broadcasting to link-local nodes (i.e. not network-wide broadcasting), if I have understand the code correctly that is. The protocol I am developing is a variation of the greedy routing protocol GPSR http://www0.cs.ucl.ac.uk/staff/b.karp/gpsr/gpsr.html . Both the original GPSR and my own protocol use limited broadcasting as well; beacon (regular hello messages) are broadcast as far as one hop and nodes maintain tables of neighbours who can be reached directly only. There is no conventional collection of routes; instead each node forwards a packet to their neighbour who best meets the criterion/criteria (generally geographic location, i.e. located closest to the destination) one hop at a time. So packets are effectively passed from node to node without a formal route existing. This version of geographic routing is not perfect, and that is why we are working on several modifications, but for now I am content to have some form of working geographic routing up and running.


I have been reading through the code and trying to determine what parts I need to change and where to add my code. What I am looking for is the point at which a node determines where to send a packet. I realise that this will vary depending on the packet's origin, that is to say that when a node generates a new packet it will usually be treated differently from when an intermediate node receives a packet from another node. Now, if I understand correctly Serval's version of BATMAN does not use an explicit routing table structure. I have came across a struct called subscriber defined in overlay_address.h, and from what I have read this seems to act as a record of different nodes (destinations). Within the subscriber struct there is an integer variable called reachable and this determines whether a node is reachable directly via unicast, broadcast, or must be reached indirectly. If a node must be reached indirectly then there is a field called next_hop which if I understand correctly is a pointer to another struct (the intermediate node between ourselves and the destination). Is this correct? Now, what I have noticed in the code is that sometimes next_hop contains a pointer to another next_hop (i.e. next_hop->next_hop). What I'm guessing this means is that if there are multiple intermediate nodes (i.e. to send a packet to node D node A needs to send it via B and C), then this is a way of linking them as a route. So in essence, the subscriber struct contains the route to a destination (by way of the next_hop attribute).


For the actual routing, from reading the code I'm guessing that the 'overlay_route_recalc_node_metrics' function is used to determine whether a destination can be reached directly or indirectly, and if indirectly it will then assign the appropriate intermediate nodes as next_hop's. Therefore, to create or change a route this function is called. Is this correct?


In my case, I would like to do things slightly differently. As I am not doing end-to-end routing I do not need a list of destinations, instead all I want is a list of 1-hop neighbours who can be accessed directly. Then from that list I would determine which of these is the most suitable as the next hop (obviously in my case this will require other stuff, for instance adding GPS coordinates to the packet header and storing this in the subscriber field) and forward the packet to that node, and so on until the packet has been delivered (or has to be dropped).


The main questions I have are:


  • Exactly where is a packet received and the node to which it should be sent decided?

    • i.e. if I want to decide which node to forward a packet to where should I decide this?

    • I came across a method called 'overlay_mdp_receive' in mdp_client.c, is this maybe what I'm looking for?

  • Concerning the subscriber entity, is there an actual table/list/array of these – as I can't seem to find one?

    • i.e. a list of neighbours/known nodes/destinations?


I apologise if my questions and this email aren't very well-worded. Essentially what I'm looking for is some advice/guidance on exactly how routing (determining intermediate nodes for nodes which cannot be reached directly) and forwarding (looking at a received/originated packet and determining which node to send it to) is done. As I indicated earlier in this message, there are a few functions/structs I have stumbled across that I think are relevant and I have made some guesses at what they are doing, so I would appreciate if someone could correct/expand on my guesses.


Any help/guidance I have would be greatly appreciated. It goes without saying that any code I develop myself I will happily share, and any issues/bugs I come across with Serval will be reported.


Thank you for taking the time to read this message, I'm sorry it's a bit on the long side but hopefully I've made myself clear.


Regards,


Fraser


Ps. I realise this topic has been covered before, but I think some of the questions I am asking in this message are new.

Paul Gardner-Stephen

unread,
Nov 27, 2012, 5:58:33 PM11/27/12
to serval-proje...@googlegroups.com
Hello Fraser,

Sounds like an interesting project.

Jeremy has been doing the most work on the mesh routing parts of
Serval, so I expect that he will chime in with where things are in the
current state of the code. Note that routing is currently under
active development, so things are liable to change.

Back to your actual goal, which is to stream multimedia content for
disaster recovery scenarios, this is something that we have been
thinking about from the earliest, and it is nice to hear that someone
is looking to work on it.

Thinking about the general approach you are considering around greedy
routing, it may make more sense to use the Serval Rhizome
store-and-forward scheme as the basis, rather than the MDP/overlay
real-time routing. Rhizome understands the idea of a "journal" which
is really just a file that grows in successive versions. Nodes
receiving a journal can, in principle at least, pull just the new part
of the file. If the file has grown further in the meantime, then
another pull will occur. There would still be some tweaking
required using this approach, such as making Rhizome be more selective
about who it exchanges with so that it can be directed towards the
destination, but I think it would give you more resilient routing. The
tradeoff is likely to be increased latency, although I think that the
actual useful throughput would increase, because packet loss and
retransmission would be dealt with each hop. You would also be able
to use WiFi unicast packets, and thus the full WiFi bandwidth.

You should also take a look at Serval Maps that provides functionality
for nodes to share their geographic location (via Rhizome), and that
could be used in place of adding geo tags to each packet.

I guess overall I am envisaging a solution where Serval Maps provides
the geolocation information, and also possibly the user interface for
choosing which phone you want to see the video from. Then the video
or other content is pulled down via the improved Rhizome that you
would create. By using Rhizome, it doesn't matter if a link drops for
a short while, as the content will be cached on intermediate nodes,
and so it will deliver as soon as it is able.

Anyway, happy to keep thinking through the options with you, and
looking forward to seeing what you create.

We would prefer that you contribute any modifications you make back to
our repo so that everyone can benefit. We have a standard Harmony
Project issued contributor agreement that can facilitate that fairly
painlessly.

Paul.
> --
> You received this message because you are subscribed to the Google Groups
> "Serval Project Developers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/serval-project-developers/-/MgHT2-tr_dcJ.
> To post to this group, send email to
> serval-proje...@googlegroups.com.
> To unsubscribe from this group, send email to
> serval-project-dev...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/serval-project-developers?hl=en.

Jeremy Lakeman

unread,
Nov 27, 2012, 6:00:10 PM11/27/12
to serval-proje...@googlegroups.com
On Wed, Nov 28, 2012 at 6:30 AM, Fraser Cadger <cad...@googlemail.com> wrote:
Close, next_hop always points to our immediate neighbour that we
should forward the payload to. The payloads themselves are almost
always sent in broadcast packets which may contain multiple payloads
addressed to multiple neighbours. Including payloads that should be
flooded across the entire network.

> For the actual routing, from reading the code I'm guessing that the
> 'overlay_route_recalc_node_metrics' function is used to determine whether a
> destination can be reached directly or indirectly, and if indirectly it will
> then assign the appropriate intermediate nodes as next_hop's. Therefore, to
> create or change a route this function is called. Is this correct?
>
>
> In my case, I would like to do things slightly differently. As I am not
> doing end-to-end routing I do not need a list of destinations, instead all I
> want is a list of 1-hop neighbours who can be accessed directly. Then from
> that list I would determine which of these is the most suitable as the next
> hop (obviously in my case this will require other stuff, for instance adding
> GPS coordinates to the packet header and storing this in the subscriber
> field) and forward the packet to that node, and so on until the packet has
> been delivered (or has to be dropped).
>
>
> The main questions I have are:
>
>
> Exactly where is a packet received and the node to which it should be sent
> decided?

See overlay_queue.c / overlay_stuff_packet() where we pull payloads
from our QOS queues and make a final decision about where the payload
should go next.
overlay_payload.c / overlay_frame_append_payload() is where the header
format is written into the packet.
overlay_packetformats.c / packetOkOverlay() is where those payloads
are parsed out of an incoming packet and processed or queued if they
are addressed to this node.

> i.e. if I want to decide which node to forward a packet to where should I
> decide this?
>
> I came across a method called 'overlay_mdp_receive' in mdp_client.c, is this
> maybe what I'm looking for?

That's for connected client applications to send and receive messages.
If you want to pass some extra messages between daemons, the process
is a little different.

You can schedule an alarm to periodically queue a message, or you can
hook into every outgoing packet and write your payload just before
it's sent. eg overlay_tick_interface() is called for each interface
periodically to force an outgoing packet.

You can add a case to process the message in overlay_mdp.c /
overlay_saw_mdp_frame().


> Concerning the subscriber entity, is there an actual table/list/array of
> these – as I can't seem to find one?

There's a tree in memory, built in overlay_address.c as each node is discovered.

> i.e. a list of neighbours/known nodes/destinations?
>
>
> I apologise if my questions and this email aren't very well-worded.
> Essentially what I'm looking for is some advice/guidance on exactly how
> routing (determining intermediate nodes for nodes which cannot be reached
> directly) and forwarding (looking at a received/originated packet and
> determining which node to send it to) is done. As I indicated earlier in
> this message, there are a few functions/structs I have stumbled across that
> I think are relevant and I have made some guesses at what they are doing, so
> I would appreciate if someone could correct/expand on my guesses.

Yes, basically overlay_route.c is currently responsible for path
discovery. These decisions are written into the subscriber structure
so that none of the rest of the code needs to know the internal
details of how these routes were calculated.

I also have plans to implement a different routing protocol. I've been
trying to tease apart our current spaghetti code to separate the
different layers of the implementation as much as possible. This
should enable us to drop in a different path discovery process without
needing to change much of the rest of the daemon.

I've been doing some more work in that direction recently in a
currently private branch. Mainly to tweak the packet format, reducing
header overheads, so we can lock in the network byte format for
forward compatibility. I've still got a couple more changes to
implement though.

Neighbour discovery & link state detection need to be split out from
overlay_route.c and redesigned. But I'm not quite ready to start that
task yet. We don't have any bandwidth throttling yet. If you're
planning to send large volumes of data, any excessive latency added
from buffer bloat can cause havoc with route statistics. The list of
things I'd like to get done before I start replacing the routing layer
never seems to get any shorter.

You should also check out our testing framework, specifically
tests/routing. While the setup process for these tests is a little
verbose, and the file based network simulation is a bit different to
an actual adhoc mesh. It gives you a quick way to tell if you're on
the right track without needing to wander around with phones all the
time.

>
> Any help/guidance I have would be greatly appreciated. It goes without
> saying that any code I develop myself I will happily share, and any
> issues/bugs I come across with Serval will be reported.
>
>
> Thank you for taking the time to read this message, I'm sorry it's a bit on
> the long side but hopefully I've made myself clear.
>
>
> Regards,
>
>
> Fraser
>
>
> Ps. I realise this topic has been covered before, but I think some of the
> questions I am asking in this message are new.
>

Fraser Cadger

unread,
Nov 29, 2012, 3:48:33 PM11/29/12
to serval-proje...@googlegroups.com

Paul,

Thank you very much for your fast reply!

Your suggestion of using Rhizome instead of the MDP overlay is interesting, and definitely worth considering. Although I read a little about Rhizome on the Serval website I don't think I really appreciated it's capabilities. Also, when I installed the release version of Serval from Google Play, selecting Rhizome didn't appear to do anything (I'm not sure if this was an error or because the Rhizome could wasn't fully functioning then). Therefore, when I came to read the Serval code I decided to skip the Rhizome code to save time.

Now it would seem as though this was a mistake, and your description of Rhizome in particular the journal concept sounds very interesting. I have been considering implementing some for of peer-to-peer/distributed approach for the actual streaming, due to the distribute nature of an ad-hoc/mesh network. In the envisaged networks node mobility will be permissible and this in addition to other dynamics calls for distributed approaches. The journal concept could be of great use here, with some tweaking as you mention. Using the p2p idea, I would envisage some form of peer/node selection for participation in the streaming, and then the use of my routing protocol to do the actual protocol. However, I have always intended interaction between the streaming application and routing layers, and so it is possible that the end result will lack the formal separation between overlay and 'underlay' routing found in traditional applications.

I also like your suggestion of using Serval Maps for sharing location. In most geographic routing protocols there is something known as a location service which acts as a mechanism for sharing node locations so that nodes who are not in direct contact with each other can still find their locations. Initially, to test geographic routing, I was just planning on running five phones in a static scenario and checking locations manually, just to test geographic routing was possible. However, as my project is actually concerned with mobility I will eventually need to find a way of allowing nodes to share their locations. The visual aspect of Serval Maps also sounds appealing for, as you say, choosing which nodes you want to see a video. Having a graphical means of selecting nodes for streaming is definitely something we would like.

One last question, I've had a quick look through the Serval code, and am I right in saying all of the Rhizome code is located in the serval-dna folder, or is there other Rhizome code elsewhere?

About contributing code, I am happy to do that via the repository.

Thanks again for your reply.

Fraser

Fraser Cadger

unread,
Nov 29, 2012, 5:58:15 PM11/29/12
to serval-proje...@googlegroups.com

Jeremy,

Thank you for your fast response.

Just to clarify, when you say that the payloads are almost always broadcast packets, does this mean that all data routing is done via broadcasting? So that when a call is being made, all of the VoIP packets are broadcast? None of them are unicast or delivered via multi-hop intermediate nodes? Apologies if I have misunderstood this. If that is the case, then is this a network-wide broadcast? I would assume this is quite traffic intensive, especially if more than one call is taking place at the same time? If it is restricted broadcast, then how are packets delivered to nodes located further away?

I actually looked at overlay_stuff_packet(), and I meant to ask about this, as I wasn't too sure about why it only covered broadcast traffic. However, if what I posted above is correct and all data is broadcast then that would make sense.

I will have a look at the methods/files you pointed out, although I have read through most of them there were obviously certain aspects I missed. Now that you've pointed me in the right direction, I will look again and things should become clearer.

With regards to the mdp_client_recv() function, to confirm when you talk about passing messages between daemons you mean on the same node and not actually sending messages between devices.

Thanks again,

Fraser

Jeremy Lakeman

unread,
Nov 30, 2012, 1:05:55 AM11/30/12
to serval-proje...@googlegroups.com
On Fri, Nov 30, 2012 at 9:28 AM, Fraser Cadger <cad...@googlemail.com> wrote:
> Jeremy,
>
> Thank you for your fast response.
>
> Just to clarify, when you say that the payloads are almost always broadcast
> packets, does this mean that all data routing is done via broadcasting? So
> that when a call is being made, all of the VoIP packets are broadcast? None
> of them are unicast or delivered via multi-hop intermediate nodes? Apologies
> if I have misunderstood this. If that is the case, then is this a
> network-wide broadcast? I would assume this is quite traffic intensive,
> especially if more than one call is taking place at the same time? If it is
> restricted broadcast, then how are packets delivered to nodes located
> further away?

Because we're assuming that servald is running on an adhoc wifi
medium, where at the radio level all packets are effectively broadcast
anyway, but can only be heard by nodes within range. There's only one
wifi medium, it's like having a wired network where multiple devices
are all attached to the same wire.

Unicast wifi packets have an ACK/NACK retry mechanism for additional
reliability. Sometimes when a link is starting to fail, this retry
behaviour actually causes more harm to the rest of the network.
Sending unicast network packets just means that all the other devices
in range are actively ignoring them, it doesn't change the available
bandwidth in any way.

Wifi also has some other overheads for each packet, and a minimum
period of silence before another packet can be transferred. So if
we've got a bunch of small messages to send to multiple neighbours,
why shouldn't we put them all in the same broadcast packet?

Though sometimes broadcast adhoc packets are dropped due to
collisions, and sometimes they just don't work. Some of the android
phones we're trying to support will actively filter them out when the
screen turns off.

And sometimes servald will be running on other kinds of networks where
using broadcast packets all the time is a very bad idea. So we need to
support unicast IPv4 packets as well.

We've essentially implemented our own complete network stack, covering
OSI layers 2-7 (addressing, routing, encryption, ...). That runs in
user space and uses IPv4 broadcast or unicast packets as the
underlying link layer.

So while a voice payload will technically be heard by all immediate
neighbours, only the next hop node will forward it onwards to the
destination for multi-hop communications.

Of course that's completely different to a broadcast mdp payload, that
is manually repeated by each node, on every interface, to ensure that
it reaches every single device across the entire network.

Paul Gardner-Stephen

unread,
Dec 1, 2012, 1:33:13 AM12/1/12
to serval-proje...@googlegroups.com
Hi Fraser,

Yes, you are right, Rhizome is primarily all in serval-dna
(github.com/servalproject/serval-dna), which is pulled in as a
sub-module by batphone.

As for contributing back to the repository, that would be great.

Paul.

Fraser Cadger

unread,
Dec 12, 2012, 1:43:53 PM12/12/12
to serval-proje...@googlegroups.com
Jeremy,

Thank you again for your assistance. Going through your comments and reviewing the code I now feel as though I have a fairly good knowledge of how Serval's routing works. There are a few things I would like to clear up, mostly to do with the advertisement/update messages.

If I've understood things correctly there are three types of such messages; self announce, self announce ack and node announce.

Self announce:

It seems as though self announce is sent by overlay_add_selfanouncement() found in packetformats.c. This method constructs a new frame for the announcement with a TTL of 1, the next hop address is a broadcast address which is generated by the overlay_broadcast_generate_address() function. I'm guessing what this does is create an appropriate broadcast address. The final destination is then set as OA_CODE_PREVIOUS, although I'm not really sure what this is. Is it supposed to be a previous node? I.e. one that we received a message from before? If so, what would it be if this was the first time we were sending and we hadn't heard an advertisement? If next_hop is broadcast address, why not destination (I know that in Serval a packet with a TTL of 1 can still be forwarded, but we could just reset destination if we were doing this)?

overlay_self_announcement() is called itself by overlay_init_packet() in overlay_interface.c. It appears as though this function is designed to create certain types of packets to test an interface. If it is called with the parameter tick not set to zero then overlay_add_self_announcement() is called otherwise a dummy packet is created (this is why I think it might be something to do with testing an interface). Also, I notice that in overlay_init_packet() fields such as dest are filled, but I thought they were filled in overlay_add_selfannouncement() so are they filled and then overwritten, or have I made a mistake? The only place I can see overlay_init_packet() being called with a nonzero value for int tick (i.e. the only time it would call overlay_self_announcement()) is in overlay_tick_interface().  I'm not really sure what this does (apart from calling overlay_init_packet()), but I'm guessing the tick is something do with a CPU tick. I've seen the word tick a few times, so I think this could be some sort of period and when a tick occurs the self advertisements are sent on the interface. Am I right/close? Going further, overlay_tick_interface() is called by overlay_interface_poll() and overlay_dummy_poll(). I'm guessing that these polling methods are used to periodically check the state of an interface, and in doing so send self announcements? 

The last thing I want to check here, is that after overlay_init_packet() is called in overlay_tick_interface() the function overlay_route_add_advertisement() in overlay_advertise.c is called. Looking at this quickly, it seems to me as if this method is a way of attaching and forwarding adverts received from other devices?

Receiving self announce messages seems fairly straightforward. First overlay_route_saw_self_announce() in overlay_route.c is called which in turn calls get_node() the purpose of which is to find or create a node struct for the originator of the self announce message, get_node() itself tries to find or create a subscriber. One thing I don't understand in get_node() is why reachability is set to REACHABLE_NONE here. Surely if we have received a message from that device then it is reachable? If it is connected to use then it would be reachable directly, and if the announcement originated from another node then it would be reachable indirectly and we could set next_hop as the device it came to us via? There is a comment which says 'if we're taking over routing calculations, make sure we invalidate any other calculations first' so does this mean we might have previous metrics about this node, and now we want to delete them and start afresh? This subscriber object is then returned and passed to  overlay_route_get_neighbour_structure() which in turn makes the node a neighbour. Once this has happened the payload (I'm assuming this is other node advertisements and not data) are copied from the frame, and passed along with the frame, sender, interface and neighbour struct to the function overlay_route_ack_selfannounce() which I assume is where the process of sending a self announce ack starts.

Self announce ack:

overlay_route_ack_selfannounce() is called to send acknowledgement messages. If there is no route then broadcast is used, this would seem to overwrite the earlier preference of setting the reachability to REACHABLE_NONE. So I'm guessing what this means is if we do have a route then we send the ack along it, but if we created a new node struct earlier and reachability was set to none (i.e. if this is a new node or its status has changed such as a direct node becoming indirect), we are now changing it to REACHABLE_BROADCAST and sending the packet as a broadcast (albeit with a TTL of 6 which would allow forwarding). Payload/observations are then appended (so what I said earlier about payloads, they could be observations? As well as other adverts being forwarded? Or just one of these).  Finally the ack is placed on the mesh management queue (I skipped a few steps to do with observations).

For receiving a self announce ack, the function overlay_route_saw_selfannounce_ack() in overlay_route.c is called. This function checks if the payloads are valid and then copies them (s1 and s2) along with the interface before calling overlay_route_node_can_hear_me() which is used to find or create a node entry, update the node;s observation, and then update its reachability score. The process for the first step seems similar to receiving a node advertisement except get_node() isn't called (so reachability isn't set to none) with overlay_route_get_neighbour() still being called. Reachability metrics are then calculated/recalculated using overlay_route_recalc_neighbour_metrics.

Node announce:

I'm not really sure exactly where these are sent from. I've seen the functions add_advertisement() and overlay_route_add_advertisements() in overlay_advertise.c, which I think might be involved. I remember overlay_route_add_advertisements() being called earlier on in overlay_tick_interface(), where it is described as advertisement for routes. I also noticed the function overlay_route_please_advertise() in overlay_advertise.c is called by overlay_route_recalc_metrics(). I'm guessing what's going on here is that node announcements are used to distribute details about other nodes we know about (either directly or indirectly), so if node x is connected to y and z it will send advertisements about those nodes. Am I right?

However, I'm not sure exactly what triggers their sending. I can see that overlay_route_add_advertisements is responsible for some of this, and it calls add_advertisement() by way of a callback function. So I'm guessing add_advertisement() creates the actual advertisement and then overlay_route_add_advertisements(), puts it into a frame. If overlay_route_add_advertisements() is called by overlay_tick_interface() does this mean that node advertisements are created every time there is an interface tick, and are appended to the self announcements which are also called by overlay_tick_interface()? I also noticed that there is a function called overlay_route_please_advertise() in overlay_advertise.c, this seems to be some sort of means of prioritising a node being requested over other nodes (presumably it is not always possible to advertise all nodes at the same time).  Now I noticed that above this there is a comment stating that nodes with lower scores are advertised more frequently as part of some form of round robin scheduling, so nodes with scores < 100 always get advertised and so on. Also, overlay_route_please_advertise() is called by overlay_route_recalc_metrics() whenever a node's score has changed substantially.

Receiving a node announcement seems to be handled by overlay_route_saw_advertisement() in overlay_advertise.c which then uses these adverts to update its routing table using the method overlay_route_record_link().

I'm guessing that node announcements are used as a way of both informing other devices about nodes we know about (and who they may or may not know about) and then telling them our information on them has changed. I believe Serval/BATMAN does not have end-to-end routes for destinations, but instead only contains the right 'direction' to forward a packet to a destination not reachable directly. Node announcements are a way of discovering devices we are not in direct contact with, and we then record these nodes and the scores are a means of determining their reachability. Does this then mean that scores are only used for non-direct nodes? Also, I may have simply missed it, but is there any mechanism for determining which next hop is best for a particular destination? I.e. if we know about a destination via two neighbours, would we decide which was best or have more than one route for that destination?

If the above (or some of it) is true, I take it that self announcements are intended as a means of identifying nodes that we are directly connected to; i.e. if a new node moves into our vicinity and we hear it's self announcement we add it to our list of known nodes and ack? I take it self announcements are sent at regular intervals then so that we can see who is still there and remove devices who are no longer within our range? The self announcements seem to be connected to interface ticks, does this mean that ticks are periodic and controlled by some timer? I.e. every x seconds we will perform a tick and send out self announcements (as well as announcements of nodes we are connected to)?

I apologise for the length of this message, I thought about shortening it down so I just had questions but then I decided it would make more sense if I tried to explain my understanding so that you could confirm/deny/expand on this. I also realise I may have asked some questions more than once, as I'm trying to work out both the bigger picture and small details!

Thanks again for the time and effort you have taken to respond, I really appreciate this.

Fraser

Jeremy Lakeman

unread,
Dec 12, 2012, 5:45:17 PM12/12/12
to serval-proje...@googlegroups.com
On Thu, Dec 13, 2012 at 5:13 AM, Fraser Cadger <cad...@googlemail.com> wrote:
> Jeremy,
>
> Thank you again for your assistance. Going through your comments and
> reviewing the code I now feel as though I have a fairly good knowledge of
> how Serval's routing works. There are a few things I would like to clear up,
> mostly to do with the advertisement/update messages.
>
> If I've understood things correctly there are three types of such messages;
> self announce, self announce ack and node announce.
>
> Self announce:
>
> It seems as though self announce is sent by overlay_add_selfanouncement()
> found in packetformats.c. This method constructs a new frame for the
> announcement with a TTL of 1, the next hop address is a broadcast address
> which is generated by the overlay_broadcast_generate_address() function. I'm
> guessing what this does is create an appropriate broadcast address. The
> final destination is then set as OA_CODE_PREVIOUS, although I'm not really
> sure what this is. Is it supposed to be a previous node? I.e. one that we
> received a message from before? If so, what would it be if this was the
> first time we were sending and we hadn't heard an advertisement? If next_hop
> is broadcast address, why not destination (I know that in Serval a packet
> with a TTL of 1 can still be forwarded, but we could just reset destination
> if we were doing this)?

All payloads have four possible addresses attached to them; source,
destination, nexthop and sender
For broadcast payloads both destination and nexthop are set to broadcast.

The packet format you are looking at didn't actually have a way to
specify who sent this packet. So as a temporary measure, without
breaking backwards compatibility, I made sure that every packet had a
self announce as the first payload. with the intention of redesigning
the packet format completely at some point.

Note that the packet format has now been redesigned and shouldn't need
to change again.

> overlay_self_announcement() is called itself by overlay_init_packet() in
> overlay_interface.c. It appears as though this function is designed to
> create certain types of packets to test an interface. If it is called with
> the parameter tick not set to zero then overlay_add_self_announcement() is
> called otherwise a dummy packet is created (this is why I think it might be
> something to do with testing an interface). Also, I notice that in
> overlay_init_packet() fields such as dest are filled, but I thought they
> were filled in overlay_add_selfannouncement() so are they filled and then
> overwritten, or have I made a mistake? The only place I can see
> overlay_init_packet() being called with a nonzero value for int tick (i.e.
> the only time it would call overlay_self_announcement()) is in
> overlay_tick_interface(). I'm not really sure what this does (apart from
> calling overlay_init_packet()), but I'm guessing the tick is something do
> with a CPU tick. I've seen the word tick a few times, so I think this could
> be some sort of period and when a tick occurs the self advertisements are
> sent on the interface. Am I right/close? Going further,
> overlay_tick_interface() is called by overlay_interface_poll() and
> overlay_dummy_poll(). I'm guessing that these polling methods are used to
> periodically check the state of an interface, and in doing so send self
> announcements?

Ticking is the interval that we send a packet to tell other neighbours
that we exist.

> The last thing I want to check here, is that after overlay_init_packet() is
> called in overlay_tick_interface() the function
> overlay_route_add_advertisement() in overlay_advertise.c is called. Looking
> at this quickly, it seems to me as if this method is a way of attaching and
> forwarding adverts received from other devices?

Correct, this announcement process is based loosely on the BATMAN
routing protocol.

> Receiving self announce messages seems fairly straightforward. First
> overlay_route_saw_self_announce() in overlay_route.c is called which in turn
> calls get_node() the purpose of which is to find or create a node struct for
> the originator of the self announce message, get_node() itself tries to find
> or create a subscriber. One thing I don't understand in get_node() is why
> reachability is set to REACHABLE_NONE here. Surely if we have received a
> message from that device then it is reachable? If it is connected to use
> then it would be reachable directly, and if the announcement originated from
> another node then it would be reachable indirectly and we could set next_hop
> as the device it came to us via? There is a comment which says 'if we're
> taking over routing calculations, make sure we invalidate any other
> calculations first' so does this mean we might have previous metrics about
> this node, and now we want to delete them and start afresh?

This is another hopefully temporary hack. I've been adding the
capability to communicate with other nodes via unicast IPv4 packets.
Or on interfaces that we aren't running our routing engine on.
But as I intend to replace the existing routing code entirely, I
haven't tried to make it work with these new unicast links.

I'm going to build a more complete layer for discovering and managing
all links between neighbours, this will need to handle cases where
either broadcast or unicast packets fail to arrive.
Once that's done it will be easier to experiment with different
approaches for discovering network paths, as there will be a clear
separation between the two layers and it should be possible to use any
link type in the routing table.

> This subscriber
> object is then returned and passed to
> overlay_route_get_neighbour_structure() which in turn makes the node a
> neighbour. Once this has happened the payload (I'm assuming this is other
> node advertisements and not data) are copied from the frame, and passed
> along with the frame, sender, interface and neighbour struct to the function
> overlay_route_ack_selfannounce() which I assume is where the process of
> sending a self announce ack starts.
>
> Self announce ack:
>
> overlay_route_ack_selfannounce() is called to send acknowledgement messages.
> If there is no route then broadcast is used, this would seem to overwrite
> the earlier preference of setting the reachability to REACHABLE_NONE. So I'm
> guessing what this means is if we do have a route then we send the ack along
> it, but if we created a new node struct earlier and reachability was set to
> none (i.e. if this is a new node or its status has changed such as a direct
> node becoming indirect), we are now changing it to REACHABLE_BROADCAST and
> sending the packet as a broadcast (albeit with a TTL of 6 which would allow
> forwarding). Payload/observations are then appended (so what I said earlier
> about payloads, they could be observations? As well as other adverts being
> forwarded? Or just one of these). Finally the ack is placed on the mesh
> management queue (I skipped a few steps to do with observations).

With the packet format re-write, the reachability is now set to
REACHABLE_BROADCAST | REACHABLE_ASSUMED. Eg, try sending payloads via
a broadcast packet, but we don't really know if it will work.
Yeah, none of that prioritisation actually works though.

> Receiving a node announcement seems to be handled by
> overlay_route_saw_advertisement() in overlay_advertise.c which then uses
> these adverts to update its routing table using the method
> overlay_route_record_link().
>
> I'm guessing that node announcements are used as a way of both informing
> other devices about nodes we know about (and who they may or may not know
> about) and then telling them our information on them has changed. I believe
> Serval/BATMAN does not have end-to-end routes for destinations, but instead
> only contains the right 'direction' to forward a packet to a destination not
> reachable directly. Node announcements are a way of discovering devices we
> are not in direct contact with, and we then record these nodes and the
> scores are a means of determining their reachability. Does this then mean
> that scores are only used for non-direct nodes? Also, I may have simply
> missed it, but is there any mechanism for determining which next hop is best
> for a particular destination? I.e. if we know about a destination via two
> neighbours, would we decide which was best or have more than one route for
> that destination?

Should be the shortest, highest score.

> If the above (or some of it) is true, I take it that self announcements are
> intended as a means of identifying nodes that we are directly connected to;
> i.e. if a new node moves into our vicinity and we hear it's self
> announcement we add it to our list of known nodes and ack? I take it self
> announcements are sent at regular intervals then so that we can see who is
> still there and remove devices who are no longer within our range? The self
> announcements seem to be connected to interface ticks, does this mean that
> ticks are periodic and controlled by some timer? I.e. every x seconds we
> will perform a tick and send out self announcements (as well as
> announcements of nodes we are connected to)?

Yep, interface->tick_ms is the interval.

> I apologise for the length of this message, I thought about shortening it
> down so I just had questions but then I decided it would make more sense if
> I tried to explain my understanding so that you could confirm/deny/expand on
> this. I also realise I may have asked some questions more than once, as I'm
> trying to work out both the bigger picture and small details!
>
> Thanks again for the time and effort you have taken to respond, I really
> appreciate this.
>
> Fraser

I'd much rather help someone who can demonstrate that they've tried to
understand the problem on their own.

Jeremy Lakeman

unread,
Dec 13, 2012, 2:12:02 AM12/13/12
to serval-proje...@googlegroups.com
> I'd much rather help someone who can demonstrate that they've tried to
> understand the problem on their own.

I suppose I should clarify, just in case you take that sentence the
wrong way. It's very encouraging that you are showing an interest in
understanding our software for yourself. It can be very frustrating
working with people who fail to demonstrate any independent progress.

Feel free to shoot off an email with any difficult unanswered
questions you have at the end of each day. With timezone differences
it's likely we can answer them before you start in the morning. I
don't mind taking half an hour here or there to solve a problem that
might take you a couple of days to solve otherwise.

Also note that our software has been undergoing quite rapid changes
lately. Some of the research and understanding you have gained may
already be obsolete.

If you can let me know some more details about your previous routing
experiments, I'll probably be able to refactor the interfaces to our
existing routing layer to make it easier for you to make the changes
you need. I'm intending to replace the routing layer eventually
anyway, so this will certainly be a productive use of my time.

Fraser Cadger

unread,
Dec 13, 2012, 2:27:35 PM12/13/12
to serval-proje...@googlegroups.com
Jeremy,

Thanks again for your help. I appreciate that as you said, it can be difficult trying to help people understand your own software. I decided to send my initial message because I had tried reading through the code several times, picking up bits and pieces, but being unsure about exactly how things fitted together. I genuinely found your previous comments very illuminating and useful, and when I compared your comments with the code a great deal of it made sense and I then started thinking about ways of implementing my own routing. In fact, I produced a short report for my supervisor based on this and detailing the steps I intended to take to implement our routing protocol on Serval. I was about to dive in with this when I realised that I did not really know how the update process worked and that this was a very important factor! So I looked at the code again and tried to work my way through, making notes, those notes were then used in my last post. As you said in your comment it is better to work with people who are making independent progress, so that was why I tried to explain my understanding instead of jumping straight in with questions. Actually, I have been meaning to check back with the current version of the software in repository, as I noticed a few functions were in different locations and you mentioned that the self announce had been redesigned, so this is something I'll look at.

Ok, so I'll now try to explain what I intend to do with routing.

Our overall aim is to develop a protocol called Geographic QoS Predictive Routing (GQPR) which uses geographic/location/mobility information as well as other context to make QoS predictions for neighbours, these predictions are then used as the basis for forwarding decisions. GQPR is intended to act as the routing element in a framework called Geographic QoS Peer-to-peer Streaming (GQP2PS) which will run on WiFi devices (at the moment a testbed of six Android phones + my own Android tablet) and facilitate the streaming of both live (i.e. video call) and on-demand (i.e. recorded video) between the devices. The intended use case is for disaster recovery scenarios, for instance someone with basic medical/first aid training comes across an injured person but is not exactly sure what course of action to take, so they can initiate a video call with a doctor located at base station (or anywhere else covered by the network) who is able to view the patient and perform a diagnosis and if necessary supervise the treatment. This is just on scenario, it could also be used for repairing infrastructure where an engineer supervises and guides the repair process. Indeed, the technology could be used for many purposes unrelated to disaster recovery, but we are focussing on this to give us a specific aim, although the ideas for the technology actually came before the idea for application.

What we have at present is the building blocks of GQPR which we have called GPR - Geographic Predictive Routing - GPR contains some of the main elements of what will become GQPR but not all of the functionality. At present GPR uses location predictions (provided by an Artificial Neural Network) as well as some other factors (congestion level, radio range, a metric we'd developed called neighbour range (this relates to the positions of a neighbour's neighbours), amongst others to make forwarding decisions. I think explaining GPR would make the most sense if I first explained how basic geographic routing worked and then discussed our modifications to it. As the name implies, geographic routing makes forwarding decisions based on node's geographic locations instead of logical addresses. The most basic form of geographic routing is greedy geographic routing.

Greedy geographic routing works as follows:
  • A packet is received, the node inspects the packet and records the location of the destination
  • The node then calculates the physical distance between itself and the destination
  • All of the node's neighbours are then compared against this distance
  • The one with the shortest distance to the destination is selected as the next hop
Greedy geographic routing does not create end-to-end routes (which is why some people prefer the term forwarding, although the terms are used interchangeably in literature), instead nodes are forwarded on a per-hop basis using the greedy criterion. This does seem a bit similar to BATMAN, where packets are also forwarded on a per-hop basis, however the big difference is that in greedy geographic routing nodes only have knowledge about their directly connected neighbours (i.e. those within their radio range) and they do not know about destinations. So while BATMAN nodes know of the destination node's existence (but only know the next hop to reach it and not a full route), greedy geographic routing nodes know nothing about the destination (aside from its address and location) and simply make the forwarding decision based on which neighbour is closest to that destination. The idea being that by shortening the geographic distance at each hop we are finding the quickest way to reach a destination. 

Obviously this does not always hold true in practice, and their is a problem known as the local maximum where nodes are forced to drop packets if they cannot find a neighbour closer to the destination than themselves (to prevent loops, the packet cannot travel physically backwards). However, greedy geographic routing is relatively lightweight, and highly localised as nodes are only required to know their directly connected neighbours and so are not affected by topology changes at the other side of the network (although this does mean nodes will sometimes forward packets to unreachable destinations). There have been quite a few modifications and variants to standard greedy geographic routing, some of which are more similar to conventional routing protocols by establishing end-to-end routes (but using geographic information to do so), while others focus on particular problems (QoS, energy consumption, security, etc.) I spent a large period of my first year surveying different geographic routing protocols and published this as a paper if you're interested in finding more about geographic routing (shameless plug for my own paper!!!); http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6238283&tag=1 .

Getting back to basic greedy routing, obviously nodes need a way of discovering their immediate neighbours as well as finding the locations of other nodes who they are not directly connected to. The first is achieved through the sending of beacon messages at set periods of time, beacons are just your bog standard hello-type messages except they contain the coordinates of the sending node. Neighbours are stored in a dedicated neighbour table along with their coordinates, this neighbour table is the only table nodes store. The second is more difficult, but typically involves the use of something called a location service which usually involves certain nodes being designated as location servers. Regular nodes periodically send location updates to these servers and when a node needs to send a packet to a destination it is not connected to it will query the relevant location server for that node's destination. Exactly how the location service works depends on the particular scheme being used.

So now to talk about GPR.

Structurally GPR is very similar to classical greedy geographic routing, it was originally implemented on top of the code developed by Karp and Kung for their protocol known as GPSR (Geographic Perimeter Stateless Routing). What we did was first implement a neural network algorithm which predicts the future location of a device based on two previous coordinates, and the times they were recorded at. Tests found that the location prediction algorithm was able to accurately predict the future locations of neighbouring devices with an error of less than 1m in most instances. At present the use of the location prediction algorithm is relatively simple, all we are doing is looking at our neighbours two previous locations (and timestamps) and using these to determine what position the neighbour will be in when we send the packet. This allows us to use greedy geographic routing, but with the ability to cope with changes in location. In order to avoid sending packets to nodes that are too far away, we always check their predicted location against our transmission range to ensure they are within it. So in addition to using the location predictions we developed a modified congestion control algorithm which weighs the neighbour's congestion level against the distance between the neighbour and other factors such as the reliability of the node's information. The calculation is stored as a value t, and then we evaluate the neighbour's neighbour range (i.e. the physical range covered by its own neighbours), and if that neighbour is the best it becomes the candidate node and the other node's values for t are compare against it, and so until we finish the loop and end up with the best neighbour (or none in which case we'll drop the packet).

Another factor is that because we are using location predictions we do not need node updates as frequently as geographic routing normally dictates. So while ordinary geographic routing protocols usually send these messages over 0.5s we send them every 13.6s which reduces the amount of control traffic. This is a number based on ns-2 experimentation and it is likely that in real wireless network it will need to be more frequent, but we still expect it to be a lot less frequent than the conventional 0.5s.

It might not seem like a huge modification, but it performs a lot better than unmodified greedy geographic routing in simulations of video calling and streaming traffic, and compares well with AODV, DSR, and DSDV. If you look at the bulletin points I typed above to describe greedy geographic routing, it is very similar except our means of calculating the best next hop is slightly different.

As I said earlier, I had thought about and discussed with my supervisor how we could go about implementing GPR in Serval. We both decided that it was best to implement basic greedy geographic routing first and then work from there. There is a possibility that the NN algorithm we used in simulations will be too resource-intensive for the phones we are using and will either need to be modified or replaced with a different approach (we have some backups), so our plan is to implement basic geographic routing first, conduct separate experiments of the NN algorithm on the phones, and if everything goes well combine them, build GPR, and then continue working on the phones and simulations to create GQPR. 

In terms of implementing basic greedy geographic routing, I think making the forwarding decisions could be quite simple. When we are in overlay_stuff_packet(), instead of looking at the destination's next_hop field we would call a function which would loop through our list of neighbours and find the neighbour closest to the destination and select that as the next hop for the packet (or drop it if we can't find a suitable next hop). So that part seems quite simple.

However, what seems a bit more complicated is implementing the beaconing messages. I can see two ways of doing this; keep the current Serval system of self announcements, self announcement acks, and node announcements and simply add locations to these messages. The other would be to get rid of node announcements, and just use self announcement and self announcement acks to transmit location information between directly connected nodes. This actually brings me to a question I forgot to ask; if self announce messages are regularly sent why do we need to ack these? If we are sending our own acks and receiving acks from other neighbours do we need an explicit reply? I.e. if we just send self announce messages, but after a period of time we don't get any self announce messages from neighour x, we remove them from our list of neighbours. Slight digression.

Both of the two approaches I described have their advantages; the first is ostensibly simpler as we don't need to stop anything being sent, while the second means that we are getting something closer to the 'traditional' beacon approach used in geographic routing as well as avoiding the transmission of node announcements and the recording of indirect nodes which we will just ignore. On the other hand, if we keep node announcements this could actually act as a 'surrogate' for the location service if we include their locations. So if for instance we have three nodes; a, b, and with b being connected to a and c, a only being connected to b, and c only being connected to c. If b advertises c to a and a to c, then a is able to know the location of c (and vice versa) despite not being connected and without the need for a location service (obviously this is a trivial example as in this instance a can only reach c via b, so b is not really compared against anyone else). 

I realise that Serval's implementation of BATMAN is intended to limit the number of nodes a particular device knows about to avoid bandwidth going out of control, but as we only have a testbed of six this shouldn't be a problem for our experiments. Although we would obviously need to rethink for future uses which involve larger networks. So I imagine this would really be a temporary way of avoiding a location service, and in the long run we would still try to implement an explicit location service (possibly based on Serval Maps which Paul pointed me in the direction of). However, I did discuss the idea of retaining some BATMAN elements in our implementation (where we have some knowledge of nodes we are not directly connected to), instead of going for a pure greedy geographic routing approach (where we only know about our one-hop neighbours) as we are by no means obliged to use a particular approach and our ultimate aim is performance. Therefore it is possible that this will lead to us developing a hybrid approach between GPR/GQPR and BATMAN.

However, I think for now the best approach might be for us to try both messages of messaging and see how they perform. I'm meeting my supervisor tomorrow, and I'm going to discuss this with him. I've read through your comments fully, so I think I should be able to start planning how I will modify Serval announcement/advertisement messages and then try to get something running next week. Unfortunately I am going back home for the Christmas holidays, and I will only have access to an Android tablet and a Windows laptop (which belongs to someone else!), so I'm not sure how much work I'll get done until January, but if I can get Android NDK up and running on the laptop I might be able to get some routing work done.

Hopefully, the explanation of my work makes sense (or enough sense), and as I indicated before I am happy to share any code I develop myself (I have spoken about this with Paul). So in addition to developing my routing protocol I intend to work on video calling, and Paul stated this was something the Serval team wanted to include, so if I get that working I am more than happy to put the code in the repo.

Regards,

Fraser

Paul Gardner-Stephen

unread,
Dec 13, 2012, 3:33:55 PM12/13/12
to serval-proje...@googlegroups.com
Hi Fraser,

It is great to see the depth of your explorations and thinking on all
this; most refreshing.

I am sure that Jeremy will have more input, but a few things that come to mind:

1. Using location-enabled routing is desirable, but there may be
instances where people don't want their location to be advertised, and
so whatever solution you create needs to be compatible with
non-location-based routing if it is to eventually end up in the Serval
source. I don't see this as a big problem, but it is probably
sensible to think about how you implement your schemes with that in
mind. For example, it probably suggests adding location as an
optional element to the self-announcement packets rather than doing
away with the self-announcement packets.

2. The local maxima / routing loop problem can be done away with if
you are performing directed routing of Rhizome bundles. This means
that simple greedy geographic routing can probably be fairly well
applied to Rhizome to help push bundles in the right direction, but
relying on the Rhizome semantics that mean that it is not a problem if
a bundle is offered to a node that has already seen it. Similarly,
bundles will "leak" out of peninsulas by the normal "brownian motion"
of undirected Rhizome manifest announcements. Thus there might end up
being two geographic routing schemes; one for Rhizome that is simpler,
and one for MDP traffic that is more complex.

Paul.

Jeremy Lakeman

unread,
Dec 13, 2012, 4:42:46 PM12/13/12
to serval-proje...@googlegroups.com
The real world isn't a flat 2d plain with no obstructions. But with
ACK's, NACK's and re-transmission it shouldn't matter too much if we
try to send a payload to a node that's now out of range. It should be
possible to detect broken network paths lazily with minimal impact to
throughput and latency. Plus when the network is busy, every data
packet can act as a beacon without wasting bandwidth.

> It might not seem like a huge modification, but it performs a lot better
> than unmodified greedy geographic routing in simulations of video calling
> and streaming traffic, and compares well with AODV, DSR, and DSDV. If you
> look at the bulletin points I typed above to describe greedy geographic
> routing, it is very similar except our means of calculating the best next
> hop is slightly different.
>
> As I said earlier, I had thought about and discussed with my supervisor how
> we could go about implementing GPR in Serval. We both decided that it was
> best to implement basic greedy geographic routing first and then work from
> there. There is a possibility that the NN algorithm we used in simulations
> will be too resource-intensive for the phones we are using and will either
> need to be modified or replaced with a different approach (we have some
> backups), so our plan is to implement basic geographic routing first,
> conduct separate experiments of the NN algorithm on the phones, and if
> everything goes well combine them, build GPR, and then continue working on
> the phones and simulations to create GQPR.
>
> In terms of implementing basic greedy geographic routing, I think making the
> forwarding decisions could be quite simple. When we are in
> overlay_stuff_packet(), instead of looking at the destination's next_hop
> field we would call a function which would loop through our list of
> neighbours and find the neighbour closest to the destination and select that
> as the next hop for the packet (or drop it if we can't find a suitable next
> hop). So that part seems quite simple.

How to you communicate the current geographical location of the destination?
How does the node that creates the payload work this out?
For that matter, how are we going to communicate the device's location
to the servald daemon?
What about devices that don't have a working location service? Or they
don't currently have the available power to run one?
But don't let that stop you from experimenting.

> However, what seems a bit more complicated is implementing the beaconing
> messages. I can see two ways of doing this; keep the current Serval system
> of self announcements, self announcement acks, and node announcements and
> simply add locations to these messages. The other would be to get rid of
> node announcements, and just use self announcement and self announcement
> acks to transmit location information between directly connected nodes.

I would recommend you keep the existing messages and formats, and add
new message types that are sent at the same time, in the same packet.

Firstly, if the existing messages change, you don't need to maintain
your forked version. Secondly, network nodes that aren't implementing
your routing protocol will still be able to communicate in other ways.

Which is an important point. I want to make it easy to use servald as
a platform for future routing experiments such as this. We should aim
to build a single generic payload format and internal memory format
for tracking links to neighbours, while allowing the storage and
communication of extra protocol specific link information. It may be
useful to forward this protocol specific data even if we don't
understand it.

I think it's reasonable for the core of servald to build a map of all
2-hop neighbours, regardless of the routing approach being used. This
should allow us to limit unnecessary repetition of messages that need
to be flooded to all nodes, similar to olsr's MPR selection.

> This
> actually brings me to a question I forgot to ask; if self announce messages
> are regularly sent why do we need to ack these? If we are sending our own
> acks and receiving acks from other neighbours do we need an explicit reply?
> I.e. if we just send self announce messages, but after a period of time we
> don't get any self announce messages from neighour x, we remove them from
> our list of neighbours. Slight digression.

We can't assume all network links are symmetrical. At the physical
layer, a high powered transmission can be heard by nodes in a large
area, but that doesn't mean they can all transmit with enough power to
send a packet in the other direction. Localised interference can also
be a significant factor. The adhoc wifi standard has a number of flaws
that can prevent packets being delivered. Just because you can hear a
broadcast packet, doesn't mean you could receive a unicast one and
vice versa. And since Android doesn't support adhoc mode specifically,
it isn't tested at all by device manufacturers. And most android
devices deliberately drop all broadcast traffic when they enter power
saving modes, eg when the display powers off. The real world is a very
messy place.

Jeremy Lakeman

unread,
Dec 13, 2012, 7:13:28 PM12/13/12
to serval-proje...@googlegroups.com
On Fri, Dec 14, 2012 at 5:57 AM, Fraser Cadger <cad...@googlemail.com> wrote:
> Unfortunately I am going back home for the Christmas holidays, and I will
> only have access to an Android tablet and a Windows laptop (which belongs to
> someone else!), so I'm not sure how much work I'll get done until January,
> but if I can get Android NDK up and running on the laptop I might be able to
> get some routing work done.

While I think of it, you should be able to make significant progress
without deploying to android at all.
I'd recommend that you add a servald command line function to send a
location update to the daemon.
And setup as a few simple synthetic tests using our bash based testing
framework so you can confirm basic operation, see for example
tests/routing.
The lifecycle for testing on android is very time consuming; build,
deploy, wait for GPS lock, move devices around, wait, collect devices,
collect log files, examine what went wrong....
Try to limit deploying android to real world confirmation of what you
have already tested under ideal conditions.

Fraser Cadger

unread,
Dec 14, 2012, 8:54:55 AM12/14/12
to serval-proje...@googlegroups.com
I should have perhaps made it more clear that when we are using location prediction to predict nodes outside of range it is primarily intended as an improvement for geographic routing. In geographic routing a dropped packet is pretty much unrecoverable, and more so there is no real means of the sender finding out packets have been dropped (via notification by the dropping node) so it may be a while before the sender finds this out. So in a scenario where a lot of packets are being sent to a destination who is unreachable the sender may send many packets and waste bandwidth and energy, obviously this depends on the application as interactive applications such as VoIP will usually terminate the stream if they do not receive any packets from the source. However, in an ad-hoc network we obviously want to avoid every single unnecessary packet.

That said, using location prediction and radio range comparisons is really only one small us for location predictions. In fact, the uses we have currently found for location predictions are really just scratching the surface. Our ultimate goal is to build a context-aware protocol, as mobility and location are such a huge factor in ad-hoc networks we have placed priority on this area, but even then all we are really doing is predicting where a node will be in the future. This has been useful, and will continue to be useful as I have some more ideas for this, but so far we have primarily been using location and not mobility as such. This is something we need to look at in the future if we are to achieve our aims.

Typically when we were simulating we always used UDP traffic so there were no acks or resends involved, this was always the approach we intended to use for the real world as well, so essentially every packet is important and we must do our best to get it delivered (although this is generally not possible and some packet drop is understandable, but it should be minimised as much as possible).

Actually, one I idea I just had is that if a packet is sent to an unavailable node or is dropped by an available node experiencing high congestion, if other nodes overhear the packet then they could forward it instead. This would be particularly applicable to Serval which broadcasts multiple Serval packets inside a single UDP packet, so if a node somehow *knew* that the intended next hop was unavailable we could forward the packets intended for them (and communicate to the previous hop/sending node that their intended next hop was unavailable). Exactly how we would know this (bearing in mind that the previous node obviously didn't know this) I'm not too sure, obviously we wouldn't want 'false alarm' situations where we think the next hop is unavailable forward the packet, and it turns out that the intended next hop is available and has forwarded the packet as well, and so on. However, I think if we do some calculation based on location as well as signal, it could be possible to identify (with *reasonable* accuracy) instances in which we have overheard a packet intended for a node no longer available and take over forwarding.
Typically (but not in every case) a node will consult the location server for the location of the destination prior to sending. If the location server queried does not know the destination then it will send the request to another location server. Just to clarify, when I use the term location service I mean a way of sharing device locations and not a system such as GPS or Galileo for finding your own location. The exact implementation of a location various, some assume the presence of static/infrastructure nodes who are distributed evenly and these nodes act as the servers covering particular geographic sectors. Most don't assume this, and instead assign nodes as location servers (based on their attributes, election, or some other means - often similar to cluster heads in wireless sensor networks) to cover particular areas. Obviously there are problems to this such as a location server going down, but most have a secondary (or several secondary) servers ready to take their place. Another issue is what happens if a server moves out of the area they are supposed to be covering? Again, depending on the system they should have another device to distribute the locations to. Other approaches try to avoid this by using a purely distributed method, but this still requires the need to find the relevant node hosting the location data.

In reality, many papers within the area of geographic routing avoid the question of how a location service will work and (as I said before) simply pull the destinations coordinates directly from the simulator. Implementations of geographic routing have tried to resolve this by using a flooding protocol, but this is obviously inefficient and isn't something we want to do. Similarly, many implementations have been for static networks, and nodes were manually informed of other nodes coordinates. Again, this is not an approach open to us as we are looking at mobility. For our initial experiments we intend to use fake coordinates hard-coded into the nodes, and experiment with only static scenarios to confirm that geographic routing is working in principle. Then we will look at real GPS coordinates in static scenarios, and when this is working we will look at the issue of discovering destination locations.

I am not 100% sure exactly how we will do this. I originally envisaged adapting one of the location servers available in literature (the code for some of them is available as C++ code for ns-2, so we would simulate first, find the best and then lift the code) for implementation on the devices. However, as I mentioned in my last post I am now strongly considering using Serval in some way. Paul mentioned the possibility of Serval Maps and while I have only had a surface inspection of this, it is a possibility. Similarly, as I indicated in my last post I am seriously considering the possible use of node announcements as a means of distributing location between nodes who are not directly connected. I think that would definitely be something worth considering, however as the Serval implementation limits the number of nodes we know about this is obviously not a permanent solution.

As for getting the coordinates into servald, this could be slightly tricky but I don't think impossible. Although Android does not provide a C library which will allow NDK code to directly access the GPS device there have been several successful attempts at doing this. One possible approach would be to write a C wrapper for using the Android Locationmanager library, another could be to pass messages between the Java portion of Serval and the C portion, while a more convoluted (and probably slower) approach would be for the Java part to write GPS locations to a text file which would then be read by servald.

> However, what seems a bit more complicated is implementing the beaconing
> messages. I can see two ways of doing this; keep the current Serval system
> of self announcements, self announcement acks, and node announcements and
> simply add locations to these messages. The other would be to get rid of
> node announcements, and just use self announcement and self announcement
> acks to transmit location information between directly connected nodes.

I would recommend you keep the existing messages and formats, and add
new message types that are sent at the same time, in the same packet.

Firstly, if the existing messages change, you don't need to maintain
your forked version. Secondly, network nodes that aren't implementing
your routing protocol will still be able to communicate in other ways.

This makes sense. Although I'm obviously aware that Serval places its own packets inside an UDP packet for the actual transmission, I somehow keep treating these packets as though they are independent packs being unicast. That is why I didn't consider the possibility of creating my own type of broadcast packet and sending it alongside the existing packets. I'm curious though, would it not make more sense to just add fields for coordinates to the existing packets? I would only need two fields, one for longitude and another for latitude so they shouldn't take up too much space. The only downside I could see to this is if the packet format change considerably (as you mention), but if I am only adding two new fields would I really need a new packet type?
I think the use of acks makes a bit more sense now. I was probably just used to the conventional hello-type messages sent by most ad-hoc routing protocols. Yes, uni-directionality can be a problem and it doesn't make sense to assume that because we can hear a neighbour they can hear us. So I can see how acks would help avoid uni-directional links being misunderstood as bi-directional. As I said earlier, I am strongly leaning in favour of keeping the existing Serval announcement system and adding my stuff to this. 

Fraser Cadger

unread,
Dec 14, 2012, 9:17:10 AM12/14/12
to serval-proje...@googlegroups.com
On 13 December 2012 20:33, Paul Gardner-Stephen <pa...@servalproject.org> wrote:
Hi Fraser,

It is great to see the depth of your explorations and thinking on all
this; most refreshing.

I am sure that Jeremy will have more input, but a few things that come to mind:

1. Using location-enabled routing is desirable, but there may be
instances where people don't want their location to be advertised, and
so whatever solution you create needs to be compatible with
non-location-based routing if it is to eventually end up in the Serval
source.  I don't see this as a big problem, but it is probably
sensible to think about how you implement your schemes with that in
mind.  For example, it probably suggests adding location as an
optional element to the self-announcement packets rather than doing
away with the self-announcement packets.

For our specific project we are intending to develop a solution that would be used by organisations responding to a disaster. The idea being that the people using the network would be either professionals (firefighters, paramedics, etc.) or volunteers, the equipment they would use would be provided for them and they would be working under the auspices of a specific agency. Therefore, in this case sharing location would be part of the job they are doing and would be expected, as the devices would not be their own and as they would only be using them for the disaster management work, it is not anticipated that sharing location would be a significant ethical issue. However, as I said in another message I think this technology could be of use for other applications, and these may bring about some ethical objections to location sharing.

This also touches on another issue, where our protocol would currently cease to work or not function properly if no location information was available. If previous location information was available it could be used, but it may be out of date and unreliable, however if no location information is available then the protocol will not work. Generally GPS is available, but there problems particularly in mobile phone GPS hardware that can cause issues. Similarly, although we are intending to use the system outdoors it does make sense to consider indoor applications as well.

The above seems like a case for including some elements of BATMAN in our proposed protocol, so that if no location information is available we can still forward packets using a more conventional mechanism.

2. The local maxima / routing loop problem can be done away with if
you are performing directed routing of Rhizome bundles.  This means
that simple greedy geographic routing can probably be fairly well
applied to Rhizome to help push bundles in the right direction, but
relying on the Rhizome semantics that mean that it is not a problem if
a bundle is offered to a node that has already seen it.  Similarly,
bundles will "leak" out of peninsulas by the normal "brownian motion"
of undirected Rhizome manifest announcements.  Thus there might end up
being two geographic routing schemes; one for Rhizome that is simpler,
and one for MDP traffic that is more complex.

I had actually been meaning to send another message about Rhizome for some time. I did some research into the code and I also came across a blog post by yourself discussing your tests with Rhizome implemented over MDP; http://servalpaul.blogspot.co.uk/2012/12/rhizome-over-mdp.html . I find the concept of the Rhizome journalling quite interesting, and I think it would definitely be relevant for the sharing of on-demand (recorded files), however I am not sure if it would work for the interactive video calling/VoIP traffic, but I must admit I have not looked at Rhizome in much detail so I could be wrong. Actually one question I have, leading on from your blog post; is Rhizome intended to be implemented over MDP? If so, aside from the journal concept of a file being a version which can be updated, and the use of store and forward, is there much of a difference between Rhizome routing and MDP routing?

Thanks again,

Fraser 

Jeremy Lakeman

unread,
Dec 16, 2012, 5:52:50 PM12/16/12
to serval-proje...@googlegroups.com
On Sat, Dec 15, 2012 at 12:24 AM, Fraser Cadger <cad...@googlemail.com> wrote:
>
> Actually, one I idea I just had is that if a packet is sent to an
> unavailable node or is dropped by an available node experiencing high
> congestion, if other nodes overhear the packet then they could forward it
> instead. This would be particularly applicable to Serval which broadcasts
> multiple Serval packets inside a single UDP packet, so if a node somehow
> *knew* that the intended next hop was unavailable we could forward the
> packets intended for them (and communicate to the previous hop/sending node
> that their intended next hop was unavailable). Exactly how we would know
> this (bearing in mind that the previous node obviously didn't know this) I'm
> not too sure, obviously we wouldn't want 'false alarm' situations where we
> think the next hop is unavailable forward the packet, and it turns out that
> the intended next hop is available and has forwarded the packet as well, and
> so on. However, I think if we do some calculation based on location as well
> as signal, it could be possible to identify (with *reasonable* accuracy)
> instances in which we have overheard a packet intended for a node no longer
> available and take over forwarding.

Tricky, but probably not very useful. Available bandwidth on wifi is
actually quite high, with quite low latency. Making a routing decision
in one place is going to be much easier to implement, leading to less
overheads.
The most I would suggest is an early NACK, eg "I don't think Node N
can hear you any more". The original transmitter can confirm that
information and retransmit the payload.

> As for getting the coordinates into servald, this could be slightly tricky
> but I don't think impossible. Although Android does not provide a C library
> which will allow NDK code to directly access the GPS device there have been
> several successful attempts at doing this. One possible approach would be to
> write a C wrapper for using the Android Locationmanager library, another
> could be to pass messages between the Java portion of Serval and the C
> portion, while a more convoluted (and probably slower) approach would be for
> the Java part to write GPS locations to a text file which would then be read
> by servald.

I would add a "command line" function that sends an mdp frame to the
daemon. This can easily be called from Java as we already have a jni
wrapper on this interface.

> This makes sense. Although I'm obviously aware that Serval places its own
> packets inside an UDP packet for the actual transmission, I somehow keep
> treating these packets as though they are independent packs being unicast.
> That is why I didn't consider the possibility of creating my own type of
> broadcast packet and sending it alongside the existing packets. I'm curious
> though, would it not make more sense to just add fields for coordinates to
> the existing packets? I would only need two fields, one for longitude and
> another for latitude so they shouldn't take up too much space. The only
> downside I could see to this is if the packet format change considerably (as
> you mention), but if I am only adding two new fields would I really need a
> new packet type?

The biggest reasons?
Encapsulation - keep all of your new code as separate as possible from
everything else. As I've said before, I want to experiment with other
kinds of routing engines. We could potentially run more than one path
discovery process at the same time while evaluating them.
Maintenance - it will be easier to merge our latest version if you
only need to insert a couple of method calls here and there.
Compatibility - will your new payload format break nodes running older
or newer versions of the software?

Jeremy Lakeman

unread,
Dec 16, 2012, 5:58:42 PM12/16/12
to serval-proje...@googlegroups.com
On Sat, Dec 15, 2012 at 12:47 AM, Fraser Cadger <cad...@googlemail.com> wrote:
> I had actually been meaning to send another message about Rhizome for some
> time. I did some research into the code and I also came across a blog post
> by yourself discussing your tests with Rhizome implemented over MDP;
> http://servalpaul.blogspot.co.uk/2012/12/rhizome-over-mdp.html . I find the
> concept of the Rhizome journalling quite interesting, and I think it would
> definitely be relevant for the sharing of on-demand (recorded files),
> however I am not sure if it would work for the interactive video
> calling/VoIP traffic, but I must admit I have not looked at Rhizome in much
> detail so I could be wrong. Actually one question I have, leading on from
> your blog post; is Rhizome intended to be implemented over MDP? If so, aside
> from the journal concept of a file being a version which can be updated, and
> the use of store and forward, is there much of a difference between Rhizome
> routing and MDP routing?

Rhizome doesn't route. It just floods, one hop at a time. Though now
that we have a rhizome transport built on MDP it would be possible to
transfer content directly over a number of hops.

Just because we have a novel tool doesn't make it a good fit for every
problem. Just because you have a good hammer, doesn't make every
problem a nail.

Fraser Cadger

unread,
Dec 17, 2012, 3:05:00 PM12/17/12
to serval-proje...@googlegroups.com
That was just a thought that came through my head when I was typing. A NACK 'early warning' system could be useful as a minor addition (if implemented properly), but this would be more of 'try it at a later date and see if it works' sort of thing. 

> As for getting the coordinates into servald, this could be slightly tricky
> but I don't think impossible. Although Android does not provide a C library
> which will allow NDK code to directly access the GPS device there have been
> several successful attempts at doing this. One possible approach would be to
> write a C wrapper for using the Android Locationmanager library, another
> could be to pass messages between the Java portion of Serval and the C
> portion, while a more convoluted (and probably slower) approach would be for
> the Java part to write GPS locations to a text file which would then be read
> by servald.

I would add a "command line" function that sends an mdp frame to the
daemon. This can easily be called from Java as we already have a jni
wrapper on this interface.

Thanks for the suggestion, as I indicated in my last post it was something I'd only really looked at on the surface so far. 

> This makes sense. Although I'm obviously aware that Serval places its own
> packets inside an UDP packet for the actual transmission, I somehow keep
> treating these packets as though they are independent packs being unicast.
> That is why I didn't consider the possibility of creating my own type of
> broadcast packet and sending it alongside the existing packets. I'm curious
> though, would it not make more sense to just add fields for coordinates to
> the existing packets? I would only need two fields, one for longitude and
> another for latitude so they shouldn't take up too much space. The only
> downside I could see to this is if the packet format change considerably (as
> you mention), but if I am only adding two new fields would I really need a
> new packet type?

The biggest reasons?
Encapsulation - keep all of your new code as separate as possible from
everything else. As I've said before, I want to experiment with other
kinds of routing engines. We could potentially run more than one path
discovery process at the same time while evaluating them.
Maintenance - it will be easier to merge our latest version if you
only need to insert a couple of method calls here and there.
Compatibility - will your new payload format break nodes running older
or newer versions of the software?

You are absolutely right. That's common sense, but I suppose I was being a bit lazy and thinking I could get away with a couple of lines of code. Creating a new message type definitely makes more sense, and as it's just going to get carried inside the existing types it's not as if we'll be sending it separately. Backwards comparability and inter-operability with different protocols are obviously very important, and if I'm just creating a new packet type that gets stored as a payload then it can be ignored by nodes running older versions/different protocols whereas playing about with the header might, as you suggest, cause compatibility issues.

Fraser Cadger

unread,
Dec 17, 2012, 3:07:29 PM12/17/12
to serval-proje...@googlegroups.com
Thanks for the clarification, I thought Rhizome did some form of routing. I realise Rhizome isn't necessarily universally applicable, but I think it could be useful for sending recorded media in the future. That said, for now I am focussing on the problem of live media; initially VoIP and then video calling, for which I think the current MDP is the best place for me to work on routing.

Thanks again for your assistance,

Fraser  

Fraser Cadger

unread,
Jan 21, 2013, 3:29:57 PM1/21/13
to serval-proje...@googlegroups.com

Hi everyone,


It's been a wee while since my last post, but I was quite busy over the last couple of weeks and am only just turning my attention back to working with Serval.


Ok, so I've had a think about how best to implement location update message (which I'll refer to as beacons), without modifying the existing message structures. My initial thought was to create a struct called beacon (or similar) which would contain all of the fields (initially our address, and the longitude and latitude coordinates + the timestamp of when the coordinates were recorded) and when we were sending a beacon we would create an instance of this struct. The downside I thought would be that the beacon struct would have to be embedded in another packet as a payload, and so this would mean modifying existing messages (not good) or creating a new message and adding the struct as a payload which seems wasteful. So at this point I can't see any advantages in using the struct (except possibly as a template).


Instead I thought an approach similar to the node announcements could be used. A function called overlay_add_beacon() (for instance) could be called from overlay_init_packet(). The purpose of this would be to create the beacon message, it would be similar in form to overlay_route_add_advertisement(). So it would call call overlay_frame_build_header() to create the frame and then it would need to place the coordinates in the packet. I think the best way to do this would be to use a similar method to add_advertisement() which uses ob_append_byte() to add the best link score and observations to the packet, and before that overlay_address_append to add the address of the node we are advertising. However, if we are sending beacons for ourself then we do not need to append a subscriber address as the packet is obviously about us. Similarly, add_advertisement() is accessed by calling the callback function enum_subscribers() which is used to find the next (unadvertised) subscriber in the tree and the use of enum_subscribers() is obviously not necessary for sending messages about ourselves.


Therefore, I was thinking that two types of beacons would be better; i.e. self_beacon and subscriber_beacon. self_beacon would be similar to a self announce message while subscriber_beacon would be similar to a node announcement. This would entail slight differences in how they were sent. self_beacon could have a method called overlay_add_self_beacon() which function similarly to overlay_add_selfannouncement() except instead of calling ob_append_ui32() to add timestamps it would call ob_append_byte() twice as described above, to add the two coordinates and timestamp (or would it make more sense to use ob_append_ui32 for this timestamp?) . However it would not call overlay_address_append. On the other hand, subscriber_beacon would have a method called (for instance) add_subscriber_beacon() that would function identically to overlay_route_add_advertisement(), and would call enum_subscribers() as a callback to a function called add_beacon() which would be similar to add_advertisement() except that it would call ob_append_byte to add coordinates (instead of score and observation), then append the timestamp but would still use overlay_append_byte() to add the address.


Both overlay_add_self_beacon() and add_subscriber_beacon() would be called from overlay_init_packet() inside overlay_queue.c; with overlay_add_self_beacon() being called after overlay_add_selfannouncement() and add_subscriber_beacon() being called after overlay_route_add_advertisements() is called. Hopefully I'm right in thinking that doing this will mean that the self_beacon messages will go along with the self_announcement messages and the subscriber_beacon messages will be along with the corresponding node_advertisement messages for the same subscriber in the one real packet. Even if they are 'separated' (i.e. we receive a subscriber_beacon without a node_advertisement for the same subscriber) I could probably still handle it (i.e. search for that subscriber).


Other than the above, I think the only other step needed for using the self_beacon and subscriber_beacon messages would be to add them to constants.h. Actually, I'm wondering what numbers to use, I noticed 0x10 to 0x70 are taken by existing messages. Should I just use the next numbers up (0x80 and 0x90), or are there any plans by the Serval team to use those? I'm assuming these numbers are hex numbers, and as type is an integer number constants.h maps the message type i.e. OF_TYPE_SELF_ANNOUNCE to its corresponding hex number?


There might have been a couple of things I've misunderstood in the above, but I think the general approach I described should allow for me to add the beacon messages without causing any conflicts with devices running different versions or their own routing. Hopefully doing it the way I described above would mean that such devices would just ignore my beacon messages as they wouldn't recognise the codes for them and there would be no complications.


I intend to start the coding soon, but I'm just thought I'd check to see if my assumptions are correct and if other people think the way of doing this makes sense before I jump in. I've not mentioned anything about dealing with the beacons once they've been received, as I've mainly been focussing on sending them first. However, I imagine this would involve updating either the subscriber or node entity for that device to change location. I'll look into this in more detail while I'm working on sending the beacons.


Thanks,


Fraser

Reply all
Reply to author
Forward
0 new messages