
olsrd vs BATMAN

KA9Q

While investigating the transient routing loops (and accompanying packet losses) that I've seen in our (San Diego) network, I came across this essay from the German group that wrote much of the existing code in olsrd, the Optimized Link State Routing daemon that is currently running in both AREDN and Broadband Hamnet nodes.

https://www.open-mesh.org/projects/open-mesh/wiki/The-olsr-story

It's easy to criticize someone else's work. And I don't have an opinion about their newer BATMAN protocol because I haven't studied it yet. But when somebody invests a lot of effort in something only to say that their own work is fundamentally flawed and should be junked in favor of something else, that tends to carry a lot of weight with me.
 

AE6XE
KA9Q, your research into what that "something else" is would be a great contribution. At present, no such "something else" is known to have arrived.

Note that the document you reference doesn't reflect the last 10 years of OLSR development and improvements. It's quite old and appears to date from the 2006 split that created BATMAN. It's nice to see that these groups are competing healthily to advance the state of the art: http://battlemesh.org

Conrad and I are following along on the OLSRv2 developer list. OLSRv2 still has a way to go before it's ready for us to consider. What would you recommend as a better path than moving to OLSRv2 when it is released?

Joe AE6XE
KA9Q

The "something else" is the B.A.T.M.A.N. (Better Approach To Mobile Adhoc Networking) protocol I mentioned. I noticed it in the Linux kernel quite some time ago, so I know it's not brand new.

I don't have an opinion on whether it is better or worse than olsrd because, as I said, I haven't dug into it yet. I just thought it interesting that we're encountering some of the same problems (specifically routing loops) that induced the writers of that essay to abandon olsrd entirely.

It did occur to me a while ago that synchronized clocks might help minimize routing loops by allowing nodes to update their routing tables simultaneously. It probably wouldn't eliminate loops entirely because, as long as routing information can be delayed or lost in transit, nodes aren't guaranteed a consistent view of the network from which to build their routing tables. But the only way to know is to try things out; routing algorithm behavior can be amazingly counter-intuitive.
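A toy model can make the synchronized-clock idea concrete. The Python sketch below uses a hypothetical four-node topology (A, B, C and destination D) with made-up next hops before and after a route change (`OLD` and `NEW`) and made-up update arrival times; it is not how olsrd schedules updates. At each time step it walks every node's next-hop chain to check for a forwarding loop, first with nodes switching as soon as their update arrives, then with all nodes switching at a common scheduled time.

```python
# Toy model, not olsrd: hypothetical next hops toward destination "D"
# before (OLD) and after (NEW) a link change near node B.
OLD = {"A": "B", "B": "D", "C": "D"}
NEW = {"A": "C", "B": "A", "C": "D"}


def tables_at(t, arrival, switch_at=None):
    """Next-hop table each node uses at time t.

    A node uses NEW once its update has arrived; if switch_at is given
    (synchronized clocks), it also waits until the common switch time.
    """
    table = {}
    for node in OLD:
        ready = arrival[node] <= t
        if switch_at is not None:
            ready = ready and t >= switch_at
        table[node] = NEW[node] if ready else OLD[node]
    return table


def has_loop(table, src, dst="D"):
    """Follow next hops from src; revisiting a node means a forwarding loop."""
    seen = set()
    node = src
    while node != dst:
        if node in seen:
            return True
        seen.add(node)
        node = table[node]
    return False


def loop_times(arrival, switch_at=None, horizon=10):
    """Time steps at which traffic from some node would circulate in a loop."""
    return [
        t for t in range(horizon)
        if any(has_loop(tables_at(t, arrival, switch_at), n) for n in OLD)
    ]


arrival = {"A": 3, "B": 1, "C": 2}
print(loop_times(arrival))                                 # [1, 2]
print(loop_times(arrival, switch_at=4))                    # []
print(loop_times({"A": 6, "B": 1, "C": 2}, switch_at=4))   # [4, 5]
```

In this made-up case, switching together removes the loop when every update arrives before the switch time, but delaying A's update past that time reopens a loop window, which matches the caveat that delayed or lost routing information still prevents a consistent view.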
 

AE6XE
OLSR maintains its own internal clocks and a relative time reference to all neighbors. I don't know about these other options. The results on the battlemesh site will tell us a lot about how they all compare. Surely the test cases would include loop potential, but I haven't found the time to review them yet.
KG6JEI
Moderation comment:

Moved to Development forums, Protocol Specifications, v4
AJ6BL
B.A.T.M.A.N.
Hi, I just found this thread, and while I know it's a bit old, I was wondering what is being used in the current firmware. I stumbled across the B.A.T.M.A.N. site and read a great writeup about loop mitigation.
https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance
https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II
I'm curious: what in the current AREDN firmware deals with loop issues?

Thanks
AJ6BL
AE6XE
AJ6BL, there are a couple of types of loops:

A) "Bridge" loop avoidance: when nodes are configured as bridges, so that a broadcast packet is propagated from node to node
B) "Transient route change" avoidance: when routing changes and there is a delay before all nodes in the mesh receive the updated information

These BATMAN documents address type 'A'. The AREDN mesh nodes, using OLSR, aren't set up as this kind of bridge and don't pass broadcast packets through; rather, each node is a layer 3 (IP) router. A packet must carry the IP address of its final destination, and the mesh node uses its routing tables to identify the neighbor to send the traffic to. If the nodes propagated broadcast traffic, it would flood the RF networks everywhere and, as more and more nodes joined, would quickly make the network unusable.
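The layer 3 forwarding decision described above can be sketched with Python's standard `ipaddress` module. Everything here is hypothetical: the prefixes, the neighbor names, and the policy of returning `None` for the limited-broadcast address, multicast, or an unrouted destination are illustrations of "route unicast, don't re-flood broadcasts", not AREDN's actual routing table or kernel code.

```python
import ipaddress

# Hypothetical routing table for one mesh node: destination prefix -> neighbor.
# Addresses and neighbor names are made up for illustration.
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "node-c",
    ipaddress.ip_network("10.1.2.0/24"): "node-b",
    ipaddress.ip_network("10.1.2.7/32"): "node-d",
}

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def lookup_next_hop(dst, routes=ROUTES):
    """Return the neighbor to forward a unicast packet to, or None.

    The limited-broadcast address and multicast destinations are not
    forwarded, and a destination with no matching route is dropped
    rather than flooded.
    """
    addr = ipaddress.IPv4Address(dst)
    if addr == LIMITED_BROADCAST or addr.is_multicast:
        return None
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific route wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


print(lookup_next_hop("10.1.2.7"))         # node-d
print(lookup_next_hop("10.1.2.9"))         # node-b
print(lookup_next_hop("255.255.255.255"))  # None
```

The point of the sketch is that every forwarded packet goes to exactly one neighbor chosen from the table, so traffic volume does not multiply with the number of nodes the way flooded broadcasts would.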

AREDN nodes are susceptible to type 'B', which is inherent in designs based on "link state". Physics dictates a delay before the two nodes at each end of a direct RF link can communicate information about that link to everyone else. If, for example, conditions suddenly change and the link degrades, the differing delays before other nodes receive this information can cause IP traffic to be routed in a loop for a few seconds until those nodes receive the update. For this condition to occur, there must be alternative paths with similar ETX values.
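To see how two nodes holding different views of the same link can loop traffic, here is a small Python sketch. It assumes, as a simplification, that each node picks the path with the lowest sum of link ETX values by running Dijkstra's shortest-path algorithm over its own copy of the link costs; the four-node topology and ETX numbers are made up. The parallel paths A-B-D and A-C-D start with similar costs, which is the condition described above.

```python
import heapq


def spf_next_hop(links, src, dst):
    """First hop on the lowest-total-ETX path from src to dst (Dijkstra)."""
    graph = {}
    for (u, v), etx in links.items():
        graph.setdefault(u, []).append((v, etx))
        graph.setdefault(v, []).append((u, etx))
    heap = [(0.0, src, None)]  # (path cost so far, node, first hop used)
    done = set()
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            return first
        for nbr, etx in graph.get(node, []):
            if nbr not in done:
                hop = nbr if first is None else first
                heapq.heappush(heap, (cost + etx, nbr, hop))
    return None


# Hypothetical link ETX values before and after the B-D link degrades.
OLD_VIEW = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 1.2, ("C", "D"): 1.2}
NEW_VIEW = {**OLD_VIEW, ("B", "D"): 5.0}

print(spf_next_hop(NEW_VIEW, "B", "D"))  # A: B already knows the link degraded
print(spf_next_hop(OLD_VIEW, "A", "D"))  # B: A has not heard yet, so A and B loop
print(spf_next_hop(NEW_VIEW, "A", "D"))  # C: once A catches up, the loop clears
```

B, as an endpoint of the degraded link, learns first and sends traffic toward A, while A still sends it toward B; the loop lasts only until A's view catches up.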

BATMAN does appear to be a better option for scaling. However, the OLSR developers have been rewriting their approach as OLSRv2, so these competing groups may leapfrog one another. AREDN uses OLSRv1 today. If we start to see scaling issues with OLSRv1, we'd be highly motivated to decide between BATMAN and OLSRv2 and make the jump. No one is eager to do so, because the new protocols are not compatible with the old one. It would be challenging for groups to go through this migration and upgrade their entire network all at once.

Joe AE6XE
AJ6BL
Thanks Joe,

That helps clear things up a bit for me. I am doing some research this week into some looping issues we are experiencing here in the LA/Ventura County area and wasn't clear about which protocols we are currently using.

Also, according to the BATMAN docs, there is a way to integrate non-BATMAN devices and still take advantage of the benefits, but you have to set up an interface for them and bridge it to the BATMAN interface. I don't know whether this could be a workaround for integrating non-BATMAN nodes before everyone upgrades (if we go this route), but I thought I should mention it. Maybe create a second SSID for the BATMAN nodes and keep the existing AREDN-v3 SSID for "legacy" nodes? Just a thought. I'm not an expert in mesh, but I am trying to learn as much as I can, so please bear with me. Check out the link below for more info. It does say "a couple of computers", so this idea may not work after all.

https://www.open-mesh.org/projects/batman-adv/wiki/Quick-start-guide

Mixing non-B.A.T.M.A.N. systems with batman-adv

If you have a couple of computers that you don't want to run batman-adv on but you still want make use of the mesh network, you will need to configure an entry point for them on a node running batman-adv...

Lastly, what is the optimal setup for a switch that has more than one node connected to it, with the nodes in proximity to each other? Is channel separation important (since they are tied together with the switch)? Should we be enabling RSTP or anything like that? I know you said broadcast traffic isn't able to flood the network, so STP shouldn't benefit anything. Any other suggestions to avoid looping issues? This is more of a question about the bigger sector antennas and backbone nodes we have up on the peaks, not really the smaller nodes, but any advice would be appreciated.

I'm going to be reading up on OLSR now, so I may have some more questions later.

Thanks again Joe, 
AJ6BL
AE6XE
"Is channel separation important (since they are tied together with the switch)?" Yes. Having all the nodes on the same channel significantly increases latency because of the extra handshaking, which affects VoIP. Also, throughput is cut by more than half when two tower sectors share the same channel. The ideal approach is to connect all the nodes together with DtDLink and put each one on a different channel. The 'mesh', per se, is not at the RF level but above it, at the IP level. Putting everything on the same RF channel reduces performance and won't scale.
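The "more than half" figure follows from simple airtime sharing: radios on one channel take turns, so two sectors each get at most half the airtime, and contention overhead takes away more. A back-of-the-envelope Python sketch, with a hypothetical channel capacity and an assumed overhead fraction (not measured AREDN numbers):

```python
def per_radio_throughput(channel_mbps, radios_on_channel, contention_overhead=0.0):
    """Idealized throughput each radio gets when radios share one channel.

    Only one radio can transmit at a time, so airtime is split evenly;
    contention_overhead is an assumed fraction of airtime lost to
    backoff, collisions and extra handshaking.
    """
    if radios_on_channel < 1:
        raise ValueError("need at least one radio on the channel")
    if not 0.0 <= contention_overhead < 1.0:
        raise ValueError("contention_overhead must be in [0, 1)")
    return channel_mbps * (1.0 - contention_overhead) / radios_on_channel


# Made-up numbers: a 40 Mbps channel, one sector alone vs. two sharing it.
print(per_radio_throughput(40, 1))   # 40.0
print(per_radio_throughput(40, 2))   # 20.0
print(per_radio_throughput(40, 2, contention_overhead=0.1))  # less than 20
```

Any nonzero overhead pushes each sector below the even split, which is why two co-channel sectors see less than half of what one sector alone would get.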

Another scenario: consider two tower sites that can hear one another on the same channel, possibly 20+ miles apart, each with lots of clients. Only one site can transmit at a time, blocking all other traffic; while it does, the other tower site can't hear its own local clients, and so on.
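One way to act on both points is to treat channel planning as graph coloring: every radio that can hear another radio, whether a co-located sector or a tower 20+ miles away, should get a different channel. The Python sketch below uses a simple greedy heuristic (not an AREDN feature, and not guaranteed to use the fewest channels); the site names, the who-hears-whom table, and the channel numbers are hypothetical.

```python
def assign_channels(hears, channels):
    """Greedy channel assignment so no two radios that hear each other share a channel.

    hears maps each radio to the set of radios it can hear (assumed symmetric).
    Radios with the most neighbors are assigned first.
    """
    assignment = {}
    for radio in sorted(hears, key=lambda r: len(hears[r]), reverse=True):
        taken = {assignment[n] for n in hears[radio] if n in assignment}
        free = [ch for ch in channels if ch not in taken]
        if not free:
            raise ValueError(f"not enough channels to separate {radio}")
        assignment[radio] = free[0]
    return assignment


# Hypothetical: two sectors per site; site1-north can also hear site2-west.
HEARS = {
    "site1-north": {"site1-south", "site2-west"},
    "site1-south": {"site1-north"},
    "site2-east": {"site2-west"},
    "site2-west": {"site2-east", "site1-north"},
}

plan = assign_channels(HEARS, channels=[149, 153, 157])
print(plan)
```

This treats channels as simply "same" or "different"; in practice, closely spaced channels on co-located sectors can still interfere, so real spacing needs more margin than the sketch models.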

Look for parallel multi-hop paths between two nodes with similar ETX. This is where you will find transient route flapping and loops. One path gets loaded down with a video stream, its link quality drops (its ETX rises), and OLSR selects the other path. The other path then gets loaded down with the traffic, its link quality drops, and the cycle repeats. The symptoms are that everything seems to be working fine, but then the connection times out, drops out, or can't keep up with lost data. Too many links or coverage nodes on the same RF channel will just perform poorly most of the time, which is a different symptom.
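The feedback loop described here can be reproduced with a toy model in Python. Assumptions, all hypothetical: two parallel paths with base ETX values 2.0 and 2.1, carrying the video stream adds a fixed ETX penalty to whichever path is in use, and the route is re-evaluated once per step with no measurement delay. The optional `hysteresis` margin (switch only if the other path is better by more than the margin) is an illustrative damping idea, not an olsrd setting.

```python
def simulate(base_etx, load_penalty, steps, hysteresis=0.0):
    """Return which path (0 or 1) is selected at each step.

    The path carrying the stream has its ETX raised by load_penalty.
    A switch happens only if the idle path beats the loaded one by more
    than the hysteresis margin.
    """
    current = 0
    history = []
    for _ in range(steps):
        etx = [
            base + (load_penalty if path == current else 0.0)
            for path, base in enumerate(base_etx)
        ]
        other = 1 - current
        if etx[other] + hysteresis < etx[current]:
            current = other
        history.append(current)
    return history


def count_switches(history):
    """Number of times the selected path changed between steps."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


flapping = simulate([2.0, 2.1], load_penalty=0.5, steps=6)
damped = simulate([2.0, 2.1], load_penalty=0.5, steps=6, hysteresis=1.0)
print(flapping, count_switches(flapping))  # [1, 0, 1, 0, 1, 0] 5
print(damped, count_switches(damped))      # [0, 0, 0, 0, 0, 0] 0
```

The trade-off is that a large margin also delays switching away from a path that has genuinely degraded.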

Joe AE6XE
K5DLQ
From the research that I did last year, it looks like the industry may be landing on using BATMAN-Adv WITH BMX7.
