
Tunnel setting changes cause nodes to stop responding

K6CCC
Tunnel setting changes cause nodes to stop responding
Executive summary version:  Saving changes on the Tunnel Client or Tunnel Server page makes the node unresponsive for a while (10 - 120 seconds).

Detailed version:  I was making a bunch of changes to the two Rocket M5 XM nodes at work because of some node name changes.  This included changes to tunnel connections.  Every time I made any change on the Tunnel Client page and clicked the Save Changes button, the web GUI would eventually show a page reset or page timeout error.  As a troubleshooting test, I started a ping session to the node (LAN IP) and found that the pings would fail for about half a minute after saving on the Tunnel Client page.  According to the Node Status page, the uptime did NOT show that the node had rebooted.  Both Rocket M5 nodes have Nightly Build 1476.

This morning, I did some further testing, first with a Rocket M3, then with a GL.iNet USB150, a Mikrotik RB-LHG-5nD, and lastly with a Mikrotik hAP.  For the LHG-5nD and the Rocket M3, I had to update the nodes from Nightly Build 1476 to Nightly Build 1491 in order to load the tunnel software before I could perform the tests.  The USB150 and hAP had Nightly Build 1476 and already had tunnel software loaded, so I did not need to update them prior to the test.  For each node tested, I first opened two ping sessions: one to the mesh IP and the second to the LAN IP.

All the nodes exhibited the same issue.  With the exception of the Rockets, all the nodes would stop responding to pings for 10 to 15 seconds when saving changes on the Tunnel Client page, and a little longer with changes on the Tunnel Server page.  On the Rocket M3 (as had been observed on the Rocket M5 nodes), the times were much longer.  Saving changes on the Tunnel Client page resulted in pings failing for 20 - 25 seconds, and saving changes on the Tunnel Server page resulted in pings failing for up to two minutes.  The pings would at times fail again several minutes later for up to two minutes.  During the ping failures, as expected, the web GUI was not reachable.

All these tests were 100% repeatable.
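
For anyone wanting to repeat the ping test, here is a minimal Python sketch that turns a ping log into outage durations. It assumes the ping output has already been parsed into (timestamp, reply received) pairs. The function name and the sample data are made up for illustration and are not measurements from the nodes above.

```python
from typing import List, Tuple


def outage_windows(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (start, end) for each run of failed pings.

    samples: (timestamp_in_seconds, reply_received) pairs, sorted by time.
    start is the time of the first failed ping in a run; end is the time of
    the first successful ping after it. A run still failing at the end of
    the log is closed at the last timestamp seen.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends on first reply
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


# Hypothetical one-ping-per-second log: replies stop at t=3 and resume at t=15.
example = [(float(t), not (3 <= t < 15)) for t in range(20)]
for begin, end in outage_windows(example):
    print(f"no replies from t={begin:.0f}s to t={end:.0f}s ({end - begin:.0f}s)")
```

Running one copy per ping session (mesh IP and LAN IP) would show whether both addresses drop out for the same length of time.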
 
nc8q
I think OLSRD restarts

Yes, you lose the link for about 1 minute.
 

K6CCC
Yes, remotely
Correct.  The two nodes at work are 26 miles away, and the route to reach them is rather roundabout.  The nodes that I tested this morning were:
hAP-Portable = one DtD link away
USB150 = one RF link away (about 6 feet)
Rocket M3 = one DtD + three RF links away (total about 30 feet)
LHG-5nD = one DtD + two RF links + one more DtD (also total about 30 feet)

I had not observed this delay before either 3.20.3.0 or Nightly Build 1476.  I don't remember noting any issues with 3.20.3.0, but I also likely did not make any tunnel changes (other than installing tunnel software after the update) between installation of 3.20.3.0 and installation of 1476.


And yes, I know that you have to be on either the current production or current Nightly build in order to install tunnel software (at least without doing it manually).  The only purpose of that part was to explain why those nodes had a different firmware version.
nc8q
OLSRD and tunnel reconfigurations
I have observed this 'loss of link'.
I have used tunnels on a NanoStation locoM2, GL.iNet AR150, USB150, AR300M, and AR750, and a Mikrotik hAP.
YMMV. ;-)
 
K5DLQ
When saving tunnel connection changes, OLSR does restart as it needs to recognize new interfaces.
K6CCC
No tunnels involved
Keep in mind that none of the nodes involved had tunnels active at all (generally not enabled).  Also, I would not expect an OLSR restart to stop the node from responding to pings to its own IP.
 
AE6XE
When OLSR is restarted, the node no longer has a routing table, which OLSR maintains.  Until the table is rebuilt, the node has no way to route replies back for any traffic it receives.  If you are many hops away, the delay includes:

A) time for OLSR to restart
B) time for OLSR to receive updates and learn the route back to you
C) time for the node's routing information to propagate back over the mesh to you (if the info has timed out)

A possible improvement: do not restart OLSR if no relevant information has changed, e.g. only comments were edited.
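
The idea of skipping the restart when only comments changed could be sketched roughly as below. This is not how the AREDN firmware stores or compares tunnel settings. The dictionary layout and the field names (`name`, `enabled`, `comment`) are hypothetical, chosen only to illustrate the comparison.

```python
from typing import Dict, Iterable, List


def needs_olsr_restart(
    old: List[Dict[str, object]],
    new: List[Dict[str, object]],
    ignored_fields: Iterable[str] = ("comment",),
) -> bool:
    """Return True if the saved tunnel entries differ in any field that matters.

    Fields listed in ignored_fields (hypothetical: only "comment") are dropped
    from both sides before comparing, so a comment-only edit returns False.
    """
    ignored = set(ignored_fields)

    def strip(entries: List[Dict[str, object]]) -> List[Dict[str, object]]:
        return [{k: v for k, v in e.items() if k not in ignored} for e in entries]

    return strip(old) != strip(new)


# Hypothetical saved entry and two edited versions of it.
before = [{"name": "EXAMPLE-NODE", "enabled": False, "comment": "work site"}]
comment_only = [{"name": "EXAMPLE-NODE", "enabled": False, "comment": "renamed"}]
enabled_now = [{"name": "EXAMPLE-NODE", "enabled": True, "comment": "work site"}]

print(needs_olsr_restart(before, comment_only))
print(needs_olsr_restart(before, enabled_now))
```

Adding or removing an entry changes the list length, so it still counts as a relevant change in this sketch.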

Joe AE6XE
