OLSRD restarts

WB6WGM
OLSRD restarts
We have about 25 nodes connected, several with many services. Not every connection is 100%; many are not line of sight because of the terrain. We are running V3.16.1.0b.02. The nodes have been running for 78 days without reboots, but OLSRD keeps restarting. In those 78 days many of the nodes have had 1000 to 1200 restarts, and the main connection node from North to South has had 2100 OLSRD restarts. Several times when we accessed a node it became unavailable because it was in "OLSRD restart mode". We are trying to understand what is happening. Suggestions have been that the many services hanging on the node are causing problems, or that with intermittent connections the OLSRD wait time is exceeded and OLSRD goes into restart mode. We have isolated three main node connections (changed the SSID so they don't connect to anything else) and so far there have been no OLSRD restarts in a day. The LQs are 63 and the NLQs 100 (not all 100 - 100). Before the SSID change the LQ and NLQ were 40 - 80%.

I have a dump of the worst node before we changed the SSID if that will help.

WB6WGM
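To put the restart counts in the post above into perspective, here is a small Python sketch that converts them into a rate. The counts (1000, 1200, 2100) and the 78-day uptime come from the post; the function name and labels are made up for illustration. It works out to roughly 13 to 15 restarts per day on a typical node and about 27 per day (one every 53 minutes or so) on the North-South node.

```python
def restart_stats(restarts, days):
    """Return (restarts per day, average minutes between restarts)."""
    per_day = restarts / days
    minutes_between = days * 24 * 60 / restarts
    return per_day, minutes_between


if __name__ == "__main__":
    uptime_days = 78  # uptime reported in the post
    # Labels are illustrative; counts are the ones reported in the post.
    reported = [
        ("typical node (low end)", 1000),
        ("typical node (high end)", 1200),
        ("North-South link node", 2100),
    ]
    for label, count in reported:
        per_day, gap = restart_stats(count, uptime_days)
        print(f"{label}: {per_day:.1f} restarts/day, one every {gap:.0f} min")
```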
AE6XE
WB6WGM, any way to convince your group to upgrade to the 3.16.1.0 release? Do you by chance have tunnels or meshchat installed on the problematic mesh node(s)? During beta, there were issues with consuming too much memory that were addressed. When memory runs low, typically on the non-Rocket devices with only 32M RAM, the Linux OS starts randomly killing things. I suspect OLSR would be a big-ticket item to get killed. Alternatively, if lack of memory is the root cause, shut down any other apps installed on the mesh node that are not providing core services.

Joe AE6XE
WB6WGM
We can upgrade to the production release, but we only wanted to change one thing at a time. The worst mesh node had nothing on it and was standalone RF-wise.
Another node that was attached does have Jabber on it. That node is now off the mesh.
 
AE6XE
Yes, please upload a support dump. If only the SSID changed, then the prior dump; otherwise a current state capture. Joe AE6XE
KG6JEI
I would upgrade to the stable release and try and reproduce.

A ticket against the closed beta will likely be closed without action, or at least met with a request to reproduce the problem in stable, since betas by their nature are not recommended for production and are expected to have flaws.

If 3.16.1.0 were still in beta that would be one thing, but since that beta has closed out and a stable release exists there is little point in filing a ticket against it.
WB6WGM
upgraded all nodes to released version - OLSRD restarts less but
I have upgraded the key nodes in our mesh to the production version with a different SSID. We have gradually added other nodes by changing them to the newer SSID to determine which may be causing the OLSRD restarts. The mesh ran for a week without any OLSRD restarts. We then added another node (KK6ISP) that has a very high SNR to node KF6LCS (LCS had the very high number of restarts in the past). In a few days there were 2 OLSRD restarts. I don't understand why the restarts happened, especially when a good path link was added. The curious part is that the LCS node is the only one with restarts, and it has free memory of less than 4000 while all other nodes have free memory greater than 4000. I included a status dump.
Regards, Robert, WB6WGM
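For anyone who wants to script the free-memory comparison described above, here is a rough Python sketch that parses text in the standard Linux /proc/meminfo format and flags a node whose MemFree is below a threshold. It assumes the "4000" figure in the post is in kB, which the post does not state; the function names are hypothetical, and on a node without Python the text would have to be copied off and checked elsewhere.

```python
def parse_meminfo(text):
    """Return {field_name: value} from /proc/meminfo-style text (values in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def low_on_memory(info, threshold_kb=4000):
    """True if MemFree is below the threshold (4000 is assumed to be kB)."""
    return info.get("MemFree", 0) < threshold_kb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        print("/proc/meminfo not available on this system")
    else:
        print("MemFree (kB):", info.get("MemFree"))
        print("below threshold:", low_on_memory(info))
```

MemFree alone understates what is reclaimable, since buffers and cache can be freed under pressure, so treat a low value as a hint and not as proof that memory is the cause.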
AE6XE
Robert, grab the 'support' download from the button at the bottom of the Administration page in Setup on any mesh node suspected of being involved with these symptoms. (Did I miss the 'status dump' attachment? I need the 'support' download.)

Joe AE6XE  
WB6WGM
downloads available
Joe,
I recently received the download from the node (KK6ISP) that, when added to the mesh, caused KF6LCS to start having OLSRD restarts. The number of restarts has been drastically reduced since we upgraded to the released version. See attached. Robert, WB6WGM
Support File Attachments: 
AE6XE
No smoking gun...

Robert, I'm not seeing anything to help find why olsrd restarted. How many restarts are you still seeing?

What is happening is that the olsr program overwrites/creates a temp file at 5 second intervals, a heartbeat. Another program wakes up every 15 seconds and, if this file hasn't been overwritten, restarts the olsr program. So for some unknown reason, the olsr program has either crashed or not responded for 15 seconds.
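The heartbeat check described above could be sketched roughly like this in Python. This is only an illustration of the idea, not the actual watchdog script on the node: the file name, the function names, and the use of the file's modification time to detect a missed heartbeat are all assumptions. Only the 5 second and 15 second figures come from the description.

```python
import os
import tempfile
import time

MAX_SILENCE = 15  # seconds; the watchdog wake-up interval from the description


def write_heartbeat(path):
    """What the routing daemon is described as doing every 5 seconds."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def heartbeat_is_stale(path, max_age, now=None):
    """True if the file is missing or was last written more than max_age s ago."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_age
    except OSError:
        return True  # no heartbeat file at all counts as stale


def watchdog_check(path, restart, max_age=MAX_SILENCE, now=None):
    """Call restart() if the heartbeat is stale; return whether it was called."""
    if heartbeat_is_stale(path, max_age, now):
        restart()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hb = os.path.join(tmp, "olsrd.heartbeat")  # hypothetical file name
        write_heartbeat(hb)
        # Fresh heartbeat: no restart expected.
        watchdog_check(hb, lambda: print("restarting olsrd"))
        # Pretend 20 seconds passed with no new heartbeat: restart expected.
        watchdog_check(hb, lambda: print("restarting olsrd"),
                       now=time.time() + 20)
```

With a scheme like this, a daemon that is alive but starved of CPU or memory for 15 seconds looks the same to the watchdog as one that has crashed, which fits with seeing restarts but no crash evidence.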

I'm not seeing any evidence of a crash, just seeing olsr starting up a couple times.  Possible causes include:

1) RAM consumed and not available
2) some package was installed or some configuration change took place that caused the network interfaces to go down/up.  (manually pulled dtdlink cable?)
3) too low voltage--general weirdness starts happening

What's different about this mesh node, anything?   

Joe AE6XE


 
WB6WGM
second support file
Joe, here is the second support file. Robert
