
Deaf node

KD2EVR

We have an XM NanoStation M2 that I only have RF access to. I did an OTA upgrade to the latest production release. When I checked a day later over RF, it had gone "deaf". The mesh status of the local node I was using to connect showed an IP address instead of a name for the node in question, with good LQ but zero NLQ.

I had someone with physical access cycle power for me and it recovered. However, a few days later it has gone deaf again.

Is this still a known issue? Should I downgrade to the previous production version for now?

AE6XE

KD2EVR, I need a support data download to see what is going on. Capture a baseline support download 10 minutes after booting. Then, when you see symptoms, capture another download. Let's see where the data leads.

I'm not aware of NLQ% suddenly going to 0% in 3.18.9.0. No issues have been entered, and I don't recall any forum threads about it.

Joe AE6XE

KD2EVR

Thanks Joe. I'll try to get the support data but I'll need to arrange for physical access first.

Just to be clear, the neighbor node is showing an NLQ of zero for the bad node. In other words, the bad node is transmitting fine but not receiving anything at all.

FYI, there are two remote neighbor nodes that occasionally show up very weakly as immediate neighbors on RF. I don't know whether that might be a factor.

AE6XE

I think we're on the same page:

Node A = your local node
Node B = the node you need to arrange physical access to

In mesh status on node A, you see (node B, 100% LQ, 0% NLQ) in the Neighbor column, correct? As you've indicated, this means node B is not receiving OLSR packets from, or hearing, node A. Node B probably has another link to a 3rd node, C. Otherwise, B would not show up as a direct neighbor on A, because OLSR has not completed a handshake in both directions to establish a direct link between A and B. When node C passes information about B back to node A in OLSR, B can show up on A as a neighbor. Maybe one of these other weak-link neighbors is node C.

The original deaf channel condition was selective in that the node would stop hearing some neighbors but still hear others. The symptoms would generally take hours to days to appear. On node B, we're looking for the log information in /tmp/rssi.log, which will be in the support download. There is logic to trigger the radio to recalibrate and start over when there are unexpected jumps (up or down) in received signal strengths. If the signal from a given neighbor degrades by more than ~2 standard deviations from its average over the last 1 hour window, the receiver is triggered to recalibrate. This logic has prevented these symptoms from showing up for a couple of years now. I suppose we could change this to a tighter ~1 std dev test on your node. But the data will tell us if we're looking in the right place.
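
To make that kind of trigger concrete, here is a minimal Python sketch of a rolling-window drop test. It is not the actual rssi_monitor code: the function name, the 60-sample window, the multiplier k, and the sample values are all assumptions for illustration only.

```python
from collections import deque
from statistics import mean, pstdev

WINDOW = 60  # assumed: one sample per minute over a 1 hour window


def should_recalibrate(history, new_rssi, k=2.0):
    """Return True if new_rssi dropped more than k std devs below the window average."""
    if len(history) < 2:
        return False  # not enough data to judge
    avg = mean(history)
    sd = pstdev(history)
    return (avg - new_rssi) > k * sd


# Made-up signal readings for one neighbor (hypothetical values)
history = deque([-70, -71, -69, -70, -72, -70, -71, -69], maxlen=WINDOW)
print(should_recalibrate(history, -71))  # small change, within normal variation
print(should_recalibrate(history, -80))  # large sudden drop
```

A smaller k makes the test fire on smaller drops, which is the "tighter" test mentioned above.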

Other possibilities include local interference, generally the 1st area to suspect. Has anything changed recently at node B's site? Interference was a big factor in deaf channel symptoms. But if the radio is struggling to survive with other noise or interference, there's not much that can be done except to make the interference go away.

Joe AE6XE

KD2EVR

We're on the same page with the exception that there's no alternate route for Node A.  See the attached diagram if you're interested. 

I use the A' AirRouter as a convenience to check on the mesh from my house; it is occasionally heard weakly by B. It is not powered up 24/7. Node D is facing away from B but is occasionally heard very weakly. Everything is on channel -2, 10 MHz bandwidth.

I have not attempted to access B through C or D - I will try today.  If successful I can easily get your support data and trigger a wifi scan and/or reboot.  

The only recent change is that Node D was added in October, and maybe recent conditions (leaves dropping) are starting to allow it to be heard a little by B. I can always try putting it on another channel.
 

KD2EVR

So, looks like C sees the same thing as A.

AE6XE

This symptom would steer towards local interference at the site, either something blocking the signal (trees?) or 3rd party interference. It's just not hearing anyone, correct? The node-going-deaf symptom has been selective to neighbors: the logic in the driver ends up receiving one neighbor well while tuning out another, not distinguishing between noise and a legitimate neighbor to link with.

Joe AE6XE

KD2EVR

FWIW, after disabling all other nodes and then parking right underneath it, I was able to connect and grab the attached support data. When I have more time, I'll camp out and get more data.

As I drove away, the NLQ dropped quickly from 100 back to zero. I wonder if somehow the distance setting got corrupted... That would sort of make sense.

KD2EVR

Ok, so it went deaf again. This time the weak node was on a different frequency, so that theory was eliminated.

I captured some support files:
1. While deaf
2. Shortly after reboot
3. Shortly after reboot 2 the other nodes dropped off the mesh status briefly and then came back
4. Ten minutes after reboot.  
 

AE6XE

KD2EVR,

It looks like the condition started shortly after Dec 31 21:30. The log file has history available from Dec 26 to Dec 31. Each day, there were a handful of times rssi_monitor triggered the radio to recalibrate. Then it stops taking any action as a couple of signals are attenuated out and only the strong signal remains, at a lower SNR now, and this has continued through today. The logic treated the other signals as interference and made the environment pristine to receive only the remaining strongest signal.

What must be happening at this site is that the conditions are just right (or wrong) and the receiver very slowly goes into this state, so slowly that the re-calibration trigger is never called to prevent the condition. What you can do is change the threshold that triggers re-calibrations to make the test more sensitive. This can be done by finding and editing the following lines in "/usr/local/bin/rssi_monitor":

from: $sdV3 = int(3 * $rssiHist{$_}{"sdV"} + .5);
to:   $sdV3 = int(2 * $rssiHist{$_}{"sdV"} + .5);

AND

from: $sdH3 = int(3 * $rssiHist{$_}{"sdH"} + .5);
to:   $sdH3 = int(2 * $rssiHist{$_}{"sdH"} + .5);

Change the '3' to a '2'. It will then trigger a recalibration if a received signal drops from the rolling average of the Received Signal Strength (RSS) by only 2 standard deviations instead of 3; the "+ .5" inside int() just rounds the result to the nearest whole number. The unit is dB. There are 1 minute samples over the last 1 hour window for tracking the average and standard deviation of the RSS for each neighbor. This data can be found in /tmp/rssi.dat. For dual polarity devices, there are 2 values for AVE and 2 values for STD for each neighbor.
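
As a rough worked example of what that edit does to the numbers, the Python sketch below mirrors only the arithmetic of the int(N * sd + .5) expression. The function name and the sample standard deviations are made up for illustration, and how rssi_monitor uses the result afterwards is not shown here.

```python
def threshold_db(sd, multiplier):
    """Mirror int(multiplier * sd + .5): the multiple of sd rounded to a whole dB."""
    return int(multiplier * sd + 0.5)


# Hypothetical standard deviations (dB) for one neighbor's 1 hour window
for sd in (0.8, 1.6, 2.5):
    old = threshold_db(sd, 3)  # original line
    new = threshold_db(sd, 2)  # edited line
    print(f"sd={sd} dB: 3x threshold = {old} dB, 2x threshold = {new} dB")
```

Note that for a very steady signal (small standard deviation) the rounding can make both versions land on the same whole-dB value, so the edit matters most when the signal normally varies by a dB or more.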

Monitor the file /tmp/rssi.log to see when the trigger is called and how often. You can adjust this threshold calculation to get just the right amount of sensitivity to ensure the condition no longer occurs. For some environmental reason, your 'lucky' site needs a more sensitive threshold.

Joe AE6XE 
