Anyone seeing repeated issues with WireGuard Tunnel clients/servers losing connectivity via the tunnel? I have 5 clients and 1 server configured. Periodically, I'm unable to connect to the nodes across the tunnels. This seems to be somewhat random. Of the 5 tunnel clients, I may lose three of them but the other two are fine. The tunnels show as active, but no communication. To attempt to gain connectivity back, I've asked the clients to reboot but still have no connectivity. Only when I reboot my node am I able to regain connectivity. After reboot, all tunnel connectivity is restored for a day or two, then I lose 1-3 tunnels.
Yesterday, I downloaded and installed the latest nightly build, and three hours later, lost one server tunnel and two clients.
In the log files, I'm seeing duplicate IP warnings from OLSRD. The strange thing is that the duplicate IP is associated with a tunnel that is configured on one of my clients' nodes, not mine. I'm also seeing kernel out-of-memory warnings.
What, if any, troubleshooting can I perform to determine the cause of this issue? I thought the latest nightly was supposed to resolve some of these issues?
I have downloaded a support data file if necessary.
Thoughts and ideas would be greatly appreciated!
Jim - AA7CL
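One way to narrow down which tunnels have actually gone quiet while still showing as active is to look at the last WireGuard handshake time for each peer. The sketch below is not from anyone's node: it assumes you can run `wg show all latest-handshakes` on the node over SSH and copy the output to a PC that has Python. The interface names and keys in the sample are made up, and the 180 second threshold is only an assumption.

```python
import time

def stale_peers(wg_output, now=None, max_age=180):
    """Return (interface, peer_key, age_seconds) for peers with no recent handshake.

    wg_output is the text printed by `wg show all latest-handshakes`:
    one line per peer with interface, public key, and a Unix timestamp.
    age_seconds is None when no handshake has ever completed.
    """
    now = int(time.time()) if now is None else now
    stale = []
    for line in wg_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        iface, key, ts = parts
        last = int(ts)
        # A timestamp of 0 means the peer has never completed a handshake.
        age = None if last == 0 else now - last
        if age is None or age > max_age:
            stale.append((iface, key, age))
    return stale

# Made-up sample data; real interface names and keys will differ.
sample = "wgc0\tPEERKEY_A=\t1000\nwgc1\tPEERKEY_B=\t400\nwgs0\tPEERKEY_C=\t0\n"
result = stale_peers(sample, now=1100, max_age=180)
for iface, key, age in result:
    print(iface, key, "never" if age is None else f"{age}s ago")
```

If a tunnel is listed as active but its last handshake is many minutes old, that at least tells you which side to look at first.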
The navigation page via Babel only showed the one node the user is connected to and not even the other side of the tunnel. The OLSR side shows the entire mesh network and functions. I could not download a report from the tunnel server. I've rebooted the tunnel server node and Babel routing has returned.
One of the users is on a Babel-only version, so had absolutely nothing at all. Thank goodness I haven't yet gone to Babel-only on any nodes I depend on; otherwise I would have zero control over my remote nodes.
The nightlies 0909 and beyond seemed to be just fine ... until this morning.
Ed
And remember the only difference between Babel-only builds and the regular nightly builds is the presence or absence of OLSR. If both ends of a link are Babel capable, they'll run Babel instead of OLSR, regardless of what version of nightly build they're on.
Orv W6BI
I'm not able to locate the fix you mention. Is this a current bug tracking number (#2947)?
Where might I find this info?
I attempted to update to the latest nightly; however, my hAP won't accept the firmware upgrade.
Looking at the log files, I see more complaints about duplicated IPs and memory crashes. I even tried the Babel-nightly build and got the same result: my hAP reboots with the same firmware. Not sure what is going on.
Snippets:
Thanks,
Jim - AA7CL
If you have the internal radios in the hAP enabled, turn them off and try the upgrade. It turns out the radio daemon is a memory hog.
There are two methods of updating an AREDN node. One is to have the node find the firmware and download it. If that fails, go to https://downloads.arednmesh.org/afs/www/, enter your model, and download it to your PC. Then upload it to your hAP.
Orv W6BI
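On the memory point, it can help to see how much memory the node has free before and after turning the internal radios off. This is only a sketch: it assumes you copy the contents of `/proc/meminfo` from the node (for example over SSH) to a PC with Python, and the numbers in the sample are made up, not from a real hAP.

```python
def mem_available_kb(meminfo_text):
    """Return available memory in kB from /proc/meminfo text, or None if absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        value = rest.split()
        if value and value[0].isdigit():
            fields[name.strip()] = int(value[0])
    # Prefer MemAvailable; fall back to MemFree if the kernel does not report it.
    return fields.get("MemAvailable", fields.get("MemFree"))

# Made-up sample values for illustration only.
sample_meminfo = (
    "MemTotal:          60000 kB\n"
    "MemFree:            4000 kB\n"
    "MemAvailable:       9000 kB\n"
)
available = mem_available_kb(sample_meminfo)
print(f"Available: {available} kB")
```

Comparing the figure with the radios on and off would show whether the upgrade is failing for lack of memory.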
There is no linked repository of earlier nightly builds. I cannot go find the 0912 nightly for a node type I have not already downloaded and archived. Many of our problems have come from applying nightly builds that turn out to introduce new problems.
Your post was on Sept 16, and you reference a nightly from Sept 12. Sure, I'd try that one as no one has had any emergencies (at least that I'm aware of). But the website won't let me choose this.
Ed