Hi,
The services I have defined on my HAP AC Lite (MikroTik RouterBoard 952Ui-5ac2nD) keep disappearing from the mesh status page every few hours. They run on a Raspberry Pi on the LAN, which is reachable from both the RF side and the LAN side at all times. "Provide default route to LAN devices" is set to ON in the Advanced Configuration panel.
This didn't happen when my services were advertised from a LiteBeam; it only happens with the HAP. The firmware version is 3.23.4.0.
Any idea what the problem might be?
Thanks,
Bernard KI6TSF
The types of services I have are http://, telnet://, and mqtt://.
I have them running on a Raspberry Pi and port-forwarded on the HAP AC Lite.
Webcam towards Black Mtn http://ki6tsf-hap-1.local.mesh:8081/
N0ARY Packet BBS (login as user bbs then type callsign) telnet://KI6TSF-HAP-1.local.mesh:10023/
MeshChat http://ki6tsf-hap-1.local.mesh:8082/meshchat
MQTT Broker (use topic /yourcallsign/whatever) mqtt://KI6TSF-HAP-1.local.mesh:1883/
Speed Test http://ki6tsf-hap-1.local.mesh:8082/speedtest
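To see which kind of check each entry above would get, the advertised URLs can be split into scheme, host, port and path. This is a minimal Lua sketch: parse_service_url is a hypothetical helper (not AREDN firmware code), and its check field simply assumes that http:// entries get an HTTP check while every other scheme gets a plain TCP connection test.

```lua
-- Hypothetical helper (not AREDN firmware code): split an advertised
-- service URL such as "http://ki6tsf-hap-1.local.mesh:8081/" into parts.
local function parse_service_url(url)
  local scheme, rest = url:match("^(%a+)://(.+)$")
  if not scheme then
    return nil
  end
  -- Prefer "host:port/path"; fall back to "host/path" with no port.
  local host, port, path = rest:match("^([^:/]+):(%d+)(.*)$")
  if not host then
    host, path = rest:match("^([^:/]+)(.*)$")
  end
  if not host then
    return nil
  end
  scheme = scheme:lower()
  return {
    scheme = scheme,
    host = host:lower(), -- DNS names are case-insensitive
    port = port and tonumber(port),
    path = (path ~= "" and path) or "/",
    -- Assumption: HTTP entries get an HTTP check, others a TCP test.
    check = (scheme == "http" or scheme == "https") and "http" or "tcp",
  }
end
```

For example, the MeshChat entry parses to host ki6tsf-hap-1.local.mesh, port 8082 and path /meshchat, while the telnet and MQTT entries would only get the TCP test under this assumption.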
They are up and reachable at all times on my LAN at their original IP and port, and from the RF side via their forwarded ports.
When the HAP AC Lite decides to de-advertise them, it shows an exclamation mark (!) to the right of the "Advertised Services" list in Setup.
This always worked when all those services were port-forwarded from another device (a LiteBeam).
Bernard KI6TSF
I knew there was some indication, but could not remember what it was. It takes an hour or two for a service to de-advertise.
The ability to check whether a service is actually reachable was added sometime last year, in response to the HUGE number of advertised services that were not actually there. When it first landed in the nightly builds, there were some issues with certain types of services that were not detected as reachable, but common types such as web pages were not a problem. As I recall, one of the first things the node tries is to ping the device. Ping works at the IP level and does not test TCP ports, so that may be an issue depending on what you are actually doing. Additional tests are performed if the ping fails, but I do not know the details.
Yes, you are right. I initially forgot to enable NAT mode on that node. I switched to NAT mode, but that didn't fix this particular issue: the services were de-advertised again this morning.
The RasPi is connected to a LAN port on the HAP, with no router in between. Port forwarding really is happening on the HAP and it seems to work; I tested it multiple times from the RF side using another node. It's just the advertising that goes away after a few hours.
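For reference, a port-forwarding rule just maps a port on the node to a LAN address and port. A minimal Lua sketch of that lookup follows; the rule table and the lan_destination function are hypothetical, and the LAN address and ports in it are made-up example values, not the actual configuration on this HAP.

```lua
-- Made-up example rules: mesh-facing port on the node -> LAN target.
local example_forwards = {
  { outside_port = 8081,  lan_ip = "192.168.1.50", lan_port = 8081 },
  { outside_port = 10023, lan_ip = "192.168.1.50", lan_port = 23 },
  { outside_port = 1883,  lan_ip = "192.168.1.50", lan_port = 1883 },
}

-- Hypothetical lookup: return the LAN address and port that a
-- forwarded port maps to, or nil when no rule matches.
local function lan_destination(rules, outside_port)
  for _, rule in ipairs(rules) do
    if rule.outside_port == outside_port then
      return rule.lan_ip, rule.lan_port
    end
  end
  return nil
end
```

With these example rules, a connection from another mesh node to port 10023 on the node would be delivered to port 23 on the LAN host.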
I will check the HAP log files for errors; hopefully the advertising/de-advertising code logs some entries.
Thanks,
Bernard KI6TSF
OK, so on firmware 3.23.4.0 on my HAP AC Lite, the hourly check-services cron job systematically fails. It calls /usr/local/bin/olsrd-config, which passes the pings but fails every HTTP check and every TCP fallback check. The reason is that it uses the NAT hostname and the ports defined in each port-forwarding rule, and those ports are not accessible from the HAP itself. In other words, the HAP cannot reach any of the local ports it forwards, even though the port-forwarding rules are active and the ports are reachable from the LAN. The FW4 firewall config does not allow the device's own source address to access those ports. They are all accessible from the LAN and from the RF-side WAN (DtD in my case), but not from the node itself using any of its own interface IP addresses (127.0.0.1, 10.x.x.x, 172.x.x.x, 192.168.x.x). The tests in olsrd-config therefore always fail. I have debugged and traced all the Lua code and confirmed that this is the issue on my HAP: it is what causes all the services to be de-advertised even though they are reachable from everywhere else.
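The failure mode described above can be modeled with a small Lua sketch: ping the host, then try an HTTP request for web services, then fall back to a plain TCP connection. This is a simplification, not the olsrd-config code; the probes functions are hypothetical stand-ins passed in by the caller so the control flow runs without a network. It shows why a passing ping does not help when every port-level probe targets a forwarded port the node cannot reach from itself.

```lua
-- Simplified model of the validation sequence (not the firmware code).
-- probes.ping/http/tcp are hypothetical stand-ins returning true/false.
local function validate_service(svc, probes)
  -- A ping reply only proves the host is up; it says nothing about the port.
  if not probes.ping(svc.host) then
    return false, "ping failed"
  end
  -- Web services get an HTTP request first.
  if svc.scheme == "http" and probes.http(svc.host, svc.port, svc.path) then
    return true, "http ok"
  end
  -- Everything else (and failed HTTP checks) falls back to a TCP connect.
  if probes.tcp(svc.host, svc.port) then
    return true, "tcp ok"
  end
  return false, "port checks failed"
end
```

With a ping stub that succeeds and HTTP/TCP stubs that fail (as they do when the node probes its own forwarded ports), validate_service returns false, "port checks failed" for every service.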
Bernard KI6TSF
Submit a bug report on GitHub.
W4JWC and I have seen this delisting issue on 3.23.4.0 and 3.23.8.0 for a Winlink service and an Axis IP camera. The services resolve on local mesh nodes and over the tunnel. They are running, but ping and arping fail when called from /usr/local/bin/olsrd-config. The services are port-forwarded and resolvable but not pingable, hence they get delisted. They are still in /etc/config/services.
They are also in /tmp/service-validation-state with recent epoch timestamps, but when olsrd-config executes, it calls /usr/local/bin/olsrd-namechange, which pulls (delists) the service when it writes /var/run/hosts_olsr.stable.
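If validation were judged purely by how recent each timestamp is, entries with recent epoch times would stay listed, which is consistent with suspecting the later olsrd-namechange step rather than validation itself. The Lua sketch below assumes a simple name-to-timestamp table and a hypothetical max_age window; the real file format and retention period of /tmp/service-validation-state are not assumed here.

```lua
-- Hypothetical: return, sorted, the names whose last successful
-- validation (epoch seconds) is no older than max_age at time now.
local function recently_validated(last_seen, now, max_age)
  local keep = {}
  for name, ts in pairs(last_seen) do
    if now - ts <= max_age then
      keep[#keep + 1] = name
    end
  end
  table.sort(keep)
  return keep
end
```

On a node, now would come from os.time().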
I noticed the Lua manager has been choking when it tries to kill a process.
root@AI4Y-160-201-233:~# cat /tmp/manager.log
09/26 11:33:32: linkled: Terminating manager task: linkled
09/26 11:33:32: periodic-metrics: Terminating manager task: periodic-metrics
09/27 04:10:45: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:14:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:17:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
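To see at a glance which manager task keeps failing, the log lines can be tallied per task. This is a minimal Lua sketch that assumes only the "MM/DD HH:MM:SS: task: message" layout visible in the excerpt above; count_by_task is a hypothetical helper.

```lua
-- Tally manager.log lines per task name, based on the
-- "MM/DD HH:MM:SS: task: message" layout seen in /tmp/manager.log.
local function count_by_task(lines)
  local counts = {}
  for _, line in ipairs(lines) do
    local task = line:match("^%d%d/%d%d %d%d:%d%d:%d%d: ([%w_%-]+): ")
    if task then
      counts[task] = (counts[task] or 0) + 1
    end
  end
  return counts
end
```

On the node, the lines can first be collected into a table t with for l in io.lines("/tmp/manager.log") do t[#t + 1] = l end. For the excerpt above, namechange shows up three times.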
The kill errors, caused by not passing a valid PID, happen when it executes the following code in /usr/local/bin/mgr/namechange.lua:
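function dns_update()
  local pid = capture("pidof dnsmasq")
  if pid ~= "" then
    -- nixio.kill raises "integer expected, got nil" whenever
    -- tonumber(pid) is nil, i.e. when pid is not a single number.
    nixio.kill(tonumber(pid), 1)
  end
end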
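The error means tonumber(pid) returned nil, i.e. the captured text was not a single number. One plausible cause (an assumption, not confirmed in this thread) is pidof printing more than one PID, since tonumber("1234 5678") is nil. A more defensive variant could parse every PID and signal each one. In this sketch, parse_pids and dns_update_defensive are hypothetical names, and capture and kill are passed in so it runs standalone; on a node they would be the existing capture() helper and nixio.kill.

```lua
-- Collect every numeric PID from pidof-style output ("1234 5678\n").
local function parse_pids(output)
  local pids = {}
  for pid in (output or ""):gmatch("%d+") do
    pids[#pids + 1] = tonumber(pid)
  end
  return pids
end

-- Defensive sketch of dns_update: send signal 1 (SIGHUP, as in the
-- original code) to each dnsmasq PID; returns how many were signalled.
local function dns_update_defensive(capture, kill)
  local pids = parse_pids(capture("pidof dnsmasq"))
  for _, pid in ipairs(pids) do
    kill(pid, 1)
  end
  return #pids
end
```

When dnsmasq is not running, pidof prints nothing, so no signal is sent and the function returns 0.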
In particular, I'm interested in a setup where arping fails and yet you can still contact the device when you try.
I found that if I connect the Axis IP camera directly to my ToughSwitch Pro, where all the VLANs are in place (1 - WAN, 2 - DtD, 11 untagged), with the camera on an untagged port that has WAN access, it successfully pings from the AREDN node and no longer gets delisted. Honestly, I'm not surprised. I have 3 networks at my QTH, and the AREDN 10. subnet has to cross into the 192 subnet to reach the Axis IP camera, which is assigned a 10. IP in the same AREDN subnet. Nasty, I agree, so I took steps to connect it directly to the ToughSwitch. W4JWC, however, still seems to have trouble with his Windows PC running the Winlink service being delisted when it is directly connected to a MikroTik hAP ac lite (RB952Ui-5ac2nD-US).
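Whether the node tries to reach a device directly comes down to whether the device's address falls inside the node's LAN prefix: if it does, the node treats the device as on-link and ARPs for it, so arping fails when the device actually sits on another network segment. A minimal Lua sketch of that membership test follows; ip_to_number and in_subnet are hypothetical helpers, the addresses are examples only, and the prefix length should be whatever the node's LAN settings show.

```lua
-- Convert dotted-quad IPv4 text to a number (nil if malformed).
local function ip_to_number(ip)
  local a, b, c, d = ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
  if not a then
    return nil
  end
  local n = 0
  for _, octet in ipairs({ a, b, c, d }) do
    local v = tonumber(octet)
    if v > 255 then
      return nil
    end
    n = n * 256 + v
  end
  return n
end

-- True if ip lies inside the "network/prefix" block, false if not,
-- nil if either argument is malformed.
local function in_subnet(ip, cidr)
  local net, prefix = cidr:match("^([%d%.]+)/(%d+)$")
  prefix = prefix and tonumber(prefix)
  local ip_n = ip_to_number(ip)
  local net_n = net and ip_to_number(net)
  if not (ip_n and net_n and prefix and prefix <= 32) then
    return nil
  end
  -- Two addresses share a block when they agree on the top prefix bits.
  local block = 2 ^ (32 - prefix)
  return math.floor(ip_n / block) == math.floor(net_n / block)
end
```

For example, in_subnet("10.77.1.5", "10.77.1.0/29") is true, so a node with that LAN block would ARP for the device directly instead of routing to it.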