HAP AC Lite keeps unadvertising services despite them being up

KI6TSF
Hi,

The services I have defined on my HAP AC Lite (MikroTik RouterBoard 952Ui-5ac2nD) keep disappearing from the mesh status page every few hours.  They run on a Raspberry Pi on the LAN, which is reachable from both the RF side and the LAN side at all times.  "Provide default route to LAN devices" is set to ON in the Advanced Configuration panel.

This didn't happen before, when my services were advertised from a LiteBeam.  It only happens with the HAP.  The firmware version is 3.23.4.0.

Any idea what the problem might be?

Thanks,

Bernard KI6TSF

K6CCC
What kind of services?  The newer firmware versions check that the services are actually reachable, and de-advertise them if they are not.  As I recall, the service will still show on the node itself with some sort of indication (can't remember if / what), but it will not be propagated.
 
KI6TSF
I have those services running

The types of services I have are http:// telnet:// and mqtt://

I have them running on a raspberry pi and port-forwarded on the HAP AC Lite.

Webcam towards Black Mtn http://ki6tsf-hap-1.local.mesh:8081/
N0ARY Packet BBS (login as user bbs then type callsign) telnet://KI6TSF-HAP-1.local.mesh:10023/
MeshChat http://ki6tsf-hap-1.local.mesh:8082/meshchat
MQTT Broker (use topic /yourcallsign/whatever) mqtt://KI6TSF-HAP-1.local.mesh:1883/
Speed Test http://ki6tsf-hap-1.local.mesh:8082/speedtest

They are up and reachable all the time in my LAN at their original IP address and port, and from the RF side via their forwarded ports.

When the HAP AC Lite decides to de-advertise them, it shows an exclamation mark (!) to the right of the "Advertised Services" list in Setup.

This worked reliably before, when all those services were port-forwarded from another device (a LiteBeam).

Bernard KI6TSF
 

K6CCC
I'm going to be very picky about this question.

The types of services I have are http:// telnet:// and mqtt://

I have them running on a raspberry pi and port-forwarded on the HAP AC Lite.

You said they are port forwarded.  Normally that implies NATing.  Is the RasPi directly connected to a LAN port on the hAP and therefore has a mesh reachable 10.x.y.z address, or is it connected to or via something else (a router for example)?  Are you actually simply advertising the services or is there actual port forwarding going on the hAP?

When the HAP AC Lite decides to de-advertise them, it shows an exclamation mark (!) to the right of the "Advertised Services" list in Setup.


I knew there was some indication, but could not remember what it was. It takes an hour or two for a service to de-advertise.

The ability to check whether a service is actually reachable was added some time last year.  It was a response to the HUGE number of advertised services that were not actually there.  When first added to the nightly builds, there were some issues with certain types of services that were not detected as reachable, but common types such as web pages were not an issue.  As I recall, one of the first things the node tries to do is ping the device.  Ping does not recognize TCP ports, so that may be an issue depending on what you are actually doing.  Additional tests are performed if the ping fails, but I do not know the details.
 
KI6TSF
You said they are port forwarded.  Normally that implies NATing.  Is the RasPi directly connected to a LAN port on the hAP and therefore has a mesh reachable 10.x.y.z address, or is it connected to or via something else (a router for example)?  Are you actually simply advertising the services or is there actual port forwarding going on the hAP?


Yes, you are right.  I initially forgot to enable NAT Mode on that node.  I switched to NAT mode, but that didn't fix this particular issue: the services were de-advertised again this morning.

The RasPi is connected to a LAN port on the HAP; there is no router in between.  There is actual port forwarding going on on the HAP, and it seems to work.  I tested it multiple times from the RF side using another node.  It's just the advertising that goes away after a few hours.

I will check the HAP log files for errors; hopefully the advertising/de-advertising code logs some entries.

Thanks,

Bernard KI6TSF
K6CCC
I don't know if running NAT mode is causing your problem.  However, is there a particular reason you are using NAT mode as opposed to "direct" mode?
KI6TSF
Yes, I prefer NAT to prevent direct access to other TCP/IP services such as sshd, or other services I plan to set up and test in the LAN first.  Also, I'm going to migrate my RPi 4 services to an RPi CM4 (inside a Seeed reTerminal, which is basically a Raspberry Pi with an integrated LCD touchscreen display), and I think using NAT with port forwarding will make the transition simpler.
KI6TSF
OK, so on firmware 3.23.4.0 on my HAP AC Lite, the hourly check-services crontab job systematically fails.  It calls /usr/local/bin/olsrd-config, which passes the pings but fails every http check and tcp fallback check.  The reason is that it uses the NAT hostname and ports defined in each port forwarding rule, and those ports are not accessible from the HAP itself: the HAP cannot reach any of the local ports it forwards, even though the port forwarding rules are active and the services are reachable from the LAN.  The FW4 firewall config does not allow the device's own source address to access those ports.  They are all accessible from the LAN and the RF-side WAN (which is DtD in my case), but not from the host itself using any of its own interface IP addresses (127.0.0.1, 10.x.x.x, 172.x.x.x, 192.168.x.x).  The tests in olsrd-config therefore always fail.  I have debugged and traced all the Lua code and confirmed that this is the issue on my HAP; it is what causes all the services to be de-advertised while they are in fact reachable from everywhere else.
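
A minimal sketch for reproducing the symptom by hand, assuming a POSIX shell on the node and on a second host on the mesh. Both split_service_url and check_http are hypothetical helpers, not part of the AREDN firmware, and check_http assumes a wget-compatible client that accepts -q, -T and -O. Going by the description above, a forwarded http:// URL would be expected to fail when checked from the HAP itself and succeed from any other host.

```shell
# Hypothetical helper: split a service URL into "scheme host port".
split_service_url() {
    url=$1
    scheme=${url%%://*}          # text before "://"
    rest=${url#*://}             # text after "://"
    hostport=${rest%%/*}         # drop the path, keep "host:port"
    host=${hostport%%:*}
    case "$hostport" in
        *:*) port=${hostport##*:} ;;
        *)   port= ;;            # no explicit port in the URL
    esac
    printf '%s %s %s\n' "$scheme" "$host" "$port"
}

# Hypothetical helper: succeed only if the URL answers within 5 seconds.
# Assumes wget (or a compatible client) is installed on the host.
check_http() {
    wget -q -T 5 -O /dev/null "$1"
}

# Example use, run once on the node and once on another mesh host:
#   split_service_url "mqtt://KI6TSF-HAP-1.local.mesh:1883/"
#   check_http "http://ki6tsf-hap-1.local.mesh:8081/" && echo reachable || echo unreachable
```

The first example call prints the three fields separated by spaces (mqtt KI6TSF-HAP-1.local.mesh 1883), which is convenient for feeding the host and port to whatever TCP client is available for the telnet:// and mqtt:// services.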

Bernard KI6TSF

K6CCC
You are way beyond my ability to help.  I can spell Linux and not much more than that!
Submit a bug report on GitHub.

 
KI6TSF
Thanks anyways! I'll submit a bug.
W4JWC
Update please
I'm curious if there has been a resolution to this. Another ham and I are having the same issue. Thanks
AI4Y
Also experiencing the delisting issue

W4JWC and I have seen this delisting issue on 3.23.4.0 and 3.23.8.0 for a Winlink service and an AXIS IP camera. The services resolve on local mesh nodes and the tunnel. They are running, but ping and arping fail when called from /usr/local/bin/olsrd-config. The services are port forwarded and resolvable, but not pingable, hence they get delisted. They are still in /etc/config/services.

They are also in /tmp/service-validation-state with recent epoch dates, but when olsrd-config executes, it calls /usr/local/bin/olsrd-namechange, which pulls (delists) the service when it writes /var/run/hosts_olsr.stable.

I noticed the Lua manager has been choking when trying to kill a process.

root@AI4Y-160-201-233:~# cat /tmp/manager.log
09/26 11:33:32: linkled: Terminating manager task: linkled
09/26 11:33:32: periodic-metrics: Terminating manager task: periodic-metrics
09/27 04:10:45: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:14:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:17:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)

The kill errors, caused by not passing a valid pid, happen when it is executing the following code in /usr/local/bin/mgr/namechange.lua:

function dns_update()
    local pid = capture("pidof dnsmasq")
    if pid ~= "" then                                                                  
        nixio.kill(tonumber(pid), 1)              
    end                                 
end
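
The log says kill received nil, and in Lua tonumber returns nil for any string that is not a single number. One possible trigger (an assumption, not confirmed anywhere in this thread) is pidof printing more than one PID, for example "1234 5678", which passes the empty-string test above but does not convert. The guard can be sketched in shell rather than Lua, so it can be tried at the node's prompt; first_pid and reload_dnsmasq are hypothetical names, not firmware functions.

```shell
# Hypothetical helper: print the first all-digit token from pidof-style
# output and return 0, or print nothing and return 1 if there is none.
first_pid() {
    # $1 is left unquoted on purpose so it splits on whitespace.
    for candidate in $1; do
        case "$candidate" in
            ''|*[!0-9]*) continue ;;              # not purely numeric, skip
            *) printf '%s\n' "$candidate"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: send signal 1 (SIGHUP) to dnsmasq only when a
# valid PID was found, mirroring what dns_update() is trying to do.
reload_dnsmasq() {
    pid=$(first_pid "$(pidof dnsmasq 2>/dev/null)") || return 0
    kill -HUP "$pid"
}
```

Running pidof dnsmasq by hand on an affected node would show whether this case actually applies there.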

KN6PLV
It would be good to get more information about these service delistings as ... obviously ... this isn't how it should be.
In particular, I'm interested in a setup where arping is failing and yet you can still contact the device when you try.
AI4Y
Yes, I can still contact the device after it's delisted.

I found that if I connect the device (an Axis IP camera) directly to my ToughSwitch Pro, where all the VLANs are in place (1 - wan, 2 - dtd, 11 untagged), with the camera on an untagged port with wan access, it successfully pings from the AREDN node and no longer gets delisted. I'm not surprised, honestly. I have 3 networks at my QTH, and the AREDN 10. subnet has to cross into the 192 subnet to get to the Axis IP camera, which is assigned a 10. IP in the same AREDN subnet. Nasty, I agree, so I took steps to connect it directly to the ToughSwitch.  W4JWC, however, still seems to have trouble with his Windows PC running the Winlink service being delisted when directly connected to a MikroTik hAP ac lite (RB952Ui-5ac2nD-US).
 
