hAP ac lite update fails with "Bad Gateway"

K2QA
hAP ac lite update fails with "Bad Gateway"
Hello,

I'm trying to flash a hAP ac lite (RB952Ui-5ac2nD-US) with AREDN firmware, following the Windows 10 instruction video from KK6RAY.

I can TFTP boot from the .elf file with Tiny PXE.

When I try to update using setup/administration to upload the .bin file, Wireshark shows lots of packets from Windows to the hAP, then the hAP returns "Bad Gateway" and resets the TCP connection.

I have tried:
aredn-3.18.9.0-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin
aredn-3.19.3.0-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin

I've tried many times.

Any suggestions?

Thanks
K5DLQ
Unplug the Cat5 cable from your computer, close your browser, and try again.

K2QA
Firmware File Not Valid
Multiple attempts, always one of two outcomes; hAP always resets the connection.

1. Bad gateway message from hAP
2. Message on Administration screen: Firmware CANNOT be updated, firmware file is not valid, Failed to restart all services, please reboot this node., current version: 3.19.3.0, hardware type: mikrotik (rb-952ui-5ac2nd)

It's like the upload process crashes.

I have Wireshark and supportdata if they will be useful. (hAP, PC w/Wireshark on 192.168.1.100, PC w/TinyPXE on 192.168.1.10, all plugged into dumb switch)

I can try the Linux process, but it would be useful to find out why Windows update fails.
AE6XE
Make sure you are loading the 3.19.3.0 .bin file after the node has been booted with the 3.19.3.0 .elf file. Don't mix releases; the behavior would be unknown.

I've seen this symptom periodically. We need to be able to reproduce the problem reliably to fix it. If you have some command line familiarity, you might proceed as follows: scp the .bin file to /tmp on the node, then telnet or ssh into the node and run this command. You should then see any errors that might cause the bad gateway message.

sysupgrade -n /tmp/<image name>.bin

Joe AE6XE
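
The "don't mix releases" advice can be checked mechanically before flashing. Here is a minimal sketch, assuming image names follow the aredn-<release>-mikrotik-... pattern seen in this thread; the function names aredn_release and same_release are made up for illustration and are not part of AREDN.

```shell
#!/bin/sh
# Hypothetical helper: pull the release string out of an AREDN image file name.
# Assumes names of the form aredn-<release>-mikrotik-..., as used in this thread.
aredn_release() {
    basename "$1" | sed -n 's/^aredn-\(.*\)-mikrotik-.*$/\1/p'
}

# Succeeds only when both files carry the same, non-empty release string.
same_release() {
    a=$(aredn_release "$1")
    b=$(aredn_release "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_release aredn-3.19.3.0-mikrotik-vmlinux-initramfs.elf aredn-3.19.3.0-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin succeeds, while pairing the 3.19.3.0 .elf with the 3.18.9.0 .bin fails.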
K2QA
sysupgrade worked
Joe,
I've been traveling and just getting back to this.
Thanks for the info. It took me a while to find the ssh port to use, but I was able to scp the 3.19.3.0 bin file and run sysupgrade.
As soon as sysupgrade started, the connection was closed, so I couldn't watch for any messages.
Is there a secret to keeping the connection open until reboot?
Are there any log files you want to look at?

John K2QA
 
kc5hwb
Are you moving the cat5 from the internet port to one of the LAN ports when attempting to load the 'sysupgrade' firmware?
AE6XE
The only way to see further details is to connect to the serial console port. Losing the connection when sysupgrade starts is expected behavior (and rules out some issues, since you didn't see error messages). I need more details on what happened: which cables were unplugged, and when. I have found it best not to do anything for ~4 minutes after typing the sysupgrade command. Then unplug the network cable from port 1, wait 15 seconds or so, plug it into port 2, and make sure the laptop acquires a new IP address via DHCP.

Joe AE6XE 
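
After the ~4 minute wait and the cable move, it can help to poll for the node rather than guess when it is back. The following is only a sketch for a POSIX shell (on Windows the equivalent would differ); wait_until is a hypothetical helper name, and the probe command and address in the usage note are assumptions, not something the thread specifies.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command until it succeeds.
# Usage: wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Run the probe quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

A possible use is wait_until 30 5 ping -c 1 192.168.1.1, which tries once every 5 seconds for up to 30 attempts. Substitute whatever address the node actually answers on after your laptop has renewed its DHCP lease.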
k1ky
Initial load sysupgrade issues - "new" units??
I (we) too are experiencing similar behavior with several "new" hAP ac lite units. I noticed that the packaging box is a little bigger on the new units as well. The model number appears to be the same.

Where do we go to see the boardid version on factory MikroTik units? I intend to spend more "productive" time with this either tomorrow or over the weekend. I had been successful loading all versions of AREDN firmware on these hAP ac lite units... until these past few weeks. It "appears" that the upgrade fails or restarts anywhere from 4% up to around 96% of the way through the load (using Chrome). It still fails with other browsers as well. I have tried old and new nightlies as well as production versions. It's just a "few" stubborn nodes so far, but it's starting to sound and look like a trend.
KB9OIV
I have recently flashed two of these devices.  

I did get the 'Bad Gateway' message on both units when I tried newer initial firmware.

I was able to flash 3.19.3.0 firmware successfully, however.

I did not try to reproduce the 'Bad Gateway' message to see if it was repeatable, by starting over.  I have not tried any newer permanent firmware.

K2QA
Bad Gateway on another hAP AC Lite
Joe,

Since I was able to flash my first unit using scp, I decided to buy another one to use for debugging.
I get the same Bad Gateway error when doing upgrade via browser on the node after tftp booting with 3.19.3.0.

When I hit the upload button, I get the 'Bad Gateway' message and ssh session also disconnects.

logread isn't very useful, since the SSH session terminated.
-----------------------------------------------
  3.19.3.0, r7676-cddd7b4c77
    root@NOCALL:~# logread -f
      Fri Mar 22 19:29:00 2019 cron.info crond[1022]: USER root pid 3150 cmd /usr/local/bin/clean_zombie.sh
      Fri Mar 22 19:29:29 2019 kern.info kernel: [  217.099458] sh (3203): drop_caches: 3
      Failed to find log object: Not found 

I have also attached a Wireshark conversation extract, which might also give some insight. The entire trace is 6MB, so it uploads most of the .bin file.

What other tools are available to help diagnose?

Thanks,

John K2QA
AE6XE
Support data?
Can you click on the support data download link at the bottom of the Administration page? This will have the info I need. I've been traveling for the last 2 weeks; I think someone sent this data on the suspected new hardware earlier. I'm getting home today and should have time to look at it this week, I think.
K2QA
Joe,

After 'Bad Gateway' I went back to 192.168.1.1 - web server still worked, so I collected support data.
I bought this second hAP just to diagnose this problem, so let me know what else I can do to help.

P.S. I checked the md5sum of the upload file and it is good, but the Wireshark trace suggests that there is an issue with the data.
....
OpenWrt kernel loader for AR7XXX/AR9XXX
..Copyright (C) 2011 Gabor Juhos <juhosg@openwrt.org>
....Incorrect LZMA stream properties!
..
System halted!
....Decompressing kernel... ....done!
..failed, ....data error!
....Starting kernel at %08x...

....m....L.;.......o......L9.i$.zn.<.}N.qB.S\.`$S6.....6...:....H.%...4.
...


John K2QA
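
For anyone repeating the md5sum check mentioned in the P.S., here is a minimal sketch that compares a file's digest with an expected value. check_md5 is a hypothetical helper name, and the expected digest is whatever is published alongside the image you downloaded. A match only shows the local file is intact; it says nothing about what happens to the data during the upload.

```shell
#!/bin/sh
# Compare a file's MD5 digest with an expected value.
# Prints OK on a match; prints MISMATCH and returns non-zero otherwise.
check_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo OK
    else
        echo MISMATCH
        return 1
    fi
}
```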
AA7AU
I ran into something similar when I was trying to un-brick my CPE220. If memory serves, I had used tftp to upload, which seemed successful, but I couldn't get past the upload phase. I then tried 192.168.1.1:8080 and found my CPE220 responded with my node name and all my prior settings (including where I had turned off RF), even though it wouldn't respond to "localnode:8080". It's possible that I uploaded the upgrade image rather than the factory image (though I thought I had it right). I finally had to tftp upload the correct image all over again to get everything to thankfully reset; the node had gone totally off into never-never land after I tried to continue with that first cycle.

This experience raises a number of questions in my mind about data cleanup during firmware upload, etc and/or how incredibly creative pilot error can be ...

But the reason I post now is that I did see that same unexpected presence at 192.168.1.1 during my flailing about.

Just thought I'd throw that in for whatever it's worth,
- Don - AA7AU

AE6XE
out of RAM
The system log shows the device is running out of RAM and the kernel is going into Out-of-Memory (OOM) mode, killing processes to survive. We'll need to reproduce this and review the options for MikroTik devices. I'd expect all MikroTik models with 64MB RAM to have the same issue. The kernel ranks processes to decide which one to kill.
 
[  451.464626] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  451.599359] [ 3764]     0  3764     1229     1057       5       0        0             0 setup
[  451.634765] Out of memory: Kill process 3764 (setup) score 67 or sacrifice child
[  451.642432] Killed process 3764 (setup) total-vm:4916kB, anon-rss:1476kB, file-rss:2752kB, shmem-rss:0kB
 
[  428.586694] [ 3667]     0  3667     1128      956       4       0        0             0 admin
[  428.613178] Out of memory: Kill process 3667 (admin) score 61 or sacrifice child
[  428.620847] Killed process 3667 (admin) total-vm:4512kB, anon-rss:1080kB, file-rss:2744kB, shmem-rss:0kB

...and a bunch more.
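
To check your own node for the same symptom, the kernel log can be filtered for these "Killed process" lines. This is a minimal sketch that reads log text on stdin; oom_victims is a hypothetical helper name, and it relies only on the line format shown in the excerpt above.

```shell
#!/bin/sh
# List the names of processes the kernel OOM killer terminated,
# reading kernel log text on stdin.
oom_victims() {
    sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p'
}
```

On the node this could be run as dmesg | oom_victims. Against the excerpt above it would print setup and admin, one per line.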

As a workaround for new devices, load 3.19.3.0, using both the .elf and .bin files from this release. Once AREDN is loaded to flash and booting, go to the admin page and upload the nightly build .bin file (a 'sysupgrade'). Only the combination of the nightly build .elf and .bin should trigger this failure. This assumes the root cause really is memory in the nightly build and not something else. Can you please confirm this works?

We are consuming more flash and RAM to replace the UI. We know we'll push the limits with 2 languages supported right now. At some point, we can retire perl, a large consumer of flash and RAM that the current UI depends on. This will free up a lot of RAM/flash and, fingers crossed, give more memory headroom on 32MB RAM devices to extend their life span.

Joe AE6XE
K2QA
MikroTik hAP Testing
Joe,

Thanks.
I'm not sure exactly what you want me to do.

I was able to flash the first unit by tftp booting 3.19.3.0.elf, using scp to copy 3.19.3.0.bin, and then running sysupgrade from terminal, so I know that works.

So I bought the second unit to see if I could reproduce and debug the problem.

After tftp boot with aredn-3.19.3.0-mikrotik-vmlinux-initramfs.elf, it is the browser admin/upload of aredn-3.19.3.0-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin that is failing.

Are you saying to try to admin/upload the nightly build .bin file instead?

I can also send you the unit if you want it for testing.

John K2QA

 
AE6XE
John,  

Working around 2 issues here:

Issue 1 - installing the nightly build
Yes, as step 1, fully install 3.19.3.0 (using the 3.19.3.0 .elf and .bin) so that you have a working AREDN MikroTik device. Then, as step 2, if you want the nightly build firmware, upload the nightly build .bin from the admin page.

Issue 2 - booted with the .elf, and the AREDN UI upload of the .bin file returns 'bad gateway'
For now the command line 'sysupgrade' workaround can be used. Further investigation is needed.

Joe      

K2QA
hAP Nightly Build Upload Successful

Joe,

I successfully updated first unit using admin/upload from 3.19.3.0 to nightly build aredn-853-94816c4-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin.

firmware version 853-94816c4
configuration       mesh
free space
flash = 9144 KB
/tmp = 30260 KB
memory = 35120 KB

John

W9IKU
No luck following suggestions in forum on install

Has this been resolved? I am trying to get the hAP installed. I can get the .elf loaded. However, I stall out through both the GUI and Telnet. Using the command line, I get the "killed" response.

Please help

Greg
W9IKU
greg@w9iku.net

K2QA
I ended up using a friend's computer, and the sysupgrade to 3.19.3.0 worked fine on two hAP units.
I don't know the root cause of why it failed on my computer.
The 3.19.3.0 .elf always loaded. I tried different browsers, etc., but sysupgrade always failed.
It worked the first time on my friend's computer.

 
AA7AU
Which computers?

Please post the model and O/S type and level for each of the two that you tried. Maybe there's some pattern here someplace. I struggled hard a few months back for my three installs on the hAP units.

TIA,
- Don - AA7AU

AB7PA
Web interface vs command line sysupgrade install
One method that has always worked for me is described here:
  https://arednmesh.readthedocs.io/en/latest/arednGettingStarted/installin...
Installing the sysupgrade image via command line seems to work when the web interface doesn't.
W2GMD
For those landing on this thread, like I did, after failing to upgrade the MikroTik hAP via the web interface, follow the directions KC0EUW pointed at, specifically:

1) TFTP install the elf image (as documented).
2) DO NOT ATTEMPT TO UPLOAD FIRMWARE VIA WEB, INSTEAD:
3) scp -P 2222 aredn*bin root@192.168.1.1:/tmp
4) ssh -p 2222 root@192.168.1.1 'sysupgrade -n /tmp/aredn*.bin'
5) Resume documented procedure to setup node.
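
Steps 3 and 4 above can be previewed before anything is sent to the node. The sketch below only prints the two commands for a given image; flash_plan is a hypothetical helper name, and the default address and port are taken from the steps above and may differ on your setup.

```shell
#!/bin/sh
# Print (do not run) the scp and ssh commands from steps 3 and 4.
# Defaults for host and port are the values quoted in the steps above.
flash_plan() {
    image=$1
    host=${2:-192.168.1.1}
    port=${3:-2222}
    echo "scp -P $port $image root@$host:/tmp"
    echo "ssh -p $port root@$host 'sysupgrade -n /tmp/$(basename "$image")'"
}
```

Naming the image explicitly, rather than relying on the aredn*bin wildcard, avoids surprises when more than one image sits in the same directory.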

 
AE6XE
another option to try on hAP ac lite, if 'bad gateway'
While testing the upgrade to the latest OpenWrt 19.07 release, I've found a problem affecting only the hAP ac lite model that may be related to the 'bad gateway' symptoms we have been seeing. It involves the 5GHz 802.11ac wireless driver, which in AREDN firmware is used only on the hAP ac lite. This driver is gobbling up RAM (and should not be). If you can telnet or ssh into the node over the network cable and type the command "wifi down", this will free up RAM, including what is not supposed to be consumed. Then the upload of firmware from the UI may work.

If anyone tries this command, please capture the before/after free memory to confirm, from the node's command line:

free
wifi down
free

Please post any results back.    

Joe AE6XE
kg6wxc
That did it!

Nearly gave up on mesh networking yesterday due to all this Bad Gateway/Connection Reset garbage.
Spent all day trying different things. Then this morning others in our group linked me here. (Guess I should read the forums more eh? ;) )


Anyways...

root@NOCALL:~# free
             total       used       free     shared    buffers     cached
Mem:         60700      48684      12016         84          0      25712
-/+ buffers/cache:      22972      37728
Swap:            0          0          0


root@NOCALL:~# wifi down

root@NOCALL:~# free
             total       used       free     shared    buffers     cached
Mem:         60700      38900      21800         84          0      25712
-/+ buffers/cache:      13188      47512
Swap:            0          0          0
 

I was then able to proceed normally and uploaded the sysupgrade file without issue from the GUI!
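
For anyone else reporting before/after numbers, the difference can be computed from two saved copies of the free output. This is a minimal sketch; mem_free_kb and freed_kb are hypothetical helper names, and it assumes the "free" value is the fourth field of the Mem: line, as in the output above.

```shell
#!/bin/sh
# Print the "free" column (4th field of the Mem: line) from `free` output on stdin.
mem_free_kb() {
    awk '$1 == "Mem:" { print $4 }'
}

# Print how much memory was freed between two saved `free` outputs,
# in the same units that `free` reported.
freed_kb() {
    before=$(mem_free_kb < "$1")
    after=$(mem_free_kb < "$2")
    echo $((after - before))
}
```

With the two outputs above saved to files, freed_kb before.txt after.txt prints 9784, since free memory went from 12016 to 21800.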
K6CCC
Found out the hard way that the "wifi down" command takes down the 5 GHz WiFi AND the 2 GHz AREDN mesh. I attempted this remotely and the node instantly stopped responding. Fortunately this was just my "portable" node, which just sits at home when it's not doing anything more useful (almost all the time).
Before I started, the free command showed:
60700, 43348, 17352,  352, 4144, 10952
buffers:  28252, 32488

 
AB7PA
wifi down tip added to online documentation
This troubleshooting tip for Mikrotik hAP ac lite devices has been added to the AREDN online documentation here:
https://arednmesh.readthedocs.io/en/latest/arednGettingStarted/installin...
AA7AU
Next UI upgrade for hAP lite?

Joe wrote: 'If you can telnet or ssh into the node over the network cable and type the command "wifi down", this will free up RAM, including what is not supposed to be consumed.  Then the upload of firmware from the UI may work.'

Do you think it might be a good pre-upgrade step for a currently functioning hAP lite unit to turn *off* the 5.8 WiFi and then power cycle before attempting a normal UI upgrade?

TIA,
- Don - AA7AU
 

AE6XE
The node reboots twice on 'keep settings'. Right now, the 1st time it boots in "first boot" state, and the wifi will still be on regardless. Then, after applying the preserved settings, it reboots a 2nd time.

There is a discussion in the OpenWrt community to figure out what to do about this issue. Basically, 802.11ac devices with 64MB are unusable. Let's see what they decide to do.

https://github.com/openwrt/openwrt/pull/1077

Joe AE6XE
 
AA7AU
Installed stable release

I just took my AREDN Swiss Army Knife (hAP ...), which was running 713-xxxx, and installed 3.19.3.0 on it using the GUI (from a laptop). It had been quite some time since I worked with this great little node, and I had forgotten the ongoing problems I had with it earlier during any reboot when an Ethernet cable (non-PoE) was connected between its WAN port and a dumb but active Gig switch. If the cable is in place, it hangs on reboot (forever); if not, it reboots OK (though it always seems slow).

This made it a bit worrying when I installed the tunnel package after 3.19.3.0, as I needed the WAN connection for the package but then it had to reboot. As it started to hang, I unplugged the cable and hard power-cycled the unit, and it booted OK with everything in place.

BTW: For the stable update, I did a clean boot (no use), then turned off the 5.8 AP, then did the free/wifi down/free bit (and also stood on one foot and spun twice counter-clockwise):
# free
             total       used       free     shared    buffers     cached
Mem:         60700      31148      29552        136       3772      10244
-/+ buffers/cache:      17132      43568
Swap:            0          0          0
# wifi down
'radio0' is disabled
# free
             total       used       free     shared    buffers     cached
Mem:         60700      31116      29584        136       3772      10244
-/+ buffers/cache:      17100      43600
Swap:            0          0          0

I'd like to try some of the features and improvements in the nightly builds, but I just don't want the worry that I'll knock it down hard and then have to struggle to re-install etc.

When will it be safe to go back in the water again?

TIA,
- Don - AA7AU

AE6XE
Don, we've recently discovered that the wireless driver for the 5GHz radio on the hAP ac lite consumes so much memory that a node can run short of the memory it needs to function. The issue lies in the design of this wireless driver in the upstream project, which is primarily focused on the masses of people with desktops that have gigabytes of RAM. Little embedded devices with 32, 64, or even 256 megabytes are very small in comparison. The RAM usage is there to accommodate the newer wireless cards with 802.11ac and 160MHz of bandwidth, which transfer an order of magnitude more data in the same time. What is happening is that devices with up to 256MB of RAM, some with 2 or 3 wireless cards, are also running out of RAM and crashing.

There's a lot of discussion right now among the OpenWrt developers, who are working out which patches they will apply. The driver design needs to accommodate embedded devices with limited memory (WISP devices, home wifi routers, etc.). As the saying goes, you get what you pay for. Inexpensive devices with little memory won't be able to do the super high data rates, because they don't have the buffer space to store and forward that amount of data.

I have a patch applied in the early testing images for the upcoming OpenWrt 19.07 upgrade. This is probably the most stable AREDN image available for the hAP ac lite. It is the only image that lowers RAM consumption when using the 5GHz radio. This issue affects only the hAP ac lite, because it is the only device with AREDN firmware that has an 802.11ac chipset. The workaround to recover RAM during upgrades is to turn off the 5GHz radio before attempting a firmware upgrade.

Joe AE6XE
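
One way to act on this before an upgrade is a rough pre-flight comparison of free RAM against the image size. The sketch below is purely illustrative: enough_ram is a hypothetical helper, and the safety margin is an arbitrary figure you would choose yourself, not an AREDN recommendation.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is there room to hold the image in RAM?
# Arguments: free memory (kB), image size (kB), safety margin (kB).
# Succeeds when free memory covers the image plus the chosen margin.
enough_ram() {
    [ "$1" -ge $(( $2 + $3 )) ]
}
```

On the node, the first argument could come from the free command or from the MemFree line of /proc/meminfo, and the second from the size of the .bin file in kB. If the check fails, turning off the 5GHz radio as described above and checking again is the obvious next step.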
