hAP AC lite update fails

K2QA

Joe,

I picked up another hAP AC lite.
I loaded 3.19.3.0.elf.
Tried to update via browser to aredn-3.19.3.0-mikrotik-rb-nor-flash-16M-ac-sysupgrade.bin.
File uploads and then gets error that firmware not valid. I know the file is good, I've used it on other devices.
I looked at the supportdata file.
Previous device: machine : MikroTik RouterBOARD 952Ui-5ac2nD
New device: machine : MikroTik RouterBOARD RB952Ui-5ac2nD
Is different signature a problem?

Thanks,

John K2QA

K5DLQ
Yes. It sounds like that is an issue.
AE6XE
There do appear to be definitions for the device name both with and without the "RB" prefix:

    'MikroTik RouterBOARD 952Ui-5ac2nD' => {
      'name'            => 'MikroTik RouterBOARD 952Ui-5ac2nD',
      'comment'         => '',
      'supported'       => '1',
      'maxpower'        => '22',
      'pwroffset'       => '0',
      'usechains'       => 1,
      'rfband'          => '2400',
    },
    'MikroTik RouterBOARD RB952Ui-5ac2nD' => {
      'name'            => 'MikroTik RouterBOARD RB952Ui-5ac2nD',
      'comment'         => '',
      'supported'       => '1',
      'maxpower'        => '22',
      'pwroffset'       => '0',
      'usechains'       => 1,
      'rfband'          => '2400',
    },
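
For anyone comparing boards, here is a minimal shell sketch of that name check. It assumes the machine name comes from a /proc/cpuinfo-style "machine :" line, as quoted from the support data earlier in the thread. `machine_name` and `is_known_board` are hypothetical helper names, not part of the firmware, and the list holds only the two names from the definitions quoted above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the AREDN firmware.

# Print the value of the "machine" field from cpuinfo-style text on stdin.
machine_name() {
    sed -n 's/^machine[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Succeed if the name matches one of the two definitions quoted above.
is_known_board() {
    case "$1" in
        'MikroTik RouterBOARD 952Ui-5ac2nD'|'MikroTik RouterBOARD RB952Ui-5ac2nD')
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

On a node this could be used as `is_known_board "$(machine_name < /proc/cpuinfo)" && echo known`.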

About 212 seconds after bootup there is not enough RAM for the kernel to function, so the out-of-memory (OOM) killer is picking a high-scoring process and killing it. It is not obvious why it would be running out of memory; let me look at the data more. Meanwhile, try to run the sysupgrade quickly after bootup.
 
[  212.730003] Out of memory: Kill process 3132 (admin) score 61 or sacrifice child
[  212.737662] Killed process 3217 (sh) total-vm:1536kB, anon-rss:376kB, file-rss:908kB, shmem-rss:0kB
[  212.751585] oom_reaper: reaped process 3217 (sh), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  309.956504] Out of memory: Kill process 3626 (admin) score 61 or sacrifice child
[  309.964173] Killed process 3721 (sh) total-vm:1472kB, anon-rss:316kB, file-rss:908kB, shmem-rss:0kB
[  309.980301] oom_reaper: reaped process 3721 (sh), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...and a few more.
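
To pull just the kill events out of a longer log, something like the following could work. It is a sketch that assumes the kernel log lines look exactly like the ones quoted above; `oom_kills` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the processes the OOM killer actually killed.
# Reads kernel log text on stdin and prints "<seconds> <pid> <name>" per kill.
oom_kills() {
    sed -n 's/^\[ *\([0-9.]*\)] Killed process \([0-9]*\) (\([^)]*\)).*/\1 \2 \3/p'
}
```

Typical use would be `dmesg | oom_kills`, which skips the "Out of memory" and "oom_reaper" lines and keeps only the "Killed process" ones.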
K2QA
hAP AC lite reboots after scp

Joe,

Seems like each one of these I buy presents different issues.
One before this was "Bad Gateway" but was eventually successful with scp.

With this one, I've tried scp many times with no luck.

I've tried both aredn-3.19.3.0-mikrotik-vmlinux-initramfs.elf and aredn-1022-412a1e5-mikrotik-vmlinux-initramfs.elf.

A few seconds after scp completes, the box reboots back to RouterOS, so I can't run sysupgrade.

I have time to run one ps command and then get "Killed" while trying to type sysupgrade.
===
root@NOCALL:~# ps

  PID USER       VSZ STAT COMMAND
 ...
  899 root      1224 S    /sbin/logd -S 64
  918 root      1452 S    /sbin/rpcd -s /var/run/ubus.sock -t 30
 1021 root      1204 S    /usr/sbin/crond -f -c /etc/crontabs -l 5
 1039 root      1208 S    /usr/sbin/telnetd -F -l /bin/login.sh
 1069 root      1296 S    /usr/sbin/uhttpd -f -h /www -r NOCALL -x /cgi-bin -t 240 -T 30 -A 5 -n 3 -R -p 0.0.0.0:8080
 1074 root      1004 S    /usr/sbin/xinetd -pidfile /var/run/xinetd.pid
 1089 root      1252 S    /bin/sh /etc/rc.common /etc/rc.d/S70ntpclient boot
...
 1593 nobody    1244 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf -k -x /var/run/dnsmasq/dnsmasq.pid
 1613 root      1200 S    /bin/sh /usr/local/bin/linkled
 1687 root      1200 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p us.pool.ntp.org
 2571 root      1064 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 2222 -K 300 -T 3
 2725 root      1200 S    /bin/sh /bin/login.sh
 2728 root      1200 S    -ash
 2846 root      1200 S    sleep 60
 2855 root      1200 S    sleep 11
 2860 root         0 SW   [kworker/0:3]
 2862 root      1240 S    /bin/sh /sbin/hotplug-call net
 2863 root         0 SW   [kworker/u2:4]
 2870 root      1200 R    ps
 2873 root      1240 S    /bin/sh /sbin/hotplug-call net
 2877 root      1240 R    /bin/sh /etc/rc.common /etc/init.d/sysntpd reload
root@NOCALL:~# sysupgrKilled
 
Connection to host lost.

===
At what point does it kill tasks?

I watched top with scp running. Available memory dropped to 8192K free.
Got back to 8476K when scp ended, then rebooted to RouterOS.

Mem: 52224K used, 8476K free, 6616K shrd, 0K buff, 32284K cached
CPU:   5% usr  38% sys   0% nic  55% idle   0% io   0% irq   0% sirq
Load average: 1.63 0.60 0.27 1/46 3073
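
For watching free memory during the scp without reading the whole top screen, here is a small sketch that extracts the "free" figure from a "Mem:" header line in the format shown above. `top_free_kb` is a hypothetical helper name, and the parsing assumes that exact header layout.

```shell
#!/bin/sh
# Hypothetical helper: print the free-memory figure (in KB) from a
# top "Mem:" header line read on stdin.
top_free_kb() {
    sed -n 's/^Mem:.* \([0-9][0-9]*\)K free.*/\1/p' | head -n 1
}
```

Assuming the top on the node accepts batch mode, `top -b -n 1 | top_free_kb` would print a single number such as 8476 for the header above.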

Are there processes I can kill before running scp? Since I'm using scp, can I kill uhttpd?

Thanks,

John 
AE6XE
It is a mystery why these symptoms are starting to appear with the 3.19.3.0 release on the same hardware, after many months. Something must have changed, but what? To reduce RAM usage while the elf is running, there are a few things that can be done:

1)  Turn off cron so SNR logging and other processes don't consume memory:   "/etc/init.d/cron stop"
2)  Killing uhttpd directly would likely trigger procd to restart it automatically; shut it down with "/etc/init.d/uhttpd stop" instead.
3)  /tmp doesn't need all that RAM space; remount it smaller:  "mount tmpfs /tmp -o remount,size=10240K", or whatever size is sufficient for the new image to be loaded.
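
The three steps can be collected into one small sketch. This is not the code planned for the nightly build, just an illustration: `ram_saving_cmds` is a hypothetical function that only prints the commands, with the tmpfs size in KB as an optional argument (default 10240, the value suggested above). The web server init script is named uhttpd, as used later in the thread.

```shell
#!/bin/sh
# Sketch only: print the three RAM-saving commands suggested above.
# $1 is a hypothetical parameter: the tmpfs size in KB for the /tmp remount.
ram_saving_cmds() {
    size_kb="${1:-10240}"
    echo "/etc/init.d/cron stop"
    echo "/etc/init.d/uhttpd stop"
    echo "mount tmpfs /tmp -o remount,size=${size_kb}K"
}
```

Printing first makes it easy to review the commands; on the node they could then be executed with `ram_saving_cmds 10240 | sh`.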

I'll look at adding code to the nightly build that checks for the firstboot condition and runs some of these commands automatically.

Joe  AE6XE
K2QA
'fwtool_check_image' failed
Joe,

Using 3.19.3.0 files.

I've tried another half dozen times after stopping uhttpd.

Most times I got 'fwtool_check_image' failed / Killed
====
Another time I thought it was going to work:
 
root@NOCALL:~# /etc/init.d/uhttpd stop
..
run scp in another window
...
root@NOCALL:~# sysupgrade -n /tmp/rb.bin
Killed
Commencing upgrade. Closing all shell sessions.
...
It rebooted back to RouterOS.

I even tried stopping dropbear after scp completed to free up more memory.
If you think it is a memory issue, are there more processes I can kill / shut down?
Is there a log file I can tail in another window or other diagnostics?

 
AE6XE
Going to such extremes shouldn't be necessary, so I wonder if something else is going on. 3.19.3.0 has been installed 1000s of times since it was released many months ago, as have the nightly builds on the same device model. There might be an error occurring during the sysupgrade that we are not seeing. What I'd do is put a console on the serial port to see what is happening through the process and reboot cycle. If you're inclined, mail it to me at my QRZ address. If this is a newly purchased device, maybe there are hardware changes causing an issue. There has to be something different, or a hardware failure, to explain these symptoms.

Joe AE6XE
AE6XE
"At what point does it kill tasks?"

The out-of-memory (OOM) killer is triggered when the kernel needs to allocate memory to function, which I think also includes starting up new processes. It ranks the currently running processes to determine which one, if killed, would free up the most resources, and kills it. I've seen OLSR at the top of the list. But in the first-boot state OLSR is not needed for anything, since the node is not making links to other mesh nodes yet.
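
The kernel exposes the score it uses for that ranking in /proc/<pid>/oom_score, so the current candidates can be listed. A minimal sketch follows; `oom_ranking` is a hypothetical helper, and the directory argument exists only so it can be pointed somewhere other than /proc.

```shell
#!/bin/sh
# Hypothetical helper: list processes by the kernel's OOM score, highest
# first. Output is "<score> <pid> <name>" per process.
# $1 defaults to /proc; it is a parameter only to allow a test directory.
oom_ranking() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null)
        echo "$score ${dir##*/} $name"
    done | sort -rn
}
```

Running `oom_ranking | head -n 5` right after boot would show which processes are most likely to be picked next.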

Joe AE6XE
K2QA
Loaded Using Another PC
Joe,

Not sure what is different, but I was able to load 3.19.3.0 via browser on first try using a different PC.
Both are Win10. Failing PC was high-end workstation Xeon laptop, successful one was a low-end Surface clone.
I can understand how the PC that worked with the browser upgrade might have a different browser configuration,
but I don't see how different configs could affect the scp/sysupgrade scenario on the failing workstation laptop.
In any case, until I buy another MikroTik, I don't have one to play with.

Thanks for your help.

John K2QA
 
AE6XE
I'm wondering if browser cached data is contributing to the symptoms. The OOM errors shouldn't be related to browser client-side issues, so a combination of things might be going on.

Joe AE6XE
W9IKU
I keep trying with a killed process
I have tried installing with a GUI and command line and I have had no luck with the last two builds. Has this been solved?

Thank you!

Greg
W9IKU
greg@w9iku.net
 
kg6wxc
Same thing here...
I just got a HaP lite RB952Ui-5ac2nD-US and I also cannot get it to take the .bin file no matter what I do. I have tried re-downloading the files, different browsers, direct connection to the device, using a dumb switch inline, pretty much everything.
I get a "connection reset" nearly every time I try to upload the .bin file. I have also gotten the "Bad Gateway" message, even when just reloading the admin page and *not* uploading anything. I am not sure what is going on here. I used the exact same procedure to flash an LHG-XP-HP5 the other day without a problem. (No, I am not mixing up the .bin files. Almost though, almost! :) ) I can get the HaP lite to boot from tftp no problem. The status page shows me this though: Could that have something to do with it? I also tried to scp the file to the device and got a "broken pipe"; the scp connection failed midway... I am just not having any luck today. I just thought I would share in case MikroTik changed something.... I am giving up for tonight and will try some other ideas in the morning.
I don't want to have to solder a connection to this sucker's serial port! :S
73
AE6XE
KG6WXC, capture the support data download when booted from the .elf. Since it is showing N/A for the flash, this suggests a flash issue. Maybe this is a rev with a new chip? The dmesg output will tell us what it finds (or doesn't find).

Joe AE6XE
kg6wxc
Support Data

Here you are sir.

*edit* this might be it?
<strike>init: Can't open /sys/block/zram0/disksize: No such file or directory</strike>
nah, probably not.

kg6wxc
This line?

Is it this?
m25p80 spi0.0: found w25q128jv, expected m25p80

Googling shows the w25q128jv chip to be supported by OpenWrt 18.06.2, and I also see patch files for it, yet still no flash space...
I tried a couple of things myself, no luck.
I know you can figure this out Joe!!
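
In case it helps anyone checking their own board revision, here is a sketch that pulls the detected and expected chip names out of a probe line like the one quoted above. `flash_probe` is a hypothetical helper, and it assumes the line has exactly that "found X, expected Y" wording.

```shell
#!/bin/sh
# Hypothetical helper: print "<found> <expected>" for each m25p80 probe
# line read on stdin (with or without a leading dmesg timestamp).
flash_probe() {
    sed -n 's/^.*m25p80 [^:]*: found \([^,]*\), expected \(.*\)$/\1 \2/p'
}
```

Fed with `dmesg | flash_probe`, a board reporting the line above would print the two names side by side, which makes it easy to compare an older unit against a newly purchased one.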

kg6wxc
weirdest thing

So after trying various ideas, I finally gave up and dragged out the old Windows laptop.
After mucking around in that for an hour or so, mostly fighting Windows of course, I was able to flash the HaP lite to 3.19.2 without issue!
I was then able to easily update to the latest nightly builds.
I am not sure *why* this worked when for 2 days I got nowhere. Maybe we are missing a needed option to dnsmasq when trying to perform this procedure from Linux.
I still could not get a fresh install from the newer nightly builds though. I would get a "Connection Reset" error after the firmware uploaded and the node would just sit there. I could tell that it had at least started the firmware loading process, as the node's SSH server was shut down.
If I tried the upload again, I would either get a "Bad Gateway" error or the same "Connection Reset", and I am pretty sure it rebooted back to RouterOS at one point during my attempts to load a nightly from scratch.
 

AE6XE
Glad it is working. I was just looking through the support data and not turning up anything. The logs showed the flash was found, and I'm not seeing any smoking gun.
kg6wxc
Me too
I too am glad I got it working, I'm still perplexed as to why I had so many issues.
In the next month or so I will probably buy another HaP just so I can try it again and see if I can't figure out why this happens on some installs and not others...
