32 MB Nightly Build now available

w6bi
32 MB Nightly Build now available

With the latest nightly build (139), the AREDN code should load and run on any supported device with 32 MB of RAM or more.  Props to the AREDN guys, with yeoman help from Eric KG6WXC, for hacking and slashing at the code to get it trimmed down!

Downloads are available from the AREDN web site (arednmesh.org) under Software / Nightly Builds.

If you're installing it on a Ubiquiti device that is still running factory firmware, use the factory image; otherwise use the sysupgrade image.  Downgrading the AirOS version is NO LONGER necessary.

One caveat:  I've been told you should run at most one tunnel on a 32 MB device.

Known issues
- the Node Description field works, but not if you put an apostrophe in it (').  
- Sometimes the Firmware Update screen doesn't show prior to the node rebooting to install new firmware

For a detailed list of items being worked on (not many at this point), see the Issue Tracker, under Developers.  (You need an account on github.com.)

Go forth and test!  Suspected bugs should be posted in the Forum, under Development / Possible Bugs.

K5DLQ
For 32MB devices, this still applies.... https://www.arednmesh.org/.../july-31-2018-nightly-build...

However, for the adventurous types... feel free to test and report back your findings.

 
K5DLQ
Also, this build is missing a patch for the loco-m-xw devices to prevent an ethernet port lockup.
That patch is pending.
k1ky
TUNNELS on 32MB Devices
Are you referring to a limitation of 1 Incoming or 1 outgoing active Tunnel connection for stability? 
w6bi
Tunnels and 32 MB devices
I was passing along a comment from KG6WXC who helped trim the code down.  He also said each tunnel consumed about a Megabyte of RAM.  
I've put the nightly build on an old Bullet, and it shows 4.6 MB of RAM available.  I guess theoretically you could run several.  Maybe the AREDN devs can provide better guidelines.
AE6XE
We'll need to gain some live test mileage.  It's a bit tricky to measure RAM.  Linux keeps memory in use for caches to maintain good performance rather than freeing everything it can, but there is a measure of what could be freed up if it had to.  Also, a process uses several different categories of RAM; one category is RAM/code shared with other processes, so adding a 2nd tunnel doesn't necessarily double usage.
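
To illustrate the "what could be freed up if it had to" idea, here is a minimal sketch that parses /proc/meminfo-style text and adds buffers and page cache to the free figure. The function names and the sample numbers are made up for illustration, and the formula is only a rough estimate: on Linux the Cached figure includes RAM-backed tmpfs pages (such as /tmp on these nodes), which can't simply be dropped, so Shmem is subtracted when the kernel reports it.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def reclaimable_kb(info):
    """Rough estimate of RAM the kernel could hand out if pressed.

    Free memory plus buffers plus page cache, minus tmpfs/shared memory
    (counted inside Cached but not droppable). An approximation only.
    """
    cache = info.get("Cached", 0) - info.get("Shmem", 0)
    return info.get("MemFree", 0) + info.get("Buffers", 0) + max(cache, 0)


# Illustrative sample, not captured from a real node.
SAMPLE = """MemTotal:          28000 kB
MemFree:            3000 kB
Buffers:             500 kB
Cached:             4000 kB
Shmem:              1000 kB
"""

info = parse_meminfo(SAMPLE)
print("MemFree:", info["MemFree"], "kB")
print("Estimated reclaimable:", reclaimable_kb(info), "kB")
```

On a live Linux system the same functions can be fed the contents of /proc/meminfo instead of the sample string.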

As the RAM demand increases to the limit, the kernel spends more and more time trying to manage and free up what it can.  At some point it will pick a process with a high point score and kill it to survive.  OLSR is high on the target list, and if it is killed, that triggers a watchdog reset, which means more instability.  Then, at some point the node becomes so sluggish that it's not functioning.
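
For anyone curious which processes currently rank highest, Linux exposes a per-process score in /proc/<pid>/oom_score. The sketch below lists processes by that score. The function name is hypothetical, and the "olsrd" process name used in the comment is an assumption about how OLSR shows up in the process list; this is a generic Linux example, not a tool that ships with the firmware.

```python
import os


def rank_by_oom_score(proc_root="/proc"):
    """Return (score, pid, name) tuples, highest OOM score first.

    Reads <proc_root>/<pid>/oom_score and <proc_root>/<pid>/comm.
    Processes that exit mid-scan or cannot be read are skipped.
    """
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue
        ranked.append((score, int(entry), name))
    ranked.sort(reverse=True)
    return ranked


# Example use on a Linux host: show the five most likely targets.
# A process named "olsrd" (assumed name) near the top would match the
# behaviour described above.
# for score, pid, name in rank_by_oom_score()[:5]:
#     print(score, pid, name)
```

The scan takes the /proc location as a parameter only so it can be pointed at a test directory; on a real system the default is what you want.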

We are chipping away at things we don't need in RAM and reducing process size where able.  We were able to take more IPv6 code out of processes than had been previously removed.  We made SNMP an optional install, and some other misc things.  We hope this leaves sufficient RAM to function.  If not, we'll look for more to chip out.

Joe AE6XE
K5DLQ
Agree with Joe.  I think the BEST test is to get those of you who were experiencing "slugbug" to test this build and report back your findings. 
k1ky
And now we're up to AREDN-154 8/10/18
I see AREDN-154 is now available in the Nightly Builds.  Can't make heads or tails  - my head is spinning.  How did we get from 120 to 139 to 154 in a matter of days?? HELP!  I assume that we keep on top of the "latest" for testing??
K5DLQ

That number is a JOB number (not a true BUILD number).  We've been running other jobs for things.  Every time we run a job, it increments the number.

The higher the JOB number, the newer the build (in the case of nightly builds).

For example, in the "nightly build" workflow, 6 jobs run if there are changes to build, and only 1 job runs if there is nothing new to build.
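
Since a higher job number means a newer nightly, picking the latest one is just a numeric comparison. A small sketch, assuming labels of the form "AREDN-154" or "aredn-165" as they are written in this thread (the helper names are made up, and real image filenames may look different):

```python
import re


def job_number(label):
    """Extract the job number from a label like 'AREDN-154', else None."""
    match = re.search(r"aredn-(\d+)", label, re.IGNORECASE)
    return int(match.group(1)) if match else None


def newest(labels):
    """Return the label with the highest job number, or None if none parse."""
    numbered = [(job_number(label), label) for label in labels
                if job_number(label) is not None]
    return max(numbered)[1] if numbered else None


print(newest(["AREDN-139", "aredn-165", "AREDN-154"]))
```

The comparison is numeric rather than alphabetical, so a three-digit job number sorts correctly against a two-digit one. Gaps between numbers (120, 139, 154) are expected, since non-build jobs also advance the counter.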


 

AJ6GZ
Reports

32MB Nanostation M2 is stable at 64+ hours on build 139 and AR-HP at 24+ hours on build 154 :)
Both around 5MB free memory. No tunnels. NSM2 is connected to 5 other live nodes via RF.

Other stuff ok on 139:
 3 PowerBeam M5-400 (1 tunnel server+meshchat api, 1 tunnel client, 1 clean)
 PowerBeam M5-300
 Nanostation M2 XW (tunnel client+meshchat api)
 NS M5 XW (No RF conn)
 MikroTik BaseBox 2

Ian
 

AE6XE
Ian, do you have an NSM2 XM that's not in your list?  The XW Nanostations are 64 MB devices.

Joe AE6XE
AJ6GZ
First one
The one mentioned in the first paragraph is the XM. The only 32 MB devices here are that NSM2 and the AR-HP. Everything else in the fleet is 64.
AE6XE
Got it.  Yes, this is great news.  There are 2 more hurdles to get over where we have seen symptoms.

For whatever reason, when there are no mesh RF links and the 32 MB RAM node is communicating with the mesh network via a DtD link, it would become unresponsive in 3.17.1.0RC1.  Can you change the channel on the AR or the NSM2 XM so no other nodes have a link, and see if these nodes remain stable for more than about a day?  These symptoms usually occurred within hours.  If we have chipped away enough at RAM usage and get over this hurdle, it would be even greater news.

As for the last hurdle, I suspect a change may be needed for the sysupgrade process to be stable.  We don't yet have the change that was in 3.17.1.0RC1 to make sure /tmp, which is in RAM, has enough room to upload a ~6 MB image file and complete the process.  However, the OpenWrt sysupgrade process has also had some changes to address limited RAM usage that may have compensated, which may also be related to why we sometimes don't see the "doing the upgrade" screen.
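
As a rough illustration of the /tmp concern, the sketch below checks whether a filesystem has room for a ~6 MB image plus some headroom. The 1 MB margin and the function names are assumptions for the example, not values or code from the firmware. Because /tmp is RAM-backed on these nodes, free space there is only part of the picture: the uploaded image itself consumes RAM that the running processes also need.

```python
import shutil

IMAGE_BYTES = 6 * 1024 * 1024   # ~6 MB image, the size mentioned above
MARGIN_BYTES = 1 * 1024 * 1024  # hypothetical headroom, not an AREDN value


def has_room(free_bytes, image_bytes=IMAGE_BYTES, margin_bytes=MARGIN_BYTES):
    """Return True if free space covers the image plus the margin."""
    return free_bytes >= image_bytes + margin_bytes


def tmp_has_room(path="/tmp"):
    """Check the filesystem that holds `path` using shutil.disk_usage."""
    return has_room(shutil.disk_usage(path).free)


# A node reporting 13840 KB free in /tmp passes this check.
print(has_room(13840 * 1024))
```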

Joe AE6XE
AJ6GZ
AR

I am seeing the GUI slowness on the AR-HP under build 154 after 24 hours with only a DtD link to the live mesh. Is this addressed in 165? If so I'll bump it up.
flash = 1668 KB
/tmp = 13840 KB
memory = 3332 KB

I have had the upgrade not "take" a few times but on both 32 and 64MB devices.  It will reboot but stay on the installed version.  Not sure if it's related to the /tmp file issue above?  It seems to be less common with the later builds.

Ian
 

AE6XE
Keep it running as is; build 165 will not make any difference.  Normally by now the device would be unresponsive, so this is a good sign.  I have a Bullet M2 up and running from today also.

3.16.1.1 will occasionally not take a sysupgrade if it hasn't been rebooted recently.  Many groups will always reboot a tower node before doing a sysupgrade.  My view is we can get over the last hurdle if it will reliably sysupgrade shortly after a reboot.

Joe AE6XE
AJ6GZ
Update
At 36 hours it dropped off the mesh (still plugged in DtD only). Attaching a laptop this evening, it spit out an IP quickly, and then the status page took 60 seconds to load. Mesh Status took about 2 minutes and was, of course, empty. CPU >90% idle when not interacting with web pages. Free memory 2776K per the status page. Saw it go down to 1440K (via 'top') while page was rendering.
AE6XE
Ok, after reboot, please capture the support data at the bottom of the Admin screen and send it to me or post it here.  I'm running another node, going strong, and I'll keep running it.  If you were accessing the node just before it became sluggish, describe what the activity was.

Joe AE6XE
k1ky
Upgrade Procedures for better success
Two things I always try to do when upgrading a node:

1. Reboot it before performing the upgrade - less chance of extra "stuff" hosing up the system
2. Use the Google Chrome browser (you can watch the upload progress in the bottom left corner)

 
AJ6GZ
Reboot!
Reboot! The universal solution for all of IT!
AB4YY
I too have had that "not take" when doing an upgrade.  It seems to occur with all or most of the current Nightly Builds, including aredn-165, but not all the time.  Once the upgrade starts I see the usual warning that the upgrade is happening and not to power off, but when the node comes back up it still has the same old version.  This has happened quite a few times on different nodes.  I haven't yet tried rebooting right before doing the upgrade.

73 - Mike ab4yy

AE6XE
Testing latest build on 32Mb devices

A commit to the nightly build was added today, and new 'nightly build' images will be available tomorrow, Aug 23, 2018.  This commit made several changes to mitigate the symptoms of 32 MB devices becoming sluggish.  Early test results indicate the problem has been solved.  However, we need more exposure and diverse environments to confirm.  Please upgrade any 32 MB devices that can run the nightly build and report back results, e.g. still ticking after 'x' days.  If you have a device with no RF link that communicates over a DtD link or tunnel, please test. This was the scenario where the device would grind to a halt after hours or days, which we expect is no longer the case.

For all releases, it is best practice, particularly at tower sites, to reboot the node before doing a sysupgrade to new firmware, to ensure sufficient free RAM is available.  This is a 2nd scenario to test for stability; early testing has not revealed any concerns.

Joe AE6XE
