3.16.1.0b02 progress??

K5DLQ
3.16.1.0b02 progress??
So, how is the beta 2 going for everyone?
kg9dw
Haven't tried
I haven't touched it. What's the status of the OTA upgrade bug?
AE6XE
The jury is still out on some of the symptoms that we think are related to memory limitations. Fixes to address the OTA upgrade bug went into b2, but they only mitigate the issue by stopping the unnecessary services that we know to stop before doing the upgrade. (Test by upgrading from a b2 node.) The root cause is not yet proven, but we think we're hot on the trail. Memory management is an ongoing issue, particularly when everyone installs additional services beyond the base AREDN firmware. This motivates us to do things like replacing Perl with the more lightweight Lua language, which we're talking about using to revamp the UI with OpenWrt LuCI technology.

Long story short, it is becoming more important not to load down a mesh node beyond its core purpose, to best manage the limited physical resources available. Lots of opportunity to package and publish services on a RasPi or other off-mesh-node devices.

Joe AE6XE
K7DXS
+1 for the Lua idea. I'd love to help out with rewriting the web interface.
WL7COO
Where can we download it from?

I've been checking ...software->downloads->experimental builds daily since it was announced.  Not finding it.
I have an opportunity to look at ...0b02 using some radios that won't be going into production for some time yet.

Still trying to decide if we should continue using 0b01 as we undertake adding nodes to our still isolated and easily controlled initial 4 node island construction.
(Though in my case,  going up the tower to retrieve/replace a 'bricked' NSM would be a big deal <g>.)

With my very limited experience, ...0b01 has many useful features (& throughput potential) that make it substantially preferable to 3.15.1, but I'm always concerned about using betas in any real use case. For the next few months this will be build-out, training and testing leading towards demonstrations.

In my formative years it was ineffably frustrating working in an environment where new hardware and software had to be planned and budgeted for far enough in advance so it was usually obsolete by the time we put it into production.  Maybe I'm hoping to make up for that now since I'm not spending 'Uncle's' money.

Is there a simple way to keep track of 'issues' we might want to be aware of if we're planning on using 0b0x for anything more than pure testing, i.e. a summary of questioned functionality under review or currently being worked?

Are there any issues that would prevent you from using 0b01 for real use cases shy of Emcomm deployments?

Thanks
...dan
 

K5DLQ
You might clear your browser cache if you don't see it.
Otherwise, here is a direct link:    http://downloads.aredn.org/firmware/ubnt/html/experimental.html
Open tickets are found here: http://bloodhound.aredn.org/products/AREDN/dashboard

We have had reports (on 3.16.1.0b01) of some users having trouble updating.  They experienced nodes falling back into "pre-configuration" mode (where you need to set the password, name the node, set the freq/bandwidth, etc).  Beta2 contains fixes to reduce the chances of that occurring (among other changes).
We have noticed that the risk is slightly higher if running tunnels and/or meshchat on the node.

Officially, we never recommend using a beta release on a "production" node.
WL7COO
Thanks for the links

I did not have to clear Chrome's cache - the 0b02 files simply started showing up sometime after your post. They were there when I checked a few hours ago.
Chrome on OS X 10.11.n  seems to work like it is supposed to.

Immediate observation after installing 0b02 on my QTH end of the NSM2 <--> NSM2  portion of our 4 node mesh is that TxMbps from here to the NSM2 on the Mtn went from 15.2 up to 26.
 
Just before I noticed that, I had cut back the power on the "Node That Shouldn't Be" from 28 to 26 dBm.  Could that possibly have improved TxMbps that much, or is the 0b02 code base responsible, or possibly the combination?  Yup - smack me again for making two changes <g>.

This NSM2 is in a window pointed almost directly at a 50' tall, now very happy pine tree that obscures approx 80% of the view towards the Mtn (2400' higher and 7.5 miles away).  The signal/noise ratio (is it fair to call this the 'link margin' in Ubiquiti speak or 'fade margin' in ROW speak?) has been between 14 and 20 dB even though it is 2400' below the other NSM2 on the Mtn with zero upward tilt - even 2 degrees up would make it like 90% obscured.

Is the 'noise floor' fixed, always at -95?  I can't believe that the mwave RF environment here is really that stable or the noise floor actually that low.  I still haven't seen the signal/noise readout on any of the 4 radios (well, now it is 5; Ken swapped NBM5s at his QTH 28 miles from the 5GHz Rocket w/sector on the Mtn) show anything other than -95 for the noise floor.  Both the 2.4 and 5GHz radios, in any wx conditions, temps, wind or precip, have only ever shown -95 dBm as the noise floor, though the signal level & hence the 'margin' has varied considerably (as much as 10 dB) with those variables.  Color me suspicious that 'noise floor' is a fixed vs measured value.

It did take more than one reboot for the service advertisements (Hamchat and a VoIP phone IP address) to re-appear - I may have complicated that by installing Hamchat, saving changes and rebooting twice, since I wasn't seeing it advertised immediately like it was when installed under 0b01.  Eventually I'll sort the necessary from the 'ritual'.

Do we need to install additional 'packages' to eliminate the NTP server/timestamp confusion?  None of the date/time stamps make sense, UTC or otherwise, & I've come to view NTP as a useful second to WWV/WWVB. I'm really looking forward to reliable time stamps in AREDN & its apps.

When & how do we actually forward location info to the AREDN node map?  Or are both of these a matter of enabling and configuring vlans or Gateway support beyond the checkboxes?

Is it a bad thing to write  'mesh shot' forum msgs vs sticking with a single topic?  
Helps me think but I can envision it being a disaster for others to follow.
I want to think that 'chronicles of an AREDN Ardent'  will be useful to others but  ......??


As always, TIA
73
...dan wl7coo
 

K5DLQ
The calculation of TxMbps changed in b02 (per release notes).
Are you setting the timezone and providing Internet to the node so that it can reach an NTP server?

re: node map, after providing Internet to the node (which could be across the mesh), go to the setup page, set lat and lon values, save them, then Upload them.

WL7COO
... Internet to the node so that it can reach an NTP server.

No, we are not yet enabling vlans or providing access to the Internet. I did leave the Gateway box checked overnight, but I haven't tried to coerce any routing from the Ethernet port on the Surface Pro's USB 3 docking station to the in-QTH double-NAT'ed 802.11ac WiFi.

I'll wait till I understand both actions more thoroughly, implement both and then retry NTP and uploading Map info again.

Re TxMbps - I do remember seeing that 0b02 calculates it differently.  The 0b01 node is still reporting 15.2 for TxMbps.
Any idea how accurate the number is in 0b02 or how it compares to speedtest.net numbers for throughput?

I'm ready to sysupgrade the Mtn Node now.

Your reply has been very helpful - Thanks.
...dan wl7coo
 

AE6XE
"Is the 'noise floor' fixed, always at -95?".    

Short answer, "no".  

Let me elaborate, maybe in too much detail...  These CMOS chipsets do not have the physical ability to measure an absolute signal level with the expected accuracy.  The 802.11 specifications are also written such that everything is based on relative measurements.  What we really have is an SNR value; absolute values should not be compared and should be taken with a grain of salt.  In noisy environments, you can see the noise floor jump above -95 dBm in the charts.  However, I did fix it to not display anything lower than -95 dBm (which would be below the physical hardware properties).  If we want to get really technical and picky, this hardware noise value improves 3 dB each time the channel width is cut in half, so the physical hardware noise levels are more like -101 dBm @ 5 MHz, -98 dBm @ 10 MHz, and -95 dBm @ 20 MHz.
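
The 3 dB-per-halving figures follow from noise power scaling with bandwidth. Here is a minimal Python sketch of that arithmetic, taking the -95 dBm @ 20 MHz value from the post as the reference. The function name and the pure 10*log10 scaling are assumptions for illustration, not code from the AREDN firmware.

```python
import math

REF_BW_MHZ = 20.0       # reference channel width quoted in the post
REF_FLOOR_DBM = -95.0   # hardware noise value quoted for 20 MHz

def hardware_noise_floor_dbm(bw_mhz):
    """Scale the reference noise value by 10*log10 of the bandwidth ratio.

    Hypothetical helper: assumes noise power is proportional to bandwidth.
    """
    return REF_FLOOR_DBM + 10.0 * math.log10(bw_mhz / REF_BW_MHZ)

for bw in (20.0, 10.0, 5.0):
    print(f"{bw:>4} MHz: {hardware_noise_floor_dbm(bw):.1f} dBm")
```

Under this assumption the 10 MHz and 5 MHz values come out about 3 dB and 6 dB below the 20 MHz value, matching the -98 and -101 figures quoted above.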

In these Atheros chipsets, the signal measurement is taken after 3 AGC circuits, one of which is after the signal is converted to digital form.  We know this from the filed patents.  The chip also recalculates a reference point (the 'noise floor') against which it then measures the signal.  This noise floor calculation is influenced by ambient noise, which includes other undesirable or conflicting signals.  So if there's a lot of ambient noise, the noise floor may go up and then the measured SNR will go down.

Put this all in context: the goal of the receiver is not to maximize the SNR.  The goal of the receiver is to minimize the data error rate of the signal it is trying to decode, which doesn't always mean maximizing the SNR.  So the radio can raise the noise floor level to essentially hide other ambient noise and better decode a signal.  This will show a lower SNR, and some weaker signals that are now below the calculated noise floor could drop out, which is undesirable if they are legitimate neighbors.  In noisy environments, you'll see signals jumping up/down in the charts in increments of up to 10 dB.  This is not the absolute value of the received signal changing in the traditional sense; these are SNR levels internal to how an 802.11 SDR works.

Joe AE6XE
WL7COO
Outstanding Explanation Joe - Thank You !!!

Color me no longer suspicious!

As we're using 10MHz channels,  would you suggest we try 20 MHz on 5GHz (& soon to be 3GHz if that too is similarly low noise) to see if it will improve throughput?  

Since we're using 10MHz on channel -2 does this mean we're actually  already edging over the lower boundary of our authorized freqs with the bottom (2.5 ?) MHz or am I missing something else there?

 I'm curious, is there a way to predict the impact on throughput of cutting the 2.4GHz channel width back to 5 MHz  in the context of a -95 noise floor?  It is certainly easy enough to test.

Thank you again -- a lot !!!
73
...dan wl7coo

 

AE6XE
I measure the channel width combinations so that I know, without theorizing or speculating, what actual throughput occurs.  I find that the environment significantly impacts the range of throughput numbers you can get, with a wide variance.

On ch -2 the center freq is 2397 MHz, and the bottom edge with a 10 MHz channel width is 2392 MHz, with 2 MHz to spare above the Part 97 band edge at 2390 MHz.
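
The band-edge arithmetic can be checked with a short Python sketch. It assumes the standard 2.4 GHz numbering (channel 1 at 2412 MHz, 5 MHz steps) simply extends to the negative channels, which agrees with the 2397 MHz center quoted for ch -2. The function names are hypothetical.

```python
PART97_LOWER_EDGE_MHZ = 2390.0  # band edge cited in the post

def channel_center_mhz(channel):
    """Center frequency, assuming 5 MHz steps with channel 1 at 2412 MHz."""
    return 2407.0 + 5.0 * channel

def lower_edge_margin_mhz(channel, width_mhz):
    """Gap between the channel's bottom edge and the band edge.

    A negative result means the bottom of the channel falls below the edge.
    """
    lower_edge = channel_center_mhz(channel) - width_mhz / 2.0
    return lower_edge - PART97_LOWER_EDGE_MHZ

for width in (5.0, 10.0, 20.0):
    print(f"ch -2 @ {width} MHz: margin {lower_edge_margin_mhz(-2, width)} MHz")
```

With these assumptions a 10 MHz width on ch -2 leaves the 2 MHz of margin described above, while a 20 MHz width on the same center would give a negative margin.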

There's a link in the node's help file to the algorithms used to select the protocol rates.  Uniquely for each neighbor, packets are sent at various OFDM rates and measured; the rate with the best throughput is what gets used 90%+ of the time, with the remainder going to 'look around' packets at other rates.  Broadcast and beacon packets, however, are sent at the lowest rate so all neighbors can receive them.
 
I dug into the code and it basically keeps track of the packet success rate to each neighbor, then uses a hardcoded standard packet size to determine throughput over the time period.  beta1 used this number in mesh status, and it doesn't characterize the actual throughput very effectively, although it does rank the options correctly.  beta2 uses the actual protocol rate in use from the 802.11a/b/g/n specifications, multiplied by the measured % packet success rate.  I hope that translates to an improved characterization :) .
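
As a rough illustration of the beta2 description (protocol rate multiplied by the measured packet success percentage), here is a minimal Python sketch. The function name, the argument names, and the sample numbers are hypothetical; this is not the firmware's actual code.

```python
def estimated_txmbps(protocol_rate_mbps, packet_success_pct):
    """Protocol rate scaled by the percentage of packets delivered.

    Hypothetical helper mirroring the description in the post.
    """
    if not 0.0 <= packet_success_pct <= 100.0:
        raise ValueError("packet_success_pct must be between 0 and 100")
    return protocol_rate_mbps * packet_success_pct / 100.0

# Illustrative inputs only, not measurements from this thread.
print(estimated_txmbps(54.0, 50.0))
```

The point of the sketch is that the displayed number can move when either the selected rate or the success percentage changes, so two firmware versions that compute it differently are not directly comparable.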

The TxMbps value is the raw rate including all the protocol bits of 802.11 (measured when sending to the neighbor).  A speedtest.net measurement does not include the protocol bits: the client requests chunks of data and calculates their transfer speed (measuring both sending to and receiving from the server-neighbor).  Apples (with protocol bits) and Oranges (without protocol bits).

Joe AE6XE

AE6XE
[correction to the release # we are nearing :) ]

All,  the timeline to full release of 3.16.1.0 is dependent on sufficient confidence that the beta versions have matured to the point we can do the release.  One of the challenges the AREDN team has is in knowing just how much mileage a beta release has in the community.     "No news" is good news, meaning there aren't bugs being reported, but this might be because no one is trying it out--difficult to tell the difference.   

How can you help us release sooner?:

  1. Use it:  Take action to help us release sooner by loading the beta.    Test with equipment that is not a critical hub in the wheel or equipment on the bench to gain the confidence for further deployment.
  2. Post it:   Try out the new feature at the bottom of the basic setup page and publish your node's location -- this communicates the AREDN version in use to give visibility.
  3. Post it some more:  Post to the forum -- "I'm using b2 and my experience is ...."
  4. Keep up with latest version:   With a beta process, we try very hard not to let scope creep and yet more new features slip in from  beta version to version--things that destabilize and introduce more bugs.   We plan for and expect that a b2 will always be more mature than a b1 and should be considered as if only applying patches.    Staying current with the latest beta version also avoids confusion--did we fix that already and why are we troubleshooting a known problem again?

We are hoping to release 3.16.1.0 in days/weeks, not weeks/months.    All feedback is appreciated, particularly when it is uplifting and helps motivate us to squeeze in the time.

Joe AE6XE
 

kg9dw
OTA bug

I've been able to reproduce (and have reported) the OTA bug reliably on b1 on a node that is an internet gateway. Has this scenario been tried as part of dev/test of beta 2? If so, I'll give it a shot!

K5DLQ
Hi Mike,
We are planning a patch for 3.15.1.0, to be released with 3.16.1.0, that "pre-introduces" the sysupgrade fixes so that the risk of losing the config when going from 3.15.1.0 to 3.16.1.0 is minimized.

Darryl
AB4YY
So far so good....

On April 3, I upgraded to it on my node, a neighbor node (OTA) and a remote node (also OTA) with no problem.  These are AirGrid and Bullet units.  Here's a few comments anyway.

  1. Unfortunately the network here still really has no or very few users, so testing is limited.
  2. I like the SNR 'spinner'.
  3. I don't recall ever seeing the disclosure LAT/LON push statement.  Maybe I simply missed it.  Or maybe because I was just 'pushing' again with the new FW as I had done it with the previous FW.  (I don't recall if it looked like I really needed to Push it again or not.)
  4. The remote node currently is a weak/slower link but OTA upgrade went smoothly.
  5. It seems I went "around in circles" (or maybe just one time) finding the beta DL link.  No big problem.  Perhaps
    1. "Experimental Builds" should be written as "Experimental / Beta Builds" and
    2. maybe the real issue is when "Experimental Builds" is clicked on, the web page changes but looks almost the same because it doesn't go to the top of the table where "Latest Beta version is: 3.16.1.0b02" is shown.
  6. As a final comment and not related to the SW build, the website "News" is a real good thing but I sure would like to see a date stamp near each news item.

Thanks for all the good work guys!
73,  Mike

K5DLQ
FYI... the News article "update timestamp" is now at the bottom of each article.
 
K5DLQ
The map disclosure verbiage is on the help page
AB4YY
"Help" page - minor problem

I just noticed when using the Help link and selecting the "Basic Setup" link (http://localnode.local.mesh:8080/help.html#setup), the target is actually "Mesh Status".  No biggie and you guys may already be aware of that.

- Mike

AE6XE
My bad.  I added the mesh status help section and, in the copy-paste to reuse code, didn't change the name...   Thanks for the catch, I'll check in the fix.  [edit...not.   One has to be quick on the draw!  Darryl beat me to it or had already done it.  ]

KE2N
my experience with beta 2

As previously posted, my first OTA M2 upgrade went into "preconfiguration mode". For the next one, I did the extended timeout patch first and it worked fine.  Just now I did an OTA upgrade of the busiest M2 node, with a couple of tunnel-client and tunnel-server connections running and video chat, and it worked fine. It kept the tunnel configuration too.  I was not taking any chances and first did the extended timeout patch on this one too.

K5DLQ
Thanks for the confirmation Ken.
 
KE2N
ntp

I should add that I am using the N2MH NTP time server (10.148.231.14) across a tunnel - which seems to work fine.  As I recall, this required a tweak to the mesh firewall, which I see has been incorporated into the beta 2 release.


 

VE3RTJ
Is this the OTA 'bug'?
I'm having some trouble updating to b2. The first update, to a Pico using the Bullet sysupgrd file, went perfectly. This particular Pico has no other packages installed. The next round of update attempts to a Nano M2 and an AR HP didn't go so well. In both cases, the router took forever to come back online, and when they did, they had reverted to b1. 

The AR only had tunnel software (Vtun) added, not meshchat.

On one NS2, I removed meshchat and vtun, and it updated fine. On another NS, which had vtun and meshchat, I removed vtun and left meshchat. It hasn't reverted (yet) and now I can't contact it. Since it's remote, it'll take a field trip to correct. (Yeah, I know, don't experiment with remote hardware. That's not really the point of this post)

Lesson learned: remove the extra stuff before updating OTA. What about using the factory image? Would writing that to an existing node instead of the sysupgrd file leave me with a clean install ready to be re-configured? For most of my nodes I have alternate means to connect for admin purposes, so this wouldn't present a hardship.

 
AE6XE
There are changes in b2 to address these OTA upgrade issues, but for now you have to be upgrading from a b2 node to obtain the benefit.   Thanks for the data points on what worked and didn't work.      Others will benefit from knowing that removing the meshchat and tunnel packages (from Admin) did help to successfully upgrade OTA from b1 to b2.   Note, I don't think anyone has yet experienced these OTA issues on a Rocket, which has double the RAM of the NS, Pico, Bullets, etc.

The factory image is installed with the 'tftp' process.  Since this requires the reset button to be physically pressed, it can't be done remotely.  The node will also boot up in 'AP' mode with a 192.168.x.x address and cannot be accessed over the mesh network or by another mesh RF link until configured.

Joe AE6XE
K6AH
Use your cell phone
However, in this AP (Access Point) state you can simply connect to the node using your cell phone via WIFI and configure the node through that RF connection. Saves having to reconfig your computer network settings.

Andre, K6AH
KE2N
Rocket

My first Rocket (M2) OTA went into AP mode.  For the second Rocket, I did the "slow links" patch first and it went fine.
That's only two data points of course....

KE2N
Bill Gates

Today I tried upgrading a second airRouter and it absolutely refused to take it. Kept coming back with b01. I thought this might be deja vu of vendors not letting us flash new firmware. Except of course that the starting point was an AREDN software load.  I found (eventually) that my Win10 laptop was in need of rebooting due to a background upgrade.  But even after rebooting the laptop, the AR refused both the sysupgrade and the factory versions of b01.  So I ended up having to TFTP it.   I have one more airRouter I can try later.  The good news is that I subsequently used the b02 AR to do an OTA upgrade of an M2 and it worked fine.

kg9dw
Results
I updated 4 NanoBridge M5 units this evening to b2. Two were on 3.15, and two were on 3.16.01b1. All worked fine. I also set up a WAN gateway node on 3.15.01 and did an OTA upgrade of it to 3.16.01b2. This is the same configuration I've had fail the OTA upgrade twice before. Beta 2 worked on it successfully.

All of my basic function tests are working fine. I do see different numbers on the status page for throughput. I've not yet validated if they are accurate.

MB
kc8rgo
ARHP does not upgrade to b02

Just got back to Michigan - RocketM2, NSM2, and ARHP on the living room floor.

Rocket and NSM2 upgraded to b02.  Rocket connected to laptop.  NSM2 upgraded over the air.  Both clean of other software.

The ARHP, cluttered with the tunnel client and MESHCHAT, would not upgrade either directly connected or over the air.

Support dump attached.

 
