
2016 Firmware upgrade with OTA not keeping settings?

k1ky

I have upgraded 3 units (a NanoBridge M5 and two Rocket M5s) from the 2015 Production to the 2016 Beta01, and none of the units kept their settings; all reverted to a "fresh install" requiring full setup.  I understand that we need to reload the Tunnel applications, but I believe that we were previously able to perform OTA upgrades without losing the setup information.

Is this a bug or should it not work??
 

KG6JEI
Since this is filed under tunnels, do you mean it lost the TUNNEL settings and had to be set up again, or that it lost ALL settings and everything had to be redone, including node callsign, RF power, distance, password, etc.?
 
k1ky
It lost ALL setup information. Came back up as NO_CALL...on 192.168.1.1  Wasn't sure where to put this.

 
KG6JEI
[moderator comment: Moved to Development > Beta  forum]

I'm unable to duplicate this:
Did you accidentally uncheck the keep settings box?
By 2015 build do you perhaps mean 3.0.2 which requires you to apply a patch first?
Are you allowing the node enough time to set itself up and reboot twice after the upgrade (several minutes)? During the first reboot it will be unconfigured, and if interrupted it might not go through the auto-configure process.
 
k1ky
The keep settings boxes were checked.  Came from 3.15.1.0 AREDN firmware. Yes, I'm pretty sure we allowed plenty of time, otherwise I don't think we could have been allowed to log into the units?   I've upgraded many nodes without tunnels remotely and they all did just fine.   At any rate, no matter now if you can't duplicate as long as nobody else is experiencing this.  The Nanobridge may have originally been on 3.0.2, but all patches were applied.  The Rockets were clean loads of 3.15 betas that were subsequently updated to 3.15.1.0 last year.

 
kg9dw
Same thing
I had a nanobridge M5 that was on 3.15.1.0 do the same thing tonight. I had completed OTA upgrades of other units without any issues. This one seems to have reverted to no settings. I'll have to visit the site to fix it and to confirm the issue. I'd recommend not doing OTA upgrades until more is known. 
KG6JEI
Please grab a support file from the node before you do anything to it (including power cycling) when you get there.

No one has been able to reproduce this here in our labs locally, so a support file from a node where it has happened is absolutely needed before this issue can go any further.

Also, did you have tunnels installed or not? (99% of my personal testing is without tunnels, and I hear of dozens of others with tunnels not having issues either, but I'm curious whether there is perhaps a link.)
kg9dw
Unable...
There were no tunnels on this node. I can see that it received a DHCP address from my internet gateway. It would not allow any configuration through that address. I then moved the internet switch port from the node's upstream switch from VLAN 1 to 10, and poof, I blew everything up. Well not exactly, but I likely caused an IP address conflict as the node moved to 192.168.1.1 and then conflicted with the internet gateway. I've completely lost remote access to the site, so I'm blind at the moment. I'm 100% sure that the keep settings checkbox was set. 

How do I get a support file from the node without going into the configuration screens? Also the node has been remotely rebooted multiple times through a smart outlet strip, so I may not have any good logs for you anyway. 

 
KG6JEI
It's intended to be downloaded from the node, so when you visit it you will need to plug into it and navigate to it.

A support dump may still be useful. Yes, if it's been rebooted we may not have the data, but it's still worth getting a copy to see if we get lucky.
KG6JEI
Hello Michael,

Everything I'm seeing from the support file I received from you seems to point to the checkbox having been unchecked.

None of the settings appear to have been saved; everything looks like a fresh install with no signs of anything having been retained. This really should only happen if the checkbox is unset.

Going out on a limb, however, to make sure we cover all the bases: what browser (and version) did you use to do the upgrade? Just in case we are hitting some weird browser bug on the form submission.
 
Ve3xnc
Tried to upgrade node
I tried to do an over-the-air upgrade from 3.15.1.0 to the 3.16.1.0 beta. Everything seemed to go OK, but it did not do the upgrade. This unit has tunnel connections and uses a static WAN IP. After the unit rebooted everything worked as before, even the tunnel connections, but the WAN IP info was missing on the setup page. The main page still showed the correct WAN IP, but the setup page fields (WAN only) were blank. Before I tried this node, I successfully upgraded a node that was one hop away. All the connections were using RF, not tunnel.

This might be a small bug; just thought I would let you know.

Thanks

 
KG6JEI
When you say it did not do the upgrade, do you mean it does not show 3.16.1.0b01 in the firmware version field on the status page?
n4ldr
Similar Issue

I upgraded from 3.15.1 to 3.16 beta. One node showed the firmware updating screen, but when it came up it still had 3.15.1 on it.
I had to upload the beta again.  Thought maybe I had selected the wrong file when uploading, but had it happen on another node.

All of my nodes lost all settings when upgrading to the 3.16 beta.
Had to re-install the Tunnel.

Another issue:  Making any changes to ports or services, the firewall file gets re-written and removes the tunnel port rule.

Last issue:  With tunnel active, I cannot ping or access nodes on the other side of the tunnel.
Currently testing on the AirRouter HP.
I had the same issue with the Tunnel using a NanoStation.  If I downgraded to 3.15.1 everything worked fine.  All of the issues returned when putting 3.16 beta back on.
Unable to perform the same test as I do not see a 3.15.1 version for the AirRouter HP
 

KG6JEI
You may have hit this issue on the first one: http://www.aredn.org/content/beta-firmware-wont-apply

Regarding lost settings: I still need a support dump from someone to do anything more. Reinstalling the tunnel however is an expected mandatory requirement of the upgrade as it will not be kept across the upgrade.


Last issue, where the client itself can't be pinged: that is being looked at, but we need a support dump from BOTH sides to finish looking at that one. So far we haven't gotten that, so it's just sitting. If you can get a support file from each side, we can look at it.

Regarding the firewall rule overwrite: I'll go ahead and look into this with the tunnel team to see about that.
K5DLQ
Regarding the tunnels:
As Conrad stated, you do need to hit the "Install Tunnel" button to install the binaries after a sysupgrade.
HOWEVER, your tunnel server and client settings are saved during the sysupgrade.
 
n4ldr
Support Dump
As far as the issue of not taking the first load attempt of the Beta, I don't see how it would be a timing issue when done Locally 5 ft away from the PC.
The issue of wiping out all settings on installation happened (Locally) to a Bullet, Nanostation and Nanostation Loco.
Updated a node with OTA to a node on a Mountain, and it took a couple tries before updating.  It did go thru the "Updating Firmware" screens each try.

(Tunnel / Ping Issue)
Was running the Nanostation M2 and tunneled.
With version 3.15.1 the Tunnel worked fine, no issues.
Installed the 3.16 beta, and was not able to ping, scp, or ssh to any node on the other end of the Tunnel.
Restored the NanoStation to 3.15.1 and everything returned to working properly.

With the announcement of the Beta and AirRouter I decided to purchase the AirRouter and give 3.16 beta a try again.
This node is experiencing the exact same issues with the tunnel as the NanoStation did.  Once the Tunnel is installed I lose the ability to ping other nodes.
I can, however, connect to those nodes via the Web Browser.

I reinstalled the 3.16 beta fresh. Installed the tunnel package and tested the tunnel prior to capturing the support dump.
Only changes to the node are changing the tunnel port number as my provider is blocking the default port as well as others using the same provider.

Confirmed the rule for tunnel is wiped out of the firewall file with any save to the port / services page.

Here is the support dump from the AirRouter HP.  Hope it helps.
I do not have a support dump of the other end of the station as I don't own it.

 
kg9dw
Chrome????
I'm wondering if this is a browser problem. Conrad, didn't you say that if the node's web GUI is hit before the firmware upgrade completes, the node self-configuration is interrupted? I'm wondering if this is a problem where the browser attempted to reconnect. I've seen Chrome reload a page, regardless of whether a longer refresh is set, if the network drops out from underneath it. In my OTA failure scenario, the workstation doing the upgrade only had internet access through the mesh, and it was the mesh gateway that I was updating. If Chrome noticed that the internet went away and then started trying to refresh tabs, it could have hit the remote node before the "keep settings" self-configuration was complete.

A long shot...worth investigating???
 
KG6JEI
Well, I was thinking that someone visited the page during the SHORT window where the node is in the preconfig mode (this is expected as part of sysupgrade) but the node rebooted, and they saved it in an unconfigured mode on top.

However this wouldn't make sense on remote nodes and as such is probably NOT the cause.
 
kg9dw
Unable to recreate
I've been unable to recreate the bug. I've taken the node back and forth between versions and the upgrade always seems to work. My next step will be to go back to one of the two sites where I have nodes running 3.15.1, take a second node, and try to do an OTA upgrade from a workstation while on site. I'm reluctant now to attempt an upgrade of any node that I can't readily put hands on. I'm 100% sure the keep settings box was checked and yet the node came up as though it had no settings. 
AE6XE
It should be noted that the OpenWRT site has posted warnings that on some hardware, the sysupgrade process (aka 'keep settings' in AREDN) has had failures.   Other groups and images have seen similar odd behavior.    The OpenWRT wiki warning posted:

"For unknown reasons such a cold reset has often been reported to be necessary after a sysupgrade. This is very very bad in case you performed this remotely!"

The AREDN process does a reboot twice, and I speculate this issue is occurring and the state of the node is half-way through the AREDN upgrade process.
  
But let's be cautious not to overreact (and I exaggerate for effect): we don't want to stop swimming in the Pacific because 1 person last year got bit by a shark in the Atlantic.   My experience is that this odd behavior is very rare and occurs with a 'cursed' device (yes, this is a technical term ;)  !).   I've been fortunate that none of the Rockets (M2, M3, & M5) I have up on mountain tops are cursed, and I've used "keep settings" 20+ times without issue to have confidence.  But I do have this one NanoStation...

Joe AE6XE
KG6JEI
The symptom you talk about from OpenWRT doesn't match the symptoms of this issue. Those devices report as stalling. Nowhere even near this issue.

We have also seen OpenWRT documentation wrong on numerous occasions as well.

Joe, do you have a support dump or a serial log of the symptoms you're speaking of? And are you speaking of coming up as NOCALL or something else on this NanoStation?
N8JJ
Upgrade to Beta 16 OTA

I thought I would add my experience:
I had a Bullet with the Beta 15 firmware and tried the OTA upgrade to 15.1 and it worked fine.  I did it via the download and had another node set as a WAN.
After the upgrade everything was fine.
When I go back to admin and refresh the download box, I only got the 15.1 production version and one 3.0 older version.
So I downloaded the 3.16.1.0b01 to my PC and used the upload function to the node.
The node did not recover and reverted to an IP of 192.168.1.1 and radiated a SSID of "MESHNODE"
I could not connect to the node at 192.168.1.1:8080
When it did connect, I got the redirecting to status page message and then a timeout.
I tried repowering several times with the same result and could not get a response by connecting to the MESHNODE SSID either.  (although the wifi did connect)
Finally, I did a long reset (20 sec) via the POE and got an immediate response.
Of course, it was now NOCALL with no settings.
I reentered the settings and all is well.

I actually saw briefly a screen with my call in the process before the reset but it would not respond.  It seemed that there was something corrupted in memory.

I also should have mentioned that after the OTA failed I plugged the node directly into a PC.
One final note, I had to use the distance slider.  Entering a number directly did not work.

N8JJ
 

kg9dw
Bug
There's definitely a bug here.

To recap, I had done an OTA upgrade on multiple nanobridge nodes with no issue. Then I got to my HEYELE node...when I did the upgrade, the node didn't come back. I had to visit the site, and I was unable to get into the node while there. I ended up swapping it out with another nanobridge. When I brought the first one home, it was in the NOCALL state. 

Fast forward to this week. I felt lucky so I did an OTA of the last of my nodes. I really wanted to try out the new N stuff as well as the charts to track some problems I've been having with a link. I loaded the 240sec update patch, it rebooted, then I did the 3.16b1 update. No issues. Yay.

Today I loaded the 240sec patch on the node I had replaced when my first node didn't update (the HEYELE site). The reboot worked. Everything looked good. I loaded 3.16b1 and guess what...it didn't come back. Luckily I had a spare hour so gear up, PPE on, check into the controlled access site, and up to the node. I plugged my laptop in the LAN port and got a link light, but no DHCP. I tried manually setting it to the 192.168.1.x subnet, but the node wasn't there on .20 or .1. I finally did the remote reset on the power brick and got it to come up in NOCALL mode. I reset the settings and everything was fine.

So what's unique about this site?

1. It's a mesh gateway node.
2. The gateway IP address is fixed...192.168.1.3/255.255.255.0 with a gateway of 192.168.1.1.
3. It's hooked to a Netgear switch with VLANs set up.
4. The internet router is on 192.168.1.1.
5. It was running 3.15.1.
6. It had just been rebooted.
7. No services.
8. No tunnels.
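
One thing the list does make easy to check: an unconfigured node comes up on 192.168.1.1 (as reported earlier in the thread), which is the same address as the internet router in item 4. Here is a minimal sketch of that check using Python's standard `ipaddress` module. The host labels and the helper names `fallback_conflicts` and `same_subnet` are made up for illustration, and this only shows the address overlap; it says nothing about whether the overlap causes the failed upgrade.

```python
import ipaddress

def fallback_conflicts(fallback_ip, lan_hosts):
    """Return the labels of LAN hosts whose address equals the node's fallback address."""
    fallback = ipaddress.ip_address(fallback_ip)
    return [label for label, addr in lan_hosts.items()
            if ipaddress.ip_address(addr) == fallback]

def same_subnet(ip, network):
    """Return True if ip falls inside the given network (e.g. '192.168.1.0/24')."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network, strict=False)

# Addresses taken from the list above; the labels are hypothetical.
lan_hosts = {
    "internet router": "192.168.1.1",
    "node WAN (fixed)": "192.168.1.3",
}

# 192.168.1.1 is where an unconfigured (NOCALL) node shows up.
conflicts = fallback_conflicts("192.168.1.1", lan_hosts)
print(conflicts)
```

If the list comes back non-empty, a node that falls back to defaults while still patched into that network segment would share an address with whatever is listed.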

Dev team, does anything stand out about this config? I can attempt to recreate this in my lab later in the week if needed - I've got a few spare nanobridges.

Mike, KG9DW
K5DLQ
Hi Mike,
We are fairly certain that we know the cause.  Working on the solution now for Beta 2.
kg9dw
Ticket? Warning? Something? 
KE2N
beta 2 ota u/g n/g

I upgraded my airRouter to beta 2 here with no problem. That is a direct connection to the shack computer.  Then I used that connection to do an OTA SysUpgrade to one of my M2's - and it did not come back.  It had the same problem as previous postings:  If you connect direct to it and run an IP scan, you will find something on 192.168.1.1, but no ports open or responses to pings.  Power on reset of 15+ seconds puts it in the TFTP server mode on 192.168.1.20 and you can upload the factory bin file with a TFTP client and enter your callsign, etc., again.
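
For anyone who wants to check a stuck node the same way without a separate IP scanner, here is a rough sketch in Python of a TCP probe. The helper names `tcp_port_open` and `probe_candidates` are just for illustration, and port 8080 is assumed to be the node's web interface, as mentioned earlier in the thread. It only tests whether a TCP port accepts a connection; it does not ping, and it cannot detect the TFTP recovery mode on 192.168.1.20, since TFTP runs over UDP.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def probe_candidates(candidates, timeout=2.0):
    """Probe each (host, port) pair and return a dict of 'host:port' -> bool."""
    return {
        f"{host}:{port}": tcp_port_open(host, port, timeout)
        for host, port in candidates
    }
```

With a PC plugged directly into the node and set to a 192.168.1.x address, calling `probe_candidates([("192.168.1.1", 8080)])` would show whether the web interface answers at the default address. In the failure described here it would presumably come back `False`.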
 

KG6JEI
Can you reproduce this with beta 2 on the node and trying to go to beta 2 (upgrade in place to the same version)?

If so, a support dump from BEFORE the upgrade may be useful for testing.

KE2N
patch

On a suggestion from K1KY I put in the "slow link" patch and tried another Rocket M2.  
This time it worked.  
I would not consider the LINK to be slow as it is 1 hop running at 28 Mbps and 100% LQ.  It must just be the amount of time it takes the program to burn-in.... 
 
