unable to get dtd working on a two nodes site

i8fuc
Hi all,
we have an AREDN network with 12 nodes up and running very well.
We also have 3 sites with multiple nodes, all running happily on the same frequency and using DTD.
We are now adding a new site with two nodes on the same frequency, and I am unable to get DTD running between the two radios.
In particular, I am also unable to ping one radio from the other... actually, the ping round trip takes several seconds!
I tried joining the LAN ports of the two radios with a VLAN-aware switch as well as with a dumb switch... same result.
Checking the mesh status on one of the nodes, I get NQ = 0 while NLQ = 100, with a bandwidth of 130 Mb/s.

The two radios are a NanoStation and a Rocket with a 30 dB dish, mounted on the same support.

Any help solving this problem is warmly welcome :)

Mike

ki4lmr
I am far from an expert on this topic.  So, take these suggestions accordingly.  I have a few nodes linked DTD with a Cisco switch (VLAN aware).  In order for the nodes (all rockets) to link DTD, the ports must be forced into trunk mode and pass packets tagged as VLAN 2.  A similar configuration is needed if you use an Air Router to connect nodes at a site.
 
Not sure why a dumb switch does not work.  The AREDN firmware is probably looking for a VLAN 2  802.1Q tag for DTD connections... ???
 
Randy - KI4LMR
____________________________________________________________________
 
The following configuration is for 3 nodes (ports 2, 3, and 4), all connected DTD through an Air Router (AR) configured for 3 DTD ports.
1. Edit /etc/aredn.include/swconfig with WinSCP.
2. On the AR's setup, click Save.
3. Click Reboot on the AR.
 
config switch
    option name 'switch0'
    option reset '1'
    option enable_vlan '1'
 
config switch_vlan
    option device 'switch0'
    option vlan '0'
    option ports '0t 3 4'
 
config switch_vlan
    option device 'switch0'
    option vlan '2'
    option ports '0t 1t 2t'
 
_______________________________________________
 
Same thing on a Cisco switch (ports 1, 2, and 3) connected DTD
 
!
interface FastEthernet0/1
 description KI4LMR-110-3purple-235 degrees
 switchport access vlan 2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2
 switchport mode trunk
!        
interface FastEthernet0/2
 description WB4TNH-small-3yellow-120
 switchport access vlan 2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2
 switchport mode trunk
!        
interface FastEthernet0/3
 description KI4LMR-103-3red North
 switchport access vlan 2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2
 switchport mode trunk
!        
 
 

i8fuc
Hi Randy,
...your hint has been illuminating.

I just noticed that this was my first DTD test with a NanoStation XW, which in fact has two Ethernet ports. Looking at the switch config under /etc/aredn_includes, I found a configuration that I could not really understand...

so I just tried setting up a dummy switchconfig... with no content at all.

After rebooting the two nodes... wonderful! DTD is up and running!

So, also considering the comments I received from several other friends in the thread, I conclude that simply leaving the switch ports unconfigured on the NanoStation (XW) results in DTD working.

I cannot confirm at the moment that other features such as direct WAN are operational, because the test site has no direct internet connection.

Thanks a lot to you and everybody who helped solve the issue.

Best regards

Mike


 
KX5DX
A dumb switch will ignore tagged frames coming from vlan 2 on the AREDN devices.

First, try a back-to-back connection (without a switch). Next, test with a managed switch and make sure the ports are set as trunk ports, which will pass all the VLANs configured on the switch. Once DtD traffic is passing over the switch, configure the switch to allow only VLAN 2 through those ports.

I have DtD traffic passing through a Cisco/Unifi/AirOS infrastructure and it works great.

 

KG6JEI
An unmanaged (dumb) switch will pass the packets through unaltered as broadcast packets.

VLANs were intentionally designed to work over existing switching fabrics as part of the Ethernet specification.
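
To make the tagging discussion concrete, here is a minimal Python sketch of what an 802.1Q tag looks like on the wire: 4 extra bytes (the 0x8100 tag protocol identifier, then priority, DEI, and a 12-bit VLAN ID) inserted right after the two MAC addresses. The frame contents, the source MAC, and the helper names `add_vlan_tag` and `vlan_id` are made up for illustration. The sketch only shows the frame layout; it does not settle how any particular unmanaged switch treats such frames.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    # Tag control info: PCP (3 bits), DEI (1 bit, left at 0), VID (12 bits).
    tci = (pcp << 13) | vid
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def vlan_id(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None


# Hypothetical untagged frame: broadcast destination, made-up source MAC,
# IPv4 EtherType, and a dummy all-zero payload.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"\x00" * 46
tagged = add_vlan_tag(untagged, vid=2)  # VLAN 2 is the DtD VLAN in this thread

print(len(tagged) - len(untagged), vlan_id(tagged), vlan_id(untagged))
```

The rest of the frame, including the MAC addresses a switch uses for forwarding, is unchanged by the tag.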
K6AH
DtD is on the secondary port

Mike,

Check out this thread.  It is likely your NanoStation is an XW which supports DtD on the secondary port.

http://www.aredn.org/content/xw-nanostation-dtd-secondary-port

Andre, K6AH
 

i8fuc
Hi Andre,

as you can read in my reply to Randy, I just solved the issue by creating a dummy switchconfig under /etc/aredn_includes on my NanoStation XW.

After a reboot, DTD is up and running!

BTW, thanks for your answer.

Best regards
Mike
AE6XE
Mike,

If I understand correctly, you edited the switch config file on an NSM5 XW to put dtdlink (VLAN 2) on the main port. If so, this will work for using dtdlink on the main port; however, it will break usage of the LAN network.

All the packets egressing the main port will now have a tag on them. The root cause is that either the chip, the swconfig program, or the drivers in OpenWrt cannot set a given switch port to carry both tagged and untagged packets at the same time. The OpenWrt defect ticket says it's a chipset defect.

Long story short, there are no untagged packets even though your config file might define them. The untagged packets are the LAN traffic. It is easy to forget this, plug a laptop into the device in the future, and scratch one's head over why it doesn't work.
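
As a rough illustration of the tagged-plus-untagged situation described above, the Python sketch below reads swconfig-style `option ports` strings, where a trailing `t` marks a port as tagged, and lists ports that are tagged in one VLAN and untagged in another. The helper names and the second, conflicting port map are hypothetical; the first port map is the Air Router example posted earlier in this thread. Only the `t` suffix is handled.

```python
def parse_ports(spec: str) -> dict:
    """Map port number -> True if the port is tagged (has a 't' suffix)."""
    return {int(tok.rstrip("t")): tok.endswith("t") for tok in spec.split()}


def mixed_ports(vlans: dict) -> list:
    """Return ports that appear tagged in one VLAN and untagged in another."""
    tagged, untagged = set(), set()
    for spec in vlans.values():
        for port, is_tagged in parse_ports(spec).items():
            (tagged if is_tagged else untagged).add(port)
    return sorted(tagged & untagged)


# Port maps keyed by VLAN number, values as in "option ports".
air_router = {0: "0t 3 4", 2: "0t 1t 2t"}   # from the config earlier in the thread
hypothetical = {1: "0t 1", 2: "0t 1t"}      # made up: port 1 untagged AND tagged

print(mixed_ports(air_router))    # no port mixes tagged and untagged
print(mixed_ports(hypothetical))  # port 1 is asked to do both
```

A check like this only flags the configuration pattern; whether a given chipset can actually honor it is the hardware question discussed above.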

Joe AE6XE
i8fuc
Hi Joe,
...you are right :(

Let me tell the story... last night I created an empty switchconfig and rebooted the NSM5 XW... voilà, I got DTD up and running, and accessing the device via the mesh worked perfectly.

This morning I tried to access the device via the Ethernet port and, as you say, it does not work :( I was starting to analyze the issue...

Now your message makes it clear to me that this is a fault.

So what should we do at this point? Should we conclude that we have to choose between DTD and LAN usage?

Let me try to understand: with two nodes as in my case (a Rocket and a NS), is it possible to rely on the other node for DHCP and LAN usage, and then access the NS via the other node?

thanks for your support

Mike
 
AE6XE
Option 1: Use 2 x cat5 cables to the NSM5 XW (Main = LAN; Secondary = WAN and DtDlink). Everything works as expected, but running 2 cables and configuring switches and devices for them is less desirable and more complicated.

Option 2: Use 1 x cat6 cable on the secondary port. The location has multiple nodes, so plug LAN devices into one of the other mesh nodes' LAN networks. This means feeding power into the secondary port, which works but is not what the port was designed for (other posts elaborate on this).

Option 2 is a "yes" answer to your question about relying on the other node for DHCP and LAN usage and then reaching the NS via the other node.

Option 3: Use 1 x cat6 cable on the main port. You only have access to the LAN network, to connect a laptop, etc.

Joe AE6XE
 
i8fuc
Hi Joe,
thanks for the explanation.

On the same site I have a very strange situation that I am not able to fix. My test scenario is the following:
- a Linux-based server is connected to node IQ8SO5A, which is meshed with the NS (IK8MEX-110-239-120), which in turn is connected via DTD, as previously explained, to a Rocket (IK8MEX-42-183-104)

- pinging the Rocket from the server, I normally get a round trip of a few ms; from time to time the round trip takes several seconds, up to 20-30 seconds...
see fig. ves_1: it seems that several packets were looped in the network and reached the target late.
- I investigated with tracepath and got what you can see in fig. ves_2: apparently some packets loop before getting to the Rocket node.

Any idea?

Mike
Image Attachments: 
AE6XE
(Edited to get my answer right ...)

Mike, there is certainly something intermittently slowing down the ping latency. I'm not convinced the tracepath output confirms a "loop". What does "traceroute" show? Column 1 is the "time to live", which is increased for each packet sent. But the target is only 3 hops away, so I'm not entirely sure how to interpret this; I'm not as familiar with tracepath. It is very interesting that it goes back and forth, giving the appearance of a loop, but this is not necessarily what is occurring.
As a next step, I'd need to see the support dump from all the mesh nodes to have more data on why there might be a slowdown. "Download Support Data" is at the bottom of the Admin page.

Joe AE6XE

i8fuc
Hi Joe,

this evening I collected support data while a period of wrong routing (let's say) was in progress.

The attached file contains 3 support data files for the 3 nodes involved in the issue.

Thanks for your valuable support...
Mike
Support File Attachments: 
AE6XE
There are 2 other mesh nodes in the picture, so I am missing details needed to know what is going on:

1) I09SO5A 10.80.198.114
2) IQ8S05 10.106.215.115  <- and sariserver is on this mesh node.

2 issues:   

1) Better to do a traceroute. The tracepath documentation explicitly says of the asymm value that "this information is not reliable".
2) There is a vlan2 or dtdlink interface on IQ8SO5A involved. If this is linked to the NSM5 XW with the non-standard VLAN config, this is problematic. What is IQ8SO5A supposed to be dtdlink'd to?

Joe AE6XE
i8fuc
Hi Joe,
thanks for your attention. Here are the details. BTW, I complied with the admin request and restored the original settings on the NS we are discussing, allowing DTD between the 2 colocated nodes (NS5 and Rocket).

Now to our network: we have several nodes and sites. Until a few days ago only one site had more than one node: let's call this site "main". In the last weeks we have been adding a new node 18 km away from the rest of the nodes, in order to extend our coverage area: let's call this node "vesuvio" (BTW, it is in fact on Mount Vesuvius, close to the Bay of Naples in southern Italy).
Unfortunately there is no direct path from "main" to "vesuvio", so we decided to try to transit via a node that has a direct path to vesuvio... let's call this node aux1.
Since the aux1 path to vesuvio is an 18 km path, nearly 90% over sea (Bay of Naples), we discovered that making the link with low-gain radios was unreliable, mainly due to slow fading ranging from -78 dBm to -85 dBm with a period of several minutes. This was the situation using a Rocket plus a Ubiquiti 120° sector antenna with 18 dB gain on both sides of the link; with this setup the aux1 node was able to reach both the vesuvio and the main node with a single radio.

To try to make a stable connection to vesuvio, we decided to add a new node to the aux1 site, using a Rocket radio with a 30 dB Ubiquiti parabolic antenna.
This did not allow the same radio to reach main, so we added a colocated NSM5 XW pointed at the main site.
After this change the problems started...

Now the aux1-to-vesuvio connection in standalone mode (without connection to the mesh) is perfect: the signal is stable in the range -63 dBm to -68 dBm and the achievable speed is around 100 Mb/s.

As soon as we connect the aux1 node to the mesh and try to ping vesuvio or the aux1 Rocket radio from the main node, we notice the strange ping behavior described in previous messages.

Now to the main site: this is in fact the main site of the mesh. Here we have 3 radios on the roof:
- iq8so5 is a Rocket + Ubiquiti MIMO vertical antenna
- iq8so5a is a 5 GHz NanoBridge pointed at site aux1
- i8fuc is a 2.4 GHz Bullet acting as gateway to the 2.4 GHz submesh

All 3 nodes are attached to a VLAN-aware switch and use DTD locally at the site.
Colocated with the nodes there is a Linux server running Scientific Linux that we use as a network monitoring and support tool: let's call it sariserver.

The NSM5 and Rocket we are talking about are located at the aux1 site; there is no VLAN 2 tunnel or other non-standard connection between the main and aux1 sites.

The sariserver maintains a number of statistics related to the mesh network.

That's it... I will try to collect support data from all the involved nodes as soon as possible.

In the meantime I looked around for other tools to monitor the paths. I am now testing mtr, which seems very clever at showing a sort of running history of the traceroute over time, which apparently makes it easier to understand what happens. I will give you additional details on that ASAP.
 
You should have a valid account to enter our network and see all the info we collect. In case you missed it, please give me your email again and I will send you the credentials.

thanks for your kind attention

Best regards

Mike

 

AE6XE
Mike, are all the 5 GHz devices on the same channel and channel width? All the 5 GHz nodes in the support download information are on ch168 @ 20 MHz. If so, this is problematic, or at least will reduce overall performance by multiple factors. When there is one poor link and traffic has no option but to route over it, all the handshaking and resent packets cause all the other links to wait. The network as a whole cannot scale and carry very much data. Traffic at the main site will cause traffic at vesuvio to wait, due to the hidden node issue and the use of RTS/CTS. Exposed node problems also show up.

Some best-practice rules I try to follow to avoid all these performance killers and the possibility of transient routing loops:

1) Long-distance P2P links are isolated on their own channel when used to carry traffic of multiple clients, generally tower site to tower site (or mountain site to mountain site).
2) Cell coverage areas are on their own channel and do not interact with the cell coverage of another area.
3) For 360-degree coverage of a tower area, use different channels for each sector of coverage.

Of course, cost and frequency availability may prevent always adhering to these best practices.

Here's an example of what could be happening, to look out for if everything is on the same channel. Let's say that conditions sometimes allow a link between main and vesuvio. Data is loading down the main<->aux1<->vesuvio path. OLSR, which is far from perfect, starts to lose UDP packets and begins to think the path is 2 hops of 70% each. Suddenly OLSR thinks it's better to use the direct main<->vesuvio path and flips the routing over. It's a marginal path and not really usable. A couple of minutes later, after the application has stalled out on the direct path, OLSR flips the routing back to the now unused 2-hop path. The cycle repeats.
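
The flip-flop described above can be sketched numerically. The Python snippet below assumes an ETX-style cost, where each link costs 1 / (LQ x NLQ) and a route's cost is the sum over its hops, which is how OLSR's link-quality metric is commonly described. The 70% figures come from the example above; the 60% quality of the marginal direct link and the function names are invented for illustration.

```python
def link_etx(lq: float, nlq: float) -> float:
    """ETX-style cost of one link: expected transmissions per delivered packet."""
    if lq <= 0 or nlq <= 0:
        return float("inf")  # a link that delivers nothing is unusable
    return 1.0 / (lq * nlq)


def path_etx(hops) -> float:
    """Cost of a route: sum of the per-link costs along it."""
    return sum(link_etx(lq, nlq) for lq, nlq in hops)


# (LQ, NLQ) pairs per hop. The 0.6 values for the direct link are hypothetical.
two_hop_idle = [(1.0, 1.0), (1.0, 1.0)]    # main<->aux1<->vesuvio, unloaded
two_hop_loaded = [(0.7, 0.7), (0.7, 0.7)]  # same path while losing OLSR packets
direct_marginal = [(0.6, 0.6)]             # main<->vesuvio, marginal link

for name, hops in [("two-hop idle", two_hop_idle),
                   ("two-hop loaded", two_hop_loaded),
                   ("direct marginal", direct_marginal)]:
    print(f"{name}: {path_etx(hops):.2f}")
```

With these made-up numbers the loaded 2-hop path costs about 4.08, the marginal direct link about 2.78, and the idle 2-hop path 2.00, so the preferred route changes depending on whether the 2-hop path is carrying traffic.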

Joe AE6XE
i8fuc
Hi Joe,

yes, all nodes involved are 5 GHz nodes, all on the same frequency with a 20 MHz channel.

Your explanation of a possible "why" matches exactly what we were thinking.
In particular, the aux1<->vesuvio link is over the sea and is far from being a very stable link: consider that in this period, with 34°C temperatures and a lot of humidity, sea paths are in their worst conditions of the year.

So, following your best-practice suggestions, I understand that we should consider making the network a little more complex and, unfortunately, more costly. We were already considering making the aux1<->vesuvio link a point-to-point link on a different frequency, but this makes it necessary to add a second radio to the vesuvio node to collect traffic from the vesuvio area. At present, before incurring further expenses, we want to see how usable an 18 km link over sea is.

BTW, I have been observing the traffic patterns in more depth using the mtr tool, and it appears that the explanations we were discussing are right: the sariserver<->vesuvio connection is perfect 99% of the time (based on a few hours of testing with mtr), with an RTT under a few ms. Only from time to time do we get the issue, and this creates a transient fade of 10-20 s at most in the communication.
The figure I attach is an example of this issue: the path is actually sariserver<->aux1, and we experience the problem on this segment too; so this means that we also hit the issue on a 4 km link (this is the length of this link).
As you can see from the figure, it is just a 20-25 s pattern; based on continuous observation this is the typical pattern. So I can confirm that having two sites with multiple radios on the same frequency/bandwidth is already a situation to avoid.

I am considering writing a small tool to run on sariserver that specifically captures these events, in order to get an evaluation and a metric for this problem. Maybe it could help give a better understanding of the conditions and an indication of how to optimize cost/performance.
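
One possible starting point for such a tool, offered only as a sketch: the Python snippet below scans lines in the usual Linux ping output format and groups consecutive replies whose round-trip time exceeds a threshold into "spike" events. The sample lines, the 10.0.0.1 address, the 1000 ms threshold, and the names `Spike` and `find_spikes` are all made up for illustration; a real tool would read live ping or mtr output and add timestamps.

```python
import re
from dataclasses import dataclass

# Matches the sequence number and RTT in a Linux ping reply line.
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


@dataclass
class Spike:
    first_seq: int       # first reply above the threshold
    last_seq: int        # last consecutive reply above the threshold
    worst_rtt_ms: float  # largest RTT seen during the event


def find_spikes(ping_lines, threshold_ms: float = 1000.0):
    """Group consecutive slow replies into Spike events."""
    spikes, current = [], None
    for line in ping_lines:
        match = RTT_RE.search(line)
        if not match:
            continue  # skip headers, summaries, and timeout lines
        seq, rtt = int(match.group(1)), float(match.group(2))
        if rtt >= threshold_ms:
            if current is None:
                current = Spike(seq, seq, rtt)
            else:
                current.last_seq = seq
                current.worst_rtt_ms = max(current.worst_rtt_ms, rtt)
        elif current is not None:
            spikes.append(current)  # a fast reply ends the event
            current = None
    if current is not None:
        spikes.append(current)
    return spikes


# Hypothetical ping output, invented to resemble the behavior in this thread.
sample = [
    "64 bytes from 10.0.0.1: icmp_seq=1 ttl=62 time=3.21 ms",
    "64 bytes from 10.0.0.1: icmp_seq=2 ttl=62 time=2.98 ms",
    "64 bytes from 10.0.0.1: icmp_seq=3 ttl=62 time=21034 ms",
    "64 bytes from 10.0.0.1: icmp_seq=4 ttl=62 time=20031 ms",
    "64 bytes from 10.0.0.1: icmp_seq=5 ttl=62 time=3.05 ms",
]
spikes = find_spikes(sample)
print(spikes)
```

Counting such events per hour, and recording their duration and worst RTT, would give the kind of metric mentioned above.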

I will keep you informed on the subject :)

Thanks again for the valuable help.

Best regards
Mike
 
Image Attachments: 
KG6JEI
Admin Note:
Moving this thread to ragchew, as it involves a device whose config (switchport configuration) has been modified beyond what the GUI permits. Please reset the node to default configs (press and hold reset for 15 seconds after the node boots) to return it to an officially supported state and get official support.
 
i8fuc
Hi,
I completely agree with the support policy. If we want to create something serious, it is essential to keep things reproducible and clearly understandable.

Using the GUI as the only "official" way to manage the nodes/network seems to me a good way to achieve that goal.

I have returned the affected node to its original state so that it is back to "supported" status :)

I will open a new thread to better clarify the issues we are seeing.

Best regards

Mike 
 
