DNS Loop

KM6IAU
DNS Loop

I'm trying to track down what is causing a DNS loop.  Here's my setup:

AREDN node v3.20.3.1:

KM6IAU-Palisade-Node1.local.mesh
Wifi address: 10.112.213.0 / 8
LAN address: 10.13.80.1 / 28
WAN address: 172.16.73.1 / 24
default gateway: 172.16.73.254
DNS 1:  172.16.73.254


Edge Router 10X:

eth1:  internet
eth2:  mesh 10.13.80.2 / 28
eth2.1:  gateway for mesh 172.16.73.254
switch0:  private network


The edge router is running dnsmasq.  It is configured as such:
cache-size: 2048
listen-on: 

  • eth2.1
  • switch0
  • (and other non-related interfaces)

name-server:

  • 1.1.1.2
  • 1.0.0.2

options:

  • log-queries
  • server=/local.mesh/10.13.80.1
  • server=/10.in-addr.arpa/10.13.80.1
  • address=/non-related-host.on.my.net/redacted


Usually DNS resolution works fine:

  • When I browse to an internet site, DNS is resolved by the edge router forwarding the request to 1.1.1.2 or 1.0.0.2, or cache, as expected.
  • When I browse to a mesh site (any DNS name ending in local.mesh), the edge router forwards the request to 10.13.80.1, or cache, as expected.

However, at some point, the AREDN node (via the WAN address 172.16.73.1) asks my edge router to resolve a *.local.mesh address, which my edge router forwards to the AREDN node at 10.13.80.1, and the AREDN node queries the edge router, which forwards to the AREDN node, ad infinitum.

Here's a snippet of dnsmasq.log on the edge router:

Dec 11 09:39:40 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:40 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:42 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:42 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:42 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:42 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1
Dec 11 09:39:42 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
Dec 11 09:39:42 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1

When this loop occurs, DNS resolution to any domain is painfully slow.  Obviously.  The particular domain name in the loop will vary, but is always a *.local.mesh domain.  Once I caught the AREDN node asking my edgerouter to resolve localnode.local.mesh.  Ha.
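A rough way to spot this pattern in the log automatically is sketched below. It assumes the `log-queries` line format shown above; the function name `find_loops` and the threshold of 5 identical queries per second are arbitrary choices for illustration, not anything dnsmasq defines.

```python
import re
from collections import Counter

# Matches dnsmasq log-queries lines such as:
#   Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1
QUERY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+dnsmasq\[\d+\]:\s+"
    r"query\[(?P<qtype>\w+)\]\s+(?P<name>\S+)\s+from\s+(?P<client>\S+)"
)


def find_loops(lines, threshold=5):
    """Return {(timestamp, qtype, name, client): count} for bursts at or above threshold."""
    counts = Counter()
    for line in lines:
        m = QUERY_RE.match(line)
        if m:  # "forwarded" and "reply" lines do not match and are skipped
            key = (m.group("ts"), m.group("qtype"), m.group("name"), m.group("client"))
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Small sample in the same shape as the log above: one query at :40, six at :41.
sample = (
    ["Dec 11 09:39:40 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1"]
    + ["Dec 11 09:39:41 dnsmasq[12893]: query[AAAA] n6gkb-node1.local.mesh from 172.16.73.1",
       "Dec 11 09:39:41 dnsmasq[12893]: forwarded n6gkb-node1.local.mesh to 10.13.80.1"] * 6
)

for (ts, qtype, name, client), n in find_loops(sample).items():
    print("%s: %d x %s %s from %s" % (ts, n, qtype, name, client))
```

To run it against a real log, pass an open file object instead of `sample`. The same name being asked for by the same client many times within one second is the signature of the loop.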

If I restart the AREDN node, the loop is broken, and normal functionality resumes.

Here is the flow when it is behaving normally:

Dec 11 10:18:12 dnsmasq[27019]: query[A] km6iau-palisade-node1.local.mesh from 172.16.1.10
Dec 11 10:18:12 dnsmasq[27019]: forwarded km6iau-palisade-node1.local.mesh to 10.13.80.1
Dec 11 10:18:12 dnsmasq[27019]: reply km6iau-palisade-node1.local.mesh is 10.112.213.0

I haven't yet isolated what is causing the AREDN node to try to resolve *.local.mesh on its WAN.

Any ideas?
Follow-up information:

I suspected that the AREDN node was unable to resolve a .local.mesh query, and therefore asked the WAN.  I tested it by attempting to browse to:
Boom.  Instantly the loop started.  Why would an AREDN node ever try to resolve a .local.mesh query on the WAN, instead of returning a NXDOMAIN?

As a workaround, I might see if I can cause my edgerouter to explicitly drop such requests on that particular interface.
2nd follow up:  dnsmasq doesn't provide a way to behave differently for different interfaces, except by way of a new instance bound to the target interface.

It's getting hairy!
kg6wxc
I have seen this too

I also run dnsmasq on my house edgerouter and also have the server=/.local.mesh/x.x.x.x options in my dnsmasq config on the edgerouter.
I used to see this behavior quite often, but it doesn't really do it anymore and I forget how I fixed it. I tried several things over the years and I don't remember which I settled on. I will have to look at my config again...
One way I do remember is to create a little script on the edgerouter that grabs the mesh "DNS" from your localnode. You can then have dnsmasq use that file as an additional hosts file and it'll all be happy. The problem with this approach is that it is not an automatic process: the file will need updating from time to time, sometimes quite often.

One more thing is the request for the "AAAA" record. A "Quad A" record is for IPv6, not for IPv4, which is what the mesh uses.
What you are seeing there is that your computer is also asking for the IPv6 address of whatever.local.mesh. Your edgerouter's dnsmasq dutifully forwards the request to your node, like it should, but there is no AAAA record for any .local.mesh address, so your node then asks its default DNS server for the AAAA record. That default DNS server is your edgerouter, so the loop starts, and yes, while they go back and forth like this almost all other DNS requests grind to a halt. Which then gets the wife yelling "The Intertubes are broke again!"
The same thing happens when you ask for a nonexistent mesh address.

Now that I type all that, I think I remember how I sort of "fixed" it.
I am assuming your "localnode" has your edgerouter's IP address set as its DNS server, right? Stop that.
Why does your localnode need to resolve things on your home LAN? It probably doesn't. Just leave your localnode's DNS set at 8.8.8.8 or whatever it is by default now.
If you need to have your localnode "see out" to the internet, for tunnels or whatnot, you could probably just give it a default route pointing at your edgerouter's IP. Then the dnsmasq on the node can query Google all day long for AAAA and A records for not-even-there.local.mesh and they'll fail like they should and all will be happy.
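To make the reasoning above concrete, here is a toy model of the forwarding chain. It is only a sketch: each "resolver" is a plain dict, the labels `edgerouter`, `node`, and `public` are hypothetical stand-ins for the devices in this thread, and none of this reflects how dnsmasq is implemented internally. It just follows "who forwards to whom" until an answer, a dead end, or a hop limit.

```python
def resolve(resolvers, start, name, max_hops=10):
    """Follow forwards from `start`; return (answer, path_taken).

    answer is an address string, None for "no such name", or "LOOP"
    if the query is still bouncing around after max_hops forwards.
    """
    path = [start]
    current = start
    while len(path) <= max_hops:
        r = resolvers[current]
        if name in r["records"]:
            return r["records"][name], path
        # Use a domain-specific forwarder if one matches, else the default upstream.
        nxt = next((srv for dom, srv in r["forward"].items() if name.endswith(dom)),
                   r["upstream"])
        if nxt is None:
            return None, path  # nobody left to ask: the query fails cleanly
        path.append(nxt)
        current = nxt
    return "LOOP", path


def make_network(node_upstream):
    """Build the three-resolver toy network; only the node's upstream DNS varies."""
    return {
        "edgerouter": {"records": {}, "forward": {".local.mesh": "node"}, "upstream": "public"},
        "node": {"records": {"km6iau-palisade-node1.local.mesh": "10.112.213.0"},
                 "forward": {}, "upstream": node_upstream},
        "public": {"records": {}, "forward": {}, "upstream": None},
    }


missing = "not-even-there.local.mesh"
# Node's DNS points back at the edgerouter: the query ping-pongs.
print(resolve(make_network("edgerouter"), "edgerouter", missing))
# Node's DNS points at a public resolver: the query fails once and stops.
print(resolve(make_network("public"), "edgerouter", missing))
```

In the first network the unanswerable name bounces between the edgerouter and the node until the hop limit trips. In the second it reaches the public resolver, gets no answer, and the exchange ends, which is the behavior the change of DNS server is meant to restore.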

I hope that even slightly made sense.
I will double check my config and I can also post the little edgerouter script I have that'll query your localnode for a list of all mesh hosts and then create a "hosts file" out of it.

kg6wxc
script

Here's a little script that'll run on an edgerouter or anything else with python and the needed python modules.
It queries a node for the list of hostnames: http://nodename:8080/cgi-bin/sysinfo.json?hosts=1
It then massages that data into a normal ole "hosts file" that can be used with just about anything...

Well since I can't attach it, it's not an allowed file type and the "<code>" tags aren't working either it seems, here ya go:

#!/usr/bin/env python
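#urllib2 is Python 2 only; fall back to urllib.request on Python 3
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen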
import json
import os

#change this IP address to be the address of YOUR localnode
localnodeIP = "10.x.x.x"

#this is where the file goes, use this for the "addn-hosts" directive in dnsmasq
#change name or path as you see fit
#(the "user-data" directory is carried across edgerouter upgrades tho ;) )
meshHostsFile = "/config/user-data/mesh_hosts"

#nothing below here should need changing...
url = 'http://' + localnodeIP + ':8080/cgi-bin/sysinfo.json?hosts=1'
response = urlopen(url)
json_obj = json.load(response)

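#write one "ip<TAB>name.local.mesh<TAB>name" line per mesh host
with open(meshHostsFile, 'w') as hosts_file:
    for i in json_obj['hosts']:
        hosts_file.write(i['ip'] + "\t" + i['name'] + ".local.mesh\t" + i['name'] + "\n")
    hosts_file.write(localnodeIP + "\tlocalnode.local.mesh\tlocalnode\n")
#restart dnsmasq so it re-reads the hosts file
os.system("/etc/init.d/dnsmasq restart")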
 

KM6IAU
Good ideas, thanks

Excellent.  I don't know why I didn't think to just put 1.1.1.2 directly in as the DNS; after all, that is where my edgerouter is forwarding.  I don't have any reason to resolve addresses for the AREDN network, and AREDN doesn't need access to my private hosts.  I think the initial thought was to add some layer of protection with DNS blacklists, but 1.1.1.2 actually already does that.  Heck, for AREDN, I give it a DNS of 1.1.1.3 (adult filters + malware protection).

For those that are not familiar with Cloudflare's 1.1.1.1 (and 1.1.1.2 and 1.1.1.3):  https://blog.cloudflare.com/introducing-1-1-1-1-for-families/

I also really like your other workaround.  I do have some interest in providing DNS services for AREDN WAN, but it is not necessary at this time.  A cron job every minute would handle that pretty nicely.  Worst case scenario, DNS resolution is slow for 1 minute.
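One thing to watch with a cron job every minute: restarting dnsmasq on every run is probably more than is needed when the host list has not changed. A minimal sketch of a "write only if changed" helper follows; the name `write_if_changed` is made up for illustration and is not part of the script posted above.

```python
import os


def write_if_changed(path, content):
    """Write content to path only when it differs; return True if the file was written."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False  # nothing new, leave the file (and dnsmasq) alone
    except IOError:
        pass  # file missing or unreadable: fall through and write it
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(content)
    os.rename(tmp, path)  # on POSIX this replaces the old file in one step
    return True


# Hypothetical use inside the hosts-file script, with hosts_text built from the JSON:
#   if write_if_changed(meshHostsFile, hosts_text):
#       os.system("/etc/init.d/dnsmasq restart")
```

With this, the restart only happens on runs where the mesh host list actually changed.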

Kudos, thanks

kg6wxc
no problem
Glad to help, I hope I explained it well enough.
I think I also have a plain ole shell script around here somewhere that does the same thing as the above python script...
Now I will have to try and find it.

73, stay safe.
KM6IAU
Yep, that made perfect sense.
Yep, that made perfect sense.  I didn't realize the Edgerouter had python installed.  Or maybe I added it at some point, and forgot.  Ha.  Well, I think I'll use your python script.  Might be a good Saturday project.  73
kg6wxc
It's default

On the edgerouter I have, python is there by default... I have not looked deeply, but I would bet that is how they made the gui, at least partially...
That's one reason I just used it on there...

I could not find the shell script I was thinking of. I think I deleted it cause I figured I'd never need it again...
But, I did find the same idea, but in a very basic form... It's probably what I started with many years ago...
It just uses scp to get a node's actual "hosts_olsr" file, which is what dnsmasq on the node uses.

#!/bin/bash
our_mesh_node="10.x.x.x"
mesh_hosts_file="/config/user-data/mesh_hosts"

scp -P 2222 root@$our_mesh_node:/var/run/hosts_olsr $mesh_hosts_file

#get rid of stuff that is not needed
#".local.mesh" gets duplicated in some places, it's easy enough to change back
sed -i -e '/localhost/d' -e '/###/d' \
    -e $'s/\t#.*/.local.mesh/' \
    -e 's/local\.mesh\.local\.mesh/local.mesh/' \
    -e '/^\s*$/d' "$mesh_hosts_file"

The python version is better, and the bash script could be made better too... I know I did it once.
Unless you have the proper SSH key on the node, you'll have to put in the password to get the file.

There are several other ways I found and tried too.
Redirect the output of these to another file:

#!/bin/bash
export PYTHONIOENCODING=utf8
curl -s 'http://10.x.x.x:8080/cgi-bin/sysinfo.json?hosts=1' | \
python -c "import sys, json; print('\n'.join(h['ip'] + '\t' + h['name'] for h in json.load(sys.stdin)['hosts']))"

This needs the "jq" program (which I think is already on an edgerouter):

#!/bin/bash
host_data=$(curl -s 'http://10.x.x.x:8080/cgi-bin/sysinfo.json?hosts=1')
echo "$host_data" | jq -r '.hosts[] | "\(.ip)\t\(.name)"'


As with everything computer-ish, we could find different ways to do this until the proverbial cows come home.
 
