Technical

More DNS Anycast

Also known as “how to do BGP with Vultr using ExaBGP”.

Having previously written about locally anycasting services within my home network, I recently decided to run an experiment anycasting a prefix on the internet.

I’ve used ExaBGP before, so it was a no-brainer to use it again. For anycasted services it offers a few benefits: it’s small and lightweight, it’s packaged in many Linux distros, and it can easily spawn a watchdog process that you can use to control your prefix advertisements.

I’ve been using Vultr for my authoritative DNS servers for a while, so using their services for this was an equally easy choice. I’m familiar with their UI, I already have an account, and you can do BGP even on the cheapest virtual machines; you just need to send them a letter of authority (LOA) proving you own the address space you plan on advertising. I have my own IPv6 PI address space from RIPE, courtesy of a friendly sponsoring LIR, as well as my own ASN, so I was all set.

The other nice thing about Vultr is that the BGP session uses the same peer IP at the Vultr end regardless of which of their datacentres you spin things up in, which makes automating the configuration easier.

Vultr insist on a BGP session password, and the first problem I ran into turned out to be related to this. Part of the reason for writing this is to help out anyone else who runs into the same problem.

I went down a bit of a rabbit hole thinking the problem was to do with multihop BGP (Vultr’s sessions are multihop) and wondered if I needed to be setting the TTL on the outbound packets. This turned out not to be the case, but I left the settings in place anyway.

I installed BIRD and used one of Vultr’s canned configs for the virtual machine in question. This worked like a charm, which pointed to the problem being in my ExaBGP configuration.

BIRD would have done the job, but I’d have had to write a new watchdog. The one I have for ExaBGP is tried and tested, tweaked to my needs, and works well, so I was keen to get ExaBGP working. It’s a complete rewrite of the one I talk about in the earlier blog post, so maybe I’ll write a post on that soon…

For the multihop TTL, you just need outgoing-ttl 2; and incoming-ttl 2; in either a template or the neighbor configuration, depending on the complexity of your needs.

By default, ExaBGP expects md5-password to be a base64-encoded string, so if you’ve specified the plain-text password for the session, it won’t work. This was the actual cause of my problem. To use the plain-text password in this parameter, set md5-base64 to false.
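To see why the mismatch matters, here is a minimal Python sketch. The password is made up, and this is an illustration of the encoding problem rather than ExaBGP’s own code: a plain-text password treated as base64 either fails to decode or decodes to different bytes, so the two ends of the session would not be using the same key.

```python
import base64
import binascii

plain = "Sup3rSecret"  # hypothetical session password, not a real one

# The base64 form of the password, i.e. what a base64-expecting parser wants.
encoded = base64.b64encode(plain.encode("ascii")).decode("ascii")
print(encoded)

# Decoding the encoded form gives the original key back.
assert base64.b64decode(encoded) == plain.encode("ascii")

# Treating the plain text itself as base64 does not give the same key.
try:
    wrong_key = base64.b64decode(plain)
except binascii.Error:
    wrong_key = None  # the plain text is not valid base64 at all
assert wrong_key != plain.encode("ascii")
```

So the choice is between storing the encoded form in md5-password, or keeping the plain text and setting md5-base64 to false as I did.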

I’m using Ansible to automate configuration, so the template for my exabgp.conf looks like this:

process monitor {
	run /usr/local/bin/exabgp-healthcheck anycast;
	encoder text;
}

neighbor 2001:19f0:ffff::1 {
	local-address {{ ansible_default_ipv6.address }};
	router-id {{ router_id }};
	local-as {{ as_number }};
	peer-as 64515;
	hold-time 10;
	group-updates true;
	md5-password {{ md5_password }};
	md5-base64 false;
	outgoing-ttl 2;
	incoming-ttl 2;

	capability {
		graceful-restart 10;
	}

	family {
		ipv6 unicast;
	}

	api service {
		processes [ monitor ];
	}
}

I don’t have any IPv4 prefixes to advertise; if you do, you’d need to add the relevant bits to the above.

I had upgraded ExaBGP to version 4 as part of a distro upgrade on my internal resolvers. Rather than update the watchdog script, I opted to revert ExaBGP’s behaviour instead, so in exabgp.env I set ack to false in the [exabgp.api] section.
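For anyone who would rather update their helper than turn acknowledgements off, this is roughly what handling them could look like. It is a sketch in Python rather than my actual watchdog, and it assumes that with ack enabled ExaBGP 4 answers each command with a line reading "done" on success; send_command and its parameters are names I have made up for the illustration.

```python
import io
import sys


def send_command(command, out=sys.stdout, inp=sys.stdin, expect_ack=True):
    """Write one API command and, if acks are enabled, wait for the reply.

    Assumption: the reply is a single line, "done" on success.
    """
    out.write(command + "\n")
    out.flush()  # ExaBGP only sees the command once it is flushed
    if not expect_ack:
        return True
    reply = inp.readline().strip()
    return reply == "done"


# Exercise the helper against in-memory streams instead of a live ExaBGP.
fake_out = io.StringIO()
fake_in = io.StringIO("done\n")
ok = send_command("announce watchdog anycastdns", fake_out, fake_in)
print(ok, repr(fake_out.getvalue()))
```

A real helper would also want a timeout around the read, so that a missing reply cannot stall the health checks.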


Anycasting DNS

Introduction…

I wanted to have a tinker with anycasting, and DNS seemed a sensible place to start: it’s easy to test and muck about with. So I spun up a couple of DNS resolvers and decided what my anycasted IP addresses would be. They need to be outside of the subnets I’m using on the rest of my network, as I want to route traffic to them. I’ve put the underlying machines’ unicast addresses in this subnet too, but you wouldn’t have to, depending on your setup.

Servers…

The nameservers are, essentially, identical to servers that’d deal with unicast traffic, except for the following changes. I’m using BIND, but it really doesn’t matter what you use.

We need to bind up the anycast addresses so that the O/S will deal with their traffic…

In my case, my anycasted addresses will be 10.1.53.1 and 10.1.53.2, and I’m using Debian, so my additions to /etc/network/interfaces are:

auto lo:1
iface lo:1 inet static
address 10.1.53.1
netmask 255.255.255.255

auto lo:2
iface lo:2 inet static
address 10.1.53.2
netmask 255.255.255.255

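We need to stop the machine responding to ARP for these addresses on its physical interface. More precisely, arp_ignore = 1 tells the kernel to answer an ARP request only if the requested IP is configured on the interface the request arrived on. Because we’ve bound the anycast addresses to the loopback, the machine won’t answer for them via eth0. Likewise, arp_announce = 2 makes it pick the most appropriate local address as the source of its own ARP requests, rather than an anycast one. I added the following to /etc/sysctl.conf: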

net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2

BGP & Load Balancing…

Now we need to advertise the anycast addresses to our router. We’ll use BGP for this, spoken by ExaBGP. Grab that and install it on the server; the config then looks something like this. My router is 10.1.53.254, and my two nameservers live in 10.1.53.0/24.

neighbor 10.1.53.254 {
  router-id 10.1.53.11;
  local-address 10.1.53.11;
  local-as 64601;
  peer-as 64601;
  hold-time 10;

  process watch-nameserver {
    run /usr/local/bin/nameserver_watchdog;
  }

  static {
    route 10.1.53.1/32 next-hop 10.1.53.11 watchdog anycastdns withdraw;
    route 10.1.53.2/32 next-hop 10.1.53.11 watchdog anycastdns withdraw;
    route xxxx.xxxx.xxxx:53::1/128 next-hop xxxx.xxxx.xxxx:53::11 watchdog anycastdns withdraw;
    route xxxx.xxxx.xxxx:53::2/128 next-hop xxxx.xxxx.xxxx:53::11 watchdog anycastdns withdraw;
  }
}

I withdraw the routes from the outset, so that the watchdog will announce them upon successful testing.

The router’s BGP config looks like this (it’s JunOS):

# show protocols bgp group dns-anycast
local-address 10.1.53.254;
hold-time 10;
family inet {
    unicast;
}
family inet6 {
    unicast;
}
peer-as 64601;
local-as 64601;
multipath;
neighbor 10.1.53.11;
neighbor 10.1.53.12;

I’m going to equally load balance between the two servers, but you could set a localpref on each server, for example, and have server1 handle .1 primarily with server2 taking over in the event of failure, and vice versa.

Don’t fall for JunOS’ misleading ‘per packet’ configuration item; this will, despite appearances, load balance per flow based on a hashing algorithm.

# show routing-options forwarding-table
export dns-anycast-loadbalance;

# show policy-options policy-statement dns-anycast-loadbalance
then {
    load-balance per-packet;
}
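To illustrate what per-flow balancing means in practice, here is a toy sketch in Python. It is not the hash JunOS actually uses (that depends on platform and configuration); pick_next_hop and the choice of SHA-256 are mine, purely for illustration. The point is that every packet of a given flow hashes to the same next hop, while different flows can land on different servers.

```python
import hashlib

# The two nameservers' unicast addresses from the config above.
NEXT_HOPS = ["10.1.53.11", "10.1.53.12"]


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops=NEXT_HOPS):
    """Map a flow's 5-tuple onto one of the equal-cost next hops."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Two packets from the same flow always take the same path.
first = pick_next_hop("192.0.2.10", 40000, "10.1.53.1", 53, "udp")
second = pick_next_hop("192.0.2.10", 40000, "10.1.53.1", 53, "udp")
print(first, second)
```

This is why a TCP-based DNS query isn’t split across both servers mid-connection, even though the setting is called per-packet.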

Monitoring and Health…

We’ve included a watchdog in the ExaBGP config. If a server fails entirely, its BGP session is torn down and traffic is directed to the other host, watchdog or not. However, if only the nameserver daemon fails, the BGP session stays up and traffic sent to that host is disrupted. So the watchdog checks that the nameserver daemon is listening and performs a lookup against it, announcing the anycast address(es) while it’s up and withdrawing them in the event of failure. The watchdog looks like this:

#!/usr/bin/perl

use strict;

my $debug = 0;

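# Ignore SIGINT unless debugging
unless($debug) {
	$SIG{'INT'} = sub {};
}
# Unbuffer STDOUT so ExaBGP sees each command as soon as it's printed
select STDOUT;
$| = 1;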

use IO::Socket;
use Net::DNS;

my $state = 'init';

my $ip;
my $domain;
if(open(C,"/etc/nameserver_watchdog.conf")) {
	chomp(($ip, $domain) = split /:/, <C>);
	close C;
} else {
	$ip = '127.0.0.1';
	$domain = 'localdomain';
}
print "checking $ip for $domain\n" if $debug;

while(1) {
	eval {
		local $SIG{ALRM} = sub { die 'Timed Out'; };
		alarm 2;
		print "attempting connect... state is [$state]\n" if $debug;
		my $socket = IO::Socket::INET->new(Proto=>'tcp', PeerAddr=>$ip, PeerPort=>53, Timeout=>2);
		if($socket && $socket->connected() && do_lookup($ip, $domain)) {
			print "announce watchdog anycastdns\n" if $state ne 'up';
			$socket->close();
			alarm 0;
			$state = 'up';
			print "state set to up\n" if $debug;
		} else {
			print "withdraw watchdog anycastdns\n" if $state ne 'down';
			$state = 'down';
			print "state set to down\n" if $debug;
		}
	};
	if($@) {
		print "state is [$state]\n" if $debug;
		print "withdraw watchdog anycastdns\n" if $state ne 'down';
		$state = 'down';
		print "state set to down in barf\n" if $debug;
	}
	alarm 0;
	sleep 10;
}

sub do_lookup {
	my $ip = shift;
	my $domain = shift;
	my $r = Net::DNS::Resolver->new;
	$r->nameservers($ip);
	$r->tcp_timeout(5);
	$r->udp_timeout(5);
	my $q = $r->query($domain,'SOA');
	# query() returns undef on failure, so check it before touching the answer
	if($q) {
		# Healthy only if the answer holds an SOA record with a numeric serial
		my ($soa) = grep { $_->type eq 'SOA' } $q->answer;
		print "Answer: ".$soa->serial."\n" if $soa && $debug;
		if($debug > 1) {
			require Data::Dumper;
			print Data::Dumper::Dumper($q)."\n\n";
		}
		return 1 if $soa && $soa->serial =~ m/^\d+$/;
	}
	print "Error:\n" if $debug;
	print $r->errorstring if $debug;
	print "\n===\n" if $debug;
	return 0;
}

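/etc/nameserver_watchdog.conf contains a line of the format ip.ad.dr.ess:domain.com; only the first line is read. If the file is missing, the watchdog falls back to checking 127.0.0.1 for localdomain.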

It’ll announce the addresses only when both a TCP connection to port 53 and a DNS lookup succeed. The lookup should be for a domain you’d expect the server to answer for, or one it’s permitted to recurse for you. If the DNS daemon stops responding, the watchdog will withdraw the routes; if the server fails, the BGP session will fail, and the routes will be withdrawn anyway.
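The watchdog only speaks to ExaBGP when the health state changes, rather than on every check. Stripped of the socket and DNS code, that logic can be sketched as follows; this is Python rather than the Perl above, and next_action is a name invented for the illustration.

```python
def next_action(state, healthy):
    """Return (new_state, command); command is None if nothing changed.

    state is 'init', 'up' or 'down', mirroring the Perl watchdog's $state.
    """
    new_state = "up" if healthy else "down"
    if new_state == state:
        return new_state, None
    verb = "announce" if healthy else "withdraw"
    return new_state, f"{verb} watchdog anycastdns"


# Replay a run of health-check results and collect what would be printed.
state = "init"
commands = []
for healthy in [True, True, False, False, True]:
    state, command = next_action(state, healthy)
    if command:
        commands.append(command)
print(commands)
```

Starting from 'init' means the first check always produces a command, so the routes get an explicit announce or withdraw shortly after ExaBGP starts the process.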
