Technical

Blocking Adverts, Tracking & Malware With RPZ

A while back, I decided I wanted to prevent at least some adverts and tracking, but rather than on a device by device basis, I wanted to achieve this for all devices on the network. Those that know me and my recent work will understand that naturally, DNS blocking sprang to mind, as I’m already very familiar with RPZ.

Originally, I was consuming a bunch of lists with some code, manipulating the entries with some weighting and then outputting an RPZ for my servers to use. However, more recently I found Energized Protect, which has a load of different levels of blocking, and they provide the different levels in a variety of formats, helpfully including RPZ. So, I’ve been trialling their lists for a couple of weeks now.

As with any external feed, you need to be aware of both false positives being added to the list by the curator and things they think should be on the list that you may disagree with. I was recently affected by this with my Amazon devices, whereby one or more domains critical to the correct functioning of the Echo devices found their way onto the block list I’m consuming. To be fair, I’m consuming one of the more extreme variants of the list, and so this was something I was aware could happen (although I admit, it didn’t spring straight to the front of my mind when troubleshooting over the weekend!).

So, let’s talk about how this works.

RPZ is a feature within some DNS servers that allows you to modify the responses given to clients depending on a number of different criteria. BIND from Internet Systems Consortium (ISC) was pretty much first to have RPZ, but others have varying levels of support for the main functionality. The BIND implementation allows you to define a policy that can consist of a number of layers. Within the policy you can override the entire contents of a layer, and within each layer you can have permit and deny actions based on a number of triggers. For this use case, we are interested in two of the triggers:

  • the name being looked up
  • the IP of the client making the request

The file we download from Energized Protect will form the main blocking layer, and we’ll override the entire layer at the policy level with NXDOMAIN. Arguably we could send queries to a web server with a block page, but not all things on the requesting end of this are browsers, and we can get logging from the BIND servers if we want to know what was blocked for a given client for the purposes of troubleshooting. Of course, we will want to be able to override these entries in case something gets on the list that we don’t want to be affected by (see above).

RPZ layers are DNS zone file format (see RFC1035 section 5 if you’re particularly interested in DNS master zone format, or for RPZ you can read the RFC draft (it’s not made it to a full RFC yet…)).

Because they’re DNS zone files, they can be transferred to other DNS servers using the normal notify and transfer mechanisms.
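To make the zone file point a little more concrete, here is a minimal Python sketch that renders a list of domains as an RPZ layer. This is not how Energized Protect build their feed, nor what my own scripts do; the function name, the SOA and NS values, and the choice to add a wildcard for every domain are all assumptions for illustration.

```python
from datetime import datetime, timezone


def render_rpz_zone(domains, serial=None, ttl=300):
    """Render a minimal RPZ layer as zone-file text (hypothetical helper).

    Owner names are left relative to the zone origin, so the same text
    loads whatever the zone is called on the server (e.g. "block").
    """
    if serial is None:
        # Date-based serial (YYYYMMDDHH); any increasing integer would do.
        serial = int(datetime.now(timezone.utc).strftime("%Y%m%d%H"))
    lines = [
        f"$TTL {ttl}",
        # Like any zone, an RPZ needs an SOA and at least one NS record.
        # The localhost values here are placeholders, not a recommendation.
        f"@ IN SOA localhost. hostmaster.localhost. {serial} 21600 3600 604800 {ttl}",
        "@ IN NS localhost.",
    ]
    cleaned = {d.strip().rstrip(".").lower() for d in domains if d.strip()}
    for domain in sorted(cleaned):
        # "CNAME ." is the NXDOMAIN action; the wildcard covers subdomains.
        lines.append(f"{domain} IN CNAME .")
        lines.append(f"*.{domain} IN CNAME .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_rpz_zone(["ads.example.com", "tracker.example.net"]), end="")
```

Because the output is an ordinary zone file, a server can load it as a master zone and hand it to other servers by zone transfer like any other zone.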

On my network here, there’s a central authoritative server, and then a pair of recursive servers that deal with actual client requests. I’ll get around to writing about the anycast set up of those in another article.

For the purposes of this article, the authoritative master is on 192.168.1.53, and the two slaves that are actually dealing with the client recursion are on 192.168.1.51 and 192.168.1.52.

Central Authoritative Server

We’ll start with the central authoritative server. There are two bits to this: periodically fetching the RPZ, and serving it to the slave servers.

All of the scripts I talk about below can be found in the Bitbucket repository. The code is fairly straightforward, but of course, drop me a line if you have questions.

Energized Protect update their feeds every 6 hours, and so there’s no need to poll them any more often than that. Further, the updateblockrpz script keeps an unchanged copy of the downloaded file so that wget can do timestamping and only download the file if it has actually changed on the server.
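For anyone who wants the same "only download if changed" behaviour without wget, the idea can be sketched in Python with a conditional GET. This is an illustration of the technique rather than a port of updateblockrpz; the function names are made up, and it assumes the server honours If-Modified-Since and sends a Last-Modified header.

```python
import os
import tempfile
import urllib.error
import urllib.request
from email.utils import formatdate, parsedate_to_datetime


def http_date(epoch_seconds):
    """Format a Unix timestamp as an HTTP date for If-Modified-Since."""
    return formatdate(epoch_seconds, usegmt=True)


def fetch_if_newer(url, path, timeout=60):
    """Fetch url into path unless the server reports it unchanged.

    Returns True if a new copy was written, False on 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if os.path.exists(path):
        # Offer the local copy's mtime so the server can answer 304.
        request.add_header("If-Modified-Since", http_date(os.path.getmtime(path)))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return False
        raise
    # Write to a temporary file first so a failed download never
    # leaves a truncated copy where the good one used to be.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(body)
    os.replace(tmp_path, path)
    if last_modified:
        # Keep the server's timestamp on the local copy for the next run.
        stamp = parsedate_to_datetime(last_modified).timestamp()
        os.utime(path, (stamp, stamp))
    return True
```

The important detail is the same as with wget: the untouched download has to be kept separately from whatever gets loaded into BIND, otherwise the local timestamp no longer says anything about the server's copy.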

There are two further scripts, both of which allow you to manipulate an override layer in the policy. The first, rpz-override, allows you to add and remove domains from the override, either to add things you want to block, or allow things blocked in the block layer. The second script, rpz-override-client, allows you to base the action on the client IP instead of on the queried name. Both of these are written in Perl, and more specifically are built on the Net::DNS module to send the changes into the server via a dynamic update.
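The scripts themselves are Perl, but the shape of the dynamic update is easy to show. The sketch below is not the rpz-override script: it is a hypothetical Python helper that only builds the text you could pipe into BIND's nsupdate tool, and the defaults (zone name, TTL, server address, adding a wildcard alongside the bare name) are assumptions chosen to match the examples in this article.

```python
def nsupdate_script(domain, action="rpz-passthru", zone="override",
                    ttl=300, server="127.0.0.1", remove=False):
    """Build nsupdate input that sets or removes an override entry."""
    owner = f"{domain.rstrip('.').lower()}.{zone}."
    # A literal dot is the NXDOMAIN action; the rpz-* actions are
    # written as fully qualified CNAME targets.
    target = "." if action == "." else f"{action}."
    lines = [f"server {server}", f"zone {zone}"]
    for name in (owner, f"*.{owner}"):
        # Clear any existing CNAME first; a name can only hold one.
        lines.append(f"update delete {name} CNAME")
        if not remove:
            lines.append(f"update add {name} {ttl} IN CNAME {target}")
    lines.append("send")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Whitelist a domain and everything beneath it.
    print(nsupdate_script("some.domain.name"), end="")
```

Run on the master itself, the output could be fed to nsupdate on standard input, which is where the allow-update setting for localhost on the override zone comes in.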

Next, let’s look at how we configure the server. A base understanding of BIND configuration is assumed.

First, we’ll need to configure it to master the two zones, permit dynamic updates on the override zone, and permit slaves to transfer them. Depending on your distro, the location of your named.conf may vary, and also whether it’s a single file or split out with includes. I’ll just include generic config here to try and cover as many bases as possible.

zone "block" {
	type master;
	file "rpz/block";
	notify explicit;
	also-notify {
		192.168.1.51;
		192.168.1.52;
	};
};

zone "override" {
	type master;
	file "rpz/override";
	notify explicit;
	also-notify {
		192.168.1.51;
		192.168.1.52;
	};
	allow-update { 127.0.0.1; ::1; };
};

Normal rules apply here; config like also-notify can inherit from the main options section, or can be overridden per zone like we have done here (notify explicit forces notifications to go only to the specific entries listed in the also-notify block). We do the same again with the override zone, but here we also add allow-update, in order to permit the maintenance scripts to work. If your main options section has allow-update specified, you will need to specify allow-update { none; }; in addition for the block zone, to prevent BIND from keeping journals for the zone. If you need other config that will lead to journals, such as ixfr-from-differences, for example, then the updateblockrpz script may need a tweak to freeze and thaw the block zone instead of just reloading the update.

I run the updateblockrpz script from cron at a randomly selected minute after the hour, every 6 hours and lazily capture the output to a tmp file for troubleshooting purposes. Yes, I should likely update this to log properly!

17 */6 * * * /usr/local/bin/updateblockrpz >/tmp/updateblockrpz.tmp

Slave Servers

Having got the RPZ zones set up on the master, we can turn our attention to the slaves that are actually handling the queries from the clients on the network.

First, we’ll slave the RPZ zones from the master:

masters rpzmasters { 192.168.1.53; };
zone "block" {
    type slave;
    file "rpz/block";
    masters { rpzmasters; };
};
zone "override" {
    type slave;
    file "rpz/override";
    masters { rpzmasters; };
};

…and next, we’ll define the policy that’ll apply to the clients:

options {
...
	response-policy {
		zone "override" policy given;
		zone "block" policy nxdomain;
	}
		break-dnssec yes
		qname-wait-recurse no
		max-policy-ttl 900
	;
...
};

As we mentioned before, we’re overriding the block layer at the policy level, forcing anything in that layer to result in a NXDOMAIN response. The override layer is left as given so that the actions in the layer carry. The policy is evaluated top to bottom, with the first action encountered causing an exit from policy, hence the override layer, which could be whitelisting something that’s in the block layer, is listed first.
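If the ordering is hard to picture, the following toy model in Python may help. It is a deliberately simplified sketch, not BIND's real algorithm: it only handles name triggers, treats each layer as a plain dictionary, and the function names and the "no-match" result are invented for the example.

```python
def find_entry(layer, qname):
    """Return the action for qname in one layer, honouring wildcards."""
    name = qname.rstrip(".").lower()
    if name in layer:
        return layer[name]
    labels = name.split(".")
    # *.example.com covers every name below example.com, at any depth,
    # but not example.com itself.
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in layer:
            return layer[wildcard]
    return None


def evaluate(policy, qname):
    """Walk the layers top to bottom and stop at the first match."""
    for zone, override, layer in policy:
        action = find_entry(layer, qname)
        if action is None:
            continue
        # "given" keeps the layer's own action; anything else replaces it.
        return zone, (action if override == "given" else override)
    return None, "no-match"


if __name__ == "__main__":
    policy = [
        ("override", "given", {"mas-sdk.amazon.com": "passthru"}),
        ("block", "nxdomain", {"mas-sdk.amazon.com": ".", "ads.example.com": "."}),
    ]
    print(evaluate(policy, "mas-sdk.amazon.com"))
    print(evaluate(policy, "ads.example.com"))
```

The first lookup stops in the override layer and never reaches the block layer, which is exactly why the whitelist has to be listed first.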

RPZ Entries

Lastly, we’ll just briefly cover different types of record that you might want to put in the override layer; the scripts will help you mostly with this, but for those that are interested, here’s a little more detail.

Broadly, as we discussed earlier, we’re interested in two main triggers; the name being looked up, and the client making the query.

Entries that affect the domain name being looked up broadly look like this:

some.domain.name.override. 300 IN CNAME <action>.

Where <action> is one of the following:

  • rpz-passthru (whitelist)
  • rpz-drop (drop the query – quite unfriendly, will cause the client to wait for a timeout)
  • . (a literal dot, which will cause a NXDOMAIN response)

It’s also possible to do something like this, if you want to override to a block page or honeypot, for example:

some.domain.name.override. 300 IN A 192.168.0.1

…and of course, any of those can be prefixed with *. to cause the action to apply to everything within the bailiwick of some.domain.name.

Entries that affect the client look a little different. Firstly, the address is reversed, a bit like in in-addr.arpa zones, but it is prefixed by an additional label giving the CIDR prefix length. So, if you want to (using the actions from above) whitelist all queries from the single IP 192.168.58.3, you’d do:

32.3.58.168.192.rpz-client-ip.override. 300 IN CNAME rpz-passthru.

However, if you wanted to do the same for the upper /25 of that network, you’d do this (note the use of the subnet IP; you need to specify the correct subnet boundary address):

25.128.58.168.192.rpz-client-ip.override. 300 IN CNAME rpz-passthru.
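Getting the reversal and the subnet boundary right by hand is fiddly, so here is a small Python sketch that builds the owner name from a CIDR string. The function name is hypothetical, it handles IPv4 only, and the default zone name simply matches the override layer used in this article.

```python
import ipaddress


def client_ip_owner(cidr, zone="override"):
    """Build the rpz-client-ip owner name for an IPv4 address or subnet."""
    # strict=True raises ValueError if the address is not on the
    # subnet boundary (e.g. 192.168.58.3/25).
    network = ipaddress.IPv4Network(cidr, strict=True)
    octets = str(network.network_address).split(".")
    reversed_octets = ".".join(reversed(octets))
    return f"{network.prefixlen}.{reversed_octets}.rpz-client-ip.{zone}."


if __name__ == "__main__":
    for cidr in ("192.168.58.3", "192.168.58.128/25"):
        print(f"{client_ip_owner(cidr)} 300 IN CNAME rpz-passthru.")
```

A bare address is treated as a /32, and anything that is not a valid network address for its prefix length is rejected rather than silently rounded down.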

Other Trigger Types

We’ve not talked about the other triggers, but briefly, you can also trigger actions based on:

  1. rpz-ip – the IP addresses that are returned in the answer to a query.
  2. rpz-nsdname – the domain name of the nameservers that are authoritative for the domain in the query.
  3. rpz-nsip – the IP addresses of the nameservers that are authoritative for the domain in the query (ie: what the names in (2) resolve to).

Types 1 and 3 can lead to data exfiltration, which, if you’re blocking a domain because you want to prevent exfiltration, defeats the object. If you put type 1 or type 3 entries in a layer, then when BIND reaches that layer as it works through the policy, it will do the recursion to the authority for the zone in order to work out whether the trigger is a match, and that recursion is itself the leak. If you’re worried about data exfiltration, you MUST put the domains you’re blocking for that purpose in an RPZ layer above the first layer that includes type 1 or type 3 entries; BIND will then execute your configured action without any recursion.

…but what about DNSSEC

If you’ve read all that, and you’re thinking to yourself “hey, but surely returning modified answers will break DNSSEC” then you’re right. Your client machine stub-resolvers will trust your DNS resolver, and so won’t notice, but if you’re pointing a validating resolver at this setup, you’ll need to make sure you keep the break-dnssec yes option I included above. Possibly counter-intuitively, this causes your RPZ server to lie to the downstream validating resolver. If baddomain.com is DNSSEC signed, and is on your block list, the downstream validating resolver will usually be sending queries with CD set instead of trusting your validation, expecting your server to send all the required DS, DNSKEY, etc. With break-dnssec yes, the RPZ server will lie: it’ll pretend baddomain.com isn’t signed and will strip all DNSSEC data in responses to the downstream resolver(s).

It’s important to note that this has an edge case. Let’s imagine you have gooddomain.com, which is signed, and is not being modified by your policy at all. Now let’s imagine you have badthing.gooddomain.com which is not at a zone split boundary, and is just a regular non-delegation entry in gooddomain.com. If you add badthing.gooddomain.com specifically to your RPZ for modification, the server can’t deal with lying about just that entry, and the downstream validator will spot the lie, returning SERVFAIL to its downstream client(s).


Alexa, why are you broken?

For a bit of context, I have an Echo Show as a bedside alarm clock, and I also have an Echo Clock paired to an Echo Dot in the kitchen, primarily to visualise timers when I’m cooking.

After a power cut late on Friday evening, the Echo Show came back on but displaying the wrong time (more or less an hour behind, but not exactly). If you asked “Alexa, what time is it?”, the correct time would be spoken, despite the wrong time still being displayed. Weird.

So, I got up, late, of course, and went to make breakfast. The Echo Clock in the kitchen is now also displaying the wrong time, but not the same wrong time that the Echo Show is displaying. Again, asking the paired Echo Dot for the time results in it speaking the correct time. Weirder and weirder.

I tried a bunch of things with the clock, reset and re-pair, checking the location and timezone settings for the devices in the app, but nothing resolved the incorrect time. I also noticed that the clock wasn’t displaying timers properly, and the Echo Dot had stopped announcing the end of timers.

I’d checked whether the Echo devices were unable to contact some central cloud service at Amazon with a quick look at Twitter and some down detectors, and they didn’t seem to indicate widespread problems. Then it struck me: had my RPZ picked up one or more entries that were causing this?

For some context at this point, I’ve been trialling the Energized Protect “Ultimate” list in RPZ format. More about that soon in another post.

I got the MAC addresses for the Echo units from the Alexa app, grabbed the assigned IPs from the firewall and then looked in the RPZ logs to see if anything was being blocked for those client IPs — BINGO!

The log entries all start at the time of the power outage, and the TTL on (at least) fireoscaptiveportal.com is 60, and so I wonder if the Echo devices resolve the IP and then continue to use the resolved IP, ignoring the TTL?

There were four domains continually being blocked for the two devices:

  • fireoscaptiveportal.com
  • mas-sdk.amazon.com
  • prod.amazoncrl.com
  • unagi-na.amazon.com

I added whitelist entries to the RPZ for those, and immediately could hear the Echo Dot in the kitchen announcing something … it was a timer from a few days before. As I stopped one, it would tell me about another, until it seemed to get very confused, resulting in a power off/power on.

fireoscaptiveportal.com.override. 5 IN CNAME rpz-passthru.
*.fireoscaptiveportal.com.override. 5 IN CNAME rpz-passthru.

Both the Echo Show and Echo Clock immediately corrected their displayed time, and timers were both displayed correctly and announced at the end correctly.

I’ve subsequently added fixed IP leases in the firewall’s DHCP config for the Echo devices, and added client-ip whitelisting for them in the RPZ.

32.105.0.1.10.rpz-client-ip.override. 5 IN CNAME rpz-passthru.
32.106.0.1.10.rpz-client-ip.override. 5 IN CNAME rpz-passthru.
