Karl's Little World

…and the things that make it tick…

  • Temperature Monitoring – Part Three – Multicast

    For the moment, I’m still sending updates into IOT Plotter, but I wanted to have more flexibility (and I wanted to tinker some more, honestly).

    Pico Changes…

    So there have been a number of changes, additions, tidying and refactoring to the code running on the pico.

    Now, on each polling run, for each temperature sensor, the pico sends a UDP multicast packet onto the network containing a JSON string. The string contains the sensor name and the current temperature.

    One UDP packet for each sensor for each poll of the sensors.

    What this means is that anything else around the network that is interested in the various temperatures can subscribe to the multicast group and do as it wishes with what it receives.
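    If you fancy a sketch of the sending side, it looks something like this in ordinary Python (the group address, port and JSON field names here are placeholders of mine, not necessarily what my code uses):

```python
import json
import socket

# Placeholder group/port; any address in the administratively
# scoped 239.0.0.0/8 range would do.
MCAST_GROUP = "239.1.2.3"
MCAST_PORT = 5005

def make_packet(name, temperature):
    """Build one JSON payload for one sensor; field names assumed."""
    return json.dumps({"name": name, "temperature": temperature}).encode()

def send_readings(readings):
    """Send one UDP datagram per sensor per poll."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL of 1 keeps the datagrams on the local network
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    for name, temp in readings.items():
        sock.sendto(make_packet(name, temp), (MCAST_GROUP, MCAST_PORT))
    sock.close()
```

Calling send_readings({"Loft": 12.5, "ColdWaterTank": 9.1}) would then emit one datagram per sensor, exactly as described above.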

    I had the onboard LED come on at the start of a polling cycle and go off at the end. This gives at least a little bit of status without needing to hook up to the computer.

    I also updated the bit of code that makes the API call, having it send a multicast packet with a status update in it, so there’s also an ability to do some remote debugging without hooking up Thonny as the first port of call.

    Prometheus & Grafana

    I already use prometheus elsewhere for collecting metrics from various things, but in all cases I’m using existing exporters and predefined Grafana dashboards. This was an opportunity to have a go from scratch. Learn. Understand.

    Prometheus Exporter

    I wrote a basic exporter using the Python `prometheus_client` library.

    The idea being that the exporter would:

    • Run an HTTP server on a port so that Prometheus can scrape the metric(s)
    • Subscribe to the multicast group and listen for the UDP packets
    • Update the metric(s) with the current temperature

    The resulting code can be seen in the GitHub repository.
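    For a flavour of the receiving side, a minimal sketch in plain Python looks something like this; the group address, port and JSON field names are placeholders of mine, and the real exporter (which feeds a prometheus_client Gauge) is in the repository:

```python
import json
import socket
import struct

# Placeholder group/port (must match whatever the pico sends to).
MCAST_GROUP = "239.1.2.3"
MCAST_PORT = 5005

def parse_update(packet):
    """Decode one datagram; the JSON field names are assumed."""
    msg = json.loads(packet.decode())
    return msg["name"], float(msg["temperature"])

def listen(on_reading):
    """Join the multicast group and hand each reading to a callback.

    In the real exporter the callback would set a prometheus_client
    Gauge, with start_http_server() exposing it for scraping."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    # tell the kernel we want datagrams for this group on any interface
    mreq = struct.pack("4s4s",
                       socket.inet_aton(MCAST_GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, _addr = sock.recvfrom(1024)
        on_reading(*parse_update(data))
```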

    Whilst the code only processes the UDP packets containing temperature updates, it does log the status updates as well, and so the log file for the running service can be used to debug the current state of the IOT Plotter API calls too.

    Friendlier Sensor Names

    Each temperature update includes the sensor name from the configuration file on the pico. I had ensured that all of the names were formed with a capital for each new word: Loft, LivingRoom, ColdWaterTank, etc. In the exporter, I convert these into “friendly names” that are included in the updates in a friendly_name label, so the above become “Loft”, “Living Room” and “Cold Water Tank”.
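    The conversion itself is a one-liner with a regular expression; this is the idea rather than necessarily the exact code in the exporter:

```python
import re

def friendly_name(sensor_name):
    """Insert a space before each interior capital letter,
    e.g. "ColdWaterTank" becomes "Cold Water Tank"."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", sensor_name)

print(friendly_name("ColdWaterTank"))  # Cold Water Tank
```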

    Prometheus

    Configuring Prometheus is the usual addition of a job to scrape the target. I’m running Debian, so my config file is in /etc/prometheus/prometheus.yml

    - job_name: 'temperature'
      static_configs:
        - targets: ['192.0.2.100:8000']
    

    Grafana

    In Grafana I created a new dashboard, and a variable picker. This picker uses the friendly_name field introduced above.

    The main graph then uses this, defaulting to “All”.

    In the following example you can see:

    • when the heating comes on, whether it’s feeding the hot water, radiators or both
    • the temperature in the loft, and the water in the cold water tank

    Alerts

    I created an alert in Grafana that will tell me if the water in the cold water tank gets down to 3°C, so I know if it’s approaching freezing (although it’s a reasonable volume, so it won’t freeze quickly).

    I also wanted to be alerted as to whether updates had stopped arriving for a sensor.

    I looked at a combination of last_over_time and changes, but, particularly with the cold water tank, the temperature could be very stable for hours on end even if the updates were arriving.

    So I decided to add a timestamp metric that is updated at the same time as the temperature. So, even if the temperature value doesn’t change, the timestamp, which is the unixtime value at the time of the update, will always change.
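    The pattern looks like this, sketched with a plain dict standing in for the two prometheus_client gauges (the metric keys here are mine for illustration):

```python
import time

# Stand-ins for two gauges, e.g. a temperature gauge and a
# last-seen-timestamp gauge (names assumed for this sketch).
metrics = {}

def record_update(sensor, temperature, now=None):
    """Set the temperature and the heartbeat timestamp together, so the
    timestamp always moves even when the temperature is rock steady."""
    now = time.time() if now is None else now
    metrics[(sensor, "temperature")] = temperature
    metrics[(sensor, "last_seen")] = now
```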

    That allowed me to create an alert for one sensor on each pico (bearing in mind that some picos have multiple sensors, and I don’t want to be alerted multiple times for each pico):

    time() - last_over_time(
        timestamp(
            changes(temperature_last_seen_timestamp{sensor_name="ColdWaterTank"}[1m]) > 0
        )[1h:]
    )
    

    Summary

    A nice thing about the design is that if you configure and plug in a new sensor:

    • It’ll start multicasting temperature updates to the network.
    • The Prometheus exporter will pick the updates up automatically and include them for scraping.
    • Prometheus will pick them up automatically the next time it scrapes.
    • Finally, Grafana will see them automatically, and at least for the main graph on the dashboard, they’ll appear automatically.
  • Hikvision, Frigate and Home Assistant

    Introduction

    A while back now, I started using Home Assistant and that quickly grew to me adding my first camera and using Frigate.

    My first camera was a DS-2CD2387G2-LU.

    At the time, I had thought my camera had a third stream, but I had only managed to persuade Frigate to work with the first, with patchy results on the second. The third didn’t work at all.

    I had recently wanted to add another camera, and bought the current equivalent of my original (DS-2CD2387G3-LI2UY), to remain as consistent as possible. This too apparently had a third stream.

    Whilst reading something else trying to get audio working (in Safari) for the live feed, I stumbled across a post telling me how to get the third stream working on the cameras.

    So, to cut to the chase: you need to switch VCA mode to “Monitoring”. When you save, the camera will reboot, and your third stream will then magically appear.

    I had wanted this so that I could have a lower but still reasonable resolution for detection whilst continuing to record the main 4K stream.

    I hope this is useful to someone else!

    Configuration

    If, like me, you’ve been having a challenge with any of this, then this is the config I’m currently running:

    Each camera is essentially the same, so I’ll just include the config for one.

    This first bit has been there since the original camera, and I’m not entirely sure it’s still needed as audio has become a more mainstream feature. But I’ve not looked at that yet, so it’s still there.

    ffmpeg:
      hwaccel_args: preset-rpi-64-h264
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 
          -strftime 1 -c:v copy -c:a aac
    
    

    Next, go2rtc config:

    go2rtc:
      streams:
        cam_1:
          - rtsp://user:password@192.0.2.101:554/Streaming/Channels/101
          - "ffmpeg:cam_1#audio=aac"
        cam_1_sub:
          - rtsp://user:password@192.0.2.101:554/Streaming/Channels/103
    

    …and then finally, the camera:

    cameras:
      cam_1:
        ffmpeg:
          inputs:
            - path: rtsp://127.0.0.1:8554/cam_1
              input_args: preset-rtsp-restream
              roles:
                - record
            - path: rtsp://127.0.0.1:8554/cam_1_sub
              input_args: preset-rtsp-restream
              roles:
                - detect
        detect:
          enabled: true
          width: 1920
          height: 1080
          fps: 5
        record:
          enabled: true
        live:
          streams:
            Main Stream: cam_1
            Sub Stream: cam_1_sub
    

    Camera Configuration Screens

    First, the older of the two, running firmware V5.7.3 build 220112

    Then the newer one, currently running firmware V5.8.10 build 250605

  • Temperature Monitoring – Part Two – Boiler Monitoring

    Following on from the previous post where I wrote about an updated approach to monitoring temperatures around the house, I made some updates and implemented monitoring the various temperatures related to the boiler, hot water and radiators.

    Using the board designed and tested in Temperature Monitoring – Part One, I picked up some DS18B20 sensors on longer wires and with waterproof ends.

    They were run to the following locations:

    • Through the loft to the cold water tank where it’s lowered until it almost but not quite touches the bottom of the tank.
    • In the loft at approximately the height of the top of the cold water tank
    • In the airing cupboard tied to the feed from the boiler to the valve
    • In the airing cupboard tied to the feed from the valve to the hot water tank
    • In the airing cupboard tied to the feed from the valve to the radiators

    The sensors come with a metal housing on the end, and so I made a makeshift strip from kitchen aluminium foil, about an inch wide and as long as the width of the roll. I wrapped this around the pipe once, then included the sensor housing, before wrapping the rest of the foil around. I fastened it with a couple of cable ties.

    Due to the short distances in the airing cupboard, I placed the hot water and radiator sensors as far from the valve as possible to try and minimise the pick up of temperature conducted through the pipe itself, trying to minimise reading temperature on the radiator sensor when the water is being directed to the hot water tank, and vice versa.

    There’s a 3 pin socket in the airing cupboard and so I replaced it with a socket that included a USB charging socket which allowed me to connect the board up in there.

    The sensors were connected to the board and the software loaded.

    You’ll have to ignore the state of the airing cupboard!

    You can see in the top left, the feed from the 3 way valve into the hot water tank, and in the lower right you can just about see the sensor on the feed into the valve, and the sensor on the pipe just before it disappears into the floor to feed the radiators.

    Issues

    I noticed that the updates to the IOT Plotter API would stop, and require the pico to be restarted. I did some troubleshooting by running with Thonny connected via a long USB cable.

    The call to the API would periodically have issues, either due to a brief outage of my broadband line, or (I suspect) possible maintenance of the API, etc. The code had not included a timeout on the API call, so I introduced one, along with a try/except.

    This reduced the issues but ultimately I had another idea to improve things, coming soon in part 3…

  • Temperature Monitoring – Part One

    I’ve written about monitoring temperature around the house before.

    The solution I wrote about then is still working well.

    But, more recently, I wanted to update the system and play with some new things; a minor underlying driver was also the desire to be able to see when the boiler was on, and where the heat was being sent: radiators and/or hot water.

    I had also been looking for a reason to have a play with Raspberry Pi Pico and I’d found that it’s fairly trivial to use DS18B20 sensors on one of the Pico’s GPIO pins.

    Hardware

    So, I ordered a Pico2W, a small breadboard to prototype it, a solderable breadboard for the final item, and some DS18B20 sensors.

    The DS18B20 sensors have a low power requirement, and so with the Pico powered by a suitable power supply, 3V3 can be taken from pin 36, with ground on any of the GND pins. I used pin 38 in reality, but pin 28 makes the diagram tidier.

    The data pin from the DS18B20 can be connected to any GPIO pin, with a 4k7 pull-up resistor then connected between data and 3V3.

    Each DS18B20 has a unique 64 bit address, and so it’s possible to connect multiple sensors to the same GPIO pin.

    Software

    Next we need to write some code for the microcontroller. Of the languages supported by the Pico, I am most familiar with Python, so I grabbed a MicroPython image.

    When first connected to your USB port, a new Pico presents in bootloader mode. Loading the MicroPython image is as simple as copying the UF2 image file to the presented volume, upon which it will automatically reboot.

    You can persuade the Pico to connect in bootloader mode in future by pressing the bootloader button while you connect the power.

    I use Thonny for programming the Pico.

    We need the code to do a few things; connect to the WiFi, initialise the temperature sensor(s) and send the collected temperature data somewhere.

    A friend was using iotplotter.com to send time series data and visualise it in a graph. 

    A few articles referenced the itk_pico code, and that gave me a bit of a head start.

    Unless I’m mistaken (I’m new-ish to Python, so this is likely!) the code assumes a single connected temperature sensor, and for my use I also wanted to be able to give the sensors ASCII names, so I made some modifications to the temperature.py code…

    First, a method to convert to a human friendly name:

        def friendly_name(self, device):
            string = binascii.hexlify(device)
            return string.decode('ascii')
    

    …a method to return all of the friendly names…

        def get_device_friendly_names(self):
            names = {}
            for device in self._devices:
                names[self.friendly_name(device)] = {}
            return names
    

    …modified the get_temperature method to loop around all sensors and return all of the temperatures…

        def get_temperature(self):
            temps = {}
            for device in self._devices:
                self._sensor.convert_temp()
                time.sleep(1)
                temp = self._sensor.read_temp(device)
                device_string = self.friendly_name(device)
                Logger.print(f"Device: {device_string}; Temperature: {temp} celsius")
                temps[device_string] = temp
            return temps
    

    …and lastly, some tweaks to the initialisation; a bit more debug output.

        def __init__(self, pin: int) -> None:
            self._pin = pin
            self._one_wire = onewire.OneWire(machine.Pin(pin))
            self._sensor = ds18x20.DS18X20(self._one_wire)
            Logger.print("Initialised on pin:", self._pin)
            Logger.print("Scanning for devices...")
            self._devices = self._sensor.scan()
            if not self._devices:
                raise RuntimeError("No DS18B20 found!")
            Logger.print("Found devices:", self._devices)
            for device in self._devices:
                friendly = self.friendly_name(device)
                Logger.print(f"Device: {device}; Friendly Name: {friendly}")
    

    When the pico is powered on without a console, i.e. just plugged into a USB PSU rather than your computer’s USB port, it executes main.py, so let’s have a look at that…

    I’m only using the temperature, wifi and logger code, so we import those. The IOT Plotter API expects the payload as JSON, we have our config, and we’ll use the requests module to POST to the API.

    from itk_pico.temperature import TemperatureSensor
    from itk_pico.wifi import WiFi
    from itk_pico.logger import Logger
    from time import sleep
    import json
    import config
    import requests
    

    Next, we’ll initialise the temperature sensors and friendly names…

    temperature_sensor = TemperatureSensor(config.GPIO_PIN)
    
    sensor_config = temperature_sensor.get_device_friendly_names()
    

    We’ll initialise default values for each sensor and then have a look in the config to see if there’s a specific setting for each…

    for sensor in sensor_config:
        Logger.print(f"Initialising sensor {sensor} details...")
    
        if "default" in config.SENSOR.keys():
            sensor_config[sensor]["name"] = config.SENSOR["default"]["name"]
        else:
            raise RuntimeError("No default settings in config file")
    
        if sensor in config.SENSOR.keys():
            if "name" in config.SENSOR[sensor].keys():
                sensor_config[sensor]["name"] = config.SENSOR[sensor]["name"]
    
        Logger.print(f"Sensor {sensor}; Name: {sensor_config[sensor]['name']}")
    
    

    The main loop looks like this (the wifi object and feed_url are initialised earlier in main.py; the full code is in the repository). It’s possible to set the time for each data value, but we’re sending data directly in real time, so by omitting the time, IOT Plotter will use the current time.

    while True:
        # check we're still connected to the wifi
        wifi.try_reconnect_if_lost()
    
        # get the temperature from each sensor
        sensors = temperature_sensor.get_temperature()
    
        # initialise headers and payload
        headers = {'api-key': config.API_KEY}
        payload = {}
        payload["data"] = {}
    
        # for each sensor, get the temperature and add it to the payload
        for sensor in sensors:
            sensor_name = sensor_config[sensor]["name"]
            temperature = sensors[sensor]
            Logger.print(f"Sensor: {sensor}; Name: {sensor_name}; Temp: {temperature}")
            payload["data"][sensor_name] = []
            payload["data"][sensor_name].append({"value": temperature})
    
        # send the payload to the API
        response = requests.post(feed_url, headers=headers, data=json.dumps(payload))
        Logger.print(f"API response: {response.status_code} {response.text}")
        Logger.print(f"Sleeping for {config.SLEEP} seconds")
    
        # ...and sleep
        sleep(config.SLEEP)
    

    The config file looks like this:

    SSID = "wifi-ssid"
    PSK = "wifi-password"
    GPIO_PIN = 15
    SLEEP = 60
    BASE_URL = "http://iotplotter.com/api/v2/feed/"
    API_KEY = "api-key-goes-here"
    FEED_ID = "feed-id-goes-here"
    
    SENSOR = {}
    SENSOR["default"] = {"name": "DefaultSensorName"}
    SENSOR["0123456789012345"] = {"name": "SomeSensorName"}
    

    The full code can be found at https://github.com/karldyson/pico-temperature

    Having tested it, I soldered the board up.

    Sensors can be connected to the screw terminal on the top right.

    Stay tuned for the next article on implementing this…

  • Things That IPv6 Mostly Breaks… Part 1

    Following my recent article on ipv6-mostly, last weekend I enabled this on my main network segment as it had all gone well on the guest network.

    I had half expected a bunch of things to stop working in some way…

    Would the Hue app on my phone still talk to the Hue bridge?

    Would the Alexa app on my phone still talk to the devices in my home?

    I assumed a bunch of these things probably all make connections to a cloud service and communicate between themselves via that.

    Nothing appeared to break.

    Until I tried to cycle on Zwift.

    Zwift seems to work fine, however, the companion app doesn’t work as an in-game companion. It talks to Zwift all ok, shows events, activities, etc, just no companion.

    I suspected that the game communicates its local/internal IP address to Zwift’s servers, and then the companion app gets the IP from there and makes a local connection direct to the game.

    However, the local IP that the game will find on the interface is an unroutable IPv4 address between 192.0.0.2 and 192.0.0.7, from the range reserved for DS-Lite (see RFC 6333).

    So, if the game communicates this via Zwift servers, the companion app is going to fail to make a connection to that.

    I manually added a valid IPv4 address to both the Apple TV where the game is running, and my iPhone, and (with a game exit & restart) the companion started working just fine.

    I’ve raised a support ticket with Zwift. I’ll let you know…

  • IPv6 Privacy Extensions on Linux

    Introduction

    This is definitely one of the blog posts written to remind me how to do something.

    I’m sure I knew this, but returning to it on a new box caused me frustration that Googling did not help with. Maybe my google-fu is broken?

    It was only after quite a bit of googling and reading things that I spotted something in a comment that reminded me of the thing I was missing.

    So, I’m writing this so that it will remind me in future, and may help others.

    Scenario…

    I had built a new Raspberry Pi 5 as a pi-hole server.

    Of course, in order to ssh to it and browse to the admin UI, it has fixed IP addresses.

    Actual DNS service is done by advertising service IP(s) into the router’s routing table so I can do some ECMP and resilience.

    I wanted it to recurse to the internet from privacy extensions IPs so that the recursive source IPs change over time.

    I knew I needed to stick a ‘2’ in /proc/sys/net/ipv6/conf/eth0/use_tempaddr, and that things work best if you also stick a ‘1’ in /proc/sys/net/ipv6/conf/eth0/accept_ra

    But, nothing. No dynamic IPv6 addresses.

    Bother.

    I knew there was something else; I was pretty sure there was a third file I needed to do something with, but could I remember which one it was?

    Nope.

    Solution…

    So, I’ll cut to the chase.

    For automatic IP configuration, there also needs to be a ‘1’ in /proc/sys/net/ipv6/conf/eth0/autoconf

    Configuration…

    This is what I have done to fix this such that it’s reboot and upgrade safe. By all means leave me a comment below if I’ve missed something obvious here!

    sysctl can be persuaded to make these changes as follows:

    $ cat /etc/sysctl.d/99-tempaddr.conf
    # accept router-advertisements
    net.ipv6.conf.all.accept_ra=1
    net.ipv6.conf.default.accept_ra=1
    
    # use temporary addresses (privacy extensions)
    net.ipv6.conf.all.use_tempaddr=2
    net.ipv6.conf.default.use_tempaddr=2
    
    # auto configure addresses
    net.ipv6.conf.all.autoconf=1
    net.ipv6.conf.default.autoconf=1

    I don’t use NetworkManager for my eth0 IP addresses, as I’m more familiar with /etc/network/interfaces so this is my static config and a poke to make sure that eth0 picks up the settings too.

    $ cat /etc/network/interfaces.d/ethernet
    auto eth0
    allow-hotplug eth0
    iface eth0 inet static
    	address 10.1.1.53
    	netmask 255.255.255.0
    	gateway 10.1.1.1
    
    iface eth0 inet6 static
    	address 2001:db8::53
    	netmask 64
    	gateway 2001:db8::1
            post-up /usr/sbin/sysctl net.ipv6.conf.eth0.accept_ra=1
            post-up /usr/sbin/sysctl net.ipv6.conf.eth0.autoconf=1
            post-up /usr/sbin/sysctl net.ipv6.conf.eth0.use_tempaddr=2

    Fixed!

    …and this is what the output looks like now that that is working…

    $ ip add sh eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 2c:cf:67:8a:03:50 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.53/24 brd 10.1.1.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 2001:db8::a863:8e:293f:1ec7/64 scope global temporary dynamic
           valid_lft 525174sec preferred_lft 6706sec
        inet6 2001:db8::2ecf:67ff:fe8a:350/64 scope global dynamic mngtmpaddr
           valid_lft 2591996sec preferred_lft 604796sec
        inet6 2001:db8::53/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::2ecf:67ff:fe8a:350/64 scope link
           valid_lft forever preferred_lft forever

    …and remote things on the internet see the correct source for my outbound connections…

    $ dig i.p.c.je txt @i.p.c.je +short
    "query received from source IP: 2001:db8::a863:8e:293f:1ec7"
    "query received from source port: 47948"
    "query received over: udp"
    "edns size: 1232"
    $ curl -6 c.je/ip
    Your IP is 2001:db8::a863:8e:293f:1ec7
    You're using curl/7.88.1

    Technicalities

    use_tempaddr

    Values in this file have the following meanings:

    <= 0 : Privacy Extensions is disabled
    == 1 : Privacy Extensions is enabled, but public addresses are preferred over temporary addresses
    > 1 : Privacy Extensions is enabled, but temporary addresses are preferred over public addresses

    Note that in some very brief and not very scientific testing, if autoconf is enabled and use_tempaddr is set to 1, the addresses that seem to be preferred are the autoconf addresses, even if you have a static address configured. I don’t think this surprises me, but it’s worth noting.

    accept_ra

    Simply put, 0 to disable, 1 to enable the acceptance of router advertisements.

    Note that if you disable this, you will need to specify the gateway statically if you have not already.

    If this is disabled but use_tempaddr is non zero, temporary addresses can still be created but just not using prefix information from the router advertisements.

    autoconf

    As above; 0 to disable, 1 to enable the automatic configuration of IP addresses.

    If accept_ra is set to 1 and this is set to 0, router advertisements will be used for the gateway but not for IP address configuration locally.

    If this is enabled and use_tempaddr is not, stateless (SLAAC) addresses will be added to the interface but no privacy extensions addresses.

  • IPv6, Mostly

    TL;DR

    This article turned out longer than expected, so if you’re here for the bit on making it work and just want the TL;DR, skip to that bit!

    Introduction

    IPv4 has run out. This is not news.

    IPv6 uptake, however, has been slow given it was first created in 1995.

    It can be intimidating, if you’re not a network expert, and there are still things that are perceived to just work best if you’re doing IPv4.

    Examples include getting information about the network to a client, other than the client’s IP address.

    With IPv4, when you connect to a network, by default in most cases, your device asks the network for an IP address using a protocol called Dynamic Host Configuration Protocol (DHCP).

    But the protocol doesn’t just give your device an IP address; it can also let your device know about a bunch of other things on the network. Most commonly the DNS servers that will resolve names for you, but also things like NTP servers, proxy discovery, and more can be conveyed to the device using this mechanism.

    So IPv6 just needs a version of DHCP then? Well sure, and it does, cunningly named DHCPv6. But for lots of reasons, on a client network, you probably don’t want to be handing out IPv6 addresses with DHCPv6. IPv6 is more modern and can sort out its addresses statelessly and automatically using something called SLAAC. You probably want IPv6 to be stateless for a variety of reasons. Privacy is high up the list.

    Privacy?

    Let’s take a step back.

    With IPv4, the address space is tiny compared to the number of devices on the modern internet. Think about your phones, iPads, laptops, home assistant, smart speakers; the list goes on. Even a modest modern home has dozens of devices needing an IP address.

    So, there are blocks of IPv4 addresses reserved for use within private networks that will never be routed on the internet. Those private addresses can be routinely used in many networks concurrently because whilst IP addresses have to be unique, they only have to be unique within a given network.

    Your device gets one of those private addresses.

    But, how do you communicate on the internet if your device has an IP address that doesn’t work on the internet?

    Network Address Translation

    When your device makes a connection to something on the internet, your router does something called Network Address Translation (NAT).

    That is to say, on the inside of your network there will be plenty of those private addresses for all of the devices, but your router will map them all onto a much smaller pool of internet routable addresses and then juggle making sure the right internet traffic is sent to the right device.

    This means that it’s harder for websites to track you using your IP address, because a whole bunch of devices and users will be “hidden” behind a much smaller NAT pool, possibly just a single IP.

    With IPv6, the size of the address space is vast. There’s no need for NAT and so the IP address your machine has, is the IP that things you connect to on the internet see. For example when you browse a website.

    Instead of all devices on a network being hidden behind one IP, every individual device has a real internet IP address.

    This makes it a lot easier for websites to use your IP address to track you, since your IP address is now per device.

    So just how big is IPv6?

    Won’t we run out? We did with IPv4 after all…

    IPv4 has a total of 2^32 IP addresses. That’s 4,294,967,296. Four billion and change. Sounds like a lot, right?

    IPv6 has a total of 2^128 IP addresses.

    That’s 340,282,366,920,938,463,463,374,607,431,768,211,456

    340 trillion trillion trillion.

    For context, that’s more than 100 times the number of atoms on the surface of the Earth.

    It was calculated that we could assign a unique /48 to every human being on Earth for the next 480 years before we would run out.

    A /48 is the expected common assignment block size. It’s what my ISP has assigned me for my home network. It’s the smallest block of IPv6 address space that you can route on the wider internet.

    Your typical individual network subnet will be assigned a /64.

    A /48 contains 65,536 /64 subnets.

    A /64 network contains 18,446,744,073,709,551,616 usable IPv6 addresses.
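    Both of those numbers are just powers of two, and easy to check:

```python
# A /48 leaves 64 - 48 = 16 bits of subnet ID, and a /64 leaves
# 128 - 64 = 64 bits of interface ID.
subnets_per_48 = 2 ** (64 - 48)
addresses_per_64 = 2 ** (128 - 64)

print(subnets_per_48)    # 65536
print(addresses_per_64)  # 18446744073709551616
```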

    Back to privacy, then…

    So, IPv6 has the concept of Privacy Extensions (RFC8981) which means that it has a mechanism for your device to assign itself new IP addresses temporarily, use them for a while, and then discard them. But it requires that your network is assigning addresses statelessly.

    Router Advertisements

    If you’re doing stateless IPv6 addresses, and you want to move away from using IPv4, how do you tell your devices about things like DNS resolvers?

    In order to statelessly configure its IPv6 addresses, your device sends something called Router Solicitation (RS) messages to the network. Available routers on the network respond with Router Advertisement (RA) messages.

    These RAs include the network prefix(es) in use so SLAAC and privacy extensions can work. They can also be used to tell the client device whether to actually use stateless addresses, whether to try DHCPv6, or whether to do a bit of both, and use stateless addresses but with other information (like DNS servers) from DHCPv6.

    More recently, to avoid the need for DHCPv6 at all, these RAs can now contain the DNS server information (RDNSS).

    OK, so we’ll switch off IPv4 inside our networks? We’ll give all the clients an IPv6 address.

    We can use the RA to give devices DNS servers.

    But what about things on the internet that are still only configured for IPv4…?

    How do you connect to them?

    NAT64

    In a similar way to how NAT works above, translating networks of private addresses into a smaller pool of internet routable addresses, it’s possible to translate addresses between address families: ie: between IPv4 and IPv6.

    Your router can tell your device that the network has the ability to do NAT64.

    With this, devices connect to a mapped IPv6 address for any internet device that only has IPv4.

    There’s what’s called the Well-Known Prefix for NAT64: 64:ff9b::/96

    This allows us to map the entire IPv4 address space. For example, let’s say your device wants to connect to a website that only has IPv4 and its address is 198.51.100.189. This maps to 64:ff9b::c633:64bd.

    If you look carefully at the end of that, you can see we have c633:64bd.

    c6 in hexadecimal is 198 in decimal.

    Converting the rest, then: 33 becomes 51, 64 becomes 100, and bd becomes 189.

    Your device makes an IPv6 connection to 64:ff9b::c633:64bd and when this reaches the router, it knows to translate this back the other way and make an IPv4 connection to 198.51.100.189.
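    If you want to check the mapping yourself, Python’s ipaddress module makes it a one-liner (this is just an illustration of the arithmetic, not anything the router actually runs):

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.IPv6Address("64:ff9b::")

def nat64_map(ipv4):
    """Embed an IPv4 address in the low 32 bits of the NAT64
    Well-Known Prefix 64:ff9b::/96."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    return str(ipaddress.IPv6Address(int(WELL_KNOWN_PREFIX) | v4))

print(nat64_map("198.51.100.189"))  # 64:ff9b::c633:64bd
```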

    But how does the device know to attempt to connect to that mapped IPv6 address?

    DNS64

    One option is that you give them a DNS server that does DNS64.

    Usually, dual stack devices will prefer IPv6. As such, when they do DNS lookups to get the IP address for whatever the device needs to connect to, they will look up an IPv6 (AAAA) DNS record first, and fall back to IPv4 (A record).

    So, a DNS64 capable DNS server looks up the AAAA record. If it exists, it gives the response back to the device and everything works over IPv6.

    However, if the AAAA record doesn’t exist (the service only supports IPv4), the DNS64 server looks up the IPv4 A record, but instead of returning that to the requesting device, it generates the relevant mapped NAT64 IPv6 address and returns that as a synthesised AAAA record.

    The device makes what it thinks is an IPv6 connection, and the NAT64 configuration on the router does the translation to IPv4.
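    The decision a DNS64 server makes can be sketched as a small pure function (hypothetical helper, assuming the well-known prefix; real DNS64 implementations learn the prefix from configuration):

```python
import ipaddress
from typing import Optional

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def dns64_answer(aaaa: Optional[str], a: Optional[str]) -> Optional[str]:
    """Sketch of the DNS64 decision: hand back a real AAAA if one exists,
    otherwise synthesise one from the A record inside the NAT64 prefix."""
    if aaaa:
        return aaaa  # service has native IPv6 -- nothing to do
    if a:
        v4 = ipaddress.IPv4Address(a)
        return str(ipaddress.IPv6Address(
            int(NAT64_PREFIX.network_address) | int(v4)))
    return None  # no address records at all

print(dns64_answer(None, "198.51.100.189"))           # synthesised AAAA
print(dns64_answer("2001:db8::1", "198.51.100.189"))  # real AAAA wins
```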

    Google offer public DNS64 servers on 2001:4860:4860::64 and 2001:4860:4860::6464

    But, some components that run on your machine can decide to use DNS servers of their own choice. This can lead to some things not using a DNS64 server, not getting the mapped IP, and continuing to use IPv4 which may be undesirable.

    Customer-side Translator (CLAT)

    A second method is a customer-side translator, or CLAT.

    Your device learns that the network can do NAT64 from the network itself. It learns what the prefix is and can do the mapping from the desired IPv4 address to IPv6 under the covers within the operating system.

    Further, your applications don’t need to know that they’re not making an IPv4 connection.

    Your device puts an unroutable placeholder IPv4 address on its network interface to keep up the pretence that IPv4 is working as usual.

    When your application attempts to make an IPv4 connection, the CLAT within your device works out the mapping, and makes the IPv6 connection to the mapped IPv6 address. The router does NAT64 and it all works seamlessly.

    This even works for IP literals.

    DHCP Option 108

    So, how do we persuade devices that the network supports IPv6, and can do NAT64 for those pesky “IPv4 only” things?

    Many devices still send out DHCP for IPv4 first.

    So there’s a mechanism for a client to say “I support CLAT, I can use the network’s NAT64, and I don’t want or need an IPv4 DHCP lease”.

    It’s called “option 108”.

    Devices capable of operating IPv6-only will include option 108 in their DHCP requests. If the DHCP server replies with option 108, the client doesn’t take an IPv4 lease and applies the placeholder/unroutable IPv4 address to the interface.
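    On the wire, option 108 (the “IPv6-Only Preferred” option from RFC 8925) is just a code, a fixed length of 4, and a 32-bit number of seconds the client should stay IPv6-only before retrying DHCPv4. A minimal encode/decode sketch:

```python
import struct

def encode_option_108(seconds: int) -> bytes:
    """DHCP option 108 (RFC 8925): option code 108, length 4, then a
    32-bit count of seconds to disable DHCPv4 for."""
    return struct.pack("!BBI", 108, 4, seconds)

def decode_option_108(data: bytes) -> int:
    code, length, value = struct.unpack("!BBI", data)
    if code != 108 or length != 4:
        raise ValueError("not an option 108 TLV")
    return value

# 9000 seconds, matching the Junos 'option 108 unsigned-integer 9000'
# configuration shown later in this post.
print(encode_option_108(9000).hex())  # 6c0400002328
```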

    So, let’s make it work…

    I wanted to tinker with this and see it in action.

    My router/firewall was a Juniper SRX240, which sadly will not run a new enough version of Junos to support RDNSS.

    It didn’t support telling the clients about the NAT64 prefix either, and it seemed very unhappy with me trying to persuade it to do DHCPv6 for just the DNS information.

    So, this felt like a great excuse to finally replace it, and so now I have a SRX340 in its place.

    The next thing to consider is client device support. My laptop is a MacBook, my family use iPhones and iPads. There’s the usual plethora of IoT things around the network too.

    Your mileage may vary if you have other things around your network.

    So I started with the guest VLAN as there’s less in there to disrupt.

    I want stateless addresses. I need to configure router advertisements anyway, so I’d like to skip needing DHCPv6 configured.

    Configuration time…

    NAT

    First we’ll configure NAT64 so that when we tell the clients it’s available, it actually is. We need to do two things here:

    1. We need to take traffic destined for the NAT64 prefix and static NAT it to IPv4. You can see this in the static NAT section at the top of the following configuration snippet.
    2. We need to then source NAT the traffic so that it comes from an internet routable IPv4 address. I’m fortunate here in that I have a /29 from my ISP, and so to aid troubleshooting, I have the NAT64 traffic SNAT from a different IP to the IPv4 native traffic.
    security {
        nat {
            static {
                rule-set nat64 {
                    from zone guest;
                    rule nat64 {
                        match {
                            source-address 2001:db8:97::/64;
                            destination-address 64:ff9b::/96;
                        }
                        then {
                            static-nat {
                                inet;
                            }
                        }
                    }
                }
            }
            source {
                pool guest {
                    address {
                        192.0.2.230/32;
                    }
                }
                pool guest-nat64 {
                    address {
                        192.0.2.229/32;
                    }
                }
                rule-set guest {
                    from zone guest;
                    to zone untrust;
                    rule guest {
                        match {
                            source-address 192.168.97.0/24;
                        }
                        then {
                            source-nat {
                                pool {
                                    guest;
                                }
                            }
                        }
                    }
                    rule guest-nat64 {
                        match {
                            source-address 2001:db8:97::/64;
                            destination-address 0.0.0.0/0;
                        }
                        then {
                            source-nat {
                                pool {
                                    guest-nat64;
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    SLAAC, RDNSS and PREF64

    Next we need to tell the SRX to do SLAAC, RDNSS and PREF64…

    First, our interface needs an IPv6 address.

    interfaces {
        irb {
            unit 197 {
                description Guest;
                family inet {
                    address 192.168.97.1/24;
                }
                family inet6 {
                    address 2001:db8:97::1/64;
                }
            }
        }
    }

    Next, SLAAC, RDNSS and PREF64.

    Within the protocols router-advertisement section, we will use the following options:

    • dns-server-address option to set the RDNSS server(s)
    • nat-prefix to set the PREF64 prefix
    protocols {
        router-advertisement {
            interface irb.197 {
                preference high;
                max-advertisement-interval 20;
                min-advertisement-interval 3;
                other-stateful-configuration;
                solicit-router-advertisement-unicast;
                default-lifetime 9000;
                dns-server-address 2001:4860:4860::8888 {
                    lifetime 9000;
                }
                prefix 2001:db8:97::/64;
                nat-prefix 64:ff9b::/96 {
                    lifetime 18000;
                }
            }
        }
    }

    Option 108

    Lastly, we’ll add support for clients that request DHCP option 108 to our DHCP configuration.

    access {
        address-assignment {
            family inet {
                network 192.168.97.0/24;
                range r1 {
                    low 192.168.97.128;
                    high 192.168.97.191;
                }
                dhcp-attributes {
                    maximum-lease-time 9000;
                    router {
                        192.168.97.1;
                    }
                    propagate-settings irb.197;
                    option 108 unsigned-integer 9000;
                }
            }
        }
    }

    Did it work?

    This is from a MacBook running macOS 15.

    We can see that the RA contains the configured details.

    $ ipconfig getra en4
    RA Received 04/25/2025 23:06:51 from fe80::cee1:9400:c55a:d430, length 96, hop limit 64, lifetime 9000s, reachable 0ms, retransmit 0ms, flags 0x48=[ other ], pref=high
    	source link-address option (1), length 8 (1): cc:e1:94:5a:d4:30
    	rdnss option (25), length 24 (3):  lifetime 9000s, addr: 2001:4860:4860::8888
    	prefix info option (3), length 32 (4):  2001:db8:97::/64, flags [ onlink auto ], valid time 2592000s, pref. time 604800s
    	pref64 option (38), length 16 (2): 64:ff9b::/96 lifetime 18000s

    We can see that the interface has stateless IPv6 configured, including privacy extensions. It has also enabled CLAT46, and has added the ‘placeholder’ IPv4 address.

    $ ifconfig en4
    en4: flags=88e3<UP,BROADCAST,SMART,RUNNING,NOARP,SIMPLEX,MULTICAST> mtu 1500
    	options=6464<VLAN_MTU,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
    	ether a0:ce:c8:dd:5b:74
    	inet6 fe80::454:7b3a:963b:7fab%en4 prefixlen 64 secured scopeid 0xd
    	inet6 2001:db8:97:8ea:22c2:54cd:a385 prefixlen 64 autoconf secured
    	inet6 2001:db8:97:a4ba:e260:2aa1:59c0 prefixlen 64 deprecated autoconf temporary
    	inet6 2001:db8:97:479:67b7:1474:4197 prefixlen 64 deprecated autoconf temporary
    	inet 192.0.0.2 netmask 0xffffffff broadcast 192.0.0.2
    	inet6 2001:db8:97:e9bf:52f3:939c:6a03 prefixlen 64 autoconf temporary
    	inet6 2001:db8:97:2:c078:2867:d6d prefixlen 64 clat46
    	nat64 prefix 64:ff9b:: prefixlen 96
    	nd6 options=201<PERFORMNUD,DAD>
    	media: autoselect (1000baseT <full-duplex>)
    	status: active

    The device thinks it has an IPv4 connection to something in Microsoft’s network…

    $ netstat -f inet -an|grep ESTAB|grep 192.0
    tcp4       0      0  192.0.0.2.51079        52.123.128.14.443      ESTABLISHED
    <snip>

    What does the firewall have to say about that…?

    kdyson@er> show security flow session destination-prefix 52.123.128.14
    Total sessions: 0

    …but 52.123.128.14 maps to 64:ff9b::347b:800e so let’s check again…

    kdyson@er> show security flow session destination-prefix 64:ff9b::347b:800e
    Session ID: 115964145637, Policy name: web-browsing/10, Timeout: 1778, Session State: Valid
      In: 2001:db8:97:8ab:f46e:f285:917f/51062 --> 64:ff9b::347b:800e/443;tcp, Conn Tag: 0x0, If: irb.197, Pkts: 29, Bytes: 11041,
      Out: 52.123.128.14/443 --> 192.0.2.229/18910;tcp, Conn Tag: 0x0, If: pp0.2, Pkts: 24, Bytes: 8486,
    Total sessions: 1

    The IP details have been redacted/sanitised, of course, but note that the SNAT IP is the guest-nat64 pool IP and not the regular guest pool IP.

    The End!

    Well, that turned out a bit longer than expected and has a couple of tangents along the way. If you stuck around for all of it, well done and thanks.

    I hope it’s been useful or interesting!

    References

    1. Internet Society IPv6 FAQ

    Further Reading

    https://www.ietf.org/archive/id/draft-link-v6ops-6mops-01.html

    https://2023.apricot.net/assets/files/APPS314/apnic55-deployingipv_1677492529.pdf

    https://blog.apnic.net/2019/06/07/how-to-slaac-dhcpv6-on-juniper-vsrx/

  • Nintendo Switch NAT Types

    Like lots of people, my daughter has a Nintendo Switch.

    A few weeks ago she came to me because a game she was trying to play online was complaining about the NAT type.

    So we had a look in the network connection test and found it was reporting NAT Type D.

    I have a Juniper SRX here (of course) and the Switch was just using the general outbound source NAT that looks a little like this (actual IPs redacted, naturally).

    > show configuration security nat source rule-set general rule general
    match {
        source-address 192.0.2.0/24;
    }
    then {
        source-nat {
            pool {
                general;
            }
        }
    }

    The pool is just a very basic pool with a simple address specified.

    > show configuration security nat source pool general
    address {
        198.51.100.100/32;
    }

    Some googling later, I found that this seems to be because port translation is in use.

    Seems the Nintendo does not like that.

    The Switch already has a static DHCP lease so it always gets the same IP.

    I’m fortunate to have a /29 from my ISP and I had an IP spare, so I created a new pool for the Nintendo with port translation disabled.

    > show configuration security nat source pool nintendo-switch
    address {
        198.51.100.101/32;
    }
    port {
        no-translation;
    }

    Good news. This gets the NAT Type up to B and the game started working.

    But this got me to thinking. What was needed for type A…?

    So, given nothing else was using the public IP, I altered the NAT configuration to a static NAT.

    > show configuration security nat static rule-set general rule nintendo-switch
    match {
        destination-address 198.51.100.101/32;
    }
    then {
        static-nat {
            prefix-name {
                nintendo-switch;
            }
        }
    }

    This, however, still results in type B.

    There seemed to be two obvious options remaining.

    1. Type A is actually “No NAT at all”
    2. Type A is “there’s effectively no firewalling”

    Testing option 1 was quicker and simpler than 2.

    As much as I detest any-any type policies, the outbound policy all along for the Switch was a basic “allow the Switch outbound to the internet” policy.

    So I added an any-any inbound policy permitting anything inbound to the Switch.

    Bingo! Type A.

    So, that’s horrible.

    Given Type B is good enough for 99 point something percent of things, that any-any inbound policy was disabled as soon as the test completed.

    I will have to have a little faff with the network so that I can drop the Switch into a VLAN where I can give it a public IP directly so there’s actually no NAT at all, and see what it says then.

    I just thought this might be useful to someone.

    Thanks for reading.

  • Robin
  • A while back, I decided I wanted to prevent at least some adverts and tracking, but rather than on a device by device basis, I wanted to achieve this for all devices on the network. Those that know me and my recent work will understand that naturally, DNS blocking sprang to mind, as I’m already very familiar with RPZ.

    Originally, I was consuming a bunch of lists with some code, manipulating the entries with some weighting and then outputting an RPZ for my servers to use. However, more recently I found Energized Protect, which has a load of different levels of blocking, and they provide the different levels in a variety of formats, helpfully including RPZ. So, I’ve been trialling their lists for a couple of weeks now.

    As with any external feed, you need to be aware of either false positives being added to the list by the curator, as well as things they think should be on the list that you may disagree with. I was recently affected by this with my Amazon devices, whereby one or more domains critical to the correct functioning of the Echo devices found their way onto the block list I’m consuming. To be fair, I’m consuming one of the more extreme variants of the list, and so this was something I was aware could happen (although I admit, it didn’t spring straight to the front of my mind when troubleshooting over the weekend!).

    So, let’s talk about how this works.

    RPZ is a feature within some DNS servers that allows you to modify the responses given to clients depending on a number of different criteria. BIND from Internet Systems Consortium (ISC) was pretty much first to have RPZ, but others have varying levels of support for the main functionality. The BIND implementation allows you to define a policy that can consist of a number of layers. Within the policy you can override the entire contents of a layer, and within each layer you can have permit and deny actions based on a number of triggers. For this use case, we are interested in two of the triggers:

    • the name being looked up
    • the IP of the client making the request

    The file we download from Energized Protect will form the main blocking layer, and we’ll override the entire layer at the policy level with NXDOMAIN. Arguably we could send queries to a web server with a block page, but not all things on the requesting end of this are browsers, and we can get logging from the BIND servers if we want to know what was blocked for a given client for the purposes of troubleshooting. Of course, we will want to be able to override these entries in case something gets on the list that we don’t want to be affected by (see above).

    RPZ layers are in DNS zone file format (see RFC 1035 section 5 if you’re particularly interested in DNS master zone format, or for RPZ you can read the draft RFC (it’s not made it to a full RFC yet…)).

    Because they’re DNS zone files, they can be transferred to other DNS servers using the normal notify and transfer mechanisms.

    On my network here, there’s a central authoritative server, and then a pair of recursive servers that deal with actual client requests. I’ll get around to writing about the anycast set up of those in another article.

    For the purposes of this article, the authoritative master is on 192.168.1.53, and the two slaves that are actually dealing with the client recursion are on 192.168.1.51 and 192.168.1.52.

    Central Authoritative Server

    We’ll start with the central authoritative server. There are two bits to this, periodically fetching the RPZ, and serving it to the slave servers.

    All of the scripts I talk about below can be found in the Bitbucket repository. The code is fairly straightforward, but of course, drop me a line if you have questions.

    Energized Protect update their feeds every 6 hours, and so there’s no need to poll them any more often than that. Further, the updateblockrpz script keeps an unchanged copy of the downloaded file so that wget can do timestamping and only download the file if it has actually changed on the server.

    There are two further scripts, both of which allow you to manipulate an override layer in the policy. The first, rpz-override, allows you to add and remove domains from the override, either to add things you want to block, or allow things blocked in the block layer. The second script, rpz-override-client, allows you to base the action on the client IP instead of on the queried name. Both of these are written in Perl, and more specifically are built on the Net::DNS module to send the changes into the server via a dynamic update.
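    The Perl scripts themselves live in the repository, but to illustrate the dynamic-update approach they take, here's a rough stdlib-Python equivalent that builds an input script for BIND's nsupdate tool (the function name and defaults are mine, not the repository's):

```python
def rpz_override_update(domain: str, action: str = "rpz-passthru.",
                        zone: str = "override",
                        server: str = "127.0.0.1") -> str:
    """Build an nsupdate input script that adds an RPZ override entry
    via dynamic update -- roughly what the Perl rpz-override script
    does with Net::DNS. Pipe the result into `nsupdate`."""
    return "\n".join([
        f"server {server}",
        f"zone {zone}",
        f"update add {domain}.{zone}. 300 CNAME {action}",
        "send",
    ])

# e.g. whitelist a domain that the block layer is catching:
print(rpz_override_update("example.com"))
```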

    Next, let’s look at how we configure the server. A base understanding of BIND configuration is assumed.

    First, we’ll need to configure it as the master for the two zones, permit dynamic updates on the override zone, and permit the slaves to transfer them. Depending on your distro, the location of your named.conf may vary, as may whether it’s a single file or split out with includes. I’ll just include generic config here to try and cover as many bases as possible.

    zone "block" {
    	type master;
    	file "rpz/block";
    	notify explicit;
    	also-notify {
    		192.168.1.51;
    		192.168.1.52;
    	};
    };
    
    zone "override" {
    	type master;
    	file "rpz/override";
    	notify explicit;
    	also-notify {
    		192.168.1.51;
    		192.168.1.52;
    	};
    	allow-update { 127.0.0.1; ::1; };
    };
    

    Normal rules apply here; config like also-notify can inherit from the main options section, or can be overridden per zone as we have done here (notify explicit forces notifies to go only to the entries listed under also-notify). We do the same again with the override zone, but here we also add the allow-update statement, in order to permit the maintenance scripts to work. If your main options section has allow-update specified, you will need to specify allow-update { none; }; on the block zone, to prevent BIND from keeping journals for the zone. If you need other config that will lead to journals, such as ixfr-from-differences, for example, then the updateblockrpz script may need a tweak to freeze and thaw the block zone instead of just reloading the update.

    I run the updateblockrpz script from cron at a randomly selected minute after the hour, every 6 hours and lazily capture the output to a tmp file for troubleshooting purposes. Yes, I should likely update this to log properly!

    17 */6 * * * /usr/local/bin/updateblockrpz >/tmp/updateblockrpz.tmp

    Slave Servers

    Having got the RPZ zones set up on the master, we can turn our attention to the slaves that are actually handling the queries from the clients on the network.

    First, we’ll slave the RPZ zones from the master:

    masters rpzmasters { 192.168.1.53; };
    zone "block" {
        type slave;
        file "rpz/block";
        masters { rpzmasters; };
    };
    zone "override" {
        type slave;
        file "rpz/override";
        masters { rpzmasters; };
    };
    

    …and next, we’ll define the policy that’ll apply to the clients:

    options {
    ...
    	response-policy {
    		zone "override" policy given;
    		zone "block" policy nxdomain;
    	}
    		break-dnssec yes
    		qname-wait-recurse no
    		max-policy-ttl 900
    	;
    ...
    };
    

    As we mentioned before, we’re overriding the block layer at the policy level, forcing anything in that layer to result in an NXDOMAIN response. The override layer is left as given so that the actions defined in the layer are honoured. The policy is evaluated top to bottom, with the first action encountered causing an exit from the policy; hence the override layer, which could be whitelisting something that’s in the block layer, is listed first.

    RPZ Entries

    Lastly, we’ll briefly cover the different types of record that you might want to put in the override layer; the scripts will mostly take care of this for you, but for those that are interested, here’s a little more detail.

    Broadly, as we discussed earlier, we’re interested in two main triggers; the name being looked up, and the client making the query.

    Entries that affect the domain name being looked up broadly look like this:

    some.domain.name.override. 300 IN CNAME <action>.

    Where <action> is one of the following:

    • rpz-passthru (whitelist)
    • rpz-drop (drop the query – quite unfriendly, will cause the client to wait for a timeout)
    • . (a literal dot, which will cause an NXDOMAIN response)

    It’s also possible to do something like this, if you want to override to a block page or honeypot, for example:

    some.domain.name.override. 300 IN A 192.168.0.1

    …and of course, any of those can be prefixed with *. to cause the action to apply to everything within the bailiwick of some.domain.name.

    Entries that affect the client look a little different. Firstly, they’re reversed, a bit like in-addr.arpa zones, but they’re prefixed by an additional item specifying the prefix length from the CIDR notation. So, if you want to (using the actions from above) whitelist all queries from the single IP 192.168.58.3, you’d do:

    32.3.58.168.192.rpz-client-ip.override. 300 IN CNAME rpz-passthru.

    However, if you wanted to block the upper /25, you’d do this (note the use of the subnet address – you need to specify the correct subnet boundary IP – and that the action is the literal dot for NXDOMAIN rather than rpz-passthru):

    25.128.58.168.192.rpz-client-ip.override. 300 IN CNAME .
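    Getting the reversed-octet owner name right by hand is fiddly, so here's a small sketch that builds it for you (function name is mine; the scripts in the repository do the equivalent in Perl):

```python
import ipaddress

def rpz_client_ip_owner(cidr: str, zone: str = "override") -> str:
    """Build the owner name for an RPZ client-IP trigger: the prefix
    length, then the subnet address octets in reverse order, under
    rpz-client-ip in the policy zone."""
    net = ipaddress.IPv4Network(cidr)
    octets = reversed(str(net.network_address).split("."))
    return f"{net.prefixlen}." + ".".join(octets) + f".rpz-client-ip.{zone}."

print(rpz_client_ip_owner("192.168.58.3/32"))
print(rpz_client_ip_owner("192.168.58.128/25"))
```

    Using ipaddress.IPv4Network also gets you the subnet-boundary check for free: it raises if the host bits don't match the prefix length.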

    Other Trigger Types

    We’ve not talked about the other triggers, but briefly, you can also trigger actions based on:

    1. rpz-ip – the IP addresses that are returned in the answer to a query.
    2. rpz-nsdname – the domain name of the nameservers that are authoritative for the domain in the query.
    3. rpz-nsip – the IP addresses of the nameservers that are authoritative for the domain in the query (ie: what the names in (2) resolve to).

    Type 1 can lead to data exfiltration, which, if you’re blocking a domain because you want to prevent exfiltration, defeats the object. If you put type 1 or type 3 in a layer, then if BIND reaches that layer as it works through the policy, it will do the recursion to the authority for the zone in order to work out if the trigger is a match. If you’re worried about data exfiltration, you MUST put the domains you’re blocking for that purpose in a RPZ layer above the first layer that includes type 1 or type 3 entries, then BIND will execute your configured action without any recursion.

    …but what about DNSSEC?

    If you’ve read all that and you’re thinking to yourself “hey, surely returning modified answers will break DNSSEC”, then you’re right. Your client machines’ stub resolvers will trust your DNS resolver and so won’t notice, but if you’re pointing a validating resolver at this setup, you’ll need to make sure you keep the break-dnssec yes; option I included above. Possibly counter-intuitively, this causes your RPZ server to lie to the downstream validating resolver. If baddomain.com is DNSSEC signed and is on your block list, the downstream validating resolver will usually be sending queries with CD set instead of trusting your validation, expecting your server to send all the required DS, DNSKEY, etc. With break-dnssec yes; the RPZ server will lie: it’ll pretend baddomain.com isn’t signed and will strip all DNSSEC data from responses to the downstream resolver(s).

    It’s important to note that this has an edge case. Let’s imagine you have gooddomain.com, which is signed, and is not being modified by your policy at all. Now let’s imagine you have badthing.gooddomain.com which is not at a zone split boundary, and is just a regular non-delegation entry in gooddomain.com. If you add badthing.gooddomain.com specifically to your RPZ for modification, the server can’t deal with lying about just that entry, and the downstream validator will spot the lie, returning SERVFAIL to its downstream client(s).