Building a Router from Scratch with Debian, Part 2

2024-06-06

21 min read

Welcome back to part 2 of our series on building a router from scratch with Debian! In this part, we'll delve into setting up VLANs on our custom-built router, configuring WireGuard for secure remote access, and discussing the ARP kernel flags we touched on last time.

Setting Up VLANs

VLANs were one of the big reasons I ditched my old router. VLANs let you split one physical network into several logical ones, so you can set different firewall rules for each group. Depending on your setup, a VLAN can be completely isolated from all other networks, or it can connect to everything else and serve purely to group devices.

In my setup, I aimed for the following subnets:

  • Main LAN: The daily-use network with ad blocking, local services, and the ability to connect to all other subnets.
  • Guest Network: Public Internet access but nothing else. I haven't set it up yet because we don't get many guests dropping by. I might make it IPv4-only.
  • IPv4-only IoT Subnet: Isolated from all other networks.
  • IPv6-only IoT Subnet: Also isolated and reserved for future IPv6-compatible devices.

I chose separate IoT subnets for IPv4 and IPv6 because I'm a big advocate for IPv6 and aim to transition to it fully. The daily-use subnet may need IPv4 to talk to the good old IPv4 world, but my IoT LAN isn't going to have outbound access, so I can merrily keep it IPv6-only. Unfortunately, not many IoT devices support IPv6 yet. My only two cameras support only IPv4, and according to the manufacturer there are no plans to add IPv6. So I had to configure an IPv4-only subnet, while my IPv6 subnet just sits there empty.

That raises a question: should I assign Global Unicast Addresses (GUAs) or Unique Local Addresses (ULAs) to the IoT subnet? I chose ULAs because my GUA prefix isn't stable, and my servers need to reach the devices at fixed IPs. If I had a static prefix, I would have gone with GUAs, since that is how IPv6 is meant to be used.

With that settled, let's get into the VLAN configuration. This is where it gets a little tricky: I am setting up VLANs on top of a Linux bridge. On a regular commercial router, as we discussed in part 1, the VLANs sit on a single physical port. Search for VLANs on a Linux bridge and you'll find plenty of discussions about bugs.

But it turns out combining VLANs with a bridge is not that hard, at least in my simple network. There are a few ways to do it, mostly by setting up the VLANs manually with ip link plus brctl or bridge, but I didn't use any of them. I simply declared the VLANs in /etc/network/interfaces, and that was it. Debian handles the VLANs in a way I don't fully understand, but it works.
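For the curious, here is roughly what the VLAN stanzas below boil down to if you set one up by hand with iproute2. This is a sketch of an equivalent, not necessarily the exact commands Debian runs; it reuses the names and addresses from my config and needs root. Don't run it on top of the interfaces file, since br0.3 would already exist.

```bash
# Create a VLAN 3 sub-interface on top of the bridge
# (roughly what "auto br0.3" with "vlan_raw_device br0" amounts to)
ip link add link br0 name br0.3 type vlan id 3
ip addr add 192.168.2.1/24 dev br0.3
ip link set br0.3 up

# -d prints details; the output should mention "vlan protocol 802.1Q id 3"
ip -d link show br0.3

# 0 means the bridge is VLAN-unaware and passes tagged frames through as-is
cat /sys/class/net/br0/bridge/vlan_filtering
```

As far as I can tell, that last point is why this setup just works: with vlan_filtering at its default of 0, the bridge forwards 802.1Q-tagged frames unchanged, including up to br0 itself, where the kernel's VLAN code hands frames tagged 3 to br0.3 and frames tagged 4 to br0.4, while untagged frames stay on br0.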

Here are the configurations I am using:

# /etc/network/interfaces

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface (WAN)
allow-hotplug enp1s0
iface enp1s0 inet dhcp
iface enp1s0 inet6 auto

# Define bridge interface
auto br0
iface br0 inet static
  bridge_ports enp2s0 enp3s0 enp4s0 enp5s0 enp7s0
  address 192.168.0.1
  netmask 255.255.255.0

# Bring up bridge ports with the bridge
  up ip link set enp2s0 up
  up ip link set enp3s0 up
  up ip link set enp4s0 up
  up ip link set enp5s0 up
  up ip link set enp7s0 up

# Bring bridge interface up
  up ip link set br0 up

# IPv6 GUA (managed by dhclient)
iface br0 inet6 manual

# IPv6 ULA
iface br0 inet6 static
  address fd00::1
  netmask 64

# br0.2 reserved for guest network

# IPv4-only IoT VLAN interface
auto br0.3
iface br0.3 inet static
  address 192.168.2.1
  netmask 255.255.255.0
  vlan_raw_device br0

# IPv6-only IoT VLAN interface
auto br0.4
iface br0.4 inet6 static
  address fd10::1
  netmask 64
  vlan_raw_device br0

Again, I masked the real ULA addresses as fd00::1 and fd10::1 for privacy. In practice, always generate a random ULA prefix (RFC 4193 calls for a pseudo-random 40-bit Global ID) instead of picking an arbitrary one like fd00::/64.
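Here is one way to do that: a small bash sketch that draws the 40-bit Global ID from /dev/urandom and prints a /48, plus a hypothetical helper, ula_subnet, that carves a /64 out of it. RFC 4193 suggests a SHA-1-based recipe for the Global ID; plain random bytes are a simpler stand-in that still meets the "pseudo-random" requirement.

```bash
#!/usr/bin/env bash
# random_ula48: print a random RFC 4193 ULA /48, i.e. "fd" followed by a
# 40-bit pseudo-random Global ID taken from /dev/urandom.
random_ula48() {
  # od prints 5 random bytes as unsigned decimals, e.g. "  12 200   7  99 154"
  set -- $(od -An -N5 -tu1 /dev/urandom)
  printf 'fd%02x:%02x%02x:%02x%02x::/48\n' "$1" "$2" "$3" "$4" "$5"
}

# ula_subnet <prefix48> <subnet-id>: append a 16-bit Subnet ID to get a /64.
# The subnet ID is read as a decimal number and printed in hex.
ula_subnet() {
  printf '%s:%x::/64\n' "${1%::/48}" "$2"
}
```

Generate the /48 once, store it, and derive every subnet from it; calling random_ula48 again gives a different site prefix. Using the VLAN ID as the subnet ID is one tidy convention, e.g. ula_subnet "$site" 4 for br0.4. Keep in mind that a decimal 10 becomes :a:: in the output.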

Next, configure dhcpd and radvd.

# /etc/default/isc-dhcp-server

INTERFACESv4="br0 br0.3"
INTERFACESv6=""

# /etc/dhcp/dhcpd.conf

subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.2 192.168.0.254;  # .1 is the router itself
    option routers 192.168.0.1;
    option subnet-mask 255.255.255.0;
    option domain-name-servers 192.168.0.1;
}

subnet 192.168.2.0 netmask 255.255.255.0 {
    range 192.168.2.2 192.168.2.254;  # .1 is the router itself
    option routers 192.168.2.1;
    option subnet-mask 255.255.255.0;
}

# /etc/radvd.conf

interface br0
{
    AdvSendAdvert on;

    # GUA prefix: the special ::/64 makes radvd advertise the non-link-local /64s assigned to br0 (set by dhcp6c)
    prefix ::/64 {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr on;
    };

    # ULA Prefix
    prefix fd00::/64 {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr on;
    };

    RDNSS fd00::1 {
    };
};

interface br0.4
{
    AdvSendAdvert on;
    prefix fd10::/64 {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr on;
    };
};
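Before restarting anything, both daemons can check their config files for syntax errors. A minimal sketch, assuming the default Debian paths used above (run as root):

```bash
# Parse dhcpd.conf and exit without touching the network
dhcpd -t -cf /etc/dhcp/dhcpd.conf

# Validate radvd.conf and exit
radvd --configtest --config /etc/radvd.conf
```

If either one reports an error, fix it before restarting the services; that beats finding the typo after a daemon refuses to start.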

Restart networking, isc-dhcp-server, and radvd, and the VLANs should be up and running. Here's what ip addr shows on my router afterwards:

2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 98.109.32.213/24 brd 98.109.32.255 scope global dynamic enp1s0
       valid_lft 5541sec preferred_lft 5541sec
    inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global br0
       valid_lft forever preferred_lft forever
    inet6 2600:4041:449d:1801:xxxx:xxff:fexx:xxxx/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fd00::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
9: br0.3@br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.1/24 brd 192.168.2.255 scope global br0.3
       valid_lft forever preferred_lft forever
    inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
10: br0.4@br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fd10::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
       valid_lft forever preferred_lft forever

Then, in the UniFi Network Application, I set the access point to broadcast three SSIDs, one for each network.
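To confirm the AP really tags each SSID's traffic, you can watch the port it's plugged into. This sketch assumes the AP sits on enp2s0, which is only a guess for illustration; -e makes tcpdump print the link-layer header, including the 802.1Q tag (run as root).

```bash
# Frames tagged with VLAN 3 (the IPv4 IoT SSID)
tcpdump -i enp2s0 -e -nn vlan 3

# Any 802.1Q-tagged frame, whatever the VLAN ID
tcpdump -i enp2s0 -e -nn vlan
```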

Finally, here's how I configured the iptables rules to manage VLANs:

#!/usr/bin/env bash

################## Constants ##################

wan=enp1s0
main_lan=br0
iot4_lan=br0.3
iot6_lan=br0.4
home_ipv4=192.168.0.0/24
home_ula=fd00::/64
iot_ipv4=192.168.2.0/24
iot_ipv6=fd10::/64


################## Default policies ##################

# Policies: FORWARD DROP, INPUT ACCEPT, OUTPUT ACCEPT
# (setting INPUT to DROP makes the Docker containers stop working)

iptables -P FORWARD DROP
ip6tables -P FORWARD DROP


#################### DHCP rules #####################

# Allow DHCP & DHCPv6
iptables -A INPUT -i ${wan} -p udp --dport 68 --sport 67 -j ACCEPT
ip6tables -A INPUT -i ${wan} -p udp --dport 546 --sport 547 -j ACCEPT


#################### ICMP rules #####################

# Allow ICMP Echo-Request (Ping)
iptables -A INPUT -p icmp --icmp-type echo-request -i ${wan} -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-reply -i ${wan} -j ACCEPT
# Essential for proper path MTU discovery
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
# Other useful messages:
iptables -A INPUT -p icmp --icmp-type time-exceeded -j ACCEPT
iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT


################## ICMPv6 rules ###################

ip6tables -N icmpv6-forward
ip6tables -N icmpv6-input

# Isolate IoT VLAN from others
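ip6tables -A FORWARD -p ipv6-icmp -i ${iot6_lan} -o ${iot6_lan} -j icmpv6-forward
# Let replies (e.g. echo-reply) to connections the main LAN opened get back out
ip6tables -A FORWARD -p ipv6-icmp -i ${iot6_lan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A FORWARD -p ipv6-icmp -i ${iot6_lan} -j DROP
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -d ${iot_ipv6} -j icmpv6-input
# NDP (router/neighbor solicitations) targets link-local and multicast
# addresses, which fall outside ${iot_ipv6}
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -d fe80::/10 -j icmpv6-input
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -d ff02::/16 -j icmpv6-input
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -j DROP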

# Allow full ICMPv6 access within the main LAN
ip6tables -A FORWARD -p ipv6-icmp -i ${main_lan} -o ${main_lan} -j ACCEPT
ip6tables -A INPUT -p ipv6-icmp -i ${main_lan} -j ACCEPT

# Other ICMPv6 traffic follows RFC 4890
ip6tables -A FORWARD -p icmpv6 -j icmpv6-forward
ip6tables -A INPUT -p icmpv6 -j icmpv6-input

# FORWARD chain rules

# Traffic That Must Not Be Dropped
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type destination-unreachable -j ACCEPT  # Destination Unreachable (Type 1) - All codes
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT  # Packet Too Big (Type 2)
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 3/0 -j ACCEPT  # Time Exceeded (Type 3) - Code 0
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/1 -j ACCEPT  # Parameter Problem (Type 4) - Code 1
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/2 -j ACCEPT  # Parameter Problem (Type 4) - Code 2

# Connectivity checking messages
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type echo-request -j ACCEPT  # Echo Request (Type 128)
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type echo-reply -j ACCEPT  # Echo Response (Type 129)

# Traffic That Normally Should Not Be Dropped
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 3/1 -j ACCEPT  # Time Exceeded (Type 3) - Code 1
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/0 -j ACCEPT  # Parameter Problem (Type 4) - Code 0

# Drop everything else
ip6tables -A icmpv6-forward -p icmpv6 -j DROP

# INPUT chain rules

# Traffic That Must Not Be Dropped
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type destination-unreachable -j ACCEPT  # Destination Unreachable (Type 1) - All codes
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT  # Packet Too Big (Type 2)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 3/0 -j ACCEPT  # Time Exceeded (Type 3) - Code 0
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/1 -j ACCEPT  # Parameter Problem (Type 4) - Code 1
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/2 -j ACCEPT  # Parameter Problem (Type 4) - Code 2

# Connectivity checking messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type echo-request -j ACCEPT  # Echo Request (Type 128)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type echo-reply -j ACCEPT  # Echo Response (Type 129)

# Address Configuration and Router Selection messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type router-solicitation -j ACCEPT  # Router Solicitation (Type 133)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type router-advertisement -j ACCEPT  # Router Advertisement (Type 134)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type neighbor-solicitation -j ACCEPT  # Neighbor Solicitation (Type 135)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type neighbor-advertisement -j ACCEPT  # Neighbor Advertisement (Type 136)

# Link-Local Multicast Receiver Notification messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 130 -j ACCEPT  # Listener Query (Type 130)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 131 -j ACCEPT  # Listener Report (Type 131)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 132 -j ACCEPT  # Listener Done (Type 132)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 143 -j ACCEPT  # Listener Report v2 (Type 143)

# SEND Certificate Path Notification messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 148 -j ACCEPT  # Certificate Path Solicitation (Type 148)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 149 -j ACCEPT  # Certificate Path Advertisement (Type 149)

# Multicast Router Discovery messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 151 -j ACCEPT  # Multicast Router Advertisement (Type 151)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 152 -j ACCEPT  # Multicast Router Solicitation (Type 152)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 153 -j ACCEPT  # Multicast Router Termination (Type 153)

# Traffic That Normally Should Not Be Dropped
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 3/1 -j ACCEPT  # Time Exceeded (Type 3) - Code 1
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/0 -j ACCEPT  # Parameter Problem (Type 4) - Code 0

# Drop everything else
ip6tables -A icmpv6-input -p icmpv6 -j DROP


################## Forward rules ##################

# NAT for outgoing v4 traffic
iptables -t nat -A POSTROUTING -o ${wan} -s ${home_ipv4} -j MASQUERADE

# Allow within main LAN
iptables -A FORWARD -i ${main_lan} -o ${main_lan} -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${main_lan} -j ACCEPT

# Allow within IoT network
iptables -A FORWARD -i ${iot4_lan} -o ${iot4_lan} -j ACCEPT
ip6tables -A FORWARD -i ${iot6_lan} -o ${iot6_lan} -j ACCEPT

# Allow main LAN to IoT (IoT may only reply)
iptables -A FORWARD -i ${main_lan} -o ${iot4_lan} -j ACCEPT
iptables -A FORWARD -i ${iot4_lan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${iot6_lan} -j ACCEPT
ip6tables -A FORWARD -i ${iot6_lan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# Allow main LAN to WAN
iptables -A FORWARD -i ${main_lan} -o ${wan} -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${wan} -j ACCEPT

# Only allow WAN to LAN established traffic
iptables -A FORWARD -i ${wan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A FORWARD -i ${wan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT


################## Input rules ##################

# Only allow established connections from WAN
iptables -A INPUT -i ${wan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -i ${wan} -j DROP
ip6tables -A INPUT -i ${wan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A INPUT -i ${wan} -j DROP

# Accept all incoming traffic on main LAN interface (main LAN has full access)
iptables -A INPUT -i ${main_lan} -j ACCEPT
ip6tables -A INPUT -i ${main_lan} -j ACCEPT

# Only accept intra-subnet traffic on IoT LANs
iptables -A INPUT -i ${iot4_lan} -d ${iot_ipv4} -j ACCEPT
iptables -A INPUT -i ${iot4_lan} -j DROP
ip6tables -A INPUT -i ${iot6_lan} -d ${iot_ipv6} -j ACCEPT
ip6tables -A INPUT -i ${iot6_lan} -j DROP

Now the rules are much messier. Feel free to ping me if you find any loopholes in this script, because even I am not sure something this complicated works 100% as intended. Basically, here is what I'm trying to achieve:

  1. DROP all inputs and forwards from the WAN unless they belong to an established connection.
  2. ACCEPT the necessary ICMP messages, and ICMPv6 messages per RFC 4890, but only toward the router and the main LAN. The IoT subnets should still be isolated from everything else.

This means you can ping both my router and my LAN devices from the public Internet over IPv6. Sounds scary, but it's not: only ICMP gets through, and any other unsolicited inbound traffic is still dropped.

  3. Only ACCEPT traffic within each IoT VLAN. It doesn't really matter whether the two can reach each other, since IPv4 doesn't talk to IPv6 anyway.
  4. The main LAN has full access.
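Since I'm not 100% sure the script does what this list says, one way to check is to watch the per-rule counters while generating test traffic, for example pinging an IoT device from the main LAN. A sketch, run as root on the router:

```bash
# Zero the FORWARD counters so only the test traffic shows up
iptables -Z FORWARD

# Per-rule packet/byte counters with rule numbers; a rule whose counter
# never moves may be shadowed by an earlier rule
iptables -L FORWARD -v -n --line-numbers
ip6tables -L icmpv6-input -v -n --line-numbers
```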

Reboot the router, and everything should work perfectly!

WireGuard

Every homelabber needs a VPN for secure remote access to their home network, and these days WireGuard is the go-to modern VPN. It is easy to configure and performs much better than OpenVPN or IPsec. There's also a great service called Tailscale that does UDP hole punching for you before initiating the connection, so two devices can connect even when both are behind NATs. I'm a plain-WireGuard purist, though. Hole punching feels like an ugly workaround when I can connect either via a VPS relay or over IPv6. Setting up DDNS and a direct WireGuard connection deserves a separate post; for now, let's focus on setting up a WireGuard relay on a VPS.

Here's a wonderful tutorial from whynot.guide that I followed when I started out with WireGuard. It's easy to follow (unlike my posts, haha). Read it if you don't know how to use WireGuard yet, because I won't explain the basics here.

Okay, I'll assume you know how to work with WireGuard at this point. Here's my setup:

  1. Install WireGuard on all three machines: the VPS, the router, and the laptop.
  2. I chose 192.168.222.0/24 and fdaa::/64 as the subnets WireGuard will operate on.
  3. The VPS holds the .1 and ::1 addresses of these subnets, the router holds .2 and ::2, and let's say the laptop holds .3 and ::3. So when I send a packet to a device like 192.168.0.210, its source address will be 192.168.222.3; it travels through the VPS to 192.168.222.2 (the router), which forwards it into 192.168.0.0/24. No NAT is involved.

My setup differs from the whynot guide in that I don't use NAT. Seeing 192.168.222.3 as the source address instead of 192.168.0.1 is much clearer and friendlier for debugging.

The VPS listens for both peers and forwards packets between the laptop and the router. (The VPS can also initiate connections to the LAN; that's how I reverse-proxy the services running on my app server.) Here's its config:

[Interface]
Address = 192.168.222.1/24, fdaa::1/64
ListenPort = 12345
PrivateKey = [redacted]

# Router
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.2/32, fdaa::2, 192.168.0.0/24, 192.168.2.0/24, fd00::/64, fd10::/64

# Laptop
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.3/32, fdaa::3
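One thing the VPS config doesn't show: since the VPS relays packets that come in on wg0 and leave on wg0 again, its kernel must be allowed to forward them. The whynot guide may already cover this; if not, here is a sketch, run as root on the VPS and assuming the interface is named wg0:

```bash
# Let the kernel route packets between peers (in on wg0, out on wg0)
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1

# Only needed if the VPS's FORWARD policy is DROP
iptables -A FORWARD -i wg0 -o wg0 -j ACCEPT
ip6tables -A FORWARD -i wg0 -o wg0 -j ACCEPT
```

Commands run this way don't survive a reboot; the sysctl lines can go in a file under /etc/sysctl.d/, and the iptables lines can become PostUp/PreDown hooks like the ones in the router config below. Also note that with IPv6 forwarding on, Linux ignores Router Advertisements on interfaces whose accept_ra is 1, so a VPS that gets its IPv6 address via SLAAC may need accept_ra set to 2.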

The client side is just as simple. The laptop connects to the VPS and uses the router (which runs Pi-hole) as its DNS server. Since the laptop may be behind a NAT, it sends a keepalive packet to the VPS every 25 seconds to keep the NAT mapping alive. Traffic destined for anything in AllowedIPs goes into the tunnel instead of out the physical network.

[Interface]
PrivateKey = [redacted]
Address = 192.168.222.3/24, fdaa::3/64
DNS = 192.168.0.1, fd00::1

# VPS
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.2/32, fdaa::2, 192.168.0.0/24, 192.168.2.0/24, fd00::/64, fd10::/64
Endpoint = tongkl.com:12345
PersistentKeepalive = 25
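To make the AllowedIPs rule concrete, here is a toy bash model (IPv4 only, helper names made up) that checks whether the laptop would send a given destination into the tunnel. In reality wg-quick installs a route for each AllowedIPs entry and the kernel does the matching; this only mimics the outcome, using the IPv4 entries from the config above.

```bash
#!/usr/bin/env bash
# Toy model (IPv4 only) of the laptop's tunnel decision: a destination goes
# into wg0 if it falls inside any AllowedIPs entry. Helper names are made up.

# ip2int <a.b.c.d>: print the address as a 32-bit integer
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr <ip> <net/len>: succeed if <ip> lies inside the CIDR block
in_cidr() {
  local net="${2%/*}" len="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

# IPv4 entries from the laptop's [Peer] AllowedIPs
allowed="192.168.222.2/32 192.168.0.0/24 192.168.2.0/24"

# via_tunnel <ip>: succeed if the laptop would send <ip> through wg0
via_tunnel() {
  local cidr
  for cidr in $allowed; do
    in_cidr "$1" "$cidr" && return 0
  done
  return 1
}
```

One thing it makes obvious: 192.168.222.1, the VPS's own tunnel address, isn't in the laptop's AllowedIPs, so via_tunnel 192.168.222.1 fails and traffic to it won't go through the tunnel.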

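Finally, the router. It sends only tunnel traffic (192.168.222.0/24 and fdaa::/64) into the tunnel and forwards packets to and from its LANs. The PostUp/PreDown hooks add and remove the matching FORWARD rules: full access between the tunnel and the main LAN, while the IoT VLANs can only reply to connections opened from the tunnel.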

[Interface]
Address = 192.168.222.2/24, fdaa::2/64
PrivateKey = [redacted]

PostUp = iptables -A FORWARD -i wg0 -o br0 -j ACCEPT
PostUp = iptables -A FORWARD -i br0 -o wg0 -j ACCEPT
PostUp = iptables -A FORWARD -i wg0 -o br0.3 -j ACCEPT
PostUp = iptables -A FORWARD -i br0.3 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
PreDown = iptables -D FORWARD -i wg0 -o br0 -j ACCEPT
PreDown = iptables -D FORWARD -i br0 -o wg0 -j ACCEPT
PreDown = iptables -D FORWARD -i wg0 -o br0.3 -j ACCEPT
PreDown = iptables -D FORWARD -i br0.3 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

PostUp = ip6tables -A FORWARD -i wg0 -o br0 -j ACCEPT
PostUp = ip6tables -A FORWARD -i br0 -o wg0 -j ACCEPT
PostUp = ip6tables -A FORWARD -i wg0 -o br0.4 -j ACCEPT
PostUp = ip6tables -A FORWARD -i br0.4 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
PreDown = ip6tables -D FORWARD -i wg0 -o br0 -j ACCEPT
PreDown = ip6tables -D FORWARD -i br0 -o wg0 -j ACCEPT
PreDown = ip6tables -D FORWARD -i wg0 -o br0.4 -j ACCEPT
PreDown = ip6tables -D FORWARD -i br0.4 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.0/24, fdaa::/64
Endpoint = tongkl.com:12345
PersistentKeepalive = 25

Once the services are running on all three machines, the tunnel comes up, and I can reach any device at home while I'm at school.
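A quick way to check the tunnel from the command line, assuming the interface is named wg0 on each machine as in the router's PostUp rules:

```bash
# On any of the three machines: each peer's public key and the Unix time of
# its most recent handshake (0 means no handshake has happened yet)
wg show wg0 latest-handshakes

# From the laptop: the router's tunnel address, then a LAN host behind it
ping -c 3 192.168.222.2
ping -c 3 192.168.0.210
```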

A (Really) Weird Bug I Encountered

For an entire month after building the router, we kept running into a bizarre bug. When a device first joined the network (or came back after being inactive for a while), it couldn't reach certain sites. Google and Amazon were fine, but Reddit and GitHub were not. After a minute or two of madly pinging every site we could think of, everything would suddenly start working. The issue drove us crazy, and I couldn't find the cause. DHCP looked good; DNS looked good; iptables had to be fine too, since connections eventually succeeded after the initial struggle.

After finally learning to use tcpdump and getting hints from Server Fault, I managed to uncover and fix the issue before 3 a.m.

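The problem stemmed from how Linux handles ARP (Address Resolution Protocol). By default, Linux answers an ARP request for any of its local IP addresses on whichever interface the request arrives, even if that address is configured on a different interface. According to the kernel documentation on kernel.org: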

arp_filter - BOOLEAN
	1 - Allows you to have multiple network interfaces on the same
	subnet, and have the ARPs for each interface be answered
	based on whether or not the kernel would route a packet from
	the ARP'd IP out that interface (therefore you must use source
	based routing for this to work). In other words it allows control
	of which cards (usually 1) will respond to an arp request.

	0 - (default) The kernel can respond to arp requests with addresses
	from other interfaces. This may seem wrong but it usually makes
	sense, because it increases the chance of successful communication.
	IP addresses are owned by the complete host on Linux, not by
	particular interfaces. Only for more complex setups like load-
	balancing, does this behaviour cause problems.

In other words, Linux treats IP addresses as belonging to the whole host rather than to a particular interface, so the kernel may answer ARP requests with addresses from other interfaces. Normally this doesn't cause a problem, but I happened to have an extra MacVLAN interface configured because I need to communicate with a MacVLAN Docker container. The MacVLAN interface has the IP 192.168.0.5, in the same subnet as br0, which holds 192.168.0.1. So when I tested the ARP replies using arping from my app server:

$ arping 192.168.0.1
ARPING 192.168.0.1 from 192.168.0.2 enp1s0
Unicast reply from 192.168.0.1 [82:43:49:xx:xx:xx]  0.676ms
Unicast reply from 192.168.0.1 [62:BC:10:xx:xx:xx]  0.699ms
^CSent 1 probes (1 broadcast(s))
Received 2 response(s)

The first MAC address belonged to br0, as expected. The second reply, from the MacVLAN interface, was unwanted: it confused all the Windows clients and some mobile devices. The iptables logs showed this weird behavior:

May 19 16:24:16 router kernel: RAW PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: MANGLE PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: NAT PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: MANGLE FORWARD: IN=br0 OUT=br0 PHYSIN=enp2s0 PHYSOUT=enp3s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1

What the heck? After NAT PREROUTING, the router decided the packet should go out enp3s0, another LAN port under br0, instead of being forwarded to enp1s0, the WAN port!

Here's what I caught listening on enp3s0:

$ sudo tcpdump -i enp3s0 -vvvnn icmp
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:06:45.536644 IP (tos 0x0, ttl 128, id 25306, offset 0, flags [none], proto ICMP (1), length 60)
    192.168.0.123 > 140.82.112.3: ICMP echo request, id 1, seq 1, length 40
16:06:50.362736 IP (tos 0x0, ttl 128, id 25307, offset 0, flags [none], proto ICMP (1), length 60)
    192.168.0.123 > 140.82.112.3: ICMP echo request, id 1, seq 2, length 40

Crazy. After a closer look it turned out the packets were using the MAC address of the MacVLAN interface as the destination. That was why initially all the devices had a hard time connecting to the outside.

Here's what happened:

  1. The device joins the network, receives two MAC addresses as its ARP responses, and chooses the second one.
  2. The device attempts to connect to the public Internet but uses the wrong MAC address.
  3. The router receives the packets and, because the bridge doesn't treat the destination MAC as br0's own, keeps them on br0 (OUT=br0 in the log above). That's a layer-2 switching decision, not IP routing.
  4. Because the only other active port is enp3s0, the bridge sends them there.
  5. Of course the packet gets dropped by my app server.
  6. After a while, for some reason I am not aware of, the device finally realizes it should try the other MAC address.
  7. The new MAC address works.
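If you suspect the same problem on your own network, you can catch the duplicate replies directly. A sketch, run as root on another Linux machine on the LAN (my app server, whose interface is enp1s0) while arping is running:

```bash
# Show only ARP replies (opcode 2, the 2-byte field at offset 6 of the ARP
# header); -e prints the Ethernet source MAC of each reply
tcpdump -i enp1s0 -e -nn 'arp and arp[6:2] == 2'

# On a Linux client: which MAC is currently cached for the gateway?
ip neigh show 192.168.0.1
```

Two different MACs answering for 192.168.0.1 is the smoking gun.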

So how do you fix it? I set the following ARP-related sysctl flags:

net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.arp_announce = 1
# arp_ignore = 2 also works here
net.ipv4.conf.all.arp_ignore = 1
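If you set these flags only at runtime (for example with sysctl -w), they are lost on reboot. A sketch for making them persistent, run as root; the file name is arbitrary:

```bash
# Drop-in file that is read at boot
cat > /etc/sysctl.d/90-arp.conf <<'EOF'
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 1
EOF

# Apply all sysctl.d files now, then confirm one of the values
sysctl --system
sysctl net.ipv4.conf.all.arp_ignore
```

Afterwards, running arping 192.168.0.1 from another host should show a single reply.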

You can refer to the kernel.org documentation for the details, but long story short, the flags tell the kernel to:

  • arp_filter: Answer an ARP request on an interface only if the kernel would route a packet from the requested IP back to the requester out of that same interface.
  • arp_announce: When the host sends its own ARP requests, avoid using a source IP that isn't in the target's subnet on the outgoing interface. It affects outgoing requests, not replies.
  • arp_ignore: Reply only if the requested IP is configured on the interface the request arrived on. With 2, the sender's IP must also be in that interface's subnet.

Here's a very nice example (credits to ChatGPT):

  • Setup:
    • eth0 - IP: 192.168.1.10
    • eth1 - IP: 192.168.1.20, both on network: 192.168.1.0/24
  • Scenario: An ARP request for 192.168.1.10 comes on eth1.
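    • arp_filter = 1: eth1 answers only if the kernel would route packets back to the requester out of eth1. With both interfaces on 192.168.1.0/24, the route usually points at just one of them (say eth0), so eth1 stays quiet.
    • arp_announce = 1: Not relevant here. It only changes the source IP the host puts in its own outgoing ARP requests, and both addresses are already in the target's subnet.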
    • arp_ignore = 1: eth1 will not respond because 192.168.1.10 is not its IP.
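Applied to my router instead of eth0/eth1, the arp_ignore part can be sketched as a toy bash model. It's a simplification, not kernel code: mv0 is a made-up name for the MacVLAN interface, and only arp_ignore 0 and 1 are modelled.

```bash
#!/usr/bin/env bash
# Toy model (not kernel code) of whether an interface answers an ARP request.
# mv0 is a made-up name for the MacVLAN interface.

# ip_of <iface>: the IPv4 address configured on that interface
ip_of() {
  case $1 in
    br0) echo 192.168.0.1 ;;
    mv0) echo 192.168.0.5 ;;
  esac
}

# answers <incoming-iface> <target-ip> <arp_ignore>
answers() {
  local ifname
  case $3 in
    0)  # default: reply if the target IP is configured on ANY local
        # interface, regardless of where the request came in
        for ifname in br0 mv0; do
          [ "$(ip_of "$ifname")" = "$2" ] && return 0
        done
        return 1 ;;
    1)  # reply only if the target IP is configured on the incoming interface
        [ "$(ip_of "$1")" = "$2" ] ;;
    *)  echo "arp_ignore=$3 is not modelled here" >&2
        return 2 ;;
  esac
}
```

With the default of 0, answers mv0 192.168.0.1 0 succeeds, which is exactly the unwanted second reply from the arping output; with 1, mv0 only answers for its own 192.168.0.5.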

In practice, arp_filter and arp_ignore overlap, and arp_announce only affects outgoing requests, but setting all three flags to 1 does no harm in a setup like mine.

Conclusion

That's it! Our custom-built router is now fully operational, complete with VLANs, Wireguard, and properly configured ARP settings. If you have any questions or spot any issues, feel free to reach out!

Happy networking!