Building a Router from Scratch with Debian, Part 2
2024-06-06
Welcome back to part 2 of our series on building a router from scratch with Debian! In this part, we'll delve into setting up VLANs on our custom-built router, configuring Wireguard for secure remote access, and discussing the ARP kernel flags we touched on last time.
Setting Up VLANs
VLAN support was one of the big reasons I ditched my old router. VLANs let you split a physical network into several logical ones, so you can set different firewall rules for each group. Depending on your setup, a VLAN can be totally isolated from all other networks, or it can stay reachable from everything else and serve purely as a way to group devices.
In my setup, I aimed for the following subnets:
Main LAN
: The daily-use network with ad blocking, local services, and the ability to connect to all other subnets.
Guest Network
: Public Internet access but nothing else. Unfortunately I haven't set it up yet because we don't have a lot of guests dropping by. I might make it IPv4-only.
IPv4-only IoT Subnet
: Isolated from all other networks.
IPv6-only IoT Subnet
: Also isolated and reserved for future IPv6-compatible devices.
I chose to have separate IoT subnets for IPv4 and IPv6 because I'm a big advocate for IPv6 and aim to transition fully to it. The daily-use subnet may need IPv4 to communicate with the good old IPv4 world, but my IoT LAN isn't going to have outbound access, so I can merrily keep it IPv6-only. Unfortunately, not many IoT devices support IPv6 yet. My only two cameras support just IPv4, and according to the manufacturer there's no plan to add IPv6. So I had to configure an IPv4-only subnet, while my IPv6 subnet is just sitting there empty.
Now comes a question: should I assign Global Unicast Addresses (GUAs) or Unique Local Addresses (ULAs) to the IoT subnet? I chose ULAs because I don't have a stable GUA prefix, and I need to reach the devices by IP from my servers. If I had a static prefix, I would have gone with GUAs, as that is what IPv6 is supposed to be like.
That said, let's get into the VLAN configuration. Here it gets a little tricky: I am setting up VLANs on a Linux bridge, whereas on a regular commercial router, as we discussed in part 1, the VLANs are set on a single physical port. Look up VLAN bridge online and you'll see a ton of bugs being discussed.
But it turns out combining VLANs with a bridge is not that hard, at least in my simple network. There are a few ways of doing it, mostly manually setting up VLANs using ip link and brctl or bridge, but I didn't use any of them. I simply configured the VLANs in the file /etc/network/interfaces, done. Debian handles the VLANs in a way I don't fully understand, but it surely works.
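For comparison, the manual route mentioned above might look roughly like the sketch below. It is not what I run. The interface names and addresses are taken from my config, the run helper is a hypothetical dry-run wrapper that only prints each command unless APPLY=1 is set, and anything created this way would not survive a reboot.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical): print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Create VLAN sub-interface <parent>.<vid> and give it an address.
add_vlan() {
    local parent=$1 vid=$2 addr=$3
    run ip link add link "$parent" name "${parent}.${vid}" type vlan id "$vid"
    run ip addr add "$addr" dev "${parent}.${vid}"
    run ip link set "${parent}.${vid}" up
}

add_vlan br0 3 192.168.2.1/24   # IPv4-only IoT VLAN
add_vlan br0 4 fd10::1/64       # IPv6-only IoT VLAN
```

Run as-is, it only prints six ip commands. Setting APPLY=1 (as root) would execute them instead.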
Here are the configurations I am using:
# /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface (WAN)
allow-hotplug enp1s0
iface enp1s0 inet dhcp
iface enp1s0 inet6 auto
# Define bridge interface
auto br0
iface br0 inet static
bridge_ports enp2s0 enp3s0 enp4s0 enp5s0 enp7s0
address 192.168.0.1
netmask 255.255.255.0
# Bring up bridge ports with the bridge
up ip link set enp2s0 up
up ip link set enp3s0 up
up ip link set enp4s0 up
up ip link set enp5s0 up
up ip link set enp7s0 up
# Bring bridge interface up
up ip link set br0 up
# IPv6 GUA (managed by dhclient)
iface br0 inet6 manual
# IPv6 ULA
iface br0 inet6 static
address fd00::1
netmask 64
# br0.2 reserved for guest network
# IPv4-only IoT VLAN interface
auto br0.3
iface br0.3 inet static
address 192.168.2.1
netmask 255.255.255.0
vlan_raw_device br0
# IPv6-only IoT VLAN interface
auto br0.4
iface br0.4 inet6 static
address fd10::1
netmask 64
vlan_raw_device br0
Again, I masked the ULA address to fd00::1 and fd10::1 for privacy. Always generate a random subnet instead of using an arbitrary ULA prefix like fd00::/64.
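If you need a random prefix, something like the sketch below would do. RFC 4193 describes a hash-based procedure for producing the 40-bit Global ID. This version simply reads five random bytes from /dev/urandom, which is a simplification I'm assuming is good enough for a home network. The function name gen_ula_prefix is made up for this example.

```shell
#!/usr/bin/env bash
# Print a random ULA /48: "fd" followed by a 40-bit random Global ID.
gen_ula_prefix() {
    local hex
    # Read 5 random bytes and keep only the 10 hex digits.
    hex=$(od -An -N5 -tx1 /dev/urandom | tr -dc '0-9a-f')
    # Regroup the digits as fdXX:XXXX:XXXX::/48
    printf '%s\n' "$hex" | sed -E 's|^(..)(....)(....)$|fd\1:\2:\3::/48|'
}

gen_ula_prefix
```

Each /64 for a VLAN can then be carved out of that /48 by picking a different fourth group, for example one for the main LAN and another for the IoT VLAN.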
Next, configure the dhcpd and radvd configs.
# /etc/default/isc-dhcp-server
INTERFACESv4="br0 br0.3"
INTERFACESv6=""
# /etc/dhcp/dhcpd.conf
subnet 192.168.0.0 netmask 255.255.255.0 {
range 192.168.0.2 192.168.0.254;  # skip .1, the router itself
option routers 192.168.0.1;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.0.1;
}
subnet 192.168.2.0 netmask 255.255.255.0 {
range 192.168.2.2 192.168.2.254;  # skip .1, the router itself
option routers 192.168.2.1;
option subnet-mask 255.255.255.0;
}
# /etc/radvd.conf
interface br0
{
AdvSendAdvert on;
# GUA Prefix, dynamically updated by dhcp6c
prefix ::/64 {
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr on;
};
# ULA Prefix
prefix fd00::/64 {
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr on;
};
RDNSS fd00::1 {
};
};
interface br0.4
{
AdvSendAdvert on;
prefix fd10::/64 {
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr on;
};
};
Restart networking, isc-dhcp-server, and radvd. The VLANs should be up and running.
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 98.109.32.213/24 brd 98.109.32.255 scope global dynamic enp1s0
valid_lft 5541sec preferred_lft 5541sec
inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
valid_lft forever preferred_lft forever
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 192.168.0.1/24 brd 192.168.0.255 scope global br0
valid_lft forever preferred_lft forever
inet6 2600:4041:449d:1801:xxxx:xxff:fexx:xxxx/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
valid_lft forever preferred_lft forever
9: br0.3@br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 192.168.2.1/24 brd 192.168.2.255 scope global br0.3
valid_lft forever preferred_lft forever
inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
valid_lft forever preferred_lft forever
10: br0.4@br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet6 fd10::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::xxxx:xxff:fexx:xxxx/64 scope link
valid_lft forever preferred_lft forever
Then I set the access point to broadcast 3 SSIDs corresponding to each VLAN using the Unifi Network Application.
Finally, here's how I configured the iptables rules to manage VLANs:
#!/usr/bin/env bash
################## Constants ##################
wan=enp1s0
main_lan=br0
iot4_lan=br0.3
iot6_lan=br0.4
home_ipv4=192.168.0.0/24
home_ula=fd00::/64
iot_ipv4=192.168.2.0/24
iot_ipv6=fd10::/64
################## Default policies ##################
# forward DROP, input ACCEPT, output ACCEPT
# If INPUT is set to DROP, Docker containers will stop working
iptables -P FORWARD DROP
ip6tables -P FORWARD DROP
#################### DHCP rules #####################
# Allow DHCP & DHCPv6
iptables -A INPUT -i ${wan} -p udp --dport 68 --sport 67 -j ACCEPT
ip6tables -A INPUT -i ${wan} -p udp --dport 546 --sport 547 -j ACCEPT
#################### ICMP rules #####################
# Allow ping (echo request and echo reply) on the WAN interface
iptables -A INPUT -p icmp --icmp-type echo-request -i ${wan} -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-reply -i ${wan} -j ACCEPT
# Essential for proper path MTU discovery
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
# Other useful messages:
iptables -A INPUT -p icmp --icmp-type time-exceeded -j ACCEPT
iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT
################## ICMPv6 rules ###################
ip6tables -N icmpv6-forward
ip6tables -N icmpv6-input
# Isolate IoT VLAN from others
ip6tables -A FORWARD -p ipv6-icmp -i ${iot6_lan} -o ${iot6_lan} -j icmpv6-forward
ip6tables -A FORWARD -p ipv6-icmp -i ${iot6_lan} -j DROP
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -d ${iot_ipv6} -j icmpv6-input
ip6tables -A INPUT -p ipv6-icmp -i ${iot6_lan} -j DROP
# Allow full access between main LAN
ip6tables -A FORWARD -p ipv6-icmp -i ${main_lan} -o ${main_lan} -j ACCEPT
ip6tables -A INPUT -p ipv6-icmp -i ${main_lan} -j ACCEPT
# Other ICMPv6 traffic follows RFC 4890
ip6tables -A FORWARD -p icmpv6 -j icmpv6-forward
ip6tables -A INPUT -p icmpv6 -j icmpv6-input
# FORWARD chain rules
# Traffic That Must Not Be Dropped
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type destination-unreachable -j ACCEPT # Destination Unreachable (Type 1) - All codes
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT # Packet Too Big (Type 2)
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 3/0 -j ACCEPT # Time Exceeded (Type 3) - Code 0
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/1 -j ACCEPT # Parameter Problem (Type 4) - Code 1
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/2 -j ACCEPT # Parameter Problem (Type 4) - Code 2
# Connectivity checking messages
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type echo-request -j ACCEPT # Echo Request (Type 128)
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type echo-reply -j ACCEPT # Echo Response (Type 129)
# Traffic That Normally Should Not Be Dropped
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 3/1 -j ACCEPT # Time Exceeded (Type 3) - Code 1
ip6tables -A icmpv6-forward -p ipv6-icmp --icmpv6-type 4/0 -j ACCEPT # Parameter Problem (Type 4) - Code 0
# Drop everything else
ip6tables -A icmpv6-forward -p icmpv6 -j DROP
# INPUT chain rules
# Traffic That Must Not Be Dropped
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type destination-unreachable -j ACCEPT # Destination Unreachable (Type 1) - All codes
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT # Packet Too Big (Type 2)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 3/0 -j ACCEPT # Time Exceeded (Type 3) - Code 0
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/1 -j ACCEPT # Parameter Problem (Type 4) - Code 1
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/2 -j ACCEPT # Parameter Problem (Type 4) - Code 2
# Connectivity checking messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type echo-request -j ACCEPT # Echo Request (Type 128)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type echo-reply -j ACCEPT # Echo Response (Type 129)
# Address Configuration and Router Selection messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type router-solicitation -j ACCEPT # Router Solicitation (Type 133)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type router-advertisement -j ACCEPT # Router Advertisement (Type 134)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type neighbor-solicitation -j ACCEPT # Neighbor Solicitation (Type 135)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type neighbor-advertisement -j ACCEPT # Neighbor Advertisement (Type 136)
# Link-Local Multicast Receiver Notification messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 130 -j ACCEPT # Listener Query (Type 130)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 131 -j ACCEPT # Listener Report (Type 131)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 132 -j ACCEPT # Listener Done (Type 132)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 143 -j ACCEPT # Listener Report v2 (Type 143)
# SEND Certificate Path Notification messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 148 -j ACCEPT # Certificate Path Solicitation (Type 148)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 149 -j ACCEPT # Certificate Path Advertisement (Type 149)
# Multicast Router Discovery messages
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 151 -j ACCEPT # Multicast Router Advertisement (Type 151)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 152 -j ACCEPT # Multicast Router Solicitation (Type 152)
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 153 -j ACCEPT # Multicast Router Termination (Type 153)
# Traffic That Normally Should Not Be Dropped
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 3/1 -j ACCEPT # Time Exceeded (Type 3) - Code 1
ip6tables -A icmpv6-input -p ipv6-icmp --icmpv6-type 4/0 -j ACCEPT # Parameter Problem (Type 4) - Code 0
# Drop everything else
ip6tables -A icmpv6-input -p icmpv6 -j DROP
################## Forward rules ##################
# NAT for outgoing v4 traffic
iptables -t nat -A POSTROUTING -o ${wan} -s ${home_ipv4} -j MASQUERADE
# Allow within main LAN
iptables -A FORWARD -i ${main_lan} -o ${main_lan} -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${main_lan} -j ACCEPT
# Allow within IoT network
iptables -A FORWARD -i ${iot4_lan} -o ${iot4_lan} -j ACCEPT
ip6tables -A FORWARD -i ${iot6_lan} -o ${iot6_lan} -j ACCEPT
# Allow from main to IOT
iptables -A FORWARD -i ${main_lan} -o ${iot4_lan} -j ACCEPT
iptables -A FORWARD -i ${iot4_lan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${iot6_lan} -j ACCEPT
ip6tables -A FORWARD -i ${iot6_lan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Allow main LAN to WAN
iptables -A FORWARD -i ${main_lan} -o ${wan} -j ACCEPT
ip6tables -A FORWARD -i ${main_lan} -o ${wan} -j ACCEPT
# Only allow WAN to LAN established traffic
iptables -A FORWARD -i ${wan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A FORWARD -i ${wan} -o ${main_lan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
################## Input rules ##################
# Only allow established connections from WAN
iptables -A INPUT -i ${wan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -i ${wan} -j DROP
ip6tables -A INPUT -i ${wan} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip6tables -A INPUT -i ${wan} -j DROP
# Accept all incoming traffic on main LAN interface (main LAN has full access)
iptables -A INPUT -i ${main_lan} -j ACCEPT
ip6tables -A INPUT -i ${main_lan} -j ACCEPT
# Only accept intra-subnet traffic on IoT LANs
iptables -A INPUT -i ${iot4_lan} -d ${iot_ipv4} -j ACCEPT
iptables -A INPUT -i ${iot4_lan} -j DROP
ip6tables -A INPUT -i ${iot6_lan} -d ${iot_ipv6} -j ACCEPT
ip6tables -A INPUT -i ${iot6_lan} -j DROP
Now the rules look much messier. Feel free to ping me if you find any loopholes in this script because even I am not sure such a complicated thing will work 100% as intended. Basically, what I am trying to achieve is:
- DROP all inputs and forwards from the WAN unless they belong to an established connection.
- ACCEPT necessary ICMP and ICMPv6 messages, per RFC 4890, but only to the main LAN. IoT subnets should still be isolated from anything else.
  This means you are able to ping both my router and my LAN devices from the public Internet on IPv6. Sounds scary, but it's not.
- Only ACCEPT traffic within each IoT VLAN. It doesn't matter whether the two IoT VLANs can reach each other, since IPv4 doesn't talk to IPv6 anyway.
- Main LAN has full access.
Reboot the router, and everything should work perfectly!
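The ICMPv6 part of the script is long mostly because the same rule is repeated once per message type. One possible way to keep the type list in a single place is to generate the rules with a loop. The sketch below only prints the commands, and emit_icmpv6_rules is a name invented for this example. The chain name and types are the ones from the forward chain in my script.

```shell
#!/usr/bin/env bash
# Print the ACCEPT rules for one ICMPv6 chain, followed by the final DROP.
# Nothing is applied here; the output is plain ip6tables commands.
emit_icmpv6_rules() {
    local chain=$1 t
    shift
    for t in "$@"; do
        echo "ip6tables -A ${chain} -p ipv6-icmp --icmpv6-type ${t} -j ACCEPT"
    done
    echo "ip6tables -A ${chain} -p ipv6-icmp -j DROP"
}

# Same types, in the same order, as the icmpv6-forward chain in the script.
emit_icmpv6_rules icmpv6-forward \
    destination-unreachable packet-too-big 3/0 4/1 4/2 \
    echo-request echo-reply 3/1 4/0
```

The trade-off is that the per-rule comments naming each type disappear, so the explicit version is arguably easier to audit against RFC 4890.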
Wireguard
Every homelabber needs a VPN for secure remote access to their home network. These days Wireguard is the go-to modern VPN. It is easy to configure and generally outperforms OpenVPN and IPsec. There's also a great service named Tailscale that does UDP hole punching for you before initiating the connection, so you can establish a connection between two devices that are both behind NATs. I am a plain-Wireguard purist, though. Hole punching sounds like an ugly workaround when I can connect either via a VPS relay or via IPv6. Setting up DDNS and a direct Wireguard connection deserves a separate post. For now, let's focus on setting up a Wireguard relay on a VPS.
Here's a wonderful tutorial from whynot.guide that I followed when I started my Wireguard journey. It's easy to follow (unlike my posts haha). Read it if you don't know how to use Wireguard because I am not going to explain it.
Okay, I assume you know how to work with Wireguard at this point. Here's my setup:
- Install Wireguard on all three machines (the VPS, the router, and the laptop).
- I chose 192.168.222.0/24 and fdaa::/64 as the subnets Wireguard will operate on.
- The VPS holds the .1 and ::1 IPs of the subnets; the router holds .2 and ::2; let's assume the laptop holds .3 and ::3. So when I send a packet to some device, say 192.168.0.210, the source address will be 192.168.222.3, routing through the VPS to 192.168.222.2, then to 192.168.0.0/24. No NAT is done in this case.
My setup differs from the whynot guide in that I don't use a NAT. It's good to see 192.168.222.3 as the source address instead of 192.168.0.1. Much clearer and more friendly to debugging.
The VPS listens and forwards the packets between the laptop and the router. (The VPS can also initiate connections to the LAN, that's how I reverse proxy the services running on my app server.)
[Interface]
Address = 192.168.222.1/24, fdaa::1/64
ListenPort = 12345
PrivateKey = [redacted]
# Router
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.2/32, fdaa::2, 192.168.0.0/24, 192.168.2.0/24, fd00::/64, fd10::/64
# Laptop
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.3/32, fdaa::3
On the client, it's also simple. It connects to the VPS and sets the DNS to the router (which runs Pi-hole). Since the laptop is possibly behind a NAT, it needs to send packets to the VPS periodically to keep the NAT mapping alive, in this case once every 25 seconds. The client sends all traffic destined for the AllowedIPs into the tunnel instead of its physical network.
[Interface]
PrivateKey = [redacted]
Address = 192.168.222.3/24, fdaa::3/64
DNS = 192.168.0.1, fd00::1
# VPS
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.2/32, fdaa::2, 192.168.0.0/24, 192.168.2.0/24, fd00::/64, fd10::/64
Endpoint = tongkl.com:12345
PersistentKeepalive = 25
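To make the AllowedIPs behaviour concrete, here is a small illustration of the prefix match that decides whether a destination goes into the tunnel. The real decision is made by the kernel using the routes Wireguard's tooling installs, so treat this as a simplified, IPv4-only sketch. The helper names ip_to_int and in_cidr are invented for the example.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    # Split the address on dots into $1..$4
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/LEN: succeed if IP falls inside the prefix.
in_cidr() {
    local ip=$1 net=${2%/*} len=${2#*/} mask
    mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# IPv4 entries from the laptop's AllowedIPs
allowed="192.168.222.2/32 192.168.0.0/24 192.168.2.0/24"

for dst in 192.168.0.210 8.8.8.8; do
    via="physical network"
    for cidr in $allowed; do
        if in_cidr "$dst" "$cidr"; then via="wg tunnel"; fi
    done
    echo "$dst -> $via"
done
```

With the list above, 192.168.0.210 is reported as going through the tunnel, while 8.8.8.8 stays on the physical network, which is why ordinary browsing on the laptop is unaffected by the VPN.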
And finally, the router. It sends only Wireguard-related traffic to the tunnel, and forwards packets to its LANs.
[Interface]
Address = 192.168.222.2/24, fdaa::2/64
PrivateKey = [redacted]
PostUp = iptables -A FORWARD -i wg0 -o br0 -j ACCEPT
PostUp = iptables -A FORWARD -i br0 -o wg0 -j ACCEPT
PostUp = iptables -A FORWARD -i wg0 -o br0.3 -j ACCEPT
PostUp = iptables -A FORWARD -i br0.3 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
PreDown = iptables -D FORWARD -i wg0 -o br0 -j ACCEPT
PreDown = iptables -D FORWARD -i br0 -o wg0 -j ACCEPT
PreDown = iptables -D FORWARD -i wg0 -o br0.3 -j ACCEPT
PreDown = iptables -D FORWARD -i br0.3 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
PostUp = ip6tables -A FORWARD -i wg0 -o br0 -j ACCEPT
PostUp = ip6tables -A FORWARD -i br0 -o wg0 -j ACCEPT
PostUp = ip6tables -A FORWARD -i wg0 -o br0.4 -j ACCEPT
PostUp = ip6tables -A FORWARD -i br0.4 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
PreDown = ip6tables -D FORWARD -i wg0 -o br0 -j ACCEPT
PreDown = ip6tables -D FORWARD -i br0 -o wg0 -j ACCEPT
PreDown = ip6tables -D FORWARD -i wg0 -o br0.4 -j ACCEPT
PreDown = ip6tables -D FORWARD -i br0.4 -o wg0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
[Peer]
PublicKey = [redacted]
AllowedIPs = 192.168.222.0/24, fdaa::/64
Endpoint = tongkl.com:12345
PersistentKeepalive = 25
After starting the services, the tunnel is in good condition. Now I can reach any device at home when I am at school.
A (Really) Weird Bug I Encountered
For an entire month after building the router, we kept encountering a bizarre bug. When a device first joined the network (or after being inactive for a while), it couldn't access certain sites. Google and Amazon were fine, but Reddit and GitHub were problematic. After spending a minute or two madly pinging all the sites on the web, everything suddenly started working. We were driven crazy by this issue, but I couldn't find the reason. DHCP looked good; DNS looked good; iptables must be good because it connected eventually after initial struggle.
After finally learning to use tcpdump and getting hints from serverfault, I managed to uncover and resolve the issue before 3 a.m.
The problem stemmed from how Linux handles ARP (Address Resolution Protocol). By default, Linux can respond to ARP requests with addresses from any interface on the same subnet. According to kernel.org:
arp_filter - BOOLEAN
1 - Allows you to have multiple network interfaces on the same
subnet, and have the ARPs for each interface be answered
based on whether or not the kernel would route a packet from
the ARP'd IP out that interface (therefore you must use source
based routing for this to work). In other words it allows control
of which cards (usually 1) will respond to an arp request.
0 - (default) The kernel can respond to arp requests with addresses
from other interfaces. This may seem wrong but it usually makes
sense, because it increases the chance of successful communication.
IP addresses are owned by the complete host on Linux, not by
particular interfaces. Only for more complex setups like load-
balancing, does this behaviour cause problems.
Linux thinks the IPs are owned by the complete host, not by particular interfaces. Therefore, the kernel can respond to ARP requests with addresses from other interfaces. Normally this won't cause an issue, but I happened to have another MacVLAN interface configured because I need to communicate with a MacVLAN Docker container. The MacVLAN interface has the IP 192.168.0.5, in the same subnet as br0, which holds 192.168.0.1. Here's what I got when I tested the ARP replies using arping from my app server:
$ arping 192.168.0.1
ARPING 192.168.0.1 from 192.168.0.2 enp1s0
Unicast reply from 192.168.0.1 [82:43:49:xx:xx:xx] 0.676ms
Unicast reply from 192.168.0.1 [62:BC:10:xx:xx:xx] 0.699ms
^CSent 1 probes (1 broadcast(s))
Received 2 response(s)
The first MAC address belonged to br0, which was expected. The second reply from the MacVLAN interface was unwanted. It confused all the Windows clients and some mobile devices. Logs of iptables showed this weird behavior:
May 19 16:24:16 router kernel: RAW PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: MANGLE PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: NAT PREROUTING: IN=br0 OUT= PHYSIN=enp2s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
May 19 16:24:16 router kernel: MANGLE FORWARD: IN=br0 OUT=br0 PHYSIN=enp2s0 PHYSOUT=enp3s0 MAC=62:bc:10:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=192.168.0.123 DST=140.82.112.3 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=45696 DF PROTO=ICMP TYPE=8 CODE=0 ID=1000 SEQ=1
What the heck. After NAT PREROUTING the router decided that the packet should go to enp3s0, another LAN port under br0, instead of forwarding it to enp1s0, the WAN port!!
Here's what I caught listening on enp3s0:
$ sudo tcpdump -i enp3s0 -vvvnn icmp
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:06:45.536644 IP (tos 0x0, ttl 128, id 25306, offset 0, flags [none], proto ICMP (1), length 60)
192.168.0.123 > 140.82.112.3: ICMP echo request, id 1, seq 1, length 40
16:06:50.362736 IP (tos 0x0, ttl 128, id 25307, offset 0, flags [none], proto ICMP (1), length 60)
192.168.0.123 > 140.82.112.3: ICMP echo request, id 1, seq 2, length 40
Crazy. After a closer look it turned out the packets were using the MAC address of the MacVLAN interface as the destination. That was why initially all the devices had a hard time connecting to the outside.
Here's what happened:
- The device joins the network, receives two MAC addresses as its ARP responses, and chooses the second one.
- The device attempts to connect to the public Internet but uses the wrong MAC address.
- The router receives the packets, and thinks they should go to br0 because the destination MAC address is local (an L2 routing decision).
- Because the only other active port is enp3s0, it sends them there.
- Of course the packet gets dropped by my app server.
- After a while, for some reason I am not aware of, the device finally realizes it should try the other MAC address.
- The new MAC address works.
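In hindsight, a quick check for this situation is to count how many different MAC addresses answer for the gateway IP. The sketch below parses arping output from stdin. count_arp_macs is a hypothetical helper, and the sample input uses made-up MAC addresses in the same format as the replies shown earlier.

```shell
#!/usr/bin/env bash
# Count distinct MAC addresses in arping output read from stdin.
count_arp_macs() {
    tr 'A-F' 'a-f' |
        grep -oE '\[[0-9a-f:]{17}\]' |
        sort -u | wc -l | tr -d ' '
}

# Illustrative input with made-up MACs; on a real network you would pipe
# in something like: arping -c 3 192.168.0.1
sample='Unicast reply from 192.168.0.1 [82:43:49:AA:BB:01] 0.676ms
Unicast reply from 192.168.0.1 [62:BC:10:AA:BB:02] 0.699ms'

n=$(printf '%s\n' "$sample" | count_arp_macs)
if [ "$n" -gt 1 ]; then
    echo "warning: $n different MACs answered for the same IP"
fi
```

A healthy network should report exactly one MAC per IP, so anything above one is worth investigating.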
So how to fix the problem? Well, I just set the following sysctl flags related to ARP:
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 1 (or 2)
You can refer to the kernel.org text for a detailed explanation, but long story short, the flags tell the kernel to:
arp_filter
: Only answer an ARP request on the interface the kernel would actually use to reach the requester, rather than on every interface in that subnet.
arp_announce
: When sending ARP requests, prefer a source IP that belongs to the target's subnet on the outgoing interface instead of any local address.
arp_ignore
: Ignore requests targeting a different IP from the IP the receiving interface is holding, even if they are under the same subnet.
Here's a very nice example (credits to ChatGPT):
- Setup:
  - eth0 - IP: 192.168.1.10
  - eth1 - IP: 192.168.1.20, both on network: 192.168.1.0/24
- Scenario: An ARP request for 192.168.1.10 comes on eth1.
  - arp_filter = 1: No response from eth1, because 192.168.1.10 is local to eth0.
  - arp_announce = 1: eth0 will announce 192.168.1.10 on eth0 only, not on eth1.
  - arp_ignore = 1: eth1 will not respond because 192.168.1.10 is not its IP.
It seems arp_filter and arp_ignore have overlapping functions, but it doesn't hurt to set all three flags to 1.
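To make the flags survive a reboot, they can go into a sysctl drop-in file. The sketch below is one way to do that. The file name 90-arp.conf is an assumption, and CONF defaults to a scratch file so the snippet is harmless to run anywhere.

```shell
#!/usr/bin/env bash
# Write the three ARP flags to a sysctl drop-in file.
# CONF defaults to a scratch file; on a router you would point it at a
# path such as /etc/sysctl.d/90-arp.conf (file name is an assumption)
# and run `sysctl --system` as root afterwards to load it.
CONF=${CONF:-$(mktemp)}

cat > "$CONF" <<'EOF'
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 1
EOF

echo "wrote $(grep -c '= 1$' "$CONF") settings to $CONF"
```

For a one-off test without touching any files, the same keys can be set at runtime with sysctl -w, one key per call.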
Conclusion
That's it! Our custom-built router is now fully operational, complete with VLANs, Wireguard, and properly configured ARP settings. If you have any questions or spot any issues, feel free to reach out!
Happy networking!