Saturday, January 27, 2024

IP Masquerading with nftables

Why this is here

I realize I'm sorta late to the game writing this, but I JUST finished sifting through all the half-a$$'d documentation and examples I could find on the topic, and NOT ONE of them offered a clear explanation of what is going on with nftables and what the parts of a masquerading setup mean.  So here I am, hoping I can do a better job, because I'm gonna forget how this works and will probably come back here later wanting past-me to explain it to future-me.

My other motivation for putting this here is to get feedback in comments in case I don't have some part of this quite right.  If you see something here that is just wrong, please let me know.

The Task

Set up a Linux machine to forward traffic that arrives on one network interface to any host, on any port, in the network attached to another network interface.

My reason for doing this was that I had a new Wifi router that I wanted to set up from my workstation, but my workstation was still connected to the "old" router.  I knew one option would be to create ssh tunnels to forward specific ports to the new router's LAN IP, but that meant opening a terminal and connecting/logging in with the ssh command every time I wanted to open the router's web interface.  It also meant I'd have to set up one or more port forwards to reach/test/reconfigure each device as I moved it over.  That got tedious fast.

The Plan

I had a Raspberry Pi ZeroW that I had used for some forgotten thing a while back, and it was just sitting there in a box, gathering dust, begging to be made into a mini-nat-bridge/router between my current (old) network and the new one.  So, I searched for stuff like "nat" and "masquerade" and "iptables" because I remember it not being too difficult to get that exact thing working in an older version of Linux.  I even had it working once on a coax network with an old 386 machine running RedHat 2 (or something like that), using ipchains.  (That was so my wife and I could share our dialup connection to Mindspring instead of taking turns... boy those weren't the days).  But, after I installed the latest (Bookworm) Raspberry Pi OS, booted up the pi, connected it to the (old) network, and entered the iptables command, when I saw the dreaded: "command not found", I knew this was a new rabbit hole I had just jumped down.

The New Thing

Shifting my search strategy to things like "what replaced iptables" and "how do you set up masquerading now," I found that Netfilter has moved (or has nearly moved) from iptables to nftables.  I knew from experience that any time there's a subtle name change on something in Linux, right behind it I'll find a drastically different way of doing everything.  nftables didn't disappoint.  Instead of pre-established chains into which you inject your own rules, you now create a hierarchical table that contains its own set of chains, each of which contains its own set of rules.  In a nutshell: 

  • Tables belong to a family that designates what kind of traffic they see (ip for IPv4, ip6 for IPv6, inet for both, plus arp, bridge, and netdev): https://wiki.nftables.org/wiki-nftables/index.php/Nftables_families
  • Chains are in a table.  A base chain has a type (filter, route, or nat) and is attached to a single hook, which marks the point in the packet handling lifecycle where it runs.  The type determines which hooks are allowed, e.g. nat chains can use prerouting, input, output, and postrouting, but not forward.
  • Rules are in a chain, and describe something to be done with a packet.

This wiki page expands upon that "bootstrap view" a little: https://wiki.nftables.org/wiki-nftables/index.php/Quick_reference-nftables_in_10_minutes
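A minimal sketch can illustrate that nesting.  It loads a throwaway table in one go by feeding a ruleset to nft -f - (the "-" means read from standard input).  The names demo and input_demo are made up for this example, and the script assumes it runs as root.  The table can be removed afterwards with: nft delete table ip demo

```shell
#!/bin/sh
# Minimal sketch of the table -> chain -> rule nesting (run as root).
# "demo" and "input_demo" are hypothetical names used only for illustration.
load_demo_table() {
    nft -f - <<'EOF'
# Table: family "ip" (IPv4 only), named "demo"
table ip demo {
    # Base chain: type filter, attached to the input hook
    chain input_demo {
        type filter hook input priority 0; policy accept;
        # Rule: count packets arriving on the loopback interface
        iifname "lo" counter
    }
}
EOF
}
```

After running load_demo_table, nft list table ip demo should print the same table, chain and rule structure back, with the counter filled in.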

How Does NAT Masquerading Work?

Before getting into how to set up nftables for NAT with masquerading, it is helpful to remember what is happening, so:

  • First, some other machine sends packets to the "inbound" network interface that are not destined for the machine that "owns" that interface.  This is typically done with a static route on the machine originating those packets, which treats that interface as if it were part of a router.
  • If the machine receiving those "not for me" packets has packet forwarding enabled, it just sends them along through whichever "outbound" network interface it would normally choose if it were originating those packets itself.  However, since it does nothing to fix the source (return) IP address in those packets, the host that receives them next probably has no idea where to send its response packets, and the round trip breaks.
  • Masquerading dynamically changes the source IP to the "outbound" interface's IP.  It changes the source port only when needed to avoid a clash with another connection.  It then keeps track of that mapping so it can receive response packets and re-route them back to the originating source IP and port.
    • Note: SNAT is a variation on the same concept, but it doesn't figure out what the "outbound" IP should be.  A SNAT setup depends on a specified, fixed "outbound" IP, which lets it perform a bit better.  Masquerading suits interfaces whose address can change, such as a DHCP-assigned one.
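Once the setup described below is running, you can watch that bookkeeping by asking the kernel's connection tracker for its translated entries.  This sketch makes a few assumptions:
  • the conntrack-tools command-line utility is installed (the Debian package is named conntrack)
  • the script runs as root
  • list_masqueraded is just a hypothetical wrapper name

```shell
#!/bin/sh
# Sketch: show connections whose source address was rewritten by NAT.
# Assumes the conntrack CLI (conntrack-tools) is installed; run as root.
list_masqueraded() {
    # -L lists tracked connections; --src-nat keeps only source-NATed ones,
    # which includes masqueraded connections
    conntrack -L --src-nat
}
```

Each entry shows two tuples:
  • The first src=/dst= pair is the original direction (a 192.168.5.x source, using the example networks below).
  • The second pair is the reply the tracker expects back, and its dst= is the eth0 address.
That second tuple is the mapping masquerading uses to route replies back to the original host.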

How the nftables Pieces Fit

To eliminate ambiguity between stand-in variables and actual values, this example assumes the following:

  • The machine has two network interfaces (because it needs at least two physically separate connections)
    • "inbound" connection is
      • on network 192.168.5.0/24
      • on an interface named "wlan0"
    • "outbound" connection is
      • on network 192.168.200.0/24
      • on an interface named "eth0"
  • The table needs a name, so we'll name it "nat_all_to_network2"
  • Chains need names too, so they'll be named unimaginatively with a shortened form of the hook they reference (e.g. the prerouting hook's chain will be "pre_rt", and the postrouting hook's chain "post_rt")
The nft command is used to create tables, chains, and rules.
  • Note: nft parses its own command-line options (e.g. -a), so a negative number on the command line (e.g. -100) can be mistaken for an option unless option parsing is ended before it.  You will see where that is needed below.
    • Everything after a double dash ("--") is treated as a literal argument instead of an option, e.g.:  nft -- add ... priority -100.  This is a standard command-line convention honored by nft's own option parser, not something the shell does.  bash's built-in commands follow the same convention:
    • https://www.man7.org/linux/man-pages/man1/bash.1.html

Creating the Table

In nftables, a table contains a set of related chains, so that's the first required part.
  • Syntax: nft [command] [object_type] [family] [table_name]
  • Command: nft add table ip nat_all_to_network2
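Note: The "ip" family means this table only affects IPv4 packets.  "inet" would include IPv6 too.
Note: The chain and rule commands below omit the [family] part.  That works because "ip" is the default family when none is given.  For an "inet" table, the family would have to be repeated in every command.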
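If the table creation is scripted, a minimal sketch like this one can be re-run safely.  nft add table does nothing if the table already exists (nft create table would fail instead), and listing the table afterwards confirms it's really there.  This assumes the script runs as root, and ensure_nat_table is a hypothetical name.

```shell
#!/bin/sh
# Sketch: create the NAT table if needed, then confirm it exists (run as root).
# "add" is a no-op for an existing table, so re-running this is harmless.
ensure_nat_table() {
    nft add table ip nat_all_to_network2 &&
        nft list table ip nat_all_to_network2 >/dev/null
}
```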

Adding Chains

Getting the table to process the right packets needs two chains which hook into prerouting and postrouting phases of packet handling with a chain type of "nat" (which means it only really examines the first packet of a connection to decide what to do with all of the packets that follow).
Note: Other chain types like "filter" keep evaluating every individual packet.
  • Syntax: nft [command] [object_type] [table_name] [chain_name] { type [chain_type] hook [hook_phase] priority [priority_value] ; }
  • Command (prerouting): nft -- add chain nat_all_to_network2 pre_rt { type nat hook prerouting priority -100 \; }
  • Command (postrouting): nft add chain nat_all_to_network2 post_rt { type nat hook postrouting priority 100 \; }
Note: The semicolon must be escaped so the shell/terminal won't treat it as a multiple-command-separator.
Note: The -100 and 100 priority values map to "special" values named dstnat and srcnat, respectively, which is how they appear when the table is viewed with nft list ruleset.
For information overload about this, see: https://wiki.nftables.org/wiki-nftables/index.php/Netfilter_hooks
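In a script, you can avoid both the \; escaping and the -- by quoting each chain specification as a single argument.  The shell then never sees the semicolon.  Because that argument starts with { rather than -, nft's option parser doesn't mistake -100 for an option.  Here is a minimal sketch, assuming the table already exists and the script runs as root (add_nat_chains is a hypothetical name):

```shell
#!/bin/sh
# Sketch: add both NAT base chains (run as root; table must already exist).
# Each chain spec is one quoted argument, so no "\;" or "--" is needed.
add_nat_chains() {
    nft add chain ip nat_all_to_network2 pre_rt \
        '{ type nat hook prerouting priority -100 ; }' &&
    nft add chain ip nat_all_to_network2 post_rt \
        '{ type nat hook postrouting priority 100 ; }'
}
```

Afterwards, nft list table ip nat_all_to_network2 should show the two chains with priority dstnat and priority srcnat.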

Adding Rules

The final part of setting up the table is to specify the circumstances when a packet should be evaluated, and what to do with it if it matches various attributes.
The prerouting chain gets a rule that matches packets coming from the inbound interface.  A rule with a match but no verdict, like this one, doesn't restrict anything by itself.  Matching packets simply continue through the chain, and the masquerade rule below is what actually does the work.
  • nft add rule nat_all_to_network2 pre_rt iifname "wlan0"
The postrouting chain needs a rule that tells it to set up masquerading for packets being routed out the "outbound" interface.
  • nft add rule nat_all_to_network2 post_rt oifname "eth0" masquerade
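nft add rule appends a new copy every time it runs, so re-running the rule commands piles up duplicate rules.  A re-runnable version can flush the chain first.  This minimal sketch assumes the table and chains created above already exist and that it runs as root (apply_masquerade_rule is a hypothetical name).

```shell
#!/bin/sh
# Sketch: (re)apply the masquerade rule without creating duplicates.
# Assumes table nat_all_to_network2 and chain post_rt exist; run as root.
apply_masquerade_rule() {
    # flush removes the rules already in post_rt (the chain itself stays),
    # then the masquerade rule is added back exactly once
    nft flush chain ip nat_all_to_network2 post_rt &&
        nft add rule ip nat_all_to_network2 post_rt oifname "eth0" masquerade
}
```

Flushing is fine here only because post_rt holds nothing but this one rule.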

Blocking Reverse Traffic

When the default /etc/nftables.conf is loaded, it creates a table for the inet family named "filter", containing chains for the input, forward, and output hooks.  To keep hosts on the outbound side from opening connections to hosts on the inbound side, add a couple of rules to the forward chain in this table.

First be sure the return packets from connections forwarded and masqueraded to the outbound side are still forwarded back.
  • nft add rule inet filter forward ct state {established, related} accept
Next drop any packets originating on the outbound side, destined for the inbound side.
  • nft add rule inet filter forward iifname "eth0" oifname "wlan0" drop
Packets that don't match either rule fall through to the chain's default "policy accept".
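The same two rules can be scripted as a minimal sketch, assuming the default inet filter table from /etc/nftables.conf is loaded and the script runs as root (block_reverse_traffic is a hypothetical name).

The set of states is quoted so the shell hands it to nft untouched.  Written without the space, as {established,related}, bash would brace-expand it into two separate words before nft ever saw it.

Order matters too.  add appends, and replies to masqueraded connections arrive on eth0 headed for wlan0, so the accept rule has to come before the drop rule.

```shell
#!/bin/sh
# Sketch: let replies back in, but drop new traffic from eth0 toward wlan0.
# Assumes the default "inet filter" table exists; run as root.
block_reverse_traffic() {
    # Replies to masqueraded connections must be accepted before the drop rule
    nft add rule inet filter forward ct state '{ established, related }' accept &&
        nft add rule inet filter forward iifname "eth0" oifname "wlan0" drop
}
```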

Switching it On

None of this will do anything unless packet forwarding is turned on.
To enable forwarding until the next reboot:
  • sysctl -w net.ipv4.ip_forward=1
To make this permanent across reboots, edit /etc/sysctl.conf and add (or, more likely, just uncomment):
  • net.ipv4.ip_forward = 1
To reload the config immediately (to see if everything is set for a reboot), enter commands:
  • sysctl -p  (to reload)
  • sysctl net.ipv4.ip_forward  (to check resulting setting)
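Those steps can be scripted roughly like this.  It's a minimal sketch with these assumptions:
  • sysctl.conf either already has the setting or carries the commented-out #net.ipv4.ip_forward=1 line that Debian ships
  • sed is GNU sed (for -i)
  • the function names are hypothetical
  • changing the real files needs root
The file paths are parameters, so they can be pointed at a copy for testing.

```shell
#!/bin/sh
# Sketch: turn IPv4 forwarding on now, persist it, and check it.
# Function names are hypothetical; paths default to the real files (root needed).

# Enable forwarding until the next reboot
enable_forwarding_now() {
    sysctl -w net.ipv4.ip_forward=1
}

# Persist the setting: already set -> nothing to do; commented out -> uncomment
# it (forcing the value to 1, using GNU sed -i); missing -> append it
persist_forwarding() {
    conf=${1:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1' "$conf"; then
        return 0
    elif grep -q '^#[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf"; then
        sed -i 's/^#[[:space:]]*\(net\.ipv4\.ip_forward[[:space:]]*=\).*/\1 1/' "$conf"
    else
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}

# Succeed only if forwarding is currently on (reads /proc by default)
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = 1 ]
}
```

As root, a typical sequence would be: enable_forwarding_now && persist_forwarding && sysctl -p && forwarding_enabled && echo "forwarding is on"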

Persisting for the Next Boot

Even after following these directions and confirming that packet forwarding works as expected, the table, chains and rules exist only in memory and will vanish at the next reboot.  

To persist a table, so it is re-activated every time the system starts, it needs to be added to /etc/nftables.conf.  The format of the tables in the config file is approximately the same as the output of the nft list ruleset command.
  • Note: The nft list ruleset command shows special-value priorities by their names (e.g. dstnat instead of -100, and srcnat instead of 100).  In my setup, nftables.conf needed the numeric values, since it didn't seem to map the names back to their numeric equivalents.  (The nft man page does list these standard priority names as valid, so this may depend on the nft version.  The numeric values work either way.)
  • Note: Many tutorials found on the internet match on a particular network interface using iif and oif.  Those match the interface's index number, which nft looks up when the rule is loaded.  So a rule in nftables.conf can fail to load, or stop matching, if the interface doesn't exist yet (or gets a different index) when the file is loaded at boot.  iifname and oifname compare the interface name on each packet instead, so it is better to set up the rules using those attributes to begin with.
    • See comment on Nov 6, 2023 here: https://superuser.com/questions/1815386/internet-connection-activ-only-after-restart-of-nftables
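Putting those notes together, here is a minimal sketch that prints the NAT table in nftables.conf syntax, using numeric priorities and iifname/oifname.  The interface names are parameters that default to wlan0 and eth0.  render_nat_table is a hypothetical helper, and its output just mirrors the commands earlier in this post.  The forward-chain rules from the blocking section still belong in the existing "table inet filter" block of the file.

```shell
#!/bin/sh
# Sketch: print the NAT table in nftables.conf syntax.
# Usage: render_nat_table [inbound_if] [outbound_if]  (defaults: wlan0 eth0)
render_nat_table() {
    in_if=${1:-wlan0}
    out_if=${2:-eth0}
    cat <<EOF
table ip nat_all_to_network2 {
    chain pre_rt {
        type nat hook prerouting priority -100; policy accept;
        # Only matches; takes no action on its own
        iifname "$in_if"
    }
    chain post_rt {
        type nat hook postrouting priority 100; policy accept;
        oifname "$out_if" masquerade
    }
}
EOF
}
```

To install it, as root:
  • render_nat_table > /tmp/nat.nft
  • nft -c -f /tmp/nat.nft  (checks the syntax without applying anything; -c is nft's check-only option)
  • cat /tmp/nat.nft >> /etc/nftables.conf  (appends the block)
If nftables.service isn't already enabled, systemctl enable nftables makes it load that file at boot.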