How Docker and iptables set sail together
linux
Published: 2023-03-15

WORK IN PROGRESS

When you hoist the sails on your docker ship, it sets course for some iptables adventures. If ye be a seasoned sailor who already knows the ins and outs of iptables, ye can weigh anchor and sail on ahead! But if deciphering the following snippet leaves you feeling like a landlubber, don’t fret! This post will chart a course to give you the full rundown and context of all the salty details.

A fresh installation of Docker adds these iptables rules.

# stitched together using: https://orgmode.org/manual/Noweb-Reference-Syntax.html
$ sudo iptables-save -t nat
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
$ sudo iptables-save -t filter
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT

iptables primer

iptables, in a nutshell, is a user-space CLI utility for managing how the Linux kernel handles incoming, forwarded and outgoing traffic. Although I won’t dive into the nitty-gritty details of iptables, this post will provide the bare essentials to understand how Docker utilizes it.

iptables learning resources

Isn’t iptables obsolete?

Yes, the original iptables has been made obsolete by the newer nftables. But there’s an in-between: iptables-nft, which uses the newer nftables kernel API but reuses the legacy packet-matching code of iptables. At the moment, you get the best of both worlds using iptables-nft.

If you have a recent system with iptables installed, it’s most likely iptables-nft. Check by running the following; if the output contains nf_tables, it’s using iptables-nft.

$ iptables -V
iptables v1.8.7 (nf_tables)
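
If you want to do the same check from a script, here is a minimal Python sketch. It only inspects the text printed by iptables -V; the function name iptables_backend is made up for this post, and the "(legacy)" marker is what I’d expect a non-nft 1.8.x build to print, so treat that part as an assumption.

```python
import re


def iptables_backend(version_output):
    """Return "nf_tables", "legacy" or "unknown" for the text printed by `iptables -V`."""
    found = re.search(r"\((nf_tables|legacy)\)", version_output)
    return found.group(1) if found else "unknown"


print(iptables_backend("iptables v1.8.7 (nf_tables)"))  # nf_tables
print(iptables_backend("iptables v1.4.21"))             # unknown (no backend marker)
```

Older releases print no marker at all, which is why the sketch falls back to "unknown" instead of guessing.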

The general consensus on learning: learn the iptables syntax first with iptables-nft (there are plenty of learning materials), then learn nftables, then maybe try out config managers like firewalld. Later, check out BPF / XDP.

Commands

  • iptables -nvL --line-numbers -t [table_name] : CLI view for a table. --line-numbers is important because it tells you the rule ordering. -n shows numeric IPs, and -v helps us see the interface.
  • iptables-save -t [table_name] : File view for a table. Despite its name, this command does not save anything; it simply dumps the table data in the format in which it would be saved to disk. (We’re using this in our examples as it’s better for explaining.)

Semantics

--->PRE------>[ROUTE]--->FWD---------->POST------>
    Conntrack    |       Mangle   ^    Mangle
    Mangle       |       Filter   |    NAT (Src)
    NAT (Dst)    |                |    Conntrack
    (QDisc)      |             [ROUTE]
                 v                |
                 IN Filter       OUT Conntrack
                 |  Conntrack     ^  Mangle
                 |  Mangle        |  NAT (Dst)
                 v                |  Filter

The diagram above shows how tables and chains are related. E.g. we can see that nat:PREROUTING is checked before nat:INPUT.

Tables

  • Tables are an organizational structure for iptables. Tables store chains, which in turn store rules, which in turn hold matches and targets.
  • 5 tables: filter (default), nat, mangle, raw, security
  • Docker makes changes to just nat & filter tables so we’ll be focusing on those.
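
To make the table → chain → rule nesting concrete, here is a minimal Python sketch that turns iptables-save text into nested dictionaries. The function name parse_iptables_save and the shortened sample are my own for illustration; this is not part of any iptables tooling and it only handles the line types that appear in this post.

```python
def parse_iptables_save(text):
    """Parse iptables-save text into {table: {chain: {"policy": ..., "rules": [...]}}}."""
    tables = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "COMMIT":
            continue
        if line.startswith("*"):        # "*nat" opens a table
            current = tables.setdefault(line[1:], {})
        elif line.startswith(":"):      # ":DOCKER - [0:0]" declares a chain
            name, policy, _counters = line[1:].split()
            current[name] = {"policy": None if policy == "-" else policy, "rules": []}
        elif line.startswith("-A "):    # "-A CHAIN ..." appends a rule to a chain
            _flag, chain, *spec = line.split()
            current[chain]["rules"].append(" ".join(spec))
    return tables


SAMPLE = """\
*nat
:PREROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A DOCKER -i docker0 -j RETURN
COMMIT
"""

nat = parse_iptables_save(SAMPLE)["nat"]
print(nat["PREROUTING"]["policy"])  # ACCEPT (built-in chain, has a policy)
print(nat["DOCKER"]["policy"])      # None   (custom chain, shown as "-")
print(nat["DOCKER"]["rules"])       # ['-i docker0 -j RETURN']
```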

Chains

  • Chains are simply lists of rules which are followed in order. E.g. ChainXYZ = [rule1, rule2, rule3, rule4]
  • Chains are per table, i.e. the OUTPUT chain on the nat table is different from the OUTPUT chain on the filter table.
  • Two types of chains: builtin and user defined.
    • Built-in chains represent the netfilter hooks which trigger them.
      • filter table: INPUT, FORWARD, OUTPUT
      • nat table: PREROUTING, INPUT, OUTPUT, POSTROUTING
    • Custom user defined chains represent targets which can be jumped to from the built-in chains.
      • Eg. Docker adds custom chains
      • Custom chains cannot have a default policy; hence you’ll see a - for the chains Docker adds in the iptables-save output.
  • Understanding how chains are traversed is important to make sense of iptables rules, but we won’t be delving into that territory in this post.
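
Without going into the full traversal order across tables, the mechanics inside a single table can be sketched in a few lines of Python. This is a simplified model and not how the kernel implements it: a rule is a (match, target) pair, a custom chain that runs out of rules or hits RETURN goes back to its caller, and a built-in chain falls back to its policy. The DROP rule in the example FORWARD chain is made up for illustration.

```python
def traverse(chains, policies, chain, packet):
    """Return "ACCEPT"/"DROP", or None when a custom chain hands back to its caller."""
    for match, target in chains[chain]:
        if not match(packet):
            continue
        if target in ("ACCEPT", "DROP"):
            return target                 # terminating target: stop here
        if target == "RETURN":
            break                         # stop walking this chain
        verdict = traverse(chains, policies, target, packet)  # jump to a custom chain
        if verdict is not None:
            return verdict
    # End of chain or RETURN: built-in chains use their policy, custom chains have none.
    return policies.get(chain)


def always(packet):
    return True


chains = {
    "FORWARD": [
        (always, "DOCKER-USER"),
        (lambda p: p["out"] == "docker0", "DROP"),  # made-up rule for the demo
    ],
    "DOCKER-USER": [(always, "RETURN")],
}
policies = {"FORWARD": "ACCEPT"}

print(traverse(chains, policies, "FORWARD", {"out": "docker0"}))  # DROP
print(traverse(chains, policies, "FORWARD", {"out": "eth0"}))     # ACCEPT (policy)
```

Note how the jump to DOCKER-USER changes nothing here: the chain returns, and the walk continues with the next FORWARD rule.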

Rule

  • Rule = Match(es) + Target/Action

Match

  • A match specifies a condition on the packet that must be true (or false). If all matches of a rule hold, the rule’s target is applied.
  • Types (Not official, just classifying)
    • Generic: A generic match is a kind of match that is always available, whatever kind of protocol we are working on, or whatever match extensions we have loaded. (eg. -s)
    • Implicit: Implicit matches are implied, taken for granted, automatic. These are protocol specific. (eg. -p tcp --dport)
    • Explicit: Explicit matches are those that have to be specifically loaded with the -m or --match option.

Target

  • These are basically actions to perform when a match occurs in a rule. Specified by the -j flag.
  • 3 types: user-defined (another chain), built-in targets (ACCEPT, DROP, QUEUE, RETURN), target extensions (man iptables-extensions).

Relevant notes

  • [some_number:some_other_number] : You’ll see this in iptables-save output. These are [packets:bytes] counters. Next to a chain declaration they count the packets that hit the chain’s default policy; the -c option additionally shows the counters for each rule. It’s useful to watch how these counters change when you run something. You can use the -Z option to clear the counters.
  • -A : Append a rule to the end of a chain
  • -I : Insert a rule at the top of a chain (or at a given rule number)
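
Since a chain is an ordered list, the difference between -A and -I is just where the new rule lands. A tiny Python sketch with placeholder rule names (append_rule and insert_rule are hypothetical helpers, not iptables APIs):

```python
def append_rule(chain, rule):
    """Like -A CHAIN rule: add the rule at the end."""
    chain.append(rule)


def insert_rule(chain, rule, rulenum=1):
    """Like -I CHAIN [rulenum] rule: place the rule at 1-based position rulenum."""
    chain.insert(rulenum - 1, rule)


chain = ["rule-a", "rule-b"]
append_rule(chain, "rule-c")     # ends up last
insert_rule(chain, "rule-d")     # ends up first
insert_rule(chain, "rule-e", 3)  # ends up third
print(chain)  # ['rule-d', 'rule-a', 'rule-e', 'rule-b', 'rule-c']
```

Because rules are checked in order, a rule inserted on top can shadow everything below it, which is why the line numbers in the CLI view matter.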

nat table

Here’s what the nat table looks like in its default state (Docker not installed yet).

$ sudo iptables-save -t nat
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]

filter table

Here’s what the filter table looks like in its default state (Docker not installed yet).

$ sudo iptables-save -t filter
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]

Docker networking primer

Different container managers (docker, podman, lxd etc.) provide a number of ways to do networking with containers. An interesting one is the bridged networking approach (which this post is based on); it essentially boils down to 3 things.

  1. Creating a veth pair from the host to net namespace-X. Every new container adds a new veth interface, which is removed once the container is stopped. veth is a virtual device that acts as a tunnel between network namespaces: the two ends of the pair are linked, and traffic entering one end comes out of the other.
  2. Adding a bridge for the veth pair to talk through. When you install docker, it automatically creates the docker0 bridge (a virtual bridge interface) for containers to communicate with each other and with the outside world.
  3. Adding iptables rules to access the outside network (this is what we’re focusing on here).

Usually docker/podman/lxd does all this for you, so you won’t have to worry about it.

Docker’s game with iptables

Docker makes changes to 2 tables: nat (to translate addresses for packets going to and from containers) and filter (to decide which forwarded traffic may reach or leave containers, including isolation between networks).

Changes to nat table

$ sudo iptables-save -t nat
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT

:DOCKER - [0:0]
  • Adds the DOCKER chain. As it’s a custom chain, it cannot have a default policy, hence -.

Intended for

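  • Used to handle network address translation (NAT) for Docker containers.
  • As far as I can tell, this is where Docker puts the DNAT rules for published ports (e.g. docker run -p), which is why a fresh installation leaves it almost empty.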

-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
  • Packet coming from any {interface, protocol, source} to a LOCAL address, jump to DOCKER chain.
  • See man iptables-extensions for details on what is LOCAL

Intended for

  • Traffic that arrives at the host from outside and is addressed to one of the host’s own addresses.
  • Sending it through the DOCKER chain gives the port-publishing rules there a chance to redirect it to a container.

-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
  • This rule is a combination of generic matching(-d) and explicit matching(-m).
  • Locally generated packets that are sent to a LOCAL address outside the loopback range jump to the DOCKER chain.

Intended for

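  • Locally generated packets never pass through PREROUTING, so this rule gives processes on the host itself the same treatment when they connect to a published port via one of the host’s addresses.
  • Loopback destinations are excluded here; as far as I know, connections to 127.0.0.1 are served by Docker’s userland proxy (docker-proxy) instead.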
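
The two jump rules into the nat DOCKER chain differ only in the loopback exclusion. Here is a rough Python sketch of their match logic. The host addresses are hypothetical (an example LAN address plus 172.17.0.1, which I’m assuming is the docker0 address), and dst_type_local is a crude stand-in for the kernel’s addrtype lookup, which really consults the local routing table.

```python
import ipaddress

# Hypothetical host addresses: an example LAN address and the assumed docker0 address.
HOST_ADDRESSES = {ipaddress.ip_address(a) for a in ("192.168.1.10", "172.17.0.1")}
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def dst_type_local(dst):
    """Rough stand-in for `-m addrtype --dst-type LOCAL`."""
    addr = ipaddress.ip_address(dst)
    return addr in LOOPBACK or addr in HOST_ADDRESSES


def prerouting_jumps_to_docker(dst):
    # -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
    return dst_type_local(dst)


def output_jumps_to_docker(dst):
    # -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
    return ipaddress.ip_address(dst) not in LOOPBACK and dst_type_local(dst)


for dst in ("192.168.1.10", "127.0.0.1", "8.8.8.8"):
    print(dst, prerouting_jumps_to_docker(dst), output_jumps_to_docker(dst))
```

Only the host’s own non-loopback address jumps from both chains; 127.0.0.1 jumps from PREROUTING only, and 8.8.8.8 from neither.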

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
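  • Packets with a source address in the 172.17.0.0/16 network (any interface, any protocol) that are to be sent out via anything but the docker0 interface get MASQUERADEd.

Intended for

  • This is needed for internet access.
  • Specifies that the source IP of packets going out via anything but docker0 needs to be masqueraded/SNAT’d, i.e. rewritten to the address of the outgoing interface, because the private 172.17.0.0/16 addresses mean nothing outside the host.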
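
What MASQUERADE does to a packet can be sketched in Python like this. It’s a toy model under my own assumptions: packets are plain dictionaries, 192.168.1.10 is a made-up address for the outgoing interface, and the real target also tracks the connection so that replies can be translated back (and may remap source ports), none of which is modelled here.

```python
import ipaddress

DOCKER_SUBNET = ipaddress.ip_network("172.17.0.0/16")


def postrouting(packet, out_iface_addr):
    """Model of: -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE"""
    src = ipaddress.ip_address(packet["src"])
    if src in DOCKER_SUBNET and packet["out"] != "docker0":
        # Replace the container's source address with the outgoing interface's address.
        return {**packet, "src": out_iface_addr}
    return packet


to_internet = {"src": "172.17.0.2", "dst": "1.1.1.1", "out": "eth0"}
to_sibling = {"src": "172.17.0.2", "dst": "172.17.0.3", "out": "docker0"}

print(postrouting(to_internet, "192.168.1.10")["src"])  # 192.168.1.10
print(postrouting(to_sibling, "192.168.1.10")["src"])   # 172.17.0.2 (untouched)
```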

-A DOCKER -i docker0 -j RETURN
  • In the default state, the DOCKER chain isn’t doing much but simply returning to the calling chain. But it’ll return only if the packet is incoming from docker0 interface.

Intended for

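  • This becomes more useful as we add containers and open ports in them: Docker then adds DNAT rules for the published ports below this rule.
  • The RETURN makes packets arriving from docker0 (i.e. from containers on the default bridge) skip those DNAT rules.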
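
To see why the RETURN rule sits on top, here is a Python sketch of the nat DOCKER chain once a port is published. The published mapping (host port 8080 to 172.17.0.2:80) and the shape of the DNAT rule are my assumptions about what Docker typically adds for something like docker run -p 8080:80; they are not taken from the dump above.

```python
def nat_docker_chain(packet, published):
    """published maps (proto, host_port) to (container_ip, container_port)."""
    # -A DOCKER -i docker0 -j RETURN
    if packet["in"] == "docker0":
        return packet
    # Assumed shape of a published-port rule:
    # -A DOCKER ! -i docker0 -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80
    target = published.get((packet["proto"], packet["dport"]))
    if target is None:
        return packet
    container_ip, container_port = target
    return {**packet, "dst": container_ip, "dport": container_port}


published = {("tcp", 8080): ("172.17.0.2", 80)}  # hypothetical published port
outside = {"in": "eth0", "proto": "tcp", "dst": "192.168.1.10", "dport": 8080}
from_bridge = {**outside, "in": "docker0"}

print(nat_docker_chain(outside, published))      # dst rewritten to 172.17.0.2:80
print(nat_docker_chain(from_bridge, published))  # returned unchanged
```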

Changes to filter table

$ sudo iptables-save -t filter
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT

:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
  • Adds custom chains. Since these are custom chains, they cannot have a default policy, hence the -.

Intended for

  • DOCKER : Contains rules that control incoming traffic to Docker containers
  • DOCKER-USER : A place for user-defined iptables rules, evaluated before Docker’s own rules
  • DOCKER-ISOLATION-STAGE-1 : First half of the isolation between Docker networks; picks out traffic that leaves a Docker bridge for a different interface
  • DOCKER-ISOLATION-STAGE-2 : Second half; drops that traffic if it is headed into a Docker bridge

-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
  • Anything that’s forwarded, first jumps to DOCKER-USER chain and then to the DOCKER-ISOLATION-STAGE-1 chain.

Intended for

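  • Making sure that user-defined rules (DOCKER-USER) and Docker’s isolation rules get applied before any of the ACCEPT rules further down the FORWARD chain.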

-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  • Accept packets coming from any {interface, protocol, source} to be sent via the docker0 interface if their connection state is ESTABLISHED or RELATED.
  • To be sent via docker0 in this case means packets going to the Docker containers.
  • See man iptables-extensions for conntrack for details on ESTABLISHED or RELATED

Intended for

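  • Letting replies through: packets that belong to (ESTABLISHED) or are related to (RELATED) a connection that is already being tracked are accepted right away. NEW packets do not match this rule and fall through to the rules below.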
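
The NEW vs ESTABLISHED distinction is easier to see with a toy connection tracker. This Python sketch is a heavy simplification under my own assumptions: it keys flows on (src, sport, dst, dport), treats a flow as ESTABLISHED once a packet has been seen in the reply direction, and ignores RELATED, NAT and timeouts entirely. The class and function names are made up.

```python
class Conntrack:
    """Toy connection tracker: a flow is ESTABLISHED once a reply has been seen."""

    def __init__(self):
        self.seen_reply = {}  # original-direction tuple -> has a reply been seen?

    def state(self, src, sport, dst, dport):
        orig = (src, sport, dst, dport)
        reply = (dst, dport, src, sport)
        if reply in self.seen_reply:   # packet travels in the reply direction
            self.seen_reply[reply] = True
            return "ESTABLISHED"
        if self.seen_reply.get(orig):  # original direction, after a reply was seen
            return "ESTABLISHED"
        self.seen_reply.setdefault(orig, False)
        return "NEW"


def conntrack_rule_accepts(out_iface, state):
    # -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    return out_iface == "docker0" and state in ("RELATED", "ESTABLISHED")


ct = Conntrack()
first = ct.state("172.17.0.2", 40000, "1.1.1.1", 443)     # container opens a connection
answer = ct.state("1.1.1.1", 443, "172.17.0.2", 40000)    # reply heading to the container
stranger = ct.state("203.0.113.5", 5000, "172.17.0.2", 80)  # unsolicited inbound packet

print(first, answer, stranger)                       # NEW ESTABLISHED NEW
print(conntrack_rule_accepts("docker0", answer))     # True
print(conntrack_rule_accepts("docker0", stranger))   # False, falls through to later rules
```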

-A FORWARD -o docker0 -j DOCKER
  • Packets coming from any {interface, protocol, source} to be sent via docker0 interface, jump to DOCKER chain.
  • To be sent via docker0 in this case means, packets going to the docker containers.

Intended for

  • In a fresh installation of Docker, there’s no rule defined in the filter:DOCKER chain, but it will become useful as containers start exposing ports etc.

-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
  • Accepts packets coming from any {protocol, source, docker0 interface} to be sent via anything but docker0 interface (container to outside world)
  • Accepts packets coming from any {protocol, source, docker0 interface} to be sent via docker0 interface (container to container)

Intended for

  • Letting containers on docker0 talk to the outside world (first rule) and to each other (second rule).

-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
  • Packets coming in from the docker0 interface (any protocol, any source) to be sent via anything but docker0 jump to DOCKER-ISOLATION-STAGE-2 for a further check (this is traffic leaving the docker0 network).
  • Everything else returns to the FORWARD chain.

Intended for

  • With only the docker0 network present, DOCKER-ISOLATION-STAGE-1 doesn’t do much. As far as I understand, Docker adds a similar rule for every additional bridge network you create, which is what makes the isolation between networks work.

-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
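  • Drops packets (any protocol, source, incoming interface) that are to be sent via the docker0 interface. Since this chain is only reached from DOCKER-ISOLATION-STAGE-1, that means traffic which left some other Docker bridge and is headed into docker0.
  • Everything else returns to the calling chain.

Intended for

  • With only the docker0 network present, the DROP rule can never match: packets only get here when they are not going out via docker0. It becomes relevant once more Docker networks exist, keeping them from talking to each other.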
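
The two isolation stages only start to make sense with more than one bridge. The Python sketch below assumes that Docker repeats the same pair of rules for every bridge it manages; the second bridge name br-demo is made up, and the function names are mine.

```python
def isolation_stage_2(out_iface, bridges):
    for bridge in bridges:
        # -A DOCKER-ISOLATION-STAGE-2 -o <bridge> -j DROP
        if out_iface == bridge:
            return "DROP"
    return "RETURN"


def isolation_stage_1(in_iface, out_iface, bridges):
    for bridge in bridges:
        # -A DOCKER-ISOLATION-STAGE-1 -i <bridge> ! -o <bridge> -j DOCKER-ISOLATION-STAGE-2
        if in_iface == bridge and out_iface != bridge:
            if isolation_stage_2(out_iface, bridges) == "DROP":
                return "DROP"
    return "RETURN"


bridges = ["docker0", "br-demo"]  # br-demo is a hypothetical second Docker network

print(isolation_stage_1("docker0", "eth0", bridges))     # RETURN: container to outside
print(isolation_stage_1("docker0", "docker0", bridges))  # RETURN: same network
print(isolation_stage_1("docker0", "br-demo", bridges))  # DROP: crossing networks
print(isolation_stage_1("docker0", "eth0", ["docker0"])) # RETURN: single bridge never drops
```

RETURN here just means "no objection from the isolation chains"; the packet still has to get past the remaining FORWARD rules.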

-A DOCKER-USER -j RETURN
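  • This adds a placeholder rule to the DOCKER-USER chain; it simply returns to the FORWARD chain.

Intended for

  • Giving you a place to add your own rules. Since FORWARD jumps to DOCKER-USER first, they are evaluated before Docker’s rules.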

Other curious cases

What happens when ports open

What happens to iptables when you start a container

Nothing

What happens if you delete all the rules and restart docker

  • TODO
  • https://github.com/moby/moby/issues/43896
  • When we reconfigure or reload iptables, all these rules are lost.
  • Current solution: restart Docker. But that is bad, because restarting the Docker service restarts all containers.

How to prevent docker from making changes to iptables

  • You can start the daemon with --iptables=false, or set "iptables": false in /etc/docker/daemon.json (not recommended)
  • Use DOCKER-USER chain
  • You can use lxd like me

Resources