
Setting Up Tailscale Mesh Networking for Distributed GPU Inference

---


By Wingston Sharon | November 2024


When we started building Agentosaurus, the inference problem was obvious from day one: we had compute scattered across locations. An M2 Mac Mini under my desk in Amsterdam. An Oracle Cloud ARM instance in Frankfurt. A local workstation with an Nvidia GPU that we use for heavier batch jobs. None of these were naturally reachable from each other in any reliable way, and I wasn't going to pay for a transit VPC just to let a Celery task call an Ollama endpoint.

Tailscale solved this. It's not a perfect solution (I'll be honest about the gotchas), but it's the reason our inference layer works without a dedicated network engineer.

What Tailscale Actually Is

Tailscale is a WireGuard-based mesh VPN. The key word is mesh: every node talks to every other node directly, peer-to-peer, using WireGuard tunnels. Tailscale handles the key exchange and the NAT traversal that makes WireGuard genuinely annoying to set up manually.

In practice this means: once a node is on your Tailnet, it's reachable from every other node via a stable 100.x.x.x address (Tailscale's CGNAT range) or, with MagicDNS enabled, a stable hostname like gpu-node-1.tail1234.ts.net. No port forwarding. No dynamic DNS hacks. No monthly calls to your ISP.

For inference routing, this is gold. Our Celery workers in Django can hit http://mac-mini-1.tail1234.ts.net:11434 and they'll get Ollama, every time, regardless of what ISP the mac mini is currently sitting behind.
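From the worker side, that call is just a plain HTTP POST to the node's MagicDNS name. A minimal stdlib-only sketch (hostname, model, and the helper names are illustrative, not our production code):

```python
# Sketch: calling an Ollama node over the tailnet from a worker.
# Hostname and model are illustrative; swap in your own MagicDNS names.
import json
import urllib.request


def ollama_url(hostname: str, port: int = 11434) -> str:
    """Build the Ollama generate endpoint URL for a tailnet node."""
    return f"http://{hostname}:{port}/api/generate"


def ollama_generate(hostname: str, model: str, prompt: str,
                    timeout: float = 60.0) -> dict:
    """POST a non-streaming generate request and return the parsed JSON."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        ollama_url(hostname),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Nothing here is Tailscale-aware; that's the point. The tailnet hostname resolves and routes like any other address.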

Setting Up the Nodes

Installation is straightforward. On Linux (our Oracle Cloud ARM instances and the GPU workstation):

curl -fsSL https://tailscale.com/install.sh | sh
sudo systemctl enable --now tailscaled
sudo tailscale up --authkey=tskey-auth-XXXX --hostname=oracle-frankfurt-1

We use auth keys generated from the Tailscale admin console, with Reusable enabled and Ephemeral disabled. The --hostname flag matters: it becomes the MagicDNS name, so name your nodes deliberately.

On macOS, it's the App Store install or the Homebrew cask:

brew install --cask tailscale
# Then via the menu bar app: Connect, sign in
# Or via CLI for headless:
sudo tailscale up --authkey=tskey-auth-XXXX --hostname=mac-mini-amsterdam

Verify connectivity after both nodes are up:

tailscale status
# Expected output:
# 100.x.x.x   mac-mini-amsterdam    macos   -
# 100.x.x.y   oracle-frankfurt-1    linux   -

tailscale ping oracle-frankfurt-1
# pong from oracle-frankfurt-1 (100.x.x.y) via DERP(fra) in 12ms
# pong from oracle-frankfurt-1 (100.x.x.y) via 85.x.x.x:12345 in 4ms

That second pong, the direct one, is Tailscale completing its NAT hole-punch. Once you see a direct route, latency drops significantly. Between Amsterdam and Frankfurt over a direct Tailscale path, we see ~5-8ms. That's acceptable for inference calls.
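You can also check for direct paths programmatically by parsing `tailscale status --json`. A sketch; the `Peer`, `HostName`, and `CurAddr` field names match the CLI's JSON output as we've seen it, but verify them against your Tailscale version:

```python
# Sketch: which peers have a direct (non-DERP) path, parsed from the
# output of `tailscale status --json`. CurAddr is empty while relayed.
import json


def direct_peers(status_json: str) -> dict:
    """Map peer hostname -> True if a direct UDP path is established."""
    status = json.loads(status_json)
    return {
        peer["HostName"]: bool(peer.get("CurAddr"))
        for peer in status.get("Peer", {}).values()
    }
```

A peer that only shows a DERP relay maps to False; we poll this when debugging why a link is slower than expected.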

ACL Policies

By default, Tailscale gives all nodes full access to all other nodes. For a small team this is fine, but we wanted to be explicit about which nodes can reach the inference endpoints.

Tailscale ACLs live in the admin console under "Access Controls". Here's a simplified version of our policy:

{
  "acls": [
    {
      "action": "accept",
      "src": ["tag:celery-worker"],
      "dst": ["tag:inference-node:11434"]
    },
    {
      "action": "accept",
      "src": ["tag:inference-node"],
      "dst": ["tag:inference-node:11434"]
    },
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["*:*"]
    }
  ],
  "tagOwners": {
    "tag:celery-worker": ["group:admins"],
    "tag:inference-node": ["group:admins"]
  }
}

Tags are assigned at tailscale up time:

# On an inference node
sudo tailscale up --authkey=tskey-auth-XXXX \
  --hostname=mac-mini-amsterdam \
  --advertise-tags=tag:inference-node

# On a Celery worker
sudo tailscale up --authkey=tskey-auth-XXXX \
  --hostname=celery-worker-1 \
  --advertise-tags=tag:celery-worker

This means port 11434 (Ollama) on inference nodes is reachable from Celery workers, but not from the public internet or from other untagged nodes.
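A quick way to confirm the ACL does what you think is a plain TCP reachability check run from both a tagged worker and an untagged node. A stdlib sketch, not our production health check:

```python
# Sketch: verify from a worker that an inference node's Ollama port is
# reachable through the tailnet (i.e. the ACL actually permits it).
import socket


def can_reach(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From a tag:celery-worker node this should return True; from an untagged node it should return False once the policy is active.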

MagicDNS for Stable Hostnames

Enable MagicDNS in the Tailscale admin console under "DNS". Once enabled, each node gets an FQDN like mac-mini-amsterdam.tail1234.ts.net. We use this in our Django settings:

# settings/production.py
INFERENCE_NODES = [
    {
        "hostname": "mac-mini-amsterdam.tail1234.ts.net",
        "port": 11434,
        "models": ["llama3.1:8b", "nomic-embed-text"],
        "priority": 1,
    },
    {
        "hostname": "oracle-frankfurt-1.tail1234.ts.net",
        "port": 11434,
        "models": ["llama3.1:8b"],
        "priority": 2,
    },
]

The inference routing layer is deliberately simple: a Django function that checks which nodes have a given model loaded, then tries them in priority order, falling through on connection failures:

import httpx
from django.conf import settings

def route_inference_request(model: str, payload: dict) -> dict:
    candidates = [
        node for node in settings.INFERENCE_NODES
        if model in node["models"]
    ]
    candidates.sort(key=lambda n: n["priority"])

    for node in candidates:
        url = f"http://{node['hostname']}:{node['port']}/api/generate"
        try:
            response = httpx.post(url, json=payload, timeout=60.0)
            response.raise_for_status()
            return response.json()
        except (httpx.ConnectError, httpx.TimeoutException):
            continue  # Try next node

    raise RuntimeError(f"No available inference node for model: {model}")

Not fancy. No service mesh, no Kubernetes. But it works and it's easy to debug when something breaks.
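The selection step is the part worth unit-testing. Pulled out on its own it needs neither Django nor a live node; the node dicts mirror the INFERENCE_NODES shape above:

```python
# Sketch: the candidate-selection logic isolated for testing. Node
# dicts use the same keys as INFERENCE_NODES ("models", "priority").
def candidate_nodes(nodes: list[dict], model: str) -> list[dict]:
    """Nodes that serve `model`, ordered by ascending priority."""
    return sorted(
        (n for n in nodes if model in n["models"]),
        key=lambda n: n["priority"],
    )
```

Keeping the pure selection logic separate from the HTTP retry loop means the failover order is covered by fast tests, and only the network path needs integration testing.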

Tailscale vs Raw WireGuard

We tested raw WireGuard before settling on Tailscale. Raw WireGuard is slightly faster: kernel WireGuard avoids the userspace wireguard-go data path Tailscale uses on most platforms (the coordination servers are control plane only and never carry your traffic). But the operational cost is real:

  • Key rotation requires touching every node
  • NAT traversal requires either public IPs or hole-punch scripts
  • No built-in ACLs (you use iptables/nftables yourself)
  • No MagicDNS

For a three-person team, the 1-2ms latency difference doesn't justify the ops complexity. If we were running 50 inference nodes and every millisecond mattered, we'd revisit.

Honest Limitations

Free tier node limits. Tailscale's free plan allows 100 devices. We're well under that, but if you're planning a large distributed setup, check the pricing. The paid plans are reasonable.

Subnet routing gotchas. If you want to expose an entire subnet (e.g., a Docker internal network) through a Tailscale node, you use --advertise-routes, and the advertised route then has to be approved in the admin console. This mostly works, but we've had issues where kernel IP forwarding wasn't enabled on the advertising node:

# On the advertising node (subnet is illustrative; 172.17.0.0/16 is
# Docker's default bridge network)
sudo tailscale up --advertise-routes=172.17.0.0/16

# Required for subnet routing to work
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Split-DNS sometimes breaks. When MagicDNS is enabled and you're also on a corporate VPN or another DNS system, Tailscale's DNS resolver can conflict. We've seen this on developer laptops. The fix is usually to set Tailscale DNS to "override local DNS" in the client settings, but that's a footgun if you depend on your local DNS for other services.

Tailscale is a coordination dependency. The WireGuard tunnels themselves stay up if Tailscale's coordination servers go down, but new peers can't be added and ACL changes don't propagate. For a production inference network, this is worth knowing. Tailscale has excellent uptime historically, but it's a third-party dependency.

Despite all of this, Tailscale is the right call for us right now. If you're building distributed AI inference across heterogeneous hardware and you don't want to hire a network engineer, it's the fastest path to a working mesh.


Questions about our inference networking setup? Reach out at hello@agentosaurus.com.


Build This Infrastructure?

We help AI teams build sovereign GPU clouds and autonomous systems. Free 30-minute consultation. Fixed-price projects from €5K.

Schedule Free Consultation
