Why restart the gateway twice after npm upgrade?

Some releases leave the running gateway resolving stale Node modules until a second clean start, which matches upstream guidance on stale global references.

Where should backups live?

Store tarballs outside the npm tree, for example as dated archives kept alongside your notes on OPENCLAW_STATE_DIR.

AI / Automation April 22, 2026

2026-04-22 OpenClaw npm Upgrade, Stale Module References & Gateway Restart on Headless Leased Cloud Mac

MacXCode Engineering Team April 22, 2026 ~17 min read

Operators who run OpenClaw as a global npm install on leased Apple Silicon Macs frequently see a confusing pattern after upgrades: openclaw --version reports the new semver, yet the gateway still loads older modules. Webhooks flap, channels miss heartbeats, and logs reference mixed paths. This 2026-04-22 article documents a stop-first upgrade, the double-restart mitigation that matches community reports of stale Node resolution, and a rollback path that pairs with the gateway upgrade & rollback runbook. It complements onboard & install-daemon (first boot) and environment & secrets (state paths).

Stale Module Signals Worth Logging

Capture evidence before mutating npm globals—your future self debugging at 02:00 will thank you.

Signal	Interpretation	Capture command
Gateway log shows two semver strings	Binary upgraded but worker still importing old tree	Archive `log show` output or launchd stdout/stderr files + `which openclaw`
Intermittent ESM import errors after upgrade	Mixed `node_modules` resolution paths	`npm ls -g openclaw --all` + gateway env dump
First restart healthy, traffic still odd	Second bounce needed to flush module cache	Time-stamped health curls to `127.0.0.1`
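The capture commands in the table can be bundled so the evidence lands in one timestamped directory. This is a minimal sketch: the helper name `capture_evidence` and the file names are illustrative, and it records only what the invoking shell sees, not the environment launchd hands to the gateway.

```shell
# Hypothetical helper: write pre-upgrade evidence into a timestamped directory.
# $1 = parent directory for evidence (keep it outside the npm global tree).
capture_evidence() {
  local out
  out="$1/evidence-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out"
  # Each command is allowed to fail so one missing tool does not abort the capture.
  { command -v openclaw || echo "openclaw not on PATH"; } > "$out/which-openclaw.txt" 2>&1
  { npm ls -g openclaw --all || true; } > "$out/npm-ls.txt" 2>&1
  { npm root -g || true; } > "$out/npm-root.txt" 2>&1
  # Print the directory so it can be pasted into the ticket.
  printf '%s\n' "$out"
}
```

Calling `capture_evidence "$HOME/upgrade-evidence"` before touching npm gives you a baseline to diff against after the upgrade.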

Quantify risk: keep at least 3 prior npm semver values in your change log, maintain 2 healthy gateways across regions when doing risky upgrades, and budget 8–22 minutes of maintenance window for npm + gateway cycles on busy hosts.

Preflight Snapshot & Backup

Record openclaw --version and node -v (target Node 22.16+ or 24.x per upstream guidance).
Tarball ~/.openclaw to a dated path outside npm’s global tree—mirror the backup discipline from gateway rollback.
Export launchd plist paths if you customized them during onboard.
Verify OPENCLAW_STATE_DIR and secrets precedence via launchd env notes.
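The first two preflight items can be scripted roughly as follows. This is a sketch under stated assumptions: `backup_state_dir` and `record_versions` are hypothetical helper names, and the archive naming scheme is only a suggestion.

```shell
# Hypothetical helper: archive a state directory to a dated tarball.
# $1 = directory to back up (e.g. ~/.openclaw), $2 = backup root outside the npm tree.
backup_state_dir() {
  local src="$1" dest_root="$2" stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_root}/openclaw-state-${stamp}.tar.gz"
  mkdir -p "$dest_root"
  # -C keeps archive paths relative to the parent of the state directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  printf '%s\n' "$archive"
}

# Hypothetical helper: print the version facts worth pasting into a change log.
record_versions() {
  local tool
  for tool in openclaw node npm; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
      printf '%s: not found on PATH\n' "$tool"
    fi
  done
}
```

A typical call would be `record_versions` followed by `backup_state_dir "$HOME/.openclaw" "$HOME/backups/openclaw"`; the backup root shown here is a placeholder for whatever path your team uses.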

Stop → Upgrade → Restart (Order Matters)

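Never let npm rewrite globals while the gateway is still running. A live Node process keeps already-loaded modules in memory and can load the rest lazily from a half-replaced tree, which is one way to end up with mixed versions. Stop the service first, then upgrade, then restart, as you would for any Node daemon on macOS.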

openclaw gateway stop && npm install -g openclaw@latest && openclaw gateway start
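The one-liner above can be split into separate steps so that a failure names the stage that broke. This is a minimal sketch, assuming the `openclaw gateway stop` and `openclaw gateway start` subcommands used in this article; the function name `stop_upgrade_start` and its return codes are illustrative.

```shell
# Run stop, upgrade, and start as separate steps with distinct failure codes.
# $1 = npm dist-tag or exact semver to install (defaults to "latest").
stop_upgrade_start() {
  local target="${1:-latest}"
  echo "stopping gateway"
  openclaw gateway stop || { echo "stop failed; not upgrading" >&2; return 1; }
  echo "installing openclaw@${target}"
  npm install -g "openclaw@${target}" || { echo "npm install failed; gateway left stopped" >&2; return 2; }
  echo "starting gateway"
  openclaw gateway start || { echo "start failed" >&2; return 3; }
  # Report the version the CLI now resolves to.
  openclaw --version
}
```

Passing an exact semver instead of relying on `latest` makes the change log entry unambiguous and keeps the same function usable for pinning.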

Doctor first: if duplicate LaunchAgents warnings appear, fix plist sprawl before upgrading—otherwise you may restart the wrong process tree.
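As an independent cross-check for plist sprawl, you can count the LaunchAgent files that mention the gateway. This sketch assumes the agents live in the per-user `~/Library/LaunchAgents` directory and that their XML contains the string "openclaw"; both are assumptions about your install, and the helper names are hypothetical.

```shell
# List plists in a directory whose contents mention a pattern (case-insensitive).
# $1 = directory to scan, $2 = pattern to look for.
list_agent_plists() {
  local dir="$1" pattern="$2" f
  for f in "$dir"/*.plist; do
    [ -f "$f" ] || continue
    grep -qi -- "$pattern" "$f" && printf '%s\n' "$f"
  done
  return 0
}

# Succeeds only when at most one matching plist exists.
check_single_agent() {
  local count
  count="$(list_agent_plists "$1" "$2" | wc -l | tr -d ' ')"
  if [ "$count" -gt 1 ]; then
    echo "found $count plists matching '$2' in $1; clean up before upgrading" >&2
    return 1
  fi
}
```

For example, `check_single_agent "$HOME/Library/LaunchAgents" openclaw` returns non-zero when more than one candidate agent is present.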

Double Gateway Restart for Stale Modules

When health checks still show mixed versions, perform a second clean restart: stop, wait for sockets to drain, start, re-run local probes. This mirrors mitigation patterns discussed publicly for npm global upgrades where Node retains stale resolution until the second cold start. Between restarts, avoid flipping nginx upstreams prematurely—follow health probes before exposing HK / JP / KR / SG / US traffic again.
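The second bounce can be sketched as a stop, a bounded wait for the listener to disappear, and a start. The gateway port is not stated in this article, so it is passed in as an argument; `wait_until`, `port_is_free`, and `second_bounce` are hypothetical helper names, and the 90-second timeout is simply taken from the upper end of the wait suggested for busy hosts.

```shell
# Poll a command until it succeeds or a deadline passes.
# $1 = timeout in seconds, $2 = poll interval in seconds, rest = command to run.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}

# True when nothing is listening on the given TCP port.
port_is_free() {
  ! lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Second clean restart: stop, wait for the listener to go away, start.
# $1 = gateway port (supply your own; no default is assumed).
second_bounce() {
  local port="$1"
  openclaw gateway stop
  wait_until 90 5 port_is_free "$port" || { echo "port $port still busy" >&2; return 1; }
  openclaw gateway start
}
```

Refusing to start while the port is still held is deliberate: it surfaces a surviving worker instead of layering a new process on top of it.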

What to Watch Between Restarts

During the quiet window, capture listening ports (lsof -nP -iTCP -sTCP:LISTEN filtered to your gateway port), confirm no orphaned node children remain, and verify disk inode pressure on the volume holding the global npm prefix. Teams that skip this verification often “succeed” on the first bounce yet still serve mixed module graphs because a long-lived worker survived SIGTERM. Document timestamps: we recommend at least 45–90 seconds between stop and start on busy hosts, and 120 seconds when antivirus or file-indexing daemons compete for the same paths—common on shared build machines.
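The three quiet-window checks can be gathered into one timestamped report. This is a sketch, assuming you pass the gateway port and the directory that holds the global npm prefix; the function name `quiet_window_report` is hypothetical.

```shell
# Print listeners, surviving node processes, and inode usage in one report.
# $1 = gateway port, $2 = directory on the volume holding the global npm prefix.
quiet_window_report() {
  local port="$1" prefix_dir="$2"
  echo "== $(date -u +%Y-%m-%dT%H:%M:%SZ) listeners on port ${port} =="
  lsof -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || echo "none"
  echo "== node processes still alive =="
  pgrep -fl node 2>/dev/null || echo "none"
  echo "== inode usage for ${prefix_dir} =="
  df -i "$prefix_dir"
}
```

Running `quiet_window_report <port> "$(npm root -g)"` between stop and start, and attaching the output to the ticket, documents whether a long-lived worker survived the stop.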

Correlate application logs with system load: if load average stays above 4× core count while the gateway restarts, defer ingress reopening until CPU settles; otherwise health checks may flake for reasons unrelated to npm. For chat bridges, send a synthetic “noop” message through your staging webhook after each restart to prove end-to-end delivery before promoting the route to production DNS.
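The load threshold above is simple arithmetic: compare the 1-minute load average with four times the core count. A minimal sketch, with hypothetical helper names, that parses `uptime` output and uses `getconf` for the core count:

```shell
# True when the load average ($1) is at or below 4x the core count ($2).
load_within_limit() {
  awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 <= 4 * cores) }'
}

# 1-minute load average parsed from uptime (handles macOS and Linux wording).
current_load() {
  uptime | sed -E 's/.*load averages?: *([0-9]+[.,][0-9]+).*/\1/' | tr ',' '.'
}

# Number of online cores.
core_count() {
  getconf _NPROCESSORS_ONLN
}
```

A gate before reopening ingress could then read `load_within_limit "$(current_load)" "$(core_count)" || echo "defer ingress reopening"`. On an 8-core host the limit works out to a load average of 32.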

Finally, snapshot the npm global root (npm root -g) and the resolved binary (which openclaw) into your ticket—when semver diverges, comparing those two paths catches symlink drift early. Operators who lease multiple regions should repeat the same checklist per host; divergent global prefixes between JP and US nodes are a frequent source of “works in one DC only” reports after upgrades.
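The path comparison can be automated by resolving the binary's symlink chain and checking that it ends inside the global root. This is a sketch with hypothetical helper names; it relies only on `readlink` and `pwd -P`, which behave the same on macOS and Linux.

```shell
# Follow a symlink chain and print the physical path of the final target.
resolve_link() {
  local path="$1" target
  while [ -L "$path" ]; do
    target="$(readlink "$path")"
    case "$target" in
      /*) path="$target" ;;
      *)  path="$(dirname "$path")/$target" ;;
    esac
  done
  # Normalise the directory part so ../ segments and linked parents disappear.
  printf '%s/%s\n' "$(cd "$(dirname "$path")" && pwd -P)" "$(basename "$path")"
}

# Succeeds when the resolved binary ($1) lives under the npm global root ($2).
binary_matches_root() {
  local resolved root
  resolved="$(resolve_link "$1")"
  root="$(cd "$2" && pwd -P)"
  case "$resolved" in
    "$root"/*) return 0 ;;
    *) echo "drift: $resolved is not under $root" >&2; return 1 ;;
  esac
}
```

On each host, `binary_matches_root "$(which openclaw)" "$(npm root -g)"` flags the case where `PATH` picks up a binary from a different prefix than the one npm just upgraded.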

Health, Logs, and Ingress Re-enable

After the second bounce, validate:

Local curl against the gateway admin/health endpoints with expected JSON fields.
Structured log continuity per production logging guidelines.
Ingress TLS and webhook paths via nginx reverse proxy configuration.
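For the local curl check, a timestamped probe makes it easy to line results up with gateway logs. This is a sketch: the function name `probe` is hypothetical, and the health URL is left as an argument because this article does not state the gateway's port or endpoint path.

```shell
# Timestamped probe: prints "<UTC time> <HTTP status> <url>".
# A status of 000 means the connection failed or timed out.
probe() {
  local url="$1" status
  status="$(curl -sS -o /dev/null -m 5 -w '%{http_code}' "$url" 2>/dev/null)" || status="000"
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$url"
}
```

Appending the output of repeated calls to a file gives a simple timeline for the post-change window. Checking the expected JSON fields still needs a follow-up request that inspects the body.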

Extend validation with negative tests: temporarily break a downstream token on purpose in a sandbox agent to ensure error surfaces match the new build’s string templates—mixed semvers often show up first as subtly different error copy, not as explicit version banners. Keep at least five baseline metrics (p50/p95 latency, error rate, queue depth, CPU, RSS) for 15 minutes post-change so you can compare against pre-upgrade charts; if you lack history, seed dashboards before touching npm.

Rollback When Semver Won’t Converge

If dual restarts fail, reinstall the last known good semver with npm, restore the tarball of ~/.openclaw, and replay the gateway start sequence. Document the incident with both npm and OpenClaw versions for support threads. For mesh access during recovery, see Tailscale mesh.

Scenario	Rollback focus	Expected time
Patch-level npm regression	Pin previous patch with npm	6–12 min
Config drift after upgrade	Restore tarball + validate JSON schema	12–25 min
Region-wide incident	Fail over to second MacXCode node	Depends on DNS/TTL
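The first two rollback scenarios can be combined into one sequence. This is a sketch under stated assumptions: `rollback_openclaw` is a hypothetical helper, and it expects a tarball whose top-level entry is the state directory itself (as produced by `tar -C <parent> <dirname>`).

```shell
# Pin a known good version, restore state from a tarball, and restart.
# $1 = last known good semver, $2 = backup tarball, $3 = state directory to restore.
rollback_openclaw() {
  local version="$1" archive="$2" state_dir="$3"
  [ -f "$archive" ] || { echo "missing archive: $archive" >&2; return 1; }

  openclaw gateway stop || true          # the gateway may already be down
  npm install -g "openclaw@${version}" || return 2

  # Keep the failed state for the incident write-up instead of overwriting it.
  if [ -d "$state_dir" ]; then
    mv "$state_dir" "${state_dir}.failed-$(date +%Y%m%d-%H%M%S)" || return 3
  fi
  tar -xzf "$archive" -C "$(dirname "$state_dir")" || return 3

  openclaw gateway start || return 4
  echo "restored: npm $(npm --version), openclaw $(openclaw --version)"
}
```

The final line prints both versions so they can be pasted straight into the incident notes.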

Channel-level issues after upgrades may overlap with sub-agent & channel troubleshooting; skills packaging differs—see Skills & ClawHub if your upgrade included skill migrations.

FAQ: npm + Gateway on Cloud Mac

Question	Answer
Is pnpm supported?	If your org standardizes pnpm, mirror the same stop-first discipline; keep global prefixes obvious in `PATH`.
Should I dockerize instead?	Compare trade-offs in Docker vs native npm—both can work, but this article targets bare-metal leases.
Where do I monitor ongoing health?	Use help links for support channels and keep synthetic probes per production guide.

Why Mac mini M4 is the Steady-State Host for OpenClaw npm Upgrades

Mac mini M4 bare-metal nodes give deterministic cold-start times for Node gateways—important when you intentionally bounce services twice. Compared to oversubscribed VMs, you spend fewer cycles chasing phantom “CPU steal” symptoms that look like stale modules. MacXCode’s SSH-first access across HK · JP · KR · SG · US, optional 1–2 TB NVMe for workspaces, and predictable networking simplify running duplicated gateways for blue/green upgrades. When npm churn accelerates, add capacity from pricing instead of stacking experiments on one tired host.

