Infra that fixes itself, thanks to coding agents — Mahmoud Abdelwahab, Railway

·18:08

When production metrics start flashing red, the hard part isn’t noticing; it’s triaging, gathering context, and getting a safe fix out fast. This talk walks through Railway Autofix, a drop-in template that monitors your Railway project and opens GitHub PRs with proposed fixes when it detects issues.

Key takeaways

  • How a scheduled workflow fetches project architecture plus CPU/memory and HTTP metrics, then flags services that exceed defined thresholds
  • Why the speaker prefers analyzing a slice of time over alert-only triggers to reduce noise
  • How durable execution (via Inngest) is used to orchestrate multi-step workflows with retries and cached step results
  • How OpenCode runs as a headless server to clone a repo, follow a plan, implement changes, and open a PR for review
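
To make the first two takeaways concrete, here is a minimal TypeScript sketch of flagging services over a slice of time rather than on a single alert. It is not the Railway Autofix implementation. The sample shape, the threshold values, and the names `MetricSample`, `Thresholds`, `breachedMetrics`, and `flagServices` are all assumptions made for illustration. The idea is that a service is flagged only when a metric stays above its threshold for a minimum fraction of the samples in the window, so one short spike does not trigger a fix.

```typescript
// Hypothetical shapes: the talk does not specify the metric schema.
interface MetricSample {
  timestamp: number;     // Unix epoch milliseconds
  cpuPercent: number;    // 0..100
  memoryPercent: number; // 0..100
  httpErrorRate: number; // fraction of failed requests, 0..1
}

type MetricName = "cpuPercent" | "memoryPercent" | "httpErrorRate";

// Assumed configuration: one limit per metric, plus the fraction of
// samples in the window that must exceed the limit before flagging.
interface Thresholds {
  cpuPercent: number;
  memoryPercent: number;
  httpErrorRate: number;
  minBreachFraction: number; // e.g. 0.5 means "over the limit half the time"
}

const METRICS: MetricName[] = ["cpuPercent", "memoryPercent", "httpErrorRate"];

// Returns the metrics that stayed above their limit for at least
// minBreachFraction of the samples in the window.
function breachedMetrics(samples: MetricSample[], t: Thresholds): MetricName[] {
  if (samples.length === 0) return [];
  return METRICS.filter((name) => {
    const over = samples.filter((s) => s[name] > t[name]).length;
    return over / samples.length >= t.minBreachFraction;
  });
}

// Maps each service name to its breached metrics, omitting healthy services.
function flagServices(
  windowByService: Record<string, MetricSample[]>,
  t: Thresholds
): Record<string, MetricName[]> {
  const flagged: Record<string, MetricName[]> = {};
  for (const service of Object.keys(windowByService)) {
    const breached = breachedMetrics(windowByService[service], t);
    if (breached.length > 0) flagged[service] = breached;
  }
  return flagged;
}
```

With `minBreachFraction` set to 0.5, a service whose CPU is above the limit in three of four samples is flagged, while a service with a single spike in four samples is not. In a workflow like the one described, the flagged services and their breached metrics would be the context handed to the coding agent.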