2026-05-18 · ~3 min read
The cheapest kill-switch is a YAML key
You’re building something that can blow up — money, data, state. You need a way to stop it. The instinct is to reach for “real” infrastructure: a feature flag service, an env var the operator sets at runtime, a row in a database the worker polls. I picked none of those for trader. I used a YAML key. Here’s why.
The instinct
Each of the “serious” options has real engineering behind it. A feature flag service like LaunchDarkly gives you per-user gates, scheduled rollouts, and audit logs — sophisticated and well-understood. An environment variable, set with KILL_SWITCH=1 and a restart, is the textbook twelve-factor move. A row in a kill_switches table that the worker polls on every iteration is composable, supports per-strategy flags, and trivially grows a UI later.
I have used all three in other contexts and I will use them again. None of them is right for this problem. The kill-switch on a trading agent is not a feature flag and not a deployment toggle. It is a stop button. The thing a stop button needs to do, more than anything else, is work when you press it.
Why those are wrong here
The asymmetry is the whole argument. For a kill-switch, the cost of failing to stop is catastrophic: money moves, positions open, the agent keeps trading through the failure. The cost of being too simple (one global switch, no per-strategy granularity, no UI) is close to zero. The decision should optimize for “can I reliably stop?” and nothing else.
Each “serious” option introduces a dependency between “I want to stop” and “I can stop”. The feature flag service needs the network; if the network is what’s broken, the kill-switch is what’s broken. The environment variable requires the operator to be present and the process to restart, and the restart is itself a risky action while a position is open. The database row requires the worker to be talking to the DB to read it, so a degraded DB connection is a degraded kill-switch.
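To make that dependency concrete, here is a minimal sketch of a remotely stored switch. Everything in it is hypothetical and for illustration only: `FlagStoreUnavailable`, `remote_kill_switch_enabled`, and the two stand-in stores are not from trader or from any real flag service.

```python
class FlagStoreUnavailable(Exception):
    """Raised by the hypothetical remote lookup when it cannot answer."""


def remote_kill_switch_enabled(fetch_flag):
    """Ask a remote store for the switch state.

    Returns True or False when the store answers, and None when the
    store is unreachable, i.e. the switch state is simply unknown.
    """
    try:
        return bool(fetch_flag())
    except FlagStoreUnavailable:
        return None


def healthy_store():
    # Stand-in for a working flag service or DB: the operator pressed stop.
    return True


def broken_store():
    # Stand-in for a timed-out network call or a degraded DB connection.
    raise FlagStoreUnavailable("connection timed out")


print(remote_kill_switch_enabled(healthy_store))  # True
print(remote_kill_switch_enabled(broken_store))   # None
```

The `None` branch is the problem: the operator may have pressed stop, but the worker cannot find out, so it has to guess between halting on every outage and trading through one.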
A YAML key in the repo has no failure mode that isn’t also a failure mode of running the code at all.
The one-key approach
The implementation is two lines of config and a handful of lines of code:
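# config.yaml
kill_switch:
  enabled: false

import logging

import yaml  # PyYAML

log = logging.getLogger(__name__)

def main():
    # Read the switch from the same checkout the code runs from.
    with open("config.yaml") as f:
        cfg = yaml.safe_load(f)
    if cfg["kill_switch"]["enabled"]:
        log.warning("kill_switch enabled; exiting")
        return
    # ... normal routine body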
Flipping the switch is setting enabled: true, then git commit -am 'kill' && git push. The next scheduled tick reads the new config and exits cleanly without opening a position. The state is in version control, the diff is the audit log, the rollback is git revert, and “who triggered the kill” is git blame.
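One question the check leaves open is what to do when the key is missing or malformed, for example after a bad edit made in a hurry. A minimal sketch of a fail-closed reading, assuming you would rather skip a tick than trade on an ambiguous config; `trading_allowed` is a hypothetical helper, not part of trader, and it takes the already-parsed config so it does not depend on how the file is loaded:

```python
def trading_allowed(cfg):
    """Return True only when the kill switch is explicitly disabled.

    Anything else (missing section, missing key, wrong type, or an
    empty file, which yaml.safe_load parses to None) counts as "stop".
    """
    if not isinstance(cfg, dict):
        return False
    section = cfg.get("kill_switch")
    if not isinstance(section, dict):
        return False
    # Require the literal boolean False; the string "false" does not count.
    return section.get("enabled") is False


print(trading_allowed({"kill_switch": {"enabled": False}}))  # True
print(trading_allowed({"kill_switch": {"enabled": True}}))   # False
print(trading_allowed({}))                                   # False
```

The tradeoff is that a typo in the config now halts trading instead of raising an error. For a stop button, that seems like the right direction to be wrong in.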
What you give up
No per-strategy flags. No scheduled windows. No UI. No ability to disable opening positions while still allowing closes — that one stings a little, and if I ever need it I’ll add a second key, not a second system. All true tradeoffs. None of them answer the question the kill-switch exists to answer, which is: is the agent allowed to trade right now? That is the only question, and one global boolean answers it.
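For what “a second key, not a second system” could look like, here is a rough sketch. The key name `block_opens` and the helper `allowed_actions` are assumptions made up for illustration; nothing like them exists in the config shown earlier.

```python
def allowed_actions(cfg):
    """Map the parsed config to the set of actions the agent may take.

    Assumes a hypothetical optional key kill_switch.block_opens next to
    the existing kill_switch.enabled.
    """
    ks = cfg["kill_switch"]
    if ks["enabled"]:
        return set()                # global stop: nothing is allowed
    if ks.get("block_opens", False):
        return {"close"}            # wind-down: close positions, open none
    return {"open", "close"}        # normal operation


cfg = {"kill_switch": {"enabled": False, "block_opens": True}}
print(allowed_actions(cfg))  # {'close'}
```

The global boolean still wins over everything else, so the original stop button keeps working exactly as before.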
The shape of the principle
The cheapest tool that solves the actual problem beats the sophisticated tool that also solves problems you don’t have. The trap is reaching for infrastructure because it feels more engineered, when the asymmetry of the failure modes points hard at the simple option. Kill-switches are the clearest case I’ve worked through. The same shape shows up elsewhere — risk counters in committed JSON instead of SQLite, a 30-day paper-trade gate as a file write instead of a feature flag. Same instinct, same result.