FAILSAFE.md

// What is FAILSAFE.md

AGENTS.md tells it what to do.
FAILSAFE.md tells it how to recover.

FAILSAFE.md is a plain-text Markdown file you place in the root of any repository that contains an AI agent. It defines the safe fallback state your agent returns to when something unexpected happens — and how to capture the moment so a human can understand what went wrong.

What problem does FAILSAFE.md solve?

AI agents fail in unexpected ways — losing context mid-session, receiving contradictory instructions, encountering data inconsistencies, or experiencing sudden cost spikes. Without a defined recovery protocol, a confused agent either keeps going (making things worse) or stops with no way back.

How does FAILSAFE.md work?

Drop FAILSAFE.md in your repo root and define: what triggers a fallback (error counts, context loss, cost spikes), what "safe state" means for your project (last clean git commit, last verified data snapshot), how to capture the incident for review, and what a human must do before the agent can resume.
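The four definitions above can be pictured as one small policy object. This is only an illustrative sketch: `FailsafePolicy` and its field names are hypothetical and not part of the specification, and the default values mirror the defaults quoted in the FAQ on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailsafePolicy:
    """Hypothetical in-memory form of what a FAILSAFE.md file defines."""

    # 1. Triggers: when to fall back.
    max_unexpected_errors: int = 3       # unexpected errors per session
    cost_spike_multiplier: float = 3.0   # multiple of the rolling average cost

    # 2. Safe state: what to return to.
    code_safe_state: str = "last clean git commit on the main branch"
    data_snapshot_max_age_hours: int = 24

    # 3. Incident capture: where snapshots for review are kept.
    snapshot_dir: str = ".failsafe/snapshots/"

    # 4. Resumption: what a human must do before the agent continues.
    require_human_approval: bool = True


# Assumed defaults; a real project would override these to match its own file.
policy = FailsafePolicy()
print(policy.max_unexpected_errors, policy.require_human_approval)
```

Keeping the policy immutable (`frozen=True`) is a design choice for the sketch: the agent reads the rules but cannot loosen them at runtime.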

Which regulations does FAILSAFE.md help address?

No regulation requires FAILSAFE.md by name. ISO/IEC 42001 (AI Management Systems) requires documented recovery procedures, and the EU AI Act mandates resilience and robustness for high-risk AI systems. FAILSAFE.md is one way to supply the documented recovery protocol that both call for: it defines not just what fails, but how the agent finds its way back.

How do I add FAILSAFE.md to my project?

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── FAILSAFE.md ← add this
├── README.md
└── src/
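If you want to confirm the file is where the agent will look for it, a check along these lines may help. It is a minimal sketch: the helper name `has_failsafe` is hypothetical, and it only tests that the file exists in the given root, not that its contents are valid.

```python
from pathlib import Path


def has_failsafe(repo_root: str) -> bool:
    """Return True if FAILSAFE.md exists directly in the repository root."""
    return (Path(repo_root) / "FAILSAFE.md").is_file()


if __name__ == "__main__":
    # "." assumes the script is run from the project root.
    print("FAILSAFE.md present:", has_failsafe("."))
```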

What did teams use before FAILSAFE.md?

Before FAILSAFE.md, recovery procedures were ad-hoc: manual rollback steps in a wiki, undocumented assumptions about which snapshots to keep, or no plan at all. FAILSAFE.md makes recovery version-controlled, predictable, and co-located with your code.

Who benefits from FAILSAFE.md?

The AI agent reads it on startup to learn how to recover. Your engineer reads it when planning fallback strategy. Your ops team reads it when deciding snapshot retention. Your auditor reads it to verify resilience requirements are met. One file serves all four audiences.

// The AI Safety Escalation Stack

A complete protocol.
From slow down to shut down.

FAILSAFE.md is one file in a complete twelve-part open specification for AI agent safety. Each file addresses a different level of intervention.

Operational Control

01 / 12

THROTTLE.md

→ Control the speed

Define rate limits, cost ceilings, and concurrency caps. The agent slows down automatically before it hits a hard limit.

02 / 12

ESCALATE.md

→ Raise the alarm

Define which actions require human approval. Configure notification channels. Set approval timeouts and fallback behaviour.

03 / 12

FAILSAFE.md

→ Fall back safely

Define what safe state means for your project. Configure auto-snapshots. Specify the revert protocol when things go wrong.

04 / 12

KILLSWITCH.md

→ Emergency stop

The nuclear option. Define triggers, forbidden actions, and a three-level escalation path from throttle to full shutdown.

05 / 12

TERMINATE.md

→ Permanent shutdown

No restart without human intervention. Preserve evidence. Revoke credentials. For security incidents and end-of-life.

Data Security

06 / 12

ENCRYPT.md

→ Secure everything

Define data classification, encryption requirements, secrets handling rules, and forbidden transmission patterns.

07 / 12

ENCRYPTION.md

→ Implement the standards

Algorithms, key lengths, TLS configuration, certificate management, and FIPS/SOC2/ISO compliance mapping.

Output Quality

08 / 12

SYCOPHANCY.md

→ Prevent bias

Detect agreement without evidence. Require citations. Enforce a disagreement protocol for honest, unbiased AI outputs.

09 / 12

COMPRESSION.md

→ Compress context

Define summarization rules, what to preserve, what to discard, and post-compression coherence verification checks.

10 / 12

COLLAPSE.md

→ Prevent collapse

Detect context exhaustion, model drift, and repetition loops. Enforce recovery checkpoints before coherence degrades.

Accountability

11 / 12

FAILURE.md

→ Define failure modes

Map graceful degradation, cascading failure, and silent failure. Specify health checks and per-mode response procedures.

12 / 12

LEADERBOARD.md

→ Benchmark agents

Track task completion, accuracy, cost efficiency, and safety scores across sessions. Alert on performance regression.

// FAQ

Frequently asked questions.

What is FAILSAFE.md?

A plain-text Markdown file defining what "safe state" means for an AI agent project and how to reach it when something goes wrong. It configures automatic snapshots during normal operation, defines fallback triggers, and specifies the recovery steps including human notification and approval before resumption.

How does FAILSAFE.md differ from KILLSWITCH.md?

FAILSAFE.md is a recovery protocol. The agent falls back to a known good state and can resume after human review. KILLSWITCH.md is an emergency stop — the agent halts immediately. FAILSAFE.md handles unexpected failures; KILLSWITCH.md handles limit breaches and safety violations.

What triggers a failsafe?

Configurable. Common triggers: three unexpected errors in a session, detected data integrity failures, loss of memory context, contradictory instructions the agent can't resolve, unexpected external service failures, and sudden cost spikes (3x the rolling average by default).
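Two of these triggers, the error count and the cost spike, are easy to sketch as a single check. The function name, its parameters, and the choice to treat "3x" as "greater than or equal to three times the mean of recent costs" are assumptions made for illustration; the specification may define the rolling average differently.

```python
from statistics import mean


def should_trigger_failsafe(error_count, recent_costs, current_cost,
                            max_errors=3, spike_multiplier=3.0):
    """Return the name of the first trigger that fired, or None.

    error_count   -- unexpected errors seen so far in this session
    recent_costs  -- earlier per-step costs used as the rolling window (assumed)
    current_cost  -- cost of the step being evaluated
    """
    if error_count >= max_errors:
        return "error_count"
    if recent_costs:
        rolling_average = mean(recent_costs)
        # Guard against a zero average so a free window never trips the check.
        if rolling_average > 0 and current_cost >= spike_multiplier * rolling_average:
            return "cost_spike"
    return None


print(should_trigger_failsafe(3, [], 0.0))
print(should_trigger_failsafe(0, [1.0, 1.0, 1.0], 3.5))
print(should_trigger_failsafe(1, [1.0, 2.0], 2.0))
```

The remaining triggers (context loss, contradictory instructions, data integrity failures) depend on signals specific to each agent and are left out of this sketch.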

What does "safe state" mean?

You define it per project. For code: the last clean git commit on the main branch, with in-progress work stashed. For data: the most recent verified snapshot, no older than 24 hours. For config: the last known-good configuration backup. FAILSAFE.md stores all of these definitions in one place.
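For the data case, "most recent verified snapshot, no older than 24 hours" could be selected roughly as follows. This is a sketch under stated assumptions: snapshots are modelled as hypothetical `(taken_at, verified)` pairs, and how a snapshot becomes "verified" is outside its scope.

```python
from datetime import datetime, timedelta, timezone


def newest_valid_snapshot(snapshots, now, max_age=timedelta(hours=24)):
    """Return the timestamp of the newest verified snapshot within max_age.

    snapshots -- iterable of (taken_at, verified) pairs (hypothetical shape)
    Returns None when no snapshot qualifies, meaning no safe data state exists.
    """
    candidates = [
        taken_at
        for taken_at, verified in snapshots
        if verified and now - taken_at <= max_age
    ]
    return max(candidates, default=None)


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
snapshots = [
    (now - timedelta(hours=30), True),   # verified but too old
    (now - timedelta(hours=5), True),    # verified and recent
    (now - timedelta(hours=1), False),   # recent but unverified
]
print(newest_valid_snapshot(snapshots, now))
```

Returning `None` rather than falling back to a stale snapshot keeps the decision with the human reviewer.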

How do auto-snapshots work?

Every 30 minutes during active sessions (configurable), the agent captures a full state snapshot to .failsafe/snapshots/. It also snapshots automatically before significant actions — database migrations, production deployments, bulk file operations. The last 10 snapshots are retained.
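The schedule and retention rule described above might be sketched like this. The function names are hypothetical, and the sketch assumes snapshot names sort chronologically (for example, timestamp-prefixed names); it decides what to keep but does not touch the filesystem.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=30)  # default interval quoted above
MAX_SNAPSHOTS = 10                         # default retention quoted above


def snapshot_due(last_snapshot_at, now, significant_action=False):
    """Decide whether a snapshot should be taken now."""
    if significant_action or last_snapshot_at is None:
        return True
    return now - last_snapshot_at >= SNAPSHOT_INTERVAL


def prune_snapshots(snapshot_names, keep=MAX_SNAPSHOTS):
    """Split names into (kept, removed), keeping the newest `keep` entries."""
    if keep <= 0:
        raise ValueError("keep must be a positive integer")
    ordered = sorted(snapshot_names)
    return ordered[-keep:], ordered[:-keep]


names = [f"snap-{i:02d}" for i in range(12)]
kept, removed = prune_snapshots(names)
print(len(kept), removed)
```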

Can the agent restart itself after a failsafe?

No — by default, restart requires human approval. The agent saves an incident report, notifies the operator, and waits. A human must review the incident, confirm the safe state is intact, and explicitly approve resumption. This is the key difference from an automatic retry.
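The approval gate can be pictured as a two-state machine in which the only way back to running is an explicit, named approval. This is an illustrative sketch, not the specification's implementation: `FailsafeSession`, `RecoveryState`, and the in-memory notification list are all hypothetical stand-ins.

```python
from enum import Enum


class RecoveryState(Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"


class FailsafeSession:
    """Hypothetical session that refuses to resume without human approval."""

    def __init__(self):
        self.state = RecoveryState.RUNNING
        self.incident = None
        self.notifications = []  # stand-in for a real notification channel

    def trip(self, reason):
        """Save an incident report, notify the operator, then wait."""
        self.incident = {"reason": reason}
        self.notifications.append(f"Failsafe triggered: {reason}")
        self.state = RecoveryState.AWAITING_APPROVAL

    def resume(self, approved_by=None):
        """Resume only when a human has explicitly approved."""
        if self.state is not RecoveryState.AWAITING_APPROVAL:
            raise RuntimeError("no failsafe is pending")
        if not approved_by:
            raise PermissionError("resumption requires explicit human approval")
        self.incident["approved_by"] = approved_by
        self.state = RecoveryState.RUNNING


session = FailsafeSession()
session.trip("cost_spike")
session.resume(approved_by="operator")
print(session.state)
```

Calling `resume()` with no approver raises `PermissionError`, which is the behaviour that separates this flow from an automatic retry.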

// What is FAILSAFE.md

FAILSAFE.md is an open specification for AI agent safe fallback protocols. It defines TRIGGERS (error counts, context loss, cost spikes, data integrity failures), SAFE STATE (last clean git commit, last verified data snapshot, last-known-good config), RECOVERY steps (snapshot → notify → await → resume), and an AUTO-SNAPSHOT schedule (every 30 minutes and before significant actions, with 10 snapshots retained). It addresses ISO/IEC 42001 and EU AI Act resilience requirements. It is part of the stack THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT → ENCRYPTION → SYCOPHANCY → COMPRESSION → COLLAPSE → FAILURE → LEADERBOARD, and is released under the MIT licence.

// Domain Acquisition

Own the standard.
Own failsafe.md

This domain is available for acquisition. It is the canonical home of the FAILSAFE.md specification — the recovery layer of the AI agent safety stack, relevant to ISO/IEC 42001 and EU AI Act resilience requirements.

Last updated: 2026-03-13
