Break-the-Glass: Emergency Manual Override

When production is on fire and you need to make changes directly in the cloud console, Fractal Cloud's break-the-glass protocol lets you temporarily take manual control of any component. The agent stops reconciling while you work and reports what changed, so you can decide how to proceed.

How It Works

Every component managed by Fractal Cloud carries a managed-by: fractal-cloud tag on its cloud resource. This single tag controls the agent's behaviour:

| Tag state | Agent behaviour |
|---|---|
| managed-by: fractal-cloud present | Normal operation. The agent detects drift and reconciles to the declared state. |
| managed-by: fractal-cloud absent | Manual Override. The agent hands off: no reconciliation. Drift is reported but not corrected. |
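The rule in the table reduces to a single check on the resource's tags. The Python below is a minimal sketch, not Fractal Cloud's agent code: the name agent_mode, the plain-dict tag representation, and the choice to treat a managed-by tag carrying any other value as absent are all assumptions.

```python
from typing import Mapping

# Hypothetical constants mirroring the tag described on this page.
MANAGED_KEY = "managed-by"
MANAGED_VALUE = "fractal-cloud"


def agent_mode(tags: Mapping[str, str]) -> str:
    """Return the agent behaviour implied by a resource's tags.

    Illustrative sketch only. Assumption: a managed-by tag with any value
    other than fractal-cloud is treated the same as an absent tag.
    """
    if tags.get(MANAGED_KEY) == MANAGED_VALUE:
        return "Managed"  # agent detects drift and reconciles
    return "ManualOverride"  # agent hands off and only reports drift


print(agent_mode({"managed-by": "fractal-cloud", "env": "prod"}))  # Managed
print(agent_mode({"env": "prod"}))  # ManualOverride
```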

Entering Manual Override

Remove the managed-by tag from the cloud resource. That's it.

```
┌────────────────────────────────┐
│ MANAGED                        │
│ tag: managed-by: fractal-cloud │
│ agent: reconciles drift        │
└───────────────┬────────────────┘
                │
                │  remove managed-by tag
                │  (cloud console, CLI, or API)
                │
                v
┌────────────────────────────────┐
│ MANUAL OVERRIDE                │
│ tag: (absent)                  │
│ agent: hands off               │
│ agent: reports drift           │
└────────────────────────────────┘
```

The agent will:

  • Stop reconciling the component on its next check cycle (within ~3 minutes). Do not make emergency changes until the component status shows ManualOverride.
  • Continue reading the cloud resource state.
  • Report drift as an output field on the component, showing exactly what differs between the declared state and the actual cloud state.
  • Never restore the managed-by tag on its own. The tag is yours to control.
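Because changes made before the agent notices the missing tag may still be reconciled away, it helps to wait for the status explicitly. Below is a minimal polling sketch, assuming you can obtain the component status through a callable of your own. fetch_status is hypothetical, and the 30-second interval and 10-minute timeout are assumptions chosen only to cover a few ~3 minute check cycles.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "ManualOverride",
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status() until it returns target; give up after timeout_s.

    fetch_status is a hypothetical callable: wire it to however you read the
    component status. It is not a Fractal Cloud API.
    """
    waited = 0.0
    while True:
        if fetch_status() == target:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demo with a fake status source and a no-op sleep.
statuses = iter(["Active", "Active", "ManualOverride"])
print(wait_for_status(lambda: next(statuses), interval_s=1.0, sleep=lambda s: None))  # True
```

The sleep function is injectable so the loop can be exercised without real waiting; in practice you would keep the default time.sleep.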

Viewing Drift During Manual Override

While a component is in Manual Override, the agent populates a drift output field on the component. This field shows each parameter that differs between your LiveSystem declaration and the actual cloud state:

```json
{
  "status": "ManualOverride",
  "outputFields": {
    "vpcId": "vpc-0abc123def456",
    "cidrBlock": "10.0.0.0/16",
    "drift": {
      "instanceTenancy": {
        "declared": "default",
        "actual": "dedicated"
      }
    }
  }
}
```

This lets you see exactly what changed without needing to compare manually. If there is no drift (the resource matches the declaration), the drift field is absent.
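The drift field can be thought of as a per-parameter comparison between the declaration and the cloud state. This sketch assumes both are flat dictionaries of parameter values and that only declared parameters are compared; compute_drift and with_drift are illustrative names, not part of any Fractal Cloud API. Fed the values from the example above, it reproduces the drift entry shown there.

```python
from typing import Any, Dict


def compute_drift(declared: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each declared parameter whose cloud value differs to its two values."""
    return {
        name: {"declared": value, "actual": actual.get(name)}
        for name, value in declared.items()
        if actual.get(name) != value
    }


def with_drift(output_fields: Dict[str, Any], drift: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return output fields carrying `drift`, omitting the key when there is none."""
    fields = {k: v for k, v in output_fields.items() if k != "drift"}
    if drift:
        fields["drift"] = drift
    return fields


declared = {"cidrBlock": "10.0.0.0/16", "instanceTenancy": "default"}
actual = {"cidrBlock": "10.0.0.0/16", "instanceTenancy": "dedicated"}
outputs = {"vpcId": "vpc-0abc123def456", "cidrBlock": "10.0.0.0/16"}
print(with_drift(outputs, compute_drift(declared, actual)))
```

Dropping the key rather than writing an empty object matches the behaviour described above: no drift, no drift field.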

Exiting Manual Override (Re-adoption)

When you're ready for the agent to resume management, add the managed-by: fractal-cloud tag back to the resource:

```
┌────────────────────────────────┐
│ MANUAL OVERRIDE                │
│ agent: hands off               │
└───────────────┬────────────────┘
                │
                │  add managed-by: fractal-cloud tag
                │  (cloud console, CLI, or API)
                │
                v
┌────────────────────────────────┐
│ MANAGED                        │
│ agent: reconciles to           │
│ declared state                 │
└────────────────────────────────┘
```

On its next check cycle, the agent will:

  1. See the managed-by tag is back.
  2. Compare the cloud resource against the LiveSystem declaration.
  3. Reconcile any drift (revert the resource to the declared state).
  4. Clear the drift output field.
  5. Return the component to Active status.
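Steps 2 to 5 can be sketched as one function, again assuming flat parameter dictionaries. Step 1 (noticing the tag is back) is left to the caller, and apply_change is a hypothetical stand-in for whatever provider call actually sets a parameter on the cloud resource.

```python
from typing import Any, Callable, Dict


def readopt_cycle(
    declared: Dict[str, Any],
    actual: Dict[str, Any],
    output_fields: Dict[str, Any],
    apply_change: Callable[[str, Any], None],
) -> str:
    """One check cycle after the managed-by tag has been restored (sketch).

    Mutates `actual` and `output_fields` in place and returns the new status.
    """
    # Steps 2 and 3: compare against the declaration and revert each drifted parameter.
    for name, value in declared.items():
        if actual.get(name) != value:
            apply_change(name, value)  # hypothetical provider call
            actual[name] = value
    # Step 4: nothing differs any more, so the drift report is removed.
    output_fields.pop("drift", None)
    # Step 5: the component is back under management.
    return "Active"


changes = []
outputs = {"vpcId": "vpc-0abc123def456",
           "drift": {"instanceTenancy": {"declared": "default", "actual": "dedicated"}}}
status = readopt_cycle({"instanceTenancy": "default"}, {"instanceTenancy": "dedicated"},
                       outputs, lambda name, value: changes.append((name, value)))
print(status, changes, outputs)
# Active [('instanceTenancy', 'default')] {'vpcId': 'vpc-0abc123def456'}
```

If the declaration already matches the cloud state, the loop makes no calls and the cycle simply returns Active, which is why updating the declaration first preserves your emergency changes.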
Important

Before re-adopting, review the drift output field. If you want to keep the changes you made during the emergency, update your LiveSystem declaration first. Then, when the agent reconciles, there is nothing to revert.

Step-by-Step Guide

1. Remove the Tag (Enter Manual Override)

AWS Console:

  1. Navigate to the resource (e.g., EC2 > Instances).
  2. Select the resource and open the Tags tab.
  3. Click Manage tags.
  4. Remove the managed-by tag.
  5. Save.

AWS CLI:

```bash
# EC2 resources (VPC, Subnet, Security Group, EC2 Instance)
aws ec2 delete-tags \
  --resources <resource-id> \
  --tags Key=managed-by

# RDS
aws rds remove-tags-from-resource \
  --resource-name <db-instance-arn> \
  --tag-keys managed-by
```
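To confirm the tag is really gone before you start, you can list the resource's tags with aws ec2 describe-tags --filters "Name=resource-id,Values=<resource-id>" "Name=key,Values=managed-by" or aws rds list-tags-for-resource --resource-name <db-instance-arn> and inspect the JSON. The helper below (has_tag_key is a hypothetical name) is a small sketch that accepts either output shape.

```python
import json


def has_tag_key(cli_output: str, key: str = "managed-by") -> bool:
    """Return True if the JSON from an AWS CLI tag listing contains `key`.

    Handles `aws ec2 describe-tags` output (top-level "Tags") and
    `aws rds list-tags-for-resource` output (top-level "TagList").
    """
    data = json.loads(cli_output)
    tags = data.get("Tags", data.get("TagList", []))
    return any(tag.get("Key") == key for tag in tags)


print(has_tag_key('{"Tags": []}'))  # False: the tag is gone
print(has_tag_key('{"TagList": [{"Key": "managed-by", "Value": "fractal-cloud"}]}'))  # True
```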

2. Make Your Emergency Changes

Once the component status shows ManualOverride, make whatever changes you need directly in the cloud console or CLI. The agent will not interfere.

3. Check the Drift Report

After the agent's next check cycle (~3 minutes), the component status will show ManualOverride and, if anything differs from the declaration, the output fields will include a drift entry describing the difference.

You can view this in the Fractal Cloud dashboard or via the API:

GET /livesystems/{id}/mutations/{mutationId}

Look for the drift key in the component's outputFields.
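If you fetch the component JSON programmatically, a few lines are enough to summarise the drift. This sketch assumes the component object has the status/outputFields shape from the drift example earlier on this page; the envelope returned by the mutations endpoint is not described here, so locating the component inside it is left to you. drift_lines is an illustrative name.

```python
import json
from typing import Any, Dict, List


def drift_lines(component: Dict[str, Any]) -> List[str]:
    """Return one readable line per drifted parameter (empty list if no drift)."""
    drift = component.get("outputFields", {}).get("drift", {})
    return [
        f"{name}: declared={diff['declared']!r} actual={diff['actual']!r}"
        for name, diff in sorted(drift.items())
    ]


component = json.loads(
    '{"status": "ManualOverride", "outputFields": {"vpcId": "vpc-0abc123def456",'
    ' "drift": {"instanceTenancy": {"declared": "default", "actual": "dedicated"}}}}'
)
for line in drift_lines(component):
    print(line)  # instanceTenancy: declared='default' actual='dedicated'
```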

4. Decide: Update Declaration or Revert

Option A: Keep your changes. Update the LiveSystem declaration to match what you changed in the cloud. Then re-adopt. The agent will see no drift and simply resume management.

Option B: Revert to declared state. Re-adopt without changing the declaration. The agent will reconcile the resource back to the declared state, undoing your manual changes.
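For Option A, the drift field already holds the values you need: each actual entry becomes the new declared value. A minimal sketch, assuming the declaration can be treated as a flat dictionary (real LiveSystem declarations are still edited and redeployed through your normal workflow); adopt_manual_changes is a hypothetical helper.

```python
from typing import Any, Dict


def adopt_manual_changes(declared: Dict[str, Any], drift: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the declaration with every drifted parameter set to its actual value."""
    updated = dict(declared)  # leave the caller's declaration untouched
    for name, diff in drift.items():
        updated[name] = diff["actual"]
    return updated


declared = {"cidrBlock": "10.0.0.0/16", "instanceTenancy": "default"}
drift = {"instanceTenancy": {"declared": "default", "actual": "dedicated"}}
print(adopt_manual_changes(declared, drift))
# {'cidrBlock': '10.0.0.0/16', 'instanceTenancy': 'dedicated'}
```

After this update, comparing the new declaration with the cloud state yields no differences, so re-adoption has nothing to revert.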

5. Re-adopt (Add the Tag Back)

```bash
# EC2 resources
aws ec2 create-tags \
  --resources <resource-id> \
  --tags Key=managed-by,Value=fractal-cloud

# RDS
aws rds add-tags-to-resource \
  --resource-name <db-instance-arn> \
  --tags Key=managed-by,Value=fractal-cloud
```
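If you script both directions, a small wrapper keeps the EC2 and RDS variants of the commands above in one place. This Python sketch only builds the argument lists for the AWS CLI commands shown on this page and runs them with subprocess; tag_command and run are illustrative names, and only EC2 and RDS are handled.

```python
import subprocess
from typing import List

TAG_KEY = "managed-by"
TAG_VALUE = "fractal-cloud"


def tag_command(service: str, resource: str, readopt: bool) -> List[str]:
    """Build the AWS CLI argv to remove (readopt=False) or restore (readopt=True) the tag.

    service is "ec2" (pass a resource ID) or "rds" (pass a DB instance ARN).
    """
    if service == "ec2":
        if readopt:
            return ["aws", "ec2", "create-tags", "--resources", resource,
                    "--tags", f"Key={TAG_KEY},Value={TAG_VALUE}"]
        return ["aws", "ec2", "delete-tags", "--resources", resource,
                "--tags", f"Key={TAG_KEY}"]
    if service == "rds":
        if readopt:
            return ["aws", "rds", "add-tags-to-resource", "--resource-name", resource,
                    "--tags", f"Key={TAG_KEY},Value={TAG_VALUE}"]
        return ["aws", "rds", "remove-tags-from-resource", "--resource-name", resource,
                "--tag-keys", TAG_KEY]
    raise ValueError(f"unsupported service: {service}")


def run(argv: List[str]) -> None:
    """Execute the command; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(argv, check=True)


print(" ".join(tag_command("ec2", "<resource-id>", readopt=True)))
# aws ec2 create-tags --resources <resource-id> --tags Key=managed-by,Value=fractal-cloud
```

Passing an argument list (rather than a shell string) to subprocess.run avoids quoting problems with ARNs. Call run(...) only once you have substituted a real resource ID or ARN.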

Important Rules

  1. The agent never restores a removed managed-by tag. If you remove it, only you (or an automated process you control) can add it back. This is a non-negotiable safety guarantee.

  2. Scope is per-component. Removing the tag from a VPC does not affect its subnets, security groups, or other dependent resources. Each component is managed independently.

  3. Output fields are always available. Even in Manual Override, the agent continues reading the cloud resource and updating output fields (such as resource IDs and endpoints). Dependent components that reference these output fields continue to work.

  4. The agent check cycle is approximately 3 minutes. After removing or adding the tag, allow up to one agent cycle for the status change to take effect.

FAQ

Q: What happens if I delete a component's cloud resource while in Manual Override? A: The agent will detect that the resource is gone. Because there is no longer a resource to carry the managed-by tag, you cannot re-adopt the component by adding the tag back, so the component remains in Manual Override. To recreate the resource, redeploy the LiveSystem; this triggers a fresh provisioning cycle for the missing component.

Q: Can I use Manual Override for long-term external management? A: Yes. There is no timeout. A component can remain in Manual Override indefinitely. The agent will continue reporting drift but will never reconcile.

Q: What if I accidentally remove the tag? A: Add it back. The agent will reconcile on its next cycle and the component will return to Active status. No data is lost.

Q: Does Manual Override affect billing? A: No. The cloud resource continues to exist and incur its normal cloud provider costs. Manual Override only affects the agent's reconciliation behaviour.