Break-the-Glass: Emergency Manual Override
When production is on fire and you need to make changes directly in the cloud console, Fractal Cloud's break-the-glass protocol lets you temporarily take manual control of any component. The agent stops reconciling while you work, and reports what changed so you can decide how to proceed.
How It Works
Every component managed by Fractal Cloud carries a managed-by: fractal-cloud tag on the cloud resource. Two tags drive the protocol:
| Tag state | Agent behaviour |
|---|---|
| managed-by: fractal-cloud present | Normal operation. Agent detects drift and reconciles to the declared state. |
| managed-by: fractal-cloud absent | Manual Override. Agent hands off. No reconciliation. Drift is reported but not corrected. |
| managed-by: fractal-cloud + fractal-cloud: reconcile both present | Reconcile Requested. Agent reconciles to the declaration on its next cycle, then removes the reconcile tag itself. |
The fractal-cloud: reconcile tag is meaningless on its own — agents only act on it when paired with managed-by. This two-tag handshake prevents accidental re-adoption when an operator restores the management tag without intending to trigger reconciliation.
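The table can be read as a small decision function. The sketch below is illustrative only: `TagMap`, `AgentMode`, and `classify` are hypothetical names, not part of any Fractal Cloud SDK, and the function looks at the current tags alone, so it does not model the agent's history with the resource.

```typescript
// Hypothetical helper: maps a resource's tags to the agent behaviour in the table.
type TagMap = { [key: string]: string };
type AgentMode = "Managed" | "ManualOverride" | "ReconcileRequested";

function classify(tags: TagMap): AgentMode {
  const managed = tags["managed-by"] === "fractal-cloud";
  const reconcile = tags["fractal-cloud"] === "reconcile";
  if (!managed) {
    // The reconcile tag is ignored unless managed-by is also present.
    return "ManualOverride";
  }
  return reconcile ? "ReconcileRequested" : "Managed";
}
```

For example, `classify({ "fractal-cloud": "reconcile" })` returns `"ManualOverride"`, because the reconcile tag has no effect without managed-by.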
Entering Manual Override
Remove the managed-by tag from the cloud resource. That's it.
┌────────────────────────────────┐
│  MANAGED                       │
│  tag: managed-by: fractal-cloud│
│  agent: reconciles drift       │
└──────────┬─────────────────────┘
           │
  remove managed-by tag
  (cloud console, CLI, or API)
           │
           v
┌──────────────────────────────┐
│  MANUAL OVERRIDE             │
│  tag: (absent)               │
│  agent: hands off            │
│  agent: reports drift        │
└──────────────────────────────┘
The agent will:
- Stop reconciling the component on its next check cycle (within ~3 minutes). Do not make emergency changes until the status shows ManualOverride.
- Continue reading the cloud resource state.
- Report drift as an output field on the component, showing exactly what differs between the declared state and the actual cloud state.
- Never restore the managed-by tag on its own. The tag is yours to control.
Viewing Drift During Manual Override
While a component is in Manual Override, the agent populates a drift output field on the component. This field shows each parameter that differs between your LiveSystem declaration and the actual cloud state:
{
"status": "ManualOverride",
"outputFields": {
"vpcId": "vpc-0abc123def456",
"cidrBlock": "10.0.0.0/16",
"drift": {
"instanceTenancy": {
"declared": "default",
"actual": "dedicated"
}
}
}
}
This lets you see exactly what changed without needing to compare manually. If there is no drift (the resource matches the declaration), the drift field is absent.
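As a rough illustration of how such a report could be derived, the sketch below compares two flat parameter maps and returns only the entries that differ. `ParamMap`, `DriftMap`, and `computeDrift` are hypothetical names, and treating parameters as flat string maps is a simplifying assumption; the agent's actual comparison logic is not documented here.

```typescript
// Hypothetical types; only the declared/actual shape comes from the example above.
type ParamMap = { [name: string]: string };
type DriftMap = { [name: string]: { declared: string; actual: string } };

// Returns the drift between declaration and cloud state, or undefined when
// they match, mirroring the "drift field is absent" behaviour.
// Assumption: only parameters present in the declaration are compared.
function computeDrift(declared: ParamMap, actual: ParamMap): DriftMap | undefined {
  const drift: DriftMap = {};
  for (const name of Object.keys(declared)) {
    if (declared[name] !== actual[name]) {
      drift[name] = { declared: declared[name], actual: actual[name] };
    }
  }
  return Object.keys(drift).length > 0 ? drift : undefined;
}
```

Called with a declaration of `instanceTenancy: "default"` and a cloud state of `instanceTenancy: "dedicated"`, it produces the same `drift` object as in the JSON above.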
Exiting Manual Override (Re-adoption)
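When you're ready for the agent to resume management, add the managed-by: fractal-cloud tag back to the resource together with the fractal-cloud: reconcile tag. Restoring managed-by on its own does not trigger reconciliation: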
┌──────────────────────────────┐
│  MANUAL OVERRIDE             │
│  agent: hands off            │
└──────────┬───────────────────┘
           │
  add managed-by: fractal-cloud and
  fractal-cloud: reconcile tags
  (cloud console, CLI, or API)
           │
           v
┌──────────────────────────────┐
│  MANAGED                     │
│  agent: reconciles to        │
│         declared state       │
└──────────────────────────────┘
On its next check cycle, the agent will:
- See the managed-by tag is back.
- Compare the cloud resource against the LiveSystem declaration.
- Reconcile any drift (revert the resource to the declared state).
- Clear the drift output field.
- Return the component to Active status.
Before re-adopting, review the drift output field. If you want to keep the changes you made during the emergency, update your LiveSystem declaration first. Then when the agent reconciles, there will be nothing to revert.
Step-by-Step Guide
1. Remove the Tag (Enter Manual Override)
- AWS
- Azure
- GCP
- OCI
- Hetzner
- VMware vSphere
- OpenShift / k8s
- Aruba
AWS
Console: EC2 → Tags tab → Manage tags → remove managed-by.
CLI:
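# EC2 API resources (VPC, Subnet, Security Group, EC2 instance)
aws ec2 delete-tags --resources <resource-id> --tags Key=managed-by
# ECS and EKS have their own tagging APIs and take a resource ARN
aws ecs untag-resource --resource-arn <resource-arn> --tag-keys managed-by
aws eks untag-resource --resource-arn <resource-arn> --tag-keys managed-by
# RDS
aws rds remove-tags-from-resource \
  --resource-name <db-instance-arn> --tag-keys managed-by
# S3: delete-bucket-tagging removes EVERY tag on the bucket. To drop only
# managed-by, read the tag set with get-bucket-tagging and write it back
# with put-bucket-tagging, leaving managed-by out.
aws s3api delete-bucket-tagging --bucket <bucket-name>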
Azure
Portal: resource → Tags → delete managed-by.
CLI:
az tag update --resource-id <resource-id> \
--operation delete --tags managed-by=fractal-cloud
GCP
Console: resource → Labels → remove managed-by.
CLI:
# Compute Instance
gcloud compute instances remove-labels <name> --labels=managed-by
# CloudRun
gcloud run services update <name> --remove-labels=managed-by
# CloudStorage
gcloud storage buckets update gs://<bucket> --remove-labels=managed-by
GCP uses "labels" instead of "tags" — agents treat them equivalently. VPC, Subnet, and Firewall do not support labels in the GCP Compute API; for those, the protocol's "default-to-import" fallback applies and the agent will treat them as ExternallyManaged whenever it reconciles them. Break-the-glass on these types is implicit (the agent never modifies them).
OCI
Console: resource → Tags → remove managed-by from free-form tags.
CLI (OCI requires you to provide the full tag set; fetch first, then re-set without managed-by):
# Example for a VCN — adapt the resource verb per service
oci network vcn update --vcn-id <ocid> \
--freeform-tags '{ "<other-tags-without-managed-by>": "..." }'
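Because the update replaces the whole free-form tag set, the new set has to be computed from the current one. The following is a minimal sketch of that step, assuming the current tags were already fetched (for instance from the `freeform-tags` field of the resource's `get` output). `FreeformTags` and `freeformTagsWithout` are hypothetical names, and the `team` tag in the usage note is made up.

```typescript
// Hypothetical helper: builds the JSON value for --freeform-tags with one key removed.
type FreeformTags = { [key: string]: string };

function freeformTagsWithout(current: FreeformTags, key: string): string {
  const kept: FreeformTags = {};
  for (const k of Object.keys(current)) {
    if (k !== key) {
      kept[k] = current[k]; // every other tag is carried over unchanged
    }
  }
  return JSON.stringify(kept);
}
```

`freeformTagsWithout({ "managed-by": "fractal-cloud", team: "payments" }, "managed-by")` yields `{"team":"payments"}`, which can be passed as the `--freeform-tags` value.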
Hetzner
Console: Hetzner Cloud Console → resource → Labels → remove managed-by.
CLI (hcloud):
# Server
hcloud server remove-label <id> managed-by
# Network
hcloud network remove-label <id> managed-by
# Load Balancer
hcloud load-balancer remove-label <id> managed-by
remove-label takes the label key only; the value is not needed.
VMware vSphere
vSphere Client: object → Tags & Custom Attributes → Assign Tag → unselect managed-by:fractal-cloud from the fractal-cloud category.
CLI (govc):
govc tags.detach -c fractal-cloud "managed-by:fractal-cloud" /<DC>/vm/<vm-name>
vSphere tags are name-only, so the protocol's key:value semantics are encoded in the tag name itself, all under the fractal-cloud category.
OpenShift / k8s
The protocol's managed-by: fractal-cloud tag is expressed as the standard Kubernetes label app.kubernetes.io/managed-by=fractal-cloud.
# Remove the label (trailing - means delete)
kubectl label -n <namespace> deployment/<name> app.kubernetes.io/managed-by-
# Same pattern for Service, NetworkPolicy, PersistentVolumeClaim,
# VirtualMachine, etc.
Aruba
Aruba represents tags as []string of key:value strings on the resource's metadata. Edit via the Aruba Cloud console (resource → Metadata → Tags → remove the managed-by:fractal-cloud entry) or via the Aruba API:
# Pseudocode — fetch, modify, PUT
GET /projects/{id}/providers/{provider}/locations/{loc}/<resource>/<id>
# → returns metadata.tags as ["managed-by:fractal-cloud", "fractal-cloud-owner:ls-...", ...]
PUT /... with metadata.tags filtered to remove "managed-by:fractal-cloud"
2. Make Your Emergency Changes
With the agent in Manual Override, make whatever changes you need directly in the cloud console or CLI. The agent will not interfere.
3. Check the Drift Report
After the agent's next check cycle (~3 minutes), the component status will show ManualOverride and the output fields will include a drift entry showing what differs from the declaration.
You can view this in the Fractal Cloud dashboard or via the API:
GET /livesystems/{id}/mutations/{mutationId}
Look for the drift key in the component's outputFields.
4. Decide: Update Declaration or Revert
Option A: Keep your changes. Update the LiveSystem declaration to match what you changed in the cloud. Then re-adopt. The agent will see no drift and simply resume management.
Option B: Revert to declared state. Re-adopt without changing the declaration. The agent will reconcile the resource back to the declared state, undoing your manual changes.
5. Re-adopt (Two-Tag Handshake)
To bring the component back under agent control, apply both managed-by: fractal-cloud and fractal-cloud: reconcile. Adding managed-by alone is not enough — the agent waits for the explicit reconcile signal before acting. The agent removes the reconcile tag itself once reconciliation completes; you never need to clean it up.
- AWS
- Azure
- GCP
- OCI
- Hetzner
- VMware vSphere
- OpenShift / k8s
- Aruba
AWS
# EC2-family
aws ec2 create-tags --resources <resource-id> \
--tags Key=managed-by,Value=fractal-cloud Key=fractal-cloud,Value=reconcile
# RDS
aws rds add-tags-to-resource --resource-name <db-instance-arn> \
--tags Key=managed-by,Value=fractal-cloud Key=fractal-cloud,Value=reconcile
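# S3: put-bucket-tagging replaces the whole tag set, so include any
# existing tags you want to keep alongside these two
aws s3api put-bucket-tagging --bucket <bucket> \
  --tagging 'TagSet=[{Key=managed-by,Value=fractal-cloud},{Key=fractal-cloud,Value=reconcile}]'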
Azure
az tag update --resource-id <resource-id> --operation merge \
--tags managed-by=fractal-cloud fractal-cloud=reconcile
GCP
# Compute Instance
gcloud compute instances add-labels <name> \
--labels=managed-by=fractal-cloud,fractal-cloud=reconcile
# CloudRun
gcloud run services update <name> \
--update-labels=managed-by=fractal-cloud,fractal-cloud=reconcile
OCI
OCI free-form tags require you to supply the complete tag set. Fetch the resource, merge in both keys, and update:
oci network vcn update --vcn-id <ocid> --freeform-tags '{
"managed-by": "fractal-cloud",
"fractal-cloud": "reconcile"
}'
Hetzner
hcloud server add-label <id> managed-by=fractal-cloud
hcloud server add-label <id> fractal-cloud=reconcile
VMware vSphere
govc tags.attach -c fractal-cloud "managed-by:fractal-cloud" /<DC>/vm/<vm-name>
govc tags.attach -c fractal-cloud "fractal-cloud:reconcile" /<DC>/vm/<vm-name>
OpenShift / k8s
In Kubernetes the reconcile signal is expressed as an annotation (: is illegal in label keys):
kubectl label -n <ns> deployment/<name> app.kubernetes.io/managed-by=fractal-cloud
kubectl annotate -n <ns> deployment/<name> fractal.cloud/reconcile=reconcile
Aruba
Append both colon-strings to the resource's metadata.tags array via the Aruba API:
PUT /projects/{id}/providers/{provider}/locations/{loc}/<resource>/<id>
{ ..., "metadata": { "tags": [
"managed-by:fractal-cloud",
"fractal-cloud:reconcile",
...
] } }
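The append step can be made idempotent so that repeating the call never duplicates an entry. Below is a minimal sketch, assuming tags are plain `key:value` strings as described for Aruba; `withHandshakeTags` is a hypothetical helper, not an Aruba or Fractal Cloud API.

```typescript
// Hypothetical helper: returns a new tags array that contains both handshake
// entries exactly once, keeping all existing entries in their original order.
function withHandshakeTags(existing: string[]): string[] {
  const wanted = ["managed-by:fractal-cloud", "fractal-cloud:reconcile"];
  const merged = existing.slice(); // copy, so the fetched array is not mutated
  for (const tag of wanted) {
    if (merged.indexOf(tag) === -1) {
      merged.push(tag);
    }
  }
  return merged;
}
```

The returned array is what would go into `metadata.tags` in the PUT body.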
After the agent's next poll, the component will transition to ReconcileRequested, then to Active once reconciliation completes. The fractal-cloud: reconcile tag will disappear automatically.
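The transitions described here can be modelled as follows. This is a simplified model for reasoning about the handshake, not the agent's implementation: `ResourceTags` and `simulateReadoption` are hypothetical names, and reconciliation itself is assumed to succeed.

```typescript
type ResourceTags = { [key: string]: string };

// Returns the status transitions and the final tags for one re-adoption,
// or no transitions at all when the handshake is incomplete.
function simulateReadoption(tags: ResourceTags): { transitions: string[]; finalTags: ResourceTags } {
  const complete =
    tags["managed-by"] === "fractal-cloud" && tags["fractal-cloud"] === "reconcile";
  if (!complete) {
    return { transitions: [], finalTags: tags };
  }
  const finalTags: ResourceTags = {};
  for (const key of Object.keys(tags)) {
    if (key !== "fractal-cloud") {
      finalTags[key] = tags[key]; // the agent removes only the reconcile tag
    }
  }
  return { transitions: ["ReconcileRequested", "Active"], finalTags: finalTags };
}
```

With both tags present the model ends with only managed-by left on the resource; with managed-by alone it reports no transitions, matching the rule that the agent waits for the explicit reconcile signal.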
End-to-End Walkthrough: Drift Resolution Without Reverting
The most common break-the-glass workflow is not "revert everything I did" but "keep my emergency changes": walk the LiveSystem declaration toward the cloud reality until drift is empty, then re-adopt cleanly. Here is the full cycle.
Scenario. An EC2 instance was scaled up from t3.medium to t3.large during an outage. The LiveSystem declaration still says t3.medium. You want the larger instance to stick.
Step 1 — Confirm Manual Override is in effect
After you removed the managed-by tag (Section 1) and waited one agent cycle, fetch the component status:
curl -s -H "Authorization: Bearer $TOKEN" \
"$ARIA/livesystems/$LS_ID/components/$COMP_ID" | jq
You should see something like:
{
"id": "ls-abc/blueprint/web-vm",
"status": "ManualOverride",
"outputFields": {
"instanceId": "i-0abc...",
"privateIp": "10.0.10.42",
"drift": {
"instanceType": { "declared": "t3.medium", "actual": "t3.large" }
}
}
}
The drift object is the agent's structured report of every parameter that differs between your declaration and the cloud. Each entry shows the declared value and the actual value side by side. If drift is absent, declaration and reality already match.
Step 2 — Update the LiveSystem declaration to match reality
Open your LiveSystem source and change the parameters to match the actual side of the drift report:
// Before
const web = networkAndCompute.aws.ec2({
id: 'web-vm',
instanceType: 't3.medium', // ← drift: declared
// ...
});
// After
const web = networkAndCompute.aws.ec2({
id: 'web-vm',
instanceType: 't3.large', // ← matches reality
// ...
});
For multi-parameter drift, walk the declaration entry by entry until every line of drift is addressed.
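The mapping from a drift report to the edits you make is mechanical: each drifted parameter takes its actual value. The sketch below shows that mapping on a flat parameter object. It is an aid for the operator, since the agent never writes to the declaration; `FlatDeclaration`, `DriftReport`, and `adoptActualValues` are hypothetical names, and a real declaration is SDK source that you edit by hand.

```typescript
type FlatDeclaration = { [param: string]: string };
type DriftReport = { [param: string]: { declared: string; actual: string } };

// Returns a copy of the declaration in which every drifted parameter is set
// to the value currently observed in the cloud.
function adoptActualValues(declaration: FlatDeclaration, drift: DriftReport): FlatDeclaration {
  const updated: FlatDeclaration = {};
  for (const key of Object.keys(declaration)) {
    updated[key] = declaration[key];
  }
  for (const param of Object.keys(drift)) {
    updated[param] = drift[param].actual;
  }
  return updated;
}
```

Applied to the scenario above, a declaration with `instanceType: "t3.medium"` and the reported drift yields `instanceType: "t3.large"`, with all other parameters unchanged.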
Step 3 — Push the updated LiveSystem
Re-deploy through the normal flow (SDK, UI, or API). The agent picks up the new declaration on its next poll. Component status stays ManualOverride (the tag is still missing) but the next drift report should show drift empty:
{
"status": "ManualOverride",
"outputFields": {
"instanceId": "i-0abc...",
"privateIp": "10.0.10.42"
// no `drift` key — declaration and reality match
}
}
If the drift report still shows entries, repeat steps 2–3 until empty. The point of this back-and-forth is that you stay in control of the declaration; the agent never writes into it.
Step 4 — Re-adopt with the two-tag handshake
Once drift is empty, apply both tags (Section 5 above). The agent reconciles on its next cycle, finds nothing to change, removes the reconcile tag, and the component returns to Active:
{
"status": "Active",
"outputFields": {
"instanceId": "i-0abc...",
"instanceType": "t3.large",
"privateIp": "10.0.10.42"
}
}
That's the loop: drift report → declaration update → empty drift → re-adopt → Active, with the cloud reality preserved end-to-end.
Choosing the alternative path: revert
If you instead want to throw away your emergency changes and snap back to the original declaration, skip steps 2–3 entirely. Apply the two-tag handshake from Manual Override directly. The agent will see drift on every parameter you changed manually and reconcile each one back to the declared value.
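Before choosing this path it can help to list what would be undone. The small sketch below turns a drift object into a human-readable revert plan; `DriftEntries` and `revertPlan` are hypothetical names and the output format is arbitrary.

```typescript
type DriftEntries = { [param: string]: { declared: string; actual: string } };

// Lists, per drifted parameter, the change reconciliation would make:
// from the actual cloud value back to the declared value.
function revertPlan(drift: DriftEntries): string[] {
  return Object.keys(drift).map(
    (param) => `${param}: ${drift[param].actual} -> ${drift[param].declared}`
  );
}
```

For the scenario above the plan has a single line, `instanceType: t3.large -> t3.medium`.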
Important Rules
- The agent never restores a removed managed-by tag. If you remove it, only you (or an automated process you control) can add it back. This is a non-negotiable safety guarantee.
- Scope is per-component. Removing the tag from a VPC does not affect its subnets, security groups, or other dependent resources. Each component is managed independently.
- Output fields are always available. Even in Manual Override, the agent continues reading the cloud resource and updating output fields (like resource IDs, endpoints, etc.). Dependent components that reference these output fields will continue to work.
- The agent check cycle is approximately 3 minutes. After removing or adding the tag, allow up to one agent cycle for the status change to take effect.
FAQ
Q: What happens if I delete a component's cloud resource while in Manual Override?
A: The agent will detect that the resource is gone. Since there is no resource to carry the managed-by tag, you cannot re-adopt the component by adding the tag back — there is no resource to tag. The component will remain in Manual Override. To recreate the resource, you will need to redeploy the LiveSystem — this triggers a fresh provisioning cycle for the missing component.
Q: Can I use Manual Override for long-term external management?
A: Yes. There is no timeout. A component can remain in Manual Override indefinitely. The agent will continue reporting drift but will never reconcile.
Q: What if I accidentally remove the tag?
A: Add it back together with the fractal-cloud: reconcile tag (the two-tag handshake). The agent will reconcile on its next cycle and the component will return to Active status. No data is lost.
Q: Does Manual Override affect billing?
A: No. The cloud resource continues to exist and incur its normal cloud provider costs. Manual Override only affects the agent's reconciliation behaviour.