Humans Are the Integration Layer

Designing Infrastructure for Bounded Autonomy

Introduction

A production access request during an incident looks straightforward from far enough away. An identity system can grant a role, a privileged access system can time-limit it, and an audit system can record what happened. The hard part is deciding whether the request should be granted at all.

That decision draws on the state of the incident and who owns the service. The reviewer may also need to check the requester’s device, the sensitivity of the data, recent access, and any current exceptions. In many organizations, those signals live in separate systems with different owners and failure modes. The person reviewing the request has to join them together.

Infrastructure systems have made bounded decisions for years. Controllers reconcile toward desired state. Autoscalers adjust capacity, identity platforms enforce conditional access, and rollback systems revert unhealthy deployments. Large language models add a cheaper way to combine context from systems that were never designed to understand each other. That is the useful change.

Traditional automation transformed execution. It made infrastructure provisioning and service deployment repeatable, then expanded into capacity management, credential rotation, session revocation, and rollback. Mediation still scales badly. A production access request can turn into a Slack thread while people reconstruct the policy, assess the risk, and gather context from several systems before making a decision.

Humans became the integration layer because we used the same queue for two different jobs. Governance sets ownership and risk appetite, grants exception authority, and assigns accountability. Mechanical mediation gathers context and checks it against an existing policy before translating a request into a bounded action. Infrastructure systems were poor at distinguishing those jobs, so people handled both.

The result is that ambiguous policy hides inside approval queues. Sometimes a reviewer is making a genuine governance decision. At other times they are compensating for missing metadata or stale ownership records. Disconnected telemetry and poorly encoded policy create more work of the same kind. Software can absorb the mechanical cases, but giving it unresolved governance decisions would be reckless.

Bounded autonomy starts by pulling implicit governance out of human queues. Teams can then encode settled policy and let systems handle the mechanical mediation around it. People retain ownership of policy, risk, and accountability while routine decisions stop depending on them to join systems together by hand.

Used this way, bounded autonomy is constrained reasoning inside an explicit governance boundary. It requires dependable context and narrow authority. Actions must be reversible and auditable, with a working escalation path. A system missing those safeguards is operating beyond its authority.

Execution, Mediation, and Governance

A useful way to reason about infrastructure autonomy is to separate work into three layers: execution, mediation, and governance. The distinction matters because these layers fail differently, scale differently, and require different kinds of human involvement. Treating them as one undifferentiated approval workflow is how organizations end up routing mechanical checks and real risk decisions through the same human bottleneck.

Current workflows often collapse all three kinds of work into one human queue:

flowchart TD
    R[Request] --> H[Human reviewer]
    H --> E[Execution]
    H --> M[Mediation]
    H --> G[Governance]

Bounded autonomy separates routine mediation from decisions that still need human governance. Known, reversible, governed work can move through the system. Ambiguous, contested, or irreversible work still escalates:

flowchart TD
    R[Request] --> D{Decision boundary}
    D --> S[System acts within policy]
    D --> H[Humans resolve ambiguity]
    S --> A[Audit trail]
    H --> A

Execution is deterministic system action. Provisioning infrastructure, reconciling desired state, deploying services, rotating credentials, and rolling back a deployment primarily belong to the execution layer. These actions can be complicated, but once the system knows what should happen, the operation is usually well-defined.

Mediation is contextual interpretation between systems. It includes deciding which policy applies, whether telemetry is trustworthy, whether conflicting signals should block action, and whether a proposed change sits inside acceptable risk boundaries. Mediation is not the button press. It is the reasoning that happens before the button press.

Governance is organizational authority and accountability. It covers policy authorship, ownership boundaries, escalation authority, and who answers when something goes wrong. Through governance, an organization records what it is willing to risk.

Traditional infrastructure automation largely transformed execution. Infrastructure as Code made provisioning reviewable and repeatable. Declarative orchestration and reconciliation loops reduced manual state management. Autoscaling and policy engines moved common operational responses into systems once the relevant policy was already known.

Those systems are strongest when the decision boundary is narrow. A Kubernetes controller can converge state, but it does not know whether a rollback conflicts with an active security investigation. Karpenter can provision nodes for unschedulable workloads without knowing whether unexpected demand represents growth, abuse, fraud, or an incident. Conditional access can enforce policy, but exception handling during an outage often depends on context outside the identity system.

That is where humans still sit. They connect domains that were never designed to reason together. They know when an incident changes the risk calculation, when ownership metadata is stale, when a policy exception is acceptable for one data class but not another, and when a technically valid request violates the intent of the control.

Large language models matter here because they can reduce the cost of mediation. Historically, cross-domain mediation required bespoke integration for each context source and workflow. The cost scaled in engineering time or human review time. LLMs make it more practical to compose context from heterogeneous sources, which is why the architecture is worth revisiting now.

That does not make governance automatable in the same way. A model can help evaluate whether a request appears consistent with policy, but it cannot decide what the organization is willing to risk. A system can propose or even execute bounded actions, but humans still own the policy, the escalation path, and the consequences of being wrong.

A Heuristic for What Should Move

The useful question is not whether a workflow involves a human today. Almost every interesting infrastructure workflow does. The useful question is why the human is there. If the human is gathering context, comparing signals, checking known policy, or routing a routine exception, the workflow is probably a candidate for bounded autonomy. If the human is resolving unclear ownership, interpreting contested policy, accepting irreversible risk, or making a judgment the organization has never written down, the workflow is probably governance in disguise.

This gives teams a practical test. Repetitive, reversible mediation governed by explicit policy is a design gap. The system should probably absorb more of it. Mediation that depends on disputed ownership, ambiguous intent, or irreversible consequences should stay human until the organization is willing to encode the relevant policy and accept the associated risk.

Consider onboarding a new SaaS application into an identity provider. The SAML or OIDC configuration is rarely the hard part anymore. The harder questions are who should receive access, what data the application will contain, whether access requires a managed device, which groups are authoritative, and who owns the integration after launch.

Some of that work is mechanical mediation. If the application has a declared data class, an owning team, and a standard access pattern, the system should be able to apply the right access policy and record the audit trail. A human should not have to reconstruct the same decision from scratch every time.

Some of that work is governance. If the data classification is unclear, the ownership model is disputed, or the request violates normal separation of duties, the system should not improvise an answer. It should escalate because the missing work is not execution or mediation. It is an organizational decision that has not been made explicit enough for software to enforce.

This distinction is bounded autonomy done correctly. It does not remove humans from infrastructure. It stops using humans as glue for decisions that are already governed, already reversible, and already routine.

Prior Art and What Is Actually New

Controllers, autoscalers, conditional access, and rollback systems are narrow forms of bounded autonomy. They work because their decision boundaries are constrained, their inputs are relatively structured, and their authority is narrow. The newer opportunity is cross-domain mediation.

A device trust signal may matter to an identity decision. An identity decision may matter to a production access workflow. A production access workflow may depend on incident status, service ownership, data classification, current telemetry, and whether the proposed action can be reversed. The hard part is not that any one system lacks automation. The hard part is that the decision requires context from several systems that do not naturally reason together.

This is where LLMs and structured context interfaces become relevant. Tool APIs, retrieval systems, policy APIs, telemetry stores, and audit systems provide the substrate for a reasoning layer that can inspect context across domains. The model is not the system of record, and it should not be the source of policy. It is a mediator over context exposed by systems that remain authoritative for their own domains.

The durable architecture is not a chatbot with admin privileges. It is a constrained reasoning layer operating over explicit tools, typed context, policy boundaries, and escalation paths. The model may interpret and propose, but the surrounding system must define what can be done, under what conditions, with what rollback path, and with whose authority.

The important shift is factoring mediation out of governance. Humans held both roles because we had no better option. Now some of the mediation can move into software, which leaves the harder organizational work exposed instead of buried inside queues.

A Concrete Example: Temporary Production Access

The same distinction becomes clearer in a temporary production access workflow. Most organizations already have pieces of this automated: identity can grant access, privileged access can time-limit it, audit logging can record the grant, and device or session systems can evaluate their own signals.

The mediation problem begins before any of that execution happens. The system still has to decide whether the request belongs inside the approved boundary.

The engineer may be on a managed device, or they may be on a replacement machine because their primary device failed during the escalation. The affected service may contain customer data, or it may operate in a lower-sensitivity environment. The request may come from the service owner, or from someone outside the normal support boundary. The incident may justify broader access, or it may be exactly the moment when access should become stricter.

A human reviewer currently handles that mediation. They look at the request, the incident, the user, the service, the policy, and whatever organizational context they happen to know. They may ask questions in Slack, check ownership metadata, inspect recent telemetry, review access history, and decide whether the exception is reasonable.

A bounded autonomous system should not simply approve all incident-related access. It should evaluate the request against explicit policy boundaries and known context. It might grant temporary access automatically when the requester belongs to the owning group, the device is trusted, the service is below a defined sensitivity threshold, and the incident is active. It should escalate when ownership is missing, the device is unmanaged, or the request conflicts with separation-of-duty rules.
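The grant and escalate conditions described above can be sketched as a small policy check. This is a minimal illustration, not a reference implementation: the `AccessContext` fields, the `SENSITIVITY_THRESHOLD` value, and the reason strings are all hypothetical, and each field is assumed to arrive from whichever system is authoritative for it.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    GRANT = "grant"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class AccessContext:
    # Hypothetical fields; each would come from its own system of record.
    owner_known: bool
    requester_in_owning_group: bool
    device_managed: bool
    sensitivity: int  # lower means less sensitive
    incident_active: bool
    separation_of_duty_conflict: bool


SENSITIVITY_THRESHOLD = 2  # assumed value; the policy owner would set this


def evaluate(ctx: AccessContext) -> tuple[Outcome, list[str]]:
    """Grant only when every condition holds; otherwise escalate with reasons."""
    reasons = []
    if not ctx.owner_known:
        reasons.append("ownership missing")
    elif not ctx.requester_in_owning_group:
        reasons.append("requester outside owning group")
    if not ctx.device_managed:
        reasons.append("device unmanaged")
    if ctx.sensitivity >= SENSITIVITY_THRESHOLD:
        reasons.append("service at or above sensitivity threshold")
    if not ctx.incident_active:
        reasons.append("no active incident")
    if ctx.separation_of_duty_conflict:
        reasons.append("separation-of-duty conflict")
    if reasons:
        return Outcome.ESCALATE, reasons
    return Outcome.GRANT, []
```

The sketch never denies a request. Anything outside the approved boundary goes to a human along with the reasons, which keeps the system from improvising answers to questions it is not authorized to resolve.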

The system is not replacing governance. It is translating policy into action for cases where the organization has already defined the relevant boundaries. Human review remains necessary when the policy is ambiguous, when the consequences are difficult to reverse, or when the request exposes a conflict the system is not authorized to resolve.

The failure cases are where this becomes real. If the device trust signal lags by thirty minutes, the system may grant access under a managed-device assumption that is no longer true. If the access window is correctly limited to thirty minutes but the session revocation system is degraded because it shares infrastructure with the incident itself, the nominal rollback path may not actually exist. In both cases, execution may look correct while mediation has failed.
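One way to make those two assumptions explicit is to check them before any grant is attempted. The sketch below is illustrative only: the five-minute freshness budget, the function name, and the boolean revocation health input are assumptions, and a real system would read both signals from the services that own them.

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness budget; the real value belongs to the policy owner.
MAX_DEVICE_SIGNAL_AGE = timedelta(minutes=5)


def blocking_assumptions(
    device_signal_at: datetime,
    revocation_healthy: bool,
    now: datetime,
    max_age: timedelta = MAX_DEVICE_SIGNAL_AGE,
) -> list[str]:
    """List the assumptions that no longer hold; an empty list means proceed."""
    problems = []
    age = now - device_signal_at
    # A negative age means clock skew or bad data, which is also untrustworthy.
    if age < timedelta(0) or age > max_age:
        problems.append("device trust signal is stale")
    if not revocation_healthy:
        problems.append("revocation path is unavailable")
    return problems


# Example with a fixed clock: a signal that lags by thirty minutes is rejected.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(blocking_assumptions(now - timedelta(minutes=30), True, now))
```

Treating an unhealthy revocation path as a blocking condition reflects the point above: a grant is only reversible if the system that reverses it is actually working.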

Those are not exotic edge cases. They are exactly the kinds of failures infrastructure teams see when systems are connected through assumptions rather than explicit contracts. Bounded autonomy is credible only when those assumptions are visible, tested, audited, and escalated when they stop holding.

Accountability, Audit, and Decision Boundaries

Bounded autonomy is only credible if the organization can explain what happened after the fact. Infrastructure systems do not become safer because a model can produce a plausible rationale. They become safer when decisions are attributable, reconstructable, reviewable, and governed by policy that humans actually own.

When a bounded autonomous system makes a decision, the audit trail needs to capture more than the final action. It should record the request, the actor, the policy used, the context consulted, the decision path, the action taken, the rollback path, and any escalation conditions that did or did not trigger. A security investigator should not have to reverse-engineer a model interaction from logs scattered across five systems.
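A single structured record per decision is one way to capture those fields in one place. The following is a sketch under assumptions: the `DecisionRecord` name and field names are hypothetical, and JSON serialization is shown only because it is simple to inspect, not because any particular audit system requires it.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DecisionRecord:
    # Hypothetical field names mirroring the audit requirements in the text.
    request_id: str
    actor: str
    policy_version: str
    context_consulted: dict
    decision_path: list
    action_taken: str
    rollback_path: str
    escalations_checked: list = field(default_factory=list)
    escalations_triggered: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so records are stable and easy to diff."""
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    request_id="req-1",
    actor="alice",
    policy_version="v1",
    context_consulted={"incident": "active", "device": "managed"},
    decision_path=["owner check passed", "device check passed"],
    action_taken="grant",
    rollback_path="revoke session via privileged access system",
    escalations_checked=["stale device signal", "missing ownership"],
)
print(record.to_json())
```

Recording the escalation conditions that were checked but did not trigger matters as much as recording the ones that did, because it lets an investigator see what the system considered before acting.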

Attribution also matters. If a system grants access incorrectly, “the AI did it” is not an accountability model. The owning team must be clear, the policy authority must be clear, and the system’s scope of authority must be explicit. Bounded autonomy should make accountability more legible, not less.

This changes what humans approve. In a traditional review model, humans approve individual changes. In a bounded autonomy model, humans approve decision boundaries: classes of decisions the system is allowed to make under defined conditions. The boundary specifies the policy, scope of authority, required context, allowed actions, escalation triggers, audit requirements, and ownership.

A temporary production access decision boundary might look like this:

Policy: Grant up to thirty minutes of scoped production access during an active incident when predefined conditions hold.
Authority: The production access platform may act only on services below a defined sensitivity threshold.
Required context: The incident is active, the requester belongs to the service-owning group, the device trust signal is current, ownership metadata is present, and the requester has no separation-of-duty conflict with the current responder.
Allowed actions: Grant access to one named resource, limit the duration, record the session, and attach the decision to the incident record.
Reversibility: The privileged access system must be able to revoke the grant. An unhealthy or unavailable revocation path causes an escalation.
Escalation: Escalate stale device signals, missing ownership, sensitive services, requests longer than thirty minutes, unmanaged devices, separation-of-duty conflicts, and conflicting policies.
Audit: Log the request, consulted context, decision, action, revocation, and the version of the boundary that authorized them.
Ownership: The production access platform team owns the system; security engineering owns the policy.
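Written as data, a boundary like this can be versioned, reviewed, and tested like any other change. The sketch below is one possible encoding, not a prescribed schema: the `DecisionBoundary` class, the action and context identifiers, and the version string are hypothetical names chosen to mirror the fields above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DecisionBoundary:
    version: str
    max_duration: timedelta
    allowed_actions: frozenset
    required_context: frozenset
    system_owner: str
    policy_owner: str

    def violations(self, action: str, duration: timedelta, context_present: set) -> list:
        """Return every way a proposed action falls outside this boundary."""
        problems = []
        if action not in self.allowed_actions:
            problems.append(f"action not allowed: {action}")
        if duration > self.max_duration:
            problems.append("duration exceeds boundary")
        missing = sorted(self.required_context - set(context_present))
        problems.extend(f"missing context: {name}" for name in missing)
        return problems


# Hypothetical encoding of the temporary production access boundary.
PROD_ACCESS = DecisionBoundary(
    version="v1",
    max_duration=timedelta(minutes=30),
    allowed_actions=frozenset(
        {"grant_scoped_access", "limit_duration", "record_session", "attach_to_incident"}
    ),
    required_context=frozenset(
        {
            "incident_active",
            "requester_in_owning_group",
            "device_trust_current",
            "ownership_present",
            "no_separation_of_duty_conflict",
        }
    ),
    system_owner="production access platform team",
    policy_owner="security engineering",
)
```

A non-empty result from `violations` would be treated as an escalation, and the `version` field is what the audit record would cite as the authority for any action taken.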

This is why decision boundary authorship matters. The boundary is not a prompt, a workflow template, or a convenience wrapper around an approval queue. It is operating authority encoded in software.

A poorly written boundary can authorize many bad decisions at once. A stale data classification can cause the system to apply the wrong control repeatedly. A broken ownership signal can route exceptions incorrectly across an entire service category. The blast radius of bad mediation can be larger than the blast radius of a bad individual approval.

This is why decision boundaries need review with the same seriousness organizations apply to infrastructure changes, security controls, and production launch criteria. They should have owners, version history, test cases, rollback plans, and periodic review.

Auditability and forensics cannot be bolted on later. They are architectural requirements. If a bounded autonomous system cannot explain why it acted, which policy authorized the action, which context was used, and how a human can contest or correct the decision, it should not be trusted with meaningful authority.

Failure Modes and Decision Quality

Traditional automation failures are usually loud. A deployment fails. Latency rises. Error rates spike. A controller converges toward the wrong state. A script applies the wrong configuration. These failures can be serious, but they often have visible symptoms tied to a concrete action.

Bounded autonomous systems introduce quieter failures. The system may reason incorrectly from plausible context. It may treat stale telemetry as current, misread missing ownership data, resolve a policy conflict incorrectly, or over-trust historical approvals.

The dangerous failures may not page anyone. An exception policy that is slightly too permissive can run for months. A device trust override that was meant for rare incidents can become normalized through repetition. A remediation workflow can roll back safely from a technical perspective while violating a business constraint the system never understood. By the time the failure is obvious, the system may have made the same wrong decision thousands of times.

Bounded autonomous systems therefore need telemetry about decision quality as well as availability. A healthy service can still make poor decisions. Teams need evidence that the system reaches defensible conclusions, escalates the right cases, and stays inside its intended authority.

That requires new evaluation practices. Organizations should sample decisions for human review, test policy boundaries adversarially, compare system decisions against later reassessment, and watch for drift in escalation and exception patterns. A bounded autonomous system that is never reviewed is bounded in name only. It is operating outside meaningful governance.
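Two of those practices, sampling decisions for review and watching escalation drift, can be sketched in a few lines. This is an illustration under assumptions: the function names, the hash-based sampling scheme, and the ten-point drift tolerance are hypothetical choices, not an established method for measuring decision quality.

```python
import hashlib


def selected_for_review(decision_id: str, rate: float) -> bool:
    """Deterministically sample roughly `rate` of decisions by hashing their IDs."""
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform over [0, 2**64)
    return bucket < int(rate * 2**64)


def escalation_rate(outcomes: list[str]) -> float:
    """Fraction of outcomes equal to "escalate"; 0.0 for an empty window."""
    if not outcomes:
        return 0.0
    return sum(1 for outcome in outcomes if outcome == "escalate") / len(outcomes)


def has_drifted(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Flag a change in escalation rate larger than the assumed tolerance."""
    return abs(current - baseline) > tolerance
```

Hashing the decision ID makes the sample reproducible, so a reviewer and an auditor looking at the same decision agree on whether it should have been reviewed. A falling escalation rate deserves as much attention as a rising one, since it may mean the system has quietly started approving cases it used to send to humans.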

This is still an open problem. SLOs and error budgets work well when failure is observable through service behavior. Execution often has that property. Mediation quality is harder. The meaningful errors are often quiet, delayed, and discovered through audit, investigation, or hindsight. Any organization adopting bounded autonomy needs to invest explicitly in decision-quality evaluation rather than pretending availability metrics are enough.

This loops back to the heuristic from earlier: bounded autonomy is most defensible when the decision is reversible, well-scoped, well-governed, and supported by trustworthy context. It is weakest when the cost of being wrong is paid in trust rather than system state. The evaluation problem matters because those failures may not announce themselves as outages. They may quietly become normal.

Implications for Infrastructure Teams

If bounded autonomy becomes part of infrastructure architecture, the role of infrastructure teams changes. The work does not disappear. It moves from performing and approving individual actions toward designing the policies, decision boundaries, context pipelines, and escalation paths that make bounded decisions safe.

The on-call model changes as well. A bounded autonomous system becomes a tier-one dependency for decision-making. Its failure plan must say whether requests pause, return to human review, or continue in a narrower execution-only mode. Incorrect decisions are harder to detect than outages because normal health checks may remain green.
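A failure plan of this kind can be stated per decision class instead of per system. The sketch below is a hypothetical configuration: the decision class names, the chosen modes, and the conservative default are assumptions made for illustration.

```python
from enum import Enum


class FailureMode(Enum):
    PAUSE = "pause"
    HUMAN_REVIEW = "human_review"
    EXECUTION_ONLY = "execution_only"


# Hypothetical per-decision-class failure plan.
FAILURE_PLAN = {
    "temporary_production_access": FailureMode.HUMAN_REVIEW,
    "deployment_rollback": FailureMode.EXECUTION_ONLY,
}


def route(decision_class: str, mediator_healthy: bool) -> str:
    """Choose where a request goes when the mediation layer may be down."""
    if mediator_healthy:
        return "mediator"
    # Decision classes without a written plan default to the most conservative mode.
    return FAILURE_PLAN.get(decision_class, FailureMode.PAUSE).value
```

Defaulting unplanned decision classes to a pause is a design choice: it makes a missing failure plan visible as blocked work instead of as silent fallback behavior.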

That means the operating model has to cover both availability and reasoning correctness, and the two kinds of failure may need different responders. A serving outage is an infrastructure problem. A pattern of bad approvals may be a policy problem, a telemetry problem, a data quality problem, or an ownership problem. Treating all of those as ordinary service incidents will miss the point.

There is also a training cost, and it is uncomfortable. Many engineers learned judgment by doing the rote work: handling access requests, debugging policy exceptions, approving small changes, investigating drift, and chasing context across systems. If every low-risk request and routine exception moves into software, teams may remove the path by which junior engineers learn how the organization actually works.

That does not mean toil should be preserved for educational value. It does mean teams need a replacement learning path. Sampling decisions for review, rotating engineers through decision-boundary design, and studying bounded-autonomy failures may become part of how infrastructure judgment is built. Otherwise organizations risk automating away the apprenticeship layer and then wondering why fewer engineers understand the system as a whole.

The larger implication is organizational. Bounded autonomy forces teams to write down who owns decisions, which systems are authoritative, which actions are reversible, and when software must stop and ask for help. The model only works where the organization is willing to make implicit rules explicit enough for software to enforce and explicit enough for humans to challenge.

Conclusion

Humans became the integration layer because infrastructure systems could execute actions but struggled to interpret context across domains. People resolved the ambiguity and carried the accountability. Both remain necessary when the organization has yet to make a policy decision.

The opportunity is to move routine mediation into systems without hiding the governance behind it. The architectural boundary should show where context gathering ends, where software may decide, and where human authority resumes.

Start with one workflow. Write down its decision boundary and authoritative context sources. Specify the actions the system may take, how it can reverse them, when it must escalate, and what the audit record must contain. Then examine the current reviewer’s role. A governance decision belongs with a person; manual work that applies settled policy is a candidate for automation.

Separating those cases is the next useful step for infrastructure teams. It leaves humans responsible for judgment while removing their role as the default adapter between systems.

Appendix: Capability Model

Bounded autonomy should not be evaluated as a single organizational maturity level. The same organization may be highly autonomous for rollback decisions, partially autonomous for access exceptions, and entirely manual for irreversible security-sensitive changes.

A more useful capability model evaluates each decision class along several dimensions:

Context quality: Does the system have timely, authoritative context?
Policy clarity: Are the rules explicit enough for software to apply?
Reversibility: Can the action be undone quickly and reliably?
Blast radius: How broad is the impact if the decision is wrong?
Auditability: Can the decision be reconstructed after the fact?
Escalation: Does the system know when to stop and ask for help?
Governance ownership: Is it clear who owns the policy and the risk?

The dimensions are how a team decides which capability level is honest for a given decision class. Strong context, clear policy, reliable reversibility, narrow blast radius, mature audit, working escalation, and clear ownership make bounded autonomy defensible. Weakness across those dimensions should push the decision class toward observe, advise, or explicit human approval.

From there, decision classes can be described in terms of capability:

Observe: The system gathers and presents context, but humans interpret and decide.
Advise: The system recommends an action with supporting context, but humans approve.
Execute with approval: The system prepares a bounded action, but humans authorize execution.
Bounded autonomy: The system executes within an approved decision boundary and escalates exceptions.
Policy-adaptive autonomy: The system identifies patterns in decisions and proposes changes to decision boundaries, but humans still approve policy evolution.

This model is deliberately scoped to decision classes rather than organizations. A team might operate at bounded autonomy for temporary access grants, advise-only for production policy exceptions, and observe-only for irreversible security actions.
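The relationship between dimensions and capability levels can be sketched as a ceiling function. The thresholds here are an assumption made purely for illustration, since the model above does not prescribe a scoring rule; a real team would likely weight reversibility and blast radius more heavily than a simple count allows.

```python
from enum import IntEnum


class Capability(IntEnum):
    OBSERVE = 1
    ADVISE = 2
    EXECUTE_WITH_APPROVAL = 3
    BOUNDED_AUTONOMY = 4
    POLICY_ADAPTIVE = 5  # never returned below; the text gives no criteria for it


DIMENSIONS = (
    "context_quality",
    "policy_clarity",
    "reversibility",
    "blast_radius",
    "auditability",
    "escalation",
    "governance_ownership",
)


def honest_ceiling(strong: dict[str, bool]) -> Capability:
    """Map a per-dimension assessment to the highest defensible capability level."""
    missing = [name for name in DIMENSIONS if name not in strong]
    if missing:
        raise ValueError(f"unassessed dimensions: {missing}")
    weak = sum(1 for name in DIMENSIONS if not strong[name])
    # Assumed thresholds: any weakness removes autonomy, more weakness removes more.
    if weak == 0:
        return Capability.BOUNDED_AUTONOMY
    if weak == 1:
        return Capability.EXECUTE_WITH_APPROVAL
    if weak <= 3:
        return Capability.ADVISE
    return Capability.OBSERVE
```

Raising an error for an unassessed dimension is deliberate in this sketch: a decision class that has not been evaluated should not receive a capability level by default.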
