Large language models are only half of an agent. The other half is everything the model is allowed to do: run code, parse files, call internal APIs, query databases, and render charts. That power is exactly why execution isolation stopped being optional the moment agents went from prototypes to products.
E2B is one of the best-known cloud sandboxes purpose-built for AI use cases: ephemeral environments where untrusted or semi-trusted code can run with controlled filesystems, packages, and network behavior—then disappear.
This post explains why sandboxes exist, where they sit in an agent stack, and the patterns that keep you out of trouble when your company AI OS connects chat, workflows, and real business systems.
The core risk: the model is not your employee
Even with careful prompting, agents eventually produce tool calls you did not anticipate—wrong arguments, pathological loops, or instructions shaped by user input. If those calls execute on a shared worker with broad credentials, you are one prompt injection away from a lateral movement story.
Sandboxing is not paranoia; it is blast-radius control:
- Filesystem: read/write only inside a disposable root; no access to host keys or monorepo checkouts.
- Network: default deny; allowlist egress to specific domains, or allow none at all.
- Time and memory: hard caps so runaway loops cannot melt shared infrastructure.
- Lifecycle: environments are born for one task and terminated when done.
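One way to make those four controls concrete is to write them down as a policy object that the orchestrator checks before and during a run. The sketch below is illustrative only: `SandboxPolicy`, its field names, and the default values are hypothetical and are not part of any E2B API.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy record; names and defaults are assumptions."""

    writable_root: str = "/home/user/work"     # disposable root inside the sandbox
    egress_allowlist: frozenset = frozenset()  # empty set means no network at all
    timeout_seconds: int = 30                  # hard cap on wall-clock time
    memory_mb: int = 512                       # hard cap on memory

    def allows_egress(self, url: str) -> bool:
        """Default deny: only exact hostnames on the allowlist pass."""
        host = urlparse(url).hostname
        return host is not None and host in self.egress_allowlist

    def allows_write(self, path: str) -> bool:
        """Allow writes only inside the disposable root, after resolving '..'."""
        resolved = posixpath.normpath(posixpath.join(self.writable_root, path))
        root = self.writable_root.rstrip("/")
        return resolved == root or resolved.startswith(root + "/")


policy = SandboxPolicy(egress_allowlist=frozenset({"api.example.com"}))
```

With this policy, a request to `https://api.example.com/v1` passes the egress check, while a write to `/home/user/work/../.ssh/id_rsa` is rejected because it resolves outside the disposable root.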
Where E2B fits in the architecture
At a high level, the agent runtime looks like this:
- Orchestrator (your backend) receives a user message, loads policy, and calls the model.
- The model returns structured tool calls (JSON) or code to execute.
- Instead of running that on the orchestrator host, you spawn a sandbox with a known image—Python + scientific stack, Node + PDF tooling, or a minimal “exec only” image.
- The sandbox runs the step, streams stdout / artifacts back, and returns a compact result to the model.
- The orchestrator logs, bills, and audits the run, then destroys the sandbox.
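The spawn, run, log, destroy sequence above can be sketched in a few lines. This is a minimal illustration, not the E2B SDK: `LocalStandInSandbox` is a hypothetical stand-in that runs code in a subprocess inside a temporary directory, which is NOT real isolation. In production you would replace it with a managed sandbox client exposing a similar run/destroy shape.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class StepResult:
    stdout: str
    stderr: str
    exit_code: int


class LocalStandInSandbox:
    """Hypothetical stand-in. A subprocess in a temp dir is not a security boundary."""

    def __init__(self, timeout_seconds: float = 10.0) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self._timeout = timeout_seconds
        self.destroyed = False

    def run(self, code: str) -> StepResult:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: Python isolated mode
                cwd=self._dir.name,
                capture_output=True,
                text=True,
                timeout=self._timeout,
            )
        except subprocess.TimeoutExpired:
            return StepResult("", "timed out", -1)
        return StepResult(proc.stdout, proc.stderr, proc.returncode)

    def destroy(self) -> None:
        self._dir.cleanup()
        self.destroyed = True


def execute_tool_step(code: str, audit_log: list, max_chars: int = 2000) -> str:
    """Run one model-generated step, record it, and always destroy the sandbox."""
    sandbox = LocalStandInSandbox()
    try:
        result = sandbox.run(code)
        audit_log.append({"code": code, "exit_code": result.exit_code})
    finally:
        sandbox.destroy()
    # Return a compact result so the model's context is not flooded with output.
    if result.exit_code != 0:
        return f"error (exit {result.exit_code}): {result.stderr[-max_chars:]}".strip()
    return result.stdout[:max_chars].strip()
```

The `try`/`finally` is the important design choice: the environment is torn down even when the step fails, so no run leaves state behind for the next one.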
That separation is what lets you say “yes” to powerful tools without saying “yes” to shared process memory with your production database client.
What teams use sandboxes for in practice
Common high-value patterns:
- Code interpreter experiences—analysis, charts, CSV crunching—with no persistent server state.
- Document ingestion pipelines where parsers and converters are heavy and occasionally hostile (macros, malformed PDFs).
- CI-adjacent agents that propose diffs or run linters inside a hermetic tree.
- Customer-specific extensions where each tenant gets different packages but the same orchestration contract.
The through-line is ephemeral compute with a strict contract: inputs, outputs, timeouts, and allowed side effects.
Security: what a sandbox does and does not promise
Sandboxes reduce risk dramatically, but they are not a substitute for authorization:
- Treat sandbox egress like firewall policy—explicit allowlists beat “open internet minus blocklist.”
- Never mount production secrets into a sandbox that end-user text can influence unless an additional human or system approval gate sits in front of it.
- Separate “run user code” from “call internal APIs”—if the model must call your CRM, proxy those calls through your orchestrator with normal OAuth and scoped tokens, not magic credentials inside the sandbox.
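The proxy pattern in the last bullet might look roughly like the sketch below. All names here (`ToolGateway`, `ScopedToken`, the `crm:read` scope, `lookup_contact`) are hypothetical. The point is only that the scope check and the credentials live on the orchestrator, and the sandbox sees neither.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical per-request token; in practice this comes from your OAuth flow."""

    tenant_id: str
    scopes: frozenset


Handler = Callable[[str, Dict[str, Any]], Any]


class ToolGateway:
    """Runs on the orchestrator. The sandbox never receives credentials."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[str, Handler]] = {}

    def register(self, name: str, required_scope: str, handler: Handler) -> None:
        self._tools[name] = (required_scope, handler)

    def authorize(self, token: ScopedToken, name: str) -> bool:
        """Unknown tools and missing scopes are both denied."""
        entry = self._tools.get(name)
        return entry is not None and entry[0] in token.scopes

    def call(self, token: ScopedToken, name: str, args: Dict[str, Any]) -> Any:
        if not self.authorize(token, name):
            raise PermissionError(f"tool {name!r} not permitted for this token")
        _, handler = self._tools[name]
        return handler(token.tenant_id, args)


def lookup_contact(tenant_id: str, args: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for a real CRM client call made with the orchestrator's credentials.
    return {"tenant": tenant_id, "email": args["email"], "found": False}


gateway = ToolGateway()
gateway.register("crm.lookup_contact", "crm:read", lookup_contact)
```

A model-issued call to `crm.lookup_contact` succeeds only when the request's token carries `crm:read`. A call to an unregistered tool, or one made with an empty scope set, raises `PermissionError` before any handler runs.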
For regulated environments, pair technical isolation with run traces and approval records in the dashboard. Formal compliance audit-log exports are on the enterprise roadmap.
UX: hiding latency without lying
Sandboxes add startup latency. Product teams win when they:
- Prewarm a small pool during business hours if traffic is predictable.
- Stream progress (“Starting secure environment…”) instead of a silent spinner.
- Batch independent tool steps when the user experience allows parallelism.
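As a rough sketch of those three tactics together, the example below prewarms a small pool, emits progress messages, and runs independent steps concurrently with `asyncio.gather`. `WarmPool`, `FakeSandbox`, and `create_fake_sandbox` are hypothetical; the fake simulates startup latency with a short sleep rather than calling a real sandbox provider.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, List


class FakeSandbox:
    """Hypothetical stand-in for a sandbox handle."""

    async def run(self, step: str) -> str:
        await asyncio.sleep(0)  # pretend to do work
        return f"ran {step}"


async def create_fake_sandbox() -> FakeSandbox:
    await asyncio.sleep(0.01)  # simulated cold-start latency
    return FakeSandbox()


class WarmPool:
    def __init__(self, create: Callable[[], Awaitable[FakeSandbox]], size: int) -> None:
        self._create = create
        self._size = size
        self._idle: deque = deque()
        self.cold_starts = 0

    async def prewarm(self) -> None:
        """Start `size` sandboxes concurrently before traffic arrives."""
        started = await asyncio.gather(*(self._create() for _ in range(self._size)))
        self._idle.extend(started)

    async def acquire(self) -> FakeSandbox:
        if self._idle:
            return self._idle.popleft()
        self.cold_starts += 1  # pool exhausted: pay the startup cost now
        return await self._create()


async def run_batch(
    pool: WarmPool, steps: List[str], on_progress: Callable[[str], None]
) -> List[str]:
    on_progress("Starting secure environment...")

    async def run_one(step: str) -> str:
        sandbox = await pool.acquire()
        result = await sandbox.run(step)
        on_progress(f"Finished: {step}")
        return result

    # gather returns results in the same order as the input steps.
    return list(await asyncio.gather(*(run_one(s) for s in steps)))
```

Sizing the pool is a cost trade-off: each idle sandbox is paid for whether or not a user arrives, so `size` should track predictable traffic rather than peak load.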
Users tolerate a few seconds if the narrative is clear; they do not tolerate uncertainty.
E2B vs Docker vs “just run it on the worker”
Rough decision guide:
- On-worker exec — only for fully trusted code paths with static inputs; fastest, highest risk.
- Containers on your own orchestrator — maximum control, maximum operational load.
- Managed sandboxes (E2B-style) — strong default when your core competency is agents, not kernel hardening.
You may end up with more than one tier: strict sandboxes for arbitrary code, hardened internal runners for known-safe templates.
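A multi-tier setup usually needs an explicit routing rule so that nobody picks the fast path by accident. The function below is one possible encoding of the guide above; the `Tier` names and the three boolean inputs are assumptions chosen for illustration, and a real router would likely consider more signals.

```python
from enum import Enum


class Tier(Enum):
    ON_WORKER = "on_worker"
    INTERNAL_RUNNER = "internal_runner"
    MANAGED_SANDBOX = "managed_sandbox"


def choose_tier(
    code_is_model_generated: bool,
    template_is_reviewed: bool,
    uses_user_input: bool,
) -> Tier:
    """Pick the least-privileged tier that still fits the workload."""
    # Arbitrary or unreviewed code always goes to the strict sandbox.
    if code_is_model_generated or not template_is_reviewed:
        return Tier.MANAGED_SANDBOX
    # Known-safe templates fed with user data run on hardened internal runners.
    if uses_user_input:
        return Tier.INTERNAL_RUNNER
    # Only fully trusted code with static inputs runs on the worker itself.
    return Tier.ON_WORKER
```

The checks are ordered from most to least restrictive, so any ambiguity about trust falls through to the sandbox rather than to the worker.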
Closing loop with serverless GPU
Sandboxes solve isolation; they do not automatically solve inference economics. Teams often combine managed sandboxes for tool execution with serverless or reserved GPU for model inference—different layers, different scaling curves.
If you have not read it yet, start with our post on serverless GPU for model serving, which covers the inference side of the same platform story.
Docs this pairs with
- Tool calling & agentic execution — execution stack in the product.
- MCP overview — connecting external tools safely.
- Workflows and sandbox code examples — Code steps in visual automations.
Building agents across support, sales, and ops? Our use-case overview ties channels, knowledge, and workflows into one mental model.