Skip to content

fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox#4217

Merged
icecrasher321 merged 8 commits intostagingfrom
fix/urgent-billing-isolated
Apr 18, 2026
Merged

fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox#4217
icecrasher321 merged 8 commits intostagingfrom
fix/urgent-billing-isolated

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

Summary

Retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline that ran user DSL via node:vm + full require() in the same UID/PID namespace as the main Next.js process. User code now runs inside the existing isolated-vm pool (V8 isolate, no process / require / fs, no /proc/1/environ reachability).

Type of Change

  • Other: Code Reorg / Security

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline that ran user
DSL via node:vm + full require() in the same UID/PID namespace as the main
Next.js process. User code now runs inside the existing isolated-vm pool
(V8 isolate, no process / require / fs, no /proc/1/environ reachability).

Introduces a first-class SandboxTask abstraction under apps/sim/sandbox-tasks/
that mirrors apps/sim/background/ — one file per task, central typed
registry, kebab-case ids. Adding a new thing that runs in the isolate is
one file plus one registry entry.

Runtime additions in lib/execution/:
 - task-mode execution in isolated-vm-worker.cjs: load pre-built library
   bundles, run task bootstrap, run user code, run finalize, transfer
   Uint8Array result as base64 via IPC
 - named broker IPC bridge (generalizes the existing fetch bridge) with
   args size, result size, and per-execution call caps
 - cooperative AbortSignal support: cancel IPC disposes the isolate, pool
   slot is freed, pending broker-call timers are swept
 - compiled scripts + references explicitly released per execution
 - isolate.isDisposed used for cancellation detection (no error-string
   substring matching)

Library bundles (pptxgenjs, docx, pdf-lib) are built into isolate-safe
IIFE bundles by apps/sim/lib/execution/sandbox/bundles/build.ts and
committed; next.config.ts / trigger.config.ts / Dockerfile updated to
ship them instead of the deleted dist/*-worker.cjs artifacts.

Call sites migrated:
 - app/api/workspaces/[id]/pptx/preview/route.ts
 - app/api/files/serve/[...path]/route.ts (+ test mock)
 - lib/copilot/tools/server/files/{workspace-file,edit-content}.ts

All pass owner key user:<userId> for per-user pool fairness + distributed
lease accounting.

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 18, 2026 1:44am

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 17, 2026

PR Summary

High Risk
Touches the untrusted-code execution boundary by replacing the legacy subprocess vm document generators with a new isolated-vm task mode, adding broker IPC, cancellation, and size/rate limits; regressions could impact document generation reliability or sandbox escape surface if mis-scoped.

Overview
Routes and copilot tools that previously generated pptx/docx/pdf via the legacy doc-vm/*-worker.cjs subprocess pipeline now call runSandboxTask (pptx-generate, docx-generate, pdf-generate) and propagate AbortSignal/ownerKey for fair scheduling + cancellation.

The isolated-vm worker/pool is extended with a task execution mode that loads prebuilt library bundles, installs delegated console/TextEncoder/timers, hardens globals before user code runs, and returns finalize bytes as base64; it also adds host-side broker IPC (with per-execution broker registration, JSON size limits, and per-execution call caps) plus explicit cancel handling.

Legacy doc-vm.ts, doc-worker.cjs, pptx-vm.ts, and pptx-worker.cjs are removed, and a first host broker workspaceFileBroker is added for isolate-access to workspace files. New env knobs document broker payload/call limits, and a .cursor rule file codifies worker security policy to prevent credential leakage into the worker process.

Reviewed by Cursor Bugbot for commit 2645afe. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 17, 2026

Greptile Summary

This PR retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline (node:vm + full require() in the same UID/PID namespace as Next.js) and routes all document-generation (PPTX, DOCX, PDF) through the existing isolated-vm pool. User code now runs inside a V8 isolate with no process, require, or fs access; workspace-file retrieval is brokered through a narrow host-side IPC bridge scoped to the calling workspace.

The implementation is architecturally sound and well-layered: sandbox-tasks/ defines recipes, lib/execution/sandbox/ holds the runner + broker + pre-built library bundles, and isolated-vm.ts / isolated-vm-worker.cjs are extended with task-mode execution and a cancellation signal path.

Confidence Score: 5/5

Safe to merge — significant security improvement with no regressions identified.

Both previously flagged cancellation-signal gaps (route.ts and copilot tools) are resolved. All remaining findings are P2: a pre-download size check in the workspace-file broker, and leaving __bundles on globalThis after hardening. Neither is a current defect — they are hardening suggestions. The core isolated-vm sandbox migration is architecturally sound, the cancellation and double-resolve paths are correctly guarded, and the broker call-rate/size limits are in place.

apps/sim/lib/execution/sandbox/brokers/workspace-file.ts (pre-download size guard), apps/sim/lib/execution/isolated-vm-worker.cjs (harden __bundles)

Important Files Changed

Filename Overview
apps/sim/lib/execution/isolated-vm-worker.cjs Core worker extended with executeTask, broker IPC, cancel via isolate.dispose(), and timer delegation to Node's timer heap. Well-structured and defensively written; bundle source caching and activeIsolates map are correct. Minor: __bundles global remains accessible to user code after hardening.
apps/sim/lib/execution/isolated-vm.ts Host-side pool extended with AbortSignal cancellation, broker handler dispatch, and task-mode request types. Cancel path is correctly guarded against double-resolve with resolved flag and pending.cancelled state. Saturation logging added.
apps/sim/lib/execution/sandbox/brokers/workspace-file.ts Host-side workspace file broker that resolves fileId → base64 data URI, correctly scoped to workspaceId. File is downloaded into memory before the broker-result size limit is checked; large files incur unnecessary memory pressure before being rejected.
apps/sim/lib/execution/sandbox/run-task.ts Clean task runner that maps broker definitions to IsolatedVMBrokerHandler, threads AbortSignal and ownerKey, and logs phase timings. Correct error propagation and bytes validation.
apps/sim/sandbox-tasks/docx-generate.ts DOCX task with manual base64 decoder in finalize (correct but complex). The decoder handles all valid base64 lengths correctly. Bootstrap properly exposes addSection/getFileBase64 helpers.
apps/sim/lib/execution/sandbox/bundles/build.ts Build script that bundles pptxgenjs/docx/pdf-lib as IIFE browser targets for isolate consumption. Correct polyfill ordering, proper error handling, temp entry files cleaned up.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Route as API Route
    participant RunTask as runSandboxTask
    participant Pool as isolated-vm pool
    participant Worker as isolated-vm-worker.cjs
    participant Isolate as V8 Isolate
    participant Broker as workspaceFileBroker (host)

    Client->>Route: GET /api/files/serve/...pptx
    Route->>RunTask: runSandboxTask('pptx-generate', {code, workspaceId}, {signal})
    RunTask->>Pool: executeInIsolatedVM(req, {brokers, signal})
    Pool->>Worker: IPC { type:'execute', task:{bundles, bootstrap, brokers, finalize} }
    Worker->>Isolate: load pptxgenjs.cjs bundle
    Worker->>Isolate: run task bootstrap (install pptx global, getFileBase64)
    Worker->>Isolate: harden (unset __brokerRef, __log, etc.)
    Worker->>Isolate: run user code
    Isolate->>Worker: IPC { type:'broker', brokerName:'workspaceFile', argsJson }
    Worker->>Pool: IPC { type:'broker', ... }
    Pool->>Broker: handler(args) scoped to workspaceId
    Broker-->>Pool: {dataUri}
    Pool-->>Worker: IPC { type:'brokerResponse', resultJson }
    Worker-->>Isolate: broker promise resolves
    Worker->>Isolate: run finalize → return Uint8Array as base64
    Worker->>Pool: IPC { type:'result', bytesBase64 }
    Pool-->>RunTask: IsolatedVMExecutionResult
    RunTask-->>Route: Buffer
    Route-->>Client: 200 application/vnd.openxmlformats...
Loading

Reviews (2): Last reviewed commit: "fix lint" | Re-trigger Greptile

Comment thread apps/sim/app/api/files/serve/[...path]/route.ts Outdated
Comment thread apps/sim/lib/copilot/tools/server/files/edit-content.ts
…turation logs

Follow-ups on top of the isolated-vm migration (da14027):

Timer delegation (laverdet/isolated-vm#136 recommended pattern):
 - setTimeout / setInterval / clearTimeout / clearImmediate delegate to
   Node's real timer heap via ivm.Reference. Real delays are honored;
   clearTimeout actually cancels; ms is clamped to the script timeout
   so callbacks can't fire after the isolate is disposed.
 - Per-execution timer tracking + dispose-sweep in finally. Zero stale
   callbacks post-dispose.
 - unwrapPrimitive helper normalizes ivm.Reference-wrapped primitives
   (arguments: { reference: true } applies uniformly to all args).
 - _polyfills.ts shrinks from ~130 lines to the global->globalThis alias.
   Timers / TextEncoder / TextDecoder / console all install per-execution
   from the worker via ivm bridges.

AbortSignal race fix (pre-existing bug surfaced by the timer smoke):
 - Listener is registered after await tryAcquireDistributedLease. If the
   signal aborted during that ~200ms window (Redis down), AbortSignal
   doesn't fire listeners registered after the fact — the abort was
   silently missed. Now re-checks signal.aborted synchronously after
   addEventListener.

Observability:
 - executeTask returns IsolatedVMTaskTimings (setup, runtimeBootstrap,
   bundles, brokerInstall, taskBootstrap, harden, userCode, finalize,
   total) in every success + error path. run-task.ts logs these with
   workspaceId + queueMs so 'which tenant is slow' is queryable.
 - Pool saturation events now emit structured logger.warn with reason
   codes: queue_full_global, queue_full_owner, queue_wait_timeout,
   distributed_lease_limit. Matches the existing broker reject pattern.

Security policy:
 - New .cursor/rules/sim-sandbox.mdc codifies the hard rules for the
   worker process: no app credentials, all credentialed work goes
   through host-side brokers, every broker scopes by workspaceId.
   Pre-merge checklist for future changes to isolated-vm-worker.cjs.

Measured phase breakdown (local smoke, Redis down): pptx wall=~310ms
with bundles=~16ms, finalize=~83ms; docx ~290ms / 17ms / 70ms; pdf
~235ms / 17ms / 5ms. Bundle compilation is not the bottleneck —
library finalize is.

Made-with: Cursor
Three remaining callers of runSandboxTask were not threading a
cancellation signal, so a client disconnect mid-compile left the pool
slot occupied for the full 60s task timeout. Matching the pattern the
pptx-preview route already uses.

 - apps/sim/app/api/files/serve/[...path]/route.ts — GET forwards
   `request.signal` into handleLocalFile / handleCloudProxy, which
   forward into compileDocumentIfNeeded, which forwards into
   runSandboxTask.
 - apps/sim/lib/copilot/tools/server/files/workspace-file.ts — passes
   `context.abortSignal` (transport/user stop) into runSandboxTask.
 - apps/sim/lib/copilot/tools/server/files/edit-content.ts — same.

Smoke: simulated client disconnect at t=1000ms during a task that would
otherwise have waited 10s. The pool slot unwinds at t=1002ms with
AbortError; previously would have sat 60s until the task-level timeout.

Made-with: Cursor
Next.js's type-check worker OOMs at the default 4GB heap on Node 23 for
this project's type graph size. Bumps the heap to 8GB only for the
`next build` invocation inside `bun run build`.

Docker builds are unaffected — `next.config.ts` sets
`typescript.ignoreBuildErrors: true` when DOCKER_BUILD=1, which skips
the type-check pass entirely. This only fixes local `bun run build`.

No functional code changes.

Made-with: Cursor
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321 icecrasher321 changed the base branch from main to staging April 18, 2026 00:56
Comment thread apps/sim/lib/copilot/tools/server/files/edit-content.ts
Made-with: Cursor

# Conflicts:
#	apps/sim/lib/copilot/tools/server/files/edit-content.ts
#	apps/sim/lib/copilot/tools/server/files/workspace-file.ts
#	apps/sim/lib/execution/doc-vm.ts
#	apps/sim/lib/execution/pptx-vm.ts
The same extension -> { formatName, sourceMime, taskId } mapping was
duplicated in workspace-file.ts and edit-content.ts. Any future format
or task-id change had to happen in two places.

Exports getDocumentFormatInfo + DocumentFormatInfo from workspace-file.ts
(which already owned the PPTX/DOCX/PDF source MIME constants) and
imports it in edit-content.ts. Same source-of-truth pattern the file
already uses for inferContentType.

Made-with: Cursor
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Comment thread apps/sim/lib/execution/isolated-vm-worker.cjs Outdated
Both bridges in the isolate used truthiness to detect host-side errors:

    if (response.error) throw new Error(response.error);   // broker
    if (result.error)   throw new Error(result.error);     // fetch

If a host handler ever threw `new Error('')`, err.message would be ''
(falsy), so { error: '' } was silently swallowed and the isolate saw
a successful null result. Existing call sites don't throw empty-message
errors, but the pattern was structurally unsafe.

Switch both to typeof check === 'string' and fall back to a default
message if the string is empty, so all host-reported errors propagate
into the isolate regardless of message content.

Made-with: Cursor
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2645afe. Configure here.

@icecrasher321 icecrasher321 merged commit 2f93205 into staging Apr 18, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/urgent-billing-isolated branch April 18, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant