fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox#4217
fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox#4217icecrasher321 merged 8 commits intostagingfrom
Conversation
Retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline that ran user
DSL via node:vm + full require() in the same UID/PID namespace as the main
Next.js process. User code now runs inside the existing isolated-vm pool
(V8 isolate, no process / require / fs, no /proc/1/environ reachability).
Introduces a first-class SandboxTask abstraction under apps/sim/sandbox-tasks/
that mirrors apps/sim/background/ — one file per task, central typed
registry, kebab-case ids. Adding a new thing that runs in the isolate is
one file plus one registry entry.
Runtime additions in lib/execution/:
- task-mode execution in isolated-vm-worker.cjs: load pre-built library
bundles, run task bootstrap, run user code, run finalize, transfer
Uint8Array result as base64 via IPC
- named broker IPC bridge (generalizes the existing fetch bridge) with
args size, result size, and per-execution call caps
- cooperative AbortSignal support: cancel IPC disposes the isolate, pool
slot is freed, pending broker-call timers are swept
- compiled scripts + references explicitly released per execution
- isolate.isDisposed used for cancellation detection (no error-string
substring matching)
Library bundles (pptxgenjs, docx, pdf-lib) are built into isolate-safe
IIFE bundles by apps/sim/lib/execution/sandbox/bundles/build.ts and
committed; next.config.ts / trigger.config.ts / Dockerfile updated to
ship them instead of the deleted dist/*-worker.cjs artifacts.
Call sites migrated:
- app/api/workspaces/[id]/pptx/preview/route.ts
- app/api/files/serve/[...path]/route.ts (+ test mock)
- lib/copilot/tools/server/files/{workspace-file,edit-content}.ts
All pass owner key user:<userId> for per-user pool fairness + distributed
lease accounting.
Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview The isolated-vm worker/pool is extended with a task execution mode that loads prebuilt library bundles, installs delegated Legacy Reviewed by Cursor Bugbot for commit 2645afe. Configure here. |
Greptile SummaryThis PR retires the legacy The implementation is architecturally sound and well-layered: Confidence Score: 5/5Safe to merge — significant security improvement with no regressions identified. Both previously flagged cancellation-signal gaps (route.ts and copilot tools) are resolved. All remaining findings are P2: a pre-download size check in the workspace-file broker, and leaving __bundles on globalThis after hardening. Neither is a current defect — they are hardening suggestions. The core isolated-vm sandbox migration is architecturally sound, the cancellation and double-resolve paths are correctly guarded, and the broker call-rate/size limits are in place. apps/sim/lib/execution/sandbox/brokers/workspace-file.ts (pre-download size guard), apps/sim/lib/execution/isolated-vm-worker.cjs (harden __bundles) Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant Route as API Route
participant RunTask as runSandboxTask
participant Pool as isolated-vm pool
participant Worker as isolated-vm-worker.cjs
participant Isolate as V8 Isolate
participant Broker as workspaceFileBroker (host)
Client->>Route: GET /api/files/serve/...pptx
Route->>RunTask: runSandboxTask('pptx-generate', {code, workspaceId}, {signal})
RunTask->>Pool: executeInIsolatedVM(req, {brokers, signal})
Pool->>Worker: IPC { type:'execute', task:{bundles, bootstrap, brokers, finalize} }
Worker->>Isolate: load pptxgenjs.cjs bundle
Worker->>Isolate: run task bootstrap (install pptx global, getFileBase64)
Worker->>Isolate: harden (unset __brokerRef, __log, etc.)
Worker->>Isolate: run user code
Isolate->>Worker: IPC { type:'broker', brokerName:'workspaceFile', argsJson }
Worker->>Pool: IPC { type:'broker', ... }
Pool->>Broker: handler(args) scoped to workspaceId
Broker-->>Pool: {dataUri}
Pool-->>Worker: IPC { type:'brokerResponse', resultJson }
Worker-->>Isolate: broker promise resolves
Worker->>Isolate: run finalize → return Uint8Array as base64
Worker->>Pool: IPC { type:'result', bytesBase64 }
Pool-->>RunTask: IsolatedVMExecutionResult
RunTask-->>Route: Buffer
Route-->>Client: 200 application/vnd.openxmlformats...
Reviews (2): Last reviewed commit: "fix lint" | Re-trigger Greptile |
…turation logs Follow-ups on top of the isolated-vm migration (da14027): Timer delegation (laverdet/isolated-vm#136 recommended pattern): - setTimeout / setInterval / clearTimeout / clearImmediate delegate to Node's real timer heap via ivm.Reference. Real delays are honored; clearTimeout actually cancels; ms is clamped to the script timeout so callbacks can't fire after the isolate is disposed. - Per-execution timer tracking + dispose-sweep in finally. Zero stale callbacks post-dispose. - unwrapPrimitive helper normalizes ivm.Reference-wrapped primitives (arguments: { reference: true } applies uniformly to all args). - _polyfills.ts shrinks from ~130 lines to the global->globalThis alias. Timers / TextEncoder / TextDecoder / console all install per-execution from the worker via ivm bridges. AbortSignal race fix (pre-existing bug surfaced by the timer smoke): - Listener is registered after await tryAcquireDistributedLease. If the signal aborted during that ~200ms window (Redis down), AbortSignal doesn't fire listeners registered after the fact — the abort was silently missed. Now re-checks signal.aborted synchronously after addEventListener. Observability: - executeTask returns IsolatedVMTaskTimings (setup, runtimeBootstrap, bundles, brokerInstall, taskBootstrap, harden, userCode, finalize, total) in every success + error path. run-task.ts logs these with workspaceId + queueMs so 'which tenant is slow' is queryable. - Pool saturation events now emit structured logger.warn with reason codes: queue_full_global, queue_full_owner, queue_wait_timeout, distributed_lease_limit. Matches the existing broker reject pattern. Security policy: - New .cursor/rules/sim-sandbox.mdc codifies the hard rules for the worker process: no app credentials, all credentialed work goes through host-side brokers, every broker scopes by workspaceId. Pre-merge checklist for future changes to isolated-vm-worker.cjs. Measured phase breakdown (local smoke, Redis down): pptx wall=~310ms with bundles=~16ms, finalize=~83ms; docx ~290ms / 17ms / 70ms; pdf ~235ms / 17ms / 5ms. Bundle compilation is not the bottleneck — library finalize is. Made-with: Cursor
Three remaining callers of runSandboxTask were not threading a cancellation signal, so a client disconnect mid-compile left the pool slot occupied for the full 60s task timeout. Matching the pattern the pptx-preview route already uses. - apps/sim/app/api/files/serve/[...path]/route.ts — GET forwards `request.signal` into handleLocalFile / handleCloudProxy, which forward into compileDocumentIfNeeded, which forwards into runSandboxTask. - apps/sim/lib/copilot/tools/server/files/workspace-file.ts — passes `context.abortSignal` (transport/user stop) into runSandboxTask. - apps/sim/lib/copilot/tools/server/files/edit-content.ts — same. Smoke: simulated client disconnect at t=1000ms during a task that would otherwise have waited 10s. The pool slot unwinds at t=1002ms with AbortError; previously would have sat 60s until the task-level timeout. Made-with: Cursor
Next.js's type-check worker OOMs at the default 4GB heap on Node 23 for this project's type graph size. Bumps the heap to 8GB only for the `next build` invocation inside `bun run build`. Docker builds are unaffected — `next.config.ts` sets `typescript.ignoreBuildErrors: true` when DOCKER_BUILD=1, which skips the type-check pass entirely. This only fixes local `bun run build`. No functional code changes. Made-with: Cursor
|
bugbot run |
|
@greptile |
Made-with: Cursor # Conflicts: # apps/sim/lib/copilot/tools/server/files/edit-content.ts # apps/sim/lib/copilot/tools/server/files/workspace-file.ts # apps/sim/lib/execution/doc-vm.ts # apps/sim/lib/execution/pptx-vm.ts
The same extension -> { formatName, sourceMime, taskId } mapping was
duplicated in workspace-file.ts and edit-content.ts. Any future format
or task-id change had to happen in two places.
Exports getDocumentFormatInfo + DocumentFormatInfo from workspace-file.ts
(which already owned the PPTX/DOCX/PDF source MIME constants) and
imports it in edit-content.ts. Same source-of-truth pattern the file
already uses for inferContentType.
Made-with: Cursor
|
bugbot run |
Both bridges in the isolate used truthiness to detect host-side errors:
if (response.error) throw new Error(response.error); // broker
if (result.error) throw new Error(result.error); // fetch
If a host handler ever threw `new Error('')`, err.message would be ''
(falsy), so { error: '' } was silently swallowed and the isolate saw
a successful null result. Existing call sites don't throw empty-message
errors, but the pattern was structurally unsafe.
Switch both to typeof check === 'string' and fall back to a default
message if the string is empty, so all host-reported errors propagate
into the isolate regardless of message content.
Made-with: Cursor
|
bugbot run |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 2645afe. Configure here.
Summary
Retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline that ran user DSL via node:vm + full require() in the same UID/PID namespace as the main Next.js process. User code now runs inside the existing isolated-vm pool (V8 isolate, no process / require / fs, no /proc/1/environ reachability).
Type of Change
Testing
Tested manually
Checklist