feat(jobs): Add data retention jobs #4128
@TheodoreSpeaks let's consolidate the migrations into a single one: just delete the existing ones and run it once over all the changes in schema.ts
@BugBot review
PR Summary: High Risk

Overview:
- Replaces the previous monolithic …
- Adds three new background cleanup tasks (soft-delete cleanup, log cleanup, task/chat cleanup)
- Adds an Enterprise settings surface for configuring retention: a new …

Reviewed by Cursor Bugbot for commit 4d76165.
@BugBot review
Greptile Summary

Adds three data retention background jobs (soft-delete cleanup, log cleanup, task/chat cleanup) dispatched via Trigger.dev or an inline fallback, with an enterprise-gated UI and API for per-workspace configuration. The migration replaces full soft-delete indexes with partial indexes and adds three retention columns to the workspace table.
Confidence Score: 3/5

Not safe to merge as-is: the S3 cleanup gap will permanently orphan workspace_file objects in object storage on every cleanup run. One confirmed P1 data-integrity bug (workspace_file S3 objects never deleted) that will silently accumulate orphaned cloud storage objects on each cron execution. Everything else (batching logic, auth, migration, Trigger.dev wiring, enterprise UI) is well-structured.

apps/sim/background/cleanup-soft-deletes.ts: cleanupWorkspaceFileStorage must also cover the workspaceFile (singular) table.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Cron as Cron (GET /api/cron/*)
    participant Dispatcher as dispatchCleanupJobs
    participant Queue as JobQueue (Trigger.dev / DB)
    participant Task as Background Task
    participant DB as Database
    participant S3 as Object Storage
    participant Copilot as Copilot Backend
    Cron->>Dispatcher: dispatchCleanupJobs(jobType, retentionColumn)
    Dispatcher->>Queue: enqueue free-tier job
    Dispatcher->>Queue: enqueue paid-tier job
    Dispatcher->>DB: query enterprise workspaces with non-NULL retention
    Dispatcher->>Queue: batchTrigger enterprise jobs
    Queue->>Task: run(payload)
    Task->>DB: resolveTierWorkspaceIds or lookup workspace retention
    Task->>DB: SELECT expiring rows (batched, LIMIT 2000)
    Task->>S3: delete associated files (pre-deletion)
    Task->>Copilot: POST /api/tasks/cleanup (chat IDs)
    Task->>DB: DELETE rows by ID
    Task-->>Queue: complete
```
Reviews (2): Last reviewed commit: "fix lint"
@greptile review
```typescript
async function cleanupWorkspaceFileStorage(
  workspaceIds: string[],
  retentionDate: Date
): Promise<{ filesDeleted: number; filesFailed: number }> {
  const stats = { filesDeleted: 0, filesFailed: 0 }

  if (!isUsingCloudStorage() || workspaceIds.length === 0) return stats

  const filesToDelete = await db
    .select({ key: workspaceFiles.key })
    .from(workspaceFiles)
    .where(
      and(
        inArray(workspaceFiles.workspaceId, workspaceIds),
        isNotNull(workspaceFiles.deletedAt),
        lt(workspaceFiles.deletedAt, retentionDate)
      )
    )
    .limit(BATCH_SIZE * MAX_BATCHES_PER_TABLE)

  for (const file of filesToDelete) {
    try {
      await StorageService.deleteFile({ key: file.key, context: 'workspace' })
      stats.filesDeleted++
    } catch (error) {
      stats.filesFailed++
      logger.error(`Failed to delete storage file ${file.key}:`, { error })
    }
  }

  return stats
}
```
workspaceFile S3 objects orphaned on hard-delete
cleanupWorkspaceFileStorage queries only the workspaceFiles (plural) table, but CLEANUP_TARGETS also includes workspaceFile (singular — workspace_file table). That table has a key column pointing directly to an S3 object. When cleanupTable purges those DB rows, the S3 objects are never removed, leaving permanent storage orphans.
The function needs a parallel query over the workspaceFile table and should call StorageService.deleteFile for each matching key before cleanupTable deletes the rows.
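A minimal sketch of the idea behind that fix, using hypothetical in-memory rows instead of the real Drizzle queries: collect expiring keys from both tables before the rows are purged, so every S3 object gets a delete call. The row shapes and key values here are illustrative, not from the PR.

```typescript
// Hypothetical stand-ins for rows from the two tables; the real code would
// run two Drizzle SELECTs (one per table) with the same
// deletedAt-before-retentionDate filter.
type FileRow = { key: string; deletedAt: Date | null }

function expiringKeys(rows: FileRow[], retentionDate: Date): string[] {
  return rows
    .filter((r) => r.deletedAt !== null && r.deletedAt < retentionDate)
    .map((r) => r.key)
}

const retentionDate = new Date('2024-01-01')
const workspaceFilesRows: FileRow[] = [
  { key: 'ws1/report.pdf', deletedAt: new Date('2023-12-01') },
  { key: 'ws1/live.pdf', deletedAt: null }, // not soft-deleted: skipped
]
const workspaceFileRows: FileRow[] = [
  { key: 'ws1/legacy.bin', deletedAt: new Date('2023-11-15') },
]

// Union of both tables' keys. Previously only the plural table contributed,
// so 'ws1/legacy.bin' would have been orphaned in S3 after the DB delete.
const keysToDelete = [
  ...expiringKeys(workspaceFilesRows, retentionDate),
  ...expiringKeys(workspaceFileRows, retentionDate),
]
console.log(keysToDelete) // ['ws1/report.pdf', 'ws1/legacy.bin']
```

Each key in the union would then go through StorageService.deleteFile before cleanupTable removes the rows.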
```typescript
  logRetentionHours: ws.logRetentionHours,
  softDeleteRetentionHours: ws.softDeleteRetentionHours,
  taskCleanupHours: ws.taskCleanupHours,
}
```
Enterprise effective values ignore configured defaults when null
Medium Severity
For enterprise workspaces, the effective retention values are taken directly from the workspace's configured columns (ws.logRetentionHours, etc.) with no fallback. When an enterprise workspace hasn't configured any retention, all effective values are null (meaning "forever"/no cleanup), so the dispatcher's isNotNull(retentionCol) check never matches and no cleanup job is ever dispatched. Because the system's enterprise defaults are also null, a workspace that was never configured behaves identically to one that explicitly chose "Forever": there is no way to distinguish "not yet configured" from "intentionally set to never", and an enterprise admin gets no out-of-the-box cleanup behavior.
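One possible shape for a fix, sketched with a hypothetical ENTERPRISE_DEFAULTS object (the name and values are illustrative, not from the PR): coalesce null columns to non-null defaults, so unconfigured workspaces get out-of-the-box cleanup.

```typescript
// Hypothetical non-null defaults. The PR's actual enterprise defaults are
// null, which is exactly why "unconfigured" and "Forever" are currently
// indistinguishable.
const ENTERPRISE_DEFAULTS = {
  logRetentionHours: 24 * 90,
  softDeleteRetentionHours: 24 * 30,
  taskCleanupHours: 24 * 7,
}

type Ws = {
  logRetentionHours: number | null
  softDeleteRetentionHours: number | null
  taskCleanupHours: number | null
}

// `??` substitutes the default only when the column is null/undefined,
// so any explicitly configured value still wins.
function effectiveRetention(ws: Ws) {
  return {
    logRetentionHours: ws.logRetentionHours ?? ENTERPRISE_DEFAULTS.logRetentionHours,
    softDeleteRetentionHours:
      ws.softDeleteRetentionHours ?? ENTERPRISE_DEFAULTS.softDeleteRetentionHours,
    taskCleanupHours: ws.taskCleanupHours ?? ENTERPRISE_DEFAULTS.taskCleanupHours,
  }
}

const unconfigured = effectiveRetention({
  logRetentionHours: null,
  softDeleteRetentionHours: null,
  taskCleanupHours: null,
})
console.log(unconfigured.logRetentionHours) // 2160
```

With this approach, an explicit "Forever" would need its own representation (for example a dedicated sentinel or flag) rather than null, since null now means "use the default".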
Reviewed by Cursor Bugbot for commit adb8f71. Configure here.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
```typescript
  .where(eq(workspace.id, payload.workspaceId))
  .limit(1)

if (!ws?.hours) return null
```
Falsy check skips enterprise cleanup for zero-hour retention
Low Severity
resolveCleanupScope uses !ws?.hours which is a falsy check that treats 0 as equivalent to null/not-configured. While the Zod validation on the PUT endpoint enforces min(24), the database column is an unconstrained integer. If a value of 0 were ever stored (e.g., via migration or direct DB access), cleanup would silently skip that workspace instead of applying zero-hour retention. A strict null check like ws?.hours == null would be more correct.
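The difference between the two checks can be shown in isolation (the Scope type here is a hypothetical stand-in for the query result):

```typescript
type Scope = { hours: number | null } | undefined

// Current behavior: `!ws?.hours` treats 0 the same as null/undefined.
const falsySkip = (ws: Scope): boolean => !ws?.hours

// Suggested behavior: only null/undefined means "not configured";
// `== null` matches both null and undefined but not 0.
const nullSkip = (ws: Scope): boolean => ws?.hours == null

console.log(falsySkip({ hours: 0 }))   // true  — zero-hour retention silently skipped
console.log(nullSkip({ hours: 0 }))    // false — cleanup still runs
console.log(nullSkip({ hours: null })) // true
console.log(nullSkip(undefined))       // true
```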
```typescript
function hoursToDisplayDays(hours: number | null): string {
  if (hours === null) return 'never'
  return String(Math.round(hours / 24))
}
```
Select displays broken state for non-standard hour values
Low Severity
hoursToDisplayDays converts hours to a day string via Math.round(hours / 24), but the result may not match any entry in DAY_OPTIONS. The API accepts any integer between 24 and 43800, so values like 36 hours would produce '2' — not in the select options. This causes the Select component to have a value with no matching SelectItem, resulting in a blank or broken display for enterprise workspaces whose retention was set via direct API call.
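One way to keep the Select renderable is to snap non-standard values to the nearest configured option. A sketch, assuming a hypothetical DAY_OPTIONS list (the component's real option values may differ):

```typescript
// Hypothetical option list; substitute the component's real DAY_OPTIONS.
const DAY_OPTIONS = ['1', '3', '7', '14', '30', '90', '365']

function hoursToDisplayDays(hours: number | null): string {
  if (hours === null) return 'never'
  const exact = String(Math.round(hours / 24))
  if (DAY_OPTIONS.includes(exact)) return exact
  // Snap to the nearest option so the Select always has a matching SelectItem,
  // even for retention values set via a direct API call (e.g. 36 hours).
  const targetDays = hours / 24
  return DAY_OPTIONS.reduce((best, opt) =>
    Math.abs(Number(opt) - targetDays) < Math.abs(Number(best) - targetDays) ? opt : best
  )
}

console.log(hoursToDisplayDays(36))  // '1'  (1.5 days snaps to the 1-day option)
console.log(hoursToDisplayDays(720)) // '30'
```

Snapping trades display precision for a working control; an alternative would be to render a custom "(~2 days)" item for out-of-list values.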
Summary
Add data retention jobs. 3 jobs created:
- soft-delete cleanup
- log cleanup
- task/chat cleanup