Typed exception flow

Every failed publish has a typed cause, a typed next action, and a typed owner.

When a Cenelira publish fails, the worker writes a raw error onto the job. The recovery surface does not show that string. It calls classifyReliabilityFailureCause in src/lib/reliability/failureClassification.ts, which maps the raw error onto a typed code, a normalized bucket, and a typed source.

On top of that cause, fetchWorkspaceReliability in src/lib/reliability/workspaceReliability.ts resolves a typed nextAction, a typed owner, and a typed replayability state. The same row feeds the in-app surface and the signed proof export, so the cause and the recovery decision live on the artifact that leaves the app.

Definition

What is failed-post recovery?

Every Cenelira publish failure is classified into a typed cause, assigned a typed next action, and attributed to a typed owner. The recovery surface lists the open items and offers replay only when the gates allow it.

Mechanism

Where is the classification done?

In src/lib/reliability/failureClassification.ts and src/lib/reliability/workspaceReliability.ts. The classifier resolves a typed cause from the job error, the surface resolves the next action from the cause and replayability gate, and the proof export reads the cause, next action, and owner directly from the typed row.

Limit

What is out of scope?

This is a typed-cause and typed-action surface. It does not promise that every failure has a one-click fix; some causes resolve to no automated next action, and some replays are refused for typed reasons.

Who this is for

Built for operators who are tired of triaging by stack trace.

Failed-post recovery earns its keep on teams where every failure has to be classified, owned, and either replayed or handed off. The surface assumes a queue that produces real exceptions, and adds less value when nothing ever fails and nobody ever has to explain why.

For
  • Agency operators whose clients ask "what broke and who is fixing it" the morning after a queue night.
  • Brand teams who need a typed record of every exception and who recovered it for audit and proof export.
  • Ops leads who want replay gated by review state, target binding, and grant validity, not by whoever clicked replay first.
Not for
  • Teams that prefer raw provider error strings and a single "retry" button in chat.
  • Workflows that rely on screenshots of failed posts as the recovery record.
  • Operators who expect Cenelira to recover from every failure automatically. Some causes resolve to no automated next action.

Cause taxonomy

Eleven normalized buckets and four platform-specific cause codes.

classifyReliabilityFailureCause first checks for raw-error signatures that map to a platform-specific code, then falls back to the normalized PublishBucket set. The cause is recorded with a source of raw_error_signature, normalized_bucket, or status_only, so a downstream reader can tell how the classification was reached.

  • permissions_missing

    A scope, account permission, or page connection the publish path needed at execution time is not present on the active grant.

    Normalized bucket · Next action: reconnect_account
  • reconnect_required

    The platform indicated the connection itself is no longer valid and a fresh connect flow is required before another publish attempt.

    Normalized bucket · Next action: reconnect_account
  • auth_failed

    The active grant returned an auth failure that is not specifically classified as expired or as a missing scope.

    Normalized bucket · Next action: reconnect_account
  • auth_expired

    The active grant has expired. The publish path refuses instead of attempting an authenticated call against an expired token.

    Normalized bucket · Next action: reconnect_account
  • media_invalid

    The platform rejected the supplied media as malformed, unsupported, or out of policy for the destination intent.

    Normalized bucket · Next action: review_media
  • rate_limited

    The platform throttled the publish attempt. The recovery surface marks the next action as a deferred retry rather than an immediate replay.

    Normalized bucket · Next action: retry_later
  • publish_failed

    A platform-side publish error or misconfiguration the classifier could not split into a more specific bucket.

    Normalized bucket · Next action: replay_publish
  • child_create_failed

    A bundle child item failed during creation on the platform side, before the parent post could be assembled.

    Normalized bucket · Next action: review_media
  • child_processing_timeout

    A child item exceeded the processing timeout while the platform was preparing the bundle for publish.

    Normalized bucket · Next action: replay_publish
  • parent_create_failed

    The parent post could not be created after the child items were prepared.

    Normalized bucket · Next action: replay_publish
  • unknown

    The publish error did not match any normalized bucket. The schedule is still recorded as failed and is replayable when no other gate refuses.

    Normalized bucket · Next action: replay_publish
  • tiktok_private_account_required

    TikTok refused the post because the API client is unaudited, and the platform requires a private destination account for unaudited clients. The recovery surface records the cause and does not assign an automated next step.

    Platform-specific · Next action: none
  • pinterest_forbidden

    Pinterest returned a 403 the classifier maps to a permissions issue on the connected account.

    Platform-specific · Next action: reconnect_account
  • x_media_capability_missing

    The X workspace capability check found no media-upload capability for the connected account at execution time.

    Platform-specific · Next action: reconnect_account
  • x_media_upload_scope_missing

    X rejected the media upload init with a 403 that the classifier attributes to a missing upload scope on the active grant.

    Platform-specific · Next action: reconnect_account
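
The two-stage lookup described above can be sketched roughly as follows. This is a minimal illustration, not the contents of failureClassification.ts: the function name classifySketch, the FailureCause shape, and the signature patterns are hypothetical, and treating status_only as "no raw error and no bucket available" is an assumption. Only the cause codes and the three source values come from this page.

```typescript
// Sketch only. Cause codes and source values are from this page;
// every other name and every regex below is a hypothetical stand-in.
type PublishBucket =
  | "permissions_missing" | "reconnect_required" | "auth_failed" | "auth_expired"
  | "media_invalid" | "rate_limited" | "publish_failed" | "child_create_failed"
  | "child_processing_timeout" | "parent_create_failed" | "unknown";

type PlatformCauseCode =
  | "tiktok_private_account_required" | "pinterest_forbidden"
  | "x_media_capability_missing" | "x_media_upload_scope_missing";

type CauseSource = "raw_error_signature" | "normalized_bucket" | "status_only";

interface FailureCause {
  code: PublishBucket | PlatformCauseCode;
  source: CauseSource;
}

// Assumed raw-error signatures, for illustration only.
const SIGNATURES: Array<{ pattern: RegExp; code: PlatformCauseCode }> = [
  { pattern: /tiktok.*unaudited/i, code: "tiktok_private_account_required" },
  { pattern: /pinterest.*\b403\b/i, code: "pinterest_forbidden" },
  { pattern: /media upload.*scope/i, code: "x_media_upload_scope_missing" },
];

function classifySketch(
  rawError: string | null,
  bucket: PublishBucket | null,
): FailureCause {
  // Stage 1: platform-specific signatures take priority over buckets.
  if (rawError !== null) {
    const hit = SIGNATURES.find((s) => s.pattern.test(rawError));
    if (hit) return { code: hit.code, source: "raw_error_signature" };
  }
  // Stage 2: fall back to the normalized bucket when one is known.
  if (bucket !== null) return { code: bucket, source: "normalized_bucket" };
  // Assumption: with nothing but a FAILED status, record unknown/status_only.
  return { code: "unknown", source: "status_only" };
}
```

Recording the source alongside the code is what lets a downstream reader distinguish a cause inferred from an error string from one inferred from a bucket or from status alone.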

Next-action codes

Seven typed next actions. Each one resolves to a link, a mutation, or an explicit no-action row.

reliabilityNextAction resolves the next step from the typed cause and the replayability gate. Each code corresponds to a concrete surface: a link to a connection repair flow, a link to the handoff pack, a typed replay mutation, or no action at all when the platform refused for a reason Cenelira cannot fix.

  • review_connections

    The schedule is in a target-repair state; the operator is sent to the connections surface to fix the destination binding.

    Kind: link
  • open_handoff

    The schedule is in handoff state; the operator opens the handoff pack instead of replaying the publish path.

    Kind: link
  • reconnect_account

    The active grant is missing scopes, expired, or otherwise refused; the operator is sent to the connect flow.

    Kind: link
  • review_media

    A media item was rejected; the operator is sent to the schedule to inspect the asset before another attempt.

    Kind: link
  • replay_publish

    The recovery surface offers a typed replay; replay itself is gated by review state, post status, target repair, and grant validity.

    Kind: mutation
  • retry_later

    The platform rate-limited the publish attempt; the surface records the cause and does not offer an immediate replay.

    Kind: none
  • none

    The classifier returns a cause for which no automated next action is defined; the row is recorded but no replay is offered.

    Kind: none
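
As a rough illustration of how a cause and the schedule state might combine into a next action, the sketch below uses the cause-to-action pairs from the taxonomy and the kinds listed above. The function name nextActionSketch, the GateState shape, the order of the checks, and the fallback to none when replay is blocked are all assumptions, not the behaviour of reliabilityNextAction.

```typescript
// Sketch only. Codes, kinds, and the cause-to-action pairs are from this page;
// GateState, nextActionSketch, and the check order are hypothetical.
type CauseCode =
  | "permissions_missing" | "reconnect_required" | "auth_failed" | "auth_expired"
  | "media_invalid" | "rate_limited" | "publish_failed" | "child_create_failed"
  | "child_processing_timeout" | "parent_create_failed" | "unknown"
  | "tiktok_private_account_required" | "pinterest_forbidden"
  | "x_media_capability_missing" | "x_media_upload_scope_missing";

type NextActionCode =
  | "review_connections" | "open_handoff" | "reconnect_account"
  | "review_media" | "replay_publish" | "retry_later" | "none";

type NextActionKind = "link" | "mutation" | "none";

const ACTION_KIND: Record<NextActionCode, NextActionKind> = {
  review_connections: "link",
  open_handoff: "link",
  reconnect_account: "link",
  review_media: "link",
  replay_publish: "mutation",
  retry_later: "none",
  none: "none",
};

const CAUSE_TO_ACTION: Record<CauseCode, NextActionCode> = {
  permissions_missing: "reconnect_account",
  reconnect_required: "reconnect_account",
  auth_failed: "reconnect_account",
  auth_expired: "reconnect_account",
  media_invalid: "review_media",
  rate_limited: "retry_later",
  publish_failed: "replay_publish",
  child_create_failed: "review_media",
  child_processing_timeout: "replay_publish",
  parent_create_failed: "replay_publish",
  unknown: "replay_publish",
  tiktok_private_account_required: "none",
  pinterest_forbidden: "reconnect_account",
  x_media_capability_missing: "reconnect_account",
  x_media_upload_scope_missing: "reconnect_account",
};

// Hypothetical view of the schedule state and the replayability gate.
interface GateState {
  targetRepair: boolean;
  handoff: boolean;
  replayable: boolean;
}

function nextActionSketch(
  cause: CauseCode,
  state: GateState,
): { code: NextActionCode; kind: NextActionKind } {
  let code: NextActionCode;
  if (state.targetRepair) code = "review_connections"; // assumed to win first
  else if (state.handoff) code = "open_handoff";
  else code = CAUSE_TO_ACTION[cause];
  // Assumption: a blocked gate downgrades a replay offer to no action.
  if (code === "replay_publish" && !state.replayable) code = "none";
  return { code, kind: ACTION_KIND[code] };
}
```

Typing the table as Record<CauseCode, NextActionCode> makes the compiler reject a new cause code that has no next action assigned.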

Replay gate

Replay is a typed enum, not a button that always works.

WorkspaceReliabilityReplayability is a discriminated union with a replayable state and a blocked state. The blocked state carries one of eight typed reasons. Replay itself is a rate-limited POST to /api/jobs/replay, handled in src/app/api/jobs/replay/route.ts and gated again at the API by the same review-state, target-repair, post-status, and grant-validity checks the surface ran. The schedule cannot move forward until the gate clears.

  • target_repair_required

    The destination binding is missing or invalid; replay is refused until target repair completes.

    Blocked reason
  • review_not_approved

    The schedule is not in an approved review state; replay refuses for the same reason the publish path refuses.

    Blocked reason
  • not-connected

    No active grant covers the connected account; the operator is sent to reconnect rather than replay.

    Blocked reason
  • x-media-capability-missing

    X media upload capability is not present for the workspace; replay refuses until the capability is granted.

    Blocked reason
  • still-running

    The schedule is still posting or its job is still running; replay refuses to avoid double publishes.

    Blocked reason
  • already-posted

    The post row records SUCCESS; replay refuses because a prior attempt already produced a platform post.

    Blocked reason
  • handoff-not-replayable

    The schedule is in handoff mode or status; the next step is to open the handoff pack, not replay.

    Blocked reason
  • canceled-not-replayable

    The schedule is in a canceled mutation status; replay refuses because the operator has already taken the schedule out of the queue.

    Blocked reason
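
A discriminated union of this shape could look like the sketch below. The eight reason strings are copied from the list above; the field names state and reason, the GateInput shape, and the order in which the checks run are assumptions made for illustration, not the definition of WorkspaceReliabilityReplayability.

```typescript
// Sketch only. The reason strings are from this page; the field names,
// GateInput, and the check order are hypothetical.
type BlockedReason =
  | "target_repair_required" | "review_not_approved" | "not-connected"
  | "x-media-capability-missing" | "still-running" | "already-posted"
  | "handoff-not-replayable" | "canceled-not-replayable";

type ReplayabilitySketch =
  | { state: "replayable" }
  | { state: "blocked"; reason: BlockedReason };

// Hypothetical flattened view of the schedule, job, post, and grant.
interface GateInput {
  targetValid: boolean;
  reviewApproved: boolean;
  hasActiveGrant: boolean;
  xMediaCapabilityOk: boolean;
  running: boolean;
  posted: boolean;
  handoff: boolean;
  canceled: boolean;
}

function replayGateSketch(s: GateInput): ReplayabilitySketch {
  const blocked = (reason: BlockedReason): ReplayabilitySketch =>
    ({ state: "blocked", reason });
  // Assumed order: the first failing check decides the reason.
  if (!s.targetValid) return blocked("target_repair_required");
  if (!s.reviewApproved) return blocked("review_not_approved");
  if (!s.hasActiveGrant) return blocked("not-connected");
  if (!s.xMediaCapabilityOk) return blocked("x-media-capability-missing");
  if (s.running) return blocked("still-running");
  if (s.posted) return blocked("already-posted");
  if (s.handoff) return blocked("handoff-not-replayable");
  if (s.canceled) return blocked("canceled-not-replayable");
  return { state: "replayable" };
}
```

Because the reason only exists on the blocked variant, a caller has to check state before reading it, so a surface cannot render a replay control next to a stale reason by accident.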

Owner attribution

Six owner kinds. The first match wins.

reliabilityOwner in src/lib/reliability/workspaceReliability.ts walks a fixed precedence: handoff_owner for handoff items, then recovery_owner, then last_actor when the last actor type is USER, then creator, then system, then unassigned. The recovery surface and the proof export both read this typed owner. When no human is attributable, the row records system or unassigned honestly instead of inventing one.

  • handoff_owner

    A user explicitly attributed as the handoff owner on the schedule. Selected for handoff items when set.

    Owner kind
  • recovery_owner

    A user attributed as the recovery owner on the schedule, set when an operator claims a failure.

    Owner kind
  • last_actor

    The last user actor recorded on the schedule. Selected when last_actor.type is USER and the user record is present.

    Owner kind
  • creator

    The user who created the schedule. Selected when no handoff_owner, recovery_owner, or USER last_actor is resolved.

    Owner kind
  • system

    A system actor performed the last mutation. Recorded with no human user reference; surfaced as a typed owner kind.

    Owner kind
  • unassigned

    No typed owner could be resolved from the schedule. The row is still surfaced; it just carries no owner reference.

    Owner kind
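
The first-match-wins precedence can be sketched as a chain of early returns. The six owner kinds and their order come from this page; the input shape OwnerInput, the UserRef type, the function name ownerSketch, and the "SYSTEM" actor type value are hypothetical, and this is not the body of reliabilityOwner.

```typescript
// Sketch only. Owner kinds and precedence are from this page;
// the input shape and the "SYSTEM" actor type are assumptions.
type OwnerKind =
  | "handoff_owner" | "recovery_owner" | "last_actor"
  | "creator" | "system" | "unassigned";

interface UserRef { id: string; label: string }

interface OwnerInput {
  isHandoff: boolean;
  handoffOwner?: UserRef;
  recoveryOwner?: UserRef;
  lastActor?: { type: "USER" | "SYSTEM"; user?: UserRef };
  creator?: UserRef;
}

interface OwnerSketch { kind: OwnerKind; user?: UserRef }

function ownerSketch(s: OwnerInput): OwnerSketch {
  // 1. handoff_owner applies only to handoff items.
  if (s.isHandoff && s.handoffOwner) {
    return { kind: "handoff_owner", user: s.handoffOwner };
  }
  // 2. recovery_owner, set when an operator claims the failure.
  if (s.recoveryOwner) return { kind: "recovery_owner", user: s.recoveryOwner };
  // 3. last_actor, only when it is a USER with a user record present.
  const actor = s.lastActor;
  if (actor && actor.type === "USER" && actor.user) {
    return { kind: "last_actor", user: actor.user };
  }
  // 4. creator.
  if (s.creator) return { kind: "creator", user: s.creator };
  // 5. system carries no user reference.
  if (actor && actor.type === "SYSTEM") return { kind: "system" };
  // 6. Nothing resolved: record unassigned rather than guess.
  return { kind: "unassigned" };
}
```

For example, a schedule with both a recovery owner and a creator resolves to recovery_owner, and a schedule with no user fields at all resolves to unassigned.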

Proof artifact

The proof record carries the cause, the action, and the owner.

collectWorkspaceProofRecords in src/lib/exports/workspaceProofExport.ts reuses the same WorkspaceReliabilityItem the recovery surface renders. Three fields on the signed 17-field proof record carry the recovery state.

exceptionCause is set to item.cause.code. The same code the surface shows on the row appears in the export, so a reader can match the proof entry to the typed bucket or platform-specific cause.

recoveryAction is set to item.nextAction.code. The export records what Cenelira offered as the next step at the time the row was assembled, not whatever button was clicked later.

recoveryActor is set to item.owner.user?.label ?? (kind === 'system' ? 'system' : null), where kind is the resolved owner kind. When a typed user owner is resolved, the label is recorded. When the owner kind is system, the literal 'system' is recorded. When neither holds, the field is null instead of guessing.
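
The three assignments above can be read as one small projection from the reliability item onto the proof record. In this sketch the ItemSketch interface is a hypothetical cut-down of WorkspaceReliabilityItem with only the fields this page names, and recoveryFieldsSketch is an illustrative helper, not a function from workspaceProofExport.ts.

```typescript
// Sketch only. Field names on the proof record are from this page;
// ItemSketch and recoveryFieldsSketch are hypothetical.
interface ItemSketch {
  cause: { code: string };
  nextAction: { code: string };
  owner: { kind: string; user?: { label: string } };
}

interface RecoveryFields {
  exceptionCause: string;
  recoveryAction: string;
  recoveryActor: string | null;
}

function recoveryFieldsSketch(item: ItemSketch): RecoveryFields {
  return {
    exceptionCause: item.cause.code,
    recoveryAction: item.nextAction.code,
    // User label when present, the literal 'system' for system owners,
    // otherwise null.
    recoveryActor:
      item.owner.user?.label ?? (item.owner.kind === "system" ? "system" : null),
  };
}
```

An unassigned owner therefore exports recoveryActor as null, which keeps "nobody was attributable" distinct from "the system acted".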

publishOutcome records publish_failed when the schedule, job, or post status is FAILED, alongside the typed handoff_pending, handoff_completed, queued, posting, published, and canceled outcomes. The full record schema and the enum live in src/lib/exports/proofRecordSchema.ts. See the proof record schema for the full 17-field inventory.

What this does not claim

Honest limits on recovery.

Failed-post recovery is one surface in the publish path. This page is explicit about where it stops.

  • Not every cause resolves to a one-click fix. Codes like tiktok_private_account_required resolve to none, and the row is recorded without an automated next action.
  • A row whose target binding is missing or invalid is surfaced as a repair-required issue with cause domain target_repair, and replay is blocked with target_repair_required. The publish-path refusal that produced the state is documented on the wrong-account prevention page.
  • Some failures cannot be attributed to a human. The owner kinds system and unassigned are recorded as such; the row is not invented to look like a person owns it.
  • The replay endpoint is rate-limited per user and per IP. Cenelira does not promise that replay is always available; the typed blocked reasons on this page describe when the gate refuses.
  • Cenelira cannot re-authenticate the platform user from the recovery surface. When the cause requires a fresh grant, the next action sends the operator to the connect flow.
  • This page describes how Cenelira classifies and records publish failures. It makes no claim about how other publishing tools structure their own exception or recovery fields.

FAQ

Short answers for operators.

Which failure causes does Cenelira classify?

Eleven normalized PublishBucket codes (permissions_missing, reconnect_required, auth_failed, auth_expired, media_invalid, rate_limited, publish_failed, child_create_failed, child_processing_timeout, parent_create_failed, unknown) plus four platform-specific codes layered above the buckets (tiktok_private_account_required, pinterest_forbidden, x_media_capability_missing, x_media_upload_scope_missing). The list lives in src/lib/reliability/failureClassification.ts.

What next-action codes are available?

Seven: review_connections, open_handoff, reconnect_account, review_media, replay_publish, retry_later, and none. Each code is set by reliabilityNextAction in src/lib/reliability/workspaceReliability.ts based on the typed cause and the replayability gate.

When is replay actually available?

Replay is offered when the cause maps to publish_failed, child_processing_timeout, parent_create_failed, or unknown, and none of the eight blocked reasons applies (target_repair_required, review_not_approved, not-connected, x-media-capability-missing, still-running, already-posted, handoff-not-replayable, canceled-not-replayable). Replay itself is a rate-limited POST to /api/jobs/replay.

How does Cenelira decide who owns a failure?

Six typed owner kinds resolved by reliabilityOwner: handoff_owner (handoff items), recovery_owner (operator-claimed), last_actor (USER), creator, system, and unassigned. The first match wins. Some failures resolve to system or unassigned, and the surface records that honestly instead of inventing a person.

Where do recovery details land on the proof record?

In the signed 17-field proof record, exceptionCause is set to the typed cause code, recoveryAction is set to the typed next-action code, and recoveryActor is set to the owner's user label, the literal 'system' for system owners, or null when no typed owner could be resolved. collectWorkspaceProofRecords in src/lib/exports/workspaceProofExport.ts assembles the row from the same WorkspaceReliabilityItem.

Are wrong-account failures handled here?

A schedule whose target binding is missing or invalid is surfaced as a repair_required item with cause domain target_repair, and replay is blocked with target_repair_required. The publish-path refusal that produced the state is documented on the wrong-account prevention page.

Last reviewed 2026-04-25

