What is failed-post recovery?
Every Cenelira publish failure is classified into a typed cause, assigned a typed next action, and attributed to a typed owner. The recovery surface lists the open items and offers replay only when the gates allow it.
Typed exception flow
When a Cenelira publish fails, the worker writes a raw error onto the job. The recovery surface does not show that string. It calls classifyReliabilityFailureCause in src/lib/reliability/failureClassification.ts, which maps the raw error onto a typed code, a normalized bucket, and a typed source.
On top of that cause, fetchWorkspaceReliability in src/lib/reliability/workspaceReliability.ts resolves a typed nextAction, a typed owner, and a typed replayability state. The same row feeds the in-app surface and the signed proof export, so the cause and the recovery decision live on the artifact that leaves the app.
The logic lives in src/lib/reliability/failureClassification.ts and src/lib/reliability/workspaceReliability.ts. The classifier resolves a typed cause from the job error, the surface resolves the next action from the cause and the replayability gate, and the proof export reads the cause, next action, and owner directly from the typed row.
This is a typed-cause and typed-action surface. It does not promise that every failure has a one-click fix; some causes resolve to no automated next action, and some replays are refused for typed reasons.
Who this is for
Failed-post recovery earns its keep on teams where every failure has to be classified, owned, and either replayed or handed off. The surface assumes a queue that produces real exceptions, and it adds less value when nothing ever fails and nobody ever has to explain why.
Cause taxonomy
classifyReliabilityFailureCause first checks for raw-error signatures that map to a platform-specific code, then falls back to the normalized PublishBucket set. The cause is recorded with a source of raw_error_signature, normalized_bucket, or status_only, so a downstream reader can tell how the classification was reached.
permissions_missing: A scope, account permission, or page connection the publish path needed at execution time is not present on the active grant.
reconnect_required: The platform indicated the connection itself is no longer valid and a fresh connect flow is required before another publish attempt.
auth_failed: The active grant returned an auth failure that is not specifically classified as expired or as a missing scope.
auth_expired: The active grant has expired. The publish path refuses instead of attempting an authenticated call against an expired token.
media_invalid: The platform rejected the supplied media as malformed, unsupported, or out of policy for the destination intent.
rate_limited: The platform throttled the publish attempt. The recovery surface marks the next action as a deferred retry rather than an immediate replay.
publish_failed: A platform-side publish error or misconfiguration the classifier could not split into a more specific bucket.
child_create_failed: A bundle child item failed during creation on the platform side, before the parent post could be assembled.
child_processing_timeout: A child item exceeded the processing timeout while the platform was preparing the bundle for publish.
parent_create_failed: The parent post could not be created after the child items were prepared.
unknown: The publish error did not match any normalized bucket. The schedule is still recorded as failed and is replayable when no other gate refuses.
tiktok_private_account_required: TikTok refused the post because the connected account is unaudited and the platform requires a private destination for unaudited clients. The recovery surface records the cause and does not assign an automated next step.
pinterest_forbidden: Pinterest returned a 403 the classifier maps to a permissions issue on the connected account.
x_media_capability_missing: The X workspace capability check found no media-upload capability for the connected account at execution time.
x_media_upload_scope_missing: X rejected the media upload init with a 403 that the classifier attributes to a missing upload scope on the active grant.
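The lookup order described above (raw-error signature first, then the normalized bucket, then status only) can be sketched roughly as follows. This is a minimal illustration, not the code in failureClassification.ts: the function name classifySketch, the regex signatures, the bucket assigned to each platform-specific code, and the rule that status_only applies when neither a signature nor a bucket is available are all assumptions.

```typescript
// Hypothetical sketch; the real classifyReliabilityFailureCause may differ.
type PublishBucket =
  | "permissions_missing" | "reconnect_required" | "auth_failed" | "auth_expired"
  | "media_invalid" | "rate_limited" | "publish_failed" | "child_create_failed"
  | "child_processing_timeout" | "parent_create_failed" | "unknown";

type PlatformCauseCode =
  | "tiktok_private_account_required" | "pinterest_forbidden"
  | "x_media_capability_missing" | "x_media_upload_scope_missing";

type CauseSource = "raw_error_signature" | "normalized_bucket" | "status_only";

interface FailureCause {
  code: PublishBucket | PlatformCauseCode;
  bucket: PublishBucket;
  source: CauseSource;
}

// Assumed signatures: the actual matching rules are not documented on this page.
const SIGNATURES: Array<{ pattern: RegExp; code: PlatformCauseCode; bucket: PublishBucket }> = [
  { pattern: /unaudited/i, code: "tiktok_private_account_required", bucket: "publish_failed" },
  { pattern: /pinterest.*403/i, code: "pinterest_forbidden", bucket: "permissions_missing" },
];

function classifySketch(rawError: string | null, bucket: PublishBucket | null): FailureCause {
  // 1. Platform-specific raw-error signatures take precedence.
  if (rawError) {
    const hit = SIGNATURES.find((s) => s.pattern.test(rawError));
    if (hit) return { code: hit.code, bucket: hit.bucket, source: "raw_error_signature" };
  }
  // 2. Otherwise fall back to the normalized bucket, when one was recorded.
  if (bucket) return { code: bucket, bucket, source: "normalized_bucket" };
  // 3. Assumption: with nothing but a failed status, record unknown as status_only.
  return { code: "unknown", bucket: "unknown", source: "status_only" };
}
```

Recording the source next to the code is what lets a downstream reader distinguish a signature match from a bucket fallback without re-parsing the raw error.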
Next-action codes
reliabilityNextAction resolves the next step from the typed cause and the replayability gate. Each code corresponds to a concrete surface: a link to a connection repair flow, a link to the handoff pack, a typed replay mutation, or no action at all when the platform refused for a reason Cenelira cannot fix.
review_connections: The schedule is in a target-repair state; the operator is sent to the connections surface to fix the destination binding.
open_handoff: The schedule is in handoff state; the operator opens the handoff pack instead of replaying the publish path.
reconnect_account: The active grant is missing scopes, expired, or otherwise refused; the operator is sent to the connect flow.
review_media: A media item was rejected; the operator is sent to the schedule to inspect the asset before another attempt.
replay_publish: The recovery surface offers a typed replay; replay itself is gated by review state, post status, target repair, and grant validity.
retry_later: The platform is rate-limited; the surface records the cause and does not offer an immediate replay.
none: The classifier returns a cause for which no automated next action is defined; the row is recorded but no replay is offered.
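One way to picture how the seven codes could be resolved from a cause and a gate result is the sketch below. It is an illustration under stated assumptions, not the reliabilityNextAction implementation: the function name nextActionSketch, the plain-string parameters, the order of the checks, and the fall-through to none for any cause this page does not map (for example child_create_failed) are all hypothetical.

```typescript
// Hypothetical sketch; the real reliabilityNextAction may order or map these differently.
type NextAction =
  | "review_connections" | "open_handoff" | "reconnect_account"
  | "review_media" | "replay_publish" | "retry_later" | "none";

// Causes the page ties to the connect flow.
const RECONNECT_CAUSES = new Set([
  "permissions_missing", "reconnect_required", "auth_failed", "auth_expired",
]);

// Causes the page lists as eligible for replay.
const REPLAY_CAUSES = new Set([
  "publish_failed", "child_processing_timeout", "parent_create_failed", "unknown",
]);

function nextActionSketch(causeCode: string, blockedReason: string | null): NextAction {
  // Assumption: schedule-state gates are checked before cause-specific actions.
  if (blockedReason === "target_repair_required") return "review_connections";
  if (blockedReason === "handoff-not-replayable") return "open_handoff";
  if (blockedReason === "not-connected" || RECONNECT_CAUSES.has(causeCode)) return "reconnect_account";
  if (causeCode === "media_invalid") return "review_media";
  if (causeCode === "rate_limited") return "retry_later";
  // Replay is only offered when the cause is replayable and no gate refuses.
  if (REPLAY_CAUSES.has(causeCode) && blockedReason === null) return "replay_publish";
  // Assumption: everything else resolves to no automated next action.
  return "none";
}
```

For instance, under these assumptions nextActionSketch("publish_failed", "still-running") yields none, because the cause is replayable but the gate refuses.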
Replay gate
WorkspaceReliabilityReplayability is a discriminated union with a replayable state and a blocked state. The blocked state carries one of eight typed reasons. Replay itself is a rate-limited POST /api/jobs/replay in src/app/api/jobs/replay/route.ts, gated again at the API by the same review-state, target-repair, post-status, and grant-validity checks the surface ran. The schedule cannot move forward until the gate clears.
target_repair_required: The destination binding is missing or invalid; replay is refused until target repair completes.
review_not_approved: The schedule is not in an approved review state; replay refuses for the same reason the publish path refuses.
not-connected: No active grant covers the connected account; the operator is sent to reconnect rather than replay.
x-media-capability-missing: X media upload capability is not present for the workspace; replay refuses until the capability is granted.
still-running: The schedule is still posting or its job is still running; replay refuses to avoid double publishes.
already-posted: The post row records SUCCESS; replay refuses because a prior attempt already produced a platform post.
handoff-not-replayable: The schedule is in handoff mode or status; the next step is to open the handoff pack, not replay.
canceled-not-replayable: The schedule is in a canceled mutation status; replay refuses because the operator has already taken the schedule out of the queue.
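The replayable/blocked union and its eight reasons could be modeled along these lines. This is a sketch, assuming a discriminant field named state and a reason field on the blocked branch; the ScheduleSnapshot shape, its field names, and the order in which the checks run are hypothetical and not taken from workspaceReliability.ts.

```typescript
// Hypothetical sketch of a discriminated union like WorkspaceReliabilityReplayability.
type BlockedReason =
  | "target_repair_required" | "review_not_approved" | "not-connected"
  | "x-media-capability-missing" | "still-running" | "already-posted"
  | "handoff-not-replayable" | "canceled-not-replayable";

type ReplayabilitySketch =
  | { state: "replayable" }
  | { state: "blocked"; reason: BlockedReason };

// Assumed input shape; the real schedule, job, and post rows are not shown on this page.
interface ScheduleSnapshot {
  targetValid: boolean;
  reviewApproved: boolean;
  hasActiveGrant: boolean;
  needsXMediaUpload: boolean;
  hasXMediaCapability: boolean;
  jobRunning: boolean;
  postStatus: "SUCCESS" | "FAILED" | null;
  inHandoff: boolean;
  canceled: boolean;
}

function replayabilitySketch(s: ScheduleSnapshot): ReplayabilitySketch {
  // Each pair is [gate refuses?, reason]. The order follows the list above (an assumption).
  const checks: Array<[boolean, BlockedReason]> = [
    [!s.targetValid, "target_repair_required"],
    [!s.reviewApproved, "review_not_approved"],
    [!s.hasActiveGrant, "not-connected"],
    [s.needsXMediaUpload && !s.hasXMediaCapability, "x-media-capability-missing"],
    [s.jobRunning, "still-running"],
    [s.postStatus === "SUCCESS", "already-posted"],
    [s.inHandoff, "handoff-not-replayable"],
    [s.canceled, "canceled-not-replayable"],
  ];
  const hit = checks.find(([refused]) => refused);
  return hit ? { state: "blocked", reason: hit[1] } : { state: "replayable" };
}
```

Because the union is discriminated, a caller has to check state before it can read reason, which keeps a blocked row from being treated as replayable by accident.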
Owner attribution
reliabilityOwner in src/lib/reliability/workspaceReliability.ts walks a fixed precedence: handoff_owner for handoff items, then recovery_owner, then last_actor when the last actor type is USER, then creator, then system, then unassigned. The recovery surface and the proof export both read this typed owner. When no human is attributable, the row records system or unassigned honestly instead of inventing one.
handoff_owner: A user explicitly attributed as the handoff owner on the schedule. Selected for handoff items when set.
recovery_owner: A user attributed as the recovery owner on the schedule, set when an operator claims a failure.
last_actor: The last user actor recorded on the schedule. Selected when last_actor.type is USER and the user record is present.
creator: The user who created the schedule. Selected when no other typed owner is present.
system: A system actor performed the last mutation. Recorded with no human user reference; surfaced as a typed owner kind.
unassigned: No typed owner could be resolved from the schedule. The row is still surfaced; it just carries no owner reference.
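The fixed precedence reads naturally as a chain of early returns. The sketch below is an assumption-laden illustration rather than the reliabilityOwner source: the ScheduleOwnership shape, its field names, and the SYSTEM actor-type literal are hypothetical (this page only confirms the USER type).

```typescript
// Hypothetical sketch of a first-match-wins owner resolution.
type OwnerKind =
  | "handoff_owner" | "recovery_owner" | "last_actor"
  | "creator" | "system" | "unassigned";

interface UserRef { id: string; label: string }
interface OwnerSketch { kind: OwnerKind; user: UserRef | null }

// Assumed input shape; the real schedule row is not shown on this page.
interface ScheduleOwnership {
  isHandoff: boolean;
  handoffOwner: UserRef | null;
  recoveryOwner: UserRef | null;
  lastActor: { type: "USER" | "SYSTEM"; user: UserRef | null } | null;
  creator: UserRef | null;
}

function ownerSketch(s: ScheduleOwnership): OwnerSketch {
  // handoff_owner applies only to handoff items.
  if (s.isHandoff && s.handoffOwner) return { kind: "handoff_owner", user: s.handoffOwner };
  if (s.recoveryOwner) return { kind: "recovery_owner", user: s.recoveryOwner };
  // last_actor counts only when the actor is a USER with a user record present.
  if (s.lastActor?.type === "USER" && s.lastActor.user) {
    return { kind: "last_actor", user: s.lastActor.user };
  }
  if (s.creator) return { kind: "creator", user: s.creator };
  // No human is attributable: record system or unassigned instead of inventing one.
  if (s.lastActor?.type === "SYSTEM") return { kind: "system", user: null };
  return { kind: "unassigned", user: null };
}
```

Keeping system and unassigned as explicit kinds with a null user reference means the caller never has to guess whether a missing user is an error or an honest result.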
Proof artifact
collectWorkspaceProofRecords in src/lib/exports/workspaceProofExport.ts reuses the same WorkspaceReliabilityItem the recovery surface renders. Three fields on the signed 17-field proof record carry the recovery state.
exceptionCause is set to item.cause.code. The same code the surface shows on the row appears in the export, so a reader can match the proof entry to the typed bucket or platform-specific cause.
recoveryAction is set to item.nextAction.code. The export records what Cenelira offered as the next step at the time the row was assembled, not whatever button was clicked later.
recoveryActor is set to item.owner.user?.label ?? (kind === 'system' ? 'system' : null). When a typed user owner is resolved, the label is recorded. When the owner kind is system, the literal system is recorded. When neither holds, the field is null instead of guessing.
publishOutcome records publish_failed when the schedule, job, or post status is FAILED, alongside the typed handoff_pending, handoff_completed, queued, posting, published, and canceled outcomes. The full record schema and the enum live in src/lib/exports/proofRecordSchema.ts. See the proof record schema for the full 17-field inventory.
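The three recovery fields can be read as a direct projection of the reliability item. The sketch below mirrors the expressions quoted above; the ItemSketch interface and the recoveryFieldsSketch function name are hypothetical, and the other fields of the 17-field record are left out.

```typescript
// Hypothetical, trimmed-down view of a WorkspaceReliabilityItem.
interface ItemSketch {
  cause: { code: string };
  nextAction: { code: string };
  owner: { kind: string; user?: { label: string } | null };
}

interface RecoveryFields {
  exceptionCause: string;
  recoveryAction: string;
  recoveryActor: string | null;
}

function recoveryFieldsSketch(item: ItemSketch): RecoveryFields {
  return {
    // Same code the surface shows on the row.
    exceptionCause: item.cause.code,
    // What was offered when the row was assembled, not what was clicked later.
    recoveryAction: item.nextAction.code,
    // User label when present, the literal "system" for system owners, otherwise null.
    recoveryActor: item.owner.user?.label ?? (item.owner.kind === "system" ? "system" : null),
  };
}
```

An unassigned owner therefore exports recoveryActor as null, which is distinguishable from both a named user and the system literal.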
What this does not claim
Failed-post recovery is one surface in the publish path. This page is explicit about where it stops.
Causes such as tiktok_private_account_required resolve to none, and the row is recorded without an automated next action.
A schedule whose target binding is missing or invalid is surfaced as a repair_required item with cause domain target_repair, and replay is blocked with target_repair_required. The publish-path refusal that produced the state is documented on the wrong-account prevention page.
system and unassigned are recorded as such; the row is not invented to look like a person owns it.
The blocked reasons on this page describe when the gate refuses.
FAQ
The cause taxonomy has eleven normalized PublishBucket codes (permissions_missing, reconnect_required, auth_failed, auth_expired, media_invalid, rate_limited, publish_failed, child_create_failed, child_processing_timeout, parent_create_failed, unknown) plus four platform-specific codes layered above the buckets (tiktok_private_account_required, pinterest_forbidden, x_media_capability_missing, x_media_upload_scope_missing). The list lives in src/lib/reliability/failureClassification.ts.
There are seven next-action codes: review_connections, open_handoff, reconnect_account, review_media, replay_publish, retry_later, and none. Each code is set by reliabilityNextAction in src/lib/reliability/workspaceReliability.ts based on the typed cause and the replayability gate.
Replay is offered when the cause maps to publish_failed, child_processing_timeout, parent_create_failed, or unknown, and none of the eight blocked reasons applies (target_repair_required, review_not_approved, not-connected, x-media-capability-missing, still-running, already-posted, handoff-not-replayable, canceled-not-replayable). Replay itself is a rate-limited POST to /api/jobs/replay.
reliabilityOwner resolves six typed owner kinds: handoff_owner (handoff items), recovery_owner (operator-claimed), last_actor (USER), creator, system, and unassigned. The first match wins. Some failures resolve to system or unassigned, and the surface records that honestly instead of inventing a person.
In the signed 17-field proof record, exceptionCause is set to the typed cause code, recoveryAction is set to the typed next-action code, and recoveryActor is set to the owner's user label, the literal 'system' for system owners, or null when no typed owner could be resolved. collectWorkspaceProofRecords in src/lib/exports/workspaceProofExport.ts assembles the row from the same WorkspaceReliabilityItem.
A schedule whose target binding is missing or invalid is surfaced as a repair_required item with cause domain target_repair, and replay is blocked with target_repair_required. The publish-path refusal that produced the state is documented on the wrong-account prevention page.
Last reviewed 2026-04-25