| 1 | +# Feasibility Report — Post-Verification Rendered Page as Activation Signal |
| 2 | + |
| 3 | +- **Spec:** `evaluate-post-verification-rendered-page-as-activa-19` |
| 4 | +- **Task:** 4.2 — Write the feasibility report (findings, recommended default |
| 5 | + page strategy, sync/async decision, failure-mode handling, open risks) |
| 6 | +- **Status:** Final. Inputs: tasks 1.x, 2.x, 3.x, and 4.1 of this spec. |
| 7 | +- **Owner:** Get Started squad (team:omega) |
| 8 | +- **Date:** 2026-04-30 |
| 9 | + |
| 10 | +This report is the single consolidated answer to the four acceptance criteria |
| 11 | +of the spike. Everything below is drawn from the numbered investigation |
| 12 | +documents in `docs/spikes/`; those remain the source-of-truth for method and |
| 13 | +raw data. This report is the squad-review surface and the input to task 4.3 |
| 14 | +(MVP scope for the follow-up implementation ticket). |
| 15 | + |
| 16 | +Non-goals of the spec — production UI, changes to the render API / crawler / |
| 17 | +verification endpoint, batch rendering, screenshots, visual diffing, pricing / |
| 18 | +quota / accounting changes, i18n / a11y / analytics polish, caching work, and |
| 19 | +refactors of `features/integrationWizard` beyond the throwaway prototype — are |
| 20 | +respected throughout. Nothing in this report proposes work inside those areas. |
| 21 | + |
| 22 | +## 1. TL;DR |
| 23 | + |
| 24 | +| Acceptance criterion | Verdict | |
| 25 | +|---|---| |
| 26 | +| Feasibility of post-verification render is clear | **Feasible.** No changes to this repo required. `GET /render?url=…` already exposes everything the wizard needs. | |
| 27 | +| Page selection approach decided | **Verified domain root (`https://<verified-domain>/`)**, with an optional user override pre-filled to the same value. | |
| 28 | +| Failure cases documented | Five render-path cases + two wizard-side cases, all non-blocking in the UI. Enumerated in §5. | |
| 29 | +| Recommended implementation path defined | **Async-first** call to `/render` from the wizard backend at verification-success, with a **3 s opportunistic sync window**. MVP is a single wizard PR; no render-service change. See §6. | |
| 30 | + |
| 31 | +## 2. Feasibility of post-verification render |
| 32 | + |
| 33 | +**Feasible with the existing render path, with no changes to this repo.** |
| 34 | +Confirmed by reading `lib/server.js`, `lib/index.js`, `lib/util.js`, and |
| 35 | +`lib/browsers/chrome.js` during tasks 1.2, 2.1, 2.3, and 2.4. |
| 36 | + |
| 37 | +Key facts that make it feasible today: |
| 38 | + |
| 39 | +- Single entry point: `GET /render?url=<url>&renderType=html` (and the POST |
| 40 | + equivalent) hits `server.onRequest` in `lib/server.js`. No new endpoint, |
| 41 | + no new route, no crawler or sitemap dependency (task 2.1). |
| 42 | +- Input validation is already at the boundary: `valid-url.isWebUri` rejects |
| 43 | + malformed input with `400` before the plugin chain runs |
| 44 | + (`lib/server.js:221-225`). The wizard does not need to re-validate. |
| 45 | +- Response contract is already what the wizard needs: status code from the |
| 46 | + origin is propagated (`req.prerender.statusCode`, `lib/server.js:303`), |
| 47 | + configurable `renderErrorStatusCode` (default `504`) with |
| 48 | + `x-prerender-504-reason` header for renderer-side failures |
| 49 | + (`lib/server.js:484-486`), and rendered HTML in the body. |
| 50 | +- Bounded worst case: `pageLoadTimeout` (default 20 s, `lib/server.js:9`) |
| 51 | + plus `waitAfterLastRequest` (500 ms, `lib/server.js:5`) plus the 60 s |
| 52 | + per-request hang watchdog (`lib/server.js:241`) cap how long any single |
| 53 | + call can take. |
| 54 | +- Auth is already solved for service-to-service use: if deployed behind |
| 55 | + `basicAuth`, the wizard backend reuses the same credential the crawler |
| 56 | + uses. No new auth path is required (task 1.2). |
| 57 | + |
| 58 | +The spec's reliability bar (≥ 95% success on verified URLs) and latency |
| 59 | +budget (P50 ≤ 6 s, P95 ≤ 12 s, hard ceiling 20 s) are achievable with the |
| 60 | +verified-root selection because the root is the URL whose reachability we |
| 61 | +just proved during verification — DNS, TLS, and bot-gating are known-good at |
| 62 | +call time (task 2.4). The budget is still tight against the 20 s |
| 63 | +`pageLoadTimeout`, which is why §4.2 below recommends against blocking the |
| 64 | +wizard on the render. |
| 65 | + |
| 66 | +## 3. Method (summary) |
| 67 | + |
| 68 | +Detail lives in the individual spike documents; this is the précis so the |
| 69 | +report stands alone. |
| 70 | + |
| 71 | +- **Entry point & auth scoping:** read of `lib/server.js`, `lib/index.js`, |
| 72 | + `lib/plugins/{basicAuth,whitelist,blacklist}.js` — `post-verification-render-entry-point.md`. |
| 73 | +- **Feasibility, without crawler dependency:** read of `lib/server.js`, |
| 74 | + `lib/util.js`, confirmation that no crawler/sitemap/queue code is reachable |
| 75 | + from the render path — `post-verification-render-feasibility.md`. |
| 76 | +- **Failure enumeration & reproduction:** read of `lib/browsers/chrome.js` |
| 77 | + + live repros against the local server on branch tip `e83e20d` — |
| 78 | + `post-verification-render-failure-cases.md`. |
| 79 | +- **Page-selection comparison:** three-strategy evaluation (verified root, |
| 80 | + user-provided, small fixed set) against reliability and latency from the |
| 81 | + onboarding-render sample — |
| 82 | + `evaluate-post-verification-rendered-page-activation-page-selection.md`. |
| 83 | +- **Sync vs. async decision:** latency and failure-mode data applied to the |
| 84 | + onboarding flow — `sync-vs-async-decision.md` (task 4.1). |
| 85 | + |
| 86 | +## 4. Decisions |
| 87 | + |
| 88 | +### 4.1 Page selection: verified domain root, with override |
| 89 | + |
| 90 | +**Decision: default to `https://<verified-domain>/`.** Optional user override, |
| 91 | +pre-filled with the verified root. |
| 92 | + |
| 93 | +Rationale (full detail in task 2.4 doc): |
| 94 | + |
| 95 | +- The root is the URL whose reachability we just proved during verification — |
| 96 | + conditional probability of render success is materially higher than for an |
| 97 | + arbitrary user-supplied URL. |
| 98 | +- Single render call, so p95 latency is bounded by one `pageLoadTimeout`. |
| 99 | +- No probe/fallback code path, no backend caching, no render-accounting |
| 100 | + change — all explicitly out of scope for this spec. |
| 101 | +- Override input covers legitimate edge cases (SPA shells, locale redirects, |
| 102 | + marketing-only homepages) without introducing a probe-and-pick path. |
| 103 | + |
| 104 | +**Explicitly rejected:** the "small fixed set" probe-and-pick strategy. Each |
| 105 | +failed probe costs a full `pageLoadTimeout` before fall-through, which breaks |
| 106 | +the latency budget and compounds onboarding-render accounting (non-goal). |
| 107 | + |
| 108 | +### 4.2 Sync vs. async: async-first with a 3 s opportunistic sync window |
| 109 | + |
| 110 | +**Decision: async-first. Never block the wizard on the render.** Fire the |
| 111 | +render the moment verification succeeds; if it returns within a 3 s wall-clock |
| 112 | +budget, show it inline; otherwise advance the wizard and update the preview |
| 113 | +slot in place when the render finally resolves (success or failure). |
| 114 | + |
| 115 | +Rationale (full detail in task 4.1 doc, `sync-vs-async-decision.md`): |
| 116 | + |
| 117 | +- P95 latency is too close to the 20 s `pageLoadTimeout` for a blocking |
| 118 | + wizard step to be acceptable. Timeouts and navigate errors only resolve |
| 119 | + after the full page-load budget elapses, so a sync UX pays worst-case |
| 120 | + latency on every failure. |
| 121 | +- A fully deferred UX (email / later surface) loses the "Aha" moment the |
| 122 | + spike is trying to create — a non-trivial fraction of homepages do render |
| 123 | + quickly and those users deserve inline feedback. |
| 124 | +- The 3 s sync budget is well above typical fast-mode render time and well |
| 125 | + below hang-perception thresholds for a secondary widget inside a completed |
| 126 | + step. Tunable; the follow-up ticket should instrument actual hit rate. |
| 127 | + |
| 128 | +The verification step itself is **always** marked complete regardless of |
| 129 | +render outcome. The preview is additive, never a gate. |
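The opportunistic sync window can be sketched as a race between the render promise and a timer. This is a minimal illustration, not the prototype's code: `raceSyncWindow` and the shape of its result are hypothetical names chosen for the example, and the 3 s default simply mirrors the budget above.

```javascript
// Hypothetical helper: race an in-flight render against the sync window.
// Resolves to { inline: true, ok, value | error } if the render settles in time,
// or { inline: false } if the window elapses first. It never rejects.
function raceSyncWindow(renderPromise, windowMs = 3000) {
  let timer;
  const windowElapsed = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ inline: false }), windowMs);
  });
  // Tag both outcomes so a fast failure can also be shown inline.
  const settled = renderPromise.then(
    (value) => ({ inline: true, ok: true, value }),
    (error) => ({ inline: true, ok: false, error })
  );
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([settled, windowElapsed]).finally(() => clearTimeout(timer));
}
```

When the result is `{ inline: false }`, the caller still holds the original render promise and can attach a handler to it to update the preview slot once the render resolves.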
| 130 | + |
| 131 | +## 5. Failure-mode handling |
| 132 | + |
| 133 | +Consolidates task 2.3 (`post-verification-render-failure-cases.md`) and the |
| 134 | +wizard-side surface from task 1.2. All states are non-blocking; all are |
| 135 | +derived from `statusCode`, body, and `x-prerender-504-reason` alone — no new |
| 136 | +render-service signal is required. |
| 137 | + |
| 138 | +### 5.1 Render-path cases (originate in this repo) |
| 139 | + |
| 140 | +| # | Case | Server signal | HTTP seen by wizard | Wizard UX state | |
| 141 | +|---|---|---|---|---| |
| 142 | +| 1 | Renderer error (Chrome-side) | `renderErrorStatusCode` branch in `chrome.js`, `x-prerender-504-reason` set | `504` (default `RENDERING_ERROR_STATUS_CODE`) | "We couldn't render `<url>` automatically." Retry button; no auto-retry in MVP. | |
| 143 | | 2 | Non-2xx upstream (3xx/4xx/5xx propagated) | `tab.prerender.statusCode = params.response.status` | Origin status (e.g. `404`, `500`, `301`) | Show status + short explanation ("Your site returned `403`. If this page is behind login, try a public URL."). Offer override. |
| 144 | +| 3 | Blocked / auth-gated page | Origin `401`/`403`, or `200` with login HTML / WAF interstitial | Pass-through status, or `200` with thin body | Hard case: same as #2. Soft case: flag as "rendered, but content looks thin"; needs a body-length / login-heuristic check on the wizard side. | |
| 145 | +| 4 | Empty / unparseable payload | `parseHtmlFromPage` resolves with empty body | `200` (or passed-through) with ~empty body | "Preview looks empty." Treat as render failure. Offer override. | |
| 146 | +| 5 | Page-load timeout | `pageLoadTimeout` + `tab.prerender.timedout` | `timeoutStatusCode` if set, else captured status | "Preview is taking longer than expected." Retry button. | |
| 147 | + |
| 148 | +### 5.2 Operational / wizard-side cases |
| 149 | + |
| 150 | +| # | Case | Trigger | Wizard UX state | |
| 151 | +|---|---|---|---| |
| 152 | +| 6 | Invalid URL (shouldn't happen with verified root; can with override) | `valid-url.isWebUri` rejects → `400` | Inline input error on the override field. | |
| 153 | | 7 | Service unavailable / auth rejected / domain not allow-listed | `401` (basicAuth), `404` (whitelist/blacklist), `503` (browser not connected, see §7 item 7), network failure | Generic "Preview temporarily unavailable, try again" banner. **Never** blocks the wizard. |
| 154 | + |
| 155 | +### 5.3 Properties the MVP relies on |
| 156 | + |
| 157 | +- **Every case is observable from the response alone.** No new telemetry, no |
| 158 | + new endpoint, no new header is needed for the MVP UI to distinguish them. |
| 159 | +- **No case blocks verification completion.** The wizard's primary CTA stays |
| 160 | + enabled at all times. This is the invariant the async-first decision |
| 161 | + protects. |
| 162 | +- **Retry is user-initiated.** The spike deliberately does not auto-retry: |
| 163 | + cascading retries on the long tail would break the latency budget for any |
| 164 | + user left watching. |
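The mapping from response to wizard state in §5.1 and §5.2 can be sketched as a single pure function. The state names, the `classifyRenderOutcome` helper, and the `THIN_BODY_CHARS` threshold are all assumptions made for illustration; the report leaves the real threshold to the follow-up, and the sketch assumes the default `504` for `renderErrorStatusCode`.

```javascript
// Sketch only: state names and THIN_BODY_CHARS are hypothetical, not from the spike.
const THIN_BODY_CHARS = 512; // assumed threshold for "empty or thin" bodies

function classifyRenderOutcome({ status, body = "", reason = null, networkError = false, timedOut = false }) {
  if (networkError) return "unavailable";                 // §5.2 case 7
  if (timedOut) return "timeout";                         // client-side cap hit; §5.1 case 5
  if (status === 400) return "invalid-url";               // §5.2 case 6
  if (status === 503) return "unavailable";               // browser not connected; §7 item 7
  if (status === 504 && reason) return "renderer-error";  // §5.1 case 1 (reason header present)
  if (status < 200 || status >= 300) return "upstream-status"; // §5.1 cases 2 and 3 (hard)
  if (body.trim().length < THIN_BODY_CHARS) return "empty-or-thin"; // cases 3 (soft) and 4
  return "success";
}
```

The sketch folds `401` and `404` into the upstream-status state. Telling a render-service rejection (§5.2 case 7) apart from an origin status with the same code would need an extra rule that this example does not attempt.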
| 165 | + |
| 166 | +## 6. Recommended implementation path (for task 4.3 / the follow-up ticket) |
| 167 | + |
| 168 | +Scope so the follow-up fits in a single wizard PR. No render-service change. |
| 169 | + |
| 170 | +1. **Call site.** After verification resolves successfully, the wizard |
| 171 | + backend issues one |
| 172 | + `GET <PRERENDER_BASE_URL>/render?url=<verifiedRoot>&renderType=html&followRedirects=true` |
| 173 | + using the existing crawler credential. Pre-fill an override input with |
| 174 | + the verified root. Pass a neutral user-agent so the rendered output |
| 175 | + matches what a real visitor sees, not the crawler UA. |
| 176 | +2. **Kickoff timing.** Fire the render immediately when verification succeeds |
| 177 | + — not when the user clicks "Next." The sync window is a race against the |
| 178 | + render, not against user think-time. |
| 179 | +3. **Sync window.** 3 s wall-clock from kickoff. If the render resolves within |
| 180 | + the window, show the preview inline. Otherwise, advance the wizard with a |
| 181 | + "Your preview is rendering…" affordance in the preview slot. |
| 182 | +4. **Async resolution.** When the render ultimately resolves, update the |
| 183 | + preview slot in place (success, failure, or empty). If the user has moved |
| 184 | + on, the finished preview is available on the post-onboarding / integration |
| 185 | + settings page. |
| 186 | +5. **UI states.** `loading`, `success` (iframe / sandbox, plus "looks right? |
| 187 | + / try a different URL"), and the seven failure states from §5.1–5.2. All |
| 188 | + required by the universal async-UI rule; none require backend changes. |
| 189 | +6. **Sandbox.** Render output goes into a sandboxed iframe; treat as |
| 190 | + untrusted third-party HTML; do not persist beyond the request/response |
| 191 | + lifecycle in the MVP. |
| 192 | +7. **Client-side timeout cap.** Hard cap the wizard's own fetch at slightly |
| 193 | + above `pageLoadTimeout` (e.g. 25 s) so a stuck tab cannot wedge the |
| 194 | + preview slot indefinitely. |
| 195 | +8. **Scope guardrails (explicit deferrals).** No caching, no batching, no |
| 196 | + screenshots, no before/after diff, no per-customer quota accounting, no |
| 197 | + analytics instrumentation, no i18n / a11y polish — all spec non-goals. |
| 198 | + Task 4.3 is the place to lock this list down for the implementation |
| 199 | + ticket. |
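Steps 1 and 7 above can be sketched together: build the `/render` URL, then fetch it under a hard client-side cap. The helper names, the `CLIENT_TIMEOUT_MS` constant, and the `headers` parameter are hypothetical; only the query parameters and the `x-prerender-504-reason` header come from this report. The sketch assumes Node 18 or later for the global `fetch` and `AbortSignal.timeout`.

```javascript
// Hypothetical helper names; only the query parameters come from the report.
const CLIENT_TIMEOUT_MS = 25_000; // "slightly above pageLoadTimeout", per step 7

function buildRenderUrl(prerenderBaseUrl, targetUrl) {
  // Assumes the base URL has no path prefix: "/render" replaces any existing path.
  const renderUrl = new URL("/render", prerenderBaseUrl);
  renderUrl.searchParams.set("url", targetUrl); // percent-encoded by URLSearchParams
  renderUrl.searchParams.set("renderType", "html");
  renderUrl.searchParams.set("followRedirects", "true");
  return renderUrl.toString();
}

async function fetchPreview(prerenderBaseUrl, targetUrl, headers = {}) {
  // The fetch rejects once the cap elapses, so a stuck tab cannot wedge the slot.
  const response = await fetch(buildRenderUrl(prerenderBaseUrl, targetUrl), {
    headers, // e.g. the existing service credential and a neutral user-agent
    signal: AbortSignal.timeout(CLIENT_TIMEOUT_MS),
  });
  return {
    status: response.status,
    reason: response.headers.get("x-prerender-504-reason"),
    body: await response.text(),
  };
}
```

Returning `status`, `reason`, and `body` together keeps the caller limited to the three signals §5 says are sufficient to pick a UI state.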
| 200 | + |
| 201 | +Rough sizing (not binding; task 4.3 refines): |
| 202 | + |
| 203 | +- **Frontend:** 1 panel, 1 sandboxed iframe, state machine covering the |
| 204 | + seven failure states and the sync / async handoff. Small. |
| 205 | +- **Backend:** 1 service-to-service call at the verification-success |
| 206 | + handler, reusing existing credentials and HTTP client. Smaller. |
| 207 | +- **Ops:** no new service, no new env var, no new dashboard. Monitor the |
| 208 | + 3 s sync-window hit rate post-launch to tune the budget. |
| 209 | + |
| 210 | +## 7. Open risks and follow-ups |
| 211 | + |
| 212 | +All items here are **out of scope for this spike**. They are tracked here so |
| 213 | +they are not lost going into task 4.3 and the follow-up implementation. |
| 214 | + |
| 215 | +1. **Sync-window hit rate is not measured yet.** 3 s is the starting point; |
| 216 | + the follow-up should instrument how often the inline race wins and tune |
| 217 | + from data, not intuition. |
| 218 | +2. **SPA-shell false negatives.** A `200` with a near-empty body, returned |
| 219 | + because hydration did not finish, will read as "empty payload" in §5.1 |
| 220 | + case 4. Body-length heuristics are an MVP wizard concern, not a |
| 221 | + render-service concern; the threshold is a judgement call the follow-up makes. |
| 222 | +3. **Soft bot / WAF blocks indistinguishable from a thin-content page.** |
| 223 | + Same class as (2); same treatment. Accept false-negative activation |
| 224 | + signals as a known limit of the MVP; document in the UI copy. |
| 225 | +4. **Post-wizard surfacing.** Whether the preview should persist in the |
| 226 | + integration's ongoing settings view is a product decision, not a |
| 227 | + rendering decision. Flagged for the follow-up. |
| 228 | +5. **Override abuse / retries.** A customer spamming override + retry is |
| 229 | + rate-limited by the render service's own connection concurrency, but the |
| 230 | + spec explicitly defers quota/accounting. If abuse is observed post-launch, |
| 231 | + mitigate in the wizard backend, not here. |
| 232 | +6. **User-agent choice.** A neutral UA keeps the rendered output |
| 233 | + representative but means the wizard cannot reuse crawler-UA-specific |
| 234 | + cached renders on the render service. This is a deliberate trade-off. |
| 235 | +7. **Browser health.** The render server restarts Chrome periodically |
| 236 | + (`browserTryRestartPeriod`) and may reject with `503` when the browser |
| 237 | + isn't connected. The async-first design absorbs this; the follow-up just |
| 238 | + needs to treat `503` as the transient-unavailable state in §5.2 case 7. |
| 239 | + |
| 240 | +## 8. Acceptance-criteria mapping |
| 241 | + |
| 242 | +| Criterion | Section | |
| 243 | +|---|---| |
| 244 | +| Feasibility of post-verification render is clear | §2 | |
| 245 | +| Page selection approach decided | §4.1 | |
| 246 | +| Failure cases documented | §5 | |
| 247 | +| Recommended implementation path defined | §6 | |
| 248 | + |
| 249 | +## 9. Non-goals respected |
| 250 | + |
| 251 | +This report does not: |
| 252 | + |
| 253 | +- Propose changes to the render API, the crawler, or the verification |
| 254 | + endpoint. |
| 255 | +- Propose multi-URL batching, screenshots, visual diffing, or before/after |
| 256 | + comparison. |
| 257 | +- Propose pricing, quota, or render-accounting changes for onboarding renders. |
| 258 | +- Propose caching beyond what the existing render path already provides. |
| 259 | +- Propose i18n, a11y, or analytics instrumentation for the eventual UI. |
| 260 | +- Propose any refactor of `features/integrationWizard` beyond hosting the |
| 261 | + throwaway prototype covered in task 3.1. |
| 262 | +- Ship production UI; the wizard implementation is the follow-up ticket |
| 263 | + scoped by task 4.3 on the basis of this report. |