Go-Live Checklist: Pancake CRM Integration
Reference: PRD v1.0 | Date: 15/05/2026
Purpose: lock the gates before deploy, plus pre-checks, rollback, and sign-off for the operations team (4-week pilot, W1-W4). Read first: `decision-brief.md` → §E1 Gates (including per-wave exit criteria) → §E2 Dev pre-flight → §E3 Pre-deploy checks → §E4 Deploy steps. Canonical: `go-live-checklist.md` is the canonical owner of readiness, deploy gates, rollback, and operational sign-off.
Canonical Inputs
| File | Role |
|---|---|
| SOURCE_OF_TRUTH.md | Solution Lock 25 DEC; highest priority |
| decision-brief.md | Risk summary + handoff |
| prd.md | Scope, FR/AC, LIFECYCLE, A8 KPI |
| dev-spec.md | Migration sequence + Hasura metadata + cron + observability |
| qa-test-plan.md | Coverage + chaos test + load test |
| handoff.md | RACI + timeline + blockers |
E1) Release gates (per pilot wave)
The 4-week pilot has 4 cumulative gates. Each wave must pass its exit criteria before the next wave opens.
General gates (W1-W4)
| # | Gate | Owner | Status |
|---|---|---|---|
| G-1 | PO signs off the PRD (Z + A0-A12) | PO | Pending |
| G-2 | Tech Lead signs off dev-spec (C1-C12) + schema review with the DBA | Tech Lead | Pending |
| G-3 | QA passes TC-001..018 on staging (TC pass rate ≥ 95%) | QA | Pending |
| G-4 | Performance verified: webhook latency p95 < 1s, event processing p95 < 30s, load test at 100 req/s sustained for 10 minutes | BE + DevOps | Pending |
| G-5 | Outage chaos test passes: 4-layer recovery (webhook + Cron 6 polling + Cron 7 reconciliation + DLQ replay) | QA | Pending |
| G-6 | Staging smoke test with Pancake simulated via webhook.site | QA + PO | Pending |
| G-7 | Monitoring + Alertmanager rules deployed: 5 alerts active (success_rate, outage, dlq_count, reconciliation_miss, webhook_error_rate) | DevOps | Pending |
| G-8 | Rollback plan documented + tested (down.sql + Hasura metadata revert + feature flag kill switch) | BE + DevOps | Pending |
| G-9 | KPI baseline of the manual workflow measured for PD-011 (managers measure the current response time for 1 week) | PO + Manager | Pending |
| G-10 | Out-of-band PD-001/002 (Pancake support for HMAC + IP whitelist); blocks W4 only | Ops | Pending |
| G-11 | Out-of-band PD-003 (customer_consent.consent_data.marketing key); blocks Phase 4 SPECIFY | BE + PO | Pending |
| G-12 | Webhook URL set up in Pancake admin: https://diva.com.vn/api/pancake/record/{token} | Ops | Pending |
Exit criteria per pilot wave
Wave 1 (W1): 1 low-volume source
| Criteria | Target | How measured |
|---|---|---|
| Event success rate (measured over W1 only, i.e. 1 week of the 4-week pilot) | ≥ 99.5% | count(status='processed') / total over 24h |
| Webhook latency p95 | < 1s | Prometheus histogram |
| Processing latency p95 | < 30s | SQL query on pancake_webhook_event |
| Zero-miss SLO | 0 events/day | Cron 7 daily report |
| Telesale survey feedback (4-5/5) | ≥ 4/5 | Survey of the 1 telesale active on the W1 source |
| No Pancake suspension event | 0 | pancake_connection.status history |
| TC-001..010 pass (webhook + processing happy path) | 100% | QA execution |
→ Passing the W1 exit → W2 may be enabled.
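The success-rate and zero-miss rows reduce to a ratio test plus a counter check. Below is a minimal Go sketch of the W1 gate. It assumes the processed and total counts have already been read from `pancake_webhook_event` for the 24h window, and the missed-event count from the Cron 7 report. The function names are hypothetical. Integer arithmetic keeps the 99.5% threshold exact.

```go
package main

// meetsRatio reports whether num/den >= perMille/1000 using integer
// arithmetic, so a 99.5% target (995 per mille) has no float rounding.
func meetsRatio(num, den, perMille int64) bool {
	if den <= 0 {
		return false // an empty window never passes a gate
	}
	return num*1000 >= den*perMille
}

// passesW1SuccessGate (hypothetical): success rate >= 99.5% over the 24h
// window AND Cron 7 reported zero missed events (zero-miss SLO).
func passesW1SuccessGate(processed, total, missedByCron7 int64) bool {
	return meetsRatio(processed, total, 995) && missedByCron7 == 0
}
```

For example, 995 processed events out of 1,000 pass, while 994 out of 1,000 fail.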
Wave 2 (W2): 5 mid-volume sources
| Criteria | Target | How measured |
|---|---|---|
| Cumulative event success rate W1+W2 | ≥ 99.5% | Cumulative SQL |
| Manager intervention rate (manual reassignment) | < 20% | count(tickets WHERE assignee_id changed manually) / total |
| Telesale survey (5-10 people) | ≥ 4/5 | Survey at end of W2 |
| Auto-assign correct rate | ≥ 85% (target ≥ 90% after W4) | SQL aggregate |
| TC-008 round-robin tests pass | 100% | QA |
| DLQ replay test pass (Tab 3) | ≥ 1 successful replay test | Manual admin test |
→ Passing the W2 exit → W3 may be enabled.
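TC-008 exercises round-robin auto-assignment. The property those tests check is that each active telesale receives tickets in strict rotation. Below is a minimal in-memory Go sketch of that property. The `roundRobin` type is hypothetical, and the real `distribute_ticket` logic in crm-api may persist its cursor or apply additional rules (shift, skills).

```go
package main

// roundRobin is a hypothetical in-memory assigner that hands tickets to the
// active telesales of a source in strict rotation.
type roundRobin struct {
	agents []string
	next   int // index of the agent who receives the next ticket
}

// assign returns the next agent, or "" when no agent is available.
func (r *roundRobin) assign() string {
	if len(r.agents) == 0 {
		return ""
	}
	a := r.agents[r.next%len(r.agents)] // modulo guards against a shrunk list
	r.next = (r.next + 1) % len(r.agents)
	return a
}
```

With three agents and six tickets, each agent receives exactly two, which is the kind of evenness a round-robin test asserts.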
Wave 3 (W3): 15 high-volume sources + load test
| Criteria | Target | How measured |
|---|---|---|
| Event success rate | ≥ 99.5% | SQL |
| Webhook latency p95 under load | < 1s sustained | K6 load test, 100 req/s × 10 minutes |
| DB performance | No sequential scan on pancake_webhook_event.normalized_phone | EXPLAIN ANALYZE |
| Cron 7 reconciliation duration | < 30 minutes | Cron timing log |
| Webhook memory | < 256 MB per pod | Prometheus |
| Outage chaos test | 4-layer recovery passes | TC-014-001..015 |
| Circuit breaker test (Pancake REST timeout) | Opens after 5 failures within 60s, half-open after 30s | TC-013-003..005 |
→ Passing the W3 exit → W4 production may be enabled.
Wave 4 (W4): all 40+ sources in production
| Criteria | Target | How measured |
|---|---|---|
| All 5 KPI metrics meet target | Latency ≤30s, SLA ≤30min, auto-assign ≥90%, success ≥99.5%, zero-miss=0 | Grafana dashboard |
| Manager survey: "stopped copy-pasting Pancake leads" | 100% (15/15 managers) | Survey at end of W4 |
| PD-001 HMAC signature enforcement (if Pancake support responds) | Enforced | Header verification |
| PD-002 IP whitelist enforcement | Enforced | Middleware allowlist |
| Pancake never suspended | 0 times | Connection status history |
| Cron 7 daily reconciliation missing events | < 5 events/day (DEC-025) | Cron report |
→ Go-Live sign-off → AUTO-PUBLISH Phase 7.
E2) Pre-deploy code checks (Dev pre-flight)
BE Dev pre-flight
- [ ] `go test ./...` passes on the main branch (no regression)
- [ ] Hasura console access verified; metadata version matches staging
- [ ] Latest migration check: `\d+ account` → `phone_code`, `phone_number`, `phone_enabled` already exist; `normalized_phone` and `pancake_metadata` do NOT exist yet (to be ADDed)
- [ ] Verify with schema queries:
```sql
-- Expected: 0 rows (not present yet; will be ADDed)
SELECT column_name FROM information_schema.columns
WHERE table_name = 'account' AND column_name IN ('normalized_phone', 'pancake_metadata');

-- Expected: 0 rows (not present yet; will be ADDed per DEC-006)
SELECT column_name FROM information_schema.columns
WHERE table_name = 'contact_book' AND column_name = 'primary_phone';

-- Expected: 0 rows (not present yet; 4 tables will be CREATEd)
-- '\_' escapes the underscore, which is otherwise a single-character LIKE wildcard
SELECT table_name FROM information_schema.tables
WHERE table_name LIKE 'pancake\_%';

-- Expected: 8 rows (ticket_source_1..8); ticket_source_pancake will be INSERTed
SELECT id, name FROM crm_master_data WHERE type = 'ticket_source';

-- Expected: 1777870069927 (latest verified in Phase 3); new migrations will use 1777870069928+
SELECT timestamp FROM controller.migration_log ORDER BY timestamp DESC LIMIT 1;
```
- [ ] libphonenumber dependency verified in `go.mod` line 32 (`github.com/ttacon/libphonenumber`)
- [ ] `sony/gobreaker` dependency ADDed to `go.mod` (to be verified after the PR is merged)
- [ ] Anchor SQL pre-check pattern (per CLAUDE.md pitfalls #4) for every UPDATE/REPLACE migration: PL/pgSQL with 4-layer protection
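`account` rows are looked up by `normalized_phone`, so every writer must produce the same canonical form. Below is a stdlib-only Go sketch of that idea. It assumes the column stores E.164 (`+84` followed by 9 digits for Vietnamese mobiles). It is a simplified, hypothetical stand-in for the libphonenumber call the backend is expected to use, and it deliberately covers only Vietnamese mobile numbers.

```go
package main

import (
	"errors"
	"strings"
)

// normalizeVN (hypothetical) turns common Vietnamese mobile formats
// ("0912345678", "+84 912 345 678", "84-912-345-678") into E.164.
// Landlines and other countries are out of scope for this sketch.
func normalizeVN(raw string) (string, error) {
	var b strings.Builder
	for _, r := range raw {
		if r >= '0' && r <= '9' {
			b.WriteRune(r) // keep digits only, dropping spaces, dashes and '+'
		}
	}
	d := b.String()
	switch {
	case strings.HasPrefix(d, "84"):
		d = d[2:] // country code already present
	case strings.HasPrefix(d, "0"):
		d = d[1:] // national trunk prefix
	}
	// Vietnamese mobile numbers have 9 digits after the country code.
	if len(d) != 9 {
		return "", errors.New("not a 9-digit Vietnamese mobile number")
	}
	return "+84" + d, nil
}
```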
FE Dev pre-flight
- [ ] `pnpm codegen` passes (Hasura schema introspection after the migrations are applied)
- [ ] Settings module accessible at `/admin/settings/pancake-crm`
- [ ] Test account with the Admin role available on staging
- [ ] Cross-browser baseline: Chrome 120+, Safari 16+, Firefox 120+
- [ ] Vue 3 + Quasar 2.x compatibility verified
- [ ] URQL composables hot-reload working
QA pre-flight
- [ ] Standard test data set prepared: 10 sample Pancake payloads (happy path + 5 edge cases + 3 invalid + 2 duplicates)
- [ ] webhook.site URL available to simulate Pancake → Diva tests (W1)
- [ ] Load test tool (K6/JMeter) configured for 100 req/s
- [ ] Chaos test scripts for outage simulation (block Pancake REST + simulate dropped webhooks)
- [ ] Test accounts with Admin/Manager/Telesale roles available
Ops / DevOps pre-flight
- [ ] Reverse proxy (Nginx/Cloudflare) configured to route `https://diva.com.vn/api/pancake/record/{token}` → webhook service
- [ ] TLS certificate valid + auto-renew
- [ ] Rate limit of 1000 req/s per IP on the webhook endpoint
- [ ] Prometheus scrape config includes the webhook + crm-api metrics endpoints
- [ ] Alertmanager rules deployed (5 rules: success_rate, outage_active, dlq_count, reconciliation_miss, webhook_error_rate)
- [ ] On-call rotation updated: ops + tech lead for the 4-week pilot
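On the service side, the route the reverse proxy forwards can be sketched with `net/http`. The handler below is hypothetical:

- `validToken` stands in for the real token lookup.
- Persistence is omitted.
- The 1 MiB body cap is an assumption.
- Whether an unknown token should get 404 or a 200 (to stay out of Pancake's error rate) is a design choice the team still has to make.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const recordPrefix = "/api/pancake/record/"

// pancakeRecordHandler is a hypothetical sketch of POST /api/pancake/record/{token}.
// validToken stands in for the real lookup against pancake_connection.
func pancakeRecordHandler(validToken func(string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		token := strings.TrimPrefix(r.URL.Path, recordPrefix)
		// Reject paths without the prefix, empty tokens and extra path segments.
		if token == r.URL.Path || token == "" || strings.Contains(token, "/") || !validToken(token) {
			http.NotFound(w, r)
			return
		}
		// Read at most 1 MiB (assumed limit) of the body.
		if _, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)); err != nil {
			http.Error(w, "bad body", http.StatusBadRequest)
			return
		}
		// A real handler would persist the event with status='received' here.
		w.Header().Set("Content-Type", "application/json")
		_, _ = io.WriteString(w, `{"status":"received"}`)
	})
}

// statusFor runs one request through h in-process and returns the status code.
func statusFor(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, strings.NewReader(`{"record_id":"r1"}`)))
	return rec.Code
}
```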
E3) Pre-deploy checks (per migration + per service)
BE deploy pre-check
- [ ] All 8 migrations tested on staging (`up.sql` + `down.sql` runnable)
- [ ] Hasura metadata YAML applied on staging + event trigger firing verified (test event)
- [ ] `EXPLAIN ANALYZE` shows: `account` lookup by `normalized_phone` uses index `account_normalized_phone_idx` (NOT a sequential scan); `pancake_webhook_event` lookup by `(record_id, modified_on, payload_hash)` uses the UNIQUE constraint
- [ ] Rollback script tested (schema reverts cleanly, no orphan data):
```bash
hasura migrate apply --down 1 --database-name default
```
- [ ] `normalized_phone` backfill script dry-run on a copy of production data (1M accounts → ~8 minutes expected)
- [ ] Capacity test: insert 1000 dummy `pancake_webhook_event` rows → query performance < 100ms
FE deploy pre-check
- [ ] `pnpm codegen` passes after the migrations are applied (Hasura introspection updated)
- [ ] `pnpm build` passes; a bundle size delta < 50KB is acceptable
- [ ] Tested on staging with real data (10 sample Pancake events)
- [ ] Cross-browser smoke test Chrome/Safari/Firefox: Settings/Pancake 4 tabs render OK, replay action OK
- [ ] Permission test: Manager/Telesale roles → Settings/Pancake completely hidden (no menu, redirect to home)
Ops / TL deploy pre-check
- [ ] Database snapshot before deploy (PostgreSQL `pg_dump` or cloud provider snapshot)
- [ ] Grafana dashboard skeleton deployed (5 KPI sections)
- [ ] Rollback plan documented + tested (see §E5)
- [ ] PagerDuty/alert routing verified (5 alerts → ops on-call)
- [ ] Pancake admin webhook URL setup: enabled only for the W1 source, with the kill switch ON initially
E4) Deploy steps (14 sequential steps)
| # | Bước | Phụ trách | Verify | Rollback |
|---|---|---|---|---|
| 1 | Apply migration 1777870069928 (ALTER account ADD normalized_phone + pancake_metadata + index) | BE | \d account shows new columns | down.sql DROP COLUMN |
| 2 | Apply migration 1777870069929 (backfill normalized_phone script, ~8 minutes for 1M accounts) | BE | SELECT COUNT(*) FROM account WHERE normalized_phone IS NOT NULL AND phone_enabled ≈ active phone count | UPDATE account SET normalized_phone=NULL |
| 3 | Apply migration 1777870069930 (ALTER contact_book ADD primary_phone + FK + UNIQUE — DEC-006) | BE | \d contact_book shows new column + constraints | DROP COLUMN + DROP CONSTRAINT |
| 4 | Apply migration 1777870069931 (CREATE 5 pancake tables + indexes) | BE | \dt pancake_* returns 5 rows | DROP TABLE ... CASCADE for each pancake_* table |
| 5 | Apply migration 1777870069932 (INSERT crm_master_data ticket_source_pancake) | BE | SELECT * FROM crm_master_data WHERE id='ticket_source_pancake' returns 1 row | DELETE |
| 6 | Apply migration 1777870069933 (INSERT module_permission_action pancake_crm_integration + 6 actions + Admin seed) | BE | Permission check useGlobalStore.hasPermission('pancake_crm_integration', 'access') returns true for Admin | DELETE rows |
| 7 | Apply migration 1777870069934 (INSERT 3 notification_template) | BE | SELECT * FROM notification_template WHERE code LIKE 'noti_pancake%' returns 3 rows | DELETE |
| 8 | Apply migration 1777870069935 (Update app_setting JSON ADD key pancake_integration) | BE | SELECT app_settings->'pancake_integration' FROM app_setting WHERE id=1 returns object | UPDATE remove key |
| 9 | Apply Hasura metadata with `hasura metadata apply`: 4 pancake_* table YAMLs + event trigger pancake_webhook_event_status_update + 7 cron triggers + Admin permissions | BE | `hasura metadata diff` → no diff; the Hasura console Event Triggers tab shows the new trigger | hasura metadata apply --from-file backup.yaml |
| 10 | Deploy services/webhook (PR: pancake handler) | DevOps | GET /healthz 200; smoke test webhook.site POST | Revert previous Docker image |
| 11 | Deploy services/crm-api (PR: event handler + 7 cron + REST client + refactor distribute_ticket) | DevOps | GET /healthz 200; smoke test Hasura trigger fire | Revert previous image |
| 12 | Deploy diva-admin FE (PR: 3-file delta + 5 new Settings pages + routes + menu + permissions) | DevOps | Smoke test: Admin login → Settings/Pancake menu visible, 4 tabs render | Revert to previous build |
| 13 | Pancake admin setup: enter the webhook URL + api_key for the workspace (Ops works in the Pancake console) | Ops | Pancake "Test webhook" button returns 200 | Disable the webhook URL in Pancake admin |
| 14 | Enable the W1 pilot: switch the kill switch OFF (Settings Tab 1) + only 1 source with is_active=true (Settings Tab 2) | PO + Ops | Rows with pancake_webhook_event.status='ingested' appear | Toggle kill switch ON → flow paused |
Immediate post-deploy checks (Day 0)
- [ ] Webhook endpoint reachable: `curl -X POST https://diva.com.vn/api/pancake/record/{token}` returns 200 (with a valid payload)
- [ ] Hasura event trigger fires: simulate an UPDATE of `pancake_webhook_event.status='received'` → handler `pancake_process_record` is called within 1-2s
- [ ] FE menu Settings/Pancake shown to Admin, hidden from non-admins
- [ ] Cron triggers active: `hasura metadata --cron-events` shows 7 pancake crons
- [ ] No error spike in webhook + crm-api logs (zerolog)
- [ ] Prometheus metrics scrape OK: `pancake_webhook_requests_total` available
- [ ] Grafana dashboard renders data (5-10 minutes after the first event)
Day 1-7 monitoring (W1 pilot)
- [ ] Query performance: daily `EXPLAIN ANALYZE` check, no sequential scan
- [ ] No error spike in logs (in particular, `dead_letter` count < 1% of total)
- [ ] Telesale W1 feedback (1-on-1 calls with 5 active telesales after 3 days)
- [ ] PO weekly review meeting on Friday: KPI dashboard + blocker review
- [ ] Cron 7 daily reconciliation report: missing events = 0
- [ ] Daily Pancake suspension status check: `pancake_connection.status='active'`
E5) Rollback plan (per failure mode)
Rule: every rollback starts with the feature flag kill switch (lowest blast radius). Migration rollback is used only for schema corruption.
Tier 1: Feature flag rollback (< 5 minutes, no downtime)
| Trigger | Action | Verify |
|---|---|---|
| Minor bug affecting 1 source | Settings Tab 2: toggle the source to is_active=false | Webhooks from that source → status='skipped_source_disabled' |
| Bug affecting all sources | Settings Tab 1: toggle kill switch ON | Every webhook → status='skipped_kill_switch', no ticket created |
| Pancake spam (sudden rate spike) | Reverse proxy: tighten the rate limit or block the IP | Webhook returns 429 for that IP (does NOT break the main flow) |
| DLQ explosion (> 100 events/hour with status=dead_letter) | Temporarily pause the Hasura event trigger (Hasura console) | Trigger stopped; events stay in received |
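The two Tier 1 switches have a precedence order that is easy to get wrong. Below is a minimal Go sketch of the status decision. It assumes the kill switch takes precedence over the per-source flag and that a source missing from the settings counts as disabled. The function names are hypothetical; the skip statuses and the DLQ threshold come from the table above.

```go
package main

// ingestStatus (hypothetical) decides what happens to an incoming webhook.
// The kill switch (Settings Tab 1) wins over the per-source flag (Tab 2);
// a source absent from sourceActive is treated as disabled.
func ingestStatus(killSwitchOn bool, sourceActive map[string]bool, sourceID string) string {
	if killSwitchOn {
		return "skipped_kill_switch"
	}
	if !sourceActive[sourceID] {
		return "skipped_source_disabled"
	}
	return "received" // normal path: the Hasura event trigger picks it up
}

// dlqExploding applies the Tier 1 trigger: more than 100 dead_letter events
// in the last hour.
func dlqExploding(deadLettersLastHour int) bool { return deadLettersLastHour > 100 }
```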
Tier 2: Service rollback (5-15 minutes, brief downtime)
| Trigger | Action | Verify |
|---|---|---|
| Bug in webhook handler pancake.go | Revert Docker image services/webhook to the previous version | GET /healthz 200; old handler logic active |
| Bug in event handler event_pancake_process_record.go | Revert Docker image services/crm-api | The existing 9 handlers still work (does NOT break Stringee/iCall) |
| Bug in FE Settings pages | Revert Docker image diva-admin | Pancake menu hidden (does NOT break existing Settings) |
Tier 3: Migration rollback (30 minutes, downtime + data risk)
Use only for schema corruption or constraint conflicts. Tier 1 and Tier 2 must have failed before moving to Tier 3.
| Migration | Down.sql action | Risk |
|---|---|---|
| 1777870069928 (account ALTER) | DROP COLUMN normalized_phone, pancake_metadata + DROP INDEX | Low (no FK dependencies) |
| 1777870069929 (backfill) | UPDATE account SET normalized_phone=NULL | Low (data revert) |
| 1777870069930 (contact_book ALTER, DEC-006) | DROP COLUMN primary_phone + DROP CONSTRAINT FK + UNIQUE | Medium: dropping the FK can leave orphan contact_book rows; verify first |
| 1777870069931 (CREATE 5 pancake tables) | DROP TABLE CASCADE | High if data already exists; back up first |
| 1777870069932 (INSERT crm_master_data) | DELETE WHERE id='ticket_source_pancake' | High: if any ticket already references this source_id → orphan FK |
| 1777870069933 (permission seed) | DELETE rows | Low |
| 1777870069934 (notification template) | DELETE rows | Low |
| 1777870069935 (app_setting key) | UPDATE remove key | Low |
Rollback warning for migration 1777870069932: before DELETEing the master data row, you MUST run:
```sql
-- Pre-check ticket references
SELECT COUNT(*) FROM ticket WHERE source_id = 'ticket_source_pancake';
-- If > 0: do NOT DELETE. First migrate the tickets to another source (e.g. 'ticket_source_8') or soft-disable the master data row
```
Tier 4: Pancake admin rollback
| Trigger | Action |
|---|---|
| Pancake suspends the webhook (automatically, after 80% errors over 30 minutes) | Re-enable manually via Pancake admin Tools settings + investigate the root cause |
| Webhook URL must change | Update Pancake admin → new URL + token; the old URL keeps a 24h grace period |
Rollback communication plan
| Severity | Audience | Channel | Timing |
|---|---|---|---|
| Tier 1 (feature flag) | PO + Tech Lead | Slack #pancake-integration | Within 5 minutes |
| Tier 2 (service) | + Ops + DevOps | Slack + email | Within 15 minutes |
| Tier 3 (migration) | + Management + whole team | Urgent Slack + email + 30-minute meeting | Within 30 minutes |
| Tier 4 (Pancake admin) | + Marketing (Pancake is their tool) | Slack + email | Within 15 minutes |
E6) Sign-off
| Sign-off group | Person | Date | Status | Notes |
|---|---|---|---|---|
| Business | PO/BA | DD/MM/2026 | Pending | After G-1, G-3, G-9 pass + W4 KPIs meet target |
| Technical | Tech Lead | DD/MM/2026 | Pending | After G-2, G-4, G-5, G-7 pass |
| QA | QA Lead | DD/MM/2026 | Pending | After G-3, G-5, G-6 pass + TC pass rate ≥ 95% |
| Operations | DevOps Lead | DD/MM/2026 | Pending | After G-7, G-8 pass + monitoring stable for 7 days |
| Security | Security Lead | DD/MM/2026 | Pending | After PD-001/002 (HMAC + IP) are enforced |
| Compliance | Legal/Compliance | DD/MM/2026 | Pending | After PD-003 (consent_data.marketing) is confirmed |
| Management/Business Lead | Director | DD/MM/2026 | Pending | After all of the above + Manager survey 100% |
Final sign-off criteria (end of W4)
- [ ] All 5 KPIs meet target (Latency p95 ≤30s, SLA p95 ≤30min, Auto-assign ≥90%, Success ≥99.5%, Zero-miss=0)
- [ ] Manager survey: 100% (15/15) "have stopped copy-pasting Pancake leads"
- [ ] Telesale survey: ≥ 4/5 average (60 telesales)
- [ ] No Pancake suspension event during the entire pilot
- [ ] Cron 7 daily reconciliation: missing events < 5/day, sustained
- [ ] Outage chaos test: 4-layer recovery passes
- [ ] PD-001 + PD-002 enforced (or an accepted risk waiver from the Security Lead if Pancake support has not responded)
- [ ] PD-003 enforced (`customer_consent.consent_data.marketing` key confirmed)
- [ ] Grafana dashboard stable in production for 7 days
- [ ] Documentation published to dva-doc (Phase 7 AUTO-PUBLISH succeeded)
Post-sign-off
- [ ] Phase 7 AUTO-PUBLISH: invoke `/publish-doc pancake-crm-integration`
- [ ] Ops hand-off: 7 crons + 1 event trigger + 1 action under ops monitoring
- [ ] MVP-2 backlog: outbound Diva → Pancake (POST records/tickets); defer trigger TBD
- [ ] Quarterly review: PD-007 (does Pancake publish a `ticket` event?), PD-010 (Pancake API roadmap)
End of Go-Live Checklist v1.0. Ready for pilot W1 (28/05/2026 → 04/06/2026).
E7) Pass 1 Resolutions (Phase 5.2 additions)
Date: 15/05/2026 | Trigger: QA/DevOps + Tech Lead review.
E7.1) 3 additional Alertmanager rules (DevOps P0-D1)
```yaml
# Prometheus alerting rules routed to Alertmanager: pancake.yaml
groups:
  - name: pancake_crm_integration
    rules:
      - alert: PancakeEventProcessLatencyHigh
        # sum by (le) aggregates buckets across pods before taking the quantile
        expr: histogram_quantile(0.95, sum by (le) (rate(pancake_event_process_duration_seconds_bucket[5m]))) > 30
        for: 10m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake event process p95 > 30s (SLO breach)"
          description: "{{ $value }}s exceeds the ≤30s target. Check DB locks, Hasura, network."
          runbook_url: "https://docs.diva.com.vn/runbook/pancake-latency-high"
      - alert: PancakeWebhookLatencyP99High
        expr: histogram_quantile(0.99, sum by (le) (rate(pancake_webhook_latency_seconds_bucket[5m]))) > 2
        for: 5m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake webhook p99 latency > 2s"
          description: "Tail latency. Check DB connection pool saturation."
      - alert: PancakeCircuitBreakerOpenLong
        # 2 = open, assuming the exporter uses gobreaker's State values (0 closed, 1 half-open, 2 open)
        expr: pancake_circuit_breaker_state{name="pancake_rest"} == 2
        for: 30m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake REST circuit breaker open > 30 minutes"
          description: "Source sync stopped. New Pancake sources cannot be detected."
```
E7.2) HMAC verification path spec (DevOps P0-D2, after PD-001)
Decision (TBD, pending Pancake support response):
| Item | Default (W1-W3: trust IP + token) | Production (W4+, after PD-001) |
|---|---|---|
| HMAC header name | (no header) | X-Pancake-Signature (TBD, to verify) |
| Algorithm | (none) | HMAC-SHA256 hex |
| Timestamp tolerance window | (none) | 5 minutes (anti-replay) |
| Verify location | (none) | Nginx Lua in the reverse proxy, BEFORE the webhook receiver (fail-fast; not counted toward Pancake's error rate) |
| Secret storage | (none) | pancake_connection.webhook_secret_encrypted (new column; deferred to M2 if Pancake support responds late) |
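Nginx Lua implementation snippet (template):
```lua
-- nginx/lua/pancake_hmac_verify.lua (run from access_by_lua_file)
-- Assumes OpenResty + lua-resty-hmac and lua-resty-string: core ngx.* has no HMAC-SHA256 or hex encoder.
local hmac = require "resty.hmac"
local resty_str = require "resty.string"

-- Reply 200 without forwarding, so rejected requests do not count toward Pancake's error rate
local function reject(reason)
  ngx.status = 200
  ngx.say('{"status":"' .. reason .. '"}')
  return ngx.exit(200)
end

local headers = ngx.req.get_headers()
local timestamp = headers["X-Pancake-Timestamp"]
local signature = headers["X-Pancake-Signature"]
local ts = tonumber(timestamp)
if not ts or not signature then return reject("signature_missing") end

-- Verify 5-minute timestamp tolerance (anti-replay)
if math.abs(ngx.time() - ts) > 300 then return reject("timestamp_out_of_window") end

-- Per-connection secret keyed by the {token} path segment
local m = ngx.re.match(ngx.var.uri, [[^/api/pancake/record/([^/]+)$]], "jo")
local secret = m and ngx.shared.pancake_secrets:get(m[1])
if not secret then return reject("unknown_token") end

-- The body must be read first; get_body_data() returns nil if nginx buffered it to a temp file
ngx.req.read_body()
local body = ngx.req.get_body_data() or ""

-- Compute expected signature: hex(HMAC-SHA256(secret, timestamp .. "." .. body))
local h = hmac:new(secret, hmac.ALGOS.SHA256)
h:update(timestamp .. "." .. body)
if signature ~= resty_str.to_hex(h:final()) then return reject("hmac_invalid") end
-- Pass through to webhook handler
```
E7.3) Deploy step 9 split into 9a/9b (DevOps P0-D3)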
| Step | Old | New |
|---|---|---|
| 9 | Apply Hasura metadata (table YAML + event trigger + cron triggers + permissions) all at once | Split into 9a + 9b: |
| 9a | n/a | Apply table YAML + permissions + event trigger (NO cron triggers) before the services deploy |
| Services deploy | Steps 10-11 | Steps 10-11, after 9a |
| 9b | n/a | Apply cron triggers AFTER the services deploy, so crons fire into endpoints that are already live |
Verify: `hasura metadata diff` at step 9a → no diff before the services deploy. Once the services are up, apply 9b → cron triggers active.
E7.4) down-soft.sql for migration rollback safety
```sql
-- down-soft.sql for migration 1777870069932 (insert_ticket_source_pancake)
-- Use instead of DELETE when any ticket already references ticket_source_pancake
UPDATE crm_master_data SET disabled = true, updated_at = NOW()
WHERE id = 'ticket_source_pancake';
-- Pre-check to run before rolling back:
-- SELECT COUNT(*) FROM ticket WHERE source_id = 'ticket_source_pancake';
-- If > 0: use down-soft.sql. If = 0: use down.sql (DELETE).
```
E7.5) Backfill normalized_phone mitigation (DevOps P0-D5)
External Go job instead of a migration `DO $$` block:
```bash
# Schedule: run 02:00-06:00 ICT (low-traffic window)
# scripts/backfill_normalized_phone.go
# Batches of 10k rows per transaction + pg_sleep(0.1) between batches
# Estimate: 1M accounts / 10k rows per batch = 100 batches × (5s execute + 100ms sleep) ≈ 8.5 minutes of batch time; ~10-12 minutes wall clock
# Lock impact: only rows in the current batch, released after ~5s per batch, so production queries are not blocked for long
```
Pre-check verification:
- Production `account` table has ~1M rows
- Overnight POS + telesale activity is low (verify with DevOps team observations)
- Prometheus (`pg_stat_activity`) shows no lock contention > 1s
E7.6) E1 gates update (W1-W4 exit criteria)
| Gate | Old | New |
|---|---|---|
| W1 exit: Telesale survey | n=1, ≥ 4/5 | n≥3, ≥ 4/5 (more realistic statistically) |
| W3 exit: DB performance | EXPLAIN ANALYZE shows no seq scan | Snapshot pg_stat_user_indexes before vs after + EXPLAIN ANALYZE shows no seq scan |
| W4 exit: Manager survey | 100% (15/15) | ≥ 13/15 (≈87%): realistic threshold |
| NEW W3 exit | n/a | Rollback drill rehearsal passes (Tier 2 service rollback, MTTR ≤ 15 minutes verified) |
| NEW W4 exit | n/a | Pancake support responds on PD-001/002 (HMAC + IP whitelist) OR a formal risk waiver from the Security Lead |
E7.7) Additional out-of-band action items
| # | Item | Owner | Due |
|---|---|---|---|
| 7 | Grep cron_triggers.yaml for the schedule pattern `0 2 *`: verify Pancake Cron 7 (02:00 AM) does not conflict with existing nightly batches | BE-1 | Before W1 |
| 8 | Verify the production PG version (≥15 for NULLS NOT DISTINCT; if <15, use a partial index) | DevOps | Before W1 |
| 9 | Prometheus retention recheck (15d default vs ~7.5GB storage needed for 100k events/month) | DevOps | Before W3 |
| 10 | PagerDuty weekly rotation schedule for the 4-week pilot | DevOps lead | Before W1 |
| 11 | Alert workflow document (ACK owner, incident escalation, postmortem trigger) | DevOps + Tech Lead | Before W1 |