
Go-Live Release Checklist: Pancake CRM Integration

Reference: PRD v1.0 | Date: 15/05/2026


Purpose: Lock down the pre-deploy gates, pre-checks, rollback plan, and sign-off for the operations team (4-week pilot, W1-W4). Reading order: decision-brief.md → §E1 Gates and pilot wave exit criteria → §E2 Dev pre-flight → §E3 Pre-deploy checks → §E4 Deploy steps. Canonical: go-live-checklist.md is the canonical owner of readiness, deploy gates, rollback, and operational sign-off.

Canonical Inputs

| File | Role |
|---|---|
| SOURCE_OF_TRUTH.md | Solution Lock 25 DEC, highest priority |
| decision-brief.md | Risk summary + handoff |
| prd.md | Scope, FR/AC, LIFECYCLE, A8 KPI |
| dev-spec.md | Migration sequence + Hasura metadata + cron + observability |
| qa-test-plan.md | Coverage + chaos test + load test |
| handoff.md | RACI + timeline + blockers |

E1) Release gates (gates per pilot wave)

The pilot runs 4 weeks with 4 cumulative gates. Each wave must pass its exit criteria before the next wave opens.

General gates (W1-W4)

| # | Gate | Owner | Status |
|---|---|---|---|
| G-1 | PO signs off the PRD (Z + A0-A12) | PO | Pending |
| G-2 | Tech Lead signs off dev-spec (C1-C12) + schema review with the DBA | Tech Lead | Pending |
| G-3 | QA passes TC-001..018 on staging (TC pass rate ≥ 95%) | QA | Pending |
| G-4 | Performance verified: webhook latency p95 < 1s, event process p95 < 30s, load test sustaining 100 req/s for 10 minutes | BE + DevOps | Pending |
| G-5 | Chaos outage test passes: 4-layer recovery (webhook + Cron 6 polling + Cron 7 reconciliation + DLQ replay) | QA | Pending |
| G-6 | Staging smoke test with a Pancake webhook.site simulation | QA + PO | Pending |
| G-7 | Monitoring + alertmanager rules deployed: 5 alerts active (success_rate, outage, dlq_count, reconciliation_miss, webhook_error_rate) | DevOps | Pending |
| G-8 | Rollback plan documented + tested (down.sql + Hasura metadata revert + feature flag kill switch) | BE + DevOps | Pending |
| G-9 | KPI baseline of the manual workflow measured per PD-011 (a Manager measures current response time for 1 week) | PO + Manager | Pending |
| G-10 | Out-of-band PD-001/002 (Pancake support HMAC + IP whitelist). Blocks W4 only | Ops | Pending |
| G-11 | Out-of-band PD-003 (customer_consent.consent_data.marketing key). Blocks Phase 4 SPECIFY | BE + PO | Pending |
| G-12 | Webhook URL configured in Pancake admin: https://diva.com.vn/api/pancake/record/{token} | Ops | Pending |

Exit criteria per pilot wave

Wave 1 (W1): 1 low-volume source

| Criteria | Target | How measured |
|---|---|---|
| Event success rate (W1 itself runs for only 1 week of the 4-week pilot) | ≥ 99.5% | count(status='processed')/total over 24h |
| Webhook latency p95 | < 1s | Prometheus histogram |
| Process latency p95 | < 30s | SQL query on pancake_webhook_event |
| Zero-miss SLO | 0 events/day | Cron 7 daily report |
| Telesale survey feedback (4-5/5) | ≥ 4/5 | Survey of 1 telesale active on the W1 source |
| No Pancake suspension event | 0 | pancake_connection.status history |
| TC-001..010 pass (webhook + process happy path) | 100% | QA execution |

Passing the W1 exit allows enabling W2.
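
The measurable W1 criteria above can be checked mechanically once the daily numbers are collected. The following is a minimal sketch, assuming the counters have already been pulled from SQL and Prometheus; the `WaveStats` type, its field names, and `W1ExitFailures` are hypothetical and do not come from the codebase. Survey and QA criteria are left out because they are evaluated by people.

```go
package main

// WaveStats holds one 24h window of W1 measurements.
// All field names are hypothetical, not taken from the codebase.
type WaveStats struct {
	Processed     int     // events with status='processed'
	Total         int     // all events received in the window
	MissedEvents  int     // events flagged missing by the Cron 7 daily report
	WebhookP95Sec float64 // webhook latency p95, in seconds
	ProcessP95Sec float64 // event process latency p95, in seconds
}

// W1ExitFailures returns the W1 criteria that are not met.
// An empty result means the measurable gates pass.
func W1ExitFailures(s WaveStats) []string {
	var failed []string
	// Integer comparison avoids float rounding at the 99.5% boundary.
	// A window with no events is treated as a failure (assumption).
	if s.Total == 0 || s.Processed*1000 < s.Total*995 {
		failed = append(failed, "event success rate < 99.5%")
	}
	if s.WebhookP95Sec >= 1 {
		failed = append(failed, "webhook latency p95 >= 1s")
	}
	if s.ProcessP95Sec >= 30 {
		failed = append(failed, "process latency p95 >= 30s")
	}
	if s.MissedEvents != 0 {
		failed = append(failed, "zero-miss SLO violated")
	}
	return failed
}
```

Returning the list of failed criteria, rather than a single boolean, keeps the weekly review concrete: the PO sees which gate blocks W2.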

Wave 2 (W2): 5 mid-volume sources

| Criteria | Target | How measured |
|---|---|---|
| Event success rate, cumulative W1+W2 | ≥ 99.5% | Cumulative SQL |
| Manager intervention rate (manual reassignment) | < 20% | count(ticket WHERE assignee_id changed manually) / total |
| Telesale survey (5-10 people) | ≥ 4/5 | Survey at end of W2 |
| Auto-assign correct rate | ≥ 85% (target ≥ 90% after W4) | SQL aggregate |
| TC-008 round-robin tests pass | 100% | QA |
| DLQ replay test pass (Tab 3) | ≥ 1 successful replay test | Manual admin test |

Passing the W2 exit allows enabling W3.

Wave 3 (W3): 15 high-volume sources + load test

| Criteria | Target | How measured |
|---|---|---|
| Event success rate | ≥ 99.5% | SQL |
| Webhook latency p95 under load | < 1s sustained | K6 load test, 100 req/s × 10 minutes |
| DB performance | No sequential scan on pancake_webhook_event.normalized_phone | EXPLAIN ANALYZE |
| Cron 7 reconciliation duration | < 30 minutes | Cron timing log |
| Webhook memory | < 256 MB per pod | Prometheus |
| Chaos outage test | 4-layer recovery pass | TC-014-001..015 |
| Circuit breaker test (Pancake REST timeout) | Open after 5 failures/60s, half-open after 30s | TC-013-003..005 |

Passing the W3 exit allows enabling W4 production.
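
To make the circuit breaker criterion concrete (open after 5 failures within 60s, half-open after 30s), here is a standard-library sketch of that state machine. It is a stand-in for illustration, not sony/gobreaker itself: the `Breaker` and `FakeClock` types are hypothetical, and the rolling 60s failure window is an assumption about how "5 failures/60s" is counted.

```go
package main

import "time"

// State values follow the closed=0, half-open=1, open=2 ordering.
type State int

const (
	Closed State = iota
	HalfOpen
	Open
)

// FakeClock is a manually advanced clock so timing can be tested without sleeping.
type FakeClock struct{ t time.Time }

func (c *FakeClock) Now() time.Time      { return c.t }
func (c *FakeClock) Advance(seconds int) { c.t = c.t.Add(time.Duration(seconds) * time.Second) }

// Breaker opens after MaxFailures failures within Window and
// moves to half-open once OpenFor has elapsed.
type Breaker struct {
	MaxFailures int
	Window      time.Duration
	OpenFor     time.Duration
	now         func() time.Time
	state       State
	failures    []time.Time
	openedAt    time.Time
}

// NewBreaker uses the thresholds from the W3 criterion.
func NewBreaker(now func() time.Time) *Breaker {
	return &Breaker{MaxFailures: 5, Window: 60 * time.Second, OpenFor: 30 * time.Second, now: now}
}

// State returns the current state, promoting Open to HalfOpen when due.
func (b *Breaker) State() State {
	if b.state == Open && b.now().Sub(b.openedAt) >= b.OpenFor {
		b.state = HalfOpen
	}
	return b.state
}

// Record registers the outcome of one call. Outcomes reported while the
// breaker is open are ignored, since calls should be rejected in that state.
func (b *Breaker) Record(success bool) {
	now := b.now()
	switch b.State() {
	case HalfOpen:
		if success {
			b.state, b.failures = Closed, nil
		} else {
			b.trip(now)
		}
	case Closed:
		if success {
			return
		}
		kept := b.failures[:0]
		for _, t := range b.failures {
			if now.Sub(t) < b.Window {
				kept = append(kept, t) // still inside the rolling window
			}
		}
		b.failures = append(kept, now)
		if len(b.failures) >= b.MaxFailures {
			b.trip(now)
		}
	}
}

func (b *Breaker) trip(now time.Time) {
	b.state, b.openedAt, b.failures = Open, now, nil
}
```

Injecting the clock lets a test case drive the 60s and 30s timers deterministically: build the breaker with `NewBreaker(clock.Now)`, record five failures, then advance the clock by 30 seconds and expect `HalfOpen`.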

Wave 4 (W4): 40+ sources, all production

| Criteria | Target | How measured |
|---|---|---|
| All 5 KPI metrics reach target | Latency ≤30s, SLA ≤30min, auto-assign ≥90%, success ≥99.5%, zero-miss=0 | Grafana dashboard |
| Manager survey: "stopped copy-pasting Pancake leads" | 100% (15/15 managers) | Survey at end of W4 |
| PD-001 HMAC signature enforcement (if Pancake support responds) | Enforced | Header verify |
| PD-002 IP whitelist enforcement | Enforced | Middleware allowlist |
| Pancake not suspended | 0 times | Connection status history |
| Cron 7 daily reconciliation missing events | < 5 events/day (DEC-025) | Cron report |

Go-Live sign-off → Phase 7 AUTO-PUBLISH.


E2) Checks before implementation (Dev pre-flight)

BE Dev pre-flight

  • [ ] go test ./... passes on the main branch (no regression)
  • [ ] Hasura console access verified, metadata version matches staging
  • [ ] Latest migration check: \d+ account shows that phone_code, phone_number, phone_enabled already exist and that normalized_phone, pancake_metadata do NOT exist yet (they will be ADDed)
  • [ ] Verify schema queries:
    sql
    SELECT column_name FROM information_schema.columns WHERE table_name='account' AND column_name IN ('normalized_phone', 'pancake_metadata');
    -- Expected: 0 rows (not present yet, will be ADDed)
    
    SELECT column_name FROM information_schema.columns WHERE table_name='contact_book' AND column_name='primary_phone';
    -- Expected: 0 rows (not present yet, will be ADDed per DEC-006)
    
    SELECT table_name FROM information_schema.tables WHERE table_name LIKE 'pancake_%';
    -- Expected: 0 rows (not present yet, 4 tables will be CREATEd)
    
    SELECT id, name FROM crm_master_data WHERE type='ticket_source';
    -- Expected: 8 rows (ticket_source_1..8); ticket_source_pancake will be INSERTed
    
    SELECT timestamp FROM controller.migration_log ORDER BY timestamp DESC LIMIT 1;
    -- Expected: 1777870069927 (latest verified in Phase 3); new migrations use 1777870069928+
  • [ ] libphonenumber dependency verified in go.mod line 32 (github.com/ttacon/libphonenumber)
  • [ ] sony/gobreaker dependency ADDed in go.mod (to be verified after the PR is merged)
  • [ ] Pre-check anchor SQL pattern (per CLAUDE.md pitfalls #4) for every migration UPDATE/REPLACE: PL/pgSQL with 4-layer protection

FE Dev pre-flight

  • [ ] pnpm codegen pass (Hasura schema introspection sau migration apply)
  • [ ] Module Settings accessible at /admin/settings/pancake-crm
  • [ ] Test account with the Admin role available on staging
  • [ ] Cross-browser baseline: Chrome 120+, Safari 16+, Firefox 120+
  • [ ] Vue 3 + Quasar 2.x compatibility verified
  • [ ] URQL composables hot-reload working

QA pre-flight

  • [ ] Test data set prepared: 10 Pancake sample payloads (happy path + 5 edge cases + 3 invalid + 2 duplicates)
  • [ ] webhook.site URL available to simulate the Pancake → Diva test (W1)
  • [ ] Load test tool (K6/JMeter) configured for 100 req/s
  • [ ] Chaos test scripts for outage simulation (block Pancake REST + simulate webhook drop)
  • [ ] Test accounts with Admin/Manager/Telesale roles available

Ops / DevOps pre-flight

  • [ ] Reverse proxy (Nginx/Cloudflare) configured to route https://diva.com.vn/api/pancake/record/{token} → webhook service
  • [ ] TLS certificate valid + auto-renew
  • [ ] Rate limit of 1000 req/s/IP on the webhook endpoint
  • [ ] Prometheus scrape config extended with the webhook + crm-api metrics endpoints
  • [ ] Alertmanager rules deployed (5 rules: success_rate, outage_active, dlq_count, reconciliation_miss, webhook_error_rate)
  • [ ] On-call rotation updated: ops + tech lead for the 4 pilot weeks

E3) Pre-deploy checks (per migration + per service)

BE deploy pre-check

  • [ ] All 8 migrations tested on staging (up.sql + down.sql runnable)
  • [ ] Hasura metadata YAML applied on staging + event trigger firing verified (test event)
  • [ ] EXPLAIN ANALYZE shows:
    • account lookup by normalized_phone uses index account_normalized_phone_idx, NOT a sequential scan
    • pancake_webhook_event lookup by (record_id, modified_on, payload_hash) uses the UNIQUE constraint
  • [ ] Rollback script tested:
    bash
    hasura migrate apply --down 1 --database-name default
    → Schema reverts cleanly, no orphan data
  • [ ] Backfill normalized_phone script dry-run on a copy of production data (1M accounts → ~8 minutes expected)
  • [ ] Capacity test: insert 1000 dummy pancake_webhook_event rows → query performance < 100ms
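
The (record_id, modified_on, payload_hash) tuple checked above is what makes webhook retries idempotent. A minimal sketch of building that key follows, assuming payload_hash is a hex-encoded SHA-256 of the raw request body (the hash algorithm is not stated in this checklist). `EventKey` and `Deduper` are hypothetical names, and the in-memory map only stands in for the database UNIQUE constraint.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// EventKey mirrors the (record_id, modified_on, payload_hash) uniqueness tuple.
type EventKey struct {
	RecordID    string
	ModifiedOn  string
	PayloadHash string
}

// NewEventKey hashes the raw payload with SHA-256 (an assumed hash choice).
func NewEventKey(recordID, modifiedOn string, payload []byte) EventKey {
	sum := sha256.Sum256(payload)
	return EventKey{
		RecordID:    recordID,
		ModifiedOn:  modifiedOn,
		PayloadHash: hex.EncodeToString(sum[:]),
	}
}

// Deduper is an in-memory stand-in for the database UNIQUE constraint.
type Deduper struct{ seen map[EventKey]struct{} }

func NewDeduper() *Deduper { return &Deduper{seen: make(map[EventKey]struct{})} }

// FirstSeen reports whether the key is new, and remembers it.
func (d *Deduper) FirstSeen(k EventKey) bool {
	if _, dup := d.seen[k]; dup {
		return false
	}
	d.seen[k] = struct{}{}
	return true
}
```

This matches the 2 duplicate payloads in the QA data set: an identical resend yields the same key and is dropped, while an edit to the same record changes the hash and is processed as a new event.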

FE deploy pre-check

  • [ ] pnpm codegen passes after the migrations are applied (Hasura introspection updated)
  • [ ] pnpm build passes, bundle size delta < 50KB is acceptable
  • [ ] Tested on staging with real data (10 sample Pancake events)
  • [ ] Cross-browser smoke test on Chrome/Safari/Firefox: Settings/Pancake 4 tabs render OK, replay action OK
  • [ ] Permission test: for Manager/Telesale roles, Settings/Pancake is completely hidden (no menu, redirect to home)

Ops / TL deploy pre-check

  • [ ] Database snapshot taken before deploy (PostgreSQL pg_dump or cloud provider snapshot)
  • [ ] Grafana dashboard skeleton deployed (5 KPI sections)
  • [ ] Rollback plan documented + tested (see §E5)
  • [ ] PagerDuty/alert routing verified (5 alerts → ops on-call)
  • [ ] Pancake admin webhook URL setup: enabled only for the W1 source, kill switch initially ON

E4) Deploy steps (14 sequential steps)

| # | Step | Owner | Verify | Rollback |
|---|---|---|---|---|
| 1 | Apply migration 1777870069928 (ALTER account ADD normalized_phone + pancake_metadata + index) | BE | \d account shows the new columns | down.sql DROP COLUMN |
| 2 | Apply migration 1777870069929 (backfill normalized_phone script, ~8 minutes for 1M accounts) | BE | SELECT COUNT(*) FROM account WHERE normalized_phone IS NOT NULL AND phone_enabled; result ≈ active phone count | UPDATE SET normalized_phone=NULL |
| 3 | Apply migration 1777870069930 (ALTER contact_book ADD primary_phone + FK + UNIQUE, DEC-006) | BE | \d contact_book shows the new column + constraints | DROP COLUMN + DROP CONSTRAINT |
| 4 | Apply migration 1777870069931 (CREATE 5 pancake tables + indexes) | BE | \dt pancake_* returns 5 rows | DROP TABLE pancake_* CASCADE |
| 5 | Apply migration 1777870069932 (INSERT crm_master_data ticket_source_pancake) | BE | SELECT * FROM crm_master_data WHERE id='ticket_source_pancake' returns 1 row | DELETE |
| 6 | Apply migration 1777870069933 (INSERT module_permission_action pancake_crm_integration + 6 actions + Admin seed) | BE | Permission check useGlobalStore.hasPermission('pancake_crm_integration', 'access') returns true for Admin | DELETE rows |
| 7 | Apply migration 1777870069934 (INSERT 3 notification_template) | BE | SELECT * FROM notification_template WHERE code LIKE 'noti_pancake%' returns 3 rows | DELETE |
| 8 | Apply migration 1777870069935 (update app_setting JSON, ADD key pancake_integration) | BE | SELECT app_settings->'pancake_integration' FROM app_setting WHERE id=1 returns an object | UPDATE remove key |
| 9 | Apply Hasura metadata with hasura metadata apply: 4 pancake_* table YAML + event trigger pancake_webhook_event_status_update + 7 cron triggers + Admin permission | BE | hasura metadata diff → no diff. Hasura console event triggers tab shows the new trigger | hasura metadata apply --from-file backup.yaml |
| 10 | Deploy services/webhook (PR: pancake handler) | DevOps | GET /healthz 200; smoke test with a webhook.site POST | Revert to previous Docker image |
| 11 | Deploy services/crm-api (PR: event handler + 7 cron + REST client + refactor distribute_ticket) | DevOps | GET /healthz 200; smoke test that the Hasura trigger fires | Revert to previous image |
| 12 | Deploy diva-admin FE (PR: 3 file delta + 5 new Settings pages + routes + menu + permission) | DevOps | Smoke test: Admin login → Settings/Pancake menu visible, 4 tabs render | Revert to previous build |
| 13 | Pancake admin setup: enter webhook URL + api_key for the workspace (Ops works in the Pancake console) | Ops | Pancake "Test webhook" button returns 200 | Disable webhook URL in Pancake admin |
| 14 | Enable W1 pilot: turn the kill switch OFF (Settings Tab 1) + set only 1 source to is_active=true (Settings Tab 2) | PO + Ops | pancake_webhook_event rows with status='ingested' appear | Toggle kill switch ON → flow paused |

Checks immediately after deploy (Day 0)

  • [ ] Webhook endpoint accessible: curl -X POST https://diva.com.vn/api/pancake/record/{token} returns 200 (with a valid payload)
  • [ ] Hasura event trigger fires: simulate UPDATE pancake_webhook_event.status='received' → handler pancake_process_record is called within 1-2s
  • [ ] FE menu Settings/Pancake visible to Admin, hidden from non-admin
  • [ ] Cron triggers active: the Hasura console (Events → Cron Triggers) lists the 7 pancake cron triggers
  • [ ] No error spike in webhook + crm-api logs (zerolog)
  • [ ] Prometheus metrics scrape OK: pancake_webhook_requests_total available
  • [ ] Grafana dashboard renders data (5-10 minutes after the first event arrives)

Monitoring Day 1-7 (W1 pilot)

  • [ ] Query performance: daily EXPLAIN ANALYZE check, no sequential scan
  • [ ] No error spike in logs (in particular, dead_letter count < 1% of total)
  • [ ] Telesale W1 feedback (1-on-1 calls with 5 active telesales after 3 days)
  • [ ] PO weekly review meeting on Friday: KPI dashboard + blocker review
  • [ ] Cron 7 daily reconciliation report: missing events = 0
  • [ ] Daily Pancake suspension status check: pancake_connection.status='active'

E5) Rollback plan (per failure mode)

Rule: every rollback starts with the feature flag kill switch (lowest blast radius). Migration rollback is used only for schema corruption.

Tier 1: Feature flag rollback (< 5 minutes, no downtime)

| Trigger | Action | Verify |
|---|---|---|
| Minor bug affecting 1 source | Settings Tab 2: toggle the source to is_active=false | Webhooks for that source → status='skipped_source_disabled' |
| Bug affecting all sources | Settings Tab 1: toggle kill switch ON | Every webhook → status='skipped_kill_switch', no ticket created |
| Pancake sends spam (sudden rate spike) | Reverse proxy: tighten the rate limit or block the IP | Webhook returns 429 for that IP (does NOT break the main flow) |
| DLQ explodes (> 100 events/hour with status=dead_letter) | Temporarily pause the Hasura event trigger (Hasura console) | Trigger stopped, events stay in received |
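
The first two Tier 1 rows amount to a small decision at intake time. The sketch below illustrates it; the `Settings` type and `IntakeStatus` function are hypothetical, the precedence of the kill switch over the per-source flag is an assumption, and "received" is assumed to be the status for accepted events.

```go
package main

// Settings is a hypothetical snapshot of Settings Tab 1 and Tab 2.
type Settings struct {
	KillSwitchOn  bool            // Tab 1: global kill switch
	ActiveSources map[string]bool // Tab 2: source ID -> is_active
}

// IntakeStatus returns the status a new webhook event would be stored with.
// Skipped events are recorded but never turned into tickets.
func IntakeStatus(s Settings, sourceID string) string {
	if s.KillSwitchOn {
		return "skipped_kill_switch" // checked first (assumed precedence)
	}
	if !s.ActiveSources[sourceID] {
		// Unknown sources read as false from the map, so they are skipped too.
		return "skipped_source_disabled"
	}
	return "received" // assumed status for events accepted for processing
}
```

Because skipped events are still stored with an explicit status, the rollback can be verified from the pancake_webhook_event table alone, without reading service logs.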

Tier 2: Service rollback (5-15 minutes, brief downtime)

| Trigger | Action | Verify |
|---|---|---|
| Bug in webhook handler pancake.go | Revert Docker image services/webhook to the previous version | GET /healthz 200; old handler logic active |
| Bug in event handler event_pancake_process_record.go | Revert Docker image services/crm-api | The existing 9 handlers still work (does NOT break Stringee/iCall) |
| Bug in FE Settings pages | Revert Docker image diva-admin | Pancake menu hidden (does NOT break existing Settings) |

Tier 3: Migration rollback (30 minutes, downtime + data risk)

Use only for schema corruption or a constraint conflict. Tier 1 and Tier 2 must have failed before Tier 3 is attempted.

| Migration | Down.sql action | Risk |
|---|---|---|
| 1777870069928 (account ALTER) | DROP COLUMN normalized_phone, pancake_metadata + DROP INDEX | Low (no FK dependency) |
| 1777870069929 (backfill) | UPDATE account SET normalized_phone=NULL | Low (data revert) |
| 1777870069930 (contact_book ALTER, DEC-006) | DROP COLUMN primary_phone + DROP CONSTRAINT FK + UNIQUE | Medium: dropping the FK may leave orphan contact_book rows. Verify first |
| 1777870069931 (CREATE 5 pancake tables) | DROP TABLE CASCADE | High if data already exists: back up first |
| 1777870069932 (INSERT crm_master_data) | DELETE WHERE id='ticket_source_pancake' | High: any ticket that already references this source_id becomes an orphan FK |
| 1777870069933 (permission seed) | DELETE rows | Low |
| 1777870069934 (notification template) | DELETE rows | Low |
| 1777870069935 (app_setting key) | UPDATE remove key | Low |

Warning for rolling back migration 1777870069932: before DELETEing the master data row, you MUST run:

sql
-- Pre-check ticket references
SELECT COUNT(*) FROM ticket WHERE source_id='ticket_source_pancake';
-- If > 0: do NOT DELETE. Migrate the tickets to another source first (e.g. 'ticket_source_8') or soft-disable the master data row

Tier 4: Pancake admin rollback

| Trigger | Action |
|---|---|
| Pancake suspends the webhook (automatic, after 80% errors over 30 minutes) | Manually re-enable via Pancake admin Tools settings + investigate the root cause |
| Webhook URL must change | Update Pancake admin with the new URL + token; the old URL keeps a 24h grace period |

Rollback communication plan

| Severity | Audience | Channel | Timing |
|---|---|---|---|
| Tier 1 (feature flag) | PO + Tech Lead | Slack #pancake-integration | Within 5 minutes |
| Tier 2 (service) | + Ops + DevOps | Slack + email | Within 15 minutes |
| Tier 3 (migration) | + Director + whole team | Slack urgent + email + 30-minute meeting | Within 30 minutes |
| Tier 4 (Pancake admin) | + Marketing (Pancake is their tool) | Slack + email | Within 15 minutes |

E6) Sign-off

| Sign-off group | Person | Date | Status | Notes |
|---|---|---|---|---|
| Business | PO/BA | __/__/2026 | Pending | After G-1, G-3, G-9 pass + W4 KPIs reach target |
| Engineering | Tech Lead | __/__/2026 | Pending | After G-2, G-4, G-5, G-7 pass |
| QA | QA Lead | __/__/2026 | Pending | After G-3, G-5, G-6 pass + TC pass rate ≥ 95% |
| Operations | DevOps Lead | __/__/2026 | Pending | After G-7, G-8 pass + monitoring stable for 7 days |
| Security | Security Lead | __/__/2026 | Pending | After PD-001/002 (HMAC + IP) enforced |
| Compliance | Legal/Compliance | __/__/2026 | Pending | After PD-003 (consent_data.marketing) confirmed |
| Executive/Business Lead | Director | __/__/2026 | Pending | After all of the above + Manager survey 100% |

Final sign-off criteria (end of W4)

  • [ ] All 5 KPIs reach target (Latency p95 ≤30s, SLA p95 ≤30min, Auto-assign ≥90%, Success ≥99.5%, Zero-miss=0)
  • [ ] Manager survey: 100% (15/15) "have stopped copy-pasting Pancake leads"
  • [ ] Telesale survey: ≥ 4/5 average (60 telesales)
  • [ ] No Pancake suspension event during the entire pilot
  • [ ] Cron 7 daily reconciliation: missing events < 5/day sustained
  • [ ] Chaos outage test: 4-layer recovery pass
  • [ ] PD-001 + PD-002 enforced (or an accepted risk waiver from the Security Lead if Pancake support has not responded)
  • [ ] PD-003 enforced (customer_consent.consent_data.marketing key confirmed)
  • [ ] Grafana production dashboard stable for 7 days
  • [ ] Documentation published to dva-doc (Phase 7 AUTO-PUBLISH success)

Post-sign-off

  • [ ] Phase 7 AUTO-PUBLISH: invoke /publish-doc pancake-crm-integration
  • [ ] Hand-off to ops: 7 cron + 1 event trigger + 1 action under ops monitoring
  • [ ] MVP-2 backlog: outbound Diva → Pancake (POST records/tickets), deferred, trigger TBD
  • [ ] Quarterly review: PD-007 (Pancake ticket event publish?), PD-010 (Pancake API roadmap)

End of Go-Live Checklist v1.0. Ready for pilot W1 (28/05/2026 → 04/06/2026).


E7) Pass 1 Resolutions (Phase 5.2 — additions)

Date: 15/05/2026 | Trigger: QA/DevOps + Tech Lead review.

E7.1) 3 additional alertmanager rules (DevOps P0-D1)

yaml
# alertmanager rules - pancake.yaml
groups:
  - name: pancake_crm_integration
    rules:
      - alert: PancakeEventProcessLatencyHigh
        expr: histogram_quantile(0.95, rate(pancake_event_process_duration_seconds_bucket[5m])) > 30
        for: 10m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake event process p95 > 30s (SLO breach)"
          description: "{{ $value }}s, above the ≤30s target. Check DB lock, Hasura, network."
          runbook_url: "https://docs.diva.com.vn/runbook/pancake-latency-high"

      - alert: PancakeWebhookLatencyP99High
        expr: histogram_quantile(0.99, rate(pancake_webhook_latency_seconds_bucket[5m])) > 2
        for: 5m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake webhook p99 latency > 2s"
          description: "Tail latency. Check DB connection pool saturation."

      - alert: PancakeCircuitBreakerOpenLong
        expr: pancake_circuit_breaker_state{name="pancake_rest"} == 2
        for: 30m
        labels: { severity: critical, team: pancake }
        annotations:
          summary: "Pancake REST circuit breaker open > 30 minutes"
          description: "Source sync stopped. New Pancake sources cannot be detected."

E7.2) HMAC verification path spec (DevOps P0-D2, post PD-001)

Decision (TBD, pending Pancake support response):

| Item | Default (W1-W3, trust IP + token) | Production (W4+, post PD-001) |
|---|---|---|
| HMAC header name | (no header) | X-Pancake-Signature (TBD, to verify) |
| Algorithm | (none) | HMAC-SHA256 hex |
| Timestamp tolerance window | (none) | 5 minutes (guards against replay attacks) |
| Verify location | (none) | Nginx Lua reverse proxy BEFORE the webhook receiver (fail-fast, not counted toward the Pancake error rate) |
| Secret storage | (none) | pancake_connection.webhook_secret_encrypted (new column, deferred to M2 if the Pancake support response is late) |

Nginx Lua implementation snippet (template):

lua
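-- nginx/lua/pancake_hmac_verify.lua
-- Template for access_by_lua_file. Assumes lua-resty-hmac is installed;
-- header names are still TBD pending Pancake support.
local hmac = require "resty.hmac"

-- Reject with HTTP 200: the request is NOT forwarded, but Pancake does not
-- count it toward its webhook error rate.
local function reject(reason)
  ngx.status = ngx.HTTP_OK
  ngx.say('{"status":"' .. reason .. '"}')
  return ngx.exit(ngx.HTTP_OK)
end

local headers = ngx.req.get_headers()
local timestamp = headers["X-Pancake-Timestamp"]
local signature = headers["X-Pancake-Signature"]
ngx.req.read_body() -- must run before get_body_data()
local body = ngx.req.get_body_data() or "" -- nil when empty or buffered to disk

-- Verify the 5-minute timestamp tolerance (a missing timestamp is rejected)
local ts = tonumber(timestamp)
if not ts or math.abs(ngx.time() - ts) > 300 then
  return reject("timestamp_out_of_window")
end

-- Look up the secret by the {token} path segment of /api/pancake/record/{token}
local token = ngx.var.uri:match("([^/]+)$")
local secret = token and ngx.shared.pancake_secrets:get(token)
if not secret or type(signature) ~= "string" then
  return reject("hmac_invalid")
end

-- Compute the expected signature as lowercase hex
local expected = hmac:new(secret, hmac.ALGOS.SHA256):final(timestamp .. "." .. body, true)
if signature:lower() ~= expected then
  return reject("hmac_invalid")
end

-- Pass through to webhook handler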
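
For unit-testing the same contract outside Nginx, the check can be mirrored in Go with the standard library only. This is a sketch under the same TBD assumptions as the table above (HMAC-SHA256 over `timestamp + "." + body`, hex-encoded, 5-minute window); `Sign` and `VerifySignature` are hypothetical helper names, and the returned strings reuse the status values from the Lua template.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strconv"
)

// toleranceSec is the 5-minute replay window from the spec table.
const toleranceSec = 300

// Sign returns the hex HMAC-SHA256 of "timestamp.body" (assumed signing scheme).
func Sign(secret []byte, timestamp, body string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(timestamp + "." + body))
	return hex.EncodeToString(mac.Sum(nil))
}

// VerifySignature returns "ok", "timestamp_out_of_window", or "hmac_invalid".
// nowUnix is passed in so the check can be tested with a fixed clock.
func VerifySignature(secret []byte, timestamp, body, signatureHex string, nowUnix int64) string {
	ts, err := strconv.ParseInt(timestamp, 10, 64)
	if err != nil {
		return "timestamp_out_of_window" // missing or malformed timestamp
	}
	diff := nowUnix - ts
	if diff < 0 {
		diff = -diff
	}
	if diff > toleranceSec {
		return "timestamp_out_of_window"
	}
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return "hmac_invalid"
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(timestamp + "." + body))
	// hmac.Equal compares in constant time to avoid timing leaks.
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "hmac_invalid"
	}
	return "ok"
}
```

Binding the timestamp into the signed string is what makes the tolerance window meaningful: a captured request cannot be replayed later with a fresh timestamp, because the signature would no longer match.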

E7.3) Deploy step 9 split into 9a/9b (DevOps P0-D3)

| Step | Old | New |
|---|---|---|
| 9 | Apply Hasura metadata (table YAML + event trigger + cron triggers + permission) all at once | Split: |
| 9a | | Apply table YAML + permission + event trigger (NO cron triggers), before the services deploy |
| Services deploy | step 10-11 | step 10-11, after 9a |
| 9b | | Apply cron triggers AFTER the services deploy, so that cron fires against endpoints that are already up |

Verify: hasura metadata diff after step 9a → no diff before the services deploy. Once the services are up, apply 9b → cron triggers active.

E7.4) down-soft.sql for migration rollback safety

sql
-- down-soft.sql for migration 1777870069932 (insert_ticket_source_pancake)
-- Use instead of DELETE if any ticket already references ticket_source_pancake
UPDATE crm_master_data SET disabled=true, updated_at=NOW()
WHERE id='ticket_source_pancake';

-- Pre-check to run before rollback:
-- SELECT COUNT(*) FROM ticket WHERE source_id='ticket_source_pancake';
-- If >0: use down-soft.sql. If =0: use down.sql DELETE.
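
If the rollback is scripted, the pre-check rule above reduces to a single branch. A minimal sketch, assuming the count has already been read from the database; `RollbackScript` is a hypothetical helper name.

```go
package main

// RollbackScript picks the down script for migration 1777870069932 from the
// result of the ticket reference pre-check query.
func RollbackScript(ticketRefs int) string {
	if ticketRefs > 0 {
		// Tickets still reference ticket_source_pancake: soft-disable the row.
		return "down-soft.sql"
	}
	// No references: the master data row can be deleted safely.
	return "down.sql"
}
```

Keeping the choice in code rather than in an operator's head reduces the chance of running the destructive DELETE during an incident.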

E7.5) Backfill normalized_phone mitigation (DevOps P0-D5)

Use an external Go job instead of a migration DO $$ block:

bash
# Scheduled to run 02:00-06:00 ICT (low-traffic window)
# scripts/backfill_normalized_phone.go
# Batches of 10k rows per transaction + pg_sleep(0.1) between batches
# Estimate: 1M accounts ÷ 10k rows per batch = 100 batches × (5s execute + 100ms sleep) ≈ 8.5 minutes; budget ~10-12 minutes wall clock
# Lock impact: only rows in the current batch, released after ~5s per batch → production queries are not blocked for long

Pre-check verification:

  • The production account table has ~1M rows
  • POS + telesale activity is low overnight (verify against DevOps team observations)
  • Prometheus pg_stat_activity shows no lock contention > 1s
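
The wall-clock estimate for the backfill is simple arithmetic: number of batches times the cost of one batch. The sketch below reproduces it using the figures quoted in this section; the constant and function names are hypothetical, and the 5s per-batch execution time is the plan's own assumption, not a measurement.

```go
package main

import "time"

// Figures quoted in the backfill plan; the names are hypothetical.
const (
	BatchSize     = 10_000
	ExecPerBatch  = 5 * time.Second
	SleepPerBatch = 100 * time.Millisecond
)

// EstimateBackfill returns the number of batches and the total time spent
// executing and sleeping, excluding connection and scheduling overhead.
func EstimateBackfill(rows, batchSize int, exec, sleep time.Duration) (int, time.Duration) {
	if rows <= 0 || batchSize <= 0 {
		return 0, 0
	}
	batches := (rows + batchSize - 1) / batchSize // round up for a partial last batch
	return batches, time.Duration(batches) * (exec + sleep)
}
```

For 1,000,000 rows this gives 100 batches and 8m30s, which lines up with the ~8 minute dry-run expectation in §E3 and leaves headroom inside the 10-12 minute budget.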

E7.6) E1 gates update (W1-W4 exit criteria)

| Gate | Old | New |
|---|---|---|
| W1 exit: Telesale survey | n=1, ≥ 4/5 | n≥3, ≥ 4/5 (statistically more realistic) |
| W3 exit: DB performance | EXPLAIN ANALYZE shows no seq scan | Snapshot of pg_stat_user_indexes before vs after + EXPLAIN ANALYZE shows no seq scan |
| W4 exit: Manager survey | 100% (15/15) | ≥ 13/15 (≈ 86.7%), a realistic threshold |
| NEW W3 exit | (none) | Rollback drill rehearsal pass (Tier 2 service rollback, verify MTTR ≤ 15 minutes) |
| NEW W4 exit | (none) | Pancake support response on PD-001/002 (HMAC + IP whitelist) OR a formal risk waiver from the Security Lead |

E7.7) Additional out-of-band action items

| # | Item | Owner | Due |
|---|---|---|---|
| 7 | Grep cron_triggers.yaml for schedule pattern 0 2 * to verify that Pancake Cron 7 (02:00 AM) does not conflict with an existing nightly batch | BE-1 | Before W1 |
| 8 | Verify the production PG version (≥15 for NULLS NOT DISTINCT, or use a partial index if <15) | DevOps | Before W1 |
| 9 | Recheck Prometheus retention (15d default vs ~7.5GB storage needed for 100k events/month) | DevOps | Before W3 |
| 10 | PagerDuty rotation schedule per week for the 4 pilot weeks | DevOps lead | Before W1 |
| 11 | Alert workflow document (who ACKs, incident escalation, postmortem trigger) | DevOps + Tech Lead | Before W1 |