Focused Documentation

Section 5 - Technical Architecture

This page focuses specifically on the CSI technical architecture, so the structure is cleaner, easier to scan, and more convenient to use in engineering discussions.

PART 5 — TECHNICAL ARCHITECTURE

5.1 Architecture Philosophy & Principles

The following design principles are non-negotiable and form the foundation of the entire platform.

No Silent Failure

Every error must be captured, carry a trace ID, and reach monitoring. A user must never receive an error that leaves no trace in the system.

Idempotency First

The same message must never be processed twice. Conflicts between AI and admin replies must be detected automatically.

Domain-Agnostic Core

The platform core contains no hardcoded per-industry logic. Domain-specific logic lives in the domain layer.

Composable Over Monolithic

There is no single super-agent. Composable sub-agents and tools are more maintainable, testable, and scalable.

LTM by Default

Every interaction enriches the user profile. Memory is a fundamental feature, not an add-on.

Cost-Conscious Design

Every LLM call must be justified, with model routing between Flash and Pro/Thinking according to complexity.
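As an illustration of this principle, the routing decision can be sketched as a small pure function. The task kinds, the `needs_reasoning` flag, and the "flash" / "pro" labels are assumptions made for this example; they are not defined elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmTask:
    """Hypothetical description of one pending LLM call."""
    kind: str                      # e.g. "chat", "daily_summary", "ltm_synthesis"
    needs_reasoning: bool = False  # set by the caller for complex cases


# Assumed set of task kinds that justify the more expensive model.
REASONING_KINDS = {"doc_generation", "ltm_synthesis"}


def route_model(task: LlmTask) -> str:
    """Return "pro" only when the task justifies it; default to "flash"."""
    if task.needs_reasoning or task.kind in REASONING_KINDS:
        return "pro"
    return "flash"
```

Keeping the rule in one function makes every escalation to the expensive model explicit and easy to audit.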

Tenant Isolation

Each tenant's data, configuration, and context must be fully isolated, with enforcement through RLS.

5.2 Tech Stack

Layer | Technology | Rationale
Orchestration | N8N (self-hosted) | Visual, modular, extensible
LLM Primary (Conversational) | Gemini 2.5 Flash / Gemini 3 Flash | Cost-efficient, multimodal
LLM Secondary (Reasoning) | Gemini Pro / Thinking (TBD per use-case) | Complex reasoning & doc generation
Database | Supabase PostgreSQL | Unified DB, RLS built-in
Vector Store / RAG | Supabase Vector (pgvector) | No separate vector DB needed
Frontend | Next.js | SSR, API routes, TypeScript
Channel (Primary) | Waha (WhatsApp) | Production-ready
Channel (Roadmap) | WhatsApp Official, Web Widget, TikTok, Meta API | Q3-Q4 2026 expansion
Monitoring | N8N logs + custom dashboard | Phase 1 monitoring

5.3 Agent Architecture

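The agent architecture uses three main agent layers (L1, L2, and an optional L3), fronted by a channel layer (L0) and backed by a data and memory layer.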

Final Architecture

Multi-Layer Agentic System

L0 - Channel Layer

Omnichannel intake and normalization.

  • WhatsApp: Primary production channel
  • Web Widget: Website embedded entry
  • TikTok / Meta: Roadmap channels
  • Webhook Gateway: Unified inbound handler

L1 - Orchestrator

Main reasoning, policy guard, and final response owner.

  • Main CS Agent: Intent, tool routing, response generation
  • Moderation Agent: Safety, abuse, crisis and emergency checks
  • Fallback Agent: Soft fallback and resilient handoff

L2 - Utility + Domain

Reusable sub-agents and domain-specific execution logic.

  • Utility Sub-Agents: Time, location, transport, pricing, docs
  • Domain Sub-Agents: Medical, automotive, F&B, commerce logic

L3 - Specialist (Optional)

Deep specialization for high-complexity cases.

  • Medical Specialist: Sub-specialist escalation (e.g. oncology)
  • Business Specialist: High-value financing and niche workflows

Data + Memory Layer

Persistence, retrieval, observability, and tenant isolation.

  • Supabase PostgreSQL: Tenant-scoped operational store with RLS
  • Vector + LTM: Semantic recall, summaries, and profile memory

5.3.1 L1 — Orchestrator

Main functions:

  • Inject LTM context
  • Detect intent
  • Call sub-agents / tools
  • Generate the final response

Components:

  • Main CS Agent
  • Moderation Agent
  • Fallback Agent

L1 is responsible for the main reasoning and for coordination between agents.

5.3.2 L2 — Utility & Domain-Specific Layer

This layer contains reusable sub-agents and domain logic.

5.3.2.1 Utility Sub-Agent

Utility Sub-Agents are a collection of modular sub-agents that are reusable across industries. Each sub-agent is:

  • Single responsibility
  • Stateless (holds no internal state)
  • Structured input/output (JSON schema)
  • Fully composable by the L1 Orchestrator

A. Time Management

  • TIME_PARSER
  • TIME_SLOT_CHECKER
  • TIME_CONFLICT_DETECTOR
  • TIME_WINDOW_GENERATOR
  • TIME_FORMATTER

B. Address & Location

  • ADDRESS_PARSER
  • ADDRESS_GEOCODER
  • ADDRESS_VALIDATOR
  • ADDRESS_DISAMBIGUATOR
  • MAPS_LINK_GENERATOR

C. Transport & Logistics

  • DISTANCE_CALCULATOR
  • ETA_CALCULATOR
  • ZONE_CLASSIFIER
  • ROUTE_OPTIMIZER
  • TRAFFIC_ADVISOR

D. Payment & Pricing

  • PRICE_CALCULATOR
  • PROMO_CODE_VALIDATOR
  • PAYMENT_LINK_GENERATOR
  • PAYMENT_STATUS_CHECKER
  • INVOICE_GENERATOR

E. Document & Content

  • DOCS_GENERATOR
  • RESPONSE_FORMATTER
  • LANGUAGE_MANAGER

A. Time Management Sub-Agent

  1. TIME_PARSER
  • Purpose: Convert natural-language time expressions into a structured datetime.
  • Input: Free text ("besok jam 3", "next Monday 4pm", "in 2 hours")
  • Output: { datetime_utc, datetime_local, timezone, is_ambiguous }
  • Config: default_timezone per tenant
  2. TIME_SLOT_CHECKER
  • Purpose: Validate slot availability against operating hours and resources.
  • Input: Parsed datetime + resource_id
  • Output: { available, reason, next_available, alternatives[] }
  • Config: operating_hours, holidays, blocked_dates
  3. TIME_CONFLICT_DETECTOR
  • Purpose: Detect scheduling collisions.
  • Input: datetime + duration + resource_id
  • Output: { conflict, conflicting_booking, suggestions[] }
  4. TIME_WINDOW_GENERATOR
  • Purpose: Generate the list of slots within a date range.
  • Input: date_range + duration
  • Output: available_slots[]
  • Config: slot_duration, buffer_between_slots
  5. TIME_FORMATTER
  • Purpose: Format a datetime as natural text in the target language.
  • Input: datetime + language + format_type
  • Output: "Besok jam 15.00" / "Tomorrow at 3pm"
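To make the stateless, structured input/output contract concrete, here is a minimal sketch of TIME_CONFLICT_DETECTOR. The shape of the `bookings` list is an assumption for illustration, and the single suggestion (the end of the conflicting booking) is deliberately naive: a real implementation would probably check that slot against the other bookings as well.

```python
from datetime import datetime, timedelta


def detect_conflict(start, duration_min, resource_id, bookings):
    """Return the { conflict, conflicting_booking, suggestions[] } shape.

    `bookings` is an assumed list of dicts with keys
    "id", "resource_id", "start" (datetime) and "duration_min".
    """
    end = start + timedelta(minutes=duration_min)
    for b in bookings:
        if b["resource_id"] != resource_id:
            continue
        b_end = b["start"] + timedelta(minutes=b["duration_min"])
        # Half-open intervals [start, end) overlap when each one
        # starts before the other one ends.
        if start < b_end and b["start"] < end:
            return {
                "conflict": True,
                "conflicting_booking": b["id"],
                "suggestions": [b_end.isoformat()],
            }
    return {"conflict": False, "conflicting_booking": None, "suggestions": []}
```

Because the function receives the bookings as input and keeps nothing between calls, L1 can compose it freely with TIME_PARSER and TIME_SLOT_CHECKER.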

B. Address & Location Sub-Agent

  1. ADDRESS_PARSER
  • Purpose: Extract the address and landmarks from free text.
  • Input: Free text address
  • Output: { address_text, landmarks, confidence }
  2. ADDRESS_GEOCODER
  • Purpose: Convert an address into GPS coordinates.
  • Input: Parsed address
  • Output: { lat, lng, formatted_address, confidence }
  • Integration: Google Maps API
  3. ADDRESS_VALIDATOR
  • Purpose: Validate that the address is inside the service area.
  • Input: lat/lng + tenant_id
  • Output: { in_service_area, zone_name, surcharge }
  4. ADDRESS_DISAMBIGUATOR
  • Purpose: Clarify ambiguous addresses.
  • Output: { candidates[], clarifying_question }
  5. MAPS_LINK_GENERATOR
  • Purpose: Generate a Google Maps link.
  • Output: { short_link, directions_link }

C. Transport & Logistics Sub-Agent

  1. DISTANCE_CALCULATOR
  • Purpose: Calculate the actual distance.
  • Output: { distance_km }
  2. ETA_CALCULATOR
  • Purpose: Estimate arrival time based on traffic.
  • Output: { eta_minutes, arrival_time }
  3. ZONE_CLASSIFIER
  • Purpose: Classify the zone for pricing and dispatch.
  • Output: { zone_id, zone_tier, surcharge }
  4. ROUTE_OPTIMIZER
  • Purpose: Optimize the visit order for multi-stop routes.
  • Output: { optimized_route[], total_distance, total_time }
  5. TRAFFIC_ADVISOR
  • Purpose: Provide real-time traffic condition updates.
  • Output: { traffic_status, delay_minutes }
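As a rough sketch of how DISTANCE_CALCULATOR and ZONE_CLASSIFIER could chain, the example below uses the haversine formula and a distance-based tier table. Both choices are assumptions: haversine gives straight-line (great-circle) distance, which is only a lower bound on road distance, so a production calculator would more likely call a routing API. The tier limits and surcharges are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical tiers: (max distance in km, tier, surcharge).
ZONE_TIERS = [(5.0, "A", 0), (15.0, "B", 25_000), (30.0, "C", 50_000)]


def classify_zone(distance_km):
    """Return { zone_id, zone_tier, surcharge }; all None when out of range."""
    for max_km, tier, surcharge in ZONE_TIERS:
        if distance_km <= max_km:
            return {"zone_id": f"zone_{tier.lower()}", "zone_tier": tier,
                    "surcharge": surcharge}
    return {"zone_id": None, "zone_tier": None, "surcharge": None}
```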

D. Payment & Pricing Sub-Agent

  1. PRICE_CALCULATOR
  • Purpose: Calculate the total price with a breakdown.
  • Input: line_items + zone + promo
  • Output: { subtotal, fees[], discounts[], tax, total }
  2. PROMO_CODE_VALIDATOR
  • Purpose: Validate and apply a promo code.
  • Output: { valid, discount_amount, rejection_reason }
  3. PAYMENT_LINK_GENERATOR
  • Purpose: Generate a payment link.
  • Output: { payment_url, expires_at }
  • Integration: Xendit / Stripe / etc.
  4. PAYMENT_STATUS_CHECKER
  • Purpose: Check payment status.
  • Output: { status, paid_at }
  5. INVOICE_GENERATOR
  • Purpose: Generate an invoice PDF.
  • Output: { invoice_url, invoice_number }
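The PRICE_CALCULATOR breakdown can be sketched with integer amounts (for example whole rupiah). The order of operations here, with tax computed on subtotal plus surcharge minus discount, is an assumption; the actual tax base and rate depend on the tenant's rules, so `tax_pct` is passed in rather than hardcoded.

```python
def calculate_price(line_items, zone_surcharge, promo_discount, tax_pct):
    """Return { subtotal, fees[], discounts[], tax, total } as integers.

    `line_items` is an assumed list of {"unit_price": int, "qty": int}.
    """
    subtotal = sum(item["unit_price"] * item["qty"] for item in line_items)
    fees = [{"name": "zone_surcharge", "amount": zone_surcharge}] if zone_surcharge else []
    # A discount can never exceed the subtotal it applies to.
    discount = min(promo_discount, subtotal)
    discounts = [{"name": "promo", "amount": discount}] if discount else []
    taxable = subtotal + zone_surcharge - discount
    tax = (taxable * tax_pct + 50) // 100  # integer percent, rounded half up
    return {"subtotal": subtotal, "fees": fees, "discounts": discounts,
            "tax": tax, "total": taxable + tax}
```

Integer arithmetic avoids floating-point drift, which matters once the same numbers flow into PAYMENT_LINK_GENERATOR and INVOICE_GENERATOR.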

E. Document & Content Sub-Agent

  1. DOCS_GENERATOR
  • Purpose: Generate professional documents from structured data.
  • Output: { document_url }
  2. RESPONSE_FORMATTER
  • Purpose: Adapt output to the channel (WA/Web).
  • Output: formatted_messages[]
  3. LANGUAGE_MANAGER
  • Purpose: Keep tone and language consistent with the tenant persona.
  • Output: Polished response text

Utility Sub-Agents are domain-agnostic and can be used by every Domain-Specific Sub-Agent without changes to the core structure.

5.3.2.2 Domain-Specific Sub-Agent

Contains logic specific to a particular industry.

Examples:

  • Medical booking logic
  • Automotive lead qualification
  • F&B order handling

The domain layer contains only business rules, not infrastructure logic.

5.3.3 L3 — Sub-Specialist Agent (Optional)

Used for domains with deep sub-specialization.

Examples:

  • Pediatric Oncology (Medical)
  • High-value Financing (Automotive)

L3 is invoked only when L2 or L1 requires it.

5.4 Long-Term Memory Architecture

LTM (Long-Term Memory) is the backbone of personalization and reliability on the CSI platform. Its design is domain-agnostic: the core schema stays the same across industries, while field details follow a per-domain template (medical is the most complete reference implementation, taken from the CepatSehat Addendum).

5.4.1 Core Data Model (Domain-Agnostic)

  • Hard Fields (JSONB): critical data that agents must be able to use deterministically.
  • Summaries + Embeddings: conversation summaries for semantic recall (token-efficient while staying relevant).
  • Hybrid Memory: a combination of structured fields and narrative summaries.

SQL
-- Master table: domain-agnostic (one row per user)
CREATE TABLE user_profiles (
  user_id              UUID PRIMARY KEY,
  tenant_id            UUID NOT NULL REFERENCES tenants(id),
  full_name            TEXT,
  phone_number         TEXT,
  preferred_language   TEXT DEFAULT 'id',
  -- Hard fields: four buckets whose inner structure follows the domain template
  safety_fields        JSONB DEFAULT '{}',
  preferences          JSONB DEFAULT '{}',
  usage_history        JSONB DEFAULT '{}',
  operational_flags    JSONB DEFAULT '{}',
  -- Narrative memory
  stm_summary          TEXT,
  ltm_summary          TEXT,
  -- Consent and retention
  consent_given        BOOLEAN DEFAULT false,
  data_retention_until TIMESTAMP,
  updated_at           TIMESTAMP DEFAULT NOW()
);

-- One row per summarized conversation; the embedding column enables semantic recall
-- (the VECTOR type requires the pgvector extension)
CREATE TABLE conversation_summaries (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id           UUID REFERENCES user_profiles(user_id),
  tenant_id         UUID REFERENCES tenants(id),
  conversation_date DATE NOT NULL,
  summary           TEXT NOT NULL,
  key_topics        TEXT[],
  sentiment         TEXT,
  action_items      JSONB DEFAULT '[]',
  embedding         VECTOR(768),
  created_at        TIMESTAMP DEFAULT NOW()
);

5.4.2 Hard Fields Reference (Medical)

safety_fields (medical)

  • drug_allergies[] (CRITICAL)
  • pregnancy_status (CRITICAL)
  • g6pd_status (CRITICAL)
  • anticoagulant_use (CRITICAL)
  • ckd_status, ckd_stage (CRITICAL)
  • history_anaphylaxis (CRITICAL)
  • chronic_conditions[]
  • psychiatric_history, cardiac_history_major

preferences (medical)

  • service_preference: home_visit / clinic / telemedicine
  • budget_band: low / mid / high / luxury
  • needle_anxiety
  • communication_style
  • preferred_payment_method

usage_history (medical)

  • iv_sessions_90d (computed)
  • last_iv_protocol
  • last_iv_reaction
  • pharmacy_orders_30d (computed)
  • lab_tests_90d (computed)

operational_flags (medical)

  • vip_tier: standard / silver / gold / platinum
  • no_show_flag, no_show_count_90d
  • risk_flags_computed[]
  • consent_medical_data

5.4.3 Domain Template Pattern (Cross-Industry)

Every domain follows the same four buckets; only the field definitions differ.

  • MEDICAL: safety_fields, preferences, usage_history, operational_flags
  • AUTOMOTIVE: safety_fields, preferences, usage_history, operational_flags
  • RETAIL / COMMERCE: safety_fields, preferences, usage_history, operational_flags

5.4.4 LTM Pipeline (Hybrid: Live + Scheduled)

Mode 1 — Live Update (Real-Time Capture)

  • Triggered while the conversation is in progress (critical safety info, an explicit preference, or STM approaching its limit).
  • Event-driven and cost-aware (single extraction call, additive merge, no silent overwrite).
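One way to read "additive merge, no silent overwrite" is sketched below. The conflict-reporting behaviour is an assumption about how such a merge could work, not a description of the production workflow: lists are unioned, missing fields are filled, and a changed scalar is surfaced as a conflict instead of being replaced.

```python
def merge_hard_fields(current, extracted):
    """Additively merge extracted fields into a stored JSONB bucket.

    Returns (merged, conflicts). Existing scalar values are never replaced;
    a differing value is reported in `conflicts` for review instead.
    """
    merged = dict(current)
    conflicts = []
    for key, new in extracted.items():
        old = merged.get(key)
        if old is None:
            merged[key] = new                      # fill a missing field
        elif isinstance(old, list) and isinstance(new, list):
            merged[key] = old + [v for v in new if v not in old]  # union, keep order
        elif old != new:
            conflicts.append({"field": key, "stored": old, "extracted": new})
    return merged, conflicts
```

For critical safety fields such as pregnancy_status, routing the conflict to a human or to a clarifying question is safer than letting a single extraction call flip the stored value.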

Mode 2 — Scheduled Batch Updates (Cron)

DAILY (01:00 WITA)

  • Fetch yesterday's conversations
  • Write a 2-3 sentence summary (Gemini Flash)
  • Extract any hard fields that were missed
  • Generate the embedding
  • Insert into conversation_summaries
  • Update stm_summary in user_profiles
  • Compute usage_history (rolling 30/90 days)
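The rolling 30/90-day counters in usage_history (for example iv_sessions_90d) reduce to a date-window count. A minimal sketch, assuming the event dates have already been loaded for one user and that the window includes today but excludes the cutoff day itself:

```python
from datetime import date, timedelta


def rolling_count(event_dates, today, window_days):
    """Count events that fall in the last `window_days` days, today included."""
    cutoff = today - timedelta(days=window_days)
    return sum(1 for d in event_dates if cutoff < d <= today)
```

For example, `rolling_count(iv_session_dates, today, 90)` would feed iv_sessions_90d, where `iv_session_dates` is a hypothetical list of session dates.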

WEEKLY (Sunday 02:00 WITA)

  • Recalculate operational_flags (vip, loyalty, risk)
  • Apply retention policy (delete/expire)
  • Weekly activity summary (tenant dashboard)

MONTHLY (1st 03:00 WITA)

  • Fetch the summaries from the last 3 months
  • Synthesize ltm_summary (Thinking model, max 200 words)
  • Update ltm_summary
  • Archive old summaries (cold storage)

Mode 3 — Onboarding: Tenant LTM Structure Configuration

JSON
{
  "tenant_id": "tenant_reviv_bali",
  "structure": {
    "safety_fields": {
      "drug_allergies": { "type": "array", "critical": true },
      "pregnancy_status": { "type": "boolean", "critical": true },
      "g6pd_status": { "type": "enum", "values": ["normal", "deficient", "unknown"], "critical": true }
    },
    "preferences": {
      "service_preference": { "type": "enum", "values": ["home_visit", "clinic", "telemedicine"] },
      "budget_band": { "type": "enum", "values": ["low", "mid", "high", "luxury"] }
    },
    "usage_history": {
      "iv_sessions_90d": { "type": "integer", "computed": true },
      "last_iv_protocol": { "type": "string" }
    },
    "operational_flags": {
      "vip_tier": { "type": "enum", "values": ["standard", "silver", "gold", "platinum"] }
    }
  },
  "version": "1.0"
}
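A tenant structure like the one above can be used to validate extracted values before they are written to a profile. The sketch below checks one bucket at a time; the error-message format is invented for illustration, and the semantics of the "critical" and "computed" flags are left out because the configuration alone does not define them.

```python
TYPE_CHECKS = {"array": list, "boolean": bool, "integer": int, "string": str}


def validate_bucket(bucket_spec, values):
    """Return a list of error strings for one bucket (e.g. safety_fields)."""
    errors = []
    for name, value in values.items():
        spec = bucket_spec.get(name)
        if spec is None:
            errors.append(f"{name}: not in tenant structure")
        elif spec["type"] == "enum":
            if value not in spec["values"]:
                errors.append(f"{name}: {value!r} is not an allowed value")
        else:
            expected = TYPE_CHECKS[spec["type"]]
            # bool is a subclass of int in Python, so reject it for integers.
            bool_as_int = spec["type"] == "integer" and isinstance(value, bool)
            if not isinstance(value, expected) or bool_as_int:
                errors.append(f"{name}: expected {spec['type']}")
    return errors
```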

5.4.5 LTM Injection at Conversation Start

L1 Orchestrator:

  1. Pull hard fields from user_profiles.
  2. Pull stm_summary.
  3. Run a semantic search over conversation_summaries (top 3).
  4. Pull ltm_summary if available.
  5. Compose the context string.
  6. Inject it into the L1 system prompt.

Target injected context: ~500-1500 tokens (vs 5000+ when the full history is injected).
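Steps 1 to 5 can be sketched as a single composition function with a token budget. The section labels, the 4-characters-per-token estimate, and the rule that hard fields are always kept while optional sections are skipped when they do not fit are all assumptions for this example.

```python
def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return len(text) // 4 + 1


def compose_ltm_context(hard_fields, stm_summary, recalled, ltm_summary,
                        max_tokens=1500):
    """Build the context string that gets injected into the L1 system prompt."""
    hard = "; ".join(f"{k}={v}" for k, v in sorted(hard_fields.items()))
    lines = [f"[HARD FIELDS] {hard}"]  # always included, never trimmed
    used = estimate_tokens(lines[0])
    optional = [
        ("STM", stm_summary),
        ("RECALL", " | ".join(recalled[:3])),  # top 3 semantic matches
        ("LTM", ltm_summary),
    ]
    for title, body in optional:
        if not body:
            continue
        line = f"[{title}] {body}"
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            continue  # skip sections that do not fit the budget
        lines.append(line)
        used += cost
    return "\n".join(lines)
```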

5.4.6 Why This Matters

  • Token efficiency: saves 70-85% of tokens for most conversations.
  • Better personalization: new critical info is used immediately.
  • Tenant control: the tracked fields are configurable.
  • Compliance ready: consent + retention baked in.

5.5 Messaging Pipeline Detail

The pipeline ensures reliability and multimodal support.

Flow: Webhook -> Idempotency Guard -> Conflict Detector -> Smart Buffer -> Tenant Loader -> L1

Main components:

  • Idempotency Guard: prevents duplicate processing.
  • Admin Conflict Detector: prevents double replies from admin and AI.
  • Smart Buffer: merges text + image + audio that arrive within a burst window.
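The two stateful stages can be sketched in memory as follows. The key format, the 5-second burst window, and the in-memory storage are assumptions; in the real pipeline these would presumably live in the idempotency-key and message-buffer tables, with a unique constraint guarding against races.

```python
import hashlib


class IdempotencyGuard:
    """Reject a message that has already been seen for the same tenant and channel."""

    def __init__(self):
        self._seen = set()  # stand-in for an idempotency-keys table

    def first_time(self, tenant_id, channel, message_id):
        raw = f"{tenant_id}:{channel}:{message_id}".encode()
        key = hashlib.sha256(raw).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


class SmartBuffer:
    """Collect message parts per user and release them once the burst goes quiet."""

    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self._parts = {}  # user_id -> list of (timestamp, part)

    def add(self, user_id, part, now):
        self._parts.setdefault(user_id, []).append((now, part))

    def flush_if_quiet(self, user_id, now):
        parts = self._parts.get(user_id)
        if not parts or now - parts[-1][0] < self.window_s:
            return None  # nothing buffered, or the user is still sending
        del self._parts[user_id]
        return [part for _, part in parts]
```

Passing `now` in explicitly keeps both classes deterministic, which makes the burst-window behaviour easy to cover in the synthetic QA tests.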

5.6 Data Layer

Supabase serves as the core backbone.

It stores:

  • Tenant configuration
  • User profiles & LTM
  • Execution logs
  • Monitoring metrics
  • Wizard sessions
  • Idempotency keys
  • Message buffers

All tables are tenant-scoped with RLS enabled.

5.7 Monitoring & Alert System

Monitoring is the backbone of operational excellence.

Granularity

Track per workflow, per node, and per tenant.

Proactivity

Detect issues before users complain.

Actionability

Alerts must trigger action, not become noise.

5.7.1 Tracked Data Points

Level 1 — Per Workflow Execution

Metric | Description | Threshold / Alert
execution_id | Unique ID for end-to-end tracing | -
tenant_id | Tenant that triggered the workflow | -
user_id | User in the interaction | -
workflow_name | Workflow / domain agent | -
total_duration_ms | Webhook received to response sent | >15s WARNING, >30s CRITICAL
overall_status | success / partial_success / failed | failed = CRITICAL
total_cost_usd | Accumulated LLM + API cost | >$0.50 WARNING
total_tokens | Total tokens across all LLM calls | >50K INVESTIGATE
channel | whatsapp / web / sms | -
response_sent | Response sent successfully | false = CRITICAL
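The Level 1 thresholds translate directly into a classification function. This is a minimal sketch that assumes one execution arrives as a dict keyed by the metric names in the table; the return format (a list of metric/severity pairs) is an illustrative choice.

```python
def classify_execution(m):
    """Return (metric, severity) pairs for one workflow execution record."""
    alerts = []
    if m["overall_status"] == "failed":
        alerts.append(("overall_status", "CRITICAL"))
    if not m["response_sent"]:
        alerts.append(("response_sent", "CRITICAL"))
    if m["total_duration_ms"] > 30_000:
        alerts.append(("total_duration_ms", "CRITICAL"))
    elif m["total_duration_ms"] > 15_000:
        alerts.append(("total_duration_ms", "WARNING"))
    if m["total_cost_usd"] > 0.50:
        alerts.append(("total_cost_usd", "WARNING"))
    if m["total_tokens"] > 50_000:
        alerts.append(("total_tokens", "INVESTIGATE"))
    return alerts
```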

Level 2 — Per Node Process

Metric | Description | Threshold / Alert
node_id | Node name in the workflow | -
node_type | llm_call / db_query / tool_execution / api_call | -
duration_ms | Node execution time | LLM >8s, DB >2s, Tool >5s = WARNING
cost_usd | Cost per node | -
status | success / retry / failed | failed = INVESTIGATE
retry_count | Number of retries | >2 = INVESTIGATE
error_message | Error detail | -
output_size_bytes | Output size | >500KB = INVESTIGATE

Level 3 — Tenant Aggregation

Metric | Description | Alert | Purpose
conversations_today | Total conversations today | >quota = soft notification | Cost control
conversations_mtd | Total month-to-date | approaching quota = upsell signal | Revenue
total_cost_mtd | Total cost this month | >expected = investigate | Margin
avg_response_time | P50/P95/P99 | P95 >10s = degradation | SLA
closing_rate_mtd | Conversations -> closings | <15% = quality issue | Quality
error_rate_1h | % failed executions (1h) | >1% WARN, >5% CRIT | Reliability

5.7.2 Alert System Architecture

The alert system evaluates metrics periodically, deduplicates alerts, then routes them to notification channels according to severity.

5.7.3 Severity Levels & Routing

Severity | Condition (Examples) | Notification Channel | Response SLA
CRITICAL | Error rate >5% (1h), DB down, response_sent=false (10 min) | WhatsApp Ops + Email + PagerDuty/Ticket | <5 minutes
WARNING | Error rate >1%, P95 >10s, cost spike >150%, quota approaching | Email Ops + Dashboard | <30 minutes
INFO | Deployment done, backup success, daily summary | Dashboard only | -
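Deduplication and severity routing can be sketched together. The channel identifiers and the 15-minute dedupe window are assumptions for illustration; only the severity-to-channel mapping itself comes from the routing table.

```python
# Channel identifiers are hypothetical names for the channels in the table.
ROUTES = {
    "CRITICAL": ["whatsapp_ops", "email", "ticket"],
    "WARNING": ["email_ops", "dashboard"],
    "INFO": ["dashboard"],
}


class AlertRouter:
    """Suppress repeats of the same alert inside a window, then route by severity."""

    def __init__(self, dedupe_window_s=900):
        self.dedupe_window_s = dedupe_window_s
        self._last_sent = {}  # (alert_key, severity) -> timestamp of last send

    def route(self, alert_key, severity, now):
        last = self._last_sent.get((alert_key, severity))
        if last is not None and now - last < self.dedupe_window_s:
            return []  # duplicate inside the window: send nothing
        self._last_sent[(alert_key, severity)] = now
        return ROUTES[severity]
```

Keying the dedupe state on both the alert and its severity means an escalation from WARNING to CRITICAL is never suppressed by an earlier warning.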

5.7.4 Alert Message Template (WhatsApp Ops Group)

TEXT
🚨 CRITICAL ALERT

Platform: AI CSI Production
Severity: CRITICAL
Time: 2026-02-14 15:32:18 WITA

Issue: Error rate 8.3% (threshold: 5%)
Affected: All tenants
Duration: Last 15 minutes

Details:
- Failed executions: 47 / 567
- Primary error: "Gemini API timeout"
- Affected workflow: main_cs_agent

Action Required: Investigate Gemini API status

Dashboard: https://monitor.csi.ai/alerts/ALT-20260214-001
Runbook: https://docs.csi.ai/runbooks/gemini-timeout

On-Call: @MasIsan @AGIA

5.7.5 Integrations (Ticketing & Ops Workflow)

Supported integrations:

  • Linear (API)
  • Jira (Webhook)
  • Asana (API)
  • Slack (Webhook)
  • WhatsApp (Waha API)
  • Email (SMTP)

5.7.6 SLA Monitoring & Enforcement

Metric | Target | Measurement | Penalty / Action
Uptime | 99.5% | Monthly | Credit 10% MRR
Response Time (P95) | <10 seconds | Per conversation | Internal monitoring
Error Rate | <0.5% | Per day | Investigation required
Critical Alert Response | <5 minutes | Per incident | Escalation if breach
Ticket Resolution (P0) | <4 hours | Per ticket | Post-mortem required
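To make the uptime target tangible, the monthly downtime budget can be computed directly. This sketch assumes that all downtime counts against the SLA (the table does not say whether planned maintenance is excluded); in a 30-day month, 99.5% leaves 216 minutes.

```python
def downtime_budget_minutes(days_in_month, uptime_target_pct):
    """Minutes of downtime allowed in the month before the target is missed."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_target_pct) / 100


def sla_breached(days_in_month, downtime_minutes, uptime_target_pct=99.5):
    """True when measured downtime exceeds the monthly budget."""
    return downtime_minutes > downtime_budget_minutes(days_in_month, uptime_target_pct)
```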

5.8 QA Automation

Scheduled automated QA maintains stability without daily manual testing.

QA focus:

  • Technical stability (latency, errors, tool failures)
  • Persona and tone consistency
  • Business logic accuracy
  • Edge-case resilience
  • LTM recall quality

5.8.1 Daily Synthetic Test (N8N Cron — 02:00 WITA)

For each active tenant -> for each configured agent:

  1. Load test scenarios (test_scenarios)
  2. Send simulated message
  3. Capture response
  4. Validate behavior
  5. Log result

If a test fails:

  • Mark the agent as degraded
  • Send an alert to ops
  • Auto-create an incident ticket
  • Notify the tenant admin if the failure persists for more than 24 hours
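The per-agent loop can be sketched as below. The scenario shape (`id`, `message`, `must_contain`) and the keyword-based validation are assumptions; real validation might also check tool calls, tone, or structured output. `agent_send` is a hypothetical callable that sends one simulated message and returns the reply text.

```python
import time


def run_scenarios(agent_send, scenarios):
    """Run each scenario once and return (results, degraded)."""
    results = []
    for s in scenarios:
        started = time.perf_counter()
        reply = agent_send(s["message"])
        elapsed_ms = int((time.perf_counter() - started) * 1000)
        # Validation here is a simple case-insensitive keyword check.
        missing = [kw for kw in s["must_contain"] if kw.lower() not in reply.lower()]
        results.append({
            "scenario_id": s["id"],
            "response_time_ms": elapsed_ms,
            "validation_passed": not missing,
            "failure_reason": f"missing keywords: {missing}" if missing else None,
        })
    degraded = any(not r["validation_passed"] for r in results)
    return results, degraded
```

The result keys mirror the qa_test_results fields, so each dict can be logged after adding tenant_id and agent_name.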

5.8.2 QA Metrics & Logging

Main fields in qa_test_results:

  • tenant_id
  • agent_name
  • scenario_id
  • response_time_ms
  • validation_passed
  • failure_reason
  • created_at

5.8.3 QA Dashboard Example

Daily report:

  • Pass rate per agent
  • Response time trend
  • List of degraded agents
  • Prompt tuning recommendations

5.8.4 Weekly Deep Testing

Includes:

  • Load testing (100 concurrent conversations / tenant)
  • Edge-case messages
  • Integration testing (booking -> payment -> confirmation)
  • LTM recall test
  • Tool chaining validation

5.9 Prompt Structure Hierarchy (L1 / L2 / L3)

5.9.1 Design Philosophy — Layered Responsibility

Core principles:

  • L1 must not know domain-specific details
  • L2 must not duplicate orchestration logic
  • L3 must not override L1 safety rules
  • L1 always has final control over the response

5.9.2 L1 — Main Orchestrator Prompt

Purpose: routing, context management, safety enforcement, tone consistency.

L1 owns:

  • Conversation flow
  • Tool invocation
  • Personalization
  • Formatting
  • Human escalation

5.9.3 L2 — Sub Agent-Level Prompts

Purpose: execute specific tasks with precision (HOW).

L2 returns structured output (JSON). L1 converts this output into a conversational response.

5.9.4 L3 — Hyper-Specialist Prompts (Optional)

Used only for complex cases, when called explicitly by L2.

5.9.5 Hierarchy Decision Flow

User Message -> L1 -> L2 (if needed) -> L3 (if escalated) -> back to L2 -> back to L1 -> send response.

5.10 Human-AI Collaboration Architecture

The Handoff Problem

AI will not handle 100% of cases perfectly. Handoff to a human must be seamless, with no loss of context.

Design principles:

  • Seamless transition (no reset)
  • Full context preservation
  • Clear ownership (AI vs Human)
  • Reversible control

Toggle Architecture — AI ON/OFF Per User

State logic:

  • AI Active -> the AI responds automatically
  • Admin Mode -> the AI stops and a human takes over
  • Auto-Paused -> the system detects an active admin

SQL
CREATE TABLE user_ai_mode (
  user_id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  ai_enabled BOOLEAN DEFAULT true,
  disabled_by_admin_id UUID,
  disabled_at TIMESTAMP,
  reason TEXT,
  auto_resume_at TIMESTAMP
);

CREATE INDEX idx_ai_mode_lookup ON user_ai_mode(user_id, tenant_id);

AI Gate Logic

Before L1 is executed:

  • Query user_ai_mode
  • If ai_enabled=false -> log the message, skip the AI, and let a human handle it
  • If ai_enabled=true -> continue to the agent
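The gate can be sketched as a pure function over one user_ai_mode row. Two behaviours here are assumptions: a user with no row is treated as AI-enabled (matching the column default), and a pause is treated as expired once `auto_resume_at` has passed. Logging the inbound message is left to the caller, since it happens on both paths.

```python
from datetime import datetime


def ai_gate(mode_row, now):
    """Return "run_ai" or "human_only" for one inbound message.

    `mode_row` is a user_ai_mode row as a dict, or None when no row exists.
    """
    if mode_row is None or mode_row["ai_enabled"]:
        return "run_ai"
    resume_at = mode_row.get("auto_resume_at")
    if resume_at is not None and now >= resume_at:
        # Pause expired; the caller should also set ai_enabled back to true.
        return "run_ai"
    return "human_only"
```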

Context Preservation — Memory Continuity

All messages are recorded, whether they come from the AI or from a human.

SQL
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  direction TEXT NOT NULL,
  source TEXT NOT NULL,
  message_text TEXT NOT NULL,
  handled_by TEXT,
  ai_paused_auto BOOLEAN DEFAULT false,
  admin_id UUID,
  trace_id UUID,
  cost_usd DECIMAL(10, 6),
  tokens_used INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);

Operational guarantees:

  • No message loss
  • No double reply
  • No context reset after resume
  • Reversible control anytime
  • Clear audit trail

5.11 RAG & Knowledge Retrieval Architecture

5.11.1 Problem Statement

Problems with the initial implementation:

  • Chunks too small (~1,000 chars)
  • Low topK (4)
  • Clinical reasoning chains were fragmented
  • Main agent too monolithic (42K+ word prompt)

Impact:

  • Low continuity
  • Unstable reasoning depth

5.11.2 Phase 1 — Improved Embedding RAG

Configuration upgrade:

  • Chunk size ~8,000 chars (~1,024 tokens)
  • Overlap 200-500 chars
  • topK for Reviv = 15
  • topK for other agents = 4

Outcome:

  • Better semantic grouping
  • Reduced reasoning break
  • Tactical improvement without an overhaul
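The Phase 1 configuration corresponds to fixed-size character chunking with overlap. A minimal sketch follows; splitting on raw character offsets is an assumption, and a production splitter would more likely cut on paragraph or sentence boundaries.

```python
def chunk_text(text, chunk_size=8000, overlap=300):
    """Split text into chunks of at most `chunk_size` chars that share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap means the tail of one chunk reappears at the head of the next, which is what keeps a reasoning chain from being cut cleanly in two.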

5.11.3 Phase 2 — Deterministic Full-Document Loading

Flow:

  1. The orchestrator extracts keywords
  2. Keywords are matched against a JSON tag mapping
  3. Load the full KB file (not chunks)
  4. Inject the 6-10 most relevant documents into the reasoning agent
  5. Embedding RAG remains as the fallback
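The flow above can be sketched as a deterministic lookup with an embedding fallback. The tag map shape (tag -> list of KB file names), the ranking by number of matching tags, and the file names in the usage note are assumptions; `embedding_fallback` is a hypothetical callable standing in for the Phase 1 retriever.

```python
def select_documents(keywords, tag_map, max_docs=10):
    """Rank KB files by how many query keywords match their tags."""
    wanted = {k.lower() for k in keywords}
    hits = {}
    for tag, files in tag_map.items():
        if tag.lower() in wanted:
            for name in files:
                hits[name] = hits.get(name, 0) + 1
    # Most matches first; ties broken by file name so the result is deterministic.
    ranked = sorted(hits, key=lambda name: (-hits[name], name))
    return ranked[:max_docs]


def retrieve(keywords, tag_map, embedding_fallback):
    """Prefer full-document loading; fall back to embedding RAG on no match."""
    docs = select_documents(keywords, tag_map)
    if docs:
        return ("full_doc", docs)
    return ("embedding", embedding_fallback(keywords))
```

With a map such as `{"htn": ["hypertension.md"], "cardiovascular": ["hypertension.md", "cardio_risk.md"]}`, the keywords `["HTN", "cardiovascular"]` select both files, with the file that matches both tags ranked first.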

5.11.4 Tag-Based Retrieval Structure

Example tags in a KB file:

  • htn
  • cardiovascular
  • hypertension workup
  • lab panel
  • chronic disease

5.11.5 Multi-Agent Compatibility

RAG can be called by the orchestrator, domain agents, or sub-specialists.

5.11.6 Trade-Off Analysis

Approach | Continuity | Determinism | Flexibility | Risk
Small chunk + topK 4 | Low | Low | High | High miss
8K chunk + topK 15 | Medium-High | Medium | High | Acceptable
Full-doc + tag match | Very High | High | Medium | Low miss

Recommendation:

  • Phase 1: Re-embed + overlap tuning (immediate)
  • Phase 2: Keyword -> full doc loading (strategic)

5.11.7 Monitoring & Token Consideration

Required:

  • Per-agent token tracking
  • Historical usage dashboard
  • Cost per execution logging
  • Model routing optimization

5.11.8 Final Positioning

RAG platform = hybrid retrieval system:

  • Deterministic tag matching (primary)
  • Full document loading (primary reasoning context)
  • Embedding semantic recall (fallback)
  • Multi-agent orchestrated reasoning

With this approach, the platform is ready for medical-grade reasoning, highly regulated domains, and scale without loss of quality.