Focused Documentation
Section 5 - Technical Architecture
This page focuses exclusively on CSI's technical architecture, so its structure is cleaner, easier to scan, and convenient for engineering discussions.
PART 5 — TECHNICAL ARCHITECTURE
5.1 Architecture Philosophy & Principles
The following design principles are non-negotiable and form the foundation of the entire platform.
No Silent Failure
Every error must be captured, carry a trace ID, and reach monitoring. A user must never receive an error that leaves no trace in the system.
Idempotency First
The same message must never be processed twice. Conflicts between AI and admin replies must be detected automatically.
Domain-Agnostic Core
The platform core contains no hardcoded per-industry logic. Domain-specific logic lives in the domain layer.
Composable Over Monolithic
There is no single super-agent. Composable sub-agents and tools are more maintainable, testable, and scalable.
LTM by Default
Every interaction enriches the user profile. Memory is a fundamental feature, not an add-on.
Cost-Conscious Design
Every LLM call must be justified, with model routing between Flash and Pro/Thinking based on complexity.
Tenant Isolation
Each tenant's data, configuration, and context must be fully isolated, enforced through RLS.
5.2 Tech Stack
| Layer | Technology | Rationale |
|---|---|---|
| Orchestration | N8N (self-hosted) | Visual, modular, extensible |
| LLM Primary (Conversational) | Gemini 2.5 Flash / Gemini 3 Flash | Cost-efficient, multimodal |
| LLM Secondary (Reasoning) | Gemini Pro / Thinking (TBD per use-case) | Complex reasoning & doc generation |
| Database | Supabase PostgreSQL | Unified DB, RLS built-in |
| Vector Store / RAG | Supabase Vector (pgvector) | No separate vector DB needed |
| Frontend | Next.js | SSR, API routes, TypeScript |
| Channel (Primary) | Waha (WhatsApp) | Production-ready |
| Channel (Roadmap) | WhatsApp Official, Web Widget, TikTok, Meta API | Q3-Q4 2026 expansion |
| Monitoring | N8N logs + custom dashboard | Phase 1 monitoring |
5.3 Agent Architecture
The agent architecture uses three main agent layers (L1 to L3), fronted by a channel layer (L0) and backed by a shared data and memory layer.
Final Architecture: Multi-Layer Agentic System
L0 - Channel Layer
Omnichannel intake and normalization.
- WhatsApp (Waha): Primary production channel
- Web Widget: Website embedded entry
- TikTok / Meta: Roadmap channels
- Webhook Gateway: Unified inbound handler
L1 - Orchestrator
Main reasoning, policy guard, and final response owner.
- Main CS Agent: Intent, tool routing, response generation
- Moderation Agent: Safety, abuse, crisis and emergency checks
- Fallback Agent: Soft fallback and resilient handoff
L2 - Utility + Domain
Reusable sub-agents and domain-specific execution logic.
- Utility Sub-Agents: Time, location, transport, pricing, docs
- Domain Sub-Agents: Medical, automotive, F&B, commerce logic
L3 - Specialist (Optional)
Deep specialization for high-complexity cases.
- Medical Specialist: Sub-specialist escalation (e.g. oncology)
- Business Specialist: High-value financing and niche workflows
Data + Memory Layer
Persistence, retrieval, observability, and tenant isolation.
- Supabase PostgreSQL: Tenant-scoped operational store with RLS
- Vector + LTM: Semantic recall, summaries, and profile memory
5.3.1 L1 — Orchestrator
Main functions:
- Inject LTM context
- Detect intent
- Call sub-agent / tools
- Generate final response
Components:
- Main CS Agent
- Moderation Agent
- Fallback Agent
L1 is responsible for the main reasoning and for coordination between agents.
5.3.2 L2 — Utility & Domain-Specific Layer
This layer contains reusable sub-agents and domain logic.
5.3.2.1 Utility Sub-Agent
Utility Sub-Agents are a set of modular sub-agents that are reusable across industries. Each sub-agent is:
- Single responsibility
- Stateless (holds no internal state)
- Structured input/output (JSON schema)
- Fully composable by the L1 Orchestrator
A. Time Management
- TIME_PARSER
- TIME_SLOT_CHECKER
- TIME_CONFLICT_DETECTOR
- TIME_WINDOW_GENERATOR
- TIME_FORMATTER
B. Address & Location
- ADDRESS_PARSER
- ADDRESS_GEOCODER
- ADDRESS_VALIDATOR
- ADDRESS_DISAMBIGUATOR
- MAPS_LINK_GENERATOR
C. Transport & Logistics
- DISTANCE_CALCULATOR
- ETA_CALCULATOR
- ZONE_CLASSIFIER
- ROUTE_OPTIMIZER
- TRAFFIC_ADVISOR
D. Payment & Pricing
- PRICE_CALCULATOR
- PROMO_CODE_VALIDATOR
- PAYMENT_LINK_GENERATOR
- PAYMENT_STATUS_CHECKER
- INVOICE_GENERATOR
E. Document & Content
- DOCS_GENERATOR
- RESPONSE_FORMATTER
- LANGUAGE_MANAGER
A. Time Management Sub-Agent
- TIME_PARSER
  - Purpose: Convert natural-language time expressions into a structured datetime.
  - Input: Free text ("besok jam 3", "next Monday 4pm", "in 2 hours")
  - Output: { datetime_utc, datetime_local, timezone, is_ambiguous }
  - Config: default_timezone per tenant
- TIME_SLOT_CHECKER
  - Purpose: Validate slot availability against operating hours and resources.
  - Input: Parsed datetime + resource_id
  - Output: { available, reason, next_available, alternatives[] }
  - Config: operating_hours, holidays, blocked_dates
- TIME_CONFLICT_DETECTOR
  - Purpose: Detect scheduling conflicts.
  - Input: datetime + duration + resource_id
  - Output: { conflict, conflicting_booking, suggestions[] }
- TIME_WINDOW_GENERATOR
  - Purpose: Generate the list of slots within a date range.
  - Input: date_range + duration
  - Output: available_slots[]
  - Config: slot_duration, buffer_between_slots
- TIME_FORMATTER
  - Purpose: Format a datetime as natural text in the target language.
  - Input: datetime + language + format_type
  - Output: "Besok jam 15.00" / "Tomorrow at 3pm"
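To make the TIME_CONFLICT_DETECTOR contract concrete, here is a minimal sketch of a stateless conflict check. The `Booking` type, the 30-minute suggestion step, and the 48-attempt search limit are illustrative assumptions and are not specified by the platform; only the output shape `{ conflict, conflicting_booking, suggestions[] }` comes from the spec above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Booking:
    """Hypothetical booking record; the real schema is not defined in this section."""
    booking_id: str
    resource_id: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def detect_conflict(start, duration, resource_id, existing,
                    step=timedelta(minutes=30), max_suggestions=3):
    """Return { conflict, conflicting_booking, suggestions[] } for a requested slot."""
    same_resource = [b for b in existing if b.resource_id == resource_id]

    def first_overlap(candidate_start):
        candidate_end = candidate_start + duration
        for b in same_resource:
            # Half-open intervals [start, end) overlap when each starts before the other ends.
            if candidate_start < b.end and b.start < candidate_end:
                return b
        return None

    hit = first_overlap(start)
    suggestions = []
    if hit is not None:
        # Walk forward in fixed steps and collect the first free start times.
        candidate = start + step
        for _ in range(48):
            if len(suggestions) >= max_suggestions:
                break
            if first_overlap(candidate) is None:
                suggestions.append(candidate)
            candidate += step
    return {
        "conflict": hit is not None,
        "conflicting_booking": hit.booking_id if hit else None,
        "suggestions": suggestions,
    }
```

The function holds no state between calls: all bookings are passed in, which matches the stateless, structured input/output rule for utility sub-agents.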
B. Address & Location Sub-Agent
- ADDRESS_PARSER
  - Purpose: Extract the address and landmarks from free text.
  - Input: Free text address
  - Output: { address_text, landmarks, confidence }
- ADDRESS_GEOCODER
  - Purpose: Convert an address into GPS coordinates.
  - Input: Parsed address
  - Output: { lat, lng, formatted_address, confidence }
  - Integration: Google Maps API
- ADDRESS_VALIDATOR
  - Purpose: Validate that the address lies within the service area.
  - Input: lat/lng + tenant_id
  - Output: { in_service_area, zone_name, surcharge }
- ADDRESS_DISAMBIGUATOR
  - Purpose: Clarify ambiguous addresses.
  - Output: { candidates[], clarifying_question }
- MAPS_LINK_GENERATOR
  - Purpose: Generate a Google Maps link.
  - Output: { short_link, directions_link }
C. Transport & Logistics Sub-Agent
- DISTANCE_CALCULATOR
  - Purpose: Calculate the actual distance.
  - Output: { distance_km }
- ETA_CALCULATOR
  - Purpose: Estimate arrival time based on traffic.
  - Output: { eta_minutes, arrival_time }
- ZONE_CLASSIFIER
  - Purpose: Classify the zone for pricing and dispatch.
  - Output: { zone_id, zone_tier, surcharge }
- ROUTE_OPTIMIZER
  - Purpose: Optimize the visit order for multi-stop routes.
  - Output: { optimized_route[], total_distance, total_time }
- TRAFFIC_ADVISOR
  - Purpose: Provide real-time traffic condition updates.
  - Output: { traffic_status, delay_minutes }
D. Payment & Pricing Sub-Agent
- PRICE_CALCULATOR
  - Purpose: Calculate the total price with a breakdown.
  - Input: line_items + zone + promo
  - Output: { subtotal, fees[], discounts[], tax, total }
- PROMO_CODE_VALIDATOR
  - Purpose: Validate and apply a promo code.
  - Output: { valid, discount_amount, rejection_reason }
- PAYMENT_LINK_GENERATOR
  - Purpose: Generate a payment link.
  - Output: { payment_url, expires_at }
  - Integration: Xendit / Stripe / etc.
- PAYMENT_STATUS_CHECKER
  - Purpose: Check payment status.
  - Output: { status, paid_at }
- INVOICE_GENERATOR
  - Purpose: Generate an invoice PDF.
  - Output: { invoice_url, invoice_number }
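The PRICE_CALCULATOR output shape can be illustrated with a small sketch. The line-item keys (`unit_price`, `qty`), the order of operations (surcharge added, promo subtracted, then tax on the result), and the rounding mode are assumptions made for illustration; the spec only fixes the output fields `{ subtotal, fees[], discounts[], tax, total }`.

```python
from decimal import Decimal, ROUND_HALF_UP


def calculate_price(line_items, zone_surcharge, promo_discount, tax_rate):
    """Return { subtotal, fees[], discounts[], tax, total } using Decimal arithmetic."""
    subtotal = sum(
        (Decimal(item["unit_price"]) * item["qty"] for item in line_items),
        Decimal("0"),
    )
    fees = []
    if zone_surcharge > 0:
        fees.append({"name": "zone_surcharge", "amount": zone_surcharge})

    # A promo can never discount more than the subtotal (assumed rule).
    discount = min(promo_discount, subtotal)
    discounts = []
    if discount > 0:
        discounts.append({"name": "promo", "amount": discount})

    # Assumption: tax applies after fees and discounts.
    taxable = subtotal + zone_surcharge - discount
    tax = (taxable * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {
        "subtotal": subtotal,
        "fees": fees,
        "discounts": discounts,
        "tax": tax,
        "total": taxable + tax,
    }
```

Using `Decimal` rather than floats avoids rounding drift in money amounts, and returning the full breakdown lets L1 explain the price to the user without recomputing anything.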
E. Document & Content Sub-Agent
- DOCS_GENERATOR
  - Purpose: Generate professional documents from structured data.
  - Output: { document_url }
- RESPONSE_FORMATTER
  - Purpose: Adapt output to the channel (WA/Web).
  - Output: formatted_messages[]
- LANGUAGE_MANAGER
  - Purpose: Maintain tone and language consistency according to the tenant persona.
  - Output: Polished response text
Utility Sub-Agents are domain-agnostic and can be used by every Domain-Specific Sub-Agent without changes to the core structure.
5.3.2.2 Domain-Specific Sub-Agent
Contains industry-specific logic.
Examples:
- Medical booking logic
- Automotive lead qualification
- F&B order handling
The domain layer contains only business rules, not infrastructure logic.
5.3.3 L3 — Sub-Specialist Agent (Optional)
Used for domains with deep sub-specialization.
Examples:
- Pediatric Oncology (Medical)
- High-value Financing (Automotive)
L3 is invoked only when L2 or L1 requires it.
5.4 Long-Term Memory Architecture
LTM (Long-Term Memory) is the backbone of personalization and reliability on the CSI platform. Its design is domain-agnostic: the core schema stays the same across industries, while field details follow a per-domain template (medical is the most complete reference implementation, taken from the CepatSehat Addendum).
5.4.1 Core Data Model (Domain-Agnostic)
- Hard Fields (JSONB): important data that agents must be able to use deterministically.
- Summaries + Embeddings: conversation summaries for semantic recall (token-efficient while staying relevant).
- Hybrid Memory: a combination of structured fields and narrative summaries.
```sql
-- Master table: domain-agnostic
CREATE TABLE user_profiles (
  user_id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  full_name TEXT,
  phone_number TEXT,
  preferred_language TEXT DEFAULT 'id',
  safety_fields JSONB DEFAULT '{}',
  preferences JSONB DEFAULT '{}',
  usage_history JSONB DEFAULT '{}',
  operational_flags JSONB DEFAULT '{}',
  stm_summary TEXT,
  ltm_summary TEXT,
  consent_given BOOLEAN DEFAULT false,
  data_retention_until TIMESTAMP,
  updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE conversation_summaries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES user_profiles(user_id),
  tenant_id UUID REFERENCES tenants(id),
  conversation_date DATE NOT NULL,
  summary TEXT NOT NULL,
  key_topics TEXT[],
  sentiment TEXT,
  action_items JSONB DEFAULT '[]',
  embedding VECTOR(768),
  created_at TIMESTAMP DEFAULT NOW()
);
```
5.4.2 Hard Fields Reference (Medical)
safety_fields (medical)
- drug_allergies[] (CRITICAL)
- pregnancy_status (CRITICAL)
- g6pd_status (CRITICAL)
- anticoagulant_use (CRITICAL)
- ckd_status, ckd_stage (CRITICAL)
- history_anaphylaxis (CRITICAL)
- chronic_conditions[]
- psychiatric_history, cardiac_history_major
preferences (medical)
- service_preference: home_visit / clinic / telemedicine
- budget_band: low / mid / high / luxury
- needle_anxiety
- communication_style
- preferred_payment_method
usage_history (medical)
- iv_sessions_90d (computed)
- last_iv_protocol
- last_iv_reaction
- pharmacy_orders_30d (computed)
- lab_tests_90d (computed)
operational_flags (medical)
- vip_tier: standard / silver / gold / platinum
- no_show_flag, no_show_count_90d
- risk_flags_computed[]
- consent_medical_data
5.4.3 Domain Template Pattern (Cross-Industry)
Every domain follows the same four buckets; only the field definitions differ.
- MEDICAL: safety_fields, preferences, usage_history, operational_flags
- AUTOMOTIVE: safety_fields, preferences, usage_history, operational_flags
- RETAIL / COMMERCE: safety_fields, preferences, usage_history, operational_flags
5.4.4 LTM Pipeline (Hybrid: Live + Scheduled)
Mode 1 — Live Update (Real-Time Capture)
- Triggered while the conversation is in progress (critical safety info, an explicit preference, or STM approaching its limit).
- Event-driven, cost-aware (single extraction call, additive merge, no silent overwrite).
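The "additive merge, no silent overwrite" rule for live updates can be sketched as follows. This is one possible interpretation, not the platform's implementation: the function name `additive_merge` and the conflict record shape are hypothetical. New values fill empty fields, lists are extended without duplicates, and any disagreement with a stored scalar is reported instead of overwritten.

```python
import copy


def additive_merge(profile, extracted):
    """Merge extracted hard fields into a profile without silently overwriting.

    Returns (merged_profile, conflicts). The input profile is not mutated.
    """
    merged = copy.deepcopy(profile)
    conflicts = []
    for bucket, fields in extracted.items():
        target = merged.setdefault(bucket, {})
        for key, new_value in fields.items():
            old_value = target.get(key)
            if old_value is None:
                # Field was empty: safe to fill.
                target[key] = new_value
            elif isinstance(old_value, list) and isinstance(new_value, list):
                # Lists grow additively, keeping existing order and skipping duplicates.
                target[key] = old_value + [v for v in new_value if v not in old_value]
            elif old_value != new_value:
                # Disagreement: keep the stored value and surface the conflict for review.
                conflicts.append({
                    "bucket": bucket,
                    "field": key,
                    "stored": old_value,
                    "extracted": new_value,
                })
    return merged, conflicts
```

Surfacing conflicts rather than resolving them automatically fits the "No Silent Failure" principle, which matters most for critical safety fields such as `pregnancy_status`.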
Mode 2 — Scheduled Batch Updates (Cron)
DAILY (01:00 WITA)
- Fetch yesterday's conversations
- Write a 2-3 sentence summary (Gemini Flash)
- Extract any hard fields that were missed
- Generate the embedding
- Insert into conversation_summaries
- Update stm_summary in user_profiles
- Compute usage_history (rolling 30/90 days)
WEEKLY (Sunday 02:00 WITA)
- Recalculate operational_flags (vip, loyalty, risk)
- Apply retention policy (delete/expire)
- Weekly activity summary (tenant dashboard)
MONTHLY (1st 03:00 WITA)
- Fetch summaries from the last 3 months
- Synthesize ltm_summary (Thinking model, max 200 words)
- Update ltm_summary
- Archive old summaries (cold storage)
Mode 3 — Onboarding: Tenant LTM Structure Configuration
```json
{
  "tenant_id": "tenant_reviv_bali",
  "structure": {
    "safety_fields": {
      "drug_allergies": { "type": "array", "critical": true },
      "pregnancy_status": { "type": "boolean", "critical": true },
      "g6pd_status": { "type": "enum", "values": ["normal", "deficient", "unknown"], "critical": true }
    },
    "preferences": {
      "service_preference": { "type": "enum", "values": ["home_visit", "clinic", "telemedicine"] },
      "budget_band": { "type": "enum", "values": ["low", "mid", "high", "luxury"] }
    },
    "usage_history": {
      "iv_sessions_90d": { "type": "integer", "computed": true },
      "last_iv_protocol": { "type": "string" }
    },
    "operational_flags": {
      "vip_tier": { "type": "enum", "values": ["standard", "silver", "gold", "platinum"] }
    }
  },
  "version": "1.0"
}
```
5.4.5 LTM Injection at Conversation Start
The L1 Orchestrator:
- Pulls hard fields from user_profiles.
- Pulls stm_summary.
- Runs a semantic search over conversation_summaries (top 3).
- Pulls ltm_summary when available.
- Composes a context string.
- Injects it into the L1 system prompt.
Target injected context: ~500-1500 tokens (vs 5000+ with full history).
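The compose step can be sketched as a pure function over data that has already been fetched. Everything here is illustrative: the function name, the section labels, and the character budget (6000 characters as a rough stand-in for ~1500 tokens, assuming about 4 characters per token) are assumptions, not platform constants. Safety fields go first so that they survive if the budget runs out.

```python
import json


def compose_ltm_context(hard_fields, stm_summary, recalled_summaries,
                        ltm_summary=None, max_chars=6000):
    """Build the context string injected into the L1 system prompt."""
    # Order sections by importance: whatever comes last is dropped first.
    sections = [("SAFETY", json.dumps(hard_fields.get("safety_fields", {}), sort_keys=True))]
    if hard_fields.get("preferences"):
        sections.append(("PREFERENCES", json.dumps(hard_fields["preferences"], sort_keys=True)))
    if ltm_summary:
        sections.append(("LONG-TERM SUMMARY", ltm_summary))
    if stm_summary:
        sections.append(("RECENT SUMMARY", stm_summary))
    # Only the top 3 semantic-search hits are injected.
    for rank, summary in enumerate(recalled_summaries[:3], start=1):
        sections.append((f"RECALL {rank}", summary))

    lines, used = [], 0
    for label, body in sections:
        line = f"[{label}] {body}"
        if used + len(line) + 1 > max_chars:
            break  # Budget exhausted: stop before exceeding the cap.
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)
```

A production version would likely count tokens with the model's tokenizer instead of characters; the character cap is only a cheap approximation.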
5.4.6 Why This Matters
- Token efficiency: saves 70-85% of tokens in most conversations.
- Better personalization: newly captured critical info is used immediately.
- Tenant control: the tracked fields are configurable.
- Compliance ready: consent + retention baked-in.
5.5 Messaging Pipeline Detail
The pipeline ensures reliability and multimodal support.
Flow: Webhook -> Idempotency Guard -> Conflict Detector -> Smart Buffer -> Tenant Loader -> L1
Main components:
- Idempotency Guard: prevents duplicate processing.
- Admin Conflict Detector: prevents double replies from admin and AI.
- Smart Buffer: merges text + image + audio within a burst window.
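As an illustration of the Idempotency Guard, the sketch below uses an in-memory dictionary with a TTL. This is a stand-in only: the class name, the `(tenant_id, message_id)` key, and the 24-hour TTL are assumptions, and a production guard would more plausibly rely on a uniquely constrained idempotency-key table in the database so that it works across workers.

```python
import time


class IdempotencyGuard:
    """In-memory sketch: remembers message keys for a limited time."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self._seen = {}          # (tenant_id, message_id) -> time first seen
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the guard is easy to test

    def should_process(self, tenant_id, message_id):
        """Return True the first time a key is seen within the TTL, else False."""
        now = self._clock()
        # Drop expired keys so the map does not grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl}
        key = (tenant_id, message_id)
        if key in self._seen:
            return False  # Duplicate webhook delivery: skip processing.
        self._seen[key] = now
        return True
```

Scoping the key by tenant keeps the guard consistent with tenant isolation: two tenants can legitimately receive messages with the same provider message ID.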
5.6 Data Layer
Supabase serves as the core backbone.
It stores:
- Tenant configuration
- User profiles & LTM
- Execution logs
- Monitoring metrics
- Wizard sessions
- Idempotency keys
- Message buffers
All tables are tenant-scoped with RLS enabled.
5.7 Monitoring & Alert System
Monitoring is the backbone of operational excellence.
Granularity
Track per workflow, per node, and per tenant.
Proactivity
Detect issues before users complain.
Actionability
Alerts must trigger action, not become noise.
5.7.1 Tracked Data Points
Level 1 — Per Workflow Execution
| Metric | Description | Threshold / Alert |
|---|---|---|
| execution_id | Unique ID for end-to-end tracing | - |
| tenant_id | Tenant that triggered the workflow | - |
| user_id | User in the interaction | - |
| workflow_name | Workflow / domain agent | - |
| total_duration_ms | Webhook received to response sent | >15s WARNING, >30s CRITICAL |
| overall_status | success / partial_success / failed | failed = CRITICAL |
| total_cost_usd | Accumulated LLM + API cost | >$0.50 WARNING |
| total_tokens | Total tokens across all LLM calls | >50K INVESTIGATE |
| channel | whatsapp / web / sms | - |
| response_sent | Whether the response was delivered | false = CRITICAL |
Level 2 — Per Node Process
| Metric | Description | Threshold / Alert |
|---|---|---|
| node_id | Node name in the workflow | - |
| node_type | llm_call / db_query / tool_execution / api_call | - |
| duration_ms | Node execution time | LLM >8s, DB >2s, Tool >5s = WARNING |
| cost_usd | Cost per node | - |
| status | success / retry / failed | failed = INVESTIGATE |
| retry_count | Number of retries | >2 = INVESTIGATE |
| error_message | Error detail | - |
| output_size_bytes | Output size | >500KB = INVESTIGATE |
Level 3 — Tenant Aggregation
| Metric | Description | Alert | Purpose |
|---|---|---|---|
| conversations_today | Total conversations today | >quota = soft notification | Cost control |
| conversations_mtd | Total month-to-date | approaching quota = upsell signal | Revenue |
| total_cost_mtd | Total cost this month | >expected = investigate | Margin |
| avg_response_time | P50/P95/P99 | P95 >10s = degradation | SLA |
| closing_rate_mtd | Conversations -> closings | <15% = quality issue | Quality |
| error_rate_1h | % failed executions (1h) | >1% WARN, >5% CRIT | Reliability |
5.7.2 Alert System Architecture
The alert system evaluates metrics periodically, deduplicates alerts, then routes them to notification channels according to severity.
5.7.3 Severity Levels & Routing
| Severity | Condition (Examples) | Notification Channel | Response SLA |
|---|---|---|---|
| CRITICAL | Error rate >5% (1h), DB down, response_sent=false (10 min) | WhatsApp Ops + Email + PagerDuty/Ticket | <5 minutes |
| WARNING | Error rate >1%, P95 >10s, cost spike >150%, quota approaching | Email Ops + Dashboard | <30 minutes |
| INFO | Deployment done, backup success, daily summary | Dashboard only | - |
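The evaluate, dedupe, and route steps can be sketched for the error-rate metric. The thresholds (>5% CRITICAL, >1% WARNING) and the channel lists come from the table above; the channel identifiers, the `AlertRouter` class, and the 15-minute dedupe window are assumptions made for illustration.

```python
# Channel lists mirror the severity table; the identifiers themselves are hypothetical.
ROUTES = {
    "CRITICAL": ["whatsapp_ops", "email", "pagerduty_or_ticket"],
    "WARNING": ["email_ops", "dashboard"],
    "INFO": ["dashboard"],
}


def classify_error_rate(failed, total):
    """Map a 1-hour failure count to a severity using the documented thresholds."""
    if total == 0:
        return "INFO"  # No executions, so no error rate to alert on.
    rate = failed / total
    if rate > 0.05:
        return "CRITICAL"
    if rate > 0.01:
        return "WARNING"
    return "INFO"


class AlertRouter:
    """Suppresses repeats of the same alert key within a dedupe window."""

    def __init__(self, dedupe_window_s=900):
        self._last_sent = {}
        self._window = dedupe_window_s

    def route(self, alert_key, severity, now):
        """Return the channels to notify, or [] if this alert is a recent duplicate."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self._window:
            return []
        self._last_sent[alert_key] = now
        return list(ROUTES[severity])
```

For example, 47 failed executions out of 567 is about 8.3%, which classifies as CRITICAL and notifies all three channels once per window.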
5.7.4 Alert Message Template (WhatsApp Ops Group)
```text
🚨 CRITICAL ALERT
Platform: AI CSI Production
Severity: CRITICAL
Time: 2026-02-14 15:32:18 WITA
Issue: Error rate 8.3% (threshold: 5%)
Affected: All tenants
Duration: Last 15 minutes
Details:
- Failed executions: 47 / 567
- Primary error: "Gemini API timeout"
- Affected workflow: main_cs_agent
Action Required: Investigate Gemini API status
Dashboard: https://monitor.csi.ai/alerts/ALT-20260214-001
Runbook: https://docs.csi.ai/runbooks/gemini-timeout
On-Call: @MasIsan @AGIA
```
5.7.5 Integrations (Ticketing & Ops Workflow)
Supported integrations:
- Linear (API)
- Jira (Webhook)
- Asana (API)
- Slack (Webhook)
- WhatsApp (Waha API)
- Email (SMTP)
5.7.6 SLA Monitoring & Enforcement
| Metric | Target | Measurement | Penalty / Action |
|---|---|---|---|
| Uptime | 99.5% | Monthly | Credit 10% MRR |
| Response Time (P95) | <10 seconds | Per conversation | Internal monitoring |
| Error Rate | <0.5% | Per day | Investigation required |
| Critical Alert Response | <5 minutes | Per incident | Escalation if breach |
| Ticket Resolution (P0) | <4 hours | Per ticket | Post-mortem required |
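To make the uptime target tangible, the short calculation below converts a monthly uptime percentage into an allowed-downtime budget. The helper name is hypothetical, and the table does not say how month length is counted, so the day count is left as a parameter.

```python
def allowed_downtime_minutes(uptime_target, days_in_month):
    """Minutes of downtime permitted in a month for a given uptime target."""
    total_minutes = days_in_month * 24 * 60
    return (1 - uptime_target) * total_minutes
```

With a 99.5% target and a 30-day month, the budget is 0.005 x 43,200 = 216 minutes, or roughly 3.6 hours per month before the 10% MRR credit applies.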
5.8 QA Automation
Scheduled automated QA maintains stability without daily manual testing.
QA focus:
- Technical stability (latency, errors, tool failures)
- Persona & tone consistency
- Business logic accuracy
- Edge-case resilience
- LTM recall quality
5.8.1 Daily Synthetic Test (N8N Cron — 02:00 WITA)
For each active tenant -> for each configured agent:
- Load test scenarios (test_scenarios)
- Send simulated message
- Capture response
- Validate behavior
- Log result
If a test fails:
- Mark agent as degraded
- Send alert to ops
- Auto-create incident ticket
- Notify the tenant admin if the failure persists >24 hours
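The daily loop can be sketched as a plain function with the message sender injected, so it can run against a stub. The scenario keys `message` and `expect_contains`, the substring-based validation, and the function name are assumptions for illustration; the real `test_scenarios` schema and validation rules are not defined in this section.

```python
import time


def run_synthetic_tests(scenarios, send_message):
    """Run each scenario and return (results, degraded_agents).

    send_message(tenant_id, agent_name, message) -> str is supplied by the caller.
    """
    results, degraded = [], set()
    for s in scenarios:
        started = time.perf_counter()
        try:
            reply = send_message(s["tenant_id"], s["agent_name"], s["message"])
            # Assumed validation rule: every expected phrase must appear in the reply.
            passed = all(p.lower() in reply.lower() for p in s["expect_contains"])
            reason = None if passed else "expected content missing"
        except Exception as exc:
            # No silent failure: an exception is recorded as a failed test.
            passed, reason = False, f"exception: {exc}"
        results.append({
            "tenant_id": s["tenant_id"],
            "agent_name": s["agent_name"],
            "scenario_id": s["scenario_id"],
            "response_time_ms": int((time.perf_counter() - started) * 1000),
            "validation_passed": passed,
            "failure_reason": reason,
        })
        if not passed:
            degraded.add((s["tenant_id"], s["agent_name"]))
    return results, degraded
```

The returned `degraded` set is what the failure steps above would act on: marking agents, alerting ops, and opening incident tickets.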
5.8.2 QA Metrics & Logging
Main fields in qa_test_results:
- tenant_id
- agent_name
- scenario_id
- response_time_ms
- validation_passed
- failure_reason
- created_at
5.8.3 QA Dashboard Example
Daily report:
- Pass rate per agent
- Response time trend
- Degraded agent list
- Prompt tuning recommendations
5.8.4 Weekly Deep Testing
Includes:
- Load testing (100 concurrent conversations / tenant)
- Edge-case messages
- Integration testing (booking -> payment -> confirmation)
- LTM recall test
- Tool chaining validation
5.9 Prompt Structure Hierarchy (L1 / L2 / L3)
5.9.1 Design Philosophy — Layered Responsibility
Core principles:
- L1 must not know domain-specific details
- L2 must not duplicate orchestration logic
- L3 must not override L1 safety rules
- L1 always has final control over the response
5.9.2 L1 — Main Orchestrator Prompt
Purpose: routing, context management, safety enforcement, tone consistency.
L1 owns:
- Conversation flow
- Tool invocation
- Personalization
- Formatting
- Human escalation
5.9.3 L2 — Sub Agent-Level Prompts
Purpose: execute specific tasks with precision (HOW).
L2 returns structured output (JSON). L1 converts this output into a conversational response.
5.9.4 L3 — Hyper-Specialist Prompts (Optional)
Used only for complex cases, when invoked explicitly by L2.
5.9.5 Hierarchy Decision Flow
User Message -> L1 -> L2 (if needed) -> L3 (if escalated) -> back to L2 -> back to L1 -> send response.
5.10 Human-AI Collaboration Architecture
The Handoff Problem
AI will not handle 100% of cases perfectly. Handoff to a human must be seamless, with no loss of context.
Design principles:
- Seamless transition (no reset)
- Full context preservation
- Clear ownership (AI vs Human)
- Reversible control
Toggle Architecture — AI ON/OFF Per User
State logic:
- AI Active -> AI responds automatically
- Admin Mode -> AI stops, human takes over
- Auto-Paused -> the system detects an active admin
```sql
CREATE TABLE user_ai_mode (
  user_id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  ai_enabled BOOLEAN DEFAULT true,
  disabled_by_admin_id UUID,
  disabled_at TIMESTAMP,
  reason TEXT,
  auto_resume_at TIMESTAMP
);

CREATE INDEX idx_ai_mode_lookup ON user_ai_mode(user_id, tenant_id);
```
AI Gate Logic
Before L1 executes:
- Query user_ai_mode
- If ai_enabled=false -> log the message, skip AI, a human handles it
- If ai_enabled=true -> continue to the agent
Context Preservation — Memory Continuity
All messages are recorded, whether they come from the AI or a human.
```sql
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  direction TEXT NOT NULL,
  source TEXT NOT NULL,
  message_text TEXT NOT NULL,
  handled_by TEXT,
  ai_paused_auto BOOLEAN DEFAULT false,
  admin_id UUID,
  trace_id UUID,
  cost_usd DECIMAL(10, 6),
  tokens_used INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);
```
Operational guarantees:
- No message loss
- No double reply
- No context reset after resume
- Reversible control anytime
- Clear audit trail
5.11 RAG & Knowledge Retrieval Architecture
5.11.1 Problem Statement
Problems in the initial implementation:
- Chunks too small (~1,000 chars)
- Low topK (4)
- Clinical reasoning chains get fragmented
- Main agent too monolithic (42K+ word prompt)
Impact:
- Low continuity
- Unstable reasoning depth
5.11.2 Phase 1 — Improved Embedding RAG
Configuration upgrade:
- Chunk size ~8,000 chars (~1,024 tokens)
- Overlap 200-500 chars
- topK for Reviv = 15
- topK for other agents = 4
Outcome:
- Better semantic grouping
- Reduced reasoning break
- Tactical improvement without an overhaul
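The chunk-size and overlap settings can be illustrated with a simple fixed-size chunker. This is a character-based sketch with an assumed default overlap of 300 (within the stated 200-500 range); a real pipeline might prefer splitting on paragraph or section boundaries so that reasoning chains are not cut mid-sentence.

```python
def chunk_text(text, chunk_size=8000, overlap=300):
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text.
    return chunks
```

Each chunk begins with the final `overlap` characters of the previous one, so a sentence that straddles a boundary is fully present in at least one chunk when the overlap exceeds its length.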
5.11.3 Phase 2 — Deterministic Full-Document Loading
Flow:
- The orchestrator extracts keywords
- Keywords are matched against a JSON tag mapping
- Full KB files are loaded (not chunks)
- 6-10 relevant documents are injected into the reasoning agent
- Embedding RAG remains as a fallback
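The keyword-to-document step can be sketched as a deterministic lookup. The tag map contents, file names, and the ranking rule (documents matched by more keywords come first, ties broken alphabetically) are illustrative assumptions; only the overall flow of matching keywords to tags and loading whole files comes from the text above.

```python
def select_documents(keywords, tag_map, max_docs=10):
    """Return KB document names ranked by how many extracted keywords match their tags.

    tag_map: dict mapping a lowercase tag to a list of document names.
    An empty result signals the caller to fall back to embedding RAG.
    """
    scores = {}
    for keyword in keywords:
        for doc in tag_map.get(keyword.lower().strip(), []):
            scores[doc] = scores.get(doc, 0) + 1
    # Sort by match count (descending), then by name so the output is deterministic.
    ranked = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return ranked[:max_docs]


# Hypothetical tag mapping; real KB file names are not given in this document.
EXAMPLE_TAG_MAP = {
    "htn": ["hypertension_workup.md", "cardio_overview.md"],
    "lab panel": ["lab_panels.md", "hypertension_workup.md"],
}
```

Because the same keywords always yield the same ordered document list, retrieval results are reproducible, which is the main advantage over similarity search for regulated domains.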
5.11.4 Tag-Based Retrieval Structure
Example tags in a KB file:
- htn
- cardiovascular
- hypertension workup
- lab panel
- chronic disease
5.11.5 Multi-Agent Compatibility
RAG can be invoked by the orchestrator, domain agents, or sub-specialists.
5.11.6 Trade-Off Analysis
| Approach | Continuity | Determinism | Flexibility | Risk |
|---|---|---|---|---|
| Small chunk + topK 4 | Low | Low | High | High miss |
| 8K chunk + topK 15 | Medium-High | Medium | High | Acceptable |
| Full-doc + tag match | Very High | High | Medium | Low miss |
Recommendations:
- Phase 1: Re-embed + overlap tuning (immediate)
- Phase 2: Keyword -> full doc loading (strategic)
5.11.7 Monitoring & Token Consideration
Required:
- Per-agent token tracking
- Historical usage dashboard
- Cost per execution logging
- Model routing optimization
5.11.8 Final Positioning
RAG platform = hybrid retrieval system:
- Deterministic tag matching (primary)
- Full document loading (primary reasoning context)
- Embedding semantic recall (fallback)
- Multi-agent orchestrated reasoning
With this approach, the platform is ready for medical-grade reasoning, highly regulated domains, and scale without loss of quality.