Focused Documentation

Section 5 - Technical Architecture

This page focuses specifically on the CSI technical architecture, so the structure is cleaner, easier to scan, and more convenient to use in engineering discussions.

PART 5 — TECHNICAL ARCHITECTURE

5.1 Architecture Philosophy & Principles

The following design principles are non-negotiable and form the foundation of the entire platform.

No Silent Failure

Every error must be captured, carry a trace ID, and reach monitoring. Users must never receive an error that leaves no trace in the system.

Idempotency First

The same message must never be processed twice. Conflicts between AI and admin replies must be detected automatically.

Domain-Agnostic Core

The platform core hardcodes nothing per industry. Domain-specific logic lives in the domain layer.

Composable Over Monolithic

There is no single super-agent. Composable sub-agents and tools are more maintainable, testable, and scalable.

LTM by Default

Every interaction enriches the user profile. Memory is a fundamental feature, not an add-on.

Cost-Conscious Design

Every LLM call must be justified, with model routing between Flash and Pro/Thinking according to complexity.

Tenant Isolation

Each tenant's data, configuration, and context must be fully isolated, enforced through row-level security (RLS).

5.2 Tech Stack

Layer	Technology	Rationale
Orchestration	N8N (self-hosted)	Visual, modular, extensible
LLM Primary (Conversational)	Gemini 2.5 Flash / Gemini 3 Flash	Cost-efficient, multimodal
LLM Secondary (Reasoning)	Gemini Pro / Thinking (TBD per use-case)	Complex reasoning & doc generation
Database	Supabase PostgreSQL	Unified DB, RLS built-in
Vector Store / RAG	Supabase Vector (pgvector)	No separate vector DB needed
Frontend	Next.js	SSR, API routes, TypeScript
Channel (Primary)	Waha (WhatsApp)	Production-ready
Channel (Roadmap)	WhatsApp Official, Web Widget, TikTok, Meta API	Q3-Q4 2026 expansion
Monitoring	N8N logs + custom dashboard	Phase 1 monitoring

5.3 Agent Architecture

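The agent architecture uses three main agent layers (L1 to L3), with a channel layer (L0) in front of them and a data + memory layer underneath.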

Final Architecture

Multi-Layer Agentic System

L0 - Channel Layer

Omnichannel intake and normalization.

WhatsApp (Waha)

Primary production channel

Web Widget

Website embedded entry

TikTok / Meta

Roadmap channels

Webhook Gateway

Unified inbound handler

L1 - Orchestrator

Main reasoning, policy guard, and final response owner.

Main CS Agent

Intent, tool routing, response generation

Moderation Agent

Safety, abuse, crisis and emergency checks

Fallback Agent

Soft fallback and resilient handoff

L2 - Utility + Domain

Reusable sub-agents and domain-specific execution logic.

Utility Sub-Agents

Time, location, transport, pricing, docs

Domain Sub-Agents

Medical, automotive, F&B, commerce logic

L3 - Specialist (Optional)

Deep specialization for high-complexity cases.

Medical Specialist

Sub-specialist escalation (e.g. oncology)

Business Specialist

High-value financing and niche workflows

Data + Memory Layer

Persistence, retrieval, observability, and tenant isolation.

Supabase PostgreSQL

Tenant-scoped operational store with RLS

Vector + LTM

Semantic recall, summaries, and profile memory

5.3.1 L1 — Orchestrator

Main functions:

Inject LTM context
Detect intent
Call sub-agent / tools
Generate final response

Components:

Main CS Agent
Moderation Agent
Fallback Agent

L1 is responsible for the main reasoning and for coordination between agents.

5.3.2 L2 — Utility & Domain-Specific Layer

This layer contains reusable sub-agents and domain logic.

5.3.2.1 Utility Sub-Agent

Utility Sub-Agents are a set of modular sub-agents that are reusable across industries. Each sub-agent is:

Single responsibility
Stateless (keeps no internal state)
Structured input/output (JSON schema)
Fully composable by the L1 Orchestrator

A. Time Management

• TIME_PARSER
• TIME_SLOT_CHECKER
• TIME_CONFLICT_DETECTOR
• TIME_WINDOW_GENERATOR
• TIME_FORMATTER

B. Address & Location

• ADDRESS_PARSER
• ADDRESS_GEOCODER
• ADDRESS_VALIDATOR
• ADDRESS_DISAMBIGUATOR
• MAPS_LINK_GENERATOR

C. Transport & Logistics

• DISTANCE_CALCULATOR
• ETA_CALCULATOR
• ZONE_CLASSIFIER
• ROUTE_OPTIMIZER
• TRAFFIC_ADVISOR

D. Payment & Pricing

• PRICE_CALCULATOR
• PROMO_CODE_VALIDATOR
• PAYMENT_LINK_GENERATOR
• PAYMENT_STATUS_CHECKER
• INVOICE_GENERATOR

E. Document & Content

• DOCS_GENERATOR
• RESPONSE_FORMATTER
• LANGUAGE_MANAGER

A. Time Management Sub-Agent

TIME_PARSER

Purpose: Convert natural-language time expressions into a structured datetime.
Input: Free text ("besok jam 3", i.e. Indonesian for "tomorrow at 3"; "next Monday 4pm"; "in 2 hours")
Output: { datetime_utc, datetime_local, timezone, is_ambiguous }
Config: default_timezone per tenant

TIME_SLOT_CHECKER

Purpose: Validate slot availability against operating hours and resources.
Input: Parsed datetime + resource_id
Output: { available, reason, next_available, alternatives[] }
Config: operating_hours, holidays, blocked_dates

TIME_CONFLICT_DETECTOR

Purpose: Detect scheduling conflicts.
Input: datetime + duration + resource_id
Output: { conflict, conflicting_booking, suggestions[] }
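
The conflict check above can be sketched as an interval-overlap test. This is a minimal illustration, not the platform's implementation: the Booking type, the function name, and the rule for picking a suggestion are assumptions, and existing bookings on one resource are assumed not to overlap each other.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Booking:
    # Hypothetical record shape; the real bookings table is not specified here.
    booking_id: str
    resource_id: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def detect_conflict(start, duration, resource_id, bookings):
    """Return the documented shape: conflict, conflicting_booking, suggestions[]."""
    end = start + duration
    same_resource = sorted(
        (b for b in bookings if b.resource_id == resource_id),
        key=lambda b: b.start,
    )
    # Half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
    clash = next((b for b in same_resource if start < b.end and b.start < end), None)
    if clash is None:
        return {"conflict": False, "conflicting_booking": None, "suggestions": []}
    # Suggest the earliest start after the clash that fits without overlap.
    candidate = clash.end
    for b in same_resource:
        if candidate < b.end and b.start < candidate + duration:
            candidate = b.end
    return {
        "conflict": True,
        "conflicting_booking": clash.booking_id,
        "suggestions": [candidate],
    }
```

Treating bookings as half-open intervals lets back-to-back slots (one ending at 11:00, the next starting at 11:00) coexist without being flagged.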

TIME_WINDOW_GENERATOR

Purpose: Generate a list of slots within a date range.
Input: date_range + duration
Output: available_slots[]
Config: slot_duration, buffer_between_slots
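
As a rough sketch of how slot_duration and buffer_between_slots could interact, the function below walks a single time window and emits back-to-back slots separated by the buffer. The function name and the (start, end) tuple output are assumptions; filtering against operating hours and existing bookings is left to the other sub-agents.

```python
from datetime import datetime, timedelta


def generate_slots(window_start, window_end, slot_duration, buffer_between_slots):
    """List (start, end) slots that fit entirely inside the window."""
    if slot_duration <= timedelta(0):
        raise ValueError("slot_duration must be positive")
    slots = []
    cursor = window_start
    while cursor + slot_duration <= window_end:
        slots.append((cursor, cursor + slot_duration))
        # The buffer is idle time between the end of one slot and the next start.
        cursor += slot_duration + buffer_between_slots
    return slots


if __name__ == "__main__":
    for start, end in generate_slots(
        datetime(2026, 3, 1, 9, 0),
        datetime(2026, 3, 1, 12, 0),
        timedelta(minutes=60),
        timedelta(minutes=15),
    ):
        print(start.time(), "-", end.time())
```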

TIME_FORMATTER

Purpose: Format a datetime as natural text in the target language.
Input: datetime + language + format_type
Output: "Besok jam 15.00" (Indonesian) / "Tomorrow at 3pm" (English)

B. Address & Location Sub-Agent

ADDRESS_PARSER

Purpose: Extract the address and landmarks from free text.
Input: Free text address
Output: { address_text, landmarks, confidence }

ADDRESS_GEOCODER

Purpose: Convert an address into GPS coordinates.
Input: Parsed address
Output: { lat, lng, formatted_address, confidence }
Integration: Google Maps API

ADDRESS_VALIDATOR

Purpose: Validate that the address is within the service area.
Input: lat/lng + tenant_id
Output: { in_service_area, zone_name, surcharge }

ADDRESS_DISAMBIGUATOR

Purpose: Clarify ambiguous addresses.
Output: { candidates[], clarifying_question }

MAPS_LINK_GENERATOR

Purpose: Generate Google Maps link.
Output: { short_link, directions_link }

C. Transport & Logistics Sub-Agent

DISTANCE_CALCULATOR

Purpose: Calculate the actual distance.
Output: { distance_km }

ETA_CALCULATOR

Purpose: Estimate arrival time based on traffic.
Output: { eta_minutes, arrival_time }

ZONE_CLASSIFIER

Purpose: Classify the zone for pricing and dispatch.
Output: { zone_id, zone_tier, surcharge }

ROUTE_OPTIMIZER

Purpose: Optimize the visit order for multi-stop routes.
Output: { optimized_route[], total_distance, total_time }

TRAFFIC_ADVISOR

Purpose: Provide real-time traffic condition updates.
Output: { traffic_status, delay_minutes }

D. Payment & Pricing Sub-Agent

PRICE_CALCULATOR

Purpose: Calculate the total price with a breakdown.
Input: line_items + zone + promo
Output: { subtotal, fees[], discounts[], tax, total }
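
A minimal sketch of the breakdown, under stated assumptions: the function name, the line-item keys (unit_price, qty), the order of operations (surcharge added, promo subtracted, tax applied last), and rounding tax to whole currency units are all illustrative choices, not documented platform rules. Decimal is used so that currency arithmetic stays exact.

```python
from decimal import Decimal, ROUND_HALF_UP


def calculate_price(line_items, tax_rate, zone_surcharge=Decimal("0"),
                    promo_discount=Decimal("0")):
    """Return the documented shape: subtotal, fees[], discounts[], tax, total."""
    subtotal = sum(
        (Decimal(str(item["unit_price"])) * item["qty"] for item in line_items),
        Decimal("0"),
    )
    fees = [{"name": "zone_surcharge", "amount": zone_surcharge}] if zone_surcharge else []
    # A promo can never push the pre-tax amount below zero.
    discount = min(promo_discount, subtotal + zone_surcharge)
    discounts = [{"name": "promo", "amount": discount}] if discount else []
    taxable = subtotal + zone_surcharge - discount
    # Assumption: tax is rounded half-up to a whole currency unit.
    tax = (taxable * tax_rate).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return {
        "subtotal": subtotal,
        "fees": fees,
        "discounts": discounts,
        "tax": tax,
        "total": taxable + tax,
    }
```

Returning fees and discounts as lists keeps the breakdown explainable line by line when L1 turns it into a conversational reply.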

PROMO_CODE_VALIDATOR

Purpose: Validate and apply a promo code.
Output: { valid, discount_amount, rejection_reason }

PAYMENT_LINK_GENERATOR

Purpose: Generate payment link.
Output: { payment_url, expires_at }
Integration: Xendit / Stripe / etc.

PAYMENT_STATUS_CHECKER

Purpose: Check payment status.
Output: { status, paid_at }

INVOICE_GENERATOR

Purpose: Generate invoice PDF.
Output: { invoice_url, invoice_number }

E. Document & Content Sub-Agent

DOCS_GENERATOR

Purpose: Generate professional documents from structured data.
Output: { document_url }

RESPONSE_FORMATTER

Purpose: Adapt output to the channel (WA/Web).
Output: formatted_messages[]

LANGUAGE_MANAGER

Purpose: Maintain tone and language consistency according to the tenant persona.
Output: Polished response text

Utility Sub-Agents are domain-agnostic and can be used by every Domain-Specific Sub-Agent without changes to the core structure.

5.3.2.2 Domain-Specific Sub-Agent

Contains industry-specific logic.

Examples:

Medical booking logic
Automotive lead qualification
F&B order handling

The domain layer contains only business rules, not infrastructure logic.

5.3.3 L3 — Sub-Specialist Agent (Optional)

Used for domains with deep sub-specialization.

Examples:

Pediatric Oncology (Medical)
High-value Financing (Automotive)

L3 is called only when L2 or L1 requires it.

5.4 Long-Term Memory Architecture

Long-Term Memory (LTM) is the backbone of personalization and reliability on the CSI platform. The design is domain-agnostic: the core schema stays the same across industries, while field details follow a per-domain template (medical is the most complete reference implementation, taken from the CepatSehat Addendum).

5.4.1 Core Data Model (Domain-Agnostic)

Hard Fields (JSONB): important data that agents must be able to use deterministically.
Summaries + Embeddings: conversation summaries for semantic recall (token-efficient while staying relevant).
Hybrid Memory: a combination of structured fields and narrative summaries.

SQL
-- Master table: domain-agnostic
CREATE TABLE user_profiles (
  user_id       UUID PRIMARY KEY,
  tenant_id     UUID NOT NULL REFERENCES tenants(id),
  full_name          TEXT,
  phone_number       TEXT,
  preferred_language TEXT DEFAULT 'id',
  safety_fields      JSONB DEFAULT '{}',
  preferences        JSONB DEFAULT '{}',
  usage_history      JSONB DEFAULT '{}',
  operational_flags  JSONB DEFAULT '{}',
  stm_summary   TEXT,
  ltm_summary   TEXT,
  consent_given          BOOLEAN DEFAULT false,
  data_retention_until   TIMESTAMP,
  updated_at    TIMESTAMP DEFAULT NOW()
);

CREATE TABLE conversation_summaries (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id           UUID REFERENCES user_profiles(user_id),
  tenant_id         UUID REFERENCES tenants(id),
  conversation_date DATE NOT NULL,
  summary           TEXT NOT NULL,
  key_topics        TEXT[],
  sentiment         TEXT,
  action_items      JSONB DEFAULT '[]',
  embedding         VECTOR(768),
  created_at        TIMESTAMP DEFAULT NOW()
);

5.4.2 Hard Fields Reference (Medical)

safety_fields (medical)

drug_allergies[] (CRITICAL)
pregnancy_status (CRITICAL)
g6pd_status (CRITICAL)
anticoagulant_use (CRITICAL)
ckd_status, ckd_stage (CRITICAL)
history_anaphylaxis (CRITICAL)
chronic_conditions[]
psychiatric_history, cardiac_history_major

preferences (medical)

service_preference: home_visit / clinic / telemedicine
budget_band: low / mid / high / luxury
needle_anxiety
communication_style
preferred_payment_method

usage_history (medical)

iv_sessions_90d (computed)
last_iv_protocol
last_iv_reaction
pharmacy_orders_30d (computed)
lab_tests_90d (computed)

operational_flags (medical)

vip_tier: standard / silver / gold / platinum
no_show_flag, no_show_count_90d
risk_flags_computed[]
consent_medical_data

5.4.3 Domain Template Pattern (Cross-Industry)

Every domain follows the same four buckets; only the field definitions differ.

MEDICAL: safety_fields, preferences, usage_history, operational_flags
AUTOMOTIVE: safety_fields, preferences, usage_history, operational_flags
RETAIL / COMMERCE: safety_fields, preferences, usage_history, operational_flags

5.4.4 LTM Pipeline (Hybrid: Live + Scheduled)

Mode 1 — Live Update (Real-Time Capture)

Triggered while the conversation is running (critical safety info, an explicit preference, or STM approaching its limit).
Event-driven, cost-aware (single extraction call, additive merge, no silent overwrite).
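
One way to read "additive merge, no silent overwrite" is sketched below. The function name and the conflict-report shape are assumptions: list fields are unioned, empty fields are filled, and a stored scalar that disagrees with a new extraction is kept and reported rather than replaced.

```python
def merge_hard_fields(stored, extracted):
    """Additively merge newly extracted hard fields into the stored ones.

    Returns (merged, conflicts). Stored scalar values are never overwritten;
    disagreements are reported so a human or a later rule can decide.
    """
    # Copy lists so the caller's stored dict is left untouched.
    merged = {k: (list(v) if isinstance(v, list) else v) for k, v in stored.items()}
    conflicts = []
    for field, new_value in extracted.items():
        old_value = merged.get(field)
        if old_value is None:
            merged[field] = new_value  # fill a gap
        elif isinstance(old_value, list) and isinstance(new_value, list):
            for item in new_value:  # union, preserving order
                if item not in old_value:
                    old_value.append(item)
        elif old_value != new_value:
            conflicts.append(
                {"field": field, "stored": old_value, "extracted": new_value}
            )
    return merged, conflicts
```

For critical safety fields, surfacing the conflict is usually safer than letting the latest extraction win, since a single misread message could otherwise erase an allergy or flip a pregnancy status.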

Mode 2 — Scheduled Batch Updates (Cron)

DAILY (01:00 WITA)

Fetch yesterday's conversations
Write a 2-3 sentence summary (Gemini Flash)
Extract any hard fields that were missed
Generate embedding
Insert into conversation_summaries
Update stm_summary in user_profiles
Compute usage_history (rolling 30/90 days)

WEEKLY (Sunday 02:00 WITA)

Recalculate operational_flags (vip, loyalty, risk)
Apply retention policy (delete/expire)
Weekly activity summary (tenant dashboard)

MONTHLY (1st 03:00 WITA)

Fetch summaries from the last 3 months
Synthesize ltm_summary (Thinking model, max 200 words)
Update ltm_summary
Archive old summaries (cold storage)

Mode 3 — Onboarding: Tenant LTM Structure Configuration

JSON
{
  "tenant_id": "tenant_reviv_bali",
  "structure": {
    "safety_fields": {
      "drug_allergies": { "type": "array", "critical": true },
      "pregnancy_status": { "type": "boolean", "critical": true },
      "g6pd_status": { "type": "enum", "values": ["normal", "deficient", "unknown"], "critical": true }
    },
    "preferences": {
      "service_preference": { "type": "enum", "values": ["home_visit", "clinic", "telemedicine"] },
      "budget_band": { "type": "enum", "values": ["low", "mid", "high", "luxury"] }
    },
    "usage_history": {
      "iv_sessions_90d": { "type": "integer", "computed": true },
      "last_iv_protocol": { "type": "string" }
    },
    "operational_flags": {
      "vip_tier": { "type": "enum", "values": ["standard", "silver", "gold", "platinum"] }
    }
  },
  "version": "1.0"
}
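
A tenant structure like the one above could be enforced at write time. The sketch below is illustrative only: the function names and error strings are made up, and treating "computed": true fields as not directly writable is an assumption about what that flag means.

```python
def validate_field(spec, value):
    """Check one value against a field spec from the tenant structure."""
    kind = spec["type"]
    if kind == "array":
        return isinstance(value, list)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "enum":
        return value in spec["values"]
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    return False


def validate_bucket(structure, bucket, data):
    """Return a list of error strings; an empty list means the data fits."""
    specs = structure.get(bucket, {})
    errors = []
    for name, value in data.items():
        spec = specs.get(name)
        if spec is None:
            errors.append(f"{bucket}.{name}: not configured for this tenant")
        elif spec.get("computed"):
            errors.append(f"{bucket}.{name}: computed field, not writable")
        elif not validate_field(spec, value):
            errors.append(f"{bucket}.{name}: expected {spec['type']}")
    return errors
```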

5.4.5 LTM Injection at Conversation Start

L1 Orchestrator:

Pull hard fields from user_profiles.
Pull stm_summary.
Semantic search conversation_summaries (top 3).
Pull ltm_summary if available.
Compose context string.
Inject into the L1 system prompt.

Target injected context: ~500-1500 tokens (vs 5000+ with full history).
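
The "compose context string" step might look roughly like this. It is a sketch under assumptions: the function name, the bracketed section labels, and the max_chars cap (a crude character-based stand-in for a token budget) are all hypothetical. Safety fields are placed first so that truncation, if it happens, cuts the least critical material.

```python
import json

BUCKETS = ("safety_fields", "preferences", "usage_history", "operational_flags")


def compose_ltm_context(hard_fields, stm_summary, recalled_summaries,
                        ltm_summary=None, max_chars=6000):
    """Build the context block injected into the L1 system prompt."""
    parts = []
    for bucket in BUCKETS:
        values = hard_fields.get(bucket)
        if values:  # skip empty buckets to save tokens
            parts.append(f"[{bucket}] {json.dumps(values, sort_keys=True)}")
    if stm_summary:
        parts.append(f"[recent] {stm_summary}")
    for summary in recalled_summaries[:3]:  # top 3 semantic matches
        parts.append(f"[recall] {summary}")
    if ltm_summary:
        parts.append(f"[long_term] {ltm_summary}")
    # Hard cap as a safety net; real token counting depends on the model.
    return "\n".join(parts)[:max_chars]
```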

5.4.6 Why This Matters

Token efficiency: saves 70-85% for most conversations.
Better personalization: new critical info is used immediately.
Tenant control: tracked fields are configurable.
Compliance ready: consent + retention baked in.

5.5 Messaging Pipeline Detail

The pipeline ensures reliability and multimodal support.

Flow: Webhook -> Idempotency Guard -> Conflict Detector -> Smart Buffer -> Tenant Loader -> L1

Main components:

Idempotency Guard: prevents duplicate processing.
Admin Conflict Detector: prevents double replies from admin and AI.
Smart Buffer: merges text + image + audio that arrive within a burst window.
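
The two reliability pieces can be sketched in memory as follows. Everything here is an assumption made for illustration: the class and function names, the TTL, the message shape ({"ts": seconds, ...}), and the gap-based definition of a burst. In production the guard would more plausibly rely on a unique key in the idempotency-keys table than on a process-local dict.

```python
import time


class IdempotencyGuard:
    """In-memory stand-in for a persistent idempotency-key store."""

    def __init__(self, ttl_seconds=86400.0):
        self._seen = {}
        self._ttl = ttl_seconds

    def first_time(self, tenant_id, message_id, now=None):
        """Return True only the first time a (tenant, message) pair is seen."""
        now = time.time() if now is None else now
        # Drop expired keys so the store does not grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl}
        key = (tenant_id, message_id)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


def group_bursts(messages, window_seconds=8.0):
    """Group one user's messages into bursts.

    A message joins the current burst when it arrives within window_seconds
    of the previous message; otherwise it starts a new burst.
    """
    bursts = []
    for msg in sorted(messages, key=lambda m: m["ts"]):
        if bursts and msg["ts"] - bursts[-1][-1]["ts"] <= window_seconds:
            bursts[-1].append(msg)
        else:
            bursts.append([msg])
    return bursts
```

Scoping the key by tenant keeps identical provider message IDs from two tenants from colliding, which also fits the tenant-isolation principle.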

5.6 Data Layer

Supabase serves as the core backbone.

It stores:

Tenant configuration
User profiles & LTM
Execution logs
Monitoring metrics
Wizard sessions
Idempotency keys
Message buffers

All tables are tenant-scoped with RLS enabled.

5.7 Monitoring & Alert System

Monitoring is the backbone of operational excellence.

Granularity

Track per workflow, per node, dan per tenant.

Proactivity

Detect issues before users complain.

Actionability

Alerts must trigger action, not become noise.

5.7.1 Tracked Data Points

Level 1 — Per Workflow Execution

Metric	Description	Threshold / Alert
execution_id	Unique ID for end-to-end tracing	-
tenant_id	Tenant that triggered the workflow	-
user_id	User in the interaction	-
workflow_name	Workflow / domain agent	-
total_duration_ms	Webhook received to response sent	>15s WARNING, >30s CRITICAL
overall_status	success / partial_success / failed	failed = CRITICAL
total_cost_usd	Accumulated LLM + API cost	>$0.50 WARNING
total_tokens	Total tokens across all LLM calls	>50K INVESTIGATE
channel	whatsapp / web / sms	-
response_sent	Whether the response was sent successfully	false = CRITICAL
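
The Level 1 thresholds translate directly into a small evaluation function. This is a sketch: the function name and return shape are assumptions, and ranking INVESTIGATE below WARNING is a guess, since the table does not order the two.

```python
# Assumed ordering; the table does not say how INVESTIGATE ranks against WARNING.
SEVERITY_RANK = {"INVESTIGATE": 1, "WARNING": 2, "CRITICAL": 3}


def evaluate_execution(metrics):
    """Apply the Level 1 thresholds; return (worst_severity, findings)."""
    findings = []
    if metrics["total_duration_ms"] > 30_000:
        findings.append(("CRITICAL", "total_duration_ms > 30s"))
    elif metrics["total_duration_ms"] > 15_000:
        findings.append(("WARNING", "total_duration_ms > 15s"))
    if metrics["overall_status"] == "failed":
        findings.append(("CRITICAL", "overall_status = failed"))
    if metrics["total_cost_usd"] > 0.50:
        findings.append(("WARNING", "total_cost_usd > $0.50"))
    if metrics["total_tokens"] > 50_000:
        findings.append(("INVESTIGATE", "total_tokens > 50K"))
    if not metrics["response_sent"]:
        findings.append(("CRITICAL", "response_sent = false"))
    worst = max(
        (severity for severity, _ in findings),
        key=SEVERITY_RANK.__getitem__,
        default="OK",
    )
    return worst, findings
```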

Level 2 — Per Node Process

Metric	Description	Threshold / Alert
node_id	Node name in the workflow	-
node_type	llm_call / db_query / tool_execution / api_call	-
duration_ms	Node execution time	LLM >8s, DB >2s, Tool >5s = WARNING
cost_usd	Cost per node	-
status	success / retry / failed	failed = INVESTIGATE
retry_count	Number of retries	>2 = INVESTIGATE
error_message	Error detail	-
output_size_bytes	Output size	>500KB = INVESTIGATE

Level 3 — Tenant Aggregation

Metric	Description	Alert	Purpose
conversations_today	Total conversations today	>quota = soft notification	Cost control
conversations_mtd	Total month-to-date	approaching quota = upsell signal	Revenue
total_cost_mtd	Total biaya bulan ini	>expected = investigate	Margin
avg_response_time	P50/P95/P99	P95 >10s = degradation	SLA
closing_rate_mtd	Conversations -> closings	<15% = quality issue	Quality
error_rate_1h	% failed execution (1h)	>1% WARN, >5% CRIT	Reliability

5.7.2 Alert System Architecture

The alert system evaluates metrics periodically, deduplicates alerts, and routes them to notification channels by severity.

5.7.3 Severity Levels & Routing

Severity	Condition (Examples)	Notification Channel	Response SLA
CRITICAL	Error rate >5% (1h), DB down, response_sent=false (10 min)	WhatsApp Ops + Email + PagerDuty/Ticket	<5 minutes
WARNING	Error rate >1%, P95 >10s, cost spike >150%, quota approaching	Email Ops + Dashboard	<30 minutes
INFO	Deployment done, backup success, daily summary	Dashboard only	-
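
Dedupe plus severity routing could be sketched as below. The channel identifiers mirror the table above, but the class name, the fingerprint concept, and the 15-minute suppression window are assumptions for illustration.

```python
ROUTES = {
    "CRITICAL": ["whatsapp_ops", "email", "pagerduty_or_ticket"],
    "WARNING": ["email_ops", "dashboard"],
    "INFO": ["dashboard"],
}


class AlertRouter:
    """Suppress repeats of the same alert inside a dedupe window."""

    def __init__(self, dedupe_window_s=900):
        self._window = dedupe_window_s
        self._last_sent = {}

    def route(self, severity, fingerprint, now):
        """Return the channels to notify, or [] when the alert is a duplicate.

        fingerprint identifies "the same problem", for example
        "error_rate:all_tenants"; now is a timestamp in seconds.
        """
        last = self._last_sent.get(fingerprint)
        if last is not None and now - last < self._window:
            return []  # already notified recently
        self._last_sent[fingerprint] = now
        return list(ROUTES[severity])
```

Keying suppression on a fingerprint rather than on the raw message text keeps a fluctuating value (8.3% one minute, 8.5% the next) from producing a fresh page each evaluation cycle.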

5.7.4 Alert Message Template (WhatsApp Ops Group)

TEXT
🚨 CRITICAL ALERT

Platform: AI CSI Production
Severity: CRITICAL
Time: 2026-02-14 15:32:18 WITA

Issue: Error rate 8.3% (threshold: 5%)
Affected: All tenants
Duration: Last 15 minutes

Details:
- Failed executions: 47 / 567
- Primary error: "Gemini API timeout"
- Affected workflow: main_cs_agent

Action Required: Investigate Gemini API status

Dashboard: https://monitor.csi.ai/alerts/ALT-20260214-001
Runbook: https://docs.csi.ai/runbooks/gemini-timeout

On-Call: @MasIsan @AGIA

5.7.5 Integrations (Ticketing & Ops Workflow)

Supported integrations:

Linear (API)
Jira (Webhook)
Asana (API)
Slack (Webhook)
WhatsApp (Waha API)
Email (SMTP)

5.7.6 SLA Monitoring & Enforcement

Metric	Target	Measurement	Penalty / Action
Uptime	99.5%	Monthly	Credit 10% MRR
Response Time (P95)	<10 seconds	Per conversation	Internal monitoring
Error Rate	<0.5%	Per day	Investigation required
Critical Alert Response	<5 minutes	Per incident	Escalation if breach
Ticket Resolution (P0)	<4 hours	Per ticket	Post-mortem required

5.8 QA Automation

Automated QA runs on a schedule to maintain stability without daily manual testing.

QA focus:

Technical stability (latency, errors, tool failures)
Persona and tone consistency
Business logic accuracy
Edge-case resilience
LTM recall quality

5.8.1 Daily Synthetic Test (N8N Cron — 02:00 WITA)

For each active tenant -> for each configured agent:

Load test scenarios (test_scenarios)
Send simulated message
Capture response
Validate behavior
Log result

If a test fails:

Mark agent as degraded
Send alert to ops
Auto-create incident ticket
Notify the tenant admin if the failure persists >24 hours
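
The daily loop could be sketched as follows. The scenario shape (scenario_id, message, expect_contains), the send_message callable, and substring matching as the validation rule are all assumptions; the logged keys follow the qa_test_results fields. Alerting and ticket creation are left to the caller, which receives the set of degraded agents.

```python
import time
from datetime import datetime, timezone


def run_synthetic_tests(scenarios, send_message):
    """Run each scenario once; return (results, degraded_agents)."""
    results, degraded = [], set()
    for scenario in scenarios:
        started = time.perf_counter()
        try:
            reply = send_message(
                scenario["tenant_id"], scenario["agent_name"], scenario["message"]
            )
            missing = [
                term for term in scenario["expect_contains"]
                if term.lower() not in reply.lower()
            ]
            failure_reason = f"missing expected text: {missing}" if missing else None
        except Exception as exc:  # a crash is a failed scenario, never a silent one
            failure_reason = f"exception: {exc}"
        passed = failure_reason is None
        results.append({
            "tenant_id": scenario["tenant_id"],
            "agent_name": scenario["agent_name"],
            "scenario_id": scenario["scenario_id"],
            "response_time_ms": int((time.perf_counter() - started) * 1000),
            "validation_passed": passed,
            "failure_reason": failure_reason,
            "created_at": datetime.now(timezone.utc).isoformat(),
        })
        if not passed:
            degraded.add((scenario["tenant_id"], scenario["agent_name"]))
    return results, degraded
```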

5.8.2 QA Metrics & Logging

Main fields in qa_test_results:

tenant_id
agent_name
scenario_id
response_time_ms
validation_passed
failure_reason
created_at

5.8.3 QA Dashboard Example

Daily report:

Pass rate per agent
Response time trend
Degraded agent list
Prompt tuning recommendations

5.8.4 Weekly Deep Testing

Includes:

Load testing (100 concurrent conversations / tenant)
Edge-case messages
Integration testing (booking -> payment -> confirmation)
LTM recall test
Tool chaining validation

5.9 Prompt Structure Hierarchy (L1 / L2 / L3)

5.9.1 Design Philosophy — Layered Responsibility

Core principles:

L1 must not know domain-specific details
L2 must not duplicate orchestration logic
L3 must not override L1 safety rules
L1 always has final control over the response

5.9.2 L1 — Main Orchestrator Prompt

Purpose: routing, context management, safety enforcement, tone consistency.

L1 owns:

Conversation flow
Tool invocation
Personalization
Formatting
Human escalation

5.9.3 L2 — Sub Agent-Level Prompts

Purpose: execute specific tasks with precision (HOW).

L2 returns structured output (JSON). L1 turns this output into a conversational response.

5.9.4 L3 — Hyper-Specialist Prompts (Optional)

Used only for complex cases, when called explicitly by L2.

5.9.5 Hierarchy Decision Flow

User Message -> L1 -> L2 (if needed) -> L3 (if escalated) -> back to L2 -> back to L1 -> send response.

5.10 Human-AI Collaboration Architecture

The Handoff Problem

AI will not handle 100% of cases perfectly. Handoff to a human must be seamless, with no loss of context.

Design principles:

Seamless transition (no reset)
Full context preservation
Clear ownership (AI vs Human)
Reversible control

Toggle Architecture — AI ON/OFF Per User

State logic:

AI Active -> AI responds automatically
Admin Mode -> AI stops, human takes over
Auto-Paused -> the system detects an active admin

SQL
CREATE TABLE user_ai_mode (
  user_id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  ai_enabled BOOLEAN DEFAULT true,
  disabled_by_admin_id UUID,
  disabled_at TIMESTAMP,
  reason TEXT,
  auto_resume_at TIMESTAMP
);

CREATE INDEX idx_ai_mode_lookup ON user_ai_mode(user_id, tenant_id);

AI Gate Logic

Before L1 executes:

Query user_ai_mode
If ai_enabled=false -> log the message, skip AI, a human handles it
If ai_enabled=true -> continue to the agent
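
The gate can be sketched as a pure function over the user_ai_mode row. Two behaviours here are assumptions rather than documented rules: a missing row is treated as AI enabled (matching the column default), and a past auto_resume_at re-enables the AI. The function name and return values are hypothetical; logging the inbound message stays with the caller in both branches.

```python
from datetime import datetime, timezone


def ai_gate(mode_row, now):
    """Decide who handles the next inbound message.

    mode_row is the user's user_ai_mode row as a dict, or None when no row
    exists. Returns "run_agent" or "human_handles".
    """
    if mode_row is None or mode_row["ai_enabled"]:
        return "run_agent"
    resume_at = mode_row.get("auto_resume_at")
    # Assumption: once auto_resume_at has passed, control returns to the AI.
    if resume_at is not None and now >= resume_at:
        return "run_agent"
    return "human_handles"


if __name__ == "__main__":
    paused = {"ai_enabled": False, "auto_resume_at": None}
    print(ai_gate(paused, datetime.now(timezone.utc)))
```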

Context Preservation — Memory Continuity

All messages are recorded, whether from AI or a human.

SQL
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  direction TEXT NOT NULL,
  source TEXT NOT NULL,
  message_text TEXT NOT NULL,
  handled_by TEXT,
  ai_paused_auto BOOLEAN DEFAULT false,
  admin_id UUID,
  trace_id UUID,
  cost_usd DECIMAL(10, 6),
  tokens_used INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);

Operational guarantees:

No message loss
No double reply
No context reset after resume
Reversible control anytime
Clear audit trail

5.11 RAG & Knowledge Retrieval Architecture

5.11.1 Problem Statement

Problems in the initial implementation:

Chunks too small (~1,000 chars)
Low topK (4)
Clinical reasoning chain fragmented
Main agent too monolithic (42K+ word prompt)

Impact:

Low continuity
Unstable reasoning depth

5.11.2 Phase 1 — Improved Embedding RAG

Configuration upgrade:

Chunk size ~8,000 chars (~1,024 tokens)
Overlap 200-500 chars
topK Reviv = 15
topK for other agents = 4
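
Character-based chunking with overlap, as configured above, can be sketched like this. The function name and the default overlap of 300 (picked from inside the 200-500 range) are assumptions; a real pipeline might also prefer to split on paragraph or sentence boundaries.

```python
def chunk_text(text, chunk_size=8000, overlap=300):
    """Split text into fixed-size character chunks that share an overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap repeats the tail of one chunk at the head of the next, so a reasoning step that straddles a boundary still appears whole in at least one chunk.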

Outcome:

Better semantic grouping
Reduced reasoning break
Tactical improvement without an overhaul

5.11.3 Phase 2 — Deterministic Full-Document Loading

Flow:

Orchestrator extracts keywords
Keywords are matched against a JSON tag mapping
Load full KB files (not chunks)
Inject 6-10 relevant documents into the reasoning agent
Embedding RAG remains the fallback
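
The flow above could be sketched as follows. The tag-map shape (tag -> list of KB file names), the whole-word matching rule, ranking by number of matching tags, and the load_file and embedding_fallback callables are all assumptions made for the example.

```python
import re


def select_documents(message, tag_map, max_docs=10):
    """Pick KB files whose tags appear in the message, best match first."""
    text = message.lower()
    hits = {}
    for tag, files in tag_map.items():
        # Whole-word match so a short tag such as "htn" is not found inside
        # unrelated words.
        if re.search(r"\b" + re.escape(tag.lower()) + r"\b", text):
            for name in files:
                hits[name] = hits.get(name, 0) + 1
    # More matching tags first; file name breaks ties deterministically.
    return sorted(hits, key=lambda name: (-hits[name], name))[:max_docs]


def retrieve(message, tag_map, load_file, embedding_fallback, max_docs=10):
    """Load full documents on a tag match, else fall back to embedding RAG."""
    files = select_documents(message, tag_map, max_docs)
    if not files:
        return {"mode": "embedding_fallback", "documents": embedding_fallback(message)}
    return {"mode": "full_document", "documents": [load_file(name) for name in files]}
```

Because selection depends only on the message text and the tag map, the same input always loads the same documents, which is what makes this path deterministic compared with similarity search.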

5.11.4 Tag-Based Retrieval Structure

Example tags in a KB file:

htn
cardiovascular
hypertension workup
lab panel
chronic disease

5.11.5 Multi-Agent Compatibility

RAG can be called by the orchestrator, domain agents, or sub-specialists.

5.11.6 Trade-Off Analysis

Approach	Continuity	Determinism	Flexibility	Risk
Small chunk + topK 4	Low	Low	High	High miss
8K chunk + topK 15	Medium-High	Medium	High	Acceptable
Full-doc + tag match	Very High	High	Medium	Low miss

Recommendation:

Phase 1: Re-embed + overlap tuning (immediate)
Phase 2: Keyword -> full doc loading (strategic)

5.11.7 Monitoring & Token Consideration

Required:

Per-agent token tracking
Historical usage dashboard
Cost per execution logging
Model routing optimization

5.11.8 Final Positioning

RAG platform = hybrid retrieval system:

Deterministic tag matching (primary)
Full document loading (primary reasoning context)
Embedding semantic recall (fallback)
Multi-agent orchestrated reasoning

With this approach, the platform is ready for medical-grade reasoning, highly regulated domains, and scaling without loss of quality.