🕌 CPS Platform - Knowledge Hub

spiritualmessage.org - Internal Documentation & Status
v0.18.6 - Sefaria Sprint · Live · Updated: March 15, 2026
"A seeker anywhere in the world types a question, and receives an answer grounded in Maulana's actual words, with citation they can verify."
- The North Star · VISION.md
📊 Corpus Status · As of March 15, 2026
Books fully processed: 9 of 145 total (6%)
Enriched paragraphs: 7,397 (glance_text + viral_score)
Videos in DB: 2,256 (11 fully enriched)
Video segments: 123K (cron OFF, pending test)
Quran cross-refs: 914 (528 unique verses)
Topic categories: 143 (semantic search active)
Books indexed: 9 / 145 (6%)
Videos enriched: 11 / 2,256 (0.5%)
Paragraphs with viral_score: 5,551 / 7,397 (75%)
⚑ Running Services
Ask Maulana β€” Public Chatbot
ask.spiritualmessage.org Β· 7-strategy retrieval Β· 9 books + 2,256 videos
ask.spiritualmessage.org
PUBLIC LIVE
Annotation Tool v2
port 5000 Β· annotate.spiritualmessage.org Β· admin only
annotate.spiritualmessage.org
ACTIVE
Book Reader
public paragraph URLs Β· /read/{slug}/{ch}/{para} Β· no login
annotate.spiritualmessage.org/read/{slug}/{ch}/{para}
NEW v0.18.0
AI Policy Page
public transparency page Β· attribution rules Β· citation explanation
annotate.spiritualmessage.org/ai-policy
NEW v0.18.5
RagFlow
port 9380 Β· chat.spiritualmessage.org Β· cpsglobal.org
PRODUCTION β€” don't touch
LightRAG Annotations
port 9622 Β· single knowledge graph source
ACTIVE
LiveKit Voice Agent
port 7880 Β· livekit.spiritualmessage.org
ACTIVE
Video Cron
OFF β€” pending 3-video end-to-end test (1 EN + 1 UR + 1 mixed)
CRON OFF
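As a minimal sketch of how a paragraph ref could map onto the Book Reader's public URL pattern /read/{slug}/{ch}/{para}: the ref format {book-slug}:{ch-order}:{para-order} is the one given in the P1 sprint notes, while the https scheme, the helper names parse_ref and read_url, and the slug example-book are assumptions made for illustration, not the platform's actual code.

```python
from typing import NamedTuple

# Host taken from the service list; the https scheme is an assumption.
BASE_URL = "https://annotate.spiritualmessage.org"


class ParagraphRef(NamedTuple):
    slug: str
    chapter: int
    paragraph: int


def parse_ref(ref: str) -> ParagraphRef:
    """Split a '{book-slug}:{ch-order}:{para-order}' ref into its parts."""
    parts = ref.split(":")
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected 'slug:chapter:paragraph', got {ref!r}")
    slug, chapter, paragraph = parts
    # int() raises ValueError if the chapter or paragraph order is not numeric.
    return ParagraphRef(slug, int(chapter), int(paragraph))


def read_url(ref: str) -> str:
    """Build the public read URL for a paragraph ref."""
    r = parse_ref(ref)
    return f"{BASE_URL}/read/{r.slug}/{r.chapter}/{r.paragraph}"


# "example-book" is a hypothetical slug, not a book in the corpus.
print(read_url("example-book:3:12"))
# https://annotate.spiritualmessage.org/read/example-book/3/12
```

Because the URL is derived purely from the ref, no lookup table is needed, which fits the "zero migration" note on P1.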
🏆 Sefaria Sprint · March 15, 2026 · 7 releases delivered, covering 6 points of the 18-point plan
v0.18.0 · P1 - Stable Paragraph Ref IDs + Public Read URLs
Format: {book-slug}:{ch-order}:{para-order} · All 7,397 paras addressable · Zero migration · Inspired by Sefaria (sefaria.org/Genesis.1.1)
v0.18.1 · P6 - Activated 3 Unused Centric Types in Chatbot
argument_centric (10 issues) + creation_plan_centric (14 aspects) + timeline_centric (5 eras) · These were built but never called
v0.18.2 · P6 - Semantic Centric Matching (Industry Standard)
Replaced keyword detection with embedding similarity · "Can Muslims vote?" → political_islam · 8/8 hard paraphrase queries pass
v0.18.3 · P14 - AI Tone Policy + Attribution Rules
"You are Maulana" → "You are a guide presenting his teachings" · Never "Islam says X" · Always "Maulana argues..."
v0.18.4 · P15 - Failed Citation Flag [CITATION_UNVERIFIED]
Bad citations no longer silently stripped · Visible orange ⚠ badge · Hover tooltip · Transparency over clean appearance
v0.18.5 · P10 - AI Trust Layer (Feedback + Policy Page)
"⚠ Bad citation" reason button · AI disclosure badge on every answer · /ai-policy public page
v0.18.6 · P5 - Topic Citations Ranked by viral_score
Highest-quality paragraphs surface first · 1 batch DB query per topic · 75% coverage (5,551 paras have viral_score)
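The P5 ranking can be sketched as a single sort over the citations fetched for a topic. Since only about 75% of paragraphs carry a viral_score, the sketch below pushes unscored paragraphs to the end instead of dropping them. The function name rank_citations, the dict shape, and the score scale are assumptions for illustration; the source does not show the real query or schema.

```python
from typing import List, Optional, TypedDict


class Citation(TypedDict):
    ref: str                       # paragraph ref, e.g. "book-slug:1:2"
    viral_score: Optional[float]   # None when the paragraph is not yet scored


def rank_citations(citations: List[Citation]) -> List[Citation]:
    """Scored paragraphs first, highest score first; unscored ones last.

    sorted() is stable, so unscored paragraphs keep their original order.
    """
    return sorted(
        citations,
        key=lambda c: (c["viral_score"] is None, -(c["viral_score"] or 0.0)),
    )


# Hypothetical refs and scores, for illustration only.
topic_citations: List[Citation] = [
    {"ref": "example-book:1:1", "viral_score": 0.4},
    {"ref": "example-book:1:2", "viral_score": None},
    {"ref": "example-book:2:1", "viral_score": 0.9},
    {"ref": "example-book:2:2", "viral_score": None},
]

for c in rank_citations(topic_citations):
    print(c["ref"], c["viral_score"])
```

Sorting in application code after one batch query keeps the database access at a single round trip per topic, in line with the note above.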
🎯 What's Next β€” Priority Order
1
Upload 1 new book β†’ verify full enrichment
Done when: glance_text βœ“, viral_score βœ“, ref URL returns 200, chatbot finds it, centric exports rebuilt
2
Test 3 videos with Soniox API (1 EN + 1 UR + 1 mixed)
Done when: transcription complete, segments in DB, LEARNINGS.md updated
3
Enable video cron
Only after 3-video test passes Β· Soniox STT β†’ Fireflies fallback
4
Process remaining 136 books
Upload β†’ Phase B enrichment β†’ LightRAG β†’ centric exports auto-rebuild
5
Build 50-question Islamic eval dataset (P9)
Foundation for measuring if any improvement is actually working
6
Dual-LLM citation validation (P4)
After answer generation, cheap LLM scores each cited para 1-10. Flag ≀5
7
Update centric exporters to include enriched video segments
argument_centric + creation_plan_centric currently books only
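The dual-LLM validation step in item 6 (P4) could look roughly like the sketch below: after the answer is generated, a second scorer rates each cited paragraph from 1 to 10, and anything at 5 or below is flagged. The second LLM is represented here by a plain callable; stub_scorer is a keyword stand-in so the example runs offline, and every name in the block is hypothetical.

```python
from typing import Callable, Dict, List

FLAG_THRESHOLD = 5  # from the plan: scores of 5 or lower are flagged


def flag_weak_citations(
    answer: str,
    cited: Dict[str, str],
    score_fn: Callable[[str, str], int],
    threshold: int = FLAG_THRESHOLD,
) -> List[str]:
    """Return the refs of cited paragraphs scoring at or below the threshold."""
    flagged = []
    for ref, text in cited.items():
        score = score_fn(answer, text)
        if not 1 <= score <= 10:
            raise ValueError(f"score for {ref} out of range: {score}")
        if score <= threshold:
            flagged.append(ref)
    return flagged


def stub_scorer(answer: str, paragraph: str) -> int:
    """Stand-in for the cheap LLM: 8 if a long answer word appears, else 3."""
    answer_words = {w for w in answer.lower().split() if len(w) >= 5}
    return 8 if answer_words & set(paragraph.lower().split()) else 3


# Hypothetical refs and texts, for illustration only.
cited_paragraphs = {
    "example-book:1:1": "patience is rewarded in hardship",
    "example-book:1:2": "the caravan left at dawn",
}
print(flag_weak_citations(
    "Maulana argues that patience matters", cited_paragraphs, stub_scorer
))
# ['example-book:1:2']
```

Passing the scorer in as a callable keeps the threshold logic testable without network access; a real deployment would swap stub_scorer for a call to the cheaper model.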
18-Point Sefaria Comparison Plan · RESEARCH/EXPLORATION-PLAN-18-POINTS.md
✓ P1 - Stable Ref IDs + public paragraph URLs
Done v0.18.0 · March 15, 2026
P2 - Daily Telegram cron (7am IST best_quote)
Deferred · All ingredients ready (bot token, chat_id, viral_score)
P3 - Detection accuracy benchmark (NER F-score)
After full corpus · quran_detector + hadith_detector baseline
P4 - Dual-LLM citation validation
Score each cited para 1-10 · Flag ≤5 for review
✓ P5 - Rank topic citations by viral_score
Done v0.18.6 · March 15, 2026
✓ P6 - Activate 3 unused centric types + semantic matching
Done v0.18.1 + v0.18.2 · March 15, 2026
P7 - Citation verifiability (click to verify)
Partially done via /read/{slug}/{ch}/{para} URLs · frontend link pending
P8 - Embedding benchmark (E5 vs Gemini vs Qwen3)
After full corpus
P9 - 50-question Islamic eval dataset
Foundation for all quality measurement
✓ P10 - AI trust layer (feedback + policy page)
Done v0.18.5 · March 15, 2026
P11 - Scholar review workflow
enrichment_review table + /review/enrichment UI
P12 - Link type taxonomy on Reference table
quotation / commentary / discussion / mention
P13 - PageRank weight on paragraphs
citation_count = appearances across all 6 centric exports · blend into reranking
✓ P14 - AI tone policy in system prompt
Done v0.18.3 · March 15, 2026
✓ P15 - Failed citation flag [CITATION_UNVERIFIED]
Done v0.18.4 · March 15, 2026
P16 - Islamic calendar (Ramadan + Juma content)
After Telegram cron is running
P17 - Read-only public API
/api/v1/paragraphs · /api/v1/topics · /api/v1/search
P18 - External Linker JS
linker.js for Islamic websites to auto-link Quran/Hadith citations · Like Sefaria Linker on 150+ sites
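For P13, one way to read "citation_count = appearances across all 6 centric exports" is to count, for each paragraph ref, how many exports mention it, and then blend that count into the reranking score. The sketch below takes that reading. Counting a paragraph at most once per export, the linear blend, and the 0.2 weight are all assumptions, and only three example exports are shown.

```python
from collections import Counter
from typing import Dict, Iterable


def citation_counts(exports: Dict[str, Iterable[str]]) -> Counter:
    """Count in how many exports each paragraph ref appears."""
    counts: Counter = Counter()
    for refs in exports.values():
        counts.update(set(refs))  # at most one count per export (assumption)
    return counts


def blended_score(
    relevance: float, count: int, n_exports: int, weight: float = 0.2
) -> float:
    """Mix retrieval relevance with the normalized citation count."""
    return (1 - weight) * relevance + weight * (count / n_exports)


# Hypothetical export contents, for illustration only.
exports = {
    "argument_centric": ["example-book:1:1", "example-book:1:2", "example-book:1:1"],
    "timeline_centric": ["example-book:1:1"],
    "topic_centric": ["example-book:2:1"],
}
counts = citation_counts(exports)
for ref, n in counts.most_common():
    print(ref, n, round(blended_score(0.7, n, len(exports)), 3))
```

Normalizing by the number of exports keeps the boost between 0 and the chosen weight, so a heavily cited paragraph cannot outrank one that is far more relevant to the query.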
📋 Project Context · /root/critique/
🔬 Research Documents · /root/critique/RESEARCH/ · Created March 15, 2026
🏛️ Architecture Decision Records · NASA / Google / AWS standard · /docs/decisions/
001 - SQLite over PostgreSQL
Zero config, nightly backup, sufficient for scale
002 - Claude Code as Pipeline LLM
Zero API cost for /process and /process_video
003 - LightRAG insert_custom_kg()
We control chunking + entity extraction quality
004 - 512–800 Token Keyword Grouping
Respects chapter boundaries, semantically coherent
005 - OpenRouter + DeepSeek Fallback
Cost resilience when primary model unavailable
006 - Docker COPY not Bind Mounts
Note: App dir IS bind-mounted for live reload
007 - Embedding Similarity over Keywords for Centric Matching
Industry standard · match_argument_by_embedding() · same pattern as topic_centric
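The idea behind ADR 007 can be illustrated with cosine similarity between a query embedding and one embedding per centric label. This is not the project's match_argument_by_embedding(); it is a self-contained sketch with made-up three-dimensional vectors standing in for real model embeddings, and the helper names and the 0.5 cut-off are assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_label(
    query_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    min_similarity: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return the most similar label and its score, or None below the cut-off."""
    best: Optional[Tuple[str, float]] = None
    for label, vec in label_vecs.items():
        sim = cosine(query_vec, vec)
        if sim >= min_similarity and (best is None or sim > best[1]):
            best = (label, sim)
    return best


# Toy vectors; a real system would obtain these from an embedding model.
labels = {
    "political_islam": [1.0, 0.0, 0.0],
    "other_topic": [0.0, 1.0, 0.0],
}
query = [0.9, 0.1, 0.0]  # stands in for the embedding of "Can Muslims vote?"
print(best_label(query, labels))
```

Unlike keyword detection, this approach matches a paraphrase as long as its embedding lands near the label's embedding, and the cut-off lets the caller fall back to another retrieval strategy when nothing is close enough.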
📚 Technical Documentation · /root/annotation_tool_v2/docs/
🖥️ Infrastructure
Hetzner Cloud
Helsinki · 32GB RAM · 301GB disk · ~$20/mo · 147GB used (51%)
💾 Backup
Nightly → Storage Box BX11 · Weekly snapshots
🔒 SSL
Certbot · All subdomains · Auto-renew
🐳 Docker
annotation_tool_v2 · ragflow · lightrag · memgraph · livekit
📡 Telegram Bot
@junaid2bot · chat_id 1301565858 · file transfer + git commands
🔀 Nginx
Reverse proxy · 6 subdomains · SSL termination