CPS Platform: Knowledge Hub
spiritualmessage.org: Internal Documentation & Status
v0.18.6: Sefaria Sprint
Live
Updated: March 15, 2026
"A seeker anywhere in the world types a question, and receives an answer grounded in Maulana's actual words, with citation they can verify."
The North Star · VISION.md
Corpus Status
As of March 15, 2026
Books fully processed: 9 of 145 total (6%)
Enriched paragraphs: 7,397 (glance_text + viral_score)
Videos in DB: 2,256 (11 fully enriched)
Video segments: 123K (cron OFF, pending test)
Quran cross-refs: 914 (528 unique verses)
Topic categories: 143 (semantic search active)
Books indexed: 9 / 145 (6%)
Videos enriched: 11 / 2,256 (0.5%)
Paragraphs with viral_score: 5,551 / 7,397 (75%)
Running Services
Ask Maulana: Public Chatbot
ask.spiritualmessage.org · 7-strategy retrieval · 9 books + 2,256 videos
PUBLIC LIVE
Annotation Tool v2
port 5000 · annotate.spiritualmessage.org · admin only
ACTIVE
Book Reader
public paragraph URLs · /read/{slug}/{ch}/{para} · no login
annotate.spiritualmessage.org/read/{slug}/{ch}/{para}
NEW v0.18.0
AI Policy Page
public transparency page · attribution rules · citation explanation
annotate.spiritualmessage.org/ai-policy
NEW v0.18.5
RagFlow
port 9380 · chat.spiritualmessage.org · cpsglobal.org
PRODUCTION: don't touch
LightRAG Annotations
port 9622 · single knowledge graph source
ACTIVE
LiveKit Voice Agent
port 7880 · livekit.spiritualmessage.org
ACTIVE
Video Cron
OFF, pending 3-video end-to-end test (1 EN + 1 UR + 1 mixed)
CRON OFF
Sefaria Sprint: March 15, 2026
7 releases delivered (v0.18.0 to v0.18.6), covering 6 points of the 18-point plan
v0.18.0
P1: Stable Paragraph Ref IDs + Public Read URLs
Format: {book-slug}:{ch-order}:{para-order} · All 7,397 paras addressable · Zero migration · Inspired by Sefaria (sefaria.org/Genesis.1.1)
v0.18.1
P6: Activated 3 Unused Centric Types in Chatbot
argument_centric (10 issues) + creation_plan_centric (14 aspects) + timeline_centric (5 eras); all three were built but never called
v0.18.2
P6: Semantic Centric Matching (Industry Standard)
Replaced keyword detection with embedding similarity · "Can Muslims vote?" → political_islam · 8/8 hard paraphrase queries pass
v0.18.3
P14: AI Tone Policy + Attribution Rules
"You are Maulana" → "You are a guide presenting his teachings" · Never "Islam says X" · Always "Maulana argues..."
v0.18.4
P15: Failed Citation Flag [CITATION_UNVERIFIED]
Bad citations no longer silently stripped · Visible orange badge · Hover tooltip · Transparency over clean appearance
v0.18.5
P10: AI Trust Layer (Feedback + Policy Page)
"Bad citation" reason button · AI disclosure badge on every answer · /ai-policy public page
v0.18.6
P5: Topic Citations Ranked by viral_score
Highest-quality paragraphs surface first · 1 batch DB query per topic · 75% coverage (5,551 paras have viral_score)
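The P1 ref format ({book-slug}:{ch-order}:{para-order}) maps one-to-one onto the public /read/{slug}/{ch}/{para} URL. A minimal sketch of that mapping follows. The slug rule (lowercase alphanumerics joined by hyphens), the https scheme, and the names parse_ref and read_url are assumptions for illustration, not taken from the codebase.

```python
import re
from typing import NamedTuple

# Assumption: slugs are lowercase alphanumerics separated by hyphens,
# and chapter/paragraph orders are non-negative integers.
REF_PATTERN = re.compile(r"^([a-z0-9]+(?:-[a-z0-9]+)*):(\d+):(\d+)$")

# Host taken from the Book Reader entry; the https scheme is assumed.
BASE_URL = "https://annotate.spiritualmessage.org"


class ParagraphRef(NamedTuple):
    slug: str
    chapter: int
    paragraph: int


def parse_ref(ref: str) -> ParagraphRef:
    """Split '{book-slug}:{ch-order}:{para-order}' into typed parts."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a valid paragraph ref: {ref!r}")
    slug, chapter, paragraph = match.groups()
    return ParagraphRef(slug, int(chapter), int(paragraph))


def read_url(ref: str) -> str:
    """Build the public /read/{slug}/{ch}/{para} URL for a ref ID."""
    parsed = parse_ref(ref)
    return f"{BASE_URL}/read/{parsed.slug}/{parsed.chapter}/{parsed.paragraph}"
```

Because the ref ID is derived from values that already exist (slug and ordering), a scheme like this needs no new column, which is consistent with the "zero migration" note.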
What's Next: Priority Order
1. Upload 1 new book, then verify full enrichment
Done when: glance_text ✓, viral_score ✓, ref URL returns 200, chatbot finds it, centric exports rebuilt
2. Test 3 videos with Soniox API (1 EN + 1 UR + 1 mixed)
Done when: transcription complete, segments in DB, LEARNINGS.md updated
3. Enable video cron
Only after 3-video test passes · Soniox STT → Fireflies fallback
4. Process remaining 136 books
Upload → Phase B enrichment → LightRAG → centric exports auto-rebuild
5. Build 50-question Islamic eval dataset (P9)
Foundation for measuring whether any improvement is actually working
6. Dual-LLM citation validation (P4)
After answer generation, a cheap LLM scores each cited para 1-10. Flag scores ≤5
7. Update centric exporters to include enriched video segments
argument_centric + creation_plan_centric currently cover books only
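The dual-LLM validation in item 6 (score each cited paragraph 1-10, flag anything at 5 or below) could take roughly the following shape. This is a sketch under stated assumptions: score_fn stands in for the cheap-LLM call and is hypothetical, as are the function name and the result layout.

```python
from typing import Callable, Dict, List

FLAG_THRESHOLD = 5  # per the plan: scores of 5 or below are flagged


def validate_citations(
    answer: str,
    cited_paragraphs: Dict[str, str],
    score_fn: Callable[[str, str], int],
) -> List[dict]:
    """Score every cited paragraph against the answer and flag weak ones.

    score_fn is a hypothetical wrapper around the cheap LLM; it must return
    an integer from 1 (unsupported) to 10 (fully supported).
    """
    results = []
    for ref, paragraph_text in cited_paragraphs.items():
        score = score_fn(answer, paragraph_text)
        if not 1 <= score <= 10:
            # Reject malformed model output instead of silently trusting it.
            raise ValueError(f"score for {ref} out of range: {score}")
        results.append(
            {"ref": ref, "score": score, "flagged": score <= FLAG_THRESHOLD}
        )
    return results
```

Keeping the scorer as an injected callable means the threshold logic can be tested with a stub before any model is wired in.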
18-Point Sefaria Comparison Plan
RESEARCH/EXPLORATION-PLAN-18-POINTS.md
✓ P1: Stable Ref IDs + public paragraph URLs
Done v0.18.0 · March 15, 2026
P2: Daily Telegram cron (7am IST best_quote)
Deferred · All ingredients ready (bot token, chat_id, viral_score)
P3: Detection accuracy benchmark (NER F-score)
After full corpus · quran_detector + hadith_detector baseline
P4: Dual-LLM citation validation
Score each cited para 1-10 · Flag ≤5 for review
✓ P5: Rank topic citations by viral_score
Done v0.18.6 · March 15, 2026
✓ P6: Activate 3 unused centric types + semantic matching
Done v0.18.1 + v0.18.2 · March 15, 2026
P7: Citation verifiability (click to verify)
Partially done via /read/{slug}/{ch}/{para} URLs · frontend link pending
P8: Embedding benchmark (E5 vs Gemini vs Qwen3)
After full corpus
P9: 50-question Islamic eval dataset
Foundation for all quality measurement
✓ P10: AI trust layer (feedback + policy page)
Done v0.18.5 · March 15, 2026
P11: Scholar review workflow
enrichment_review table + /review/enrichment UI
P12: Link type taxonomy on Reference table
quotation / commentary / discussion / mention
P13: PageRank weight on paragraphs
citation_count = appearances across all 6 centric exports · blend into reranking
✓ P14: AI tone policy in system prompt
Done v0.18.3 · March 15, 2026
✓ P15: Failed citation flag [CITATION_UNVERIFIED]
Done v0.18.4 · March 15, 2026
P16: Islamic calendar (Ramadan + Juma content)
After Telegram cron is running
P17: Read-only public API
/api/v1/paragraphs · /api/v1/topics · /api/v1/search
P18: External Linker JS
linker.js for Islamic websites to auto-link Quran/Hadith citations · Like Sefaria Linker on 150+ sites
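For the P3 detection benchmark, the F-score compares the references a detector found against a hand-labeled gold set. A minimal sketch of the standard precision/recall/F1 computation follows. Representing each detected reference as a string and the function name are assumptions; the actual quran_detector and hadith_detector output format is not shown in this document.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for detected vs. gold reference sets."""
    true_positives = len(predicted & gold)
    # Guard against empty sets so an empty prediction scores 0 rather than crashing.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Reporting precision and recall alongside F1 shows whether a detector errs toward missed references or toward false matches, which matters when deciding what to fix first.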
Project Context
/root/critique/
VISION.md
Mission, north star, guiding principles
CURRENT.md
v0.18.6 status · what's done · what's next
DECISIONS.md
Architectural decisions with reasoning
LEARNINGS.md
Hard-won lessons: never repeat these
SESSION_CONTEXT.md
AI session handover: read first every session
HANDOVER.md
Server access · passwords template · bus factor
Research Documents
/root/critique/RESEARCH/ · Created March 15, 2026
18-Point Action Plan
Sefaria vs us · gap analysis · exact done criteria per point
46KB
Sefaria Comparison
Where we beat Sefaria · where they beat us · 9-stage analysis
26KB
Pipeline Documentation
Evidence-based · code-verified · every strategy documented
25KB
Sefaria Research V2
Corrected facts · 775K users · 3.3M links · Daf Yomi ecosystem
17KB
File Directory
Every file in annotation_tool_v2 with what it does
23KB
Interlinking Architecture
6 centric exporters explained · which are active vs dormant
9KB
Enrichment Master Plan
The single pillar (source_chain) + 7-phase roadmap · Sefaria comparison · March 19, 2026
13KB
Architecture Decision Records
NASA / Google / AWS standard · /docs/decisions/
001: SQLite over PostgreSQL
Zero config, nightly backup, sufficient for scale
002: Claude Code as Pipeline LLM
Zero API cost for /process and /process_video
003: LightRAG insert_custom_kg()
We control chunking + entity extraction quality
004: 512-800 Token Keyword Grouping
Respects chapter boundaries, semantically coherent
005: OpenRouter + DeepSeek Fallback
Cost resilience when primary model unavailable
006: Docker COPY not Bind Mounts
Note: App dir IS bind-mounted for live reload
007: Embedding Similarity over Keywords for Centric Matching
Industry standard · match_argument_by_embedding() · same pattern as topic_centric
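The idea behind ADR 007, matching a query to a category by embedding similarity instead of keywords, can be illustrated with plain cosine similarity. This sketch does not reproduce match_argument_by_embedding(); the function names, the 0.5 threshold, and the assumption that embeddings arrive as plain float sequences are all illustrative.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def match_category(
    query_vec: Sequence[float],
    category_vecs: Dict[str, Sequence[float]],
    min_similarity: float = 0.5,  # assumed cutoff, to be tuned on real queries
) -> Optional[Tuple[str, float]]:
    """Return (category, similarity) for the closest category, or None."""
    best: Optional[Tuple[str, float]] = None
    for name, vec in category_vecs.items():
        similarity = cosine_similarity(query_vec, vec)
        if best is None or similarity > best[1]:
            best = (name, similarity)
    if best is None or best[1] < min_similarity:
        return None  # nothing close enough; caller falls back to other strategies
    return best
```

A paraphrase such as "Can Muslims vote?" shares no keywords with a category label like political_islam, but its embedding can still land closest to that category's vector, which is the behavior the 8/8 paraphrase test checks for.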
Technical Documentation
/root/annotation_tool_v2/docs/
GUIDE.md
Developer manual: start here
ARCHITECTURE.md
System diagram + data flow
VIDEO_PIPELINE.md
Pipeline steps, states, recovery
WAR_STORIES.md
NASA-style post-incident log
MODEL_SELECTION.md
Per-step model routing guide
DATABASE.md
Schema, migrations, backups
PROCESS_COMMAND.md
/process command internals
API_REFERENCE.md
All endpoints documented
CHANGELOG.md
v0.18.6 · rollback commands per version
Infrastructure
Hetzner Cloud
Helsinki · 32GB RAM · 301GB disk · ~$20/mo · 147GB used (51%)
Backup
Nightly → Storage Box BX11 · Weekly snapshots
SSL
Certbot · All subdomains · Auto-renew
Docker
annotation_tool_v2 · ragflow · lightrag · memgraph · livekit
Telegram Bot
@junaid2bot · chat_id 1301565858 · file transfer + git commands
Nginx
Reverse proxy · 6 subdomains · SSL termination