CPS Platform — Knowledge Hub

📊 Corpus Status · As of March 15, 2026

9

Books fully processed

of 145 total (6%)

7,397

Enriched paragraphs

glance_text + viral_score

2,256

Videos in DB

11 fully enriched

123K

Video segments

cron OFF — pending test

914

Quran cross-refs

528 unique verses

143

Topic categories

semantic search active

Books indexed: 9 / 145 (6%)

Videos enriched: 11 / 2,256 (0.5%)

Paragraphs with viral_score: 5,551 / 7,397 (75%)

⚡ Running Services

Ask Maulana — Public Chatbot

ask.spiritualmessage.org · 7-strategy retrieval · 9 books + 2,256 videos

PUBLIC LIVE

Annotation Tool v2

port 5000 · annotate.spiritualmessage.org · admin only

ACTIVE

Book Reader

public paragraph URLs · /read/{slug}/{ch}/{para} · no login

annotate.spiritualmessage.org/read/{slug}/{ch}/{para}

NEW v0.18.0

AI Policy Page

public transparency page · attribution rules · citation explanation

annotate.spiritualmessage.org/ai-policy

NEW v0.18.5

RagFlow

port 9380 · chat.spiritualmessage.org · cpsglobal.org

PRODUCTION — don't touch

LightRAG Annotations

port 9622 · single knowledge graph source

ACTIVE

LiveKit Voice Agent

port 7880 · livekit.spiritualmessage.org

ACTIVE

Video Cron

OFF — pending 3-video end-to-end test (1 EN + 1 UR + 1 mixed)

CRON OFF

🏆 Sefaria Sprint · March 15, 2026 · 7 releases delivered, covering 6 points of the 18-point plan

v0.18.0

P1 — Stable Paragraph Ref IDs + Public Read URLs

Format: {book-slug}:{ch-order}:{para-order} · All 7,397 paras addressable · Zero migration · Inspired by Sefaria sefaria.org/Genesis.1.1
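
The ref ID format above maps directly onto the public reader path. A minimal sketch of that mapping, assuming chapter and paragraph orders are integers; `ParagraphRef`, `parse_ref`, `read_path`, and the slug `example-book` are hypothetical names for illustration, not the tool's actual code:

```python
from typing import NamedTuple


class ParagraphRef(NamedTuple):
    book_slug: str
    chapter_order: int
    para_order: int


def parse_ref(ref_id: str) -> ParagraphRef:
    """Split '{book-slug}:{ch-order}:{para-order}' into its three parts."""
    # rsplit from the right so only the last two colons act as separators.
    slug, chapter, para = ref_id.rsplit(":", 2)
    if not slug:
        raise ValueError(f"empty book slug in {ref_id!r}")
    return ParagraphRef(slug, int(chapter), int(para))


def read_path(ref: ParagraphRef) -> str:
    """Build the public reader path /read/{slug}/{ch}/{para}."""
    return f"/read/{ref.book_slug}/{ref.chapter_order}/{ref.para_order}"
```

Because the ID is derived from slug and ordering alone, no extra column is needed to resolve it, which is consistent with the "zero migration" note above.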

v0.18.1

P6 — Activated 3 Unused Centric Types in Chatbot

argument_centric (10 issues) + creation_plan_centric (14 aspects) + timeline_centric (5 eras) · all three were built but never called by the chatbot

v0.18.2

P6 — Semantic Centric Matching (Industry Standard)

Replaced keyword detection with embedding similarity · "Can Muslims vote?" → political_islam · 8/8 hard paraphrase queries pass
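
Embedding-similarity matching of this kind can be sketched as a nearest-label lookup by cosine similarity. This is an illustration under stated assumptions: the vectors are supplied by the caller (a real system would get them from an embedding model), and `match_centric` and the 0.5 threshold are hypothetical, not the project's implementation:

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_centric(
    query_vec: list[float],
    label_vecs: dict[str, list[float]],
    threshold: float = 0.5,  # assumed cut-off; would need tuning on real data
) -> Optional[str]:
    """Return the label whose vector is most similar to the query, or None."""
    best_label, best_score = None, threshold
    for label, vec in label_vecs.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

Unlike keyword detection, a paraphrase with no shared words can still match, as long as its embedding lands near the right label.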

v0.18.3

P14 — AI Tone Policy + Attribution Rules

"You are Maulana" → "You are a guide presenting his teachings" · Never "Islam says X" · Always "Maulana argues..."

v0.18.4

P15 — Failed Citation Flag [CITATION_UNVERIFIED]

Bad citations no longer silently stripped · Visible orange ⚠ badge · Hover tooltip · Transparency over clean appearance
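
The flag-instead-of-strip behaviour can be sketched as a single regex pass. The inline citation syntax `[ref:...]` and the function name `flag_unverified` are assumptions made for this example; only the `[CITATION_UNVERIFIED]` marker comes from the release note above:

```python
import re

# Assumed inline citation syntax for this sketch: [ref:book-slug:ch:para]
CITATION = re.compile(r"\[ref:([^\]]+)\]")
FLAG = "[CITATION_UNVERIFIED]"


def flag_unverified(answer: str, known_refs: set[str]) -> str:
    """Keep every citation; append FLAG to those not found in known_refs."""

    def mark(match: re.Match) -> str:
        if match.group(1) in known_refs:
            return match.group(0)
        # Do not strip the bad citation: leave it visible and flag it.
        return f"{match.group(0)} {FLAG}"

    return CITATION.sub(mark, answer)
```

A frontend could then render the marker as the orange ⚠ badge with a tooltip.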

v0.18.5

P10 — AI Trust Layer (Feedback + Policy Page)

"⚠ Bad citation" reason button · AI disclosure badge on every answer · /ai-policy public page

v0.18.6

P5 — Topic Citations Ranked by viral_score

Highest-quality paragraphs surface first · 1 batch DB query per topic · 75% coverage (5,551 paras have viral_score)
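
A sketch of the one-batch-query-per-topic ranking, using SQLite as in ADR 001. The table and column names (`paragraph`, `topic_paragraph`, `ref_id`) are hypothetical; paragraphs without a viral_score, the roughly 25% not yet covered, sort last:

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a made-up schema, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE paragraph (ref_id TEXT PRIMARY KEY, viral_score REAL);
        CREATE TABLE topic_paragraph (topic TEXT, ref_id TEXT);
        INSERT INTO paragraph VALUES
            ('book-a:1:1', 0.4), ('book-a:1:2', 0.9), ('book-b:2:1', NULL);
        INSERT INTO topic_paragraph VALUES
            ('patience', 'book-a:1:1'), ('patience', 'book-a:1:2'),
            ('patience', 'book-b:2:1');
        """
    )
    return conn


def ranked_citations(
    conn: sqlite3.Connection, topic: str, limit: int = 5
) -> list[tuple[str, Optional[float]]]:
    """One query per topic: highest viral_score first, unscored rows last."""
    return conn.execute(
        """
        SELECT p.ref_id, p.viral_score
        FROM topic_paragraph tp
        JOIN paragraph p ON p.ref_id = tp.ref_id
        WHERE tp.topic = ?
        ORDER BY p.viral_score IS NULL, p.viral_score DESC, p.ref_id
        LIMIT ?
        """,
        (topic, limit),
    ).fetchall()
```

Sorting in SQL keeps the work to a single round trip per topic rather than one lookup per paragraph.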

🎯 What's Next — Priority Order

1

Upload 1 new book → verify full enrichment

Done when: glance_text ✓, viral_score ✓, ref URL returns 200, chatbot finds it, centric exports rebuilt

2

Test 3 videos with Soniox API (1 EN + 1 UR + 1 mixed)

Done when: transcription complete, segments in DB, LEARNINGS.md updated

3

Enable video cron

Only after 3-video test passes · Soniox STT → Fireflies fallback

4

Process remaining 136 books

Upload → Phase B enrichment → LightRAG → centric exports auto-rebuild

5

Build 50-question Islamic eval dataset (P9)

Foundation for measuring if any improvement is actually working

6

Dual-LLM citation validation (P4)

After answer generation, cheap LLM scores each cited para 1-10. Flag ≤5

7

Update centric exporters to include enriched video segments

argument_centric + creation_plan_centric currently books only
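
Item 6 above (dual-LLM citation validation) can be sketched as a scoring pass over the cited paragraphs. The scorer is passed in as a function so that a cheap LLM call could be plugged in; `overlap_scorer` below is a deliberately crude word-overlap stand-in, not an LLM, and all names are hypothetical:

```python
from typing import Callable

# (answer, paragraph_text) -> integer score from 1 to 10
Scorer = Callable[[str, str], int]


def overlap_scorer(answer: str, paragraph: str) -> int:
    """Stand-in scorer: two points per shared word, clamped to 1..10."""
    shared = set(answer.lower().split()) & set(paragraph.lower().split())
    return max(1, min(10, 2 * len(shared)))


def flag_weak_citations(
    answer: str,
    cited: dict[str, str],  # ref_id -> paragraph text
    score_fn: Scorer,
    flag_at_or_below: int = 5,
) -> dict[str, int]:
    """Return {ref_id: score} for every citation scoring at or below the cut-off."""
    flagged = {}
    for ref_id, text in cited.items():
        score = score_fn(answer, text)
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} for {ref_id} is outside 1..10")
        if score <= flag_at_or_below:
            flagged[ref_id] = score
    return flagged
```

Flagged refs would go to review rather than being removed from the answer.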

📐 18-Point Sefaria Comparison Plan · RESEARCH/EXPLORATION-PLAN-18-POINTS.md

✓

P1 — Stable Ref IDs + public paragraph URLs

Done v0.18.0 · March 15, 2026

2

P2 — Daily Telegram cron (7am IST best_quote)

Deferred · All ingredients ready (bot token, chat_id, viral_score)

3

P3 — Detection accuracy benchmark (NER F-score)

After full corpus · quran_detector + hadith_detector baseline

4

P4 — Dual-LLM citation validation

Score each cited para 1-10 · Flag ≤5 for review

✓

P5 — Rank topic citations by viral_score

Done v0.18.6 · March 15, 2026

✓

P6 — Activate 3 unused centric types + semantic matching

Done v0.18.1 + v0.18.2 · March 15, 2026

7

P7 — Citation verifiability (click to verify)

Partially done via /read/{slug}/{ch}/{para} URLs · frontend link pending

8

P8 — Embedding benchmark (E5 vs Gemini vs Qwen3)

After full corpus

9

P9 — 50-question Islamic eval dataset

Foundation for all quality measurement

✓

P10 — AI trust layer (feedback + policy page)

Done v0.18.5 · March 15, 2026

11

P11 — Scholar review workflow

enrichment_review table + /review/enrichment UI

12

P12 — Link type taxonomy on Reference table

quotation / commentary / discussion / mention

13

P13 — PageRank weight on paragraphs

citation_count = appearances across all 6 centric exports · blend into reranking

✓

P14 — AI tone policy in system prompt

Done v0.18.3 · March 15, 2026

✓

P15 — Failed citation flag [CITATION_UNVERIFIED]

Done v0.18.4 · March 15, 2026

16

P16 — Islamic calendar (Ramadan + Juma content)

After Telegram cron is running

17

P17 — Read-only public API

/api/v1/paragraphs · /api/v1/topics · /api/v1/search

18

P18 — External Linker JS

linker.js for Islamic websites to auto-link Quran/Hadith citations · Like Sefaria Linker on 150+ sites
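
For P13 above, one way to turn citation_count into a reranking signal is a simple linear blend. This sketch is a count-based weight, not PageRank proper. It assumes citation_count means the number of centric exports a paragraph appears in, and that relevance scores lie in 0..1; the 0.2 weight and all function names are hypothetical:

```python
from collections import Counter


def citation_counts(exports: dict[str, set[str]]) -> Counter:
    """Count, per ref ID, how many centric exports contain it."""
    counts: Counter = Counter()
    for refs in exports.values():
        counts.update(refs)
    return counts


def rerank(
    candidates: dict[str, float],  # ref_id -> retrieval relevance in 0..1
    counts: Counter,
    weight: float = 0.2,  # assumed share given to citation_count
) -> list[str]:
    """Order ref IDs by a blend of relevance and normalised citation_count."""
    max_count = max(counts.values(), default=0)

    def blended(ref_id: str) -> float:
        authority = counts[ref_id] / max_count if max_count else 0.0
        return (1 - weight) * candidates[ref_id] + weight * authority

    # Ties break on ref ID so the ordering is deterministic.
    return sorted(candidates, key=lambda r: (-blended(r), r))
```

With `weight=0` the ordering falls back to plain relevance, which makes the effect of the blend easy to measure against the eval dataset planned in P9.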

📋 Project Context · /root/critique/

🌟

VISION.md

Mission, north star, guiding principles

📍

CURRENT.md

v0.18.6 status · what's done · what's next

⚖️

DECISIONS.md

Architectural decisions with reasoning

🔥

LEARNINGS.md

Hard-won lessons — never repeat these

🔄

SESSION_CONTEXT.md

AI session handover — read first every session

🤝

HANDOVER.md

Server access · passwords template · bus factor

🔬 Research Documents · /root/critique/RESEARCH/ · Created March 15, 2026

📐

18-Point Action Plan

Sefaria vs us · gap analysis · exact done criteria per point

46KB

🏛️

Sefaria Comparison

Where we beat Sefaria · where they beat us · 9-stage analysis

26KB

🔧

Pipeline Documentation

Evidence-based · code-verified · every strategy documented

25KB

📖

Sefaria Research V2

Corrected facts · 775K users · 3.3M links · Daf Yomi ecosystem

17KB

🗂️

File Directory

Every file in annotation_tool_v2 with what it does

23KB

🕸️

Interlinking Architecture

6 centric exporters explained · which are active vs dormant

9KB

🧭

Enrichment Master Plan

The single pillar (source_chain) + 7-phase roadmap · Sefaria comparison · March 19 2026

13KB

🏛️ Architecture Decision Records · NASA / Google / AWS standard · /docs/decisions/

001

SQLite over PostgreSQL

Zero config, nightly backup, sufficient for scale

002

Claude Code as Pipeline LLM

Zero API cost for /process and /process_video

003

LightRAG insert_custom_kg()

We control chunking + entity extraction quality

004

512–800 Token Keyword Grouping

Respects chapter boundaries, semantically coherent

005

OpenRouter + DeepSeek Fallback

Cost resilience when primary model unavailable

006

Docker COPY not Bind Mounts

Note: App dir IS bind-mounted for live reload

007

Semantic Centric Matching

Industry standard · match_argument_by_embedding() · same pattern as topic_centric
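
ADR 004 above (512–800 token grouping that respects chapter boundaries) can be illustrated with a greedy packer. This is a sketch only: it approximates tokens by whitespace-separated words, enforces just the 800-token ceiling, and leaves out the keyword aspect, which the record does not detail. `group_paragraphs` is a hypothetical name:

```python
def group_paragraphs(
    paragraphs: list[tuple[int, str]],  # (chapter_order, text) in reading order
    max_tokens: int = 800,
) -> list[list[str]]:
    """Pack consecutive paragraphs into chunks that never span two chapters."""
    chunks: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    current_chapter = None
    for chapter, text in paragraphs:
        tokens = len(text.split())  # crude stand-in for a real tokenizer
        crosses_chapter = bool(current) and chapter != current_chapter
        too_big = bool(current) and current_tokens + tokens > max_tokens
        if crosses_chapter or too_big:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
        current_chapter = chapter
    if current:
        chunks.append(current)
    return chunks
```

A single paragraph longer than the ceiling still becomes its own chunk; splitting inside a paragraph is out of scope here.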

📚 Technical Documentation · /root/annotation_tool_v2/docs/

🗺️

GUIDE.md

Developer manual — start here

🏗️

ARCHITECTURE.md

System diagram + data flow

🎬

VIDEO_PIPELINE.md

Pipeline steps, states, recovery

⚔️

WAR_STORIES.md

NASA-style post-incident log

🤖

MODEL_SELECTION.md

Per-step model routing guide

🗄️

DATABASE.md

Schema, migrations, backups

⚙️

PROCESS_COMMAND.md

/process command internals

🔌

API_REFERENCE.md

All endpoints documented

📝

CHANGELOG.md

v0.18.6 · rollback commands per version

🖥️ Infrastructure

🌍

Hetzner Cloud

Helsinki · 32GB RAM · 301GB disk · ~$20/mo · 147GB used (51%)

💾

Backup

Nightly → Storage Box BX11 · Weekly snapshots

🔒

SSL

Certbot · All subdomains · Auto-renew

🐳

Docker

annotation_tool_v2 · ragflow · lightrag · memgraph · livekit

📡

Telegram Bot

@junaid2bot · chat_id 1301565858 · file transfer + git commands

🔀

Nginx

Reverse proxy · 6 subdomains · SSL termination
