# Interlinking Architecture — How Books Connect to Each Other
# Evidence: direct code reading of all 6 centric exporter files
# March 2026

---

## THE CORE IDEA (in plain words)

Every paragraph gets a unique ID + rich metadata.
During processing, we detect what that paragraph REFERENCES (Quran, Hadith, year, other book).
We also label what it IS ABOUT (topic/concept, argument, creation plan aspect).

The centric exporters are the INTERLINKING LAYER.
Each one collects every paragraph across ALL 145 books that touches one specific "thing".
Result: one JSON file per unique "thing" — verse, hadith, topic, era, argument, creation plan aspect.
These JSON files are what the chatbot reads to answer cross-book questions.
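
The grouping idea above is essentially an inverted index. Below is a minimal sketch of it for Quran references, assuming the reference rows are available as plain dicts. The field names (`paragraph_id`, `book`, `surah`, `ayah`) and the output shape are illustrative assumptions, not the exporters' actual schema.

```python
import json
from collections import defaultdict

# Hypothetical reference rows; the real Reference table's columns may differ.
references = [
    {"paragraph_id": "P123", "book": "Patience and Positive Thinking",
     "ref_type": "quran", "surah": 2, "ayah": 153},
    {"paragraph_id": "P456", "book": "God Arises",
     "ref_type": "quran", "surah": 2, "ayah": 153},
    {"paragraph_id": "P789", "book": "God Arises",
     "ref_type": "quran", "surah": 2, "ayah": 255},
]


def group_by_verse(rows):
    """Collect every paragraph that cites the same verse under one key."""
    grouped = defaultdict(list)
    for row in rows:
        if row["ref_type"] != "quran":
            continue  # other ref types would get their own grouping
        key = f"{row['surah']}-{row['ayah']}"
        grouped[key].append(
            {"paragraph_id": row["paragraph_id"], "book": row["book"]}
        )
    return dict(grouped)


verse_files = group_by_verse(references)
# Each key would become one file, e.g. verse_centric/2-153.json
print(json.dumps(verse_files["2-153"], indent=2))
```

The same pattern would apply to the other centric types by swapping the grouping key.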

---

## PIPELINE FLOW

```
DOCX upload
    ↓
Paragraph table (each para gets unique ID + order_index + section_title)
    ↓
Reference detection → Reference table
    ├── ref_type = 'quran'    → surah + ayah
    ├── ref_type = 'hadith'   → collection + number
    ├── ref_type = 'year'     → year + era
    └── ref_type = 'book'     → book_title + subtype
    ↓
Phase B enrichment → back onto Paragraph table
    ├── emotional_tags        (21 input emotions)
    ├── creation_plan         (14 aspects)
    ├── argument_mapping      (10 contemporary issues)
    ├── reasoning_flow        (logical structure)
    └── viral_score           (7 dimensions + best_quote)
    ↓
CENTRIC EXPORTERS → data/exports/ (JSON files, one per unique "thing")
    ├── verse_centric/        → one file per Quran verse (e.g. 2-255.json)
    ├── hadith_centric/       → one file per hadith (e.g. bukhari-6481.json)
    ├── topic_centric/        → one file per concept (e.g. patience.json)
    ├── timeline_centric/     → one file per era (e.g. early_islamic.json)
    ├── argument_centric/     → one file per issue (e.g. political_islam.json)
    └── creation_plan_centric/→ one file per aspect (e.g. this_world_is_a_test.json)
    ↓
CHATBOT RETRIEVAL — reads these JSON files
    Strategy 3 (topic_centric):   match query → topic JSON → paragraph_ids
    Strategy 5 (interlink hop):   initial paras → their refs → verse/hadith JSONs → more paragraph_ids
```
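
The example file names in the flow above (`2-255.json`, `bukhari-6481.json`, `patience.json`) suggest a simple key-to-filename mapping. The helpers below are a hedged sketch of such a mapping. The function names and the slug rule (lowercase, runs of non-alphanumerics collapsed to underscores) are assumptions chosen to reproduce those examples, not the exporters' confirmed logic.

```python
import re


def verse_filename(surah: int, ayah: int) -> str:
    """Quran verse key, e.g. (2, 255) -> '2-255.json'."""
    return f"{surah}-{ayah}.json"


def hadith_filename(collection: str, number: int) -> str:
    """Hadith key, e.g. ('Bukhari', 6481) -> 'bukhari-6481.json'."""
    return f"{collection.lower()}-{number}.json"


def slug_filename(name: str) -> str:
    """Assumed slug rule for topics, eras, issues, and aspects."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{slug}.json"


print(verse_filename(2, 255))
print(hadith_filename("Bukhari", 6481))
print(slug_filename("Early Islamic"))
```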

---

## THE 6 CENTRIC EXPORTERS — COMPREHENSIVE TABLE

| Exporter | File | Export Dir | Groups By | Source Table | What Each JSON Contains | Used By |
|----------|------|------------|-----------|--------------|------------------------|---------|
| **Verse-centric** | verse_centric_exporter.py | data/exports/verse_centric/ | (surah, ayah_start) | Reference (ref_type='quran') | Verse text + ALL paragraphs from ALL books that cite this verse + ALL video segments that mention it. Includes reasoning_flow per paragraph. | Chat Strategy 5 (interlink hop via verse refs) |
| **Hadith-centric** | hadith_centric_exporter.py | data/exports/hadith_centric/ | (collection, hadith_number) | Reference (ref_type='hadith') | Hadith Arabic text + English translation + sunnah_url + ALL paragraphs from ALL books citing this hadith + ALL video segments. Includes match_score. | Chat Strategy 5 (interlink hop via hadith refs) |
| **Topic-centric** | topic_centric_exporter.py | data/exports/topic_centric/ | Entity.name (CONCEPT type) | Entity (entity_type='CONCEPT') | Concept name + taxonomy category (from maulana_taxonomy.yaml) + ALL paragraphs where this concept appears + ALL video segments. Includes reasoning_flow + section_title per paragraph. | Chat Strategy 3 (semantic topic match → top 10 para_ids from JSON) |
| **Timeline-centric** | timeline_centric_exporter.py | data/exports/timeline_centric/ | Era (7 defined eras) | Reference (ref_type='year') | Era label + ALL year references across ALL books for that era + event_data JSON per ref + 2-paragraph context window before/after each reference. Groups: ancient → pre_islamic → early_islamic → medieval → colonial → modern → contemporary | Article pages (timeline_article.html). Not yet in chat retrieval. |
| **Argument-centric** | argument_centric_exporter.py | data/exports/argument_centric/ | 10 contemporary issues | Paragraph.argument_mapping (JSON field) | Issue name + description + ALL paragraphs mapped to this issue + argument_type per paragraph (refutation/reframing/evidence/analogy/historical_example/logical_reasoning/practical_guidance) + match_type + key_point + confidence | Article pages (argument_article.html). Not yet in chat retrieval. |
| **Creation plan-centric** | creation_plan_exporter.py | data/exports/creation_plan_centric/ | 14 creation plan aspects | Paragraph.creation_plan (JSON field) | Aspect name + description + ALL paragraphs that touch this aspect across ALL books + best_quote + reasoning_flow | Article pages (creation_plan_article.html). Not yet in chat retrieval. |
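
The timeline exporter's "2-paragraph context window before/after each reference" can be sketched as a slice over paragraphs sorted by `order_index`. This is an illustration only: the function name is hypothetical, and it assumes the window is taken within a single book and is simply truncated at the start or end of that book.

```python
def context_window(ordered_ids, target_id, size=2):
    """Return (before, after) paragraph IDs around target_id.

    ordered_ids is assumed to be one book's paragraph IDs sorted by
    order_index. The window shrinks at the edges of the book.
    """
    i = ordered_ids.index(target_id)
    before = ordered_ids[max(0, i - size):i]
    after = ordered_ids[i + 1:i + 1 + size]
    return before, after


book = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(context_window(book, "P3"))
print(context_window(book, "P1"))
```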

---

## THE 10 CONTEMPORARY ISSUES (argument_centric)

| Issue Key | Display Name | What It Captures |
|-----------|-------------|-----------------|
| political_islam | Political Islam | Islam as spiritual path vs political ideology |
| terrorism_violence | Terrorism & Violence | Peace-based responses to extremism |
| gender_equality | Gender & Equality | Women's roles, rights, spiritual equality |
| interfaith_relations | Interfaith Relations | Dialogue and coexistence between religions |
| modernity_tradition | Modernity & Tradition | Reconciling faith with modern life |
| meaning_purpose | Meaning & Purpose | Finding life's meaning through spiritual awareness |
| peace_conflict | Peace & Conflict | Peaceful resolution, non-violent activism |
| freedom_speech | Freedom of Speech | Expression, blasphemy, intellectual freedom |
| science_religion | Science & Religion | Harmony between scientific inquiry and faith |
| suffering_evil | Suffering & Evil | Why suffering exists, how faith responds |
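
Since the argument-centric exporter reads the `Paragraph.argument_mapping` JSON field, grouping by issue might look like the sketch below. The stored shape is an assumption: a JSON list of entries, each with an `issue` key plus the `argument_type` and `confidence` fields named in the exporter table. The real field layout may differ.

```python
import json
from collections import defaultdict

# The 10 issue keys from the table above.
ISSUE_KEYS = {
    "political_islam", "terrorism_violence", "gender_equality",
    "interfaith_relations", "modernity_tradition", "meaning_purpose",
    "peace_conflict", "freedom_speech", "science_religion", "suffering_evil",
}

# Hypothetical paragraph rows with the JSON field stored as text.
paragraphs = [
    {"id": "P1", "argument_mapping": json.dumps(
        [{"issue": "peace_conflict", "argument_type": "evidence",
          "confidence": 0.9}])},
    {"id": "P2", "argument_mapping": json.dumps(
        [{"issue": "peace_conflict", "argument_type": "analogy",
          "confidence": 0.6},
         {"issue": "science_religion", "argument_type": "logical_reasoning",
          "confidence": 0.8}])},
    {"id": "P3", "argument_mapping": None},  # not enriched yet
]


def group_by_issue(rows):
    """Map each known issue key to the paragraphs that address it."""
    grouped = defaultdict(list)
    for row in rows:
        raw = row.get("argument_mapping")
        if not raw:
            continue
        for entry in json.loads(raw):
            issue = entry.get("issue")
            if issue not in ISSUE_KEYS:
                continue  # ignore labels outside the 10 defined issues
            grouped[issue].append({
                "paragraph_id": row["id"],
                "argument_type": entry.get("argument_type"),
                "confidence": entry.get("confidence"),
            })
    return dict(grouped)


by_issue = group_by_issue(paragraphs)
print(sorted(by_issue))
```

Note that a single paragraph can land in more than one issue file, as P2 does here.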

---

## THE 7 HISTORICAL ERAS (timeline_centric)

| Era Key | Label | Year Range |
|---------|-------|-----------|
| ancient | Ancient | before 570 CE |
| pre_islamic | Pre-Islamic Arabia | 570–622 CE |
| early_islamic | Early Islamic | 622–750 CE |
| medieval | Medieval | 750–1500 CE |
| colonial | Colonial Era | 1500–1800 CE |
| modern | Modern Era | 1800–1950 CE |
| contemporary | Contemporary | 1950–present |
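
Mapping a detected year to one of these eras is a range lookup. The sketch below assumes each era's start year is inclusive, so a boundary year such as 622 falls in the later era. The table lists boundary years in both adjacent rows, so this tie-break is an assumption rather than the exporter's confirmed rule.

```python
# (era_key, first_year_inclusive), newest first. Boundary handling is assumed.
ERA_STARTS = [
    ("contemporary", 1950),
    ("modern", 1800),
    ("colonial", 1500),
    ("medieval", 750),
    ("early_islamic", 622),
    ("pre_islamic", 570),
]


def era_for_year(year_ce: int) -> str:
    """Return the era key for a CE year; anything before 570 is 'ancient'."""
    for key, start in ERA_STARTS:
        if year_ce >= start:
            return key
    return "ancient"


for year in (500, 570, 632, 1258, 1857, 2001):
    print(year, era_for_year(year))
```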

---

## THE 14 CREATION PLAN ASPECTS (creation_plan_centric)

Defined in paragraph_enricher.py as CREATION_PLAN_ASPECTS list:

| Aspect Key | Meaning |
|-----------|---------|
| purpose_of_life | Why humans exist |
| this_world_is_a_test | Life as a test of character |
| man_created_for_paradise | This world is not the destination |
| free_will_and_accountability | Freedom to choose + consequences |
| positive_response_to_negativity | Maulana's core strategy: respond constructively |
| nature_as_sign_of_god | Creation points to the Creator |
| patience_as_strategy | Sabr as an active intelligent response |
| dawah_over_politics | Spiritual mission over political action |
| discovering_god_through_creation | Intellectual journey to faith |
| hereafter_as_motivation | Eternal reward as driver of action |
| law_of_nature_cause_effect | God works through natural laws |
| opportunity_greater_than_problem | Reframing difficulties |
| intellectual_development_through_adversity | Hardship builds wisdom |
| destiny_vs_circumstances | What is fixed vs what we control |
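
Because each aspect key doubles as a grouping key, it can help to check enrichment output against the fixed list before exporting. The sketch below copies the 14 keys from the table. The helper `split_known_aspects` is hypothetical, and the actual `CREATION_PLAN_ASPECTS` list in paragraph_enricher.py may be structured differently (for example, with descriptions attached).

```python
# Keys copied from the table above; structure of the real list is assumed.
CREATION_PLAN_ASPECTS = [
    "purpose_of_life",
    "this_world_is_a_test",
    "man_created_for_paradise",
    "free_will_and_accountability",
    "positive_response_to_negativity",
    "nature_as_sign_of_god",
    "patience_as_strategy",
    "dawah_over_politics",
    "discovering_god_through_creation",
    "hereafter_as_motivation",
    "law_of_nature_cause_effect",
    "opportunity_greater_than_problem",
    "intellectual_development_through_adversity",
    "destiny_vs_circumstances",
]


def split_known_aspects(labels):
    """Separate labels into (known, unknown) relative to the 14 aspects."""
    allowed = set(CREATION_PLAN_ASPECTS)
    known = [label for label in labels if label in allowed]
    unknown = [label for label in labels if label not in allowed]
    return known, unknown


known, unknown = split_known_aspects(
    ["patience_as_strategy", "life_is_a_test", "purpose_of_life"]
)
print(known, unknown)
```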

---

## WHICH ARE ACTIVE IN CHATBOT vs ARTICLE PAGES ONLY

| Centric Type | In Chat Retrieval? | In Article Pages? | Notes |
|-------------|-------------------|------------------|-------|
| verse_centric | YES — Strategy 5 | YES — verse_article.html | Core cross-book connection via shared Quran verse |
| hadith_centric | YES — Strategy 5 | YES — hadith_article.html | Core cross-book connection via shared hadith |
| topic_centric | YES — Strategy 3 | YES — topic_article.html | Most-used in chat. 143 topics. Loaded by slug. |
| timeline_centric | NOT YET | YES — timeline_article.html | Could be added to chat for historical questions |
| argument_centric | NOT YET | YES — argument_article.html | Could be added to chat for "what does Islam say about X" |
| creation_plan_centric | NOT YET | YES — creation_plan_article.html | Could be added to chat for theological questions |

---

## WHAT MAKES THIS ARCHITECTURE POWERFUL

Paragraph P123 in book "Patience and Positive Thinking" cites Quran 2:153.
Paragraph P456 in book "God Arises" also cites Quran 2:153.
These two paragraphs are from DIFFERENT books, written years apart.
But both appear in verse_centric/2-153.json.

When a user asks about Quran 2:153, Strategy 4 finds P123 directly.
Strategy 5 then hops via verse_centric/2-153.json and finds P456.
Result: the chatbot answers by drawing from BOTH books simultaneously.

This is the interlinking policy — not a link between books, but a link through shared content.

Same logic for hadith: "Which books discuss Bukhari 6481?" → hadith_centric/bukhari-6481.json answers instantly.
Same for topics: "All paragraphs about tawakkul across 145 books" → topic_centric/tawakkul.json.
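
The P123 / P456 example can be traced in a few lines. This is a sketch of the hop only, using in-memory dicts as hypothetical stand-ins for the Reference table and the verse_centric files. The function name and data shapes are assumptions, not the chatbot's actual code.

```python
# Hypothetical stand-ins: paragraph -> keys it cites, key -> citing paragraphs.
paragraph_refs = {"P123": ["2-153"]}
verse_centric = {"2-153": ["P123", "P456"]}


def interlink_hop(initial_ids, refs_by_paragraph, centric_index):
    """Follow each paragraph's references and return newly found paragraphs."""
    seen = set(initial_ids)
    discovered = []
    for pid in initial_ids:
        for key in refs_by_paragraph.get(pid, []):
            for linked in centric_index.get(key, []):
                if linked not in seen:
                    seen.add(linked)
                    discovered.append(linked)
    return discovered


print(interlink_hop(["P123"], paragraph_refs, verse_centric))  # ['P456']
```

Tracking `seen` keeps the starting paragraphs and any repeats out of the result, so only genuinely new cross-book paragraphs are added.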

---

## GAP: 3 CENTRIC TYPES NOT YET IN CHATBOT

timeline_centric, argument_centric, and creation_plan_centric are built and exported
but NOT fed into any chat retrieval strategy.

Adding them would enable:
- "What does Maulana say about terrorism?" → argument_centric/terrorism_violence.json → instant answer
- "What happened in early Islamic era across all books?" → timeline_centric/early_islamic.json
- "Where does Maulana discuss patience as strategy?" → creation_plan_centric/patience_as_strategy.json

These three are the only centric exports not yet used as retrieval signals.
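
One possible way to wire these exports into retrieval is a small loader that maps a matched key to paragraph IDs, mirroring how topic files are loaded by slug. The sketch below is an assumption throughout: the function name, the `paragraphs` / `paragraph_id` JSON fields, and the temporary directory standing in for data/exports/ are all illustrative.

```python
import json
import tempfile
from pathlib import Path


def load_paragraph_ids(export_root, centric_type, key):
    """Read <export_root>/<centric_type>/<key>.json and return paragraph IDs.

    Returns an empty list when no file exists for the key.
    """
    path = Path(export_root) / centric_type / f"{key}.json"
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return [p["paragraph_id"] for p in data.get("paragraphs", [])]


# Demo against a throwaway directory standing in for data/exports/.
with tempfile.TemporaryDirectory() as root:
    issue_dir = Path(root) / "argument_centric"
    issue_dir.mkdir()
    payload = {"paragraphs": [{"paragraph_id": "P1"}, {"paragraph_id": "P2"}]}
    (issue_dir / "terrorism_violence.json").write_text(
        json.dumps(payload), encoding="utf-8"
    )
    found = load_paragraph_ids(root, "argument_centric", "terrorism_violence")
    missing = load_paragraph_ids(root, "argument_centric", "no_such_issue")

print(found, missing)
```

How a user query gets matched to an issue, era, or aspect key is a separate design decision and is left out here.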

