May 16, 2026

Bridging the Language Gap Through Technology

In an unprecedented challenge to translate the 1446 AH (2025 CE) Arafah sermon into 40 living languages, in both audio and text and in record time, a strategic partnership took shape between Misraj, the Iqra Educational Endowment, and the General Presidency for the Affairs of the Grand Mosque and the Prophet's Mosque. Our goal was to deliver the first hybrid translation pipeline of its kind. The project combined advanced neural machine translation by multiple AI agents, an intelligent linguistic judge (LLM as a Judge) for automated evaluation, a tiered human arbitration system, and automated linkage to authorized databases of translations of the meanings of the Holy Quran, all operating through the Khitab platform's pipelines. The result was a comprehensive multimedia package in forty languages, with high religious accuracy and a 65% reduction in cost compared to traditional translation methodologies. This achievement sets a new standard for institutional da'wah translation.

The Challenge

The Arafah sermon represents the pinnacle of the annual Islamic da'wah discourse, delivered before millions of pilgrims and followed by more than two billion Muslims worldwide. In 2025, the sermon ran for 22 minutes and contained a uniquely complex linguistic fabric that posed extraordinary challenges for translation systems:

High-register sermon text: Rich in classical Arabic eloquence, dense metaphor, intricate grammatical structures, and precise historical and jurisprudential contexts.

Interweaving of Quranic verses, hadith, and prophetic supplications: Passages requiring accurate translations that cannot rely on word-for-word or phrase-by-phrase literalism.

High contextual sensitivity: Doctrinal and jurisprudential concepts (tawhid, halal and haram, the objectives of Islamic law) that cannot tolerate approximation or literal rendering.

The core problem the project faced: how, within a window of mere hours after the sermon's delivery, could one produce an accurate, rapid-turnaround translation into 40 languages that preserves religious meanings, avoids semantic hallucination in Quranic verses, hadiths, and supplications, is available in both audio and text, and comes at a sustainable operational cost?

Traditional approaches, whether pure human translation (requiring 40 specialist translators and weeks of work) or direct machine translation (producing catastrophic errors in religious texts), were each incapable of meeting this challenge on their own. From this need, the hybrid pipeline was born: a methodology combining the power of generative AI with the rigor of human reference arbitration.

Methodology: The Innovative Hybrid Translation Pipeline

The project's methodology rested on seven interlinked, sequentially timed phases, designed to form an integrated iterative loop of analysis, translation, evaluation, and refinement.

Phase 1: Structural Analysis and Semantic Chunking

In segmenting the Arabic sermon text (approximately 12,400 words), we applied an intelligent semantic chunking process governed by strict criteria:

Identifying complete units of meaning: The rhetorical structure of the sermon (introduction, doctrinal themes, social themes, supplications, conclusion) was analyzed using a specialized discourse analysis model. The text was divided into 187 chunks, each carrying a complete, self-contained meaning suitable as an independent translation unit, with an average length of 65 words per chunk.

Preserving horizontal context: Each chunk was linked to the broader context of its thematic section via contextual tags appended to the chunk, such as: [Theme: Tawhid], [Style: Encouragement], [Audience: General].

Technical significance: Large Language Models (LLMs) are limited by their effective context window and are susceptible to the "Lost in the Middle" phenomenon, in which the model's attention to text in the middle of lengthy inputs diminishes. Intelligent semantic chunking transformed the task from "translating a long document" to "translating 187 focused semantic units," reducing hallucination and context errors by an estimated 58% in our preliminary tests compared to directly translating the full text.
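A minimal sketch of the chunk-plus-tags structure described in this phase. The Chunk class and its field names are hypothetical and used only for illustration; the tag format follows the [Key: Value] examples quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One self-contained semantic unit of the sermon (hypothetical structure)."""

    index: int
    text: str
    tags: dict[str, str] = field(default_factory=dict)

    def tagged_text(self) -> str:
        # Append the contextual tags in the "[Key: Value]" form.
        suffix = " ".join(f"[{key}: {value}]" for key, value in self.tags.items())
        return f"{self.text} {suffix}".strip()


chunk = Chunk(
    index=1,
    text="<Arabic chunk text>",  # placeholder, not actual sermon text
    tags={"Theme": "Tawhid", "Style": "Encouragement", "Audience": "General"},
)
print(chunk.tagged_text())
```

Keeping the tags as structured data rather than baking them into the text makes it possible to pass the same chunk, with the same context, to every later stage.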

Phase 2: Isolation of Authoritative Religious Texts (Isolation & Authoritative Mapping)

This phase represented the methodological core of the innovation for ensuring religious accuracy, and it proceeded along two tracks:

Track 1: Automated Extraction. A text classifier, powered by a precise language model, was developed to scan all 187 chunks and identify those containing:

Quranic verses, through pattern-matching against Quranic structural forms.
Prophetic hadiths, by searching for chains of transmission and prophetic narration formulas, and by comparing against a common hadith database.
Established prophetic supplications, with confirmed attribution to the Prophet or the pious predecessors.

The process isolated 31 chunks (16.6% of total chunks) containing texts of established religious authority.

Track 2: API-Based Authoritative Mapping. Rather than allowing AI models to translate these texts themselves, which produces catastrophically literal translations in religious contexts, a direct programmatic mapping mechanism (API Mapping) was implemented, covering:

Translations of the meanings of the Quran: Extracted Quranic verses, with their surah and verse numbers identified automatically, were linked to the database of the Tafsir Center for Quranic Studies (authorized translations in 40 languages) and the King Fahd Complex for the Printing of the Holy Quran, to retrieve the standardized, authorized translation of each complete verse in every target language.
Translations of prophetic hadiths: A similar mechanism was applied, linking to the multilingual "Encyclopedia of Prophetic Hadith" database, to retrieve the standardized translations of the nine hadiths cited in the sermon.
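The mapping idea can be illustrated with a lookup keyed by surah, verse, and language that fails closed when no authorized translation exists. This is only a sketch under stated assumptions: the in-memory REFERENCE_DB, the function name, and the placeholder entries are hypothetical stand-ins for the external databases named above, whose actual interfaces are not described here.

```python
# Hypothetical in-memory stand-in for an authorized-translation database.
# Keys are (surah, ayah, language code); values are placeholders, not real text.
REFERENCE_DB: dict[tuple[int, int, str], str] = {
    (1, 1, "en"): "<authorized English translation of 1:1>",
    (1, 1, "sw"): "<authorized Swahili translation of 1:1>",
}


class MissingReferenceError(LookupError):
    """Raised when no authorized translation is available for a verse."""


def authorized_translation(surah: int, ayah: int, lang: str) -> str:
    # Fail closed: never fall back to a model-generated rendering.
    try:
        return REFERENCE_DB[(surah, ayah, lang)]
    except KeyError:
        raise MissingReferenceError(f"no reference for {surah}:{ayah} in {lang!r}") from None


print(authorized_translation(1, 1, "sw"))
```

Raising an error instead of returning an empty string means a missing reference stops the chunk from moving forward, rather than silently letting a model translate the sacred text itself.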

Why did we not rely on traditional Retrieval-Augmented Generation (RAG)? Because traditional RAG retrieves reference texts to "inspire" the model, but does not bind it to them. In our project, the mechanism was engineered so that the reference translation of a sacred text functions as a hard constraint injected into the prompt architecture as: "The following is the authorized translation of this verse/hadith; you must use it verbatim in its position and make no alterations."
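One way to picture the hard constraint is as a prompt builder paired with a verbatim check on the output. The sketch below is an assumption about how such a mechanism could look, not the project's actual prompt architecture; build_prompt and missing_references are hypothetical names, and only the constraint wording is taken from the text above.

```python
CONSTRAINT_TEMPLATE = (
    "The following is the authorized translation of this verse/hadith; "
    "you must use it verbatim in its position and make no alterations.\n"
    "{reference}"
)


def build_prompt(arabic_chunk: str, references: list[str]) -> str:
    """Assemble a prompt that states each reference as a hard constraint."""
    parts = [f"Translate the following chunk:\n{arabic_chunk}"]
    parts += [CONSTRAINT_TEMPLATE.format(reference=ref) for ref in references]
    return "\n\n".join(parts)


def missing_references(candidate: str, references: list[str]) -> list[str]:
    """Return every reference that does not appear verbatim in the candidate."""
    return [ref for ref in references if ref not in candidate]


refs = ["<authorized translation of the verse>"]
prompt = build_prompt("<Arabic chunk text>", refs)
# A candidate that paraphrases the reference is reported as a violation.
print(missing_references("some paraphrase of the verse", refs))
```

The exact-substring check is what separates a constraint from a suggestion: any candidate that alters the reference, even slightly, can be rejected automatically.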

Phase 3: Initial Translation by Multiple AI Agents (Multi-Agent LLM Translation)

To avoid the inherent bias of a single model and to maximize output quality, a competitive plurality strategy was applied through four initial translation agents working in parallel:

Agent	Competitive Advantage
Agent A	High contextual coherence, sophisticated metaphor handling
Agent B	Superior speed, long contextual memory (one-million-token window)
Agent C	Precision in constrained instructions, adherence to reference translations
Agent D	Specialization in low-resource languages

Operational mechanism: Each agent received the pre-segmented Arabic chunk along with:

The chunk's contextual tags.
The mandatory reference translation for any sacred text (where applicable).
Detailed system prompt instructions defining the role of the da'wah translator and the religious translation constraints.

Each agent produced an independent translation of the single chunk, yielding for each chunk four candidate translations in the target language.
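The fan-out to four agents can be sketched with a thread pool. The agents here are stubs that only label their input; real model calls, and the names make_stub_agent and translate_chunk, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]


def make_stub_agent(name: str) -> Agent:
    # Stand-in for a real model call; it only labels the prompt it receives.
    def translate(prompt: str) -> str:
        return f"{name}: translation of <{prompt}>"

    return translate


AGENTS: dict[str, Agent] = {name: make_stub_agent(name) for name in ("A", "B", "C", "D")}


def translate_chunk(prompt: str, agents: dict[str, Agent]) -> dict[str, str]:
    """Run every agent on the same prompt in parallel; collect candidates by agent name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent, prompt) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}


candidates = translate_chunk("chunk 1", AGENTS)
print(len(candidates))
```

Because each agent sees the same prompt and none sees another's output, the four candidates stay independent, which is what makes the later comparison meaningful.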

Phase 4: Automated Evaluation "The Intelligent Linguistic Judge" (LLM as a Judge)

This phase constituted an advanced automated quality-control layer. A neutral language model was deployed in the role of "judge," competitively evaluating the four candidate translations for each chunk. A precise numerical evaluation rubric was developed comprising the following criteria with relative weightings:

Criterion	Weight	Operational Definition
Religious & Reference Accuracy	40%	Alignment with reference verse/hadith translations (where applicable) and accuracy in conveying religious concepts
Linguistic Fluency	25%	Grammatical and stylistic naturalness in the target language; avoiding awkwardness and borrowed structures
Contextual Alignment	20%	Coherence with the broader sermon context and the attached contextual tags
Handling of Metaphor & Rhetoric	15%	Conveying figurative and rhetorical meaning without the distortion that literal translation introduces
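The rubric weights can be combined into a single 0–100 score as a weighted sum. This sketch assumes each criterion is itself scored on a 0–100 scale, which the rubric above does not state explicitly; the dictionary keys are hypothetical labels for the four criteria.

```python
# Weights from the rubric table; the key names are illustrative labels.
WEIGHTS = {
    "religious_accuracy": 0.40,
    "linguistic_fluency": 0.25,
    "contextual_alignment": 0.20,
    "metaphor_rhetoric": 0.15,
}


def weighted_score(criterion_scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-100 each) into one 0-100 score."""
    if set(criterion_scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four rubric criteria")
    return sum(WEIGHTS[name] * criterion_scores[name] for name in WEIGHTS)


example = {
    "religious_accuracy": 95,
    "linguistic_fluency": 90,
    "contextual_alignment": 85,
    "metaphor_rhetoric": 80,
}
# 0.40*95 + 0.25*90 + 0.20*85 + 0.15*80 comes to about 89.5
print(round(weighted_score(example), 1))
```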

Comparative mechanism: The judge model was supplied with:

The original Arabic chunk.
The mandatory reference translation (for verses/hadiths). This was precisely where the judge compared each agent's proposed translation against the reference translation retrieved automatically in Phase 2, detecting any deviations.
The four candidate translations (source-blinded to avoid bias).
The evaluation rubric.

The judge model issued a numerical score (0–100) for each translation, along with a brief explanatory report on its reasoning. The highest-scoring translation advanced to the next phase. In the 8% of cases where scores converged (a margin of fewer than 3 points between the top two), both top translations were flagged for human review.
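The selection rule described here (highest score wins; a margin of fewer than 3 points sends both top candidates to human review) can be sketched as follows. The function name and the anonymized candidate IDs are hypothetical.

```python
def select_winner(scores: dict[str, float], tie_margin: float = 3.0) -> tuple[list[str], bool]:
    """Return the finalist IDs and whether human review is needed.

    `scores` maps anonymized candidate IDs to judge scores (0-100) and must
    contain at least two candidates.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_id, best), (second_id, second) = ranked[0], ranked[1]
    # "Fewer than 3 points" is a strict comparison: a gap of exactly 3 is not a tie.
    needs_review = (best - second) < tie_margin
    finalists = [best_id, second_id] if needs_review else [best_id]
    return finalists, needs_review


print(select_winner({"c1": 91.0, "c2": 89.5, "c3": 80.0, "c4": 70.0}))
```

Using anonymized IDs rather than agent names keeps the selection step consistent with the source-blinded evaluation.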

Phase 5: Human-in-the-Loop Refinement

This was the project's pivotal turning point for quality. An iterative refinement loop was designed that embedded specialized human expertise at the core of the technical pipeline.

Part 1: Strategic Sample Selection for Human Arbitration. An initial strategic sample of 35 chunks per language (18.7% of the total) was selected for preliminary evaluation by specialist translation reviewers, based on three objective criteria:

Chunks with low evaluation scores: any chunk scoring below 80/100.
Chunks with high inter-agent variance: where the four agents' translations differed substantially in terminology choices.
Novel or contemporaneous chunks: addressing current issues potentially underrepresented in model training data (e.g., "cybersecurity" and "digital currencies," both of which appeared in the 2025 sermon).
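The three selection criteria can be expressed as a small filter. Only the 80/100 threshold comes from the text above; how inter-agent variance was measured is not specified, so this sketch assumes a simple token-overlap (Jaccard) agreement with a hypothetical floor of 0.5, and treats novelty as a flag supplied from elsewhere.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two translations (1.0 means identical vocabularies)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def mean_agreement(candidates: list[str]) -> float:
    """Average pairwise overlap across all candidate translations."""
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def review_reasons(
    best_score: float,
    candidates: list[str],
    is_novel: bool,
    score_floor: float = 80.0,      # threshold stated in the article
    agreement_floor: float = 0.5,   # assumed value, for illustration only
) -> list[str]:
    """List the reasons, if any, for sending a chunk to human arbitration."""
    reasons = []
    if best_score < score_floor:
        reasons.append("low_score")
    if mean_agreement(candidates) < agreement_floor:
        reasons.append("high_variance")
    if is_novel:
        reasons.append("novel_topic")
    return reasons


print(review_reasons(92.0, ["a b", "c d", "e f", "g h"], is_novel=True))
```

Returning the reasons, not just a yes/no flag, lets reviewers see why a chunk reached them.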

Part 2: Specialist Network and Qualitative Analysis. Through the strategic partnership with the Iqra Educational Endowment and the General Presidency for the Affairs of the Grand Mosque and the Prophet's Mosque, access was secured to a network of 87 linguistic and religious specialists covering all forty languages. Selected chunks were presented via a unified arbitration platform, where each specialist provided:

A numerical rating of the translation.
Categorized qualitative annotations: (alternative word choice, structural rephrasing, cultural sensitivity note, religious error).

Sample qualitative annotations received (examples):

In Swahili translation: "The proposed term 'utakatifu' for 'sanctification' carries Christian theological connotations; the more accurate term is 'kutakasa.'"
In Japanese translation: "The phrasing is too direct; it requires a higher keigo honorific register befitting the station of the sermon."

Part 3: Reverse Prompt Engineering and Few-Shot Calibration. This step represented the technical translation of specialist feedback into actionable automated improvements:

Converting annotations into few-shot examples: The most frequently recurring qualitative annotations (e.g., avoid terms with contrary religious connotations, elevate the register in Asian languages) were converted into triples of "Arabic chunk / incorrect translation → corrected translation" and integrated as few-shot examples into the system prompts.
Rerunning the iterative loop: Chunks that had received annotations were fed back, with the improved system prompts, into the translation agents and then the linguistic judge, and the new results were compared against the previous versions. This cycle produced an average improvement of 9.5 percentage points in evaluation scores.
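Turning reviewer annotations into few-shot examples might look like the sketch below. The Correction structure, the prompt wording, and the function names are assumptions; the Swahili terms are those quoted in the sample annotation above, with the surrounding sentence left as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One reviewed example: source chunk, rejected rendering, approved rendering."""

    source: str
    incorrect: str
    corrected: str
    note: str


def render_few_shot(corrections: list[Correction]) -> str:
    """Format corrections as numbered few-shot examples."""
    blocks = []
    for number, item in enumerate(corrections, start=1):
        blocks.append(
            f"Example {number}\n"
            f"Source: {item.source}\n"
            f"Incorrect: {item.incorrect}\n"
            f"Corrected: {item.corrected}\n"
            f"Reviewer note: {item.note}"
        )
    return "\n\n".join(blocks)


def extend_system_prompt(base_prompt: str, corrections: list[Correction]) -> str:
    """Append the few-shot examples to the system prompt, if there are any."""
    if not corrections:
        return base_prompt
    examples = render_few_shot(corrections)
    return f"{base_prompt}\n\nLearn from these reviewed corrections:\n\n{examples}"


swahili_fix = Correction(
    source="<Arabic chunk>",          # placeholder
    incorrect="... utakatifu ...",    # term flagged by the reviewer
    corrected="... kutakasa ...",     # term proposed by the reviewer
    note="Avoid terms with contrary religious connotations.",
)
print(extend_system_prompt("You are a da'wah translator.", [swahili_fix]))
```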

Phase 6: Final Human Sign-off and Approval

After completing the refinement cycles, the fully translated text across all forty languages was submitted to a network of certified final reviewers (a minimum of two reviewers per language). This phase constituted a final certification review under a "green light" protocol:

Fast Track: For chunks scoring ≥ 90 from the automated judge and approved by the human reviewer without annotations. (These constituted 73% of all chunks.)
Review Track: For chunks receiving minor annotations (word correction, phrasing adjustment). Corrections were executed and resubmitted for approval in under one hour.
Redo Track: In only 3.2% of total chunks, a complete re-translation was requested with specific instructions, executed via the expedited technical pipeline.
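The three-track protocol amounts to a small routing function. The sketch below assumes one detail the protocol does not spell out: a chunk approved without annotations but scoring below 90 is sent to the Review Track. The names Track and route are hypothetical.

```python
from enum import Enum


class Track(Enum):
    FAST = "fast"
    REVIEW = "review"
    REDO = "redo"


def route(judge_score: float, annotations: list[str], redo_requested: bool) -> Track:
    """Assign a chunk to a sign-off track under the "green light" protocol."""
    if redo_requested:
        return Track.REDO      # complete re-translation with specific instructions
    if annotations:
        return Track.REVIEW    # minor corrections, then resubmission
    if judge_score >= 90:
        return Track.FAST      # high score and no reviewer annotations
    # Assumption: an un-annotated chunk below 90 still gets a second look.
    return Track.REVIEW


print(route(94.0, [], redo_requested=False))
```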

Quantitative and Qualitative Results

The significance of measuring this project's impact lies as much in the clarity of the gap it closed as in the numbers it achieved. Results are therefore presented in a comparative framework that foregrounds the transformative value of the hybrid pipeline.

The Pre-Existing Gaps

Prior to this methodical technical intervention, the translation landscape for the Arafah sermon suffered from three interlocking structural gaps.

The first gap was the scarcity and fragmentation of translated da'wah content: the sermon was available in only 10–15 languages, through scattered media channels and uncoordinated individual efforts, with no systematic audio availability accompanying the text.

The second gap was more severe: recurring catastrophic translation errors in Quranic verses and hadiths. Direct machine translation, when used, produced distorted, literal renderings of Quranic verses and prophetic hadiths, entirely disconnected from the standardized authorized translations reviewed by specialized scholarly bodies. This resulted in serious distortions of religious meanings at the very heart of the da'wah message.

The third gap was prohibitive cost and chronic slowness. Full human translation into 40 languages would have required, by standard estimates, a team of 40 to 80 specialist translators, a delivery timeline of 3 to 6 weeks, and a total cost exceeding $180,000 USD. Before the hybrid pipeline, this made the project neither technically nor economically sustainable.

The Results

The hybrid pipeline overturned this equation entirely, producing quantitative and qualitative results that together constitute a landmark in institutional da'wah translation.

Quantitatively, the project delivered a comprehensive multimedia translation package spanning 40 languages in text and audio, comprising 7,480 text segments (187 chunks in each of 40 languages) and an equal number of audio segments, with a total operational delivery time of no more than 18 hours from receipt of the approved Arabic text to final handover of the complete package. The hybrid pipeline enabled a 65% reduction in total cost compared to pure human translation, placing large-scale da'wah translation within the bounds of financial sustainability for the first time.

In quality indicators, the average final evaluation score awarded by the intelligent linguistic judge across all forty languages reached 92.4 out of 100, while religious accuracy in sacred texts, per the final human review, reached 97.3%, compared to only 52.2% using direct machine translation without the reference mapping mechanism: an improvement of about 45 percentage points. These indicators were crowned by an overall satisfaction rate of 94% among the certified final reviewers from the Iqra Educational Endowment's network.
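Several of the figures reported in this article can be cross-checked against one another with simple arithmetic. The snippet below is only a consistency check on numbers already stated in the text; it introduces no new data.

```python
CHUNKS = 187
LANGUAGES = 40

# 187 chunks translated into each of 40 languages.
text_segments = CHUNKS * LANGUAGES

# Share of chunks containing sacred texts (31 of 187), in percent.
sacred_share = round(31 / CHUNKS * 100, 1)

# Share of chunks in the initial human-review sample (35 of 187), in percent.
sample_share = round(35 / CHUNKS * 100, 1)

# Gain in religious accuracy, in percentage points.
accuracy_gain = round(97.3 - 52.2, 1)

print(text_segments, sacred_share, sample_share, accuracy_gain)
# prints: 7480 16.6 18.7 45.1
```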

Conclusion and Key Lessons

This project demonstrated, in practical and measurable terms, that pairing the strategic partnership between the Iqra Educational Endowment and the General Presidency for the Affairs of the Grand Mosque and the Prophet's Mosque (representing institutional da'wah depth, access to specialists, and referential credibility) with advanced technology (offering speed and scalability) can create a pioneering model that redefines the very concept of "institutional da'wah translation."

Five Core Lessons Learned

AI is an accelerator and effective enabler, not a replacement. Language models can accomplish 80% of the work with remarkable efficiency, but the remaining 20%, representing religious accuracy and cultural appropriateness, is what makes the difference between "acceptable machine translation" and "trustworthy da'wah translation." This critical fraction inevitably requires human reference arbitration.
The Intelligent Linguistic Judge is scalable quality assurance. Deploying LLM as a Judge provided an objective, consistent evaluation layer across all languages, something impossible for human reviewers to replicate at this scale. Yet the judge's effectiveness is contingent on the precision of the evaluation rubric and the quality of the human feedback used to calibrate it.
Mandatory reference binding (not retrieval) is the key to religious safety. The difference between giving the model an authorized translation "for inspiration" (RAG) and binding it to that translation verbatim (Hard Constraint) is the difference between critical accuracy and catastrophic error in translating sacred texts.
The iterative "translate–evaluate–refine" loop is the true engine of quality. Improvement did not come from the first pass, but from iterative accumulation: each human-AI refinement cycle added 5–9 percentage points to output quality.
Institutional partnership multiplies impact. The Iqra Educational Endowment proved itself a strategic partner in enabling access to specialists and conferring referential credibility on the outputs, ensuring the adoption and sustainability of the solution.
