
About the hadith data

Every hadith you read here arrives through a chain — from the Prophet ﷺ down through narrators, scholars, editors, and finally the digital corpora we ingest. This page is the short version of that chain.

Corpus: 16 readable collections (5 with per-scholar gradings) · 48,826 hadiths · 65,939 per-scholar rulings across 9 muhaddithūn.

Sources & licences

We do not maintain our own manuscript archive. The corpus is assembled from three external sources, each carrying its own licence (or, in one case, its own licence gap) and its own strengths.

Matn & translation

LK Hadith Corpus
Altammami, Atwell & Alsalka · Leeds University + King Saud University

Arabic matn, English translation, chapter / section titles, and the per-hadith legacy editorial grade (the "Sunnah.com" chip you see on some readers) all come from the LK Hadith Corpus — the Leeds + King Saud (LK) Arabic-English Parallel Corpus of Authentic Hadith assembled by Altammami et al. at the University of Leeds. Bukhari is manually annotated; the other collections are automatically annotated from the underlying source material.

LK does not explicitly identify an upstream source, but its text and editorial grades align exactly with sunnah.com's rendering of the same hadiths — sunnah.com is the almost-certain upstream, though LK itself carries the academic provenance we rely on.

Cite as: Altammami, S., Atwell, E., and Alsalka, A. The Arabic–English Parallel Corpus of Authentic Hadith. IJASAT / IMAN 2019 (Dec 27–28, 2019). The repository carries no explicit licence; we use it under the LK authors' stated citation request and academic exchange norms, and will update this page if the maintainers publish a formal licence.

Per-scholar gradings

fawazahmed0/hadith-api
The Unlicense (public domain)

The open-data hadith mirror published by fawazahmed0/hadith-api. Its Arabic matns are effectively identical to sunnah.com's: in our 50-hadith sample, 48 matched byte-for-byte and the remaining two differed only in trailing whitespace or quote marks. It also carries a per-hadith grade decision from each of the 9 scholars listed below.

The grade rulings themselves cite specific scanned editions hosted on al-maktaba.org — each (scholar × collection) pairing is anchored to a book ID, surfaced in the app as the source work on every grading row.

Canonical numbering (Muslim) & sharḥ

OpenITI
CC BY-SA 4.0 (per-text — see openiti.org)

An independent digital corpus of Arabic Islamic texts. We use it for two things.

First, as the source of Sahih Muslim's canonical hadith numbering: the pre-clean branch of OpenITI/0275AH preserves Muḥammad Fuʾād ʿAbd al-Bāqī's edition numbering as parenthesised markers; the cleaned master branch stripped them. Aligning each LK matn against this source gives us the print edition's number for almost every hadith. The result is sample-verified against sunnah.com, with two minor outliers tracking a known numbering disagreement between two ʿAbd al-Bāqī printings (the muqaddima and the late Book of Qadar).

Second, as the source of two classical commentaries ingested as the reader's sharḥ rail: al-Nawawī's al-Minhāj on Ṣaḥīḥ Muslim and Ibn Ḥajar al-ʿAsqalānī's Fatḥ al-Bārī on Ṣaḥīḥ al-Bukhārī.

It is also used as a second lineage for cross-checking fawazahmed0 matns — the original B1.0a use. Licences vary per text in the corpus; see openiti.org for each text's terms.
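As a rough illustration of how parenthesised edition numbers can be pulled out of a digitised text, here is a minimal Python sketch. The marker shape (digits wrapped in parentheses, placed before the hadith they number) is an assumption made for the example; the actual markup in the OpenITI pre-clean branch may differ, and `split_by_markers` is a hypothetical helper, not part of our pipeline.

```python
import re

# Assumed marker shape: digits wrapped in parentheses, e.g. "(8)" or "(٨)".
# Python's \d matches Unicode decimal digits, so Arabic-Indic digits work too.
MARKER = re.compile(r"\((\d+)\)")


def split_by_markers(text: str) -> dict[int, str]:
    """Map each marker number to the text that follows it, up to the next marker."""
    # With a capturing group, re.split returns:
    # [preamble, number_1, body_1, number_2, body_2, ...]
    parts = MARKER.split(text)
    numbered: dict[int, str] = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        # Keep the first occurrence if a number repeats; int() accepts
        # Arabic-Indic digits as well as Western ones.
        numbered.setdefault(int(number), body.strip())
    return numbered
```

Each extracted body could then be compared against the LK matns to carry the print number across.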

Collections & editions

Every hadith collection available on TalibNotes, graded collections first. "With gradings" is the number of hadiths in that collection that carry at least one per-scholar ruling. Ṣaḥīḥ al-Bukhārī and Ṣaḥīḥ Muslim are present for reading but have no per-scholar column — the compilers' sahih screening is universally accepted, so scholarly grade lists for these two collections are not maintained in the fawazahmed0 corpus.

Collection | Arabic title | Hadiths | With gradings
Sunan an-Nasa'i | سنن النسائي | 5,704 | 5,625
Sunan Abi Dawud | سنن أبي داود | 5,283 | 5,084
Sunan Ibn Majah | سنن ابن ماجه | 4,413 | 4,287
Jami' at-Tirmidhi | جامع الترمذي | 4,220 | 3,774
Muwatta Malik | موطأ مالك | 1,985 | 1,779
Sahih al-Bukhari | صحيح البخاري | 7,669 | —
Sahih Muslim | صحيح مسلم | 7,527 | —
Mishkat al-Masabih | مشكاة المصابيح | 4,427 | —
Sunan al-Darimi | سنن الدارمي | 2,757 | —
Bulugh al-Maram | بلوغ المرام | 1,767 | —
Al-Adab Al-Mufrad | الأدب المفرد | 1,326 | —
Riyad as-Salihin | رياض الصالحين | 1,217 | —
Shama'il at-Tirmidhi | الشمائل المحمدية | 401 | —
An-Nawawi's Forty + Ibn Rajab's Ziyādāt | الأربعون النووية وزيادات ابن رجب | 50 | —
Forty Hadith Qudsi | الأحاديث القدسية | 40 | —
Shah Waliullah's Forty | أربعون شاه ولي الله | 40 | —

Al-Nawawī's Forty, the Qudsī Forty, and Shāh Walīullāh's Forty are readable but deliberately excluded from per-scholar grading ingest — matn-based mapping adds no signal for short canonical compilations already universally memorised, so they carry "—" in the gradings column.

Grading scholars (muhaddithūn)

The 9 scholars whose rulings we ingest from fawazahmed0/hadith-api. Each ruling is attributed to a specific published work so different editions by the same scholar do not collapse into each other.

  • Zubair ʿAlī Zaʾī · 1957–2013 CE

    زبير علي زئي

    18,649 rulings

    Primary works: Tahqiqi Sunan an-Nasa'i (Zubair Ali Zai) (5,580); Tahqiqi Sunan Abi Dawud (Zubair Ali Zai) (5,041); Tahqiqi Sunan Ibn Maja (Zubair Ali Zai) (4,275)

  • Muḥammad Nāṣir al-Dīn al-Albānī · 1914–1999 CE

    محمد ناصر الدين الألباني

    18,536 rulings

    Primary works: Sahih/Daif Sunan an-Nasa'i (Albani) (5,596); Sahih/Daif Sunan Abi Dawud (Albani) (5,063); Sahih/Daif Sunan Ibn Maja (Albani) (4,254)

  • Shuʿayb al-Arnaʾūṭ · 1928–2016 CE

    شعيب الأرناؤوط

    6,254 rulings

    Primary works: Tahqiq Sunan Ibn Maja (Arnaut) (3,206); Tahqiq Sunan Abi Dawud (Arnaut) (3,048)

  • ʿAbd al-Fattāḥ Abū Ghuddah · 1917–1997 CE

    عبد الفتاح أبو غدة

    5,570 rulings

    Primary works: Sunan an-Nasa'i (Abu Ghuddah ed.) (5,570)

  • Muḥammad Muḥyī al-Dīn ʿAbd al-Ḥamīd · 1900–1972 CE

    محمد محيي الدين عبد الحميد

    4,984 rulings

    Primary works: Sunan Abi Dawud (Muhyi al-Din Abdul Hamid ed.) (4,984)

  • Muḥammad Fuʾād ʿAbd al-Bāqī · 1882–1968 CE

    محمد فؤاد عبد الباقي

    4,260 rulings

    Primary works: Sunan Ibn Maja (Fuad Abd al-Baqi ed.) (4,260)

  • Aḥmad Muḥammad Shākir · 1892–1958 CE

    أحمد محمد شاكر

    3,581 rulings

    Primary works: Tahqiq Sunan al-Tirmidhi (Ahmad Shakir) (3,581)

  • Bashshār ʿAwwād Maʿrūf · b. 1940 CE

    بشار عواد معروف

    2,326 rulings

    Primary works: Jami' at-Tirmidhi (Bashar Awad Maarouf; source edition unspecified) (2,326)

  • Salīm al-Hilālī · b. 1957 CE

    سليم الهلالي

    1,779 rulings

    Primary works: Muwatta Malik (Salim al-Hilali ed.) (1,779)

About the hadith numbering

Two equally first-class numbering systems, switchable from settings.

Every hadith you read here carries two reference numbers, both of which point to the same Arabic matn. Neither is “the” number; they answer different questions, so the reader lets you choose which one rides on top. The print-edition canonical number (ʿAbd al-Bāqī for the Sahihayn, Ibn Mājah and the Muwaṭṭaʾ; Bashshār ʿAwwād for al-Tirmidhī; Muḥyī al-Dīn for Abū Dāwūd; Abū Ghuddah for al-Nasāʾī) is the default primary badge across the site — the same number you will find cited in scholarly works and on sunnah.com. The LK Hadith Corpus number is the alternate view. Use Settings → Numbering in the reader's settings drawer to switch between the two modes.

Print-edition number

The number readers see cited externally

The hadith number from each collection's widely-cited print edition. This is the number you will find quoted in scholarly works, sharḥ texts, classroom handouts, and on sunnah.com (which follows the same print editions and serves the same numbers).

Per-collection editions: Muḥammad Fuʾād ʿAbd al-Bāqī for al-Bukhārī (Dār Ṭūq al-Najāh), Muslim (Dār Iḥyāʾ al-Turāth al-ʿArabī), Ibn Mājah, and the Muwaṭṭaʾ; Bashshār ʿAwwād Maʿrūf (Dār al-Gharb al-Islāmī) for al-Tirmidhī; Muḥammad Muḥyī al-Dīn ʿAbd al-Ḥamīd (al-Maktaba al-ʿAṣriyya) for Abū Dāwūd; ʿAbd al-Fattāḥ Abū Ghuddah for al-Nasāʾī (al-Mujtabā).

LK Hadith Corpus number

The academic citation we ship under

The position of the hadith inside the Leeds + King Saud (LK) academic corpus — the dataset Altammami, Atwell & Alsalka assembled and that we cite as the immediate source of every matn and translation on this site (Altammami et al., IJASAT / IMAN 2019). Stable across editions; useful when you want the same numbering scheme our underlying tooling uses for grading, sharḥ alignment, and search.

The two numbers agree in some collections and diverge in others. They line up on ≈ 68 % of Ṣaḥīḥ al-Bukhārī hadiths, ≈ 44 % of Sunan Ibn Mājah, and as little as 3 % of Ṣaḥīḥ Muslim. When they differ, the reader shows the primary number per the active mode and the other as a muted subtitle so you always have both in view.
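The display rule above can be sketched in a few lines of Python. This is an illustration only: the `Badge` type, the `numbering_badge` function, and the mode values `"print"` and `"lk"` are hypothetical names chosen for the example, not the reader's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Badge:
    primary: str              # number shown on top
    subtitle: Optional[str]   # muted second number, or None when not needed


def numbering_badge(print_no: Optional[int], lk_no: int, mode: str = "print") -> Badge:
    """Pick the primary and subtitle numbers for one hadith.

    mode is 'print' or 'lk' (hypothetical setting values).
    """
    if print_no is None:
        # No print-edition mapping: show the LK number directly, no toggle.
        return Badge(str(lk_no), None)
    first, second = (print_no, lk_no) if mode == "print" else (lk_no, print_no)
    # When both systems agree there is nothing extra to show.
    subtitle = None if first == second else str(second)
    return Badge(str(first), subtitle)
```

For example, a hadith with print number 3386 and a different LK number shows 3386 on top in print mode and the LK number as the subtitle; switching the mode swaps the two.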

Provenance. For Sahih Muslim, the print number is sourced from the OpenITI/0275AH pre-clean branch (a digitisation of ʿAbd al-Bāqī's Dār Iḥyāʾ al-Turāth edition; CC BY-SA), which preserves the print's parenthesised hadith numbering. For the other 8 mapped collections, the print number is derived by pairing each LK matn against fawazahmed0/hadith-api — which mirrors the print editions — using the matn-based Jaccard mapping described in the methodology section. We are migrating each collection in turn to a direct OpenITI re-derivation against its own print edition.

Deep links. /hadith/<slug>/r/<number> resolves the print-edition number (e.g. /hadith/tirmidhi/r/3386). Anchors in shared links stay stable regardless of which numbering mode you are viewing in. For the ~1.1 % of print numbers that don't resolve to a unique LK row, the reader falls back to the print-edition matn itself; those rows carry an attribution badge and don't show grades or sanad breakdowns (those features depend on LK-corpus tooling).
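A minimal Python sketch of building and parsing the deep-link path follows. It assumes slugs are lowercase letters, digits, and hyphens, and that print numbers are plain integers; both are assumptions for illustration, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed path shape: /hadith/<slug>/r/<number>, with an integer number.
DEEP_LINK = re.compile(r"^/hadith/(?P<slug>[a-z0-9-]+)/r/(?P<number>\d+)$")


def build_deep_link(slug: str, number: int) -> str:
    """Build the print-edition deep link for a collection slug and number."""
    return f"/hadith/{slug}/r/{number}"


def parse_deep_link(path: str) -> Optional[Tuple[str, int]]:
    """Return (slug, print_number), or None if the path is not a deep link."""
    match = DEEP_LINK.match(path)
    if match is None:
        return None
    return match["slug"], int(match["number"])
```

Under these assumptions, `build_deep_link("tirmidhi", 3386)` gives `/hadith/tirmidhi/r/3386`, and parsing that path returns the slug and number again.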

Print-edition numbering is unavailable for seven supplementary compilations (Sunan al-Dārimī, Shamāʾil, Riyāḍ al-Ṣāliḥīn, Bulūgh al-Marām, al-Adab al-Mufrad, Mishkāt, and Shāh Walīullāh's Forty), for which no external print-edition mapping has been ingested. Those collections show the LK number directly, with no toggle.

How to cite

For classroom handouts, footnotes, and papers. Cite the print edition so a reader with the physical book can follow you; mention TalibNotes only where the secondary aggregation matters.

Short reference: Ṣaḥīḥ Muslim 1 (ʿAbd al-Bāqī ed.). The collection name plus the print-edition number plus a short editor tag is what matches print and what a reader can look up on sunnah.com.

Full citation: Muslim, Ṣaḥīḥ Muslim, no. 1 (Muḥammad Fuʾād ʿAbd al-Bāqī ed., Dār Iḥyāʾ al-Turāth al-ʿArabī, Beirut), via TalibNotes: talibnotes.com/hadith/muslim/1/1.

Academic citation (LK) — when citing the dataset rather than the hadith, use the LK paper: Altammami, Atwell & Alsalka, The Arabic–English Parallel Corpus of Authentic Hadith, IJASAT / IMAN 2019. The LK number is the corpus-internal identifier you will want to reference in that context.

Arabic text note. Matns are minimally vocalized throughout the corpus; full tashkīl is not ingested. Ṣaḥīḥ al-Bukhārī is the one collection whose text was manually annotated by the LK team — the rest are automatically segmented and should be cross-checked against a print copy for anything high-stakes.

Methodology

How our hadiths line up with upstream sources, how we decide when a match is real, and what we do when it isn't.

Matn-based matching. We do not assume fawazahmed0's per-collection hadith number lines up with our own. Numbering conventions drift between editions, and an off-by-one in one chapter can silently misattribute every ruling from that chapter onwards. Instead we compare the normalised Arabic matn of each hadith token-by-token (Jaccard similarity after tashkīl strip, stopword removal, and quote normalisation) and require both a similarity threshold and a gap to the second-best candidate before we accept a pairing.
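The matching step can be sketched in Python as follows. This is a simplified illustration, not our production code: the stopword list, the `threshold` and `min_gap` values, and the function names are all placeholders chosen for the example.

```python
import re
import unicodedata

# Placeholder stopword list; the list used in practice is not given on this page.
STOPWORDS = {"من", "في", "على", "عن", "أن", "إلى"}


def normalise(matn: str) -> set[str]:
    """Strip tashkil, then tokenise and drop stopwords; returns a token set."""
    # Arabic diacritics are Unicode combining marks (category "Mn").
    bare = "".join(ch for ch in matn if unicodedata.category(ch) != "Mn")
    # \w+ keeps letters and digits only, so quote marks and other
    # punctuation are discarded here rather than normalised separately.
    tokens = re.findall(r"\w+", bare)
    return {tok for tok in tokens if tok not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(matn: str, candidates: dict[str, str],
               threshold: float = 0.8, min_gap: float = 0.1):
    """Return the id of the winning candidate, or None if either gate fails.

    threshold and min_gap are illustrative values, not the real settings.
    """
    target = normalise(matn)
    scored = sorted(
        ((jaccard(target, normalise(text)), cid) for cid, text in candidates.items()),
        reverse=True,
    )
    if not scored:
        return None
    best_score, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    # Accept only a clear winner: high similarity AND a margin over second place.
    if best_score >= threshold and best_score - runner_up >= min_gap:
        return best_id
    return None
```

The gap requirement is what protects against near-duplicate narrations: two candidates that both score highly are treated as ambiguous rather than resolved by a coin flip.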

Independent cross-check. Once the pairing set is built, we re-validate a random slice against the same hadith in OpenITI — a corpus with independent editorial lineage — so a systematic drift in our primary sources would not pass silently.

QA pipeline. The ambiguous bucket goes through five progressive automated passes (edit-distance on tokens, stopword-adjusted overlap, normalised SHA comparison, length parity, and a final human-curated allow-list), followed by a manual review of the remaining edge cases. Acceptance gates per collection: unique + ambiguous ≥ 60%, no_match ≤ 40%, ambiguous ≤ 5%. Every collection surfaced on this page cleared all three gates before any row was ingested.
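The three acceptance gates reduce to a small check over the per-hadith outcome labels. The sketch below assumes each hadith ends up labelled `unique`, `ambiguous`, or `no_match`; the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable


def passes_gates(outcomes: Iterable[str]) -> bool:
    """Check one collection's outcomes against the three gates.

    outcomes: labels 'unique', 'ambiguous', or 'no_match', one per hadith.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return False  # an empty collection cannot clear any gate
    share = {label: counts[label] / total
             for label in ("unique", "ambiguous", "no_match")}
    return (
        share["unique"] + share["ambiguous"] >= 0.60
        and share["no_match"] <= 0.40
        and share["ambiguous"] <= 0.05
    )
```

A collection with 90% unique, 4% ambiguous, and 6% no_match clears all three gates; one with 10% ambiguous fails the last gate even though its overall match rate is high.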

Full write-up and per-collection QA sheets live in the repo under docs/hadith-enrichment/.

What this page doesn't cover (yet)

  • Rijāl biographies. Narrator-level lookups are not part of the corpus. No current plans to add them.
  • Sharḥ (classical commentary). Two sharḥ rails ship today, both keyed to the ʿAbd al-Bāqī canonical numbering: al-Nawawī's al-Minhāj on Ṣaḥīḥ Muslim and Ibn Ḥajar al-ʿAsqalānī's Fatḥ al-Bārī on Ṣaḥīḥ al-Bukhārī. Other sharḥ source works (ʿUmdat al-Qārī, Tuḥfat al-Aḥwadhī, ʿAwn al-Maʿbūd, the Sindī ḥāshiyas, etc.) are planned as follow-up passes.
  • Takhrīj clusters. Cross-references ("also narrated by...") are stashed in the grading metadata but not yet surfaced as a browseable graph. Also planned.
  • Per-ruling attributions. Sunnah.com's parentheticals like "(Al-Albānī)" and "(Darussalam)" were dropped by their CSV export. We point readers to the per-scholar chips instead — but we may recover these parentheticals later to tighten the legacy "Sunnah.com" chip's provenance.

Typography

The hadith reader ships two Arabic typefaces, switchable from the page-settings drawer.

  • KFGQPC Uthman Taha Naskh — the default. The King Fahd Glorious Qur'ān Printing Complex's Naskh cut. Used across the reader for matn and sharḥ body.
  • Kitab — by Khaled Hosny / The Katib Project Authors, derived from SIL Scheherazade. Distributed under the SIL Open Font License. A more classical Naskh tuned for running prose; we ship it from nuqayah/kitab-font under its OFL terms; the licence text lives at /fonts/hadith/Kitab-OFL.txt.

Corrections

Spot a misattribution, a broken pairing, a typo in a grade label, or a source we should cite differently? Please tell us. Good corrections help everyone who reads this corpus after you.

Open the contact form →
