WorldCat Discovery release notes, Arabic language search and sort
Release Date: November 2023
Introduction
The following release notes are for Arabic language searching and sorting support in WorldCat Discovery, completed in November 2023.
WorldCat Discovery now includes the following enhancements so that searching and sorting in Modern Standard Arabic behave as a native speaker expects:
Search
- We normalize terms so they are treated the same with and without diacritics, definite articles, prefixes and kashida.
- We maintain lists of protected words (that are not normalized), stop words and word stems (for automatic, default matching on variant forms of a word).
Sort
- We remove leading articles and normalize hamza (especially on the letter “ا”) before applying Unicode collation.
These Arabic language searching and sorting improvements complement WorldCat Discovery’s Arabic user interface, introduced in May 2021.
Modern Standard Arabic search and sort features
Search
Your library’s users searching for words or phrases in Modern Standard Arabic now get search results that meet the expectations of native Arabic speakers. These include:
Diacritics
WorldCat Discovery returns the same results regardless of whether users enter search terms with or without diacritics such as hamza “ء” or madda “آ”. For example, we treat the following as equivalent:
- “ى” and “ي”
- “ه” and “ة”
- “آ”, “ا”, “إ”, and “أ”
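As an illustration of the equivalences listed above, the following Python sketch removes short-vowel marks and maps each group of letter variants to a single form. It is not OCLC's implementation: the function name `fold_diacritics`, the choice of representative letters, and the exact range of marks removed are assumptions made for this example.

```python
import re

# Assumption: the marks to drop are the Arabic harakat U+064B..U+0652
# (fathatan through sukun) plus the superscript alef U+0670.
HARAKAT = re.compile("[\u064B-\u0652\u0670]")

# Assumption: each group of equivalent letters folds to one representative.
LETTER_FOLD = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})


def fold_diacritics(text: str) -> str:
    """Return text with vowel marks removed and letter variants unified."""
    return HARAKAT.sub("", text).translate(LETTER_FOLD)


# Three spellings of "city": plain, fully vowelled, and written with ha.
variants = [
    "\u0645\u062F\u064A\u0646\u0629",
    "\u0645\u064E\u062F\u0650\u064A\u0646\u064E\u0629",
    "\u0645\u062F\u064A\u0646\u0647",
]
print(len({fold_diacritics(v) for v in variants}))  # 1
```

Folding both the query and the indexed text with the same function is what makes the variants match; the folded form is only a matching key and would not be shown to users.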
Definite articles and prefixes
For title searches, results are the same with or without preceding definite articles or prefixes. For example:
- Definite articles are dropped: “ال”
- Prefixes are ignored: “ب”, “ك”, “ف”, “لل”
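A minimal Python sketch of this step is shown here. The rule it applies is an assumption chosen to be conservative: a one-letter prefix is removed only when it is directly followed by the article, so that ordinary words that merely begin with one of those letters are left alone. The name `strip_article_and_prefix` and the pattern are hypothetical and do not describe how WorldCat Discovery is implemented.

```python
import re

# Assumption: a one-letter prefix (waw, fa, ba, kaf) is removed only when it is
# directly followed by the article (alef + lam); lam + lam is treated as a unit.
# At least two letters must remain after stripping.
LEADING = re.compile(
    "^(?:[\u0648\u0641\u0628\u0643]?\u0627\u0644|\u0644\u0644)(?=\\w{2,})"
)


def strip_article_and_prefix(word: str) -> str:
    """Drop a leading article, optionally preceded by a one-letter prefix."""
    return LEADING.sub("", word)


city = "\u0645\u062F\u064A\u0646\u0629"  # "city" without article
forms = [
    "\u0627\u0644" + city,        # the city
    "\u0648\u0627\u0644" + city,  # and the city
    "\u0628\u0627\u0644" + city,  # in the city
    "\u0644\u0644" + city,        # for the city
]
print(all(strip_article_and_prefix(f) == city for f in forms))  # True
```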
Kashida
WorldCat Discovery treats characters the same whether elongated or not. For example:
- With kashida: “مـديـنــــــــــــــــة “
- Without kashida: “مدينة”
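In Unicode, kashida is encoded as the single character U+0640 (ARABIC TATWEEL), so ignoring it can be sketched as a plain character removal. The function name `remove_kashida` is hypothetical, and this is an illustration rather than the product's code.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL, the elongation character


def remove_kashida(text: str) -> str:
    """Delete every elongation character, leaving the letters untouched."""
    return text.replace(KASHIDA, "")


elongated = "\u0645\u0640\u062F\u064A\u0640\u0646\u0640\u0640\u0640\u0629"
plain = "\u0645\u062F\u064A\u0646\u0629"
print(remove_kashida(elongated) == plain)  # True
```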
The following table provides further examples:
Arabic Query Input | Comments | Meaning in English | Expected Results |
---|---|---|---|
انسان | Word without diacritics and leading article | Human | 4 queries with same output |
إنسان | Word with diacritics and no leading article | Human | |
الانسان | Word without diacritics and with leading article | The human | |
الإنسان | Word with diacritics and leading article | The human | |
مدينة | Same word meaning written in 3 variations | City | 3 queries with same output |
مَدِينَة | | City | |
مدينه | | City | |
مـديـنــــــــــــــــة | With and without the "Kashida" | City | 2 queries with same output |
مدينة | | City | |
آنسة | With and without "Madda" on the first letter A | Ms. | 2 queries with same output |
انسة | | Ms. | |
المدينة | Word with leading article | The city | 2 queries with same output |
والمدينة | Word with leading article + and "و" | And the city | |
بالمدينة | 3 different variations with prefixes | In the city | 3 queries with same output |
فالمدينة | | Then the city | |
للمدينة | | For the city | |
Protected words
OCLC maintains a list of Modern Standard Arabic words that are protected from the above normalization processes and are preserved unchanged. Normalization, such as removing definite articles or prefixes, would either change the meaning of these words or render them meaningless.
Example:
- المانيا (Germany): stripping the leading “ال” as if it were a definite article would leave مانيا, which has no meaning.
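One way to picture a protected-word list is as a lookup that runs before any normalization. The Python sketch below assumes a set containing only the example given here and a deliberately simplified article-stripping helper; the names `PROTECTED`, `strip_article`, and `normalize` are hypothetical, and the actual list and its contents are maintained by OCLC and not reproduced in these notes.

```python
# Hypothetical protected list holding only the example above: "Germany".
PROTECTED = {"\u0627\u0644\u0645\u0627\u0646\u064A\u0627"}
ARTICLE = "\u0627\u0644"  # alef + lam


def strip_article(word: str) -> str:
    """Simplified stand-in for the article-removal step."""
    if word.startswith(ARTICLE) and len(word) > len(ARTICLE) + 1:
        return word[len(ARTICLE):]
    return word


def normalize(word: str) -> str:
    """Return protected words unchanged; normalize everything else."""
    return word if word in PROTECTED else strip_article(word)


germany = "\u0627\u0644\u0645\u0627\u0646\u064A\u0627"
the_city = "\u0627\u0644\u0645\u062F\u064A\u0646\u0629"
print(normalize(germany) == germany)    # True: protected, left as is
print(normalize(the_city) == the_city)  # False: article was removed
```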
Stop words
OCLC maintains a list of Modern Standard Arabic stop words that WorldCat Discovery ignores for search matching because they occur so commonly across records that they do not help users select or distinguish between records.
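The effect of a stop-word list can be sketched as a filter applied to the query terms before matching. The four words below are common Arabic function words chosen for illustration only; they are an assumption, not OCLC's list, and the name `content_terms` is hypothetical.

```python
# Hypothetical stop-word list: "in", "from", "on", and "and".
STOP_WORDS = {"\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649", "\u0648"}


def content_terms(query: str) -> list[str]:
    """Split a query on whitespace and drop terms found in STOP_WORDS."""
    return [term for term in query.split() if term not in STOP_WORDS]


# "articles in literature": the middle word is a stop word.
query = "\u0645\u0642\u0627\u0644\u0627\u062A \u0641\u064A \u0627\u0644\u0623\u062F\u0628"
print(len(content_terms(query)))  # 2
```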
Stemming
OCLC maintains a list of Modern Standard Arabic word stems that assist with automatic, default matching on variant forms of a word.
Sort
Alphabetic sorting of Modern Standard Arabic search results now meets native Arabic speakers’ expectations.
Alphabetic sorting is used on WorldCat Discovery search results for:
Sort
- Author (A-Z)
- Title (A-Z)
Facet
- Author/Creator
Alphanumeric sorting of call numbers is used on the item details page for:
- Browse the Shelf
Before sorting Modern Standard Arabic text, we remove leading articles and normalize hamza, especially on the letter “ا”. We then sort author, title, and call number fields using the default collation order of the Unicode Collation Algorithm, which we apply to all scripts and languages.
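The two preprocessing steps can be sketched as a sort key in Python. Two things in this sketch are assumptions rather than facts from these notes: alef with madda is folded together with the two hamza forms, and plain code point comparison stands in for the Unicode Collation Algorithm, which the standard library does not provide. The stand-in happens to give the expected order for the basic Arabic letters used here, but a real system would need a proper collation implementation. The name `sort_key` is hypothetical.

```python
ARTICLE = "\u0627\u0644"  # alef + lam

# Assumption: hamza-bearing alef forms (and alef with madda) sort as bare alef.
HAMZA_FOLD = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
})


def sort_key(title: str) -> str:
    """Strip a leading article, then fold hamza forms on alef."""
    if title.startswith(ARTICLE) and len(title) > len(ARTICLE):
        title = title[len(ARTICLE):]
    return title.translate(HAMZA_FOLD)


# First words of six titles, given here in shuffled order.
titles = [
    "\u062D\u0627\u0621",                                  # begins with hah
    "\u0627\u0644\u0625\u0633\u0644\u0627\u0645",          # article + hamza below
    "\u0623\u062F\u0628",                                  # hamza above
    "\u0627\u0633\u062A\u0646\u0637\u0627\u0642",          # bare alef
    "\u0627\u0644\u0623\u062F\u0628\u064A\u0627\u062A",    # article + hamza above
    "\u0627\u0644\u0627\u062C\u062A\u0645\u0627\u0639\u064A",  # article + bare alef
]
ordered = sorted(titles, key=sort_key)  # code point order, not true UCA
print(ordered[0] == titles[5], ordered[-1] == titles[0])  # True True
```

Sorting on a derived key leaves the displayed titles unchanged: the article and hamza are ignored for ordering only.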
Title (A-Z) sorting example
Searching for the term المقالات (the articles) in the title (ti:) index, with results sorted by dropping the leading article and the hamza:

Record no. | Title |
---|---|
1 | الاجتماعي وعالمه الممزق: مقالات في فلسفة اجتماعية |
2 | الاجتماعي وعالمه الممزق: مقالات في فلسفة اجتماعية . |
3 | أدب الحياة |
4 | الأدبيات العصرية في سبيل التاج: مقالات في الأدب والثقافة والحياة |
5 | استنطاق النص: مقالات في السرد العربي |
6 | الإسلام والغرب: مقالات ودراسات مختارة |
7 | التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة |
8 | التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة |
9 | التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة |
10 | حاء: مقالات |
11 | حماة الوطن: مقالات مختارة 2002 |
12 | الخواص الأسلوبية في مقالات أحمد بهاء الدين |
13 | عرض كتاب: التنبيهات والحقيقة؛ مقالات إضافية حول الفلسفة و الديمقراطية |
14 | العلاقات الدولية وجائحة كورونا: قصة قصيرة وأربع مقالات |
15 | فن النثر الحديث: تحليل مقالات وقصص قصيرة |
Important links
Product website
More product information can be found here.
Office Hours
Support websites
Support information for this product and related products can be found at:
- WorldCat Discovery support resources
- WorldCat Discovery training
- Release notes
- OCLC customer support
- Browser compatibility chart
If you have additional questions, please contact OCLC Customer Service by calling 1-800-848-5800 or 1-614-793-8682, Monday to Friday, 7 a.m. to 9 p.m. ET, or by emailing support@oclc.org. For support enquiries in the UK and Ireland, please contact the Support Desk by calling +44-(0)114-281 60 42 or emailing support-uk@oclc.org. Support is available between 09:00 and 17:30 (UK time).
Include Request ID with problem reports
When reporting an issue with WorldCat Discovery, it is extremely helpful to include the Request ID. The Request ID is found at the bottom of the screen on which the issue occurred. Including this information allows us to directly trace what happened on the request we are troubleshooting.