Use Tamil script data for cataloging items in the Tamil language. Use Tamil script data the same way you use other non-Latin script data in the client.
See Work with international records and Guidelines for contributing non-Latin script bibliographic records to WorldCat for details specific to non-Latin scripts. See also general procedures describing how to:
Caution: Do not use MARC-8 character verification (Edit > MARC-8 Characters > Verify) to check Tamil characters. Because there is no MARC-8 character set for Tamil, this command marks every Tamil character as invalid. Instead, the OCLC system validates Tamil characters when you validate a record.
Because Tamil script is not included in the MARC-8 character sets, you must export and import records in the UTF-8 Unicode character set. For export, go to Tools > Options > Export and click Record Characteristics; for import, go to File > Import Records and click Record Characteristics. If you export or import using the MARC-8 character set, characters outside MARC-8, including all Tamil characters, are retained only in Numeric Character Reference (NCR) notation.
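To show what NCR notation looks like, here is a minimal Python sketch that replaces each non-ASCII character with a hexadecimal reference of the form &#xHHHH;. It is an illustration only, not the client's conversion routine: the function name to_ncr is hypothetical, and the sketch assumes that only ASCII passes through unchanged, whereas a real MARC-8 export also keeps the other characters MARC-8 supports.

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hex NCR such as &#x0BA4;.

    Simplifying assumption: only ASCII is kept as-is. Real MARC-8 covers
    more than ASCII, so this escapes more characters than an actual export.
    """
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};"
        for ch in text
    )


# "தமிழ்" (Tamil) followed by ASCII digits.
print(to_ncr("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD 2004"))
```

Every Tamil character, including the vowel sign and the virama, becomes its own reference, which is why records exported this way are hard to read and edit until they are converted back to UTF-8.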
Unicode is the universal character encoding scheme for written characters and text. It defines a consistent way of encoding multi-script text that enables the exchange of text data internationally.
Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8, designed for use with ASCII-based systems).
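As an illustration of the three encoding forms, the following Python sketch encodes one Tamil letter, அ (U+0B85), in each form. The big-endian codec names are used so that Python does not prepend a byte order mark.

```python
letter = "\u0B85"  # அ, TAMIL LETTER A

# Encode the same code point in UTF-8, UTF-16, and UTF-32.
for form in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = letter.encode(form)
    print(f"{form:10} {len(encoded)} bytes: {encoded.hex(' ').upper()}")
```

The same code point takes 3, 2, and 4 bytes respectively. Every character in the Tamil block (U+0B80 to U+0BFF) needs 3 bytes in UTF-8, which is the form used for export and import.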
Connexion client began supporting Tamil script with Unicode version 4.0.0.
If your system default language is not Tamil, you can install the Tamil language in Windows. When you install Tamil, Windows provides an input keyboard for entering Tamil script. See more about input methods for languages that use non-Latin scripts.
Tamil characters are defined in Unicode 4.0 (coded in the range U+0B80 to U+0BFF).
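A simple way to find Tamil characters in a string is to check each code point against this range. The Python sketch below does that; the helper names are hypothetical, and a range check is not the validation the OCLC system performs, since some code points inside the block are unassigned.

```python
TAMIL_BLOCK_START = 0x0B80
TAMIL_BLOCK_END = 0x0BFF


def is_in_tamil_block(ch: str) -> bool:
    """Return True if the single character ch lies in the Unicode Tamil block."""
    return TAMIL_BLOCK_START <= ord(ch) <= TAMIL_BLOCK_END


def tamil_positions(text: str) -> list[int]:
    """Return the indexes of the characters in text that fall in the Tamil block."""
    return [i for i, ch in enumerate(text) if is_in_tamil_block(ch)]


print(tamil_positions("Tamil: \u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"))
```

For the mixed Latin and Tamil string above, the sketch prints [7, 8, 9, 10, 11], the positions of the five Tamil code points.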
Unicode 4.0 does not assign separate code points to most of the glyphs formed when a consonant alone (implicit "a" suppressed) is combined with an independent vowel. These glyphs are encoded as sequences of code points instead.
Caution: Microsoft Arial Unicode MS font, which is recommended for general use in the client, does not support the Tamil digits, numerics, and symbols that are coded in Unicode 4.0.
The client adds the following data to ‡c of field 066 in Tamil records to indicate the presence of Tamil characters:
See the ALA-LC Romanization Table for Tamil on the Library of Congress website.
See general procedures and search techniques for searching WorldCat.
Tamil independent vowels are characters that stand on their own. Each has a unique Unicode code value.
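The following Python sketch lists the twelve Tamil independent vowels with their code points and Unicode character names. The code points are written out explicitly, which also shows the unassigned gaps in the block (U+0B8B to U+0B8D and U+0B91).

```python
import unicodedata

# The twelve Tamil independent vowels, each a single code point.
INDEPENDENT_VOWEL_CODES = [
    0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
    0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94,
]

for code in INDEPENDENT_VOWEL_CODES:
    vowel = chr(code)
    print(f"U+{code:04X}  {vowel}  {unicodedata.name(vowel)}")
```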
Tamil consonants contain an implicit "a" vowel sound. A modifier called Virama (or Pulli) added to the consonant glyph represents the consonant alone with no vowel. If the consonant alone is combined with an independent vowel (not implicit "a"), the vowel becomes dependent and the visual form changes.
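To make the three cases concrete, this Python sketch builds the consonant க (KA) with its implicit "a", with the virama, and combined with the vowel இ (I). In Unicode, the combined form கி is encoded as KA followed by the dependent vowel sign I (U+0BBF), not as KA plus virama plus இ. Because Unicode defines no precomposed character for கி, NFC normalization leaves it as two code points.

```python
import unicodedata

KA = "\u0B95"            # க  TAMIL LETTER KA (implicit "a")
VIRAMA = "\u0BCD"        # ்  TAMIL SIGN VIRAMA (pulli)
VOWEL_SIGN_I = "\u0BBF"  # ி  TAMIL VOWEL SIGN I (dependent form of இ)

forms = {
    "consonant with implicit a": KA,
    "consonant alone (virama)": KA + VIRAMA,
    "consonant + vowel i": KA + VOWEL_SIGN_I,
}

for label, text in forms.items():
    codes = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label:28} {text}  {codes}")

# No precomposed code point exists for கி, so NFC keeps both code points.
print(len(unicodedata.normalize("NFC", KA + VOWEL_SIGN_I)))  # 2
```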
The OCLC system indexes each Tamil consonant separately in all three of the following forms based on text:
Note: Tamil Unicode 4.0 codes are not in collating order. If the record includes romanized (Latin-equivalent) data, use the default sort order for search results, alphabetical sorting by Latin script. The sort order option is in Tools > Options > International (or press <Alt><T><O>).
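The mismatch between code point order and alphabetical order is easy to see with the 18 Tamil consonants. The Python sketch below compares the traditional consonant order with a plain code point sort; the TRADITIONAL_ORDER table is a hand-written assumption for illustration, not taken from OCLC or from a collation standard, and real word collation also has to handle vowel signs and other letters.

```python
# Traditional order of the 18 Tamil consonants (illustrative table).
TRADITIONAL_ORDER = [
    "\u0B95", "\u0B99", "\u0B9A", "\u0B9E", "\u0B9F", "\u0BA3",  # க ங ச ஞ ட ண
    "\u0BA4", "\u0BA8", "\u0BAA", "\u0BAE", "\u0BAF", "\u0BB0",  # த ந ப ம ய ர
    "\u0BB2", "\u0BB5", "\u0BB4", "\u0BB3", "\u0BB1", "\u0BA9",  # ல வ ழ ள ற ன
]

# Python compares strings by code point, like an unaware sort.
code_point_order = sorted(TRADITIONAL_ORDER)

print("traditional:", " ".join(TRADITIONAL_ORDER))
print("code point: ", " ".join(code_point_order))
print("same order? ", code_point_order == TRADITIONAL_ORDER)  # False
```

In code point order, ன (U+0BA9) lands right after ந instead of at the end, and ற, ழ, and ள also move, which is why sorting on romanized data gives more predictable results.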