This page is part of the book SQL-99 Complete, Really, by Peter Gulutzan & Trudy Pelzer. The authors have graciously allowed us to reproduce the contents of the book here. Because the book is about the SQL-99 standard, the contents of this and other pages in the book may not directly apply to MariaDB. Use the navigation bar to navigate the book.

A Schema may contain zero or more Collations. An SQL Collation is a set of rules that determines the result when characters from a Character set are compared. Collations are dependent on some Schema the <Collation name> must be unique within the Schema the Collation belongs to. User-defined Collations are created and dropped using standard SQL statements.

In SQL, a Collation may be a Collation defined by a national or international standard, by your DBMS or by a user of SQL-data.

Standard-defined Collations are collating sequences predefined for a character repertoire by some standards body. The SQL Standard requires a DBMS to provide a default Collation (based on the character repertoire order) for each of the standard-defined Character sets it supports. In each case, the default Collation has the PAD SPACE characteristic.

Implementation-defined Collations are collating sequences predefined for a character repertoire by your DBMS. These Collations may have either the PAD SPACE characteristic or the NO PAD characteristic. The SQL Standard requires a DBMS to provide a default Collation, called SQL_TEXT, for the SQL_TEXT Character set.

[NON-PORTABLE] The complete set of predefined Collations provided by a DBMS is non-standard because the SQL Standard allows implementors to include support for other Collations, in addition to the required ones. It also requires implementors (a) to define the names for the standard-defined and (except for SQL_TEXT) the implementation-defined Collations it provides and (b) to define the PAD characteristic for each implementation-defined Collations it provides. [OCELOT Implementation] The OCELOT DBMS that comes with this book provides a SQL_TEXT Collation that follows the order of the Unicode Form-of-use codes. The SQL_TEXT Collation has the PAD SPACE characteristic and is the default Collation for both the SQL_TEXT Character set and the UNICODE Character set. It also provides eleven other predefined Collations, all with the PAD SPACE characteristic. They are:

SQL_CHARACTERdefault Collation for Character set SQL_CHARACTER
ASCII_GRAPHICdefault Collation for Character set GRAPHIC_IRV
LATIN1default Collation for Character set LATIN1
ASCII_FULLdefault Collation for Character set ISO8BIT
CODEPAGE_0437default Collation for Character set CODEPAGE_0437
Collation CODEPAGE_0850default Collation for Character set CODEPAGE_0850
Collation CODEPAGE_1250default Collation for Character set CODEPAGE_1250
Collation CODEPAGE_1251default Collation for Character set CODEPAGE_1251
Collation CODEPAGE_1253default Collation for Character set CODEPAGE_1253
Collation CODEPAGE_1254default Collation for Character set CODEPAGE_1254
Collation CODEPAGE_1257default Collation for Character set CODEPAGE_1257

The OCELOT DBMS allows pre-defined Collations to apply for any Character set (this is SQL-92; it's no longer legal in SQL3). The pre-defined Collations that apply for all Character sets are: ALBANIA, CROATIAN, CZECH, DUTCH, ESTONIAN, GERMAN, HUNGARIAN, LATVIAN, LITHUANIAN, NORDIC, POLISH, ROMANIAN, SLOVAK, SLOVENIAN, SPANISH, TURKISH, UKRAINIAN, WELSH, and CASE_INSENSITIVE. For other European languages there is no need for a special Collation the repertoire order is sufficient.

The pre-defined Collations provided by your DBMS belong to INFORMATION_SCHEMA. The SQL special grantee, PUBLIC, always has a USAGE Privilege on every predefined Collation provided by your DBMS.

A Collation is defined by a descriptor that contains three pieces of information:

  1. The <Collation name>, qualified by the <Schema name> of the Schema it belongs to.
  2. The name of the Character set on which the Collation operates.
  3. Whether the NO PAD or the PAD SPACE characteristic applies to the Collation.

User-defined Collations may belong to any Schema owned by the creator. To create a Collation, use the CREATE COLLATION statement (either as a stand-alone SQL statement or within a CREATE SCHEMA statement). CREATE COLLATION specifies the enclosing Schema, names the Collation and defines the Collation's Character set and PAD characteristic. To destroy a Collation, use the DROP COLLATION statement. To change an existing Collation, drop and then redefine it.

There is a one-to-many association between Character sets and Collations: one Character set can have many possible Collations defined for it, although only one can be its default Collation.

The default Collation for a Character set is the Collation that will be used to compare characters belonging to that Character set in the absence of an explicit specification to the contrary and can either be a Collation defined for the Character set or the Form-of-use encoding scheme for that Character set's repertoire that is, the default Collation for a Character set can be defined to be the order of the characters in the character repertoire. (For example, in the 7-bit ASCII character set, the decimal code for the letter A is 65 and the decimal code for the letter B is 66. This is a happy coincidence; it allows your DBMS to discover that 'A' is less than 'B' by merely executing a CMP: a machine-code numeric comparison. And that's what is meant by "the order of the characters in the repertoire". Note that, since the decimal code for the letter a is 97, it follows that 'A' is less than 'a' that is, character repertoire order specifies a case-sensitive collating sequence.)

Collation names

A <Collation name> identifies a Collation. The required syntax for a <Collation name> is:

<Collation name> ::=
[ <Schema name>. ] unqualified name

A <Collation name> is a <regular identifier> or a <delimited identifier> that is unique (for all Collations) within the Schema it belongs to. The <Schema name> which qualifies a <Collation name> names the Schema that the Collation belongs to and can either be explicitly stated, or it will default to INFORMATION_SCHEMA; that is, an unqualified <Collation name> is always assumed to belong to INFORMATION_SCHEMA even if a CREATE COLLATION statement is part of a CREATE SCHEMA statement. (User-defined Collations may not belong to INFORMATION_SCHEMA. Therefore, when defining, using or dropping a user-defined Collation, always provide an explicit <Schema name> qualifier for the <Collation name>.)

Here are some examples of possible <Collation name>s:

SQL_TEXT
-- a predefined <Collation name>

SCHEMA_1.COLLATION_1
-- a simple qualified user-defined <Collation name>

CATALOG_1.SCHEMA_1.COLLATION_1
-- a fully qualified user-defined <Collation name>

Form-of-use conversion names

A <Form-of-Use conversion name> identifies a character repertoire's encoding scheme the one-to-one mapping scheme between each character in the repertoire and a set of internal codes (usually 8-bit values) that define how the repertoire's characters are encoded as numbers. These codes are also used to specify the order of the characters within the repertoire and so can be used to specify the default Collation for a Character set. Supported Forms-of-use are all predefined by your DBMS and thus belong to INFORMATION_SCHEMA. SQL provides no ability to define your own Forms-of-use. The required syntax for a <Form-of-use conversion name> is:

<Form-of-use conversion name> ::=
[ INFORMATION_SCHEMA. ] unqualified name

A <Form-of-use conversion name> is a <regular identifier> or a <delimited identifier> that is unique (for all Forms-of-user) within INFORMATION_SCHEMA. The <Schema name> which qualifies a <Form-of-use conversion name> names the Schema that the Form-of-use belongs to and can either be explicitly stated, or it will default to INFORMATION_SCHEMA: the only Schema that may own a Form-of-use.

Here are some examples of possible <Form-of-use conversion name>s:

FORM_1
-- a possible <Form-of-use conversion name>

INFORMATION_SCHEMA.FORM_1
-- a simple qualified possible <Form-of-use conversion name>

CATALOG_1.INFORMATION_SCHEMA.FORM_1
-- a fully qualified possible <Form-of-use conversion name>

[NON-PORTABLE] The Forms-of-use available for use is non-standard because the SQL Standard requires implementors to define which (if any) Forms-of-use they will explicitly provide. [OCELOT Implementation] The OCELOT DBMS that comes with this book provides no explicit Forms-of- use.

If you want to restrict your code to Core SQL, don't use any <Collation name>s or <Form-of-use conversion name>s.

Comments

Comments loading...