MySQL Character Set and Collations

MySQL query that fails because of differing collations.

Here’s how to resolve those issues.

PreWebHost.com

PreWebHost.com

In MySQL, each server, database, table and column can have its own character set and collation.You can see what character sets and collations your server has like so:

mysql> show character set;

 Charset  Description  Default collation  Maxlen
 big5  Big5 Traditional Chinese  big5_chinese_ci 2
 dec8  DEC West European  dec8_swedish_ci 1
 cp850  DOS West European  cp850_general_ci 1
 hp8  HP West European  hp8_english_ci 1
 koi8r  KOI8-R Relcom Russian  koi8r_general_ci 1
 latin1  cp1252 West European  latin1_swedish_ci 1
 latin2  ISO 8859-2 Central European  latin2_general_ci 1
 swe7  7bit Swedish  swe7_swedish_ci 1
 ascii  US ASCII  ascii_general_ci 1
 ujis  EUC-JP Japanese  ujis_japanese_ci 3
 sjis  Shift-JIS Japanese  sjis_japanese_ci 2
 hebrew  ISO 8859-8 Hebrew  hebrew_general_ci 1
 tis620  TIS620 Thai  tis620_thai_ci 1
 euckr  EUC-KR Korean  euckr_korean_ci 2
 koi8u  KOI8-U Ukrainian  koi8u_general_ci 1
 gb2312  GB2312 Simplified Chinese  gb2312_chinese_ci 2
 greek  ISO 8859-7 Greek  greek_general_ci 1
 cp1250  Windows Central European  cp1250_general_ci 1
 gbk  GBK Simplified Chinese  gbk_chinese_ci 2
 latin5  ISO 8859-9 Turkish  latin5_turkish_ci 1
 armscii8  ARMSCII-8 Armenian  armscii8_general_ci 1
 utf8  UTF-8 Unicode  utf8_general_ci 3
 ucs2  UCS-2 Unicode  ucs2_general_ci 2
 cp866  DOS Russian  cp866_general_ci 1
 keybcs2  DOS Kamenicky Czech-Slovak  keybcs2_general_ci 1
 macce  Mac Central European  macce_general_ci 1
 macroman  Mac West European  macroman_general_ci 1
 cp852  DOS Central European  cp852_general_ci 1
 latin7  ISO 8859-13 Baltic  latin7_general_ci 1
 cp1251  Windows Cyrillic  cp1251_general_ci 1
 cp1256  Windows Arabic  cp1256_general_ci 1
 cp1257  Windows Baltic  cp1257_general_ci 1
 binary  Binary pseudo charset  binary 1
 geostd8  GEOSTD8 Georgian  geostd8_general_ci 1
 cp932  SJIS for Windows Japanese  cp932_japanese_ci 2
 eucjpms  UJIS for Windows Japanese  eucjpms_japanese_ci 3

+———-+—————————–+———————+——–+

mysql> show collation like ‘latin1%’;

 Collation  Charset  Id  Default  Compiled  Sortlen
 latin1_german1_ci  latin1 5  Yes 1
 latin1_swedish_ci  latin1 8  Yes  Yes 1
 latin1_danish_ci  latin1 15  Yes 1
 latin1_german2_ci  latin1 31  Yes 2
 latin1_bin  latin1 47  Yes 1
 latin1_general_ci  latin1 48  Yes 1
 latin1_general_cs  latin1 49  Yes 1
 latin1_spanish_ci  latin1 94  Yes 1

FYI, the _ci, _cs and _bin endings indicate “case insensitive”, “case sensitive” and “binary”, respectively. Binary is, perforce, case sensitive.

To see the default settings for your server, check out the variables:

+———-+—————————–+———————+——–+

mysql> show variables like ‘char%’;

 Variable_name  Value
 character_set_client  latin1
 character_set_connection  latin1
 character_set_database  latin1
 character_set_filesystem  binary
 character_set_results  latin1
 character_set_server  latin1
 character_set_system  utf8
 character_sets_dir  /Applications/xampp/xamppfiles/share/mysql/charsets/

8 rows in set (0.01 sec)

+———-+—————————–+———————+——–+

mysql> show variables like ‘coll%’;

 Variable_name  Value
 collation_connection  latin1_swedish_ci
 collation_database  latin1_swedish_ci
 collation_server  latin1_swedish_ci

3 rows in set (0.00 sec)

So, you can specify the character set and collation right down to the column. A collation is specific to a given character set, so you can’t compare two fields from different character sets, or from different collations. In fact, that’s where we started: MySQL rapped my knuckles for trying to compare fields with different collations.

Let’s set up a collation collision for demonstration purposes.

drop table if exists a;

create table a (flda1 varchar(255)) engine=myisam collate=latin1_swedish_ci;

drop table if exists b;
create table b (fldb1 varchar(50) not null) engine=myisam collate=latin1_general_ci;

insert into a values (‘aaa’);
insert into b values (‘bbb’);

So if we did a select comparing the two fields, we’d get:

mysql> select * from a, b where a.flda1 = b.fldb1;
ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation ‘=’
mysql>

Great. Now, to fix it, we want to set the collation of table b to latin1_swedish_ci. You’ll note that getting a “show create table b”, produces the following DDL script:

CREATE TABLE `b` (`fldb1` varchar(50) collate latin1_general_ci NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

Clearly we’ll need to change both the table and the field’s collations:

alter table b character set latin1 collate latin1_swedish_ci,
modify column fldb1 varchar(50) character set latin1 collate latin1_swedish_ci NOT NULL

You can see the syntax for alter at http://dev.mysql.com/doc/refman/5.0/en/alter-table.html. After which we get a create script of:

CREATE TABLE `b` (`fldb1` varchar(50) NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1

And now, when we execute the select script:

mysql> select * from a, b where a.flda1 = b.fldb1;
Empty set (0.20 sec)

Our error message is gone and the script actually executes.  It’s Done !

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.