[OpenSIPS-Users] UTF8 in MySQL database

Phil Vandry vandry at TZoNE.ORG
Mon Mar 30 17:15:51 CEST 2009


On Wed, Mar 25, 2009 at 09:03:35AM +0100, Jacek Konieczny wrote:
> that is not a sane way to do things. The problems will start as soon as
> someone will try to process this data as 'latin1' (according to the
> declaration on the database), when it is not latin1.

Agreed. And the database would be perfectly within its rights to reject
or corrupt any byte in the range 0x80 to 0x9f if the encoding is latin1
(in ISO 8859-1 those bytes are control codes, not printable characters),
so you cannot even count on binary transparency. (I doubt MySQL actually
does this, though.)
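
To make the problem concrete, here is a minimal Python sketch (not
anything OpenSIPS or MySQL runs) of what happens when UTF-8 bytes are
later read back as latin1. The helper names read_as_latin1 and c1_bytes
are made up for illustration, and Python's "latin-1" codec is assumed to
stand in for strict ISO 8859-1.

```python
import unicodedata


def read_as_latin1(text):
    """Encode text as UTF-8, then decode those same bytes as ISO 8859-1."""
    return text.encode("utf-8").decode("latin-1")


def c1_bytes(text):
    """Return the UTF-8 bytes of text that fall in the range 0x80-0x9f."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


if __name__ == "__main__":
    sample = "\u0142"  # Polish "l with stroke"; UTF-8 bytes are c5 82
    garbled = read_as_latin1(sample)
    # One character went in, two come out, and the second is a control code.
    print(len(sample), len(garbled))
    print([unicodedata.category(ch) for ch in garbled])
    print([hex(b) for b in c1_bytes(sample)])
```

The second byte of that one character lands in the 0x80 to 0x9f range,
so any layer that is strict about latin1 could legitimately mangle it.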

> But, back to my original question, as I can understand that 'latin1' is ok
> for some or even most people. My question was: is there any specific,
> technical reason, that 'utf8' is forbidden? I don't think OpenSIPs does

I don't know why you are getting a problem with UTF-8 but there is one
issue with MySQL and UTF-8 that's worth mentioning (it's not related to
OpenSIPS). The MySQL docs do draw attention to this point.

If you have a CHAR(n) column (not a VARCHAR column) and your table is
using the fixed-length record format (usually, MyISAM with no VARCHAR
columns), the CHAR column must reserve 3*n bytes with UTF-8 but requires
only n bytes with latin1 or ASCII.

(Actually it should be 4*n, not 3*n, but MySQL's support for UTF-8 is
crippled and only supports characters up to U+FFFF, i.e. the Basic
Multilingual Plane, and that means it never needs more than 3 bytes to
encode one character.)
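
The 3*n versus 4*n arithmetic can be checked with a short Python sketch.
This only measures UTF-8 encoding lengths; it does not query MySQL, and
the function names max_utf8_bytes and char_column_bytes are hypothetical
helpers, not part of any MySQL API.

```python
def max_utf8_bytes(highest_code_point):
    """Bytes UTF-8 needs for the highest code point a charset allows."""
    return len(chr(highest_code_point).encode("utf-8"))


def char_column_bytes(n, bytes_per_char):
    """Space a fixed-length CHAR(n) column must reserve per row."""
    return n * bytes_per_char


if __name__ == "__main__":
    n = 64  # example column width, chosen arbitrarily
    latin1 = char_column_bytes(n, 1)  # one byte per character
    bmp_only = char_column_bytes(n, max_utf8_bytes(0xFFFF))
    full_range = char_column_bytes(n, max_utf8_bytes(0x10FFFF))
    print(latin1, bmp_only, full_range)
```

For a CHAR(64) column this works out to 64 bytes with latin1, 192 bytes
with UTF-8 limited to U+FFFF, and 256 bytes if the full Unicode range
were supported.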

-Phil


