[OpenSIPS-Users] UTF8 in MySQL database

Bogdan-Andrei Iancu bogdan at voice-system.ro
Tue Mar 31 10:44:42 CEST 2009


Hi Phil,

Phil Vandry wrote:
> On Wed, Mar 25, 2009 at 09:03:35AM +0100, Jacek Konieczny wrote:
>   
>> that is not a sane way to do things. The problems will start as soon as
>> someone tries to process this data as 'latin1' (according to the
>> declaration on the database), when it is not latin1.
>>     
>
> Agreed. And the database would be perfectly within its rights to reject
> or corrupt any byte in the range 0x80 to 0x9f if the encoding is latin1
> (those bytes are not used in latin1), so you cannot even count on binary
> transparency. (I doubt MySQL actually does this, though.)
>
>   
>> But, back to my original question, as I can understand that 'latin1' is ok
>> for some or even most people. My question was: is there any specific,
>> technical reason, that 'utf8' is forbidden? I don't think OpenSIPS does
>>     
>
> I don't know why you are getting a problem with UTF-8 but there is one
> issue with MySQL and UTF-8 that's worth mentioning (it's not related to
> OpenSIPS). The MySQL docs do draw attention to this point.
>
> If you have a CHAR(n) column (not a VARCHAR column) and your table is
> using a fixed-length record (usually, myisam with no VARCHAR columns),
> the CHAR column must reserve 3*n bytes with UTF-8 but requires only n
> bytes with latin1 or ASCII.
>
> (Actually it should be 4*n, not 3*n, but MySQL's support for UTF-8 is
> crippled and only supports characters up to U+00FFFF, and that means it
> never needs more than 3 bytes to encode one character.)
>   
So, more or less, it is about the table size. What is not clear to me
(from what you say) is why a CHAR(n) column needs only n bytes when
using the latin1 charset. Does that mean latin1 supports only 256
characters? According to the MySQL docs, latin1 supports a lot of
non-standard characters (extended codes).
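
For reference, the byte counts discussed above can be checked with a
short sketch. It uses Python's codecs as a stand-in for the MySQL
charsets, which is an assumption: Python's "latin-1" is ISO 8859-1 and
is not exactly the same thing as MySQL's latin1. The helper name
max_bytes_per_char and the CHAR(32) column are made up for
illustration only.

```python
# Illustration only: Python codecs stand in for MySQL charsets here.
def max_bytes_per_char(encoding, code_points):
    """Return the largest encoded size, in bytes, over the given code points."""
    widest = 0
    for cp in code_points:
        try:
            widest = max(widest, len(chr(cp).encode(encoding)))
        except UnicodeEncodeError:
            # Not representable in this encoding (e.g. surrogates in UTF-8).
            continue
    return widest

# A single-byte charset has at most 256 characters, one byte each.
latin1_width = max_bytes_per_char("latin-1", range(0x100))
# UTF-8 restricted to U+0000..U+FFFF (the range Phil describes).
utf8_bmp_width = max_bytes_per_char("utf-8", range(0x10000))
# UTF-8 over the whole Unicode range.
utf8_full_width = max_bytes_per_char("utf-8", range(0x110000))

n = 32  # hypothetical CHAR(32) column in a fixed-length record
print(latin1_width * n, utf8_bmp_width * n, utf8_full_width * n)
```

The three widths correspond to the n, 3*n and 4*n figures in Phil's
message: one byte per character can only ever distinguish 256
characters, whatever those characters are.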

Regards,
Bogdan



