The Ultimate Guide to Handling UTF-8 with PHP & MySQL

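As developers, we sometimes run into encoding problems. "Encoding" here does not mean coding, but character encoding.
Many encodings have appeared over the years, and the problem we see most often as a result is garbled text (mojibake). It is not actually a thorny problem: pick one standard encoding (for example UTF-8), set the database to UTF-8, set PHP's default charset to it, declare it in the <head> of the front-end HTML, and garbled text will generally not appear.

Below is a brief introduction to UTF-8. After all, it is the favored child of the encoding world and currently the most popular encoding: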

Unicode is a widely-used computing industry standard that defines a comprehensive mapping of unique numeric code values to the characters in most of today’s written character sets to aid with system interoperability and data interchange.

UTF-8 is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. UTF-8 has become the dominant character encoding for the World Wide Web, accounting for more than half of all Web pages.

UTF-8 encodes each character using one to four bytes. The first 128 characters of Unicode correspond one-to-one with ASCII, making valid ASCII text also valid UTF-8-encoded text. It is for this reason that systems that are limited to use of the English character set are insulated from the complexities that can otherwise arise with UTF-8.

For example, the Unicode hexadecimal code for the letter A is U+0041, which in UTF-8 is simply encoded with the single byte 41. In comparison, the Unicode hexadecimal code for the character 𣎴 is U+233B4, which in UTF-8 is encoded with the four bytes F0 A3 8E B4.
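To make the one-to-four-byte rule concrete, here is a minimal sketch that encodes a code point by hand and prints the resulting bytes. The function name utf8_bytes() is made up for this illustration, and it does no validation (surrogates and values above U+10FFFF are not rejected).

```php
<?php
// Encode a single Unicode code point as UTF-8 by hand.
// utf8_bytes() is a hypothetical helper written for this illustration.
function utf8_bytes($cp) {
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx (identical to ASCII)
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo strtoupper(bin2hex(utf8_bytes(0x41))), "\n";     // 41
echo strtoupper(bin2hex(utf8_bytes(0x233B4))), "\n";  // F0A38EB4
```

The two printed values match the byte sequences quoted in the paragraph above.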

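So, if you are a PHP + MySQL developer and want to be sure your encoding is trouble-free, what rules should you, as the commander, lay down so that all the parts work together perfectly?

What does MySQL need to do?
Find your MySQL configuration file (my.cnf on Linux, my.ini on Windows) and edit or add the settings in each [xxx] section so that it looks like this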

[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
character-set-client-handshake = false #force encoding to utf8
character-set-server=utf8
collation-server=utf8_general_ci

[mysqld_safe]
default-character-set=utf8

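Restart the MySQL server, then confirm that the character-set variables report utf8, for example by running SHOW VARIABLES LIKE 'character_set%' in the mysql client.

Note: MySQL's utf8 character set stores at most three bytes per character, so four-byte characters such as U+233B4 above need utf8mb4 (available since MySQL 5.5.3) in place of utf8.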
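One way to automate this check from PHP is sketched below. It assumes the rows returned by SHOW VARIABLES LIKE 'character_set%' have already been fetched into a name => value array; the function name non_utf8_variables() and the sample values are made up for illustration.

```php
<?php
// $rows is assumed to hold the output of SHOW VARIABLES LIKE 'character_set%'
// as Variable_name => Value pairs. non_utf8_variables() is a hypothetical helper.
function non_utf8_variables(array $rows) {
    // Not expected to be utf8: one defaults to "binary", the other is a directory path.
    $ignored = array('character_set_filesystem', 'character_sets_dir');
    $bad = array();
    foreach ($rows as $name => $value) {
        if (in_array($name, $ignored, true)) {
            continue;
        }
        if (strpos($value, 'utf8') !== 0) {   // accepts utf8, utf8mb3 and utf8mb4
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Illustrative sample only, not real server output.
$sample = array(
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_database'   => 'latin1',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'utf8',
    'character_set_server'     => 'utf8',
    'character_set_system'     => 'utf8',
);
print_r(non_utf8_variables($sample));
```

An empty result would mean every relevant variable is on a utf8 variant; here the sample deliberately leaves one database on latin1.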

What does PHP need to do?
Configure php.ini

default_charset = "UTF-8"

Note: it is best to check with phpinfo() that the setting has taken effect.
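If scanning the full phpinfo() page is inconvenient, the setting can also be read directly. The following is a small sketch; charset_is_utf8() is a made-up helper, not a PHP built-in.

```php
<?php
// charset_is_utf8() is a hypothetical helper: it compares charset names
// case-insensitively and ignores the hyphen ("utf-8", "UTF-8", "utf8").
function charset_is_utf8($charset) {
    return strtoupper(str_replace('-', '', $charset)) === 'UTF8';
}

$configured = ini_get('default_charset');
if (!charset_is_utf8($configured)) {
    // Can be changed at runtime, although php.ini remains the better place.
    ini_set('default_charset', 'UTF-8');
}
echo 'configured default_charset: ', $configured, "\n";
```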
In your code
1. Whenever a script produces output, it is best to put this line at the top of the PHP file

header('Content-Type: text/html; charset=utf-8');

2. If you generate XML with PHP, every XML document should start with this line

<?xml version="1.0" encoding="UTF-8"?>

3. One more point when generating XML with PHP: before generating the XML, run the text through a function like the following

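function utf8_for_xml($string) {
    // Replace every run of characters not allowed in XML 1.0 with a single space.
    // The last range keeps supplementary-plane characters (e.g. U+233B4), which XML 1.0 permits.
    return preg_replace(
        '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ', $string);
}
$safeString = utf8_for_xml($yourUnsafeString);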

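Note: the reason is that XML 1.0 does not accept every character that is valid UTF-8 (most control characters, for example); the function replaces anything outside the allowed ranges with a space.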
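Points 2 and 3 can be combined as in the sketch below. The name xml_safe_text() is made up; it uses the same filtering idea, keeps the supplementary range U+10000 to U+10FFFF that XML 1.0 also permits, and then escapes markup characters with htmlspecialchars() and ENT_XML1 (PHP 5.4 or later).

```php
<?php
// xml_safe_text() is a hypothetical helper: filter, then escape.
function xml_safe_text($string) {
    // Replace runs of characters that XML 1.0 does not allow with one space.
    $clean = preg_replace(
        '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $string
    );
    // Escape &, <, > and quotes using XML rules.
    return htmlspecialchars($clean, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$title = "Tom & Jerry\x0B<1940>";   // \x0B (vertical tab) is not allowed in XML 1.0
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<title>' . xml_safe_text($title) . '</title>' . "\n";
echo $xml;
```

Filtering first and escaping second matters: the filter works on raw characters, while the escaping step only has to deal with text that is already legal XML content.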

4. When you use functions such as mb_strlen() and htmlspecialchars(), it is best to pass the encoding argument explicitly

mb_strlen($str, 'UTF-8');

htmlspecialchars($str, ENT_NOQUOTES, "UTF-8");
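The difference the encoding argument makes shows up as soon as a string contains multi-byte characters. A short sketch, assuming the mbstring extension is loaded; the sample string is "A" followed by U+233B4 written as raw UTF-8 bytes.

```php
<?php
// "A" followed by the four-byte character U+233B4, written as raw UTF-8 bytes.
$str = "A\xF0\xA3\x8E\xB4";

echo strlen($str), "\n";                            // 5: strlen() counts bytes
echo mb_strlen($str, 'UTF-8'), "\n";                // 2: mb_strlen() counts characters
echo bin2hex(mb_substr($str, 1, 1, 'UTF-8')), "\n"; // f0a38eb4: the whole character
```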

What does HTML need to do?
1. Add this tag to the <head> of the HTML document

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

2. Add the following attribute to each form

<form accept-charset="utf-8">

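Let's draw a dividing line here. Everything above gives each component its own encoding setting, but when PHP and MySQL talk to each other, how does each side know which encoding the other is using? It is like two people from different countries trying to talk, so we need to have them both speak a common language, say English.

Specifying the encoding when PHP connects to MySQL
1. If you are still using a PHP version earlier than 5.5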

$link = mysql_connect('localhost', 'user', 'password');
mysql_set_charset('utf8', $link);

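2. On PHP 5.5.0 and later, the mysql extension (including mysql_set_charset) is deprecated, and it was removed entirely in PHP 7.0, so set the charset through mysqli instead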

$mysqli = new mysqli("localhost", "my_user", "my_password", "test");
/* check connection */
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}    
/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}

$mysqli->close();
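For code that connects through PDO rather than mysqli, the charset can be given in the DSN (honored by pdo_mysql from PHP 5.3.6). This is only a sketch: mysql_dsn() is a made-up helper, and the host, database and credentials are placeholders.

```php
<?php
// mysql_dsn() is a hypothetical helper; host and database names are placeholders.
function mysql_dsn($host, $dbname, $charset = 'utf8') {
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = mysql_dsn('localhost', 'test');
echo $dsn, "\n";   // mysql:host=localhost;dbname=test;charset=utf8

// With a reachable server and the pdo_mysql extension, the connection would be:
// $pdo = new PDO($dsn, 'my_user', 'my_password');
```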

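Additional notes (continuously updated)
1. Make sure your IDE or editor opens and saves files as UTF-8 without a BOM, especially when pulling in third-party libraries.
Related post: A History of Fighting the UTF-8 BOM Signature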
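A BOM at the start of a PHP file is sent to the browser as three stray bytes before any of your own output, which can break later header() calls. If you need to clean up text that may carry a BOM, something like the following would do; strip_utf8_bom() is a made-up helper name.

```php
<?php
// strip_utf8_bom() is a hypothetical helper: it removes a leading UTF-8 BOM if present.
function strip_utf8_bom($text) {
    $bom = "\xEF\xBB\xBF";                  // UTF-8 encoding of U+FEFF
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

$withBom = "\xEF\xBB\xBFhello";
echo strlen($withBom), ' -> ', strlen(strip_utf8_bom($withBom)), "\n";   // 8 -> 5
```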

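2. If you use the Sphinx search engine, set its encoding as well.
Open sphinx.conf and add the following settings to the index section and the source section respectively

# add to the index section
charset_type = utf-8
# add to the source section
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8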

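Summary
Encoding problems are not really hard to deal with. As long as you settle on one standard and every component follows it, problems like garbled text will basically not occur.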

References
http://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql
Character Encoding Notes: ASCII, Unicode, and UTF-8
