PHP UTF-8 cheatsheet

By | 03 July 2006 | 76 comments


When we started building DropSend, we decided to support all languages worldwide from the start. The interface is currently in English only, but the application can send, store, sort and process your data in whatever language you want. As a result, we have a good number of customers out east.

To support worldwide languages, you need to use UTF-8 encoding for your web pages, emails and application, rather than ISO 8859-1 or another common western encoding, since these don't support characters used in languages such as Japanese and Chinese.

Happily, UTF-8 is transparent to the core Latin characterset (strictly, the 128 ASCII characters), so plain ASCII data won't need converting before you start using UTF-8; accented Latin characters such as 'à' are encoded differently and do need converting. But there are a number of other issues to deal with. In particular, UTF-8 is a multibyte encoding, meaning one character can be represented by one or more bytes. This causes trouble for PHP, because the language parses and processes strings based on bytes, not characters, and makes mincemeat of multibyte strings - for example, by splitting characters 'in half', bodging up regular expressions, and rendering email unreadable.

There are a number of great articles online about UTF-8 and how it works - Joel Spolsky's comes to mind - but very few about how to actually get it working with PHP and iron out all the bugs. So, here to save you the time we put in, is a quick cheatsheet and info about a few common issues.

1. Update your database tables to use UTF-8

-- The DEFAULT keyword is optional; one CHARACTER SET and one COLLATE clause per statement is enough.
CREATE DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

ALTER DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

-- This sets the default for columns added later; existing columns keep their character set.
ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8
COLLATE utf8_general_ci
;
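
Changing a table's default does not re-encode the text columns it already has. Here is a minimal sketch of generating the statement that does, assuming MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET syntax. The helper name build_convert_sql() and the table name 'messages' are made up for illustration, and you would still run the statement through your own database layer.

```php
<?php
/**
 * Builds an ALTER TABLE statement that converts a table's existing text
 * columns (not just its default) to the given character set.
 * Hypothetical helper, for illustration only.
 */
function build_convert_sql($table, $charset = 'utf8', $collation = 'utf8_general_ci')
{
    // Identifiers cannot be bound as query parameters, so whitelist their characters.
    foreach (array($table, $charset, $collation) as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

echo build_convert_sql('messages'), "\n";
```

Take a backup first: if a latin1 column already holds bytes that are really UTF-8, a straight conversion is likely to double-encode them.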

2. Install the mbstring extension for PHP

Windows: download the dll if it's not in your PHP extensions folder, and uncomment the relevant line in your php.ini file: extension=php_mbstring.dll
Linux: yum install php-mbstring

3. Configure mbstring

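Do this in php.ini, httpd.conf or .htaccess. (In httpd.conf or .htaccess, prefix each setting with 'php_value ', or 'php_flag ' for On/Off settings, separate the name and value with a space rather than '=', and leave off the ';' comments.)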

mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header
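
If you can't touch the configuration files, some of this can be set from a script instead. The following is a rough sketch using mb_internal_encoding(), mb_substitute_character() and ini_get(), not a full replacement for the settings above. As far as we know, newer PHP versions (5.6 onwards) prefer default_charset over mbstring.internal_encoding, so treat the exact settings as version-dependent.

```php
<?php
// Set the encoding mbstring assumes when a function is not given one explicitly.
mb_internal_encoding('UTF-8');

// Drop invalid byte sequences instead of replacing them with '?'.
mb_substitute_character('none');

// Report the values in effect so you can confirm the configuration took hold.
$settings = array(
    'mb_internal_encoding' => mb_internal_encoding(),
    'default_charset'      => ini_get('default_charset'),
    'mbstring.language'    => ini_get('mbstring.language'),
);
foreach ($settings as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

Running this at the top of a test page is a quick way to see whether your php.ini or .htaccess changes were picked up at all.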

4. Deal with non-multibyte-safe functions in PHP

The fast-and-loose way to do this is with the following php configuration:

mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with the mbstring alternatives

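But there are problems with this. php.net has a warning about this potentially affecting the whole server. And even if this isn't an issue for you, the overloaded functions can make a mess of binary strings, because lengths and offsets are then counted in characters rather than bytes. (The setting was later deprecated in PHP 7.2 and removed in PHP 8.0.)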

So, a better route is to search your application code for the following functions, and replace them with mbstring's 'slot-in' alternatives:

mail() -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()
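
To see why the swap matters, here is a small sketch comparing the byte-based functions with their mbstring counterparts. It assumes mbstring.func_overload is off and that the script file itself is saved as UTF-8; the sample strings are arbitrary.

```php
<?php
$text = '日本語'; // three characters, nine bytes in UTF-8

echo strlen($text), "\n";             // counts bytes
echo mb_strlen($text, 'UTF-8'), "\n"; // counts characters

// substr() cuts on byte offsets, so it can slice a character in half...
$broken = substr($text, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8'));

// ...whereas mb_substr() cuts on character boundaries.
$first_two = mb_substr($text, 0, 2, 'UTF-8');
var_dump(mb_check_encoding($first_two, 'UTF-8'));

// Case conversion of accented characters also needs the mbstring version.
echo mb_strtoupper('crème brûlée', 'UTF-8'), "\n";
```

Passing the encoding explicitly, as here, means the result doesn't depend on whatever mbstring.internal_encoding happens to be on the server.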

5. Sort out HTML entities

The htmlentities() function doesn't work automatically with multibyte strings. To save time, you'll want to create a wrapper function and use this instead:

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}
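
A quick usage sketch of the wrapper, with a made-up sample string. The function is repeated (behind a function_exists() guard) only so the snippet runs on its own.

```php
<?php
if (!function_exists('html_encode')) {
    // Same wrapper as above, repeated so this snippet is self-contained.
    function html_encode($var)
    {
        return htmlentities($var, ENT_QUOTES, 'UTF-8');
    }
}

// Markup characters and quotes are escaped; the Japanese text passes through.
$comment = "Tom & 'Jerry' <日本語>";
echo '<p>', html_encode($comment), '</p>', "\n";
```

Encode at the point of output rather than before storing, so the database keeps the plain text.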

6. Check content-type headers

Check through your code for any text-based content-type headers, and append the UTF-8 charset, so the browser knows what it's working with:

header('Content-type: text/html; charset=UTF-8');

You should also repeat this at the top of HTML pages:

<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
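
If you send several text-based content types, one option is to funnel them through a single helper so the charset is never forgotten. This is only a sketch; send_utf8_content_type() is a hypothetical name, not something PHP provides.

```php
<?php
/**
 * Sends a text-based Content-Type header with the UTF-8 charset appended
 * and returns the header line. Hypothetical helper, for illustration only.
 */
function send_utf8_content_type($mime = 'text/html')
{
    $line = 'Content-Type: ' . $mime . '; charset=UTF-8';
    // header() must run before any output, so skip it if output has begun.
    if (!headers_sent()) {
        header($line);
    }
    return $line;
}

send_utf8_content_type('text/plain');
```

Only use it for text types (text/html, text/plain, text/xml and so on); binary downloads should not get a charset.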

7. Update email scripts

Email can be tricky. You'll need to update the content-type for any emails and text-based mime parts to use UTF-8 encoding. You'll also need to alter the way in which headers are encoded to use UTF-8. mbstring provides a function mb_encode_mimeheader() to handle this for you, but it does make a mess of address lists: you'll need to encode the name and address parts separately, then compile them into an address list.

Be sure to encode the subject and other headers too - Korean speakers will tend to put Korean text for the subject.
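
The name-and-address approach can be sketched roughly as follows. encode_address() and encode_address_list() are hypothetical helpers, the addresses are placeholders, and the sketch does no address validation or quoting of special characters in ASCII names.

```php
<?php
mb_internal_encoding('UTF-8'); // mb_encode_mimeheader() reads its input in this encoding

/**
 * Encodes one "Name <address>" pair, MIME-encoding only the display name.
 * Hypothetical helper; the address itself is left untouched.
 */
function encode_address($name, $email)
{
    return mb_encode_mimeheader($name, 'UTF-8', 'B') . ' <' . $email . '>';
}

/** Joins several encoded pairs (email => name) into an address list. */
function encode_address_list(array $recipients)
{
    $parts = array();
    foreach ($recipients as $email => $name) {
        $parts[] = encode_address($name, $email);
    }
    return implode(', ', $parts);
}

$to = encode_address_list(array(
    'kim@example.com'  => '김철수',
    'anna@example.com' => 'Anna',
));
$subject = mb_encode_mimeheader('안녕하세요', 'UTF-8', 'B');

echo $to, "\n", $subject, "\n";
```

This suits headers you pass to mail() yourself. If you switch to mb_send_mail(), check whether it already encodes the subject for you before encoding it a second time.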

9. Check binary files and strings

Finally, double check any binary files and strings handled by PHP, particularly uploads, downloads and encryption. In some cases it may be necessary to revert to ASCII just before a download or processing a binary string.
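
One way to keep byte-level operations safe, whatever the mbstring settings are, is to ask for the '8bit' encoding explicitly. A small sketch, using four arbitrary bytes as a stand-in for real binary data:

```php
<?php
// Stand-in for binary data: four raw bytes that are not valid UTF-8 text.
$binary = "\xFF\xD8\xFF\xE0";

// The '8bit' encoding makes mbstring treat every byte as one character,
// so these return a byte count and a byte slice.
$length = mb_strlen($binary, '8bit');
$head   = mb_substr($binary, 0, 2, '8bit');

echo $length, "\n";
echo bin2hex($head), "\n";
```

This is handy for things like Content-Length headers on downloads, where the number has to be in bytes.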

Topics: MySQL PHP UTF8

Comments

IS 09/08/2006 16:14

Anyone can guide me how to convert the UTF8 (三寶山) to chinese character. Someone told me to use mb_string or iconv but I don't know how to used this.

Sal Randolph 11/08/2006 20:51 - Visit »

You are a god. Seriously. I can't thank you enough for this incredibly helpful page. I had naively thought just setting up my database for unicode would be enough, and was dismayed to see a page of chinese text turned into question marks! After working through your checklist, chinese is chinese again! Happiness. (bow)

Kris 14/09/2006 18:31

Yes I agree, that's the way it should be done!

Ricky 12/10/2006 01:37

I originally thought making multi-language websites was merely a copy and paste solution, and then discovered all the fun that is UTF-8.

This is by far the most useful PHP mbstring write up I've come across. Many thanks.

johnszot 17/10/2006 05:36

so stumped. PHP is still spitting out '?????'s from MySQL's utf collated fields. i'm sure it's a rookie mistake - but i've triple-checked the stuf fon this site (which is helpful despite my problem).......trouble shooting tips?

Bakyt Niyazov 28/10/2006 17:22

Thank you! You've really helped me!!!

amagondes 14/11/2006 18:37

johnszot, i'm not sure but check you browsers character encoding. I had the same problem and for some reason my browser was not doing the right thing

Daniel 17/11/2006 13:36 - Visit »

hey guys.. try this: http://people.w3.org/rishida/scripts/uniview/conversion

nicolas 01/12/2006 12:54

Hi! My name is Nicolas and I'm from Argentina! I found your explanation really useful, but still, I have a doubt about emails; I need to send emails in different languages (chinese, english, french, for example) and I'm having problems with special characters (for example "á"). The encoding I'm using, it's UTF-8, that works perfect with chinese, but characters like the one I mentioned before, are not displayed... Do you have any Idea for solving this??? Thanks!!!

PS: Sorry for mi english!

Helen 28/01/2007 04:11

PHP Guru,

Could you please help me how to change the default UTF-8 (charset) to GB2312? Although I set GB2312 in the php file like , the reponse still is UTF-8.

Your help and advice are appreciated.

Helen

loch 12/02/2007 16:59

I did steps above and still get buncha ???

In phpmyadmin, I see the utf-8 char just fine yet when displayed on a web page, i see only ???

the web page header info is set to utf-8

kajetan 14/03/2007 23:41

Great. I'm about to convert my site to UTF-8 and this will save me hours if not days of trial and error. Thanx.

artoodetoo 22/03/2007 10:36 - Visit »

For Russian-reading users may be useful http://punbb.ru/viewtopic.php?id=1222

Misha 25/03/2007 00:52

Amazing guide!!! Thanks!

I have just one question about the upload bit script - I do not understand in which cases we should fix our upload scripts.

Thanks!

johnny 25/03/2007 17:37

Really good guide, but I still have the problem with the '???' characters. Like 'loch' user above, everything displays nice in the phpmyadmin interface, but when i try to run the application I get a lot of '???' instead of greek characters. All I'm trying to do in my test page is query the database (using pear::MDB2) and then show the results like this:

Name Text Author Category $article[article_name] $article[article_text] $article[author_name] $article[category_name] "; } ?>

Any help???

johnny 25/03/2007 17:51

OOPS, mistake above... :) The code is:

<table border=1>
<tr> <th>Name</th> <th>Text</th> <th>Author</th> <th>Category</th> </tr>
<?php
foreach ($articles as $article) {
    echo "<tr> <td>$article[article_name]</td> <td>$article[article_text]</td> <td>$article[author_name]</td> <td>$article[category_name]</td> </tr>";
}
?>
</table>

Shaun 28/03/2007 14:01 - Visit »

I am trying to output my data to a text file that then needs to be read into a separate system, i can get the foreign characters to appear on the html pages with out a problem, using the same code but with fopen, fwrite etc to write the text file, when i open the text file in notpad the characters are mumbo jumbo. I can open this file in MSword and set the encoding to utf8 and save the file, but i would like php to know that the file being saved should be utf8. Any suggestions? I have followed everything here and tried utf8_encode and various other suggestions but to no avail.

Jason Lefkowitz 09/04/2007 21:05 - Visit »

"Try to learn some english man ..."

Am I the only one amused to see this comment attached to a post about properly internationalizing your code?

Isn't it ironic... doncha think...

faye 12/04/2007 12:01

this article is a great help... by the way, i would like to ask how to read files with japanese characters in it and display on the screen.. i am working on this problem but i couldn't seem to find the solution... help would be gladly appreciated...

Chris Bloom 25/04/2007 05:51 - Visit »

Thanks for this! It just saved me days worth of trial and error. For what it's worth, if you're importing Unicode text from Windows (via file upload) and you want to convert it to UTF-8 (Windows Unicode is actually UTF-16), use:

$string = mb_convert_encoding($string, 'UTF-8', 'UTF-16');

See http://us.php.net/manual/en/ref.mbstring.php#50298 for more info.
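
A minimal sketch of that conversion which also copes with a byte order mark at the start of the upload. utf16_to_utf8() is a hypothetical helper, and falling back to little-endian when there is no BOM is an assumption that suits Windows-made files.

```php
<?php
/**
 * Converts UTF-16 text to UTF-8, using the byte order mark (if any) to
 * pick the byte order. Hypothetical helper, for illustration only.
 */
function utf16_to_utf8($string)
{
    $bom = substr($string, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it, then convert.
        return mb_convert_encoding(substr($string, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: strip it, then convert.
        return mb_convert_encoding(substr($string, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: assume little-endian (an assumption, see above).
    return mb_convert_encoding($string, 'UTF-8', 'UTF-16LE');
}

$upload = "\xFF\xFE" . "h\x00i\x00"; // "hi" as UTF-16LE with a BOM
echo utf16_to_utf8($upload), "\n";
```

The substr() calls here work on bytes, so the sketch assumes mbstring.func_overload is off.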

Dan 26/04/2007 00:27 - Visit »

Great article, but when I add the configuration settings in my .htaccess file, I get an error 500. Any idea why?

BB 26/04/2007 08:52

i'm using mb_convert_encoding to convert uft8 to big5, but every conversion will have "?" at the beginning of the string. For example: ?情在人間. May i know why??

DD 22/05/2007 21:57 - Visit »

You can try to trim the string before using mb_convert_encoding, then "?" will be gone.

Nick Nettleton 30/05/2007 22:28

A couple of tips if you're still seeing a lot of '???':

1. View source - if the characters appear correctly in the source code, then you're probably not html encoding correctly, as in Johnny's case above. You should always use the html_encode() function above to encode plain text content as you output it to a web page.

2. Use the View > Character Encoding menu in your web browser to see if it understands that you are working in UTF-8. If not, review your HTTP headers and meta tags as above.

3. If things are still wrong, your characters are getting mashed somewhere in transit - run through your code starting at the point of communication with the database, printing out key variables at each point, to see if you can find the source of the error.
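
For the third tip, printing the raw bytes is usually more telling than printing the string itself. A rough sketch of a helper you could call at each point; describe_string() is a made-up name.

```php
<?php
/**
 * Describes a string so you can see where it stops being valid UTF-8.
 * Hypothetical helper, for illustration only.
 */
function describe_string($label, $value)
{
    return sprintf(
        '%s: valid_utf8=%s bytes=%d hex=%s',
        $label,
        mb_check_encoding($value, 'UTF-8') ? 'yes' : 'no',
        mb_strlen($value, '8bit'),
        bin2hex($value)
    );
}

echo describe_string('utf8', "\xC3\xA0"), "\n"; // 'à' encoded as UTF-8
echo describe_string('latin1', "\xE0"), "\n";   // 'à' encoded as ISO 8859-1
echo describe_string('mashed', '?'), "\n";      // already replaced upstream
```

If the hex shows a literal 3f where a character should be, the damage happened before that point, often on the database connection.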

Petronel 08/06/2007 11:53 - Visit »

I am so happy that what I've did in the past month alone gaved me the same results readed now in this article ;)

Callum 08/06/2007 17:55

Thank you so much for this. I've been looking for practical, easy-to-understand advice on the different things to bare in mind when using UTF-8 in PHP projects for ages.

One minor thing, shouldn't the name of the attribute in the meta tag be "content", not "value"? It probably works either way, but I thought I'd mention it in case it didn't.

Moeh Bass 11/06/2007 02:37

PHP Bug #34776 mb_convert_encoding() - wrong convertion from UTF-16 (problem with BOM) http://bugs.php.net/bug.php?id=34776

Do you have any idea about fixes for this bug? Was it fixed? Does it matter alot?

Callum 18/06/2007 13:43

For part 3 (Configure mbstring), I tried doing this in PHP using ini_set(), instead of doing it in .htaccess/php.ini/httpd.conf. (I realise those methods are probably quicker but there are various reasons why its easier for me to set the options in PHP in a few of my projects.) They all worked fine apart from "mbstring.encoding_translation". I set that to "On", but it didn't work; when I called ini_get() just after the ini_set(), the value was still "0".

Do you have any idea why this might be? And more importantly, because I have a few situations when I cannot use .htaccess etc, could you explain to me the importance of the encoding_translation setting? I mean, can I get by without it; is there a workaround I could use in my code to manually translate HTTP input? Perhaps a bit of code I could put at the top of my script that would just convert everything? (And what is it exactly that needs translating - file uploads? form submissions?)

Any advice much appreciated.

Claudia 21/06/2007 09:06

Just to add to all the other tips: If the database is utf8 and your website is utf8 and you still see a lot of question marks/hollow squares you might need to change the connection encoding for MySQL. Often this is still set to latin. See here: http://www.mysql.org/doc/refman/4.1/en/charset-connection.html

Leander 30/06/2007 20:30

Great page! Thanks! It has helped me, but I'm still missing something. When I insert, for example, greek text in the database. How should I do this? Do I need to use html_encode? Or maybe utf8_encode/decode?

Because it works for greek characters, but not all of them. Κι ότι άτομα αλλάζοντας πιθανότητες is displayed as: Κι ?τι ?τομα αλλ?ζοντας πιθαν?τητες is it because characters like: όά don't exist in htmlentities?

I'm probably doing something wrong, because thai, chinese, japanese don't show up at all.

Hope you can help me! Thanks!

Yacahuma 03/07/2007 15:21

I was trying to read an xml service that was generating spanish characters and was getting invalid characters error by the simplexml_load_file

I created this to fix the problem

file: proxy.php
$xml = implode('', file('http://address/xml_service.php'));
header("Content-Type: text/html;charset=ISO-8859-1");
print "\n";
echo utf8_encode($xml);

file: reader.php
...
$uri='http://localhost/proxy.php';
$s = simplexml_load_file($uri);
...

This was faster than using simplexml_load_string

Leander 04/07/2007 21:35

Just wanted to thank you! Everything works fine and there are no problems with any language so far. Japanese, Thai, Russian, Turkish, Chinese, Greek and Arabian all work, without showing question marks!

Thank you so much!

fkhan 12/07/2007 00:56

Great info! I would add one thing. To prevent ???? (ISO-8859-1) characters from being returned by MySQL I had to perform this query after the initial database connection/selection: mysql_query("SET NAMES 'utf8'")

Lee McLaughlin 30/07/2007 04:29 - Visit »

Great tutorial thanks and also that was a little gem of info left by **fkhan** sorted me right out!

gia 23/08/2007 21:46

Yeah I had everything set and this is what I had missing: PHP created latin1 connections by default. To fix that just call after connection:

mysql_query("SET NAMES 'utf8'")

Ivan Chu 11/09/2007 19:38

The good page. Пасиб ))

yair 10/11/2007 23:19

gia, your last post saved meeeeeee! thanks!!!

Dave Gregory 18/11/2007 16:16

Really really useful, thanks! Just thought I'd share something that has really messed me up on this...

My web host, ninja legend that he is, installed suhosin (www.hardened-php.net/suhosin/) to save us from the big bad monsters. Unfortunately, it didn't want to play nicely with the mbstring.encoding_translation php_flag recommended here.

Suhosin was throwing errors like this into my logs: [error] ALERT - COOKIE variable name begins with disallowed whitespace - dropped variable ' PHPSESSID' (attacker '', file '.php') It was also dropping sessions (logins, etc) left right and centre. Annoying since I was relying heavily on sessions for functionality.

I only discovered this was related to mbstring by mistake when I started a new project (different path, different .htaccess file!) and came round to applying these fixes all over again. Suddenly everything broke. It was about 5am so I sobbed a bit and went to bed... and as I slept, the Lord sent unto me a vision, saying "It worked, then you did all that mbstring stuff to fix your special characters, and now it doesn't. Oh, and 'disallowed whitespace' implies a character encoding problem. Join the dots, my son." Long story short, I woke up and took out all the settings one by one until I found the culprit: mbstring.encoding_translation.

I don't know for sure why Nick's suggested this one; I suspect it's to do with surviving random client browser settings, but luckily for me I'm not really doing i18n, just special-character-dodging. Anyway, hopefully this will save someone a bit of pain.

viral 02/12/2007 18:13 - Visit »

Many thanks for this superb info.

The only line missing was ... mysql_query("SET NAMES 'utf8'")

just after the mysql connection.

All problem solved at one shot !!!

Thanks buddy, fkhan

Ben 13/12/2007 10:31 - Visit »

A most excellent walk-through. Had an intranet solution sorted for multi-language within half a day!

A big thanks to **fkhan** above as well for pointing out that you also need to add mysql_query("SET NAMES 'utf8'"); To your PHP code after you connect and select the DB for the first time in a script! Worked like a treat! Thanks

Ben 14/12/2007 11:20 - Visit »

Oh and also... pay attention to step 8... that's the most important ;)

Stefan 06/03/2008 12:25

Followed all instructions, pointed everything possible to utf-8, but still ?????? when updating the database through a form.

Appeared in the end in my case that

mysql_query("SET NAMES 'utf8'");

was not enough. It had to be:

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");

Glad that this nightmare has been solved!

Sandbergen 05/01/2008 11:57

You might not always have the permissions to set the correct settings in the database as your website may be running on a shared webserver with many different users. This could imply that after you have set everything as described, you still aren't getting the proper results. This little piece of code might do the trick:

first connect to the database with:
$conn = mysql_connect($host, $user, $pass);
mysql_select_db($db_name);

then execute those 2 queries, only the second really matters though:
mysql_query("SET CHARACTER SET utf8");
mysql_query("SET NAMES utf8");

Tom 29/02/2008 14:08 - Visit »

I'm also having troubles with UTF8 vs Latin1 with passing € variables through PHP.

thx for the info I'll give it a try

SlyBaby 14/02/2008 14:42

For: Callum - 18 June 2007 13:43

mbstring.encoding_translation and mbstring.language are PHP_INI_PERDIR settings, which means they cannot be set in scripts. All other settings (from those discussed) are PHP_INI_ALL, which means they can be set everywhere, including scripts.

default values:
- mbstring.encoding_translation = "0"
- mbstring.language = "neutral"

That's why encoding didn't work, and language seemed to work. So those two must be set in an .htaccess file, which it's pretty safe to assume you have access to.

source: http://www.php.net/manual/en/ini.php#ini.list

to the author: great article, very helpfull, keep'em coming :).

Sean Kealn 15/02/2008 18:23 - Visit »

Hello, i need urgent help!

I was trying to send emails with RUSSIAN and GREEK subjects, however i couldnt. Could any one help me regarding for it ?

Thanks!

Hendricus 24/03/2008 12:04

php_value mbstring.func_overload 7

instead of

php_value func_overload 7

Code blindness :)

Hendricus 24/03/2008 11:19

Hmmz I found out that;

echo ini_get("mbstring.func_overload"); // returns 0 eventhough set to 7 in htaccess

and that;

ini_set("mbstring.func_overload", 7); echo ini_get("mbstring.func_overload"); // returns 0 eventhough set to 7 in htaccess AND in PHP

Any thoughts on this then?

Hendricus 24/03/2008 11:08

Thanx for this article... been having troubles with UTF-8 encoding quite a bit! Since I read this article I've been playing around with it and testing stuff and found out something weird. I did the settings thru .htaccess;

### Set default language to Neutral(UTF-8) (default)
php_value mbstring.language "Neutral"
### Set default internal encoding to UTF-8
php_value mbstring.internal_encoding "UTF-8"
### HTTP input encoding translation is enabled
php_value mbstring.encoding_translation "On"
### Set HTTP input character set detection to auto
php_value mbstring.http_input "auto"
### Set HTTP output encoding to UTF-8
php_value mbstring.http_output "UTF-8"
### Set default character encoding detection order to auto
php_value mbstring.detect_order "auto"
### Do not print invalid characters
php_value mbstring.substitute_character "none"
### Default character set for auto content type header
php_value default_charset "UTF-8"
### Use multibyte functions by default, so strtoupper automatically becomes mb_strtoupper
php_value func_overload "7"

I load an external file like this;

$data = file_get_contents("flatfile.html"); //ISO 8859-1 contents

now if I; echo strtoupper(utf8_encode(nl2br($data))); All characters get uppercased, EXCEPT for accented chars like é è ä etc etc.

but if I; echo mb_strtoupper(utf8_encode(nl2br($data)), "utf-8"); It uppercases all chars, even the accented ones...

But I thought the .htaccess settings; php_value func_overload "7" default_charset "UTF-8" would automatically make php use mb_ functions with utf-8??

Any thoughts on this?

Michael Robinson 12/06/2008 06:08 - Visit »

Thank you so much for this!

I blundered through setting up a database for my Chinese Idiom Database, and ran into a load of problems!

I ended up changing all fields that could contain Chinese to "binary", because this was the only format that consistently displayed Chinese, instead of nonsense.

I really like the tip "Check the browser knows which encoding the file is" made by Nick Nettleton - I was puzzling over why a file that was output by a php script was displaying strange characters! I know I could put the meta tag at the top of the file, but this file needs to have only links, as it forms the input for a flash animation.

Anyway, I'll surely be bookmarking this,

Thanks!

Davide Romanini 27/06/2008 13:01

"UTF-8 is transparent to the core Latin characterset" is not true: only the first 128 codepoints (0-127) are the same. For example the character 'à' is a valid Latin character (0xE0) but has a totally different representation in UTF-8 (0xC3 0xA0). That means, if you have Latin characters >127 in your php scripts, you should also recode the script itself in UTF-8, otherwise you'll have problems (ex: a simple print 'à' will throw garbage in the page).
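
The byte values quoted here are easy to check for yourself. A small sketch using mb_convert_encoding(), with the Latin-1 byte written as an escape so the snippet doesn't depend on how the file is saved:

```php
<?php
$ascii  = 'abc';  // code points below 128: same bytes in both encodings
$latin1 = "\xE0"; // 'à' as a single ISO 8859-1 byte

$ascii_utf8  = mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1');
$latin1_utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($ascii_utf8 === $ascii); // ASCII bytes are untouched
echo bin2hex($latin1_utf8), "\n"; // the single byte becomes two bytes
```

The same call is one way to recode Latin-1 data you already have, provided you are sure it really is Latin-1 and not UTF-8 already.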

Kay 29/06/2008 00:54

Hi, your advice has been very helpful. Thank you for sharing your knowledge!

I'm creating a website with PHP and MySQL which is to support English and Chinese.

I'm wondering if you could help on a problem I have:

If I select something from the db to be displayed on a webpage, the Chinese shows up fine.

If I insert something into the db via an html form, the Chinese shows as gibberish when I view it with phpMyAdmin.

If I insert something into the db via phpMyAdmin, and then view that record in phpMyAdmin, it shows up as the Chinese character (non-gibberish).

My question is, should the Chinese show up as the actual character in the db or is it alright that it's showing as gibberish when I view it via phpMyAdmin?

Thanks for your time, Kay

MD Nur Hossain 29/06/2008 19:51

cool article for dynamic multi-language support, but got problem when I used

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");

except above query it works fine with IE and FF (safari doesn't work). any body describe plz, why the above query makes problem? thanks in advanced

umpirsky 01/07/2008 07:54 - Visit »

You forgot parse_str -> mb_parse_str.

Veeru 20/08/2008 10:51

Hi there, I have been experimenting with Unicode and php, but i got no where, this post looks promising, but being a beginner with unicode, can somebody provide simple "hello world" kind of examples with unicode?

I just need to know how to show chinese or thai or korean, text on my web page. How can i do it? Is it possible to store language strings in a text file and show them on a html page?

I am looking at developing a multi-lingual website; what is the easiest way to switch between output languages?. Any help is very much appreciated.

Thanks

Itzco 21/08/2008 10:03 - Visit »

Good guide, I work in Thailand and Vietnam so we get a lot of weird issues with encoding so I just want to add the ones that comes to my mind:

1) As someone already noted: When you connect to the database you need to set collation and names

mysql_query("SET NAMES 'utf8'");
mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");

Still sometimes we get garbage on generated files or even on the browser;

Some of this recommendations are also useful in case you are sending images or any other kind of file that is not just HTML where spaces are ignored.

2) Verify that all your PHP files contain NO spaces before or after the PHP open and close tags (<?php ... ?>)

3) If you are using templates verify that all files (this can apply also to php files) are saved in UTF without BOM format, this can be set in many editors like notepad++ (BOM is evil be careful!)

4) In case you cannot be sure all spaces are gone you can use output buffering to be sure nothing is sent before your content: ob_start(); then you can delete everything in the buffer before sending your content: ob_clean();

** All previous ones might look unrelated but believe me, they are not, when you are working with UTF8, save all your files correctly using UTF8 without BOM

5) Don't forget to change the encoding for your ajax XML

6) And lastly an error on redirections that I think most people will never see

Content Encoding Error (content_encoding_error)

“Server response could not be decoded using encoding type returned by server.

This is typically caused by a Web Site presenting a content encoding header of one type, and then encoding the data differently.”

This problem occurs when compression is activated and the content has different encodings, normally occurs when u try to redirect to another page.

Before your redirection call:

@ini_set('zlib.output_compression', 'Off');
header("Location: whereveryouwanttogo");

Ok, hope this can help anyone

Thomas Steven 01/09/2008 11:47 - Visit »

I was using the MDB2 library to connect to my MySQL database, and I had to do the following to make the connection work correctly :

$this->dbh =& MDB2::connect($dsn,$options);
$this->dbh->setCharset('utf8'); // set the connection charset to utf8

I guess other libraries may have the same issues. Without this I was seeing question marks occasionally.

david 16/09/2008 12:29 - Visit »

For me, adding this right after connecting to my database did the trick!

mysql_query("SET NAMES 'utf8'");

(Of course, I set the encoding as instructed in #1 above)

Thanks for the help!

Zaenal 11/11/2008 17:58 - Visit »

Using PDO: Changing the connection to UTF can be configured when creating new PDO instance.

$dsn = "mysql:host=anyhost;dbname=anydbname";
$user = "dbuser";
$pass = "dbpassword";
$options = array(PDO::MYSQL_ATTR_INIT_COMMAND => " SET NAMES utf8, time_zone = '+07:00' ");

try {
    $dbh = new PDO($dsn, $user, $pass, $options);
} catch (PDOException $e) {
    echo 'Connection failed: ' . $e->getMessage();
}

Adding [SET] time_zone to your local timezone is best practice.

Hope this help, @zaenal

Alex 08/12/2008 16:41

Hi guys... If you still have some problems by using UTF8 and getting '???????' in browser or in your databases try to update your ODBC driver to version 5.1 http://dev.mysql.com/downloads/connector/odbc/ IT HAS HELLPED ME...

Ferdy 20/12/2008 10:45 - Visit »

Some of you may be interested in my article "Building Unicode LAMP applications":

http://ferdychristant.com/blog/articles/DOMM-7LDBXK

It is kind of similar to this article really, but maybe there's something new in it for you :)

cq 20/06/2009 13:03

You should also check mb_internal_encoding

Ramon Fincken 27/07/2009 19:40 - Visit »

Excellent post, I used it yesterday for some RSS/XML > mysql parsing!

Linked @ http://www.ramonfincken.com/permalink/topic165.html

Totti 11/11/2009 11:29 - Visit »

helpful post.. thanks bro

Duncan 23/11/2009 16:36 - Visit »

Great write up. I have just worked out that I needed

mysql_query('SET NAMES utf8');

just after the connect on the PHP side of things to get things working,

It's been said before, but it's crucial :)

Peace one and all, and thanks.

TMG 02/02/2010 04:43

Make sure you set your client side encoding if you're pulling from a mysql DB:

mysqli_set_charset($dbc, 'utf8'); // for utf8 on the client side

Mark 07/02/2010 02:32

Excellent and much-needed cheatsheet. Thanks!

You mention mbstring alternatives for ereg*. I'm assuming preg* are non-issues since they support the /u flag. Is that accurate?
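
A quick sketch of what the /u modifier changes, using an arbitrary sample word and assuming the script is saved as UTF-8:

```php
<?php
$word = 'été'; // three characters, five bytes in UTF-8

// Without /u the subject is matched byte by byte, so "." is one byte.
var_dump(preg_match('/^.{3}$/', $word));

// With /u the pattern and subject are treated as UTF-8.
var_dump(preg_match('/^.{3}$/u', $word));

// Splitting into characters with /u keeps multibyte characters intact.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";
```

One thing to watch: with /u, the preg functions return false rather than a match count when the subject is not valid UTF-8, so it is worth checking for that case.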

Matthias K. 08/02/2010 16:35

Short and precise - Thank you very much!

Nevertheless, I hope PHP6 will be out soon, thus easing the UTF-8 pain a little...

Jide Otuyelu 17/07/2010 06:04 - Visit »

These are all great practical secrets to properly display utf8 encoded characters. Thanks for contributing!

Oliver 20/07/2010 10:52

Thank you for this short and sweet article.

You recommend -with good reasons- not to use mbstring.func_overload and rather replace the string functions in the application code. But what if I'm using many 3rd party PHP libraries like PEAR packages, Smarty, Zend framework, etc.? I know my own code very well, I also know the very most of all other contributors' code, but I can't go through all the 3rd party libraries and check if their respective authors were thinking of binary or textual strings at each function call. I'm afraid that most of the 3rd party authors haven't ever published whether or not their code is compatible with the overloading particularly (some might even never have thought of it).

So - would you still, even in this situation, generally abandon the option of overloading? Thank you.

Charly 13/08/2010 07:21 - Visit »

Thanks, i spent2 weeks dealing with this problem after a server migration!!!!

Jeroen 28/08/2010 13:40

The HTML meta element has no value attribute, but it requires a content attribute. So the HTML content-type header should be:

<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />

Fat Billy 19/10/2010 09:20

Thanks for the article, very helpful.

In Propel I also had to add;

$con = Propel::getConnection(); $con->query("set names utf8");

crstntnc 28/11/2010 21:38 - Visit »

I'm building a site that uses Romanian as the default language. In this language, characters like ș and ț are quite frequent and PHP's standard string functions are having a hard time dealing with them. Even though I had already activated the mbstring extension, I was still having issues. Step #3 in your cheatsheet saved my day!

Thank you!

HjalteTan 06/09/2011 12:11 - Visit »

I just canged my user table to use utf8 sicns we're a danish group of people and som of us have æ ø or å in our name. But now my login dont work, any idears? Is there somthing i need to change to get it to work again? Befor i canged it, it was set to latin1_swedish_ci.

Please reply. Thanks.

Regards Hjalte Tan. www.ireporternews.org - Administrator

NickNettleton 18/10/2011 10:37 - Visit »

@HjalteTan - If your latin1 fields already included some characters such as æ ø or å, these may not have converted correctly. These days, when we need to convert a table, we convert it first to binary, then to UTF8, which solves the problem.

Nick
