Unicode
Unlike English, many languages are written with characters outside the ASCII character set (e.g., Japanese Kanji), so they cannot be represented in ASCII at all. By expanding each character from one byte to two (as the UTF-16 encoding does), most of these characters can be represented. Yes, that means if there are 4 million characters in a newspaper, it would take 8 megabytes to represent it in Unicode instead of 4 megabytes in ASCII. However, with today's gigabyte computers, that's not a problem.
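To make the byte arithmetic concrete, here is a minimal sketch in Python. Python and the helper name `encoded_size` are my own choices for illustration and are hypothetical. It assumes UTF-16 as the two-byte encoding. Note that characters outside the Basic Multilingual Plane, such as many emoji, need four bytes in UTF-16, so "two bytes per character" is the common case rather than a guarantee.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in the given encoding (hypothetical helper)."""
    return len(text.encode(encoding))

english = "newspaper"
kanji = "漢字"  # the word "kanji" written in kanji

# ASCII stores one byte per character.
print(encoded_size(english, "ascii"))      # 9
# UTF-16 (little-endian, no byte-order mark) stores two bytes per BMP character.
print(encoded_size(english, "utf-16-le"))  # 18
print(encoded_size(kanji, "utf-16-le"))    # 4

# Kanji simply have no ASCII code.
try:
    kanji.encode("ascii")
except UnicodeEncodeError:
    print("kanji cannot be encoded as ASCII")

# Scaling up: 4 million characters at 2 bytes each vs 1 byte each.
chars_in_paper = 4_000_000
print(chars_in_paper * 2 / 1_000_000, "MB in UTF-16")  # 8.0 MB in UTF-16
print(chars_in_paper * 1 / 1_000_000, "MB in ASCII")   # 4.0 MB in ASCII
```

I used "utf-16-le" instead of plain "utf-16" so Python doesn't prepend a 2-byte byte-order mark, which would throw off the per-character count.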
UTF-8 is an ASCII-preserving encoding defined in the Unicode standard. Simply stated, it is a variable-length encoding built from 8-bit units: plain ASCII characters keep their original one-byte codes, while all other characters take two to four bytes. If you want more details, just Google it.
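A quick way to see what "ASCII-preserving" means is to encode a few characters and count the bytes. This Python sketch uses a hypothetical helper, `utf8_byte_counts`, purely for illustration. It shows that ASCII text produces identical bytes in both encodings, while other characters grow to two, three, or four bytes.

```python
def utf8_byte_counts(chars):
    """Return how many bytes each character needs in UTF-8 (hypothetical helper)."""
    return [len(ch.encode("utf-8")) for ch in chars]

# ASCII text encodes to exactly the same bytes in UTF-8 and in ASCII.
sample = "Hello, world"
print(sample.encode("utf-8") == sample.encode("ascii"))  # True

# Characters beyond ASCII take two to four bytes.
samples = ["A", "\u00e9", "\u6f22", "\U0001F600"]  # A, é, 漢, 😀
print(utf8_byte_counts(samples))  # [1, 2, 3, 4]

# Every byte of a multi-byte sequence has its high bit set (>= 0x80),
# so it can never be mistaken for an ASCII character.
print(all(b >= 0x80 for b in "\u6f22".encode("utf-8")))  # True
```

That last property is why older code that only looks for ASCII bytes (like a newline or a slash) usually keeps working on UTF-8 text.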
If you're going to write web apps, or any apps that might be read in countries that don't use ASCII as their native character set, I'd use Unicode. If you're writing code for embedded systems where you're lucky to have 32K of memory, you've got to economize on every byte. The circumstances more or less dictate which one you will use.
__________________
Jack Purdum, Ph.D.
Author: Beginning C# 3.0: Introduction to Object Oriented Programming (and 14 other programming texts)