chapter 5 : unicode versus ascii

ew6014 · February 18th, 2010, 11:21 PM

hi ...

I've got a question.

page 103
"Unicode uses two bytes per character and permits over 65,000 individual characters."

What does that mean? If I were to display an entire newspaper with multiple pages, say in Japanese, would each character take up 2 bytes? Won't the application just crash if there were, say, 4-5 million characters?

Also, what is UTF-8 in relation to Unicode? Is one better than the other?

DrPurdum · February 19th, 2010, 01:08 AM

Unlike English, many languages are written with characters (e.g., Japanese kanji) that cannot be represented in the ASCII character set. By expanding each character from one byte to two, these language-specific characters can be represented. Yes, if there are 4 million characters in a newspaper, it would take about 8 megabytes to represent it in two-byte Unicode. However, with today's computers having gigabytes of memory, that's not a problem.
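To make the arithmetic concrete, here is a rough sketch in Python (not the book's language, just a quick way to check the numbers). The sample string and the helper name utf16_size_bytes are made up for illustration, and the estimate assumes every character fits in two bytes, which holds for common kanji and kana but not for every Unicode character.

```python
# Hypothetical sample text: six Japanese characters.
sample = "日本語の新聞"

# Encode as two-byte-per-character Unicode (UTF-16, little-endian, no byte-order mark).
utf16_bytes = sample.encode("utf-16-le")
print(len(sample), "characters ->", len(utf16_bytes), "bytes")


def utf16_size_bytes(char_count, bytes_per_char=2):
    """Estimate storage, assuming every character needs exactly two bytes."""
    return char_count * bytes_per_char


# The newspaper from the question: 4 million characters.
newspaper_bytes = utf16_size_bytes(4_000_000)
print(newspaper_bytes, "bytes, i.e. about", newspaper_bytes // 1_000_000, "megabytes")
```

Eight megabytes is a small fraction of the memory on a machine with gigabytes of RAM, so size alone would not crash the application.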

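UTF-8 is an ASCII-preserving encoding method that is defined in the Unicode standard. Simply stated, it stores Unicode text as a sequence of 8-bit units: ASCII characters keep their usual one-byte values, and every other character takes two to four bytes. So it isn't a rival to Unicode, just one way of encoding it. If you want more details, just Google it.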
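A short Python sketch of what "ASCII-preserving" means in practice. The variable names and sample strings are hypothetical, and the byte counts in the comments apply to these particular characters only.

```python
# Plain English text: the UTF-8 bytes are identical to the ASCII bytes.
ascii_text = "newspaper"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print("UTF-8 matches ASCII for English text:", same_as_ascii)

# Two kanji: each needs three bytes in UTF-8 but two bytes in UTF-16.
kanji = "新聞"
utf8_len = len(kanji.encode("utf-8"))
utf16_len = len(kanji.encode("utf-16-le"))
print("UTF-8 bytes:", utf8_len, "| UTF-16 bytes:", utf16_len)

# Decoding either form gives back the same text.
round_trip_ok = kanji.encode("utf-8").decode("utf-8") == kanji
print("Round trip preserved the text:", round_trip_ok)
```

The trade-off this hints at: UTF-8 tends to be smaller for mostly-English text, while a two-byte encoding tends to be smaller for mostly-Japanese text.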

If you're going to write web apps or apps that might be used in countries where ASCII can't represent the native language, I'd use Unicode. If you're writing code for embedded systems where you're lucky to have 32K of memory, you've got to economize on each byte. The circumstances more or less dictate which you will use.