Wrox Programmer Forums

#1
March 12th, 2007, 10:29 AM
character set problem

We are getting ¿ stored in our Oracle 10g database, which uses the WE8ISO8859P1 character set.

The problem is caused by the following:

Microsoft released software (in particular MS Word) before considering any ANSI or ISO standard (although they claimed otherwise).
At that time, when graphical interfaces were being pioneered, they were the standard. Things have changed since then. Microsoft initially targeted the US market, but very soon wanted to expand to Europe. For that they needed a standardized character set in place of the one initially in use. Microsoft remapped the character set in newer applications to Windows-1252, which is largely compatible with ISO-8859-1 (the character set we use in our Java web applications). That cleared the way into the European market, where extended characters are necessary (as in French, Dutch, German...).

What happened to the character codes that predate Microsoft's agreement with ISO to standardize characters? Well, nothing.

So what are the consequences of that?

If we use a Microsoft Word document with one of the oldest character sets (universe), the ice-age character mapping is still there. So when we cut and paste the content into another application, the characters are no longer mapped correctly. French is especially sensitive to this: the character that Word encodes as decimal 146 ( ' ) is used very often in French.

Therefore, if the text is generated in MS Word using the old character mapping (universe), which is what we do, and is then cut and pasted, its character set no longer matches the one assumed by other, newer applications.

Initially in MS Word the apostrophe ( ' ) had code 191; later, after Windows-1252 was implemented, it was moved to code 146 in accordance with ISO. ISO treats the character coded 191 as ¿. So if you are using the MS Word universe character set, ' looks like ' there, but in newer or ISO-compatible applications it looks like ¿.
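
For anyone who wants to check which code maps to which character, here is a minimal Java sketch that decodes single bytes under windows-1252 and ISO-8859-1 and prints the resulting Unicode code points. The class name `QuoteByteDemo` and the helper `decode` are made up for illustration; only the charset names are standard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class QuoteByteDemo {

    // Decodes one byte under the given charset and returns its Unicode code point.
    static int decode(int byteValue, Charset charset) {
        String s = new String(new byte[] { (byte) byteValue }, charset);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        System.out.printf("146 as windows-1252: U+%04X%n", decode(146, cp1252));
        System.out.printf("146 as ISO-8859-1:   U+%04X%n", decode(146, latin1));
        System.out.printf("191 as ISO-8859-1:   U+%04X%n", decode(191, latin1));
    }
}
```

On a standard JDK this reports byte 146 as U+2019 (right single quotation mark) under windows-1252 and as U+0092 (a C1 control character) under ISO-8859-1, while byte 191 is U+00BF (¿) under ISO-8859-1. That would be consistent with the ¿ being a substitute written during conversion for a character the target character set cannot hold, though that is an assumption about what happens on the database side.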

So we scratched our heads and started thinking about what the right solution would be for us...

Our plan is to write a patch in Java that corrects the character mapping and also eliminates control characters, which would otherwise mess up the look and feel of the displayed content. Does anyone know of a Java class that already does this?
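
A rough sketch of what such a patch might look like, assuming the text has already been decoded into a Java `String` and that plain ASCII stand-ins are acceptable for Word's typographic punctuation. The class name `Cp1252Sanitizer`, its methods, and the replacement table are all hypothetical, not an existing library class.

```java
// Hypothetical helper; the replacement table is an assumption, adjust to taste.
class Cp1252Sanitizer {

    // Returns an ASCII stand-in for common Word punctuation, or null if none applies.
    static String replacementFor(char c) {
        switch (c) {
            case '\u2018': // left single quotation mark
            case '\u2019': // right single quotation mark (byte 146 in Windows-1252)
                return "'";
            case '\u201C': // left double quotation mark
            case '\u201D': // right double quotation mark
                return "\"";
            case '\u2013': // en dash
            case '\u2014': // em dash
                return "-";
            case '\u2026': // horizontal ellipsis
                return "...";
            default:
                return null;
        }
    }

    // Replaces typographic punctuation and drops control characters,
    // keeping tab, line feed and carriage return.
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = replacementFor(c);
            if (replacement != null) {
                out.append(replacement);
            } else if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
                // Skip C0 and C1 control characters (U+0000-U+001F, U+007F-U+009F).
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling `Cp1252Sanitizer.sanitize(text)` before the insert would then hand the database only characters that ISO-8859-1 can represent, at least for the punctuation listed in the table. Anything else outside ISO-8859-1 would still need its own mapping.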

#2
March 15th, 2007, 12:16 PM

Check the String class's methods: http://java.sun.com/javase/6/docs/ap...ng/String.html
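
As one illustration of what the String class offers here, the sketch below uses `String.getBytes(Charset)` together with `CharsetEncoder.canEncode` to find characters that ISO-8859-1 cannot represent before they reach the database. The class name `Latin1Check` and the method `firstUnmappable` are invented for this example.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for spotting characters that ISO-8859-1 cannot hold.
class Latin1Check {

    // Returns the index of the first char that ISO-8859-1 cannot encode, or -1 if none.
    static int firstUnmappable(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "l'avion" typed with Word's curly apostrophe (U+2019).
        String text = "l" + (char) 0x2019 + "avion";

        System.out.println(firstUnmappable(text)); // prints 1

        // getBytes substitutes the charset's replacement byte for unmappable chars.
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(bytes[1]); // prints 63, i.e. '?'
    }
}
```

Java substitutes `?` for a character it cannot encode; Oracle appears to do something similar with ¿, which would explain the stored value, but that part is an assumption.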

All times are GMT -4.
© 2013 John Wiley & Sons, Inc.