Wrox Programmer Forums
Java Basics General beginning Java language questions that don't fit in one of the more specific forums. Please specify what version.

You are currently viewing the Java Basics section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
March 12th, 2007, 10:29 AM
Authorized User
 
Join Date: Feb 2006
Posts: 18
character set problem

We are getting ¿ stored in our Oracle 10g database, which uses the WE8ISO8859P1 character set.

The problem is caused by the following:

Microsoft released software (in particular MS Word) before considering any ANSI or ISO standard (although they claimed they had).
In those pioneering days of the graphical interface, they were the standard. Since then things have changed. Microsoft initially targeted the US market, but very soon wanted to expand into Europe. For that they needed a standardized character set instead of the one initially in use. Microsoft remapped the character set in newer applications to Windows-1252, which is compatible with ISO-8859-1 (the one we use in our Java web applications). That cleared the way into the European market, where extended characters are necessary (as in French, Dutch, German...).

What happened with the initial character codes before Microsoft agreed with ISO to standardize characters? Well - Nothing.

So what are the consequences of that?

If we use a Microsoft Word document in conjunction with one of the oldest character sets (universe), the ice-age character mapping is still there. So when we "cut and paste" the content into another application, the character sets no longer map onto each other. French is especially sensitive to this: the character that Word encodes as decimal 146 ( ' ) is used very often in French.

Therefore, if the text is written in MS Word using the old character mapping (universe), which is what we do, and then moved by "cut and paste", the characters are misinterpreted by other, newer applications.

Initially in MS Word the apostrophe ( ' ) had code 191; later, after Windows-1252 was implemented, it was moved to code 146 in accordance with ISO. ISO, however, treats the character coded 191 as ¿. So if you are using the MS Word universe character set, ' looks like ' in Word, but in newer or ISO-compatible applications it looks like ¿.
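For what it's worth, the mismatch around code 146 is easy to reproduce in plain Java. The following is only a minimal sketch: the class name `Byte146Demo` and the helper `decodeFirstCodePoint` are made up for illustration. It decodes the single byte 146 under both charsets and then encodes the Windows-1252 result back to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how byte 146 is read under each charset.
class Byte146Demo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode one byte with the given charset and return the resulting code point.
    static int decodeFirstCodePoint(byte b, Charset cs) {
        return new String(new byte[] { b }, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 146;

        // Windows-1252 reads 146 as U+2019, the typographic apostrophe.
        System.out.printf("windows-1252: U+%04X%n", decodeFirstCodePoint(b, CP1252));

        // ISO-8859-1 reads 146 as U+0092, an invisible control character.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decodeFirstCodePoint(b, StandardCharsets.ISO_8859_1));

        // U+2019 has no ISO-8859-1 equivalent, so Java substitutes '?' (63).
        byte[] encoded = "\u2019".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("U+2019 as ISO-8859-1 byte: " + (encoded[0] & 0xFF));
    }
}
```

In Java the unmappable character comes back as `?` (byte 63). The ¿ seen in the database is presumably the substitute character used on the Oracle side; that is an assumption and was not checked here.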

So we scratched our heads and started thinking about what the right solution would be for us...

Write a patch in Java to correct the character mapping and, in addition, to strip control characters that would mess up the "look and feel" of the displayed content. Does anyone know of a Java class that already does this?
Thanks

Brendon

 
March 15th, 2007, 12:16 PM
Authorized User
 
Join Date: Mar 2007
Posts: 24

Check the String class's methods http://java.sun.com/javase/6/docs/ap...ng/String.html
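As a starting point, the kind of patch asked about above could be sketched with nothing more than `String`, `StringBuilder` and `Character`. This is an illustration under assumptions, not an existing library class: the name `WordTextCleaner`, the method `clean`, and the choice of which typographic characters to replace are all hypothetical. It assumes the text may contain Windows-1252 bytes that were decoded as ISO-8859-1 (so they show up as characters in the range U+0080 to U+009F) and that plain ASCII substitutes are acceptable.

```java
import java.nio.charset.Charset;

// Hypothetical helper: makes text pasted from Word safe for an ISO-8859-1 column.
class WordTextCleaner {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    static String clean(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);

            // Step 1: a char in U+0080..U+009F is most likely a Windows-1252
            // byte that was decoded as ISO-8859-1; reinterpret it.
            if (c >= 0x80 && c <= 0x9F) {
                c = new String(new byte[] { (byte) c }, CP1252).charAt(0);
            }

            // Step 2: replace common typographic characters with ASCII.
            if (c == '\u2018' || c == '\u2019') {
                out.append('\'');                 // curly single quotes
            } else if (c == '\u201C' || c == '\u201D') {
                out.append('"');                  // curly double quotes
            } else if (c == '\u2013' || c == '\u2014') {
                out.append('-');                  // en and em dashes
            } else if (c == '\u2026') {
                out.append("...");                // ellipsis
            } else if (c == '\uFFFD') {
                // Undefined Windows-1252 byte: drop it.
            } else if (Character.isISOControl(c)
                    && c != '\n' && c != '\r' && c != '\t') {
                // Step 3: drop control characters, keeping common whitespace.
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("l\u0092avion"));   // mis-decoded byte 146
        System.out.println(clean("l\u2019avion"));   // correctly decoded apostrophe
    }
}
```

Both calls in `main` print `l'avion`. Other characters outside ISO-8859-1 (the euro sign, for example) pass through unchanged in this sketch and would still need a decision before the text is stored.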






Copyright (c) 2020 John Wiley & Sons, Inc.