p2p.wrox.com Forums


mnagaraj1983 August 8th, 2011 05:21 AM

UTF8 to XML Entity Conversion
 
Hi Experts,

How can I convert UTF-8 characters to XML Unicode entities? Is there any way to do this with Perl scripting?

Please give me a solution or an idea. I am waiting for your reply.

Regards,
Nagaraj

chorny August 9th, 2011 04:18 AM

Quote:

Originally Posted by mnagaraj1983 (Post 275298)
How can I convert UTF-8 characters to XML Unicode entities? Is there any way to do this with Perl scripting?

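Do you use an XML-writing module? If so, it may do this conversion automatically.

Otherwise, try HTML::Entities. As far as I know, HTML and XML do not differ for numeric character references; only the named entities differ, since XML predefines just &amp;, &lt;, &gt;, &quot; and &apos;.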
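
A minimal sketch of the HTML::Entities route, assuming the text is already a decoded Perl character string and that only non-ASCII characters should become numeric references, so the XML markup itself is left alone. The sample fragment and variable names are made up for illustration. The second argument to encode_entities_numeric is the module's "unsafe characters" set, written like the inside of a regex character class.

```perl
use strict;
use warnings;
use utf8;    # this source file is assumed to be saved as UTF-8
use HTML::Entities qw(encode_entities_numeric);

# Hypothetical XML fragment containing a Greek letter.
my $xml = '<p>larger α chain</p>';

# Encode only characters outside the ASCII range, so that the
# angle brackets of the markup are not touched.
my $encoded = encode_entities_numeric($xml, '^\x00-\x7F');

print "$encoded\n";    # <p>larger &#x3B1; chain</p>
```

Without the second argument the module also encodes <, > and &, which would break the markup in a post-processing pass over a whole XML file.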

mnagaraj1983 August 9th, 2011 10:38 AM

Hi,

Thanks for the suggestion.

I am exporting XML files from InDesign, so special characters appear as UTF-8 characters in the XML. I want to convert these UTF-8 characters to XML Unicode entities using a Perl (XML post-processing) script.

One more clarification: will the HTML::Entities module convert UTF-8 to XML named entities or to Unicode (numeric) entities?

Could you kindly give me an example of this?

Regards,
Nagaraj

chorny August 9th, 2011 10:44 AM

Quote:

Originally Posted by mnagaraj1983 (Post 275342)
One more clarification: will the HTML::Entities module convert UTF-8 to XML named entities or to Unicode (numeric) entities?
Could you kindly give me an example of this?

If I understand correctly, it will use numeric codes only if you ask it to. See the HTML::Entities documentation.
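
To illustrate the difference, here is a small sketch with a made-up sample string. encode_entities uses a named entity wherever HTML defines one, while encode_entities_numeric always emits hexadecimal numeric references. Both are called here with their default set of characters to encode, which includes &.

```perl
use strict;
use warnings;
use utf8;    # this source file is assumed to be saved as UTF-8
use HTML::Entities qw(encode_entities encode_entities_numeric);

# Hypothetical sample text, not taken from the thread.
my $text = 'α & β';

my $named   = encode_entities($text);            # named entities where available
my $numeric = encode_entities_numeric($text);    # numeric references only

print "$named\n";      # &alpha; &amp; &beta;
print "$numeric\n";    # &#x3B1; &#x26; &#x3B2;
```

For XML the numeric form is usually the safer choice, because names such as &alpha; are not defined in XML unless a DTD declares them.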

mnagaraj1983 August 10th, 2011 03:25 AM

Hi,

Thanks for the information. I will try my best and let you know the result.

Regards,
Nagaraj

mnagaraj1983 August 11th, 2011 07:27 AM

Hi,

I have a problem replacing UTF-8 characters with XML numeric entities.

$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

$CurrentText = HTML::Entities::encode_numeric($mystring);

Output:

molecules consisting of a larger &#xCE;&#xB1; chain, which is associated with a smaller &#xCE;&#xB2; chain

RequiredOutput:

molecules consisting of a larger &#x3B1; chain, which is associated with a smaller &#x3B2; chain

Can anyone help me solve this type of issue?

Regards,
Nagaraj

chorny August 11th, 2011 09:34 AM

You need to tell Perl that the source of your program contains UTF-8:
Code:

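use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8 characters, not raw bytes
use HTML::Entities;

my $mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

# encode_numeric is an alias for encode_entities_numeric
my $CurrentText = HTML::Entities::encode_numeric($mystring);
print $CurrentText;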

If you receive that string from some other code, e.g. a CPAN module, it is more complex. You will need to check how that module handles UTF-8.
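
As a rough sketch of that more complex case, assume the other code hands back raw UTF-8 bytes rather than decoded characters. The byte string below is a made-up stand-in for such data. Decoding it with the core Encode module before encoding the entities gives one reference per character instead of one per byte.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities_numeric);

# Stand-in for data read without an encoding layer: the two UTF-8
# bytes of the Greek letter alpha, not yet decoded into one character.
my $bytes = "larger \xCE\xB1 chain";

my $wrong = encode_entities_numeric($bytes);    # each byte is encoded on its own
my $chars = decode('UTF-8', $bytes);            # bytes -> Perl characters
my $right = encode_entities_numeric($chars);

print "$wrong\n";    # larger &#xCE;&#xB1; chain
print "$right\n";    # larger &#x3B1; chain
```

When the data comes from a file, opening it with an encoding layer such as '<:encoding(UTF-8)' should have the same effect as the explicit decode call.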

mnagaraj1983 August 11th, 2011 10:48 PM

Hi Chorny,

You are exactly right: the source of my program was not 100% UTF-8, which is why the problem occurred. If we don't use utf8 in the program, the special characters are converted into the hex values of their individual bytes, e.g. &#xCE;&#xB1;.

I will test with some more special characters in my program and let you know.

Regards,
Nagaraj


All times are GMT -4.
