Wrox Programmer Forums
Welcome to the p2p.wrox.com Forums.

You are currently viewing the Perl section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
August 8th, 2011, 05:21 AM
Registered User
 
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
UTF8 to XML Entity Conversion

Hi Experts,

How can I convert UTF-8 characters to XML Unicode entities? Is there any way to do this with a Perl script?

Please give me a solution or an idea; I am waiting for your reply.

Regards,
Nagaraj
 
August 9th, 2011, 04:18 AM
Authorized User
 
Join Date: Aug 2009
Posts: 23
Thanks: 0
Thanked 2 Times in 2 Posts

Quote:
Originally Posted by mnagaraj1983
How can I convert UTF-8 characters to XML Unicode entities? Is there any way to do this with a Perl script?

Are you using an XML-writing module? If so, it may do this conversion automatically.

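Otherwise, try HTML::Entities. I don't think HTML and XML differ for numeric character references such as &#x3B1;, but note that XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so HTML names such as &alpha; are not valid XML unless a DTD declares them.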
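A minimal sketch of what the numeric conversion amounts to, in plain Perl. It assumes the text is already a decoded Perl character string, and the helper name to_numeric_refs is hypothetical, not part of HTML::Entities. Only non-ASCII characters are replaced, so markup such as <p> is left alone. With HTML::Entities installed, encode_entities_numeric($text, '^\x00-\x7F') should behave the same way for non-ASCII characters.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a hex numeric
# character reference such as &#x3B1;. The input must be a decoded Perl
# character string, not raw UTF-8 bytes. ASCII characters, including the
# markup characters < > &, are left unchanged.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

# \x{3B1} is GREEK SMALL LETTER ALPHA, \x{3B2} is GREEK SMALL LETTER BETA.
my $line = "<p>a larger \x{3B1} chain and a smaller \x{3B2} chain</p>";
print to_numeric_refs($line), "\n";
# prints: <p>a larger &#x3B1; chain and a smaller &#x3B2; chain</p>
```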
 
August 9th, 2011, 10:38 AM
Registered User
 

Hi,

Thanks for the suggestion.

I am exporting XML files from InDesign, so special characters appear as UTF-8 characters in the XML. I want to convert these UTF-8 characters to XML Unicode entities using a Perl post-processing script.

One more clarification: will the HTML::Entities module convert UTF-8 characters to XML named entities or to Unicode (numeric) entities?

Could you give me an example of this?

Regards,
Nagaraj
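For the InDesign workflow described in this post, the conversion could run as a small post-processing step over the exported file. This is a sketch under assumptions: the export is UTF-8 encoded, and the helper convert_file and the file names in the usage line are hypothetical. Because only non-ASCII characters are touched, the markup stays intact, and the pure-ASCII result remains valid even if the XML declaration says encoding="UTF-8".

```perl
use strict;
use warnings;

# Hypothetical post-processing step for an XML file exported from InDesign.
# Assumes the exported file is UTF-8 encoded.
sub convert_file {
    my ($in_path, $out_path) = @_;

    # Decode the UTF-8 bytes into Perl characters while reading.
    open my $in, '<:encoding(UTF-8)', $in_path or die "Cannot read $in_path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $xml = <$in>;
    close $in;

    # Drop a byte-order mark if present; as a reference it would end up
    # in front of the XML declaration and break the file.
    $xml =~ s/\A\x{FEFF}//;

    # Replace each non-ASCII character with a hex numeric character reference.
    # Markup (< > &) is ASCII, so it is left untouched.
    $xml =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;

    # The result is pure ASCII, so no encoding layer is needed for output.
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    print {$out} $xml;
    close $out or die "Cannot close $out_path: $!";
}

# Usage (hypothetical file names): perl postprocess.pl exported.xml converted.xml
convert_file(@ARGV) if @ARGV == 2;
```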
 
August 9th, 2011, 10:44 AM
Authorized User
 

Quote:
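Originally Posted by mnagaraj1983
One more clarification: will the HTML::Entities module convert UTF-8 characters to XML named entities or to Unicode (numeric) entities?
Could you give me an example of this?

If I understand correctly, it uses numeric codes only if you ask it to, via encode_entities_numeric (also available as HTML::Entities::encode_numeric); the default encode_entities uses a named entity wherever it knows one. See the HTML::Entities documentation.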
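The named-versus-numeric distinction matters for XML: the default encode_entities turns α into the HTML name &alpha;, which an XML parser rejects unless a DTD declares it, because XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;. Below is a sketch in plain Perl of escaping a piece of text content for XML using only those five names plus numeric references. The helper xml_escape_text is hypothetical, and it is meant for text you insert into a document, not for a whole file that already contains markup.

```perl
use strict;
use warnings;

# The five entities XML predefines. HTML names such as &alpha; are not
# among them, so they are avoided here.
my %XML_NAMED = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper for escaping text content (not a whole document):
# the XML special characters become named entities, and every non-ASCII
# character becomes a hex numeric reference.
sub xml_escape_text {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$XML_NAMED{$1}/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

print xml_escape_text("a < b & \x{3B1} > \x{3B2}"), "\n";
# prints: a &lt; b &amp; &#x3B1; &gt; &#x3B2;
```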
 
August 10th, 2011, 03:25 AM
Registered User
 

Hi,

Thanks for the information. I will try my best and let you know the result.

Regards,
Nagaraj
 
August 11th, 2011, 07:27 AM
Registered User
 

Hi,

I have a problem replacing UTF-8 characters with XML numeric entities.

$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

$CurrentText = HTML::Entities::encode_numeric($mystring);

Output:

molecules consisting of a larger α chain, which is associated with a smaller β chain

Required output:

molecules consisting of a larger α chain, which is associated with a smaller β chain

Can anyone help me solve this type of issue?

Regards,
Nagaraj
 
August 11th, 2011, 09:34 AM
Authorized User
 

Quote:
Originally Posted by mnagaraj1983

I have a problem replacing UTF-8 characters with XML numeric entities.

$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

$CurrentText = HTML::Entities::encode_numeric($mystring);

Output:

molecules consisting of a larger α chain, which is associated with a smaller β chain

Required output:

molecules consisting of a larger α chain, which is associated with a smaller β chain

You need to tell Perl that the source of your program is written in UTF-8:
Code:
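use strict;
use warnings;
use utf8;           # the source file is saved as UTF-8: read α and β as single characters
use HTML::Entities;

my $mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

# Convert every unsafe character, including α and β, to a hex numeric reference (&#x3B1;, &#x3B2;)
my $CurrentText = HTML::Entities::encode_numeric($mystring);
print "$CurrentText\n";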
If you receive that string from some other code, for example a CPAN module, it is more complex: you will need to check how that module handles UTF-8, i.e. whether it returns decoded characters or raw UTF-8 bytes.
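When the string comes from outside the program text (a file opened without an :encoding layer, or a module that returns raw bytes), use utf8 does not help, because it only affects literals in the source. The bytes have to be decoded first. A sketch with the core Encode module; to_numeric_refs is a hypothetical stand-in for encode_numeric that handles only non-ASCII characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for encode_numeric: non-ASCII characters become
# hex numeric references, everything else is left as-is.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

# "\xCE\xB1" is how alpha arrives as raw UTF-8 bytes, e.g. from a file
# opened without an :encoding layer.
my $bytes = "larger \xCE\xB1 chain";

# Each byte is converted separately: the symptom seen in this thread.
print to_numeric_refs($bytes), "\n";
# prints: larger &#xCE;&#xB1; chain

# Decoding first turns the two bytes into the single character U+03B1.
my $chars = decode('UTF-8', $bytes);
print to_numeric_refs($chars), "\n";
# prints: larger &#x3B1; chain
```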
The Following User Says Thank You to chorny For This Useful Post:
mnagaraj1983 (August 11th, 2011)
 
August 11th, 2011, 10:48 PM
Registered User
 

Hi Chorny,

You are exactly right: my program source was not being treated as UTF-8, and that caused the problem. Without use utf8 in the program, each special character is converted byte by byte into hex values, e.g. &#xCE;&#xB1; for α.

I will test it with some more special characters in my program and let you know.

Regards,
Nagaraj
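The byte-by-byte effect described in this post can be seen directly by toggling the utf8 pragma, which is lexically scoped. A small sketch, assuming the script file itself is saved as UTF-8; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8, as in the thread.
my $without = do { no utf8;  'α' };   # literal read as the two bytes CE B1
my $with    = do { use utf8; 'α' };   # literal read as one character, U+03B1

# Show the length and the code points (in hex) of each string.
printf "without use utf8: length %d, %s\n", length $without,
    join ' ', map { sprintf '%X', ord } split //, $without;
printf "with use utf8:    length %d, %s\n", length $with,
    join ' ', map { sprintf '%X', ord } split //, $with;
# prints:
# without use utf8: length 2, CE B1
# with use utf8:    length 1, 3B1
```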





Similar Threads
Thread Thread Starter Forum Replies Last Post
Entity Conversion vengatatindia Word VBA 0 February 15th, 2008 05:52 AM
entity conversion orlyyefet XSLT 3 July 29th, 2007 03:14 PM
entity conversion (TidyCOM) Kabe XML 4 September 16th, 2005 07:55 AM
xml <!ENTITY..... anpham ASP.NET 1.0 and 1.1 Basics 0 June 28th, 2005 09:02 PM
xml parameter entity yengzhai XML 2 April 10th, 2005 12:54 PM





Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.