Wrox Programmer Forums

Welcome to the p2p.wrox.com Forums.

You are currently viewing the Perl section of the Wrox Programmer to Programmer discussions.
#1 | August 8th, 2011, 06:21 AM | mnagaraj1983 (Registered User, joined Jul 2011)
UTF8 to XML Entity Conversion

Hi Experts,

How can I convert UTF-8 characters to XML Unicode entities (numeric character references)? Is there a way to do this with a Perl script?

Please suggest a solution or an approach. I look forward to your reply.

Regards,
Nagaraj
#2 | August 9th, 2011, 05:18 AM | chorny (Authorized User, Moldova, joined Aug 2009)

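Quote:
Originally posted by mnagaraj1983
How can I convert UTF-8 characters to XML Unicode entities? Is there a way to do this with a Perl script?

Are you using an XML-writing module? If so, it may already perform this conversion automatically.

Otherwise, try HTML::Entities. For numeric character references such as &#x3B1;, HTML and XML use the same syntax. (Named entities do differ: XML predefines only &lt;, &gt;, &amp;, &quot; and &apos;, so HTML names such as &alpha; are not valid in plain XML.)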
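A minimal sketch of the numeric-reference idea in core Perl, assuming the script file is saved as UTF-8 and that every non-ASCII character should become a hexadecimal reference. This is a plain regex substitution swapped in for illustration, not HTML::Entities itself; unlike HTML::Entities' default, it leaves ASCII (including markup such as < and &) untouched, which matters when post-processing XML that is already well formed. The function name to_numeric_refs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this script is saved as UTF-8, so "α" is one character

# Replace every non-ASCII character with a hexadecimal numeric character
# reference (&#x...;). ASCII, including markup such as < and &, is untouched.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

print to_numeric_refs("a larger α chain and a smaller β chain"), "\n";
# prints: a larger &#x3B1; chain and a smaller &#x3B2; chain
```

Because Perl strings hold whole code points, characters outside the Basic Multilingual Plane also come out as a single reference, which is what XML expects.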
#3 | August 9th, 2011, 11:38 AM | mnagaraj1983

Hi,

Thanks for the suggestion.

I am exporting XML files from InDesign, and special characters appear in the XML as UTF-8 characters. I want to convert these UTF-8 characters to XML Unicode entities with a Perl post-processing script.

One more question: does the HTML::Entities module convert UTF-8 characters to XML named entities or to numeric (Unicode) entities?

Could you give me an example?

Regards,
Nagaraj
#4 | August 9th, 2011, 11:44 AM | chorny

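Quote:
Originally posted by mnagaraj1983
One more question: does the HTML::Entities module convert UTF-8 characters to XML named entities or to numeric (Unicode) entities? Could you give me an example?

If I understand correctly, it uses numeric codes only if you ask for them: encode_entities produces a named entity wherever HTML defines one (and a numeric reference otherwise), while encode_entities_numeric always produces numeric references. See the HTML::Entities documentation.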
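The difference can be seen in a short sketch, assuming the HTML::Entities module (part of the HTML-Parser distribution) is installed and the script is saved as UTF-8. The optional second argument used in the last call is HTML::Entities' "unsafe characters" parameter, a regex character class naming which characters to encode; restricting it to non-ASCII is one way to keep existing XML tags intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this script is saved as UTF-8
use HTML::Entities qw(encode_entities encode_entities_numeric);

my $text = "a larger α chain";

# Named entity where HTML defines one (fine for HTML, but &alpha; is not
# predefined in XML):
print encode_entities($text), "\n";            # a larger &alpha; chain

# Numeric (hexadecimal) references only, the form to use for XML:
print encode_entities_numeric($text), "\n";    # a larger &#x3B1; chain

# Restrict the "unsafe" characters to non-ASCII so markup such as <b> is
# left alone (by default < > & ' " would be encoded as well):
print encode_entities_numeric("<b>$text</b>", '^\x00-\x7F'), "\n";
# prints: <b>a larger &#x3B1; chain</b>
```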
#5 | August 10th, 2011, 04:25 AM | mnagaraj1983

Hi,

Thanks for the information. I'll try my best and let you know the result.

Regards,
Nagaraj
#6 | August 11th, 2011, 08:27 AM | mnagaraj1983

Hi,

I have a problem replacing UTF-8 characters with XML numeric entities.

$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

$CurrentText = HTML::Entities::encode_numeric($mystring);

Output:

molecules consisting of a larger &#xCE;&#xB1; chain, which is associated with a smaller &#xCE;&#xB2; chain

Required output:

molecules consisting of a larger &#x3B1; chain, which is associated with a smaller &#x3B2; chain

Can anyone help me solve this kind of issue?

Regards,
Nagaraj
#7 | August 11th, 2011, 10:34 AM | chorny

Quote:
Originally posted by mnagaraj1983
I have a problem replacing UTF-8 characters with XML numeric entities.

You need to tell Perl that the source code of your program contains UTF-8, using the utf8 pragma:
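Code:
use strict;
use warnings;
use utf8;    # the source file is UTF-8, so "α" is read as one character (U+03B1)
use HTML::Entities;

my $mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";

# Replace each non-ASCII character with a hex numeric reference, e.g. &#x3B1;
my $CurrentText = HTML::Entities::encode_numeric($mystring);
print $CurrentText, "\n";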
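If you receive the string from other code rather than from a literal in your source, for example from a CPAN module, it is more complex: you will need to check how that module handles UTF-8, i.e. whether it gives you decoded characters or raw UTF-8 bytes.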
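For the InDesign case the text normally comes from a file, not a literal, so use utf8 does not help; the bytes have to be decoded instead. A minimal sketch, assuming the exported XML is UTF-8 encoded. It uses the core Encode module and the :encoding(UTF-8) I/O layer, with the same core-Perl substitution in place of HTML::Entities; convert_file and the file names in the usage line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Replace each non-ASCII character with a hex numeric character reference.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

# Case 1: raw UTF-8 bytes handed over by other code. "\xCE\xB1" is the
# two-byte UTF-8 encoding of U+03B1 (alpha); decode() makes it one character.
my $bytes = "a larger \xCE\xB1 chain";
print to_numeric_refs(decode('UTF-8', $bytes)), "\n";   # a larger &#x3B1; chain

# Case 2 (hypothetical helper): post-process an exported XML file. The
# :encoding(UTF-8) layer decodes while reading; the output is pure ASCII.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:encoding(UTF-8)', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>',                 $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} to_numeric_refs($line);
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
}
```

Usage, with placeholder file names: convert_file('export.xml', 'export-entities.xml'); Since the result is pure ASCII, an encoding="UTF-8" declaration in the exported file stays correct.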
The Following User Says Thank You to chorny For This Useful Post:
mnagaraj1983 (August 11th, 2011)
#8 | August 11th, 2011, 11:48 PM | mnagaraj1983

Hi Chorny,

You were exactly right: Perl was not treating my program's source as UTF-8, which is why the problem occurred. Without use utf8 in the program, each special character is converted into the hex values of its individual UTF-8 bytes, e.g. &#xCE;&#xB1; for α.

I will test it with some more special characters in my program and let you know.

Regards,
Nagaraj
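The byte-versus-character behaviour described in this last post can be reproduced with a short core-Perl sketch. The helper numeric_refs is hypothetical and only mirrors how HTML::Entities' numeric encoding treats non-ASCII characters: without use utf8 or an explicit decode, α is the two bytes 0xCE and 0xB1, and each byte gets its own reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hex numeric references for every non-ASCII code point in the string.
sub numeric_refs {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $s;
}

my $bytes = "\xCE\xB1";                 # alpha as raw UTF-8 bytes (a literal alpha without use utf8)
my $chars = decode('UTF-8', $bytes);    # alpha as a single character, U+03B1

printf "%d bytes -> %s\n", length $bytes, numeric_refs($bytes);   # 2 bytes -> &#xCE;&#xB1;
printf "%d char  -> %s\n", length $chars, numeric_refs($chars);   # 1 char  -> &#x3B1;
```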
All times are GMT -4.


© 2013 John Wiley & Sons, Inc.