Welcome to the p2p.wrox.com Forums.
You are currently viewing the Perl section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

August 8th, 2011, 05:21 AM
Registered User
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
UTF8 to XML Entity Conversion
Hi Experts,
How can I convert UTF-8 characters to XML Unicode (numeric) entities? Is it possible to do this with a Perl script?
Please give me a solution or an idea. Waiting for your reply.
Regards,
Nagaraj

August 9th, 2011, 04:18 AM
Authorized User
Join Date: Aug 2009
Posts: 23
Thanks: 0
Thanked 2 Times in 2 Posts
Quote:
Originally Posted by mnagaraj1983
How can I convert UTF-8 characters to XML Unicode (numeric) entities? Is it possible to do this with a Perl script?
|
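Are you using an XML writing module? If so, it may do this conversion automatically.
Otherwise, try HTML::Entities. Numeric character references work the same way in HTML and XML; only the named entities differ, since XML predefines just five (&amp;, &lt;, &gt;, &quot;, &apos;).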
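For reference, the conversion itself can be sketched in core Perl without any module. This is only a minimal sketch: the helper name to_numeric_refs is made up for illustration, and it assumes the text is already a Perl character string (not raw UTF-8 bytes). It leaves ASCII, including XML markup, untouched and rewrites everything else as a hexadecimal character reference.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): replace each non-ASCII
# character with a hexadecimal numeric character reference.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord($1))/ge;
    return $text;
}

# \x{3B1} is GREEK SMALL LETTER ALPHA, written as an escape so the
# example does not depend on how this file is saved.
my $xml = "<p>larger \x{3B1} chain</p>";
print to_numeric_refs($xml), "\n";   # <p>larger &#x3B1; chain</p>
```

Because only non-ASCII characters are matched, existing tags and entities in an exported XML file would pass through unchanged.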

August 9th, 2011, 10:38 AM
Registered User
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
Hi,
Thanks for the suggestion.
I am exporting XML files from InDesign, so special characters appear as UTF-8 characters in the XML. I want to convert those UTF-8 characters to XML Unicode entities using a Perl (XML postProcess) script.
One more clarification: will the HTML::Entities module convert UTF-8 characters to XML named entities or to numeric Unicode entities?
Could you give me an example of this?
Regards,
Nagaraj

August 9th, 2011, 10:44 AM
Authorized User
Join Date: Aug 2009
Posts: 23
Thanks: 0
Thanked 2 Times in 2 Posts
Quote:
Originally Posted by mnagaraj1983
One more clarification: will the HTML::Entities module convert UTF-8 characters to XML named entities or to numeric Unicode entities?
Could you give me an example of this?
|
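If I understand correctly, it uses numeric codes only if you ask for them: encode_entities() prefers a named entity where HTML defines one, while encode_entities_numeric() always emits numeric references. See the HTML::Entities documentation.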
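A small sketch of the difference, assuming HTML::Entities is installed (it is not a core module). The sample string and variable names are only for illustration. The optional second argument is the set of characters to encode; restricting it to non-ASCII is one way to leave markup characters alone.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities encode_entities_numeric);

# "alpha & beta" as a character string, written with escapes.
my $text = "\x{3B1} & \x{3B2}";

# Default behaviour: named entities where HTML defines one.
my $named = encode_entities($text);                  # &alpha; &amp; &beta;

# Numeric variant: always hexadecimal references, even for '&'.
my $numeric = encode_entities_numeric($text);        # &#x3B1; &#x26; &#x3B2;

# Only non-ASCII characters are treated as unsafe here.
my $non_ascii_only = encode_entities_numeric($text, '^\x00-\x7F');   # &#x3B1; & &#x3B2;

print "$named\n$numeric\n$non_ascii_only\n";
```

For XML output the numeric forms are the safer choice, because names such as &alpha; are not defined in XML unless a DTD declares them.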

August 10th, 2011, 03:25 AM
Registered User
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
Hi,
Thanks for the information. I will try it and let you know the result.
Regards,
Nagaraj

August 11th, 2011, 07:27 AM
Registered User
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
Hi,
I have a problem replacing UTF-8 characters with XML numeric entities.
$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";
$CurrentText = HTML::Entities::encode_numeric($mystring);
Output:
molecules consisting of a larger &#xCE;&#xB1; chain, which is associated with a smaller &#xCE;&#xB2; chain
RequiredOutput:
molecules consisting of a larger &#x3B1; chain, which is associated with a smaller &#x3B2; chain
Can anyone help me solve this kind of issue?
Regards,
Nagaraj

August 11th, 2011, 09:34 AM
Authorized User
Join Date: Aug 2009
Posts: 23
Thanks: 0
Thanked 2 Times in 2 Posts
Quote:
Originally Posted by mnagaraj1983
I have a problem replacing UTF-8 characters with XML numeric entities.
$mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";
$CurrentText = HTML::Entities::encode_numeric($mystring);
Output:
molecules consisting of a larger &#xCE;&#xB1; chain, which is associated with a smaller &#xCE;&#xB2; chain
RequiredOutput:
molecules consisting of a larger &#x3B1; chain, which is associated with a smaller &#x3B2; chain
|
You need to tell Perl that the source of your program is UTF-8:
Code:
use strict;
use warnings;
use utf8;            # string literals in this file are UTF-8 text
use HTML::Entities;

my $mystring = "molecules consisting of a larger α chain, which is associated with a smaller β chain in a ";
my $CurrentText = HTML::Entities::encode_numeric($mystring);
print $CurrentText;  # α becomes &#x3B1;, β becomes &#x3B2;
If you receive that string from some other code, for example a CPAN module, it is more complex. You will need to check how that module handles UTF-8.
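When the string arrives as raw bytes (from a file, a socket, or a module that does not decode), one common approach is to decode it explicitly before encoding entities. A minimal sketch, assuming the incoming data really is UTF-8 and that HTML::Entities is installed; the variable names are only for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities_numeric);

# Raw UTF-8 bytes, as they would arrive without any decoding layer.
# \xCE\xB1 is the two-byte UTF-8 encoding of U+03B1 (alpha).
my $bytes = "larger \xCE\xB1 chain";

# Encoding the bytes directly treats every byte as a separate character.
my $wrong = encode_entities_numeric($bytes);                    # larger &#xCE;&#xB1; chain

# Decoding first gives one character per code point.
my $right = encode_entities_numeric(decode('UTF-8', $bytes));   # larger &#x3B1; chain

print "$wrong\n$right\n";
```

For files, opening the handle with the ':encoding(UTF-8)' layer does the same decoding as the data is read.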
The Following User Says Thank You to chorny For This Useful Post:

August 11th, 2011, 10:48 PM
Registered User
Join Date: Jul 2011
Posts: 9
Thanks: 2
Thanked 0 Times in 0 Posts
Hi Chorny,
You are exactly right. The source of my program was not being treated as UTF-8, and that caused the problem. If we don't use utf8 in the program, each byte of a special character is converted into its own hex entity, e.g. &#xCE;&#xB1; for α.
I will test with some more special characters in my program and let you know.
Regards,
Nagaraj