Wrox Programmer Forums
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
March 6th, 2006, 06:03 PM
Registered User
Difficult question about ampersands and the like

I have a rather complicated issue. I have a database application that allows users to fill out forms with data. The forms consist of a bunch of metadata, and some major chunks of text data.

The application generates XML when the user presses a button, and the XML is published to a webserver, where I run an XSLT transformation on the XML. This was all working great (in development), and then I brought up a user record and found a problem.

The user had entered an & symbol in the big text section. This broke the XML, because the parser expects the escaped form &amp; rather than a bare &, so the XSLT doesn't work either.

Turns out this was the tip of the iceberg. The users have also been pasting HTML into the big text field, and the homegrown Perl "parser" they use today spits that back out so that it looks mostly like the original HTML.

The problem with the HTML is with tags like <li> and <br>, which aren't terminated.

Has anyone here had to work with a problem like this, and how did you overcome it? I can't just replace the & with "and", because some of the & characters are in names like "Jones&Smith plumbing", and it's important for this application to keep names as they are and not modify them.

Anyway, my team and I are in a conundrum, maybe we haven't had enough caffeine, but we can't seem to figure out a good working solution to this problem. Hoping you folks may have had a similar experience, and could point me down the right path.
 
March 6th, 2006, 06:09 PM
mhkay
Wrox Author

Your first priority is to plug the hole that lets dirty data into your database; the second priority is to clean the existing data.

The simplest way to plug the hole is to wrap the input data in a CDATA section. Note that this means users won't be able to enter HTML markup and have it interpreted as markup; instead, <li> will appear on the browser screen as "<li>". If you want to allow users to enter HTML markup, then your best bet might be to put it through the HTMLTidy program before storing it in the database.

This is also the way to clean your data. Take each record, process it using HTMLTidy, and then write it back.



Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
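
The CDATA suggestion in the reply above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not code from the thread: the function name `wrap_cdata` and the `<body>` element are hypothetical. The one detail it has to handle is that the sequence `]]>` cannot appear inside a CDATA section, so it is split across two adjacent sections.

```python
import xml.etree.ElementTree as ET

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"

def wrap_cdata(text: str) -> str:
    """Wrap user text in a CDATA section so & and < need no escaping."""
    # "]]>" would end the section early, so close the section after "]]"
    # and reopen a new one before ">".
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE

# Hypothetical record: the element name "body" is an assumption.
user_text = "Jones&Smith plumbing <li>first item"
xml_doc = "<body>" + wrap_cdata(user_text) + "</body>"

# The document is now well-formed, and the parser hands back the text unchanged.
parsed = ET.fromstring(xml_doc)
```

Because the pasted markup reaches the stylesheet as plain text, a tag such as <li> is displayed literally rather than interpreted, which is the trade-off the reply describes.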
 
March 6th, 2006, 07:07 PM
Registered User

This is a great solution for the HTML tags. Now I just have to find a way to do this from my Oracle DB, which should still be possible. But what about those pesky & symbols? If I tell my users they have to give that up they'll lose their minds. The database is quite large, and thousands and thousands of records have this problem.

I confirmed today that this was already screwing up their existing XML and RSS as well, by the way. But users are creatures of habit, and worse, these users are fusing data from multiple sources, and sometimes the source requires that the data be left as it was received, so replacing the & with "and" is not an option.

One thought I had was to write a little program, in Java or Perl, to grab the XML, replace all the bare &s with &amp;, and process the XSL to spit out the HTML file. Then, if necessary, store the "unaltered" (read: malformed) XML back. This isn't the most elegant way of doing it, but we're really desperate for a fix here.
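
A rough sketch of that "little program" idea, written in Python rather than Java or Perl purely for illustration. The names `BARE_AMP` and `escape_bare_ampersands` are hypothetical, and the sketch assumes the only damage is bare ampersands; it does not repair unterminated tags.

```python
import re
import xml.etree.ElementTree as ET

# Matches an & that does NOT start one of XML's five predefined entities
# or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(xml_text: str) -> str:
    """Replace bare & with &amp;, leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", xml_text)

# Hypothetical malformed record with one bare and one escaped ampersand.
malformed = "<name>Jones&Smith plumbing &amp; heating</name>"
repaired = escape_bare_ampersands(malformed)

# The repaired copy parses, and the name comes back exactly as entered.
name = ET.fromstring(repaired).text
```

Two caveats with this approach: HTML-only entities such as &nbsp; are treated as bare ampersands and will show up literally in the output, and an & inside an existing CDATA section would be escaped when it should not be. The repair runs on a copy, so the stored data can stay as received.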
 
March 7th, 2006, 04:30 AM
joefawcett
Wrox Author

The ampersand should be okay provided that you either wrap the element's text in a CDATA section or use the DOM to create the XML, in which case it will be escaped as &amp;.

--

Joe (Microsoft MVP - XML)
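
The DOM route mentioned in the reply above might look like this in Python, using the standard library's `xml.dom.minidom`. This is a sketch under stated assumptions: the function `build_record` and the element names `record`, `name`, and `body` are hypothetical, since the thread never shows the real schema.

```python
from xml.dom.minidom import Document, parseString

def build_record(name: str, body: str) -> str:
    """Build a small XML record through the DOM; text is escaped on output."""
    doc = Document()
    record = doc.createElement("record")
    doc.appendChild(record)
    for tag, value in (("name", name), ("body", body)):
        element = doc.createElement(tag)
        # The raw user text goes in as a text node; no manual escaping.
        element.appendChild(doc.createTextNode(value))
        record.appendChild(element)
    # Serialization is where & becomes &amp; and < becomes &lt;.
    return record.toxml()

xml_text = build_record("Jones&Smith plumbing", "pipes & fittings")

# Parsing the result gives the original name back unchanged.
round_trip = parseString(xml_text)
restored = round_trip.getElementsByTagName("name")[0].firstChild.data
```

The point of this design is that the application never concatenates markup strings by hand, so the stored names stay exactly as the users typed them while the generated XML stays well-formed.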









