Wrox Programmer Forums


Forum: XSLT. General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmer's Reference, please post to that forum instead.
#1 March 6th, 2006, 06:03 PM
Registered User (joined Feb 2006; Colorado Springs, CO)
Difficult question about ampersands and the like

I have a rather complicated issue. I have a database application that allows users to fill out forms with data. The forms consist of a bunch of metadata, and some major chunks of text data.

The application generates XML when the user presses a button, and the XML is published to a webserver, where I run an XSLT transformation on the XML. This was all working great (in development), and then I brought up a user record and found a problem.

The user had entered an & symbol in the big text section. This broke the XML, because XML expects an ampersand to be escaped as &amp; rather than appear as a bare &, so the XSLT doesn't work either.
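A minimal Python sketch of the problem (Python is only an illustration language here; the <text> element and the sample value are made up): a bare & makes the XML unparseable, while the escaped form &amp; parses back to exactly the text the user typed, so the name itself is never changed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

user_text = "Jones&Smith plumbing"  # hypothetical value typed into the big text field

# Dropping the raw text straight into markup produces malformed XML.
raw = "<text>" + user_text + "</text>"
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False  # the parser rejects the bare '&'

# escape() turns '&' into '&amp;' (and '<' / '>' into '&lt;' / '&gt;').
escaped = "<text>" + escape(user_text) + "</text>"
parsed = ET.fromstring(escaped)

print(raw_ok)       # False
print(escaped)      # <text>Jones&amp;Smith plumbing</text>
print(parsed.text)  # Jones&Smith plumbing
```

Escaping only changes how the character is written in the file; whatever reads the XML (including the XSLT processor) sees the original string.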

It turns out this was the tip of the iceberg. The users have also been pasting HTML into the big text field, and the output they get today from a homegrown Perl "parser" spits it back out looking mostly like the original HTML.

The problem with the HTML is tags like <li> and <br>, which aren't terminated (there is no closing </li>, and <br> is not written as <br/>), so the text is not well-formed XML.
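A quick way to confirm this second problem, again sketched in Python with made-up fragments: HTML-style unclosed <li> and <br> tags are rejected by an XML parser, while the closed forms are accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = {
    "unclosed li": "<ul><li>one<li>two</ul>",
    "html-style br": "<p>line one<br>line two</p>",
    "closed li": "<ul><li>one</li><li>two</li></ul>",
    "xhtml-style br": "<p>line one<br/>line two</p>",
}

for name, fragment in samples.items():
    print(f"{name}: {is_well_formed(fragment)}")
# unclosed li: False
# html-style br: False
# closed li: True
# xhtml-style br: True
```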

Has anyone here had to deal with a problem like this, and how did you overcome it? I can't just replace & with "and", because some of the ampersands are in names like "Jones&Smith plumbing", and it's important for this application to keep names exactly as they are.

Anyway, my team and I are in a conundrum. Maybe we haven't had enough caffeine, but we can't seem to find a good working solution to this problem. I'm hoping some of you have had a similar experience and can point me down the right path.
#2 March 6th, 2006, 06:09 PM
mhkay, Wrox Author (joined Apr 2004; Reading, Berks, United Kingdom)

Your first priority is to plug the hole that lets dirty data into your database; the second priority is to clean the existing data.

The simplest way to plug the hole is to wrap the input data in a CDATA section. Note that this means users won't be able to enter HTML markup and have it interpreted as markup; instead, <li> will appear in the browser as the literal text "<li>". If you want to allow users to enter HTML markup, then your best bet might be to put it through the HTML Tidy program before storing it in the database.
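A rough sketch of the CDATA approach in Python; the wrap_cdata helper, the <text> element, and the sample value are hypothetical. One detail any such wrapper has to handle is that a CDATA section ends at the first ]]>, so that sequence is split across two adjacent sections:

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap arbitrary text in a CDATA section.

    A CDATA section ends at the first ']]>', so any such sequence in the
    text is split across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


user_text = "Jones&Smith <li>first item<br>odd ]]> case"  # hypothetical field value
doc = "<text>" + wrap_cdata(user_text) + "</text>"
parsed = ET.fromstring(doc)
print(parsed.text == user_text)  # True: the markup comes back as plain text
```

This shows the trade-off described above: the XSLT sees the whole field as one text node, so the <li> is output as visible text rather than as a list item.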

This is also the way to clean your data. Take each record, process it using HTML Tidy, and then write it back.
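HTML Tidy is a separate program, and the sketch below is not it. As a rough illustration of what a per-record clean-up pass has to do (close implied </li> tags, write void elements like <br> as <br/>, escape stray ampersands), here is a much cruder stand-in built on Python's html.parser. The Balancer class, the tidy helper, and the in-memory records dict are all hypothetical, and real data would need the real tool.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never have content; written as <tag/> for XML.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Simplified rule: opening one of these closes an already-open sibling of the same name.
IMPLIED_CLOSE = {"li", "p", "option", "tr", "td", "th"}


class Balancer(HTMLParser):
    """Crude tag balancer for illustration only; not HTML Tidy."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the well-formed output
        self.stack = []  # currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        attr_text = "".join(f" {name}={quoteattr(value or '')}" for name, value in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything opened after the matching start tag; drop stray end tags.
        if tag in self.stack:
            while True:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Data arrives with character references decoded; re-escape &, <, >.
        self.out.append(escape(data))


def tidy(fragment: str) -> str:
    parser = Balancer()
    parser.feed(fragment)
    parser.close()
    # Close anything still open at the end of the field.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


# Hypothetical in-memory stand-in for the database rows: id -> big text field.
records = {
    101: "<ul><li>one<li>two</ul>",
    102: "<p>Jones&Smith<br>plumbing",
}
for key, value in records.items():
    cleaned = tidy(value)
    ET.fromstring(f"<div>{cleaned}</div>")  # raises ParseError if still malformed
    records[key] = cleaned  # "write it back"

print(records[101])  # <ul><li>one</li><li>two</li></ul>
print(records[102])  # <p>Jones&amp;Smith<br/>plumbing</p>
```

Comments and doctypes are simply dropped here, and duplicate attributes or unusual nesting are not handled; those are exactly the cases a real tool covers.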

Michael Kay
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
#3 March 6th, 2006, 07:07 PM
Registered User (joined Feb 2006; Colorado Springs, CO)

This is a great solution for the HTML tags. Now I just have to find a way to do this from my Oracle DB, which should still be possible. But what about those pesky & symbols? If I tell my users they have to give them up, they'll lose their minds. The database is quite large, and thousands upon thousands of records have this problem.

I confirmed today that this was already breaking their existing XML and RSS as well. But users are creatures of habit, and worse, these users are fusing data from multiple sources, and sometimes the source requires that the data be left exactly as it was received, so replacing & with "and" is not an option.

One thought I had was to write a little Java or Perl program to grab the XML, replace all the bare & characters with &amp;, and run the XSL to produce the HTML file. Then, if necessary, store the "unaltered" (read: malformed) XML back. This isn't the most elegant way of doing it, but we're really desperate for a fix.
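Replacing & with &amp; before the transform can work, but a blind global replace would also turn text that is already escaped (such as &amp; or &#169;) into &amp;amp; and &amp;#169;. A minimal Python sketch, assuming the records use no DTD-defined entities, that escapes only the ampersands that don't already start an XML entity or character reference; the escape_bare_ampersands name and the sample <record> are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does NOT already begin one of XML's five predefined entity
# references (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Escape only the ampersands that would make the document malformed."""
    return BARE_AMP.sub("&amp;", xml_text)


broken = "<record><name>Jones&Smith plumbing</name><note>AT&amp;T &#169; 2006</note></record>"
fixed = escape_bare_ampersands(broken)
print(fixed)
# <record><name>Jones&amp;Smith plumbing</name><note>AT&amp;T &#169; 2006</note></record>

root = ET.fromstring(fixed)
print(root.find("name").text)  # Jones&Smith plumbing
```

HTML named entities such as &nbsp; are not predefined in XML, so this pattern escapes them too and they will show up literally; it also does nothing for the unclosed tags, which still need a tidy step.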
  #4 (permalink)  
Old March 7th, 2006, 04:30 AM
joefawcett's Avatar
Wrox Author
Points: 9,763, Level: 42
Points: 9,763, Level: 42 Points: 9,763, Level: 42 Points: 9,763, Level: 42
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
Join Date: Jun 2003
Location: Exeter, , United Kingdom.
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts

The ampersand should be okay provided that you either wrap the element's text in a CDATA section or use the DOM to create the XML, in which case it will be escaped as &amp;.
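The DOM route sketched in Python with the standard library's xml.dom.minidom (the element names and field value are hypothetical): text added through createTextNode is escaped by the serializer, so the application never handles the & itself, and parsing the result gives back the original string.

```python
from xml.dom.minidom import Document, parseString

original = "Jones&Smith plumbing <li>item"  # hypothetical field value

doc = Document()
record = doc.createElement("record")  # hypothetical element names
body = doc.createElement("body")
body.appendChild(doc.createTextNode(original))  # no manual escaping here
record.appendChild(body)
doc.appendChild(record)

xml_text = doc.toxml()
print(xml_text)  # the serializer has written & as &amp; and < as &lt;

# Parsing the serialized text gives back exactly what the user entered.
round_trip = parseString(xml_text).getElementsByTagName("body")[0].firstChild.data
print(round_trip == original)  # True
```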


Joe (Microsoft MVP - XML)
All times are GMT -4.

Powered by vBulletin®
Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
© 2013 John Wiley & Sons, Inc.