Wrox Programmer Forums

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old October 18th, 2011, 12:26 PM
Registered User
 
Join Date: Feb 2011
Posts: 5
Thanks: 0
Thanked 0 Times in 0 Posts
Handling repeated, missing, and out-of-order elements in a transform

I can't solve some complications that are coming up in my XML to CSV conversion. I have looked at the existing threads on XML to CSV conversion, on this site and all over the web! But I still need help.

I am brand new to actual programming in any language. However, I know XML and CSS, so I am the logical person to solve this problem at my non-profit (the only person available). I need to take some very messy XML and convert it to a very neat CSV file for upload to a website database. So far my code is so far off the mark that I am not even going to bother posting it. I get output, but it only does a quarter of what I need.

I don't really need a finished solution, but I need help understanding the process I should follow to solve my problem in XSLT. I won't ask you all to code for me, just tell me the elements and template structure I need. I would also love it if the community could explain the logic behind the process, so that I can modify it as needed.

I have xml that has records in all orders and numbers:
Code:
    <record-list>
    <record>
	<title>Title One</title>
	<author>Author One</author>
	<subject>Subject One A
		Subject One B
		Subject One C</subject>
	<subject>Subject Two</subject>
	<subject>Subject Three</subject>
	<subject>Subject Four</subject>
    </record>
    <record>
	<subject>Subject Five</subject>
	<title>Title Two</title>
	<useless-element>Extra Stuff One</useless-element>
    </record><record>
	<title>Title Three</title>
	<subject>Subject Six</subject>
	<author/>
    </record>
    </record-list>
So I have multiple numbers of repeated elements, some missing elements, some empty elements, elements out of order, and some elements with extra line breaks.

I need a CSV file which reads as below, or with a different number of subject repeats (see requirements below)
Code:
    "Title","Subject","Subject","Subject","Author"
    "Title One","Subject One A ; Subject One B ; Subject One C","Subject Two","Subject Three","Author One"
    "Title Two","Subject Five","","",""
    "Title Three","Subject Six","","",""
Requirements for the final output

-The number of columns of any repeated elements either needs to match the record with the most repeats of that element, or the program needs to chop off any repeats past a certain number.
-Each new record needs a line break and no other line breaks can exist in the files (only as record delimiters).
-The elements each need to be in the same order for each record.
-Each element text needs quotes around it (to handle intrinsic commas).
-Missing or empty elements need blank, comma surrounded quotes.
-Extra elements can't be sent through to the output


What I have done:

I have figured out how to get rid of the extra line breaks within the elements using the replace function. I can get the quotes, commas, and line breaks in the output with text elements and strip-space.

However, I don't know how to straighten out the order of the elements, handle the element repeats, or put through only some elements while still using the <record> element as the cue for the line-break.


Right now, I just need a solution that works, even if all sorts of manual manipulation or multiple stylesheets are required. I can even do a find and replace in a text editor, as long as the output is good. Please help with an XSLT solution; I don't know any other suitable programming languages (college MATLAB many years ago is not helping).

I think I need to run two transforms. I looked at the XSLT Cookbook, where two transforms are used sequentially for a similar problem. However, this solution is so generalized, I can't understand it. If I can't figure out how it works, I can't modify it for my needs. Sorry, but without a programming background, the explanations on this site, the web, and in the text are challenging at best. However, I think I am presenting a problem with some novel features, compared to others asked on this forum.

Any help, be it non-generalized code, or even just a suggested schematic procedure for multiple runs through my processor would be wonderful. I have been struggling with this for over a week and have made very little progress.

Thanks
CAMc
 
Old October 18th, 2011, 12:45 PM
Friend of Wrox
 
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts

As you mention the replace function I assume you want to solve that with XSLT 2.0.
However what I don't understand is why the "record" with "Title One" has four "subject" child elements, yet the CSV only seems to have three "subject columns". What determines the number of "subject" columns you want in the CSV?
Are the elements you want to map to columns in the CSV known i.e. do you simply want a solution for that particular XML document type with "title", "author" and "subject" elements?
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
 
Old October 18th, 2011, 01:16 PM
Registered User
 
Join Date: Feb 2011
Posts: 5
Thanks: 0
Thanked 0 Times in 0 Posts

Quote:
Originally Posted by Martin Honnen
As you mention the replace function I assume you want to solve that with XSLT 2.0.
However what I don't understand is why the "record" with "Title One" has four "subject" child elements, yet the CSV only seems to have three "subject columns". What determines the number of "subject" columns you want in the CSV?
Are the elements you want to map to columns in the CSV known i.e. do you simply want a solution for that particular XML document type with "title", "author" and "subject" elements?

Hi Martin,
Thanks for your interest!

Sorry I wasn't clear.

To answer the second question first, Yes, I know all the particular elements I want to map from the document to my CSV. The tag in the original will NOT necessarily be the same as the column header in the CSV. There will be other elements in the document I don't want to map, and some unique elements that will be mapped to repeated columns with the same title, but I know all the tags around the material I want in the original and how they relate to the new column headings.

For the repeated "subject" fields, I need either a stylesheet that truncates repeated elements (for example, only 10 subjects will be allowed in the final CSV) OR one that adjusts the CSV to have enough columns to fit whichever record has the largest number of repeats of an element. I do not know the largest number of repeated "subject" elements in the record set. I have tried the truncating option, because I think it might be simpler, but I can't get it to work.

thanks
-CAMc
 
Old October 18th, 2011, 01:55 PM
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

I think this is a significantly difficult problem even for an experienced coder. The difficulty, in fact, isn't in writing the code, it's in deciding what the code should do in all possible input situations. Part of that is defining exactly what is the range of inputs that it needs to handle.

Assuming I haven't misunderstood the requirement, here are some suggestions (using XSLT 2.0, which I would strongly recommend).

1. Determine the set of distinct column names

Code:
<xsl:variable name="names" select="distinct-values(/record-list/record/*/name())"/>
2. For each name in the list, replicate it to the maximum number of occurrences in any record
Code:
<xsl:variable name="columns" select="
  for $name in $names,
       $count in max(for $r in /record-list/record return count($r/*[name() = $name])),
       $i in 1 to $count 
  return $name"/>
3. Write the rows
Code:
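<xsl:for-each select="/record-list/record">
  <!-- Remember the current record: inside the inner loop the context item is an integer -->
  <xsl:variable name="this" select="."/>
  <xsl:for-each select="1 to count($columns)">
    <!-- Column number, then the element name that belongs in that column -->
    <xsl:variable name="i" select="."/>
    <xsl:variable name="name" select="$columns[$i]"/>
    <!-- Which occurrence of that name this column holds (1st subject, 2nd subject, ...) -->
    <xsl:variable name="index" select="count(subsequence($columns, 1, $i)[. = $name])"/>
    <xsl:text>"</xsl:text>
    <!-- Empty when the record has no such occurrence, which gives "" -->
    <xsl:value-of select="$this/*[name()=$name][$index]"/>
    <xsl:text>"</xsl:text>
    <!-- Line break after the last column, comma after every other one -->
    <xsl:value-of select="if (position()=last()) then '&#xa;' else ','"/>
  </xsl:for-each>
</xsl:for-each>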
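
Editor's sketch: the three steps above could be assembled into one XSLT 2.0 stylesheet roughly as follows. This is only an illustration under stated assumptions, not the code either poster ran. The assumptions are: the wanted elements and their output order are fixed by hand in $names (so "useless-element" never reaches the output), repeats are capped by a hypothetical max-repeats parameter (the truncating option from the earlier post), line breaks inside an element are turned into " ; ", and embedded double quotes are doubled, which is the usual CSV convention.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumption: wanted elements in output order; every other element is dropped -->
  <xsl:variable name="names" select="('title', 'subject', 'author')"/>

  <!-- Assumption: hypothetical cap on the number of columns per repeated element -->
  <xsl:param name="max-repeats" as="xs:integer" select="3"/>

  <xsl:variable name="records" select="/record-list/record"/>

  <!-- One entry per output column: at least one column per name,
       at most $max-repeats, otherwise the largest count found in any record -->
  <xsl:variable name="columns" select="
    for $name in $names,
        $count in min(($max-repeats,
                       max((1, for $r in $records return count($r/*[name() = $name]))))),
        $i in 1 to $count
    return $name"/>

  <xsl:template match="/">
    <!-- Header row: element names are used as headings here;
         map them to the real headings as needed -->
    <xsl:value-of separator=","
        select="for $c in $columns return concat('&quot;', $c, '&quot;')"/>
    <xsl:text>&#xa;</xsl:text>

    <xsl:for-each select="$records">
      <xsl:variable name="this" select="."/>
      <xsl:for-each select="1 to count($columns)">
        <xsl:variable name="i" select="."/>
        <xsl:variable name="name" select="$columns[$i]"/>
        <!-- Which occurrence of $name this column holds -->
        <xsl:variable name="index"
            select="count(subsequence($columns, 1, $i)[. = $name])"/>
        <!-- Split the text on line breaks and discard blank lines -->
        <xsl:variable name="lines"
            select="tokenize(string($this/*[name() = $name][$index]), '\n')[normalize-space(.)]"/>
        <!-- Trim each line and join the lines with ' ; ' -->
        <xsl:variable name="value"
            select="string-join(for $l in $lines return normalize-space($l), ' ; ')"/>
        <xsl:text>"</xsl:text>
        <!-- Double any embedded quotes so the field stays valid CSV -->
        <xsl:value-of select="replace($value, '&quot;', '&quot;&quot;')"/>
        <xsl:text>"</xsl:text>
        <xsl:value-of select="if (position() = last()) then '&#xa;' else ','"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the column order comes from $names rather than from the input, out-of-order elements are straightened out without any sorting, and a missing or empty element simply selects nothing and is written as "". To get the "grow to fit the largest record" behaviour instead of truncation, pass a large value for max-repeats. It needs an XSLT 2.0 processor such as Saxon.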
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
Old October 18th, 2011, 02:32 PM
Registered User
 
Join Date: Feb 2011
Posts: 5
Thanks: 0
Thanked 0 Times in 0 Posts

Thanks Michael,

I think I understand the structure of what you did. I will give it a try, play around with things, and get back to you all with a report.
-Much appreciated,
Christine
