I'm stuck on some complications in my XML-to-CSV conversion. I have read the existing threads on XML-to-CSV conversion, both on this site and elsewhere on the web, but I still need help.
I am brand new to actual programming in any language. However, I know XML and CSS, so I am the logical person (in fact, the only person available) to solve this problem at my non-profit. I need to take some very messy XML and convert it to a very neat CSV file for upload to a website database. My current code is so far off the mark that I won't bother posting it: it runs and produces output, but it only does about a quarter of what I need.
I don't need a finished solution, but I do need help understanding the process I should follow to solve this in XSLT. I won't ask anyone to write the code for me; just tell me which elements and template structure I need. I would also love it if the community could explain the logic behind the process, so that I can modify it as needed.
My XML has records whose elements appear in any order and in varying numbers:
Code:
<record-list>
<record>
<title>Title One</title>
<author>Author One</author>
<subject>Subject One A
Subject One B
Subject One C</subject>
<subject>Subject Two</subject>
<subject>Subject Three</subject>
<subject>Subject Four</subject>
</record>
<record>
<subject>Subject Five</subject>
<title>Title Two</title>
<useless-element>Extra Stuff One</useless-element>
</record><record>
<title>Title Three</title>
<subject>Subject Six</subject>
<author/>
</record>
</record-list>
So I have varying numbers of repeated elements, some missing elements, some empty elements, elements out of order, and some elements with extra line breaks.
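Before deciding on column counts, it can help to take an inventory of how irregular the records really are. The sketch below uses Python's standard `xml.etree.ElementTree` rather than XSLT, purely to show the logic; the compact `SAMPLE` string is the question's data with every tag closed so that it parses. It counts how often each child element occurs in each `<record>`, and the largest `subject` count is the number of Subject columns a "match the widest record" rule would need.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The question's sample, written compactly with every tag closed so it parses.
SAMPLE = (
    "<record-list>"
    "<record><title>Title One</title><author>Author One</author>"
    "<subject>Subject One A\nSubject One B\nSubject One C</subject>"
    "<subject>Subject Two</subject><subject>Subject Three</subject>"
    "<subject>Subject Four</subject></record>"
    "<record><subject>Subject Five</subject><title>Title Two</title>"
    "<useless-element>Extra Stuff One</useless-element></record>"
    "<record><title>Title Three</title><subject>Subject Six</subject>"
    "<author/></record>"
    "</record-list>"
)

def inventory(xml_text):
    """Return one Counter per <record>, mapping child tag -> occurrences."""
    root = ET.fromstring(xml_text)
    return [Counter(child.tag for child in record) for record in root.iter("record")]

counts = inventory(SAMPLE)
for number, tally in enumerate(counts, start=1):
    print(number, dict(tally))

# A Counter returns 0 for absent keys, so a record without <subject> counts as 0.
print("max subjects:", max(tally["subject"] for tally in counts))  # max subjects: 4
```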
I need a CSV file that reads as below, or with a different number of Subject columns (see the requirements below):
Code:
"Title","Subject","Subject","Subject","Author"
"Title One","Subject One A ; Subject One B ; Subject One C","Subject Two","Subject Three","Author One"
"Title Two","Subject Five","","",""
"Title Three","Subject Six","","",""
Requirements for the final output:
-The number of columns for any repeated element must either match the record with the most repeats of that element, or the program must chop off any repeats past a fixed number.
-Each record must end with a line break, and no other line breaks may appear in the file (line breaks serve only as record delimiters).
-The elements must be in the same order for every record.
-Each element's text needs quotes around it (to handle embedded commas).
-Missing or empty elements need an empty pair of quotes ("") in their column.
-Extra elements must not be passed through to the output.
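Several of these requirements (one line per record, every field quoted, empty fields written as `""`) are exactly what a CSV library's "quote everything" mode produces. As an illustration in Python's standard `csv` module (not XSLT; the rows are typed in by hand here), `QUOTE_ALL` quotes every field, and an embedded double quote is doubled, which is the usual CSV convention. `lineterminator="\n"` is an assumption: the module's default is `\r\n`, and which one the website's importer expects is not stated.

```python
import csv
import io

def write_rows(rows):
    """Serialize rows as CSV text: every field quoted, one line per record."""
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, so commas inside a field
    # are safe and an empty field comes out as "". Embedded double quotes
    # are doubled ("") by the default doublequote=True setting.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

text = write_rows([
    ["Title", "Subject", "Subject", "Subject", "Author"],
    ["Title Two", "Subject Five", "", "", ""],
    ['A "quoted", comma title', "", "", "", ""],
])
print(text, end="")
```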
What I have done:
I have figured out how to remove the extra line breaks inside elements using the replace() function. I can also get the quotes, commas, and line breaks into the output using xsl:text elements and xsl:strip-space.
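The line-break cleanup amounts to: split the element's text into lines, trim each one, drop blank lines, and join the rest with the ` ; ` separator used in the desired output. Below is a Python sketch of that one step (the function name `flatten` is hypothetical); in XSLT 2.0, the `replace()` function mentioned above fills the same role.

```python
import re

def flatten(text, sep=" ; "):
    """Collapse a multi-line element value into a single line.

    Each non-blank line is trimmed and the pieces are joined with `sep`,
    so no line break can leak into the CSV. Runs of spaces or tabs inside
    a line are collapsed to one space.
    """
    if text is None:  # empty elements such as <author/> have no text
        return ""
    parts = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return sep.join(p for p in parts if p)

print(flatten("Subject One A\nSubject One B\nSubject One C"))
# Subject One A ; Subject One B ; Subject One C
```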
However, I don't know how to put the elements into a consistent order, how to handle the repeated elements, or how to pass through only some elements while still using the <record> element as the cue for the line break.
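The usual way out of the ordering and filtering problems is to let the output columns drive the work instead of the input: for each record, look up `title`, then the first few `subject` elements, then `author`, in that fixed order. Anything not asked for, such as `useless-element`, is simply never visited, and a missing element becomes an empty string. In XSLT terms this corresponds to a template matching `record` that selects `title`, `subject[1]`, `subject[2]`, `subject[3]` and `author` explicitly (for example with `xsl:value-of`) rather than calling `xsl:apply-templates` on all children in document order, and that ends by writing the newline. The Python sketch below shows the same logic; `MAX_SUBJECTS = 3` is an assumption read off the three Subject columns in the desired output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_SUBJECTS = 3  # hypothetical cap, read off the desired output's Subject columns

def text_of(elem):
    """Element text on one line; missing (None) or empty (<author/>) gives ""."""
    if elem is None or elem.text is None:
        return ""
    lines = (line.strip() for line in elem.text.splitlines())
    return " ; ".join(line for line in lines if line)

def record_to_row(record, max_subjects=MAX_SUBJECTS):
    """Build one row in a fixed column order: Title, Subject x N, Author.

    The row is driven by the wanted columns, not by the record's children,
    so element order in the XML does not matter and unlisted elements
    (such as <useless-element>) are never visited.
    """
    subjects = [text_of(s) for s in record.findall("subject")]
    subjects = subjects[:max_subjects]                 # chop extra repeats
    subjects += [""] * (max_subjects - len(subjects))  # pad short records
    return [text_of(record.find("title")), *subjects, text_of(record.find("author"))]

record = ET.fromstring(
    "<record><subject>Subject Five</subject><title>Title Two</title>"
    "<useless-element>Extra Stuff One</useless-element></record>"
)
print(record_to_row(record))  # ['Title Two', 'Subject Five', '', '', '']
```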
Right now I just need a solution that works, even if it requires lots of manual manipulation or multiple stylesheets. I can even do a find-and-replace in a text editor, as long as the output is good. Please help with an XSLT solution; I don't know any other suitable programming language (college MATLAB many years ago is not helping).
I think I need to run two transforms. The XSLT Cookbook uses two sequential transforms for a similar problem, but that solution is so generalized that I can't understand it, and if I can't figure out how it works, I can't modify it for my needs. Sorry, but without a programming background I find the explanations on this site, on the web, and in the book challenging at best. I also think my problem has some features that differ from the ones already asked about on this forum.
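The two transforms can be thought of as two passes over the same data: first measure (how many Subject columns are needed), then emit (a header and one fixed-width row per record). The end-to-end Python sketch below (standard library only; `convert`, `one_line` and the `cap` parameter are hypothetical names) runs both passes over the question's sample, with the closing `>` of `</title>` present so that it parses. In this sketch, `cap=3` chops the fourth subject and reproduces the desired CSV from the question; with no cap, the widest record decides. In XSLT 2.0 the measuring pass may not need a separate stylesheet at all, since a global variable such as `max(//record/count(subject))` can compute the width up front.

```python
import csv
import io
import xml.etree.ElementTree as ET

SAMPLE = """<record-list>
<record>
<title>Title One</title>
<author>Author One</author>
<subject>Subject One A
Subject One B
Subject One C</subject>
<subject>Subject Two</subject>
<subject>Subject Three</subject>
<subject>Subject Four</subject>
</record>
<record>
<subject>Subject Five</subject>
<title>Title Two</title>
<useless-element>Extra Stuff One</useless-element>
</record><record>
<title>Title Three</title>
<subject>Subject Six</subject>
<author/>
</record>
</record-list>"""

def one_line(elem):
    """Element text with internal line breaks joined by ' ; '; "" if missing or empty."""
    if elem is None or elem.text is None:
        return ""
    return " ; ".join(line.strip() for line in elem.text.splitlines() if line.strip())

def convert(xml_text, cap=None):
    records = list(ET.fromstring(xml_text).iter("record"))

    # Pass 1 (measure): the widest record sets the number of Subject columns,
    # optionally limited to `cap`.
    width = max((len(r.findall("subject")) for r in records), default=0)
    if cap is not None:
        width = min(width, cap)

    # Pass 2 (emit): a header, then one fixed-width, fully quoted row per record.
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Title", *(["Subject"] * width), "Author"])
    for r in records:
        subjects = [one_line(s) for s in r.findall("subject")][:width]
        subjects += [""] * (width - len(subjects))
        writer.writerow([one_line(r.find("title")), *subjects, one_line(r.find("author"))])
    return buf.getvalue()

print(convert(SAMPLE, cap=3), end="")
```

Because Python can read the XML and write the CSV directly, a script along these lines is an alternative to XSLT rather than a translation of it; the measure-then-emit structure carries over either way.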
Any help, whether non-generalized code or just a suggested step-by-step procedure for multiple passes through my processor, would be wonderful. I have been struggling with this for over a week and have made very little progress.
Thanks
CAMc