Wrox Programmer Forums
October 10th, 2012, 08:49 AM
Extracting multiple elements from multiple files


I have a requirement to extract the contents, including attributes, of 3 separate elements in approximately 3000 SGML files.

To explain further, here is an example file:

<!DOCTYPE DMODULE PUBLIC "-//AECMA Change 6 Legacy//DTD Air Vehicle Engines Equipment Description 19981030//EN">
<dmtitle><techname> SUMMARY OF DATA AND LIST OF REFERENCES </techname>
<infoname>Fig 1 Sonar Type 2093 - Location of Major Units</infoname>
<issno issno="004" type="changed"></dmaddres>
<issdate year="2008" month="03" day="22">
<security class="2">
<rpc> </rpc>
<orig> </orig>
<authblk>Cat 1A Chap 1</authblk>
<tpbase>BR 8412(1A)</tpbase>
<firstver type="tabtop"></qa>
<rfu>Amendment Issue 2</rfu>
<remarks>Stage 2</remarks>
<figure id="f0011">
<title>Fig 1 Sonar Type 2093 - Location of Major Units</title>
<graphic boardno="00110001.tif"></figure>

What I need to extract is everything contained within the 'DMC' element near the top, including the contents of its child elements. I also need the 'id' attribute of the 'figure' element, so that I capture the f0011 value (in this instance), and the 'boardno' attribute of the 'graphic' element, so that I can get the .tif file names.

As I say, I need to do this for approximately 3000 files, which are currently in SGML (see example above). I'm assuming I would first have to convert these files to XML, and, perhaps naively, that this is straightforward enough.

The biggest problem is then the XSLT part. What I ultimately want is a list at the end, ideally in Excel although a plain list is fine, perhaps with 3 columns: DMC, figure id, and graphic boardno, populated with the data extracted from the 3000 or so files.

Is this possible?

I would be most grateful for any solutions or tips. I'm even willing to offload this task and pay a fee to have the work done, as it could save us considerable time compared with manually creating an Excel spreadsheet with this data. It is quite an urgent task, though.

October 10th, 2012, 10:13 AM

Conversion of SGML to XML can be done with tools like sx, available at http://www.jclark.com/sp/index.htm. Multiple files can be processed by a single stylesheet using XSLT 2.0, Saxon 9, and the collection function.
Output for Excel can be produced by transforming to Excel's XML format.
As for concrete code, it is not clear to me what you want to put into the result, in particular for the dmc element, as it has complex content containing other elements.
So you will need to elaborate on the result format.
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
October 10th, 2012, 10:31 AM

Hello Martin,

Thanks for your response.

What I need from the DMC as output is all of the text contained within its child elements, so that I end up with the DMC codes displayed in a list, with the other two pieces of information for that file displayed alongside.

Ultimately I want to see something like this for each of the 3000 or so files:

Ideally these three items of data would be displayed in columns: one for DMC, one for figure number, and one for the .tif file name.

Any further pointers would be great. This is well out of my comfort zone.
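
For readers following the thread, the approach suggested above (XSLT 2.0, Saxon 9, and the collection function) might be sketched roughly as follows. This is only a sketch under several assumptions that are not confirmed in the thread: the SGML files have already been converted to well-formed XML and sit in one folder (the `input-dir` default is a hypothetical path), element names come out in lower case (adjust the paths if your converter emits upper case), `graphic` sits somewhere inside `figure`, the DMC is wanted as the concatenated text of the `dmc` element, and none of the values contain commas. It writes a plain comma-separated list, which Excel can open, rather than Excel's XML format.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical folder holding the files after SGML-to-XML conversion. -->
  <xsl:param name="input-dir" select="'file:///C:/converted'"/>

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Named entry template, so no single source document is required. -->
  <xsl:template name="main">
    <!-- Header row for the three columns. -->
    <xsl:text>DMC,figure id,graphic boardno&#10;</xsl:text>

    <!-- "?select=*.xml" is Saxon's syntax for a directory collection. -->
    <xsl:for-each select="collection(concat($input-dir, '?select=*.xml'))">

      <!-- All text inside the first dmc element, whitespace collapsed.
           Assumption: the converted files contain a lower-case dmc element. -->
      <xsl:variable name="dmc" select="normalize-space(string((//dmc)[1]))"/>

      <!-- One output row per graphic found inside a figure. -->
      <xsl:for-each select="//figure//graphic">
        <xsl:value-of select="string-join(
            ($dmc,
             string(ancestor::figure[1]/@id),
             string(@boardno)), ',')"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon 9 on the command line, something like `java -jar saxon9he.jar -it:main -xsl:extract.xsl -o:figures.csv` should run it, where the jar, stylesheet, and output file names are placeholders. A file with no `figure` element produces no row in this sketch, so a check for missing files may be worth adding.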
