Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old August 25th, 2009, 04:59 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
Tokenize

Hello,

I want to use the tokenize() function and create an element for each string in the resulting sequence. How can I get the substring that matched the regular expression, so I can use it as an attribute of that element?

Regards,

John
 
Old August 25th, 2009, 05:02 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

The tokenize() function doesn't tell you anything about what separators were found, or in what way they matched the regular expression. If you need that information, you need to use xsl:analyze-string.
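
As an illustration of that suggestion (not part of the original reply), here is a minimal sketch assuming XSLT 2.0 and a made-up input string. The element names tokens, token and separator, the sample text, and the template name main are all hypothetical. Inside xsl:matching-substring, regex-group() exposes what the separator actually was, which tokenize() discards.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point; invoke it as the initial template. -->
  <xsl:template name="main">
    <!-- Made-up sample text with two different separators. -->
    <xsl:variable name="text" select="'a, b; c'"/>
    <tokens>
      <xsl:analyze-string select="$text" regex="\s*([,;])\s*">
        <!-- Runs once per separator; group 1 is the comma or semicolon. -->
        <xsl:matching-substring>
          <separator char="{regex-group(1)}"/>
        </xsl:matching-substring>
        <!-- Runs once per stretch of text between separators. -->
        <xsl:non-matching-substring>
          <token><xsl:value-of select="."/></token>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </tokens>
  </xsl:template>
</xsl:stylesheet>
```

With Saxon, a named initial template like this can be started with the -it:main command-line option. The result interleaves token and separator elements in document order.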
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
Old August 25th, 2009, 05:16 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

But xsl:analyze-string won't do what I'm after, because all the matches of the regex go into the xsl:matching-substring part. I can create the elements in the non-matching part, but the attribute values I need are only available in the matching part.

So what do I do?
 
Old August 25th, 2009, 05:27 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

> So what do I do?

Start by explaining the requirement. What's the input, what's the desired output?
 
Old August 25th, 2009, 05:38 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

This is the input:

http://www.sec.gov/Archives/edgar/da...66204e10vk.htm

You may have seen this before.

My boss told me to preprocess this document by replacing every < with [[ and every > with ]], so that the result is a single wrapper root XML element whose content is essentially all text.

Then the task is to split the document up into sections, one for each item.

Starting with:
<root>
<preamble>text here</preamble>
<tableofcontents>table of contents data here</tableofcontents>
<section label="item1">text here of item one</section>
<section label="item 1A">etc</section>
....
</root>

So I was going to tokenize the text using the "item number" as the regex, then loop through the resulting sequence and build the section elements, but then I can't get the values for the label attributes.

Is there a better way to do this?

Regards,

John.
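
For the narrower problem described above (pairing each matched heading with the text that follows it), one possible sketch in XSLT 2.0 uses two passes: xsl:analyze-string turns the text into a flat run of marker elements, then xsl:for-each-group with group-starting-with attaches each stretch of text to the heading before it. This is not from the thread. The sample text, the heading regex, the intermediate element names label and body, and the template name main are all assumptions, and on the real filing the regex would also match entries in the table of contents.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample text standing in for the flattened filing. -->
  <xsl:param name="text"
      select="'Intro text. Item 1. Business text. Item 1A. Risk text.'"/>

  <xsl:template name="main">
    <!-- Pass 1: a temporary tree holding a flat run of label and body elements. -->
    <xsl:variable name="parts">
      <xsl:analyze-string select="$text" regex="Item\s+([0-9]+[A-Z]?)\.">
        <xsl:matching-substring>
          <label><xsl:value-of select="regex-group(1)"/></label>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <body><xsl:value-of select="."/></body>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <root>
      <!-- Pass 2: every label starts a new group; inside the loop the
           context item is the first element of the current group. -->
      <xsl:for-each-group select="$parts/*" group-starting-with="label">
        <xsl:choose>
          <xsl:when test="self::label">
            <section label="item {.}">
              <xsl:value-of select="normalize-space(
                  string-join(current-group()[self::body], ''))"/>
            </section>
          </xsl:when>
          <!-- Text that precedes the first heading. -->
          <xsl:otherwise>
            <preamble>
              <xsl:value-of select="normalize-space(
                  string-join(current-group(), ''))"/>
            </preamble>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

On the sample text this yields a preamble followed by one section per heading, with the label taken from the captured group. It only shows the mechanics of keeping the matched value; it does not separate a table of contents.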
 
Old August 25th, 2009, 05:51 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

There's a mythical character in XML folklore known as the Desperate Perl Hacker or DPH. He's known for attempting amazing feats of transformation using regular expressions as his only weapon. I'm not sure I've ever met one before (I thought they were mythical), but this comes close.

I don't think this is the right design approach. You want to create a tree representation of the structure, and then do a transformation. Certainly if you're going down the pure regex route then you're using the wrong language - you'd be much better off with Perl.
 
Old August 25th, 2009, 06:00 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

You're right on the money. He is a Perl programmer. LOL.

So how would I go about doing the tree structure transformation?
 
Old August 25th, 2009, 07:34 AM
Friend of Wrox
 
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts

Tree structure transformation means parsing the HTML document with a parser that builds a tree suitable as the input tree for an XSLT transformation. With XSLT 2.0 you can use the HTML parser that David Carlisle implemented in pure XSLT 2.0, or, if you use the Java version of Saxon, you can plug in the TagSoup parser from http://home.ccil.org/~cowan/XML/tagsoup/. Alternatively, you can use the HTML Tidy tool to convert the HTML to XHTML and then feed that XHTML document to any XSLT processor.
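
Once the filing is available as an XHTML tree (for instance after HTML Tidy), the sectioning can be done on elements rather than on raw text. The following XSLT 2.0 stylesheet is only a sketch under strong assumptions that the thread does not confirm: that the item headings are direct children of body, that each heading's text starts with "Item", a number and a period, and that plain text output per section is wanted. The regex is hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <root>
      <!-- Assumption: headings are children of body whose text starts
           with something like "Item 1." or "Item 1A." -->
      <xsl:for-each-group select="h:html/h:body/*"
          group-starting-with="*[matches(normalize-space(.),
                                 '^Item\s+[0-9]+[A-Z]?\.', 'i')]">
        <xsl:choose>
          <!-- The group begins with a heading: use its text as the label. -->
          <xsl:when test="matches(normalize-space(.),
                          '^Item\s+[0-9]+[A-Z]?\.', 'i')">
            <section label="{normalize-space(.)}">
              <xsl:value-of select="normalize-space(
                  string-join(current-group()[position() gt 1], ' '))"/>
            </section>
          </xsl:when>
          <!-- Everything before the first heading. -->
          <xsl:otherwise>
            <preamble>
              <xsl:value-of select="normalize-space(
                  string-join(current-group(), ' '))"/>
            </preamble>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping works on parsed elements, the markup never has to be escaped into [[ and ]] first, and the heading text is available directly as a node. A real filing would probably need a more specific pattern to tell headings apart from table-of-contents entries.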
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03