Subject: Matching between two sibling nodes
Posted By: AForgue Post Date: 11/25/2003 3:23:58 PM
I am in the process of creating a transformation for OpenOffice.org's native XML files and have run into a hitch. Because of the nature of the OOo documents, I cannot have nested paragraph styles within other paragraph styles. What I need is a template that will match everything between the "BeginIntroduction" and "EndIntroduction" nodes in the XML below, and wrap it in <Introduction></Introduction> tags.

<office:body>
   <text:p text:style-name="Standard">
      <text:span text:style-name="ChapterNumber">1</text:span>
   </text:p>
   <text:p text:style-name="Standard">
      <text:span text:style-name="ChapterTitle">Title</text:span>
   </text:p>

   <text:p text:style-name="BeginIntroduction">Begin Introduction</text:p>
   <text:p text:style-name="InlineHeading">inline heading</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="EndIntroduction">End Introduction</text:p>

   <text:p text:style-name="InlineHeading">inline heading</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
   <text:p text:style-name="Normal">text</text:p>
</office:body>


The ideal transformation should look like this:

<root>
   <ChapterNumber>1</ChapterNumber>
   <ChapterTitle>Title</ChapterTitle>
   <Introduction>
      <InlineHeading>inline heading</InlineHeading>
      <Paragraph>text</Paragraph>
      <Paragraph>text</Paragraph>
      <Paragraph>text</Paragraph>
      <Paragraph>text</Paragraph>
   </Introduction>
   <InlineHeading>inline heading</InlineHeading>
   <Paragraph>text</Paragraph>
   <Paragraph>text</Paragraph>
   <Paragraph>text</Paragraph>
   <Paragraph>text</Paragraph>
</root>


Is this going to be possible, or do I need to consider some different options? I should also note that I do have control over the "BeginIntroduction" and "EndIntroduction" tags: I can change their names, but they have to be siblings of the paragraphs.

So, for example, instead of:
<text:p text:style-name="BeginIntroduction">Begin Introduction</text:p>
...
<text:p text:style-name="EndIntroduction">End Introduction</text:p>


I could make it:
<text:p text:style-name="Introduction">Introduction</text:p>
...
<text:p text:style-name="Introduction">Introduction</text:p>


That is the extent of the control I have.
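With a single "Introduction" style at both ends, one possible test is parity: a paragraph is inside the introduction when it is not itself a marker and an odd number of marker paragraphs come before it. In XPath 1.0 that might be written as text:p[not(@text:style-name = 'Introduction') and count(preceding-sibling::text:p[@text:style-name = 'Introduction']) mod 2 = 1]. The Python sketch below only models that rule on a hand-written list of style names (the list and the inside_flags name are illustrative assumptions); it is not an XSLT processor.

```python
# Model of the XPath test, applied to sibling paragraphs in document order:
#   text:p[not(@text:style-name = 'Introduction')
#          and count(preceding-sibling::text:p[@text:style-name = 'Introduction']) mod 2 = 1]
MARKER = "Introduction"  # assumed: the same style name marks both ends


def inside_flags(styles):
    """Return one bool per paragraph: True if it lies between a pair of markers."""
    flags = []
    markers_seen = 0  # number of marker paragraphs before the current one
    for style in styles:
        if style == MARKER:
            flags.append(False)  # the markers themselves are never wrapped
            markers_seen += 1
        else:
            flags.append(markers_seen % 2 == 1)  # odd count => inside a pair
    return flags


styles = ["Standard", "Standard",
          "Introduction", "InlineHeading", "Normal", "Normal", "Introduction",
          "InlineHeading", "Normal"]
print(inside_flags(styles))
# [False, False, False, True, True, True, False, False, False]
```

Because the count simply alternates, the same rule also copes with several introductions under the same parent, as long as every opening marker has a closing one.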

Thanks in advance for any advice!!!

Aaron


Reply By: armmarti Reply Date: 11/26/2003 1:57:36 AM
I've added namespace declarations to the input (the URIs are placeholders; they must match the ones declared in the stylesheet):

<office:body xmlns:office="uri-for-office-here" xmlns:text="uri-for-text-here">


So, the stylesheet is:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:office="uri-for-office-here" xmlns:text="uri-for-text-here"
  exclude-result-prefixes="office text">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
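    <xsl:template match="/">
        <root>
            <!-- 1. The text:span elements inside the paragraphs (here ChapterNumber and ChapterTitle) -->
            <xsl:apply-templates select="office:body/text:p/text:span"/>
            
            <!-- 2. Paragraphs with a BeginIntroduction marker somewhere before them
                 and an EndIntroduction marker somewhere after them -->
            <Introduction>
                <xsl:apply-templates select="office:body/text:p[preceding-sibling::text:p[@text:style-name='BeginIntroduction'] and following-sibling::text:p[@text:style-name='EndIntroduction']]"/>
            </Introduction>
            
            <!-- 3. Paragraphs that come after the EndIntroduction marker -->
            <xsl:apply-templates select="office:body/text:p[preceding-sibling::text:p[@text:style-name='EndIntroduction']]"/>
        </root>
    </xsl:template>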
    
    <xsl:template match="office:body/text:p/text:span">
        <xsl:element name="{@text:style-name}">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>
    
    <xsl:template match="office:body/text:p[@text:style-name='Normal']">
        <Paragraph>
            <xsl:value-of select="."/>
        </Paragraph>
    </xsl:template>
    
    <xsl:template match="office:body/text:p[@text:style-name='InlineHeading']">
         <InlineHeading>
             <xsl:value-of select="."/>
         </InlineHeading>
    </xsl:template>
</xsl:stylesheet>


This stylesheet relies on the positional structure of the source XML document in some places (it outputs the chapter spans first, then the introduction, then whatever follows EndIntroduction), so the code is somewhat awkward :)
You can change some points in the stylesheet if you want; you'll find the proper places, I'm sure ;)
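A way to avoid relying on where the introduction sits is to visit the paragraphs in document order, open an Introduction wrapper at the BeginIntroduction marker and close it at EndIntroduction. The sketch below does that in Python with the standard xml.etree.ElementTree module rather than XSLT, so it illustrates the idea and is not a replacement for the stylesheet. It assumes the placeholder namespace URIs used above, a shortened copy of the sample input, and a hypothetical NAMES table mapping paragraph styles to output element names.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, as in the post above; use your document's real one.
TEXT = "uri-for-text-here"
STYLE = "{%s}style-name" % TEXT

SOURCE = """<office:body xmlns:office="uri-for-office-here" xmlns:text="uri-for-text-here">
  <text:p text:style-name="Standard"><text:span text:style-name="ChapterNumber">1</text:span></text:p>
  <text:p text:style-name="Standard"><text:span text:style-name="ChapterTitle">Title</text:span></text:p>
  <text:p text:style-name="BeginIntroduction">Begin Introduction</text:p>
  <text:p text:style-name="InlineHeading">inline heading</text:p>
  <text:p text:style-name="Normal">text</text:p>
  <text:p text:style-name="EndIntroduction">End Introduction</text:p>
  <text:p text:style-name="Normal">more text</text:p>
</office:body>"""

# Hypothetical mapping from paragraph style to output element name.
NAMES = {"InlineHeading": "InlineHeading", "Normal": "Paragraph"}


def transform(body):
    root = ET.Element("root")
    target = root  # element that converted paragraphs are currently appended to
    for p in body.findall("{%s}p" % TEXT):
        style = p.get(STYLE)
        if style == "BeginIntroduction":
            target = ET.SubElement(root, "Introduction")  # open the wrapper
        elif style == "EndIntroduction":
            target = root  # close the wrapper; the marker itself is not copied
        elif style == "Standard":
            # Each span becomes an element named after its own style.
            for span in p.findall("{%s}span" % TEXT):
                ET.SubElement(target, span.get(STYLE)).text = span.text
        elif style in NAMES:
            ET.SubElement(target, NAMES[style]).text = "".join(p.itertext())
    return root


result = transform(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

Since the wrapper is opened and closed as each marker is reached, the introduction can appear anywhere in the body (or more than once). XSLT variables cannot be reassigned, so in XSLT 1.0 the same grouping is typically done with an xsl:key or with recursion over following siblings instead.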

Regards,
Armen
Reply By: AForgue Reply Date: 11/26/2003 9:02:39 AM
Armen, this worked very nicely! Thanks for the help.

Just for education's sake: you mentioned that this relies on the position of the XML structure. I can see what you mean; if the Introduction were anywhere else in the document, this would produce strange results.

Going off of your example, I am wondering if it would be possible to match anything where any preceding-sibling != 'BeginIntroduction' AND any following-sibling != 'EndIntroduction'. If I am thinking about this correctly, this should match everything that does not appear between 'BeginIntro' and 'EndIntro'. After calling that, the next step would be to match everything between them.

So for example:

<xsl:template match="office:body">
   <root>
      <xsl:apply-templates select="*[preceding-sibling::node()[@text:style-name != 'BeginIntroduction'] and following-sibling::node()[@text:style-name != 'EndIntroduction']]"/>
   </root>
</xsl:template>


Although this sounds great in my mind, it is producing the wrong result set. I am wondering if my select statement is wrong in some way.

-Aaron

Reply By: armmarti Reply Date: 11/26/2003 9:45:15 AM
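Your XPath expression produces the wrong result for two reasons. First, not(A and B) IS EQUIVALENT TO not(A) or not(B), so "not between the markers" needs an or where you wrote an and. Second, a predicate such as preceding-sibling::node()[@text:style-name != 'BeginIntroduction'] is true when ANY preceding sibling has a style other than BeginIntroduction, which holds for almost every paragraph; it does not mean the same as not(preceding-sibling::node()[@text:style-name = 'BeginIntroduction']).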


So just negate:


<xsl:apply-templates select="office:body/text:p[not(preceding-sibling::text:p[@text:style-name='BeginIntroduction'] and following-sibling::text:p[@text:style-name='EndIntroduction'])]"/>
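To see what each select actually picks on the sample, the Python sketch below models both expressions over the list of paragraph styles: every sibling-axis predicate becomes an any() (or an "in" test), since a node-set used as a boolean is true when it is non-empty. The helper names wrong_select and negated_select are made up for illustration, and only element siblings are modelled, since text nodes have no style attribute.

```python
# Styles of the text:p children of office:body, in document order (from the sample).
styles = ["Standard", "Standard", "BeginIntroduction", "InlineHeading",
          "Normal", "Normal", "Normal", "Normal", "EndIntroduction",
          "InlineHeading", "Normal", "Normal", "Normal", "Normal"]


def wrong_select(i):
    # preceding-sibling::node()[@text:style-name != 'BeginIntroduction'] and
    # following-sibling::node()[@text:style-name != 'EndIntroduction']
    before, after = styles[:i], styles[i + 1:]
    return (any(s != "BeginIntroduction" for s in before)
            and any(s != "EndIntroduction" for s in after))


def negated_select(i):
    # not(preceding-sibling::text:p[@text:style-name='BeginIntroduction'] and
    #     following-sibling::text:p[@text:style-name='EndIntroduction'])
    before, after = styles[:i], styles[i + 1:]
    return not ("BeginIntroduction" in before and "EndIntroduction" in after)


print([styles[i] for i in range(len(styles)) if wrong_select(i)])
print([styles[i] for i in range(len(styles)) if negated_select(i)])
```

The first list keeps 12 of the 14 paragraphs, including the whole introduction. The second list drops the introduction but still contains the two marker paragraphs, because neither lies strictly between the markers. With no template for them, the XSLT built-in rules would copy their text ("Begin Introduction", "End Introduction") into the output, so an empty template such as <xsl:template match="text:p[@text:style-name='BeginIntroduction' or @text:style-name='EndIntroduction']"/> may be needed to suppress them.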




Regards,
Armen
Reply By: AForgue Reply Date: 11/26/2003 9:52:37 AM
Heh, I read that last post just as I was getting ready to post my answer, which turned out to be exactly what you said. I just negated the whole thing.

<xsl:apply-templates select="node()[not(preceding-sibling::node()[@text:style-name = 'BeginIntroduction'] and following-sibling::node()[@text:style-name = 'EndIntroduction'])]"/>


Thanks again for your help. Good to know that there are knowledgeable people out there willing to help out the not-so-knowledgeable!

-Aaron

Reply By: armmarti Reply Date: 11/26/2003 10:05:44 AM
It has no connection with XSLT or programming; it's just a mathematical truth! ;)
