Grouping within arbitrary document XML
Hi
I have some document-style XML which has an arbitrary structure. The problem is to create a group element around specific clusters of elements.
For example, say I'd like to group dog, cat and bird elements inside a new pets element, but those elements aren't always in the same location in the XML, as in the following case (sorry, the indentation seems to get lost here):
<code>
<section>
<section>
<dog></dog>
<cat></cat>
<bird></bird>
<cow></cow>
</section>
<para>
<dog></dog>
<cat></cat>
<bird></bird>
<sub-para>
<cat></cat>
<bird></bird>
</sub-para>
</para>
</section>
</code>
How do I group such that a dog is always the start of a pets element and any following cat or bird elements (in any order or repetition) are included? Other elements (e.g. cow) are not to be included, and they stop the grouping. Critically, cat or bird elements NOT preceded by a dog do not form a group (as in sub-para/cat above).
Here is the desired output:
<code>
<section>
<section>
<pets>
<dog></dog>
<cat></cat>
<bird></bird>
</pets>
<cow></cow>
</section>
<para>
<pets>
<dog></dog>
<cat></cat>
<bird></bird>
</pets>
<sub-para>
<cat></cat>
<bird></bird>
</sub-para>
</para>
</section>
</code>
With a flatter, more database-style XML, this grouping might be easier. Three things complicate the problem (to my current knowledge of XSLT 2). First, the XML has to be processed as a document rather than just grabbing and reordering nodes from a 'data' file. Second, the arbitrary document structure (adjacent dog, cat and bird elements could occur anywhere in the document) means there's no obvious parent element from which to apply a for-each-group. Finally, the focus elements aren't expressed in a regular order.
If anyone can help, I'd appreciate it. One solution I've considered is doing two passes: tagging candidate elements in the first pass, then grouping consecutive elements that share the same tag in the second. But I'm not sure where to issue a group-adjacent instruction from in that case.
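For what it's worth, a single pass might be enough. The following is only a minimal sketch in XSLT 2.0, under some assumptions: the pets wrapper is unnamespaced, whitespace-only text nodes can be stripped, and there is no real text, comment or processing instruction sitting between a dog and its following cat or bird elements (any such node would end the group). The idea is to issue the for-each-group from a template that matches any element having a dog child, and to start a new group at every dog and at every node that is neither a cat nor a bird:
<code>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: whitespace-only text nodes are not significant.
       Without this, the whitespace between <dog/> and <cat/> would
       start a new group and cut the pets group short. -->
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- Identity template: copies everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with at least one dog child, at any depth -->
  <xsl:template match="*[dog]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- A group starts at each dog and at each node that is not a
           cat or bird, so a cat or bird joins whatever precedes it -->
      <xsl:for-each-group select="node()"
          group-starting-with="dog | node()[not(self::cat or self::bird)]">
        <xsl:choose>
          <!-- Group headed by a dog: dog plus its trailing cats/birds -->
          <xsl:when test="self::dog">
            <pets>
              <xsl:apply-templates select="current-group()"/>
            </pets>
          </xsl:when>
          <!-- Any other group (e.g. cow, or cats/birds with no dog
               before them) is processed unwrapped -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
</code>
Because the grouping template is keyed on "has a dog child" rather than on a fixed parent name, it should apply wherever the cluster turns up, and elements such as sub-para (cat and bird but no dog) fall through to the identity template untouched. This uses group-starting-with instead of the two-pass group-adjacent idea, so it is a different technique from the one described above.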