If you can ensure the duplicate nodes are always consecutive, then the simplest way to do this is to build upon the XSLT identity transform and just add an extra template to strip out the duplicates, like so
<xsl:template
match="*[not(*)]
[name() = name(preceding-sibling::*[1])]
[@value = preceding-sibling::*[1]/@value]" />
This matches any leaf element (one with no child elements), and ignores it if it has the same name and value as the immediately preceding sibling element. There is no need to hard-code an element name anywhere in this case.
Here is the full XSLT in this case
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(*)][name() = name(preceding-sibling::*[1])][@value = preceding-sibling::*[1]/@value]" />
</xsl:stylesheet>
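As a rough illustration, assume an input where the duplicates are adjacent (this fragment is adapted from the sample posted in the comments on this answer):

```xml
<Samsung>
  <name value="galaxy"/>
  <name value="galaxy"/>
  <name value="galaxys"/>
  <id value="123"/>
  <id value="123"/>
  <name2 value="galaxy"/>
</Samsung>
```

The stylesheet above should then produce something like the following. Exact whitespace and indentation depend on the processor:

```xml
<Samsung>
  <name value="galaxy"/>
  <name value="galaxys"/>
  <id value="123"/>
  <name2 value="galaxy"/>
</Samsung>
```

The second `name` and the second `id` are dropped because each repeats the name and value of the element directly before it. `name2` is kept because its name differs, even though its value matches an earlier element.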
However, this would fail if your XML looked like this, where the duplicate nodes are not consecutive
<Samsung>
<name value="galaxy"/>
<name value="galaxys"/>
<id value="123"/>
<name value="galaxy"/>
<id value="123"/>
<name2 value="galaxy"/>
</Samsung>
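You could fix this by changing the template to check all preceding siblings for one with the same name and value. (This uses current() inside a match pattern, which is allowed in XSLT 2.0 but is an error in XSLT 1.0.)
<xsl:template match="*[not(*)]
[preceding-sibling::*[name() = name(current())][@value = current()/@value]]" />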
However, this starts to become inefficient with large numbers of elements. If you have hundreds of elements, then each preceding-sibling check will repeatedly involve checking hundreds of elements (i.e. the 100th element has to check 99 preceding ones, the 101st element checks 100, etc.).
A more efficient method (in XSLT 1.0) is to use a technique called Muenchian Grouping. It is certainly something worth learning about if you use XSLT a lot.
First you define a key to 'group' your elements. In this case, you are looking for distinct elements defined by their parent, element name, and value
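To put a rough number on that, assume the worst case where the processor scans every preceding sibling for each of n sibling elements, with no optimization. The total number of comparisons is then:

```latex
\sum_{k=1}^{n} (k - 1) = \frac{n(n-1)}{2},
\qquad \text{so for } n = 100:\quad \frac{100 \cdot 99}{2} = 4950 .
```

The work grows quadratically with the number of siblings. Real processors may do better than this estimate, so treat it as an upper-bound sketch rather than a measurement.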
<xsl:key name="duplicate" match="*[not(*)]" use="concat(generate-id(..), '|', name(), '|', @value)" />
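To see what the key actually groups, a small diagnostic stylesheet can help. The following is only a sketch: it reuses the same key definition and prints, for every leaf element, its lookup value and how many elements share it. The variable name `lookup` and the text output format are illustrative choices, and the `generate-id()` part of each value will differ between processors.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Same key as above: parent identity, element name, and value attribute -->
  <xsl:key name="duplicate" match="*[not(*)]"
           use="concat(generate-id(..), '|', name(), '|', @value)" />

  <xsl:template match="/">
    <!-- Visit every element that has no child elements -->
    <xsl:for-each select="//*[not(*)]">
      <!-- 'lookup' is a hypothetical helper variable holding this element's key value -->
      <xsl:variable name="lookup"
                    select="concat(generate-id(..), '|', name(), '|', @value)" />
      <xsl:value-of select="$lookup" />
      <xsl:text> : </xsl:text>
      <!-- Number of elements in the same group; more than 1 means duplicates exist -->
      <xsl:value-of select="count(key('duplicate', $lookup))" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Any line reporting a count greater than 1 corresponds to a group from which all but the first element would be removed.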
Then to ignore the duplicates, you match any element that doesn't occur in the first position in the key for the given 'lookup' value
<xsl:template match="*[not(*)]
[generate-id() !=
generate-id(key('duplicate', concat(generate-id(..), '|', name(), '|', @value))[1])]" />
Here is the full XSLT in this case
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:key name="duplicate" match="*[not(*)]" use="concat(generate-id(..), '|', name(), '|', @value)" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(*)][generate-id() != generate-id(key('duplicate', concat(generate-id(..), '|', name(), '|', @value))[1])]" />
</xsl:stylesheet>
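For the non-consecutive `Samsung` sample shown earlier, this key-based stylesheet should produce something like the following. Exact whitespace depends on the processor, and blank lines may remain where elements were removed:

```xml
<Samsung>
  <name value="galaxy"/>
  <name value="galaxys"/>
  <id value="123"/>
  <name2 value="galaxy"/>
</Samsung>
```

The later `name` with value `galaxy` and the later `id` with value `123` are dropped because neither is the first element in its key group. `name2` forms its own group and is kept.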
edited Apr 23 '13 at 7:24
answered Apr 19 '13 at 13:04
Tim C
Very good answer. I thought about something like this, but was too lazy. But wouldn't it be even better to use generate-id(..) instead of name(..) as the key? Then it would also work with multiple parent nodes of the same name. - hr_117 Apr 19 '13 at 15:40
That is a good point. I've amended my answer to do just that. Thanks! - Tim C Apr 19 '13 at 16:10
I tried the code, but it's deleting every element not just duplicates. Am I doing anything wrong? - knix2 Apr 22 '13 at 7:20
The original XML in your question is not actually well-formed, which suggests you are using different XML to what you have shown us. If you could show us a more accurate sample, one that is not getting the results you want, that may help. Thanks! - Tim C Apr 22 '13 at 7:42
@Tim C This is the sample XML on which I tried your code: <?xml version="1.0" encoding="UTF-8"?> <check> <val> <Samsung> <name value="galaxy" /> <name value="galaxy" /> <name value="galaxys" /> <id value="123" /> <id value="123" /> <name2 value="galaxy" /> </Samsung> <htc> <name value="galaxy" /> <name value="galaxy" /> <name value="galaxys" /> <id value="123" /> <id value="123" /> <name2 value="galaxy" /> </htc> </val> </check> - knix2 Apr 22 '13 at 9:25