Transformation to plain text deletes part of text

ViZart · January 16th, 2012, 04:10 AM

Hi

I don't know if this is the right forum for my problem. If not, I'm sorry. But here goes:

I have a transformation using a data XML file and an XSLT stylesheet that uses XSL-FO. In my data XML there is a large amount of text in one value (for example, one page of text). The transformation output is plain text. My problem is that when my text gets to the output, it is missing some characters. It looks like the transformer tries to insert line breaks in the text, and in doing so deletes some characters. Am I doing something wrong? See the example below.

The only solution I have found for this problem is setting the page-width to a large value to prevent line breaking. But I think it is a bad solution, because it doesn't really fix the problem.
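
A minimal sketch of that workaround, assuming the width is set on the fo:simple-page-master of the stylesheet. The 100in value is an arbitrary illustrative choice, not a value taken from the actual stylesheet:

Code:

<!-- page-width="100in" is a hypothetical value, chosen only to be wider than any line of text -->
<fo:simple-page-master master-name="default-page" page-width="100in">
	<fo:region-body/>
</fo:simple-page-master>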

Data XML:

Code:

<ReportResult>
<Patients>
<Patient>
<Results>
<Result >
<Values>
<Value Type="String" Unit="" ValueAsLong="0" ValueAsDouble="0.0">
<InterventionCode Code="VIJI_P114006"/>
<Code Code="VIJI_P114006"/>
<Text IsEmpty="false">
	XML 1.0 Conformance
	Saxon can be used with any SAX-conformant XML parser. The extent of XML conformance depends entirely on the chosen parser.

	The default parser is the one supplied with JDK 1.4, which is a version of Apache Crimson.

	DOM Conformance
	Saxon accepts input (both source document and stylesheet) from any standards-compliant DOM implementation.

	Saxon allows the result tree to be attached to any Document or Element node of an existing DOM. Any DOM implementation can be used, provided it is mutable.

	Saxon's internal tree structure (which is visible through the Java API, including the case where Java extensions functions are called from XPath expressions) conforms with the minimal requirements of the DOM level 2 core Java language binding. This DOM interface is read-only, so all attempts to call updating methods throw an appropriate DOM exception. No optional features are implemented. The DOM interfaces to Saxon's tree structure do not reveal namespace nodes as attributes. This means it is not possible to get information about namespace declarations except by calls such as getPrefix() and getNamespaceURI() on Element and Attr nodes).

	If an extension function returns a DOM Node or NodeList, this must consist only of Nodes in a tree constructed using Saxon. Since Saxon's trees cannot be updated using DOM methods, this means that the nodes returned must either be nodes from the original source tree, or nodes from a tree constructed using Saxon's proprietary API. It is not possible to construct the tree using DOM methods such as createElement() and createAttribute().

	JAXP 1.2 Conformance
	Saxon implements the JAXP 1.2 API (including TrAX), as defined in JSR-63. Saxon implements the interfaces in the javax.xml.transform package in full, including support for SAX, DOM, and Stream input, and SAX, DOM, and Stream output.

	Note: The transformation interfaces in JAXP 1.2 are identical to JAXP 1.1: the new version only affects the XML parser interface, adding options to control schema validation.

	There are restrictions in using transform() on a DOMSource when the node to be transformed is a node other than the root (i.e. the DOM Document node). These apply only if the supplied DOM is a third-party DOM, not if it is a Saxon-constructed tree. Specifically, if the start node is not the root then it must be an element; and it must not have an ancestor or preceding-sibling node, or an ancestor with a preceding-sibling node, that is an entity reference node or CDATA section node. In addition, the element must be part of a tree that is rooted at a Document node.

	Saxon also implements part of the javax.xml.parsers API. Saxon no longer provides its own SAX parser, however it does provide a DocumentBuilder. The DOM interfaces are limited by the capabilities of the Saxon DOM, specifically the fact that it is read-only. Nevertheless, the DocumentBuilder may be used to construct a Saxon tree, or to obtain an empty Document node which can be supplied in a DOMResult to hold the result of a transformation.
</Text>
</Value>
</Values>
</Result>
</Results>
<ExtraValues>
</ExtraValues>
</Patient>
</Patients>
</ReportResult>

The XSLT file used is this:

Code:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:s="http://www.stylusstudio.com/xquery">
	<xsl:output method="text" media-type="text/plain"/>
	<xsl:template match="/">
		<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
			<fo:layout-master-set>
				<fo:simple-page-master master-name="default-page" >
					<fo:region-body/>
				</fo:simple-page-master>
			</fo:layout-master-set>
			<fo:page-sequence master-reference="default-page">
				<fo:flow flow-name="xsl-region-body">
					<fo:block linefeed-treatment="preserve" keep-together.within-column="always">
						<fo:inline>Here is some text:							<xsl:value-of select="ReportResult/Patients/Patient/Results/Result/Values/Value/InterventionCode[@Code=&quot;VIJI_P114006&quot;]/../Text" disable-output-escaping="yes"/>
						</fo:inline>
					</fo:block>
				</fo:flow>
			</fo:page-sequence>
		</fo:root>
	</xsl:template>
</xsl:stylesheet>

I'm using the following code to perform the transformation, with the two files above as parameters and the MIME type MIME_PLAIN_TEXT:

Code:

private static final String TRANSFORMER_FACTORY = "javax.xml.transform.TransformerFactory";
	private static final String TRANSFORMER_FACTORY_USED = "net.sf.saxon.TransformerFactoryImpl";

	public static byte[] xmlToMimeType( String aDataXml, byte[] aXslt, String aMimetype ) throws Exception
	{
		String previousTransformerProperty = System.getProperty( TRANSFORMER_FACTORY );
		try
		{
			ByteArrayOutputStream out = new ByteArrayOutputStream();

			System.setProperty( TRANSFORMER_FACTORY, TRANSFORMER_FACTORY_USED );
			FopFactory fopFactory = FopFactory.newInstance();
			FOUserAgent foUserAgent = fopFactory.newFOUserAgent();

			// Construct fop with desired output format
			Fop fop = fopFactory.newFop( aMimetype, foUserAgent, out );

			// Setup XSLT
			TransformerFactory factory = TransformerFactoryImpl.newInstance();
			Transformer transformer = factory.newTransformer( new StreamSource( new ByteArrayInputStream( aXslt ) ) );

			// Set the value of a <param> in the stylesheet
			transformer.setParameter( "versionParam", "2.0" );

			// Resulting SAX events (the generated FO) must be piped through to FOP
			Result res = new SAXResult( fop.getDefaultHandler() );

			// Start XSLT transformation and FOP processing
			transformer.transform( new StreamSource( new ByteArrayInputStream( aDataXml.getBytes( "UTF-8" ) ) ), res );
			return out.toByteArray();
		}
		catch ( Exception e )
		{
			Exception ex = new Exception( BusinessErrors.V0700_EXCEPTION, e );
			ex.addDescriptionParameterString( e.getMessage() );
			throw ex;
		}
		finally
		{
			if ( previousTransformerProperty != null )
			{
				System.setProperty( TRANSFORMER_FACTORY, previousTransformerProperty );
			}
			else
			{
				System.getProperties().remove( TRANSFORMER_FACTORY );
			}
		}
	}

I get the following output. If you look closely, you can see that some of the words in the text are missing. I expect the text to make it through the transformation without losing words, but it doesn't.

Code:

Here is some text:


XML 1.0 Conformance

Saxon can be used with any SAX-conformant XML parser. The extent of XML conformance depends entirely

the chosen parser.



The default parser is the one supplied with JDK 1.4, which is a version of Apache Crimson.


DOM Conformance

Saxon accepts input (both source document and stylesheet) from any standards-compliant DOM

implementation.



Saxon allows the result tree to be attached to any Document or Element node of an existing DOM. Any 

implementation can be used, provided it is mutable.


Saxon's internal tree structure (which is visible through the Java API, including the case where Jav

functions are called from XPath expressions) conforms with the minimal requirements of the DOM level

core Java language binding. This DOM interface is read-only, so all attempts to call updating method

an appropriate DOM exception. No optional features are implemented. The DOM interfaces to Saxon's tr

structure do not reveal namespace nodes as attributes. This means it is not possible to get informat
namespace declarations except by calls such as getPrefix() and getNamespaceURI() on Element and Attr

nodes).



If an extension function returns a DOM Node or NodeList, this must consist only of Nodes in a tree c

using Saxon. Since Saxon's trees cannot be updated using DOM methods, this means that the nodes retu

must either be nodes from the original source tree, or nodes from a tree constructed using Saxon's p
API. It is not possible to construct the tree using DOM methods such as createElement() and createAt



JAXP 1.2 Conformance

Saxon implements the JAXP 1.2 API (including TrAX), as defined in JSR-63. Saxon implements the inter

in the javax.xml.transform package in full, including support for SAX, DOM, and Stream input, and SA

and Stream output.


Note: The transformation interfaces in JAXP 1.2 are identical to JAXP 1.1: the new version only affe

parser interface, adding options to control schema validation.



There are restrictions in using transform() on a DOMSource when the node to be transformed is a node
than the root (i.e. the DOM Document node). These apply only if the supplied DOM is a third-party DO

if it is a Saxon-constructed tree. Specifically, if the start node is not the root then it must be a

must not have an ancestor or preceding-sibling node, or an ancestor with a preceding-sibling node, t

entity reference node or CDATA section node. In addition, the element must be part of a tree that is

Document node.


Saxon also implements part of the javax.xml.parsers API. Saxon no longer provides its own SAX parser

however it does provide a DocumentBuilder. The DOM interfaces are limited by the capabilities of the

DOM, specifically the fact that it is read-only. Nevertheless, the DocumentBuilder may be used to co

Saxon tree, or to obtain an empty Document node which can be supplied in a DOMResult to hold the res

transformation.

I was expecting this result (the underlined words are the ones missing from the actual result):

Code:

Here is some text:

XML 1.0 Conformance

Saxon can be used with any SAX-conformant XML parser. The extent of XML conformance depends entirely on

the chosen parser.



The default parser is the one supplied with JDK 1.4, which is a version of Apache Crimson.


DOM Conformance

Saxon accepts input (both source document and stylesheet) from any standards-compliant DOM

implementation.

Saxon allows the result tree to be attached to any Document or Element node of an existing DOM. Any DOM 

implementation can be used, provided it is mutable.


Saxon's internal tree structure (which is visible through the Java API, including the case where Java extensions 

functions are called from XPath expressions) conforms with the minimal requirements of the DOM level 2 

core Java language binding. This DOM interface is read-only, so all attempts to call updating methods throw 

an appropriate DOM exception. No optional features are implemented. The DOM interfaces to Saxon's tree 

structure do not reveal namespace nodes as attributes. This means it is not possible to get information about 
namespace declarations except by calls such as getPrefix() and getNamespaceURI() on Element and Attr 

nodes).

If an extension function returns a DOM Node or NodeList, this must consist only of Nodes in a tree constructed 

using Saxon. Since Saxon's trees cannot be updated using DOM methods, this means that the nodes returned 

must either be nodes from the original source tree, or nodes from a tree constructed using Saxon's proprietary 
API. It is not possible to construct the tree using DOM methods such as createElement() and createAttribute().



JAXP 1.2 Conformance

Saxon implements the JAXP 1.2 API (including TrAX), as defined in JSR-63. Saxon implements the interfaces 

in the javax.xml.transform package in full, including support for SAX, DOM, and Stream input, and SAX, DOM, 

and Stream output.

Note: The transformation interfaces in JAXP 1.2 are identical to JAXP 1.1: the new version only affects the XML 

parser interface, adding options to control schema validation.



There are restrictions in using transform() on a DOMSource when the node to be transformed is a node other 
than the root (i.e. the DOM Document node). These apply only if the supplied DOM is a third-party DOM, not 

if it is a Saxon-constructed tree. Specifically, if the start node is not the root then it must be an element; and it 

must not have an ancestor or preceding-sibling node, or an ancestor with a preceding-sibling node, that is an 

entity reference node or CDATA section node. In addition, the element must be part of a tree that is rooted at a 

Document node.



Saxon also implements part of the javax.xml.parsers API. Saxon no longer provides its own SAX parser, 

however it does provide a DocumentBuilder. The DOM interfaces are limited by the capabilities of the Saxon 

DOM, specifically the fact that it is read-only. Nevertheless, the DocumentBuilder may be used to construct a 

Saxon tree, or to obtain an empty Document node which can be supplied in a DOMResult to hold the result of a 

transformation.

ViZart · January 16th, 2012, 04:14 AM

I forgot to mention that no text is missing if the transform output is PDF instead of plain text.