Brian Minchau 13 January 2005 08:30:52
Engin,
I reproduced your problem, but with some differences. Your call Source stylesheet = tFactory.getAssociatedStylesheet(new StreamSource("x.xml"), media, title, charset); returned null for me, so I changed your code to this:
```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Jan12 {
    public static void main(String[] args) {
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            // Load the stylesheet directly instead of via getAssociatedStylesheet()
            Transformer transformer = tFactory.newTransformer(new StreamSource("jan12/x.xsl"));

            // Input reader and output writer both use the Turkish encoding ISO8859_9.
            // try-with-resources closes each reader/writer and its underlying file stream.
            try (InputStreamReader in = new InputStreamReader(
                         new FileInputStream("jan12/x.xml"), "ISO8859_9");
                 OutputStreamWriter out = new OutputStreamWriter(
                         new FileOutputStream("xout.xml"), "ISO8859_9")) {
                transformer.transform(new StreamSource(in), new StreamResult(out));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
The input x.xml was irrelevant, because I used this stylesheet for x.xsl:

```xml
<?xml version="1.0" encoding="ISO-8859-9"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="ISO-8859-9" />
  <xsl:template match="/">
    <out>char 287:ğ char 350:Ş Dotted capital I char 304 İ</out>
  </xsl:template>
</xsl:stylesheet>
```
The behavior differs depending on whether the Java class sun.io.CharToByteConverter is available or not.
I suspect that when you run on Windows the class is there, but on your UNIX system the JRE is different and the class is not available. You can add Class clazz = Class.forName("sun.io.CharToByteConverter"); to your Java code and check whether it succeeds in one environment but throws ClassNotFoundException in the other (Class.forName never returns null). I suspect that when this class is available you get the correct output.
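Brian's Class.forName test can be wrapped in a small helper so the same program reports the result on both the Windows and UNIX machines. This is a minimal sketch; the class name ClassAvailability is hypothetical. Note that sun.* classes are internal, not part of the public Java API, so their presence can legitimately differ between JREs.

```java
public class ClassAvailability {
    /** Returns true if the named class can be loaded in this JVM. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Class.forName never returns null: a missing class throws instead
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "sun.io.CharToByteConverter";
        System.out.println(name + (isAvailable(name) ? " is available" : " is NOT available"));
    }
}
```

Run it with the same java command used for the transform, since a different JRE on the PATH would give a different answer.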
When this class is not available, it looks like it exposes a configuration error in Xalan's Encodings.properties file in the org.apache.xml.serializer package. It has information for the Turkish characters in lines like these:

    ISO8859_9 ISO-8859-9 0x00FF
    ISO8859-9 ISO-8859-9 0x00FF

The third word on each line, 0x00FF, is the highest code point used in the character set; in base 10 this value is 255. But these Turkish characters are 287, 350 and 304, all larger than 255. When writing the characters to the output file, the serializer treats them as out of range because they exceed the supposed maximum code point, so it converts them to numeric character references, e.g. the characters &#304; rather than the single Unicode character with code point 304.
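The mismatch Brian describes can be shown with the JDK's own charset support: a check of the form "code point <= 0x00FF" rejects ğ, Ş and İ, while the ISO-8859-9 encoder can write each of them as a single byte. This is a sketch for illustration only; it does not reproduce Xalan's serializer code, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Latin5Check {
    static final Charset LATIN5 = Charset.forName("ISO-8859-9");

    /** A check based only on the 0x00FF maximum: anything above it would be escaped. */
    static boolean fitsNaiveMax(char c) {
        return c <= 0x00FF;
    }

    /** What the charset itself reports: can this char be encoded at all? */
    static boolean encoderCanEncode(char c) {
        CharsetEncoder enc = LATIN5.newEncoder();
        return enc.canEncode(c);
    }

    /** The single byte ISO-8859-9 uses for c, as an unsigned value 0-255. */
    static int latin5Byte(char c) {
        return String.valueOf(c).getBytes(LATIN5)[0] & 0xFF;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'\u011F', '\u015E', '\u0130'}) {   // ğ, Ş, İ
            System.out.printf("U+%04X (%d): naive=%b canEncode=%b byte=%d%n",
                    (int) c, (int) c, fitsNaiveMax(c), encoderCanEncode(c), latin5Byte(c));
        }
    }
}
```

For all three characters the naive check is false while canEncode is true; the bytes are 240, 222 and 221. The Unicode code point and the byte written to the file are different numbers, which is exactly what a single maximum value conflates.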
At this point I'm not sure what the correct maximal code point value is for this character set, but I think that getting the value right might fix your problem.
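One way to probe the open question is to ask the JDK which characters ISO-8859-9 can encode and take the highest one. The sketch below (hypothetical class name, scans only the Basic Multilingual Plane) finds 0x015F (ş) for ISO-8859-9 versus 0x00FF for ISO-8859-1. It also counts characters below that maximum which the charset cannot encode, such as U+0100; if a serializer relied on the maximum alone, raising it to 0x015F would stop the escaping of ğ, Ş and İ but would also let such characters through unescaped.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class MaxCodePoint {
    /** Highest BMP char value the charset can encode (scans every char). */
    static int highestEncodable(String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        int max = -1;
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            if (enc.canEncode((char) c)) {
                max = c;
            }
        }
        return max;
    }

    /** Number of char values from 0 to limit that the charset can NOT encode. */
    static int gapsBelow(String charsetName, int limit) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        int gaps = 0;
        for (int c = 0; c <= limit; c++) {
            if (!enc.canEncode((char) c)) {
                gaps++;
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        int max = highestEncodable("ISO-8859-9");
        System.out.printf("ISO-8859-9 max: 0x%04X, unencodable at or below it: %d%n",
                max, gapsBelow("ISO-8859-9", max));
        System.out.printf("ISO-8859-1 max: 0x%04X%n", highestEncodable("ISO-8859-1"));
    }
}
```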
Engin Ertilav 13 January 2005 10:40:47
Hi,
My problem is solved. I found out that my input XML contains Turkish characters with these codes:
Ş : 222, İ : 221, ...
All my Turkish characters are in the range 0-255, so I tried specifying the encoding ISO8859_1 for both the input and output streams. (It also works without specifying it, because that is the default for streams on my UNIX system.) Now my file is correct.
I also checked sun.io.CharToByteConverter, and it is available.
I think that when I used ISO8859_9, it converted my Turkish characters (bytes 222, 221, ...) to their Unicode code points (350, 304, ...).
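Engin's observation can be checked directly: decoding his bytes 222 and 221 as ISO-8859-9 yields the Unicode code points 350 (Ş) and 304 (İ), both above the 0x00FF limit, while decoding them as ISO-8859-1 yields 222 and 221 (Þ and Ý), which stay under it. In both cases the bytes come back unchanged after a decode/encode round trip, which would explain why the ISO8859_1 workaround produces a correct file. This is a sketch with hypothetical names. One trade-off: under the workaround the characters the stylesheet sees in memory are Þ and Ý, so a stylesheet that matched on a literal Ş would not match.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeDemo {
    static final Charset LATIN5 = Charset.forName("ISO-8859-9");

    /** Code points obtained by decoding raw file bytes with the given charset. */
    static int[] decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs).codePoints().toArray();
    }

    /** Decode with a charset, then encode again with the same charset. */
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] fromFile = {(byte) 222, (byte) 221};   // the bytes Engin found for Ş and İ

        System.out.println(Arrays.toString(decode(fromFile, LATIN5)));                      // [350, 304]
        System.out.println(Arrays.toString(decode(fromFile, StandardCharsets.ISO_8859_1))); // [222, 221]

        // With ISO-8859-1 the bytes pass through a decode/encode round trip untouched
        System.out.println(Arrays.equals(roundTrip(fromFile, StandardCharsets.ISO_8859_1), fromFile)); // true
    }
}
```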
Interestingly, when I run the transform with this code:

```java
transformer.transform(new StreamSource("xin.xml"), new StreamResult(new java.io.FileOutputStream("xout.xml")));
```

it does not work: again it writes the codes 350, 304 for some characters. Maybe it uses the stylesheet's encoding.
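Engin's guess can be tested. With a file name as the StreamSource, the XML parser reads raw bytes and takes the input encoding from the file's own XML declaration; with an OutputStream (rather than a Writer) as the StreamResult, the serializer does the encoding itself, using the encoding from xsl:output (ISO-8859-9 in Brian's test stylesheet; Engin's stylesheet is not shown). The sketch below overrides the output encoding from Java with Transformer.setOutputProperty and OutputKeys.ENCODING. It uses an identity transform and a hypothetical in-memory document instead of Engin's files, and the exact ISO-8859-9 output depends on which serializer is on the classpath, so only the UTF-8 case is checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputEncodingDemo {
    /** Identity-transform xml into bytes in the given encoding, then decode them back. */
    static String serialize(String xml, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
            // Takes precedence over any encoding an xsl:output element would supply
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // An OutputStream (not a Writer) makes the serializer choose the byte encoding
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(bytes));
            return new String(bytes.toByteArray(), Charset.forName(encoding));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<out>\u011F \u015E \u0130</out>";      // ğ Ş İ
        System.out.println(serialize(xml, "ISO-8859-9"));     // may contain &#...; references
        System.out.println(serialize(xml, "UTF-8"));          // characters written as-is
    }
}
```

Switching to UTF-8 changes the encoding of the output file, so it only helps if whatever reads xout.xml accepts UTF-8; passing a Writer, as in Brian's version, keeps the byte encoding under the caller's control instead.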