XML parser unable to correctly handle numeric entities (Android)
Hi,
I've been struggling for a while with the following issue in the Android XML parser (Ti Mobile SDK 1.4).
Specifically, if an XML file (or an XML HTTP response) contains numeric character references for special characters (e.g. &#235; for ë), they are handled correctly only when they appear in an attribute, not when they appear in the inner text of an element.
As an example, take the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<meteo>
<fcast day="Venerd&#236; 27 agosto 2010">
Prevalentemente nuvoloso con rovesci anche temporaleschi
pi&#249; insistenti lungo la dorsale di confine.
</fcast>
</meteo>
The "day" attribute of the fcast element is parsed correctly as "Venerdì 27 agosto 2010", while the numeric entities in the inner text of the node are simply dropped from the parsed text. For example, with:
[...]
var meteo = xmlDoc.getElementsByTagName('meteo').item(0);
var details = meteo.getElementsByTagName('fcast').item(0).text;
Ti.API.info(details);
the result is:
"Prevalentemente nuvoloso con rovesci anche temporaleschi pi insistenti lungo la dorsale di confine."
Here "pi" is supposed to be "più".
Please note that the same code works like a charm on iOS.
So far I have worked around the issue by pre-processing the XML string (whether loaded from a file or received in an HTTP response) and replacing the numeric entities with the corresponding Unicode characters before parsing.
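A minimal sketch of that pre-processing step in plain JavaScript. The function name `decodeNumericEntities` is hypothetical (it is not a Titanium API), and it assumes a runtime that provides ES2015 `String.fromCodePoint`; an older runtime would need a surrogate-pair fallback. It deliberately leaves references to markup characters (&, <, >, ", ') escaped, because turning `&#60;` into a literal `<` would make the document malformed.

```javascript
// Hypothetical helper: replaces decimal (&#249;) and hexadecimal (&#xF9;)
// numeric character references with the literal characters they denote,
// so the XML parser never has to resolve them inside element text.
function decodeNumericEntities(xmlString) {
  // Code points that are significant to XML markup (&, <, >, ", ').
  // Their references must stay escaped or the document may break.
  var keepEscaped = { 0x26: true, 0x3C: true, 0x3E: true, 0x22: true, 0x27: true };

  return xmlString.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, function (match, ref) {
    var codePoint = ref.charAt(0) === 'x'
      ? parseInt(ref.slice(1), 16)
      : parseInt(ref, 10);

    // Leave markup characters and invalid code points untouched.
    if (keepEscaped[codePoint] ||
        codePoint === 0 ||
        codePoint > 0x10FFFF ||
        (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}
```

In Titanium the result would then be handed to the parser, e.g. `Ti.XML.parseString(decodeNumericEntities(xmlText))`, where `xmlText` is the raw string read from the file or taken from the HTTP client's `responseText`.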
Have a nice day.
Olivier
5 Answers
-
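// Special characters and their predefined XML entities (and the reverse map)
var xml_special_to_escaped_one_map = {
  '&': '&amp;',
  '"': '&quot;',
  '<': '&lt;',
  '>': '&gt;'
};
var escaped_one_to_xml_special_map = {
  '&amp;': '&',
  '&quot;': '"',
  '&lt;': '<',
  '&gt;': '>'
};
// Escapes &, ", < and > so the string can be embedded in XML
function encodeXml(string) {
  return string.replace(/([\&"<>])/g, function(str, item) {
    return xml_special_to_escaped_one_map[item];
  });
}
// Turns the entities back into characters (single pass, so "&amp;lt;" becomes "&lt;")
function decodeXml(string) {
  return string.replace(/(&quot;|&lt;|&gt;|&amp;)/g, function(str, item) {
    return escaped_one_to_xml_special_map[item];
  });
}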
You can add your own characters to the maps…
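The maps above only cover named entities, so a numeric reference such as `&#249;` passes through `decodeXml` unchanged. Rather than adding one map entry per character, a single regex can decode both kinds in one pass. This is a hypothetical sketch (`decodeXmlText` is an illustrative name) that assumes ES2015 `String.fromCodePoint`. It is meant for text that has already been taken out of the markup, which is why it also decodes `&amp;` and `&lt;`.

```javascript
// Named entities from the answer above, plus &apos; (also predefined in XML).
var escapedToXmlSpecial = {
  '&amp;': '&',
  '&quot;': '"',
  '&apos;': "'",
  '&lt;': '<',
  '&gt;': '>'
};

// Hypothetical helper: decodes the five predefined XML entities and any
// decimal (&#249;) or hexadecimal (&#xF9;) numeric reference.
// Everything is matched in one pass, so "&amp;lt;" becomes "&lt;", not "<".
function decodeXmlText(string) {
  return string.replace(/&(?:amp|quot|apos|lt|gt|#x[0-9a-fA-F]+|#[0-9]+);/g, function (entity) {
    if (entity.charAt(1) !== '#') {
      return escapedToXmlSpecial[entity];
    }
    var codePoint = entity.charAt(2) === 'x'
      ? parseInt(entity.slice(3, -1), 16)
      : parseInt(entity.slice(2, -1), 10);
    // Leave references to invalid code points as they are.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return entity;
    }
    return String.fromCodePoint(codePoint);
  });
}
```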
-
Hello Olivier,
Sorry for posting this here, but this is the only way I could find to contact you…
Many members of this site have read your message about geo-augmented reality…
Could you publish that code, or send it to me at: Ivan . Mathy [at] free . Fr
It would help many of us…
Thanks, and sorry for my bad English.
Ivan
-
I posted a possible solution here: http://developer.appcelerator.com/question/57891/xhr-rss-feed—special-characters#114271
-
@Olivier, would you file a Lighthouse ticket with example XML that should be parsed, and assign it to me? I don't see this particular issue in the system.