Burp Suite User Forum

We can't use multi-byte characters in sitemap comment to save as XML

Takehiro | Last updated: May 11, 2017 10:21AM UTC

When saving the site map, we can't use multi-byte Japanese characters in a comment. (It generates XML with invalid encoding.)

[View]
Target > Site map

[Steps]
1. Set the following words as the site map comment.
   管理画面
2. Left-click on the tree view and select 'save selected items'.
3. Save to XML (e.g. sitemap.xml).
4. Check the saved XML like this...
   # grep --color=never '<comment>' sitemap-with-jp5.xml | grep -v '<comment></comment>' | sed -e 's# *<comment><!\[CDATA\[##' | sed -e 's#\]\]></comment>##' | od -c
   0000000 357 276 241 006 ; b \n
   0000007
   # LANG=ja_JP.utf8 echo -n "管理画面" | od -c
   0000000 347 256 241 347 220 206 347 224 273 351 235 242
   0000014
5. Load the XML with Java; the following error then occurs.
   org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x6) was found in the CDATA section
       at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)

[Expected]
Japanese characters are exported as UTF-8 (or another valid encoding) in the XML.

[Environment]
file.encoding utf-8
file.encoding.pkg sun.io
file.separator \
java.class.path burpsuite_pro_v1.7.22.jar
java.class.version 52.0
java.runtime.name Java(TM) SE Runtime Environment
java.runtime.version 1.8.0_102-b14
java.specification.name Java Platform API Specification
java.specification.vendor Oracle Corporation
java.specification.version 1.8
java.version 1.8.0_102
java.vm.info mixed mode
java.vm.name Java HotSpot(TM) 64-Bit Server VM
java.vm.specification.name Java Virtual Machine Specification
java.vm.version 25.102-b14
os.arch amd64
os.name Windows 10
os.version 10.0
sun.arch.data.model 64
sun.cpu.endian little
sun.cpu.isalist amd64
sun.desktop windows
sun.io.unicode.encoding UnicodeLittle
sun.java.command burpsuite_pro_v1.7.22.jar --diagnostics
sun.java.launcher SUN_STANDARD
sun.jnu.encoding utf-8
sun.management.compiler HotSpot 64-Bit Tiered Compilers
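One possible reading of the two od dumps in step 4, offered as a hypothesis and not a confirmed diagnosis: the seven exported bytes are exactly what you get if each character is cut down to its low 8 bits, treated as a signed byte, widened back to a 16-bit char, and then written as UTF-8. The Python sketch below only checks that arithmetic and shows that a non-Java parser also rejects the result. The function name truncate_like_signed_byte is hypothetical, the <item> wrapper is a stand-in for the real export format, and nothing here describes how Burp is actually implemented.

```python
import xml.etree.ElementTree as ET


def truncate_like_signed_byte(text: str) -> str:
    """Keep the low 8 bits of each character and sign-extend back to 16 bits.

    Assumption: every character is in the Basic Multilingual Plane, so its
    code point equals its single UTF-16 code unit.
    """
    out = []
    for ch in text:
        low = ord(ch) & 0xFF
        # A signed byte with the high bit set widens to 0xFF80..0xFFFF.
        out.append(chr(low | 0xFF00) if low >= 0x80 else chr(low))
    return "".join(out)


original = "管理画面"

# Bytes reported by `od -c` for the correctly encoded comment (octal).
expected_utf8 = bytes([0o347, 0o256, 0o241, 0o347, 0o220, 0o206,
                       0o347, 0o224, 0o273, 0o351, 0o235, 0o242])

# Bytes reported by `od -c` for the exported comment, minus the trailing \n.
observed = bytes([0o357, 0o276, 0o241, 0o006]) + b";b"

mangled = truncate_like_signed_byte(original).encode("utf-8")

# Hypothetical wrapper element; the raw 0x06 byte sits inside a CDATA section.
doc = b"<item><comment><![CDATA[" + mangled + b"]]></comment></item>"
try:
    ET.fromstring(doc)
    parsed = True
except ET.ParseError:
    parsed = False

print(mangled == observed)  # do the hypothesised bytes match the dump?
print(parsed)               # does the parser accept the document?
```

If the hypothesis holds, the problem is not only the control character: the exported seven bytes no longer contain enough information to recover the original twelve-byte comment.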

PortSwigger Agent | Last updated: May 15, 2017 08:27AM UTC

Thanks for this report. If you modify the XML version to 1.1, does your parser accept the XML file containing the multibyte characters in the comment?
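For context on the XML 1.1 question: going by the W3C specifications, U+0006 falls outside the Char production of XML 1.0, while XML 1.1 classes it as a RestrictedChar, which may appear only as a character reference, and a CDATA section does not expand character references. The sketch below encodes those productions so the two versions can be compared; the function names are illustrative and do not belong to any library.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production from XML 1.0."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """Char production from XML 1.1."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    """RestrictedChar production from XML 1.1 (character reference only)."""
    return (0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)


def literal_allowed(cp: int, version: str) -> bool:
    """Can this code point appear literally (e.g. inside CDATA)?"""
    if version == "1.0":
        return is_xml10_char(cp)
    return is_xml11_char(cp) and not is_xml11_restricted(cp)


for version in ("1.0", "1.1"):
    print(version, literal_allowed(0x06, version), literal_allowed(0x7BA1, version))
```

Under this reading, changing the declaration to 1.1 would not by itself make a literal 0x06 byte inside CDATA acceptable to a conforming parser, and it would not restore the original comment text in the exported file.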
