IExtensionHelpers.urlDecode() not handling UTF-8
I have an input string which contains an ENDASH encoded using UTF-8 as:
When I decode that in my extension with IExtensionHelpers.urlDecode(String input) I get:
However, the Java URLDecoder.decode(String input, "UTF-8") produces the proper ENDASH:
What encoding is assumed internally by IExtensionHelpers.urlDecode()? Could the API be modified to allow for specifying the encoding, similar to the Java API?
Thanks for your message. urlDecode assumes the charset is Latin-1 (ISO-8859-1). Burp generally handles HTTP messages this way; the only exception is when displaying messages.
We may implement more advanced Unicode handling in the future. In the meantime, I suggest you use the byte version of urlDecode, and the Java APIs for UTF-8 decoding.
Please let us know if you need any further assistance.
new String(byteArray, StandardCharsets.UTF_8)
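To illustrate the suggestion above, here is a minimal sketch of the byte-based route. It does not call Burp: percentDecode is a hypothetical stand-in for the byte[] version of urlDecode, and the class and method names are made up for the example. It assumes the en dash arrives percent-encoded as its UTF-8 bytes, which are E2 80 93.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8UrlDecodeSketch {

    // Hypothetical stand-in for the byte[] version of urlDecode:
    // turns each %XX escape into one raw byte and '+' into a space.
    static byte[] percentDecode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i] == '+' ? ' ' : in[i]);
        }
        return out.toByteArray();
    }

    // Decode the escapes to bytes first, then pick the charset explicitly.
    static String decodeAs(String encoded, Charset charset) {
        byte[] raw = percentDecode(encoded.getBytes(StandardCharsets.ISO_8859_1));
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String input = "%E2%80%93"; // en dash (U+2013) as UTF-8 bytes

        // Latin-1 maps each byte to one character: U+00E2, U+0080, U+0093.
        String latin1 = decodeAs(input, StandardCharsets.ISO_8859_1);

        // UTF-8 combines the three bytes into the single character U+2013.
        String utf8 = decodeAs(input, StandardCharsets.UTF_8);

        System.out.println("Latin-1 length: " + latin1.length());
        System.out.println("UTF-8 length:   " + utf8.length());
    }
}
```

In an extension, the percentDecode call would be replaced by the byte version of urlDecode, and only the final new String(..., StandardCharsets.UTF_8) step would remain your own code.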
This seems like a pretty big oversight in this day and age...
Just to be clear, Burp does this almost everywhere. Pretty much the only part that is Unicode aware is displaying messages.
What are you using the API for? We’ve not prioritized this because for security tests, working in 8-bit is fine, and sometimes preferable.
But using the methods in IExtensionHelpers, my ENDASH goes from:
My Tab: â
Understood. We’ll look at this improvement in future.
Have you managed to get this working using Java encode/decode functions? If you’re struggling, send me a code snippet and I’ll see if I can help.
Was this issue ever resolved? I'd like to do some encoding work in a plugin of mine as well. This doesn't appear to be a well-documented issue.
I want to be able to encode my Intruder payloads with UTF-8.
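For the encoding direction, one possible workaround is to rely on the standard Java APIs rather than the helpers. The sketch below is only an illustration: the class and method names are hypothetical, and how the result is handed back to Intruder (for example from a payload processor) is left out. Note that URLEncoder performs form-style encoding, so a space becomes '+'.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8PayloadEncodeSketch {

    // Raw UTF-8 bytes of the payload, with no URL encoding applied.
    static byte[] toUtf8Bytes(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8);
    }

    // Percent-encodes the UTF-8 bytes of the payload (form-style encoding).
    static String urlEncodeUtf8(String payload) throws UnsupportedEncodingException {
        return URLEncoder.encode(payload, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String payload = "a\u2013b"; // "a", en dash, "b"
        System.out.println(toUtf8Bytes(payload).length + " bytes");
        System.out.println(urlEncodeUtf8(payload));
    }
}
```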
This is still logged in our backlog. We’ll update this thread when we’ve made some progress. Unfortunately, we can’t provide an ETA.