Burp Suite User Forum


IExtensionHelpers.urlDecode() not handling UTF-8

August | Last updated: Nov 30, 2017 12:34AM UTC

I have an input string which contains an ENDASH encoded using UTF-8 as: %E2%80%93

When I decode that in my extension with IExtensionHelpers.urlDecode(String input) I get: â??

However, the Java URLDecoder.decode(String input, "UTF-8") produces the proper ENDASH: –

What encoding is assumed internally by IExtensionHelpers.urlDecode()? Could the API be modified to allow for specifying the encoding, similar to the Java API?

PortSwigger Agent | Last updated: Nov 30, 2017 08:45AM UTC

Hi August, Thanks for your message. urlDecode assumes the charset is Latin-1. Burp generally handles HTTP messages this way; the exception is message display. We may implement more advanced Unicode handling in the future. In the meantime, I suggest you use the byte[] version of urlDecode, and the Java APIs for UTF-8 decoding. Please let us know if you need any further assistance.
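The suggestion above (decode at the byte level, then interpret the bytes as UTF-8) could look roughly like the following. This is only a sketch that runs outside Burp: percentDecode is a hypothetical stand-in for a byte-level URL decoder, and urlDecodeUtf8 and the class name are made up for illustration. Inside an extension, the byte[] overload of IExtensionHelpers.urlDecode would presumably be passed in instead of the stand-in.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class Utf8UrlDecode {

    // Hypothetical stand-in for a byte-level URL decoder so the sketch runs
    // without Burp. It only handles %XX sequences and copies everything else.
    public static byte[] percentDecode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1] & 0xFF, 16);
                int lo = Character.digit(in[i + 2] & 0xFF, 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    // Decode at the byte level first, then interpret the result as UTF-8.
    // The byteDecoder parameter is where a byte[] URL decoder is plugged in.
    public static String urlDecodeUtf8(String input, UnaryOperator<byte[]> byteDecoder) {
        // URL-encoded input is expected to be ASCII, so Latin-1 is a lossless
        // way to get one byte per character.
        byte[] raw = input.getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = byteDecoder.apply(raw);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = urlDecodeUtf8("%E2%80%93", Utf8UrlDecode::percentDecode);
        // A single character, U+2013 (en dash), is expected here.
        System.out.println(s.length() + " char(s), code point U+"
                + Integer.toHexString(s.codePointAt(0)).toUpperCase());
    }
}
```

The key point is that no String is built from the decoded bytes until the UTF-8 charset is named explicitly.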

Burp User | Last updated: Nov 30, 2017 08:21PM UTC

I noted that the same issue appears to be happening with IExtensionHelpers.bytesToString(). You can work around the issue by using Java's String constructor: new String(byteArray, StandardCharsets.UTF_8) This seems like a pretty big oversight in this day and age...
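To illustrate the workaround above, here is a small plain-Java sketch (no Burp classes involved; the class and method names are made up for illustration). It decodes the same three bytes once as Latin-1 and once as UTF-8. Whether bytesToString() uses exactly Latin-1 internally is an assumption based on this thread, not something confirmed by documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesToStringDemo {

    // One character per byte: what a Latin-1 conversion yields.
    public static String asLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Multi-byte sequences are combined into single characters.
    public static String asUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the en dash, U+2013.
        byte[] enDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};

        String latin1 = asLatin1(enDash); // U+00E2 plus two control characters
        String utf8 = asUtf8(enDash);     // the single character U+2013

        System.out.println("Latin-1 length: " + latin1.length());
        System.out.println("UTF-8 length:   " + utf8.length());

        // Going back to bytes needs the same care: name the charset explicitly.
        byte[] back = utf8.getBytes(StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + Arrays.equals(back, enDash));
    }
}
```

The Latin-1 result starts with U+00E2 (â) followed by two non-printable control characters, which would explain the "â??" seen earlier in the thread.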

PortSwigger Agent | Last updated: Dec 01, 2017 08:14AM UTC

Hi August, Just to be clear, Burp does this almost everywhere. It's pretty much only displaying messages that is Unicode aware. What are you using the API for? We've not prioritized this because for security tests, working in 8-bit is fine, and sometimes preferable.

Burp User | Last updated: Dec 01, 2017 10:16PM UTC

I'm building an extension that takes some URLEncoded, UTF-8, JSON input, prettifies it, and displays it to the user in an editable tab (in Repeater for example). When the user edits the JSON it gets recompressed, URL encoded, and passed back to the Raw tab. But using the methods in IExtensionHelpers, my ENDASH goes from:

Raw: %E2%80%93
My Tab: â??
Raw: %C3%A2%C2%80%C2%93
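The corruption described above can be reproduced with the standard Java classes alone. The sketch below assumes the decode step treats the bytes as Latin-1 and the re-encode step uses UTF-8, which is consistent with the bytes reported but is not confirmed for Burp's internals. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTripDemo {

    // URL-decode with one charset, then URL-encode the result with another.
    public static String roundTrip(String raw, String decodeCharset, String encodeCharset) {
        try {
            String shown = URLDecoder.decode(raw, decodeCharset);
            return URLEncoder.encode(shown, encodeCharset);
        } catch (UnsupportedEncodingException e) {
            // Only reachable if a charset name is misspelled.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E2%80%93";

        // Mismatched charsets: each of the three bytes becomes its own
        // character, and each character is then re-encoded as two UTF-8 bytes.
        System.out.println(roundTrip(raw, "ISO-8859-1", "UTF-8"));

        // Matching charsets: the value survives the round trip unchanged.
        System.out.println(roundTrip(raw, "UTF-8", "UTF-8"));
    }
}
```

With mismatched charsets the output is %C3%A2%C2%80%C2%93, matching the value in the post; with UTF-8 on both sides it stays %E2%80%93. Note that URLEncoder performs form encoding (a space becomes +), so it may not be a drop-in replacement for every part of a request.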

PortSwigger Agent | Last updated: Dec 04, 2017 08:46AM UTC

Hi August, Understood. We'll look at this improvement in future. Have you managed to get this working using Java encode/decode functions? If you're struggling, send me a code snippet and I'll see if I can help.

Liam, PortSwigger Agent | Last updated: Dec 04, 2017 11:56AM UTC

Hi Wyatt, This is still logged in our backlog. We'll update this thread when we've made some progress. Unfortunately, we can't provide an ETA.

Burp User | Last updated: Nov 06, 2018 03:31PM UTC

Hi August and Paul, Was this issue ever resolved? I'd like to do some encoding work in a plugin of mine as well. This doesn't appear to be a well documented issue. I want to be able to encode my Intruder payloads with UTF-8.
