
IExtensionHelpers.urlDecode() not handling UTF-8

August Nov 30, 2017 12:34AM UTC

I have an input string which contains an ENDASH encoded using UTF-8 as:


When I decode that in my extension with IExtensionHelpers.urlDecode(String input) I get:


However, the Java URLDecoder.decode(String input, "UTF-8") produces the proper ENDASH:

What encoding is assumed internally by IExtensionHelpers.urlDecode()? Could the API be modified to allow for specifying the encoding, similar to the Java API?
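A minimal Java sketch of the difference described above. It assumes that `IExtensionHelpers.urlDecode(String)` behaves like `java.net.URLDecoder` with ISO-8859-1 (Latin-1), as the support reply below states; no Burp code is involved. The `Charset` overload of `URLDecoder.decode` needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Decodes the same percent-encoded EN DASH under two charsets.
// Assumption: Burp's urlDecode(String) matches the Latin-1 variant here.
class DecodeCharsetDemo {
    // Each %XX byte becomes its own char in the range U+0000..U+00FF
    static String decodeLatin1(String s) {
        return URLDecoder.decode(s, StandardCharsets.ISO_8859_1);
    }

    // The bytes E2 80 93 are read as a single three-byte UTF-8 sequence
    static String decodeUtf8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%E2%80%93";
        System.out.println(decodeLatin1(raw).length()); // 3 (U+00E2, U+0080, U+0093)
        System.out.println(decodeUtf8(raw).length());   // 1 (U+2013 EN DASH)
    }
}
```

The Latin-1 result is three separate characters, which is why it does not display as an EN DASH.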

Paul Johnston Nov 30, 2017 08:58AM UTC Support Center agent

Hi August,

Thanks for your message. urlDecode assumes the charset is Latin-1. Burp generally handles HTTP messages this way, except when displaying them.

We may implement more advanced Unicode handling in the future. In the meantime, I suggest you use the byte[] version of urlDecode, and then Java APIs for UTF-8 decoding.

Please let us know if you need any further assistance.
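The suggested workaround, sketched in Java: URL-decode at the byte level first, then turn the bytes into a `String` with an explicit UTF-8 charset. Because this sketch runs outside Burp, `percentDecode` is a hypothetical stand-in for the `byte[]` overload of `urlDecode`; it only handles `%XX` escapes and copies every other byte (including `+`) unchanged, which may differ from Burp's actual behavior.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ByteLevelDecode {
    // Hypothetical stand-in for urlDecode(byte[]): turns each valid %XX
    // escape into one byte and copies everything else unchanged.
    static byte[] percentDecode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    // Decode the raw parameter bytes, then interpret them as UTF-8
    static String urlDecodeUtf8(byte[] raw) {
        return new String(percentDecode(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "%E2%80%93".getBytes(StandardCharsets.US_ASCII);
        System.out.println(urlDecodeUtf8(raw).equals("\u2013")); // true
    }
}
```

Keeping the data as bytes until the last step means the charset decision is made exactly once, in the `String` constructor.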

August Nov 30, 2017 08:21PM UTC
I noted that the same issue appears to be happening with IExtensionHelpers.bytesToString(). You can work around the issue by using Java's String constructor:

new String(byteArray, StandardCharsets.UTF_8)

This seems like a pretty big oversight in this day and age...
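Expanding on the `String` constructor workaround: a sketch of UTF-8 replacements for both directions, assuming (as discussed in this thread) that `bytesToString()` and its counterpart `stringToBytes()` treat bytes as Latin-1. The helper names below are hypothetical. The last lines show why both directions must use the same charset: decoding as Latin-1 and then encoding as UTF-8 turns the three-byte EN DASH into six bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Conversions {
    // Hypothetical UTF-8 counterparts to bytesToString / stringToBytes
    static String bytesToUtf8String(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static byte[] utf8StringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] enDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};

        String s = bytesToUtf8String(enDash);
        System.out.println(s.equals("\u2013"));                          // true
        System.out.println(Arrays.equals(utf8StringToBytes(s), enDash)); // true

        // Mismatched charsets: Latin-1 decode followed by UTF-8 encode
        byte[] mixed = new String(enDash, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(mixed.length); // 6 (C3 A2 C2 80 C2 93)
    }
}
```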

Paul Johnston Dec 01, 2017 08:50AM UTC Support Center agent

Hi August,

Just to be clear, Burp does this almost everywhere. It’s pretty much only the display of messages that is Unicode-aware.

What are you using the API for? We’ve not prioritized this because for security tests, working in 8-bit is fine, and sometimes preferable.

August Dec 01, 2017 10:16PM UTC
I'm building an extension that takes some URL-encoded UTF-8 JSON input, prettifies it, and displays it to the user in an editable tab (in Repeater, for example). When the user edits the JSON, it gets minified again, URL-encoded, and passed back to the Raw tab.

But using the methods in IExtensionHelpers, my ENDASH goes from:

Raw: %E2%80%93
My Tab: –
Raw: %C3%A2%C2%80%C2%93
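The final Raw value is consistent with a charset mismatch: the bytes are decoded as Latin-1 and then re-encoded as UTF-8. A minimal sketch (plain Java 10+ APIs, no Burp calls, and the prettify/minify step left out) that reproduces the reported value, next to a version that uses UTF-8 in both directions:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Decode as Latin-1, re-encode as UTF-8: every byte >= 0x80 doubles
    static String mismatchedRoundTrip(String raw) {
        String shown = URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
        return URLEncoder.encode(shown, StandardCharsets.UTF_8);
    }

    // Same charset in both directions: the EN DASH survives the edit
    static String utf8RoundTrip(String raw) {
        String shown = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        return URLEncoder.encode(shown, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(mismatchedRoundTrip("%E2%80%93")); // %C3%A2%C2%80%C2%93
        System.out.println(utf8RoundTrip("%E2%80%93"));       // %E2%80%93
    }
}
```

Note that `URLEncoder` performs HTML form encoding (spaces become `+`, and characters such as `{`, `"` and `:` are percent-encoded), so re-encoded JSON may not match the original byte for byte even when the charsets agree.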

Paul Johnston Dec 04, 2017 11:56AM UTC Support Center agent

Hi August,

Understood. We’ll look at this improvement in future.

Have you managed to get this working using Java encode/decode functions? If you’re struggling, send me a code snippet and I’ll see if I can help.

Wyatt Dahlenburg Nov 06, 2018 03:31PM UTC
Hi August and Paul,

Was this issue ever resolved? I'd like to do some encoding work in a plugin of mine as well. This issue doesn't appear to be well documented.

I want to be able to encode my Intruder payloads with UTF-8.
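A sketch of encoding a payload string as UTF-8 before percent-encoding it, following the same approach as the earlier workarounds. `encodeAllUtf8` is a hypothetical helper that escapes every byte; how it would be wired into Intruder (for example from a payload processor) is left out, because that depends on how the payload reaches the extension and which charset its bytes are in.

```java
import java.nio.charset.StandardCharsets;

class Utf8PayloadEncoder {
    // Percent-encode every UTF-8 byte of the payload using uppercase hex
    static String encodeAllUtf8(String payload) {
        StringBuilder sb = new StringBuilder();
        for (byte b : payload.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF)); // mask to 0..255
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAllUtf8("a\u2013")); // %61%E2%80%93
    }
}
```

Escaping every byte is the simplest behavior to reason about; a selective encoder would escape only reserved and non-ASCII bytes.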

Liam Tai-Hogan Nov 06, 2018 04:01PM UTC Support Center agent

Hi Wyatt,

This is still logged in our backlog. We’ll update this thread when we’ve made some progress. Unfortunately, we can’t provide an ETA.
