UTF8 in Java

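Java strings handle non-ASCII characters out of the box. Internally they are stored as UTF-16, not UTF-8, so the example below works as long as the compiler reads the source file with the right encoding and the console can display the characters.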

String test = "ÀÁÂÃÄ";
System.out.println(test); // prints out ÀÁÂÃÄ
System.out.println(test.length()); // prints out 5

Here's how to convert the above string into a series of UTF-8 bytes:

byte[] bytes = test.getBytes("UTF8");

And here's how to convert those bytes back into a Java string:

String utf8 = new String(bytes, "UTF8");

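Both of these throw the checked UnsupportedEncodingException, so they must be wrapped in a try-catch. For UTF-8 the exception should never actually occur, since every Java platform is required to support it. On Java 7 and later you can pass StandardCharsets.UTF_8 instead of the name "UTF8", which avoids the checked exception altogether.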
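Here's a minimal sketch of both forms: the charset-name version with its try-catch, and the StandardCharsets.UTF_8 version (Java 7+) that needs none. The class and method names are made up for illustration, and the test string is written with Unicode escapes so the source file's own encoding doesn't matter.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {

    // Charset passed by name: the checked exception must be handled.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected: UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Charset passed as a constant: no checked exception involved.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a string.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The same five accented letters as in the example above.
        String test = "\u00C0\u00C1\u00C2\u00C3\u00C4";
        byte[] bytes = encode(test);

        System.out.println(test.length());              // 5 (chars)
        System.out.println(bytes.length);               // 10 (each letter takes 2 bytes in UTF-8)
        System.out.println(decode(bytes).equals(test)); // true
    }
}
```

Note that length() counts chars while the byte array is twice as long, which is the easiest way to see that the string itself is not stored as UTF-8.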
