Tomcat query parameters and encodings

Have you ever wondered which encoding is used by default in Java (servlets) to parse the query parameters and to render the response? You might say UTF-8. Wrong. Try ISO-8859-1. There are 3 cases to consider:

1. Query parameters as GET

2. Query parameters as POST

3. Response encoding.
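To make the problem concrete, here is a small sketch that uses only the JDK (no servlet container) to decode the same percent-encoded value under both charsets. The class name DecodeDemo and the sample value are made up for illustration; the container does something comparable internally when it parses parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Tomcat or the servlet API.
class DecodeDemo {

    // Decodes a percent-encoded parameter value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, as sent from a UTF-8 page
        String raw = "caf%C3%A9";

        // Decoded as UTF-8: the two bytes C3 A9 become one character
        System.out.println(decode(raw, "UTF-8").length());      // prints 4

        // Decoded as ISO-8859-1: each byte becomes its own character
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 5
    }
}
```

The extra character in the second result is the garbage you see in place of the accented letter.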

To solve cases 1 and 2, one solution is simply to re-decode each parameter as UTF-8:

String param = request.getParameter("test");
if (param != null) {
    // Take back the bytes the container decoded as ISO-8859-1 and decode them as UTF-8
    param = new String(param.getBytes("ISO-8859-1"), "UTF-8");
}

This is not a practical solution if you need to make the change in a lot of places.
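If you do go this route anyway, it helps to keep the conversion in a single helper so it can be removed in one place later. The following is only a sketch: the class and method names (ParamEncoding, latin1ToUtf8) are invented here, and it assumes Java 7 or newer for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here for illustration only.
final class ParamEncoding {

    private ParamEncoding() {
    }

    // Re-decodes a value that was decoded as ISO-8859-1 although the
    // underlying bytes were UTF-8. Returns null for a null input.
    static String latin1ToUtf8(String value) {
        if (value == null) {
            return null;
        }
        byte[] originalBytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 U+00A9 is what "e with acute accent" looks like after a wrong decode
        String broken = "caf\u00c3\u00a9";
        System.out.println(latin1ToUtf8(broken).equals("caf\u00e9")); // prints true
    }
}
```

Only apply it to values that really were decoded with the wrong charset. Running it on a correctly decoded string corrupts any non-ASCII characters in it.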

If you are using Tomcat, you can fix this by setting URIEncoding="UTF-8" in the connector definition in server.xml (JBoss location: ./deploy/jboss-web.deployer/server.xml):

<Connector port="8080" address="${jboss.bind.address}"    
 maxThreads="250" maxHttpHeaderSize="8192"
 emptySessionPath="true" protocol="HTTP/1.1"
 enableLookups="false" redirectPort="8443" acceptCount="100"
 connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>

This only fixes the problem for GET query parameters. POST needs a more involved solution: writing a servlet filter. An example ships with the Tomcat examples. Just search for SetCharacterEncodingFilter.java and the corresponding configuration in web.xml. The filter sets the request's character encoding to the value provided as its parameter.
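For orientation, the web.xml registration looks roughly like the fragment below. Treat it as a sketch: the filter-class value assumes the class sits in a package called filters, as it does in the Tomcat examples, and the filter name and URL pattern are arbitrary choices. Adjust them to wherever you copy the class.

```xml
<!-- Registers the filter; "filters" is the package used in the Tomcat examples (assumption) -->
<filter>
    <filter-name>Set Character Encoding</filter-name>
    <filter-class>filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>

<!-- Applies the filter to every request of the web application -->
<filter-mapping>
    <filter-name>Set Character Encoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```

The filter should be mapped before any other filter that reads request parameters, because the request encoding cannot be changed once the parameters have been parsed.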

The third case only requires setting the response encoding to UTF-8. It can be done like this:

public void service(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    ...
    response.setCharacterEncoding("UTF-8");
    PrintWriter out = response.getWriter();
    // response.setCharacterEncoding("UTF-8"); setting the encoding here has no effect
    ... write things to out

Note: the character encoding must be set before getting the writer; otherwise it has no effect.
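The reason for this ordering is that a writer is bound to its charset at the moment it is created. The sketch below shows the same idea with plain JDK classes instead of the servlet API, so it is an analogy rather than what Tomcat does internally. The class name WriterCharsetDemo is made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class; uses only the JDK, no servlet container.
class WriterCharsetDemo {

    // Writes the text through a writer bound to the given charset
    // and returns the raw bytes that would go out on the wire.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The charset is fixed here, at construction time of the writer.
        try (Writer out = new OutputStreamWriter(sink, Charset.forName(charsetName))) {
            out.write(text);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent
        System.out.println(encode(text, "ISO-8859-1").length); // prints 1
        System.out.println(encode(text, "UTF-8").length);      // prints 2
    }
}
```

Once the writer exists there is nothing left to change, which matches the behaviour of getWriter() described above.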

Another idea is to modify SetCharacterEncodingFilter.java so that it sets the response encoding as well:

public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {

    // Select the encoding up front so it is in scope for both request and response
    String encoding = selectEncoding(request);

    // Conditionally set the character encoding to be used for the request
    if (encoding != null && (ignore || request.getCharacterEncoding() == null)) {
        request.setCharacterEncoding(encoding);
    }

    // Set the response encoding as well (the addition to the example filter)
    if (encoding != null) {
        response.setCharacterEncoding(encoding);
    }

    // Pass control on to the next filter
    chain.doFilter(request, response);
}

Tested with Tomcat 5 as bundled with JBoss 4.2.2-GA.

Addendum

In my case the parameters eventually ended up in an Oracle database via Hibernate. The funny thing was that the query parameter (interpreted as ISO-8859-1) was converted to UTF-8 by the JDBC conversion and stored in an NVARCHAR2 column. How do you fix the database content? The following SQL might come in handy:

update my_table set value = to_nchar(convert(to_char(value), 'WE8ISO8859P1', 'UTF8'));

As always with encoding problems, good luck, you will need it.
