Oct 12, 2007

Struts, JSP, MySQL with Unicode support

Getting JSP, MySQL and Struts to work with a Unicode character set should have been an easy task, and I found ample references on the web. Still, I ran into a lot of complications while setting up a JSP-based website that takes Bangla input (a script encoded in Unicode) from users, processes the input through the Struts framework, stores it in a MySQL database and displays it again on the same website. For Bangla text, the JSPs must support Unicode. The underlying database is MySQL, and we used Struts to implement the MVC architecture. Our task was to provide UTF-8 character set support in every layer.
Here is what we actually did to make it work:

Make the database Unicode compatible:


MySQL’s default character encoding is Latin1. We have to change it to utf-8 for Unicode compatibility.
    • Install MySQL 5. If a previous version is already installed, uninstall it first.
    • If you need the data already inserted in the database, don't forget to keep a backup. Dump it to a file using mysqldump; here we call the file dump.sql.
    • Every database has a database character set and a database collation.
    • The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:
CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]


See the MySQL reference manual for more.
For example:

DROP DATABASE dbname;
CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;
    • Replace all instances of latin1 with utf8 in the dump.sql file.
    • Load the dump into the new database by running the script dump.sql. If you are creating a whole new database without any previous data, this step is not needed.

    • If you are uncertain about the character set or collation of the result returned by a string function, you can use the CHARSET() or COLLATION() function to find it.
    • For example:
mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());

USER()        | CHARSET(USER()) | COLLATION(USER())
--------------+-----------------+------------------
dba@localhost | utf8            | utf8_general_ci
    • Now the database is Unicode compatible, but the JDBC driver (the database connector) has to know it too, so we tell it through the connection string. Append this to the connection string: useUnicode=true&characterEncoding=UTF-8
For example:


String username = "root";
String password = "";
String url =
    "jdbc:mysql://hostname/databasename?useUnicode=true&characterEncoding=UTF-8";
try {
    // Load the MySQL driver, then open the connection
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    this.connection = DriverManager.getConnection(url, username, password);
} catch (Exception ex) {
    // Covers ClassNotFoundException, SQLException and reflection errors
    ex.printStackTrace();
}

Thus we are done with the database.
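
To see why the Latin1 default mentioned above cannot hold Bangla text, here is a small sketch in plain Java. It only illustrates how Java's own charsets behave, not MySQL itself, and the class and method names (Latin1LossDemo, roundTrip, viaLatin1, viaUtf8) are made up for this example. The Bangla word is written with \u escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1LossDemo {

    // Hypothetical helper: encode the text to bytes in the given charset
    // (roughly what "storing" it means), then decode the bytes back.
    static String roundTrip(String text, Charset charset) {
        byte[] stored = text.getBytes(charset); // unmappable characters are replaced
        return new String(stored, charset);
    }

    static String viaLatin1(String text) {
        return roundTrip(text, StandardCharsets.ISO_8859_1);
    }

    static String viaUtf8(String text) {
        return roundTrip(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word "Bangla" written in Bangla script (5 characters)
        String bangla = "\u09AC\u09BE\u0982\u09B2\u09BE";
        System.out.println("Latin1 keeps the text: " + viaLatin1(bangla).equals(bangla));
        System.out.println("UTF-8 keeps the text:  " + viaUtf8(bangla).equals(bangla));
    }
}
```

Through Latin1 every Bangla character is replaced by a question mark, while UTF-8 returns the original text unchanged.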

Page encoding:

The JSP pages must declare UTF-8, so that your browser knows which character set to use. Put the following line at the top of each page; it ensures that the Content-Type header is set appropriately:
<%@page contentType="text/html;charset=UTF-8"%>

To also set the page encoding (the encoding of the JSP file itself) to UTF-8, this is what you should write:

<%@page  contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
Put the correct meta tag in your HTML pages:
  • <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

If you're using forms, be sure to add these attributes to the form tag:

method="post"
accept-charset="UTF-8"
enctype="multipart/form-data"

As we are running the web application in Tomcat, it is better to add the parameter
-Dfile.encoding=UTF-8
to the Java options in the catalina.bat/catalina.sh file, to tell the JVM that you are using UTF-8.
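
To check whether the option was picked up, one possibility is a small sketch like the following, run with the same Java options as Tomcat. The class and method names (DefaultCharsetCheck, describe) are hypothetical, and the values printed depend on your JVM version and platform.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Report the file.encoding system property and the JVM's default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run as: java -Dfile.encoding=UTF-8 DefaultCharsetCheck
        System.out.println(describe());
    }
}
```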
Filter for Struts Action :

As I have already said, we are using Struts actions for submitting our forms, so we have to make sure that our encoding stays intact through Struts. The default character encoding applied to requests on their way to Struts is ISO-8859-1. So we use a filter, and pass every request and response through it, so that Unicode characters are displayed properly. To do this, just create a Java class, say Filter.java.

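import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class Filter implements javax.servlet.Filter {

    private FilterConfig filterConfig;

    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
    }

    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse,
            FilterChain filterChain) throws IOException, ServletException {
        // Decode incoming request parameters as UTF-8
        servletRequest.setCharacterEncoding("UTF-8");
        // Encode the outgoing response as UTF-8
        servletResponse.setCharacterEncoding("UTF-8");
        // Set the content type in the header of the response
        servletResponse.setContentType("text/html;charset=UTF-8");
        // Pass the request on to the next filter or the target resource
        filterChain.doFilter(servletRequest, servletResponse);
    }

    public void destroy() {}
}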
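
To see what the filter protects against, here is a rough sketch in plain Java, with no servlet container involved. It imitates what happens when the UTF-8 bytes sent by the browser are decoded as ISO-8859-1. The class and method names (MojibakeDemo, misdecode, repair) are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Imitate a request read without the filter: the browser sends UTF-8
    // bytes, but they are decoded as ISO-8859-1.
    static String misdecode(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: get the original bytes back through ISO-8859-1,
    // then decode them as UTF-8, which is what they were all along.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word "Bangla" written in Bangla script (5 characters)
        String bangla = "\u09AC\u09BE\u0982\u09B2\u09BE";
        String garbled = misdecode(bangla);
        System.out.println("Characters after wrong decoding: " + garbled.length());
        System.out.println("Repaired equals original: " + repair(garbled).equals(bangla));
    }
}
```

Each Bangla character takes three bytes in UTF-8, so the wrongly decoded string has three garbage characters per original character. Setting the request encoding in the filter avoids the wrong decoding in the first place, so no repair step is needed.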

Register this filter in the web.xml file and define its mapping there as well.
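
The registration could look roughly like the fragment below, placed inside the web-app element of web.xml. The filter name encodingFilter is just a name I picked for this sketch, and the filter-class value assumes the Filter class above sits in the default package; if yours is in a package, use the fully qualified class name. The /* pattern sends every request through the filter.

```xml
<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>Filter</filter-class>
</filter>

<filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```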

Hopefully this is all we need. Still, if anything goes wrong, don't worry: stay patient and keep a cool head.

That is the lesson I learned while working on this project, which now supports Unicode in all its layers.