Parameter charset conversion in Struts2
I have a Struts2 web application that accepts both POST and GET requests in many different charsets, converts them to UTF-8, displays the correct UTF-8 characters on screen, and then writes them to a UTF-8 database.
To start with, I have tried at least 5 different methods for a simple lossless charset conversion from windows-1250 to UTF-8, and none of them worked. Since UTF-8 is the "larger set", the conversion should work without a problem (at least that is my understanding).
Can you suggest how to do a charset conversion from windows-1250 to UTF-8? And is it possible that Struts2 is doing something odd with the charset of the params, which would explain why I can't seem to get it right?
Here is my latest attempt:
String inputData = getSimpleParamValue("some_input_param_from_get");
Charset inputCharset = Charset.forName("windows-1250");
Charset utfCharset = Charset.forName("UTF-8");
CharsetDecoder decoder = inputCharset.newDecoder();
CharsetEncoder encoder = utfCharset.newEncoder();
String decodedData = "";
try {
    ByteBuffer inputBytes = ByteBuffer.wrap(inputData.getBytes()); // I've tried putting UTF-8 here as well, with no luck
    CharBuffer chars = decoder.decode(inputBytes);
    ByteBuffer utfBytes = encoder.encode(chars);
    decodedData = new String(utfBytes.array());
} catch (CharacterCodingException e) {
    logger.error(e);
}
Any ideas on what to try to get this working?
Thank you and best regards,
Bozo
I'm not sure I understand your scenario. In Java, a String is always Unicode; charset conversion only comes into play when converting between a String and a binary representation. In your example, by the time getSimpleParamValue("some_input_param_from_get") is called, inputData should already hold the "correct" String: the conversion from the stream of bytes (which travelled from the client browser to the web server) to a String should already have taken place, and it is the responsibility of the web server and the web layer of your application. For this reason, I enforce UTF-8 for web transmission by placing a filter in web.xml (before Struts), for example:
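import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CharsetFilter implements Filter {
    public void destroy() {}

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        // Must run before any parameter is read, otherwise it has no effect.
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
        String contentType = res.getContentType();
        if (contentType != null && contentType.startsWith("text/html"))
            res.setCharacterEncoding("UTF-8");
    }

    public void init(FilterConfig filterConfig) throws ServletException {}
}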
If you cannot do this, and your getSimpleParamValue() "errs" in the charset conversion (e.g., it assumed the byte stream was UTF-8 when it was actually windows-1250), you now have an "incorrect" String, and you must try to recover it by undoing and redoing the byte-to-String conversion. To do that you must know both the wrong and the correct charset and, worse, deal with the possibility of missing characters (if the bytes were interpreted as UTF-8, the decoder may have found illegal byte sequences). If you have to deal with this inside a Struts2 action, I'd say you are in trouble; you should handle it explicitly before or after the action (in the upper web layer, in the database driver, in the file encoding, or wherever it applies).
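To make the "undo and redo" idea concrete, here is a minimal sketch. The class and method names (CharsetRecovery, redecode) are made up for illustration, and it assumes the JRE ships the windows-1250 charset. It simulates a container that decoded windows-1250 bytes with the wrong charset, once with ISO-8859-1 (which maps every byte to a character, so nothing is lost) and once with UTF-8 (where illegal sequences are replaced and the original bytes are gone).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRecovery {
    // Hypothetical helper: turn the wrongly decoded String back into the bytes
    // it came from, then decode those bytes with the charset they were really in.
    static String redecode(String garbled, Charset wrongCharset, Charset actualCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset);
        return new String(originalBytes, actualCharset);
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        String original = "\u010desky"; // "česky"
        byte[] wire = original.getBytes(cp1250); // what the browser sent

        // Case 1: bytes were wrongly decoded as ISO-8859-1. Recoverable.
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);
        String recovered = redecode(garbled, StandardCharsets.ISO_8859_1, cp1250);
        System.out.println(recovered.equals(original)); // prints true

        // Case 2: bytes were wrongly decoded as UTF-8. The lone byte 0xE8 is an
        // illegal sequence and becomes U+FFFD, so the original byte is lost.
        String lossy = new String(wire, StandardCharsets.UTF_8);
        String notRecovered = redecode(lossy, StandardCharsets.UTF_8, cp1250);
        System.out.println(notRecovered.equals(original)); // prints false
    }
}
```

Note that the result is an ordinary Java String in both cases; there is no separate "UTF-8 String" to convert to. UTF-8 only matters again when the String is written out as bytes, for example to the response or the database.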