Java HTTP Client: the fetched page is garbled no matter which encoding I use?
Problem description:
http://www.licai.com/xuetang/CiDian.aspx?dj=1&type=&page=1
// Send the GET request.
client.executeMethod(get);
String statusText = get.getStatusText();
//System.out.println("Test.main():--->" + statusText);
// Attempted fix: re-encode the already decoded body as GB2312 bytes, then read those bytes as UTF-8.
System.out.println("Test.main():--->" + new String(get.getResponseBodyAsString().getBytes("GB2312"), "UTF-8"));
// Read the body as a stream and decode it with the chosen charset.
InputStream in = get.getResponseBodyAsStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in, charset));
String tempbf;
html = new StringBuffer(100);
while ((tempbf = br.readLine()) != null) {
    html.append(tempbf + "\n");
}
The code is roughly like this.
Answer
// The default client class.
HttpClient client = new DefaultHttpClient();
// Use a GET request for the connection.
HttpGet get = new HttpGet(url);
// Get the returned response.
HttpResponse response = client.execute(get);
// Get the entity object carried by the response.
HttpEntity entity = response.getEntity();
if (entity != null) {
    System.out.println(entity.getContentEncoding());
    System.out.println(entity.getContentType());
    // Get the returned body content.
    InputStream instream = entity.getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(instream, encoding));
    System.out.println(reader.readLine());
    // EntityUtils is a utility class for working with HttpEntity.
    // System.out.println(EntityUtils.toString(entity));
}
// Close the connection.
client.getConnectionManager().shutdown();
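One way to choose the `encoding` passed to `InputStreamReader` above is to read it from the response instead of guessing. What follows is only a minimal sketch using the JDK alone, not part of HttpClient. The class `CharsetSniffer` and its method names are hypothetical. It assumes the raw body bytes and the Content-Type header value are already in hand, and its UTF-8 fallback is an assumption, not something the page guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks a charset from a declaration, then decodes the bytes once.
final class CharsetSniffer {
    // Matches "charset=xxx" in a Content-Type value or in an HTML <meta> tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset, or the fallback if none is declared or it is unknown.
    static Charset fromDeclaration(String text, Charset fallback) {
        if (text == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    // Decodes the raw body: header first, then a <meta> declaration, then UTF-8 (assumed default).
    static String decode(byte[] body, String contentType) {
        Charset cs = fromDeclaration(contentType, null);
        if (cs == null) {
            // ISO-8859-1 maps every byte to one char, so scanning the head this way is lossless.
            String head = new String(body, 0, Math.min(body.length, 2048),
                    StandardCharsets.ISO_8859_1);
            cs = fromDeclaration(head, StandardCharsets.UTF_8);
        }
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // Simulated response: GB2312 bytes with the charset declared in the header.
        byte[] body = "\u4e2d\u6587".getBytes(Charset.forName("GB2312"));
        System.out.println(decode(body, "text/html; charset=gb2312"));
    }
}
```

With HttpClient 4.x, the bytes could come from `EntityUtils.toByteArray(entity)` and the header value from `entity.getContentType()`. The key point is that the bytes are decoded exactly once with the right charset. Re-encoding an already decoded string, as in the question, cannot recover characters that were lost in the first decode.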
Answer
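Use get to read the ContentEncoding of the response and see what encoding the returned content actually uses. Keep in mind that Content-Encoding usually reports compression such as gzip, while the character set is normally declared in the charset parameter of Content-Type.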
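If the Content-Encoding checked above turns out to be gzip and the client does not decompress the body automatically, the bytes have to be unzipped before any charset decoding, or every charset will produce garbage. Here is a rough sketch of that step with the JDK only. `BodyUnwrapper` and `unwrap` are hypothetical names, and only gzip is handled. Whether this particular page is actually compressed is not known from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: undoes gzip compression before the bytes are decoded as text.
final class BodyUnwrapper {
    // Returns the decompressed bytes for "gzip", otherwise the input unchanged.
    static byte[] unwrap(byte[] raw, String contentEncoding) throws IOException {
        boolean gzip = contentEncoding != null
                && contentEncoding.trim().equalsIgnoreCase("gzip");
        if (!gzip) {
            return raw;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(raw));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        }
        byte[] plain = unwrap(bos.toByteArray(), "gzip");
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

The unwrapped bytes can then be decoded with whatever charset the Content-Type header or the page's meta tag declares.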