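POI can fail to open résumés downloaded from 51job (前程无忧), or at least it did when I tried. The reason is that some of them are really MHT files that were simply saved with a Word extension. To be safe, I suggest reading them with Jacob, which drives Word itself through COM and so can open anything Word can. If the file is a genuine doc or docx, you can convert it to HTML and then parse it with jsoup, which works quite well.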
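Since the extension can't be trusted, it may help to check what the file really is before picking a reader. Below is a rough sketch that looks at the first bytes of the file. The class `WordFormatSniffer` and its methods are names I made up for illustration; they are not part of POI or Jacob. The MHT check is only a heuristic that assumes the file starts with ordinary MIME headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: guesses the real container format from the leading
// bytes instead of trusting the file extension.
final class WordFormatSniffer {

    enum Kind { OLE2_DOC, ZIP_DOCX, MHT, UNKNOWN }

    // Signature at the start of every OLE2 compound file (legacy .doc).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    // ZIP local-file header; a .docx starts with it (so do other ZIP files).
    private static final byte[] ZIP = { 'P', 'K', 3, 4 };

    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2)) return Kind.OLE2_DOC;
        if (startsWith(head, ZIP)) return Kind.ZIP_DOCX;
        // Heuristic (an assumption): MHT is MIME text, so look for a
        // MIME-Version header near the top of the file.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        if (text.contains("mime-version:")) return Kind.MHT;
        return Kind.UNKNOWN;
    }

    static Kind sniffFile(String path) throws IOException {
        byte[] buf = new byte[512];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            // read() may return fewer bytes than requested, so loop
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(buf, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With something like this, files that come back as `MHT` or `UNKNOWN` could be sent to Jacob, and the rest handled however you prefer.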
Here is the conversion code:
package com.java.doc;

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class JacobRead {

    // Word SaveAs format code: 7 = wdFormatUnicodeText (plain text).
    // Use 8 (wdFormatHTML) with an .html output path to get HTML instead.
    private static final int SAVE_FORMAT = 7;

    public static void extractDoc(String inputFile, String outputFile) {
        boolean success = false;
        // Start the Word application (needs Windows with Word installed)
        ActiveXComponent app = new ActiveXComponent("Word.Application");
        try {
            // Keep the Word window hidden
            app.setProperty("Visible", new Variant(false));
            // Open the file: ConfirmConversions = false, ReadOnly = true
            Dispatch documents = app.getProperty("Documents").toDispatch();
            Dispatch doc = Dispatch.invoke(
                    documents,
                    "Open",
                    Dispatch.Method,
                    new Object[] { inputFile, new Variant(false), new Variant(true) },
                    new int[1]).toDispatch();
            // Save a copy to the output file in the chosen format
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method,
                    new Object[] { outputFile, new Variant(SAVE_FORMAT) }, new int[1]);
            // Close the document without saving changes
            Dispatch.call(doc, "Close", new Variant(false));
            success = true;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always quit Word so no hidden Word process is left running
            app.invoke("Quit", new Variant[] {});
        }
        System.out.println(success ? "Transformed Successfully" : "Transform Failed");
    }

    public static void main(String[] args) {
        JacobRead.extractDoc("D:/xxxx简历.doc", "D:/e.txt");
    }
}
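Note that the code above saves as Unicode text, because the format code passed to SaveAs is 7. For HTML, pass 8 instead and give it an .html output path; the rest of the method stays the same.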
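If you want one method to handle both outputs, you could pick the format code from the output file name. This is only a sketch: `WordSaveFormats` and `forOutputFile` are names I invented, and the numbers are the values of Word's `WdSaveFormat` constants named in the comments.

```java
import java.util.Locale;

// Hypothetical helper: chooses the Word SaveAs format code from the
// extension of the output file.
final class WordSaveFormats {

    static final int UNICODE_TEXT = 7;   // wdFormatUnicodeText
    static final int HTML = 8;           // wdFormatHTML
    static final int FILTERED_HTML = 10; // wdFormatFilteredHTML, less Office-specific markup

    static int forOutputFile(String outputFile) {
        String name = outputFile.toLowerCase(Locale.ROOT);
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return HTML;
        }
        if (name.endsWith(".txt")) {
            return UNICODE_TEXT;
        }
        throw new IllegalArgumentException("Unsupported output type: " + outputFile);
    }
}
```

In the Jacob code you would then pass `new Variant(WordSaveFormats.forOutputFile(outputFile))` to SaveAs in place of the hard-coded 7.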
And here is the POI-based reader I wrote earlier:
package TestHanLp;

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Test {

    public static String Read(String path) throws IOException {
        String lower = path.toLowerCase();
        if (lower.endsWith(".docx")) {
            // Parse a résumé in docx format
            try (FileInputStream fis = new FileInputStream(path);
                 XWPFDocument document = new XWPFDocument(fis)) {
                return new XWPFWordExtractor(document).getText();
            }
        }
        if (lower.endsWith(".doc")) {
            // Parse a résumé in doc format
            try (FileInputStream fis = new FileInputStream(path)) { // load the document
                WordExtractor wordExtractor = new WordExtractor(fis);
                StringBuilder sb = new StringBuilder();
                for (String paragraph : wordExtractor.getParagraphText()) {
                    if (paragraph != null) {
                        // Strip the trailing paragraph terminator (CR/LF)
                        sb.append(paragraph.replaceAll("[\\r\\n]+$", ""));
                    }
                    sb.append("\n"); // keep one paragraph per line
                }
                return sb.toString();
            }
        }
        // Neither .doc nor .docx: nothing to extract
        return "";
    }
}
Hope this helps; please accept the answer if it does.
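One more thing: Jacob gives better results than POI when reading Word documents, but it is slower because it has to launch Word through COM, so weigh that trade-off for your own use case.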
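One way to balance the two, if it suits your case, is to try the fast reader first and only fall back to the slow one when the fast one fails or returns nothing. The sketch below is generic: `FallbackReader` and `readWithFallback` are made-up names, and you would plug in your own POI and Jacob calls as the two arguments.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: runs the fast reader first and uses the slow one
// only when the fast one throws or yields no text.
final class FallbackReader {

    static String readWithFallback(Callable<String> fast, Callable<String> slow)
            throws Exception {
        try {
            String text = fast.call();
            if (text != null && !text.trim().isEmpty()) {
                return text;
            }
        } catch (Exception e) {
            // The fast reader could not handle the file; fall through
        }
        return slow.call();
    }
}
```

For example, `fast` could wrap the POI reader above and `slow` could wrap the Jacob conversion followed by reading the output file.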