How to convert .doc and .docx files to .html

Problem description:

How can I convert a .doc or .docx file to .html?

First and foremost, you should understand that the problem is ambiguous: there is no single one-to-one correspondence between the Word formats, with their data and rendering model, and HTML. So if you develop such a function, it should accept not one parameter (the document itself) but two: it should also receive a set of mapping rules, which can differ and therefore produce different results.
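
To make the two-parameter idea concrete, here is a minimal sketch in Python. The names to_html and DEFAULT_RULES, and the representation of a document as (style name, text) pairs, are assumptions made for illustration only; they do not come from any Word library.

```python
from html import escape

# Hypothetical mapping rules: Word paragraph style name -> HTML tag.
DEFAULT_RULES = {"Heading 1": "h1", "Heading 2": "h2", "Normal": "p"}


def to_html(paragraphs, rules, fallback_tag="p"):
    """Render (style_name, text) pairs as HTML using the given mapping rules."""
    lines = []
    for style, text in paragraphs:
        # Styles that the rules do not mention fall back to a plain paragraph.
        tag = rules.get(style, fallback_tag)
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)
```

Calling to_html(doc, DEFAULT_RULES) and to_html(doc, {"Heading 1": "div"}) on the same document gives different markup, which is exactly the ambiguity described above.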

To read/parse Word documents, you can use Microsoft Office Interop for Word. The assembly is placed in the GAC when you install Office, so you can reference it from the ".NET" tab of the "Add Reference" window. Please see:
http://en.wikipedia.org/wiki/Visual_Studio_Tools_for_Office,
http://msdn.microsoft.com/en-us/library/ff601860.aspx,
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.aspx.

This article may also be useful:
http://www.dotnetperls.com/word.

If you want to work with the Word formats without installing Office, you can still do it. After all, OpenOffice, LibreOffice and other products read and write these formats; please see:
http://en.wikipedia.org/wiki/OpenOffice.org,
http://en.wikipedia.org/wiki/LibreOffice.

These products are open source, so you can always download the source code and see how the conversion is implemented.
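
Besides reading the source, you can also let LibreOffice do the conversion itself from the command line. The sketch below assumes LibreOffice is installed and that its soffice executable is on the PATH; the function names build_command and convert_to_html are made up for this example.

```python
import subprocess
from pathlib import Path


def build_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless conversion to HTML."""
    return [soffice, "--headless", "--convert-to", "html",
            "--outdir", str(out_dir), str(source)]


def convert_to_html(source, out_dir, soffice="soffice"):
    """Run the conversion and return the path LibreOffice is expected to write."""
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_command(source, out_dir, soffice), check=True)
    # Assumption: the output keeps the input's base name with an .html extension.
    return Path(out_dir) / (Path(source).stem + ".html")
```

For example, convert_to_html("report.docx", "out") should leave out/report.html behind. The resulting markup reflects LibreOffice's own mapping rules, not yours.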

If you only need to support the newer Office Open XML format (.docx), its specification is publicly available and standardized as ECMA-376 and ISO/IEC 29500:2008:
http://en.wikipedia.org/wiki/Office_Open_XML,
http://en.wikipedia.org/wiki/Office_Open_XML_software.
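
Because a .docx file is a ZIP package of XML parts, a very rough text-only conversion is possible with nothing but the Python standard library. This is a sketch under stated assumptions, not a full converter: it assumes the main part is stored at word/document.xml (the usual location; strictly it should be found through the package relationships), and it ignores styles, tables, images and nested content such as text boxes. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML main namespace, in ElementTree's {uri}name notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Each w:p is a paragraph; its text lives in the w:t elements of its runs.
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]


def docx_to_html(path_or_file):
    """Wrap every paragraph in <p>, discarding all formatting."""
    return "\n".join(f"<p>{escape(text)}</p>"
                     for text in docx_paragraphs(path_or_file))
```

Here the mapping rule is hard-coded (every paragraph becomes a p element); a real converter would also read the paragraph properties and map styles to tags.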

Please see the comparison chart of Office Open XML software:
http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_software.

Since the source code of some of this software is available and open, you can use it.

—SA


http://*.com/questions/8135901/converting-docx-to-html

http://janewdaisy.wordpress.com/2012/04/06/how-to-convert-word-document-to-html-with-cvb-net/