Convert HTML to Word with Python: A Simple Three-Step Guide
“How to convert HTML text to MS Word Doc?”
People have been asking how to convert between HTML and Word documents for years, with questions on the topic appearing on forums as early as 2012. Whether you are building reports or saving web content for offline reading, converting HTML to Word is a common task. In this guide, we’ll show how to do it in Python so you can automate the conversion instead of copying and pasting by hand.
HTML vs. Word Document
Before getting to the main topic, let’s briefly look at what HTML and Word documents are.
HTML (HyperText Markup Language), according to Wikipedia, is the standard markup language for documents designed to be displayed in a web browser. It defines the structure and semantics of a page through elements such as headings, paragraphs, lists, links, and quotes, and it allows images and other objects, like interactive forms, to be embedded in the rendered page.
A Word document is a file created with a word processing application such as Microsoft Word. It usually contains text, images, tables, and so on. The biggest practical difference is that a Word document is a single file that can be opened, edited, and shared offline in a word processor, while an HTML page is meant to be rendered by a browser and often depends on external stylesheets and images. Word documents are also page-oriented (page size, margins, headers and footers), which makes them convenient for printing and for reading as standalone documents.
Python Library to Convert HTML to Word
In this guide, we will use Spire.Doc for Python to accomplish the task. It is a Word library that allows developers to create, edit, and convert Word files, and it can convert HTML to Word in a few lines of code while preserving much of the original formatting.
You can install it with pip:
pip install Spire.Doc
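Before running the examples, it can help to confirm that the package is actually installed in the active environment. The following is a minimal sketch that uses only the standard library. The helper name installed_version is hypothetical, and the distribution name "Spire.Doc" is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "Spire.Doc" is the distribution name used in the pip command above.
    version = installed_version("Spire.Doc")
    if version is None:
        print("Spire.Doc is not installed; run: pip install Spire.Doc")
    else:
        print(f"Spire.Doc {version} is installed")
```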
How to Convert an HTML File to a Word Document in Three Steps
Now that the setup is complete, let’s move on to the main process: converting an HTML file to a Word document in three straightforward steps. This section lists the steps first and then gives a complete code example. Let’s begin!
Steps to convert an HTML file to Word:
Create an object of the Document class.
Load an HTML file from local storage using the Document.LoadFromFile() method.
Save the document as a Word file with the Document.SaveToFile() method, and then release resources with Document.Close().
Here is a code example that converts an HTML file to a Word document:
from spire.doc import *
from spire.doc.common import *
# Create an object of the Document class
document = Document()
# Load an HTML file
document.LoadFromFile("Input.html", FileFormat.Html, XHTMLValidationType.none)
# Save the document as a .docx file
document.SaveToFile("HtmlToWord.docx", FileFormat.Docx2016)
# Release resources
document.Close()
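If the conversion runs inside a larger script, it may be worth wrapping the three steps in a function that checks the input path and always closes the document, even when loading or saving fails. The sketch below is one way to do that. The function name convert_html_file is hypothetical, and the explicit import assumes that spire.doc exposes Document, FileFormat, and XHTMLValidationType directly, as the star imports above suggest.

```python
from pathlib import Path


def convert_html_file(src: str, dst: str) -> Path:
    """Convert the HTML file at src to a .docx file at dst (hypothetical helper)."""
    src_path = Path(src)
    dst_path = Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"HTML input not found: {src_path}")

    # Imported here so the path check above runs even if Spire.Doc is missing.
    # Assumption: these names are importable from spire.doc directly.
    from spire.doc import Document, FileFormat, XHTMLValidationType

    document = Document()
    try:
        document.LoadFromFile(str(src_path), FileFormat.Html, XHTMLValidationType.none)
        document.SaveToFile(str(dst_path), FileFormat.Docx2016)
    finally:
        # Release resources even if loading or saving raised an exception.
        document.Close()
    return dst_path
```

A call such as convert_html_file("Input.html", "HtmlToWord.docx") would then perform the same conversion as the example above.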
How to Convert an HTML String to Word
Sometimes, converting an entire HTML file to Word is unnecessary. A full web page typically includes text, images, links, and other elements, yet you may only need part of it, such as the text. In such cases, you can pass just the HTML you need as a string, which requires a slightly different approach. The steps below walk through handling HTML strings for a more targeted conversion.
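If only the text of a page is needed, one possible way to prepare the string is to strip the markup first with the standard library and rebuild a simple HTML fragment from the result. The sketch below is an illustration rather than part of Spire.Doc. The names TextExtractor, extract_text, and to_html_fragment are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <title> content."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the non-empty text chunks found in the HTML, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def to_html_fragment(chunks: list[str]) -> str:
    """Wrap each text chunk in a <p> element, escaping special characters."""
    return "".join(f"<p>{escape(chunk)}</p>" for chunk in chunks)
```

The fragment returned by to_html_fragment is an ordinary HTML string, so it can be used wherever an HTML string is expected.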
Steps to convert an HTML string to Word:
Create an instance of the Document class.
Add a section to the document using the Document.AddSection() method, then add a paragraph to the section using the Section.AddParagraph() method.
Specify the HTML string.
Append the string to the paragraph with the Paragraph.AppendHTML() method.
Save the resulting document by calling the Document.SaveToFile() method, and then release resources with Document.Close().
Below is a code example that converts an HTML string to a Word document:
from spire.doc import *
from spire.doc.common import *
# Create an object of the Document class
document = Document()
# Add a section to the document
sec = document.AddSection()
# Add a paragraph to the section
paragraph = sec.AddParagraph()
# Specify the HTML string
htmlString = """
<html>
<head>
<title>HTML to Word Example</title>
<style>
body {
font-family: Arial, sans-serif;
}
h1 {
color: #FF5733;
font-size: 24px;
margin-bottom: 20px;
}
p {
color: #333333;
font-size: 16px;
margin-bottom: 10px;
}
ul {
list-style-type: disc;
margin-left: 20px;
margin-bottom: 15px;
}
li {
font-size: 14px;
margin-bottom: 5px;
}
table {
border-collapse: collapse;
width: 100%;
margin-bottom: 20px;
}
th, td {
border: 1px solid #CCCCCC;
padding: 8px;
text-align: left;
}
th {
background-color: #F2F2F2;
font-weight: bold;
}
td {
color: #0000FF;
}
</style>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph demonstrating the conversion of HTML to Word document.</p>
<p>Here's an example of an unordered list:</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<p>And here's a table:</p>
<table>
<tr>
<th>Product</th>
<th>Quantity</th>
<th>Price</th>
</tr>
<tr>
<td>Jacket</td>
<td>30</td>
<td>$150</td>
</tr>
<tr>
<td>Sweater</td>
<td>25</td>
<td>$99</td>
</tr>
</table>
</body>
</html>
"""
# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)
# Save the resulting document
document.SaveToFile("HtmlStringToWord.docx", FileFormat.Docx2016)
# Close the document
document.Close()
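The same steps can be collected into a small reusable function that rejects empty input and closes the document even if an error occurs. This is a sketch under the same assumption as before, namely that Document and FileFormat can be imported from spire.doc directly. The function name html_string_to_docx is hypothetical.

```python
from pathlib import Path


def html_string_to_docx(html: str, dst: str) -> Path:
    """Write an HTML string into a new .docx file at dst (hypothetical helper)."""
    if not html.strip():
        raise ValueError("html must contain at least one non-whitespace character")
    dst_path = Path(dst)

    # Imported here so the input check above runs even if Spire.Doc is missing.
    # Assumption: these names are importable from spire.doc directly.
    from spire.doc import Document, FileFormat

    document = Document()
    try:
        # A paragraph inside a section is the container the HTML is appended to.
        paragraph = document.AddSection().AddParagraph()
        paragraph.AppendHTML(html)
        document.SaveToFile(str(dst_path), FileFormat.Docx2016)
    finally:
        # Release resources even if appending or saving raised an exception.
        document.Close()
    return dst_path
```

For example, html_string_to_docx("<h1>Title</h1><p>Body</p>", "HtmlStringToWord.docx") would write a two-element document.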
The Bottom Line
This article covered two ways to convert HTML to Word with Python: saving a complete HTML file as a Word document and appending an HTML string to a new document. With the step-by-step instructions and example code in each section, you should be able to turn HTML into polished Word documents with just a few lines of code.
Written by Casie Liu