Converting a DOCX file directly to EPUB using Calibre

Calibre recently added DOCX as an input format for its converter, so it is now possible to convert directly from DOCX to EPUB. The procedure is quite simple. We’ll use the test DOCX document provided by the Calibre team.

Add the DOCX file to your library by dragging and dropping it into the Calibre window.

Click on Convert Books in the top bar.

Select “DOCX” as the input format and “EPUB” as the output format, add any metadata and click OK.

You should now see EPUB listed among the book’s formats on the right. Click it to open the EPUB in the Calibre viewer, or right-click it to find the option to save the file to your disk.
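Calibre also ships a command-line converter, `ebook-convert`, which performs the same conversion without the GUI. The sketch below assumes Calibre’s command-line tools are on your PATH and simply builds and runs that command from Python. The file name `demo.docx` and the metadata values are hypothetical placeholders, and the `--title`/`--authors` options stand in for the metadata you would otherwise type into the conversion dialog.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, title=None, authors=None):
    """Build an ebook-convert command line that turns a DOCX file into an EPUB.

    The output path is the input path with its extension swapped to .epub.
    """
    src = Path(docx_path)
    dst = src.with_suffix(".epub")
    cmd = ["ebook-convert", str(src), str(dst)]
    # Optional metadata, roughly what you would fill in via the GUI dialog
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    return cmd


def convert(docx_path, **metadata):
    """Run the conversion, assuming Calibre's command-line tools are installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(docx_path, **metadata), check=True)


if __name__ == "__main__":
    # Hypothetical input file and metadata, only to show the resulting command
    print(build_convert_command("demo.docx", title="Demo", authors="Jane Doe"))
```

Without `--title` and `--authors`, Calibre fills in defaults, just as it does when no metadata is entered in the GUI.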

Now let’s have a look at the output EPUB.

Calibre added a default cover (with a generated title and author) because we didn’t specify one.

Apart from some minor glitches, all the features of the DOCX document are preserved.

Now let’s look inside the EPUB. Since an EPUB is a ZIP archive, we need to unzip it: we can either change the extension to “.zip” or use the Epub UnZip tool.
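Python’s standard `zipfile` module can unpack an EPUB as well. A minimal sketch (the helper name `unzip_epub` is hypothetical) that also checks the `mimetype` entry, which in a well-formed EPUB contains `application/epub+zip`:

```python
import zipfile


def unzip_epub(epub_path, dest_dir):
    """Extract an EPUB (a ZIP archive) into dest_dir and return its sorted member names."""
    with zipfile.ZipFile(epub_path) as zf:
        # A valid EPUB has a "mimetype" entry whose content is "application/epub+zip"
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError(f"not an EPUB, unexpected mimetype: {mimetype!r}")
        # extractall creates dest_dir and any subfolders (e.g. META-INF) as needed
        zf.extractall(dest_dir)
        return sorted(zf.namelist())
```

The returned list is a quick way to see the split HTML files, the `content.opf` package file and the table of contents that Calibre generated.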

The nice thing is that the conversion keeps (or creates) a table of contents and splits the document into several HTML files. From the test document itself:

There are two approaches that calibre takes when generating a Table of Contents. The first is if the Word document has a Table of Contents itself. Provided that the Table of Contents uses hyperlinks, calibre will automatically use it […] If no Table of Contents is found in the document, then a table of contents is automatically generated from the headings in the document. A heading is identified as something that has the Heading 1 or Heading 2, etc. style applied to it. These headings are turned into a Table of Contents with Heading 1 being the topmost level, Heading 2 the second level and so on.
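The heading rule quoted above can be illustrated with a small sketch. It assumes the document has already been reduced to a hypothetical list of `(style, text)` pairs, one per paragraph; it illustrates the described behaviour and is not Calibre’s actual implementation.

```python
import re


def build_toc(paragraphs):
    """Nest (style, text) pairs into a TOC: Heading 1 is the top level, Heading 2 below it, etc.

    Returns a list of nodes of the form {"title": ..., "children": [...]}.
    """
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node) pairs; level 0 is an invisible root
    for style, text in paragraphs:
        m = re.fullmatch(r"Heading ([1-9]\d*)", style)
        if not m:
            continue  # body text and other styles do not appear in the TOC
        level = int(m.group(1))
        node = {"title": text, "children": []}
        # Pop until the top of the stack is a shallower heading: that is the parent
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]
```

A stack keeps the most recent heading at each level, so a Heading 2 attaches to the Heading 1 before it, and a new Heading 1 closes all deeper levels.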

This means that by merging several DOCX files into a single one, it is possible to obtain an EPUB containing multiple chapters, and therefore an entire book.

The resulting EPUB passes validation, and cleaning up its code wouldn’t take much effort, especially if the document doesn’t use many styles.

Metadata should be added or modified in the content.opf file.
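Since `content.opf` is plain XML, with Dublin Core elements such as `dc:title` and `dc:creator` inside `<metadata>`, it can also be edited programmatically. A minimal sketch using the standard library, assuming the OPF text has already been read from the unzipped EPUB (`set_dc_field` is a hypothetical helper name). ElementTree rewrites namespace prefixes on output, so elements come out with an explicit `opf:` prefix: the result is equivalent XML, not a byte-for-byte copy of the original.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use explicit prefixes on output so that opf:-qualified attributes keep their namespace
ET.register_namespace("opf", OPF_NS)
ET.register_namespace("dc", DC_NS)


def set_dc_field(opf_xml, field, value):
    """Set (or add) a Dublin Core element such as "title" or "creator" and return the new XML."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        raise ValueError("no <metadata> element found in content.opf")
    elem = metadata.find(f"{{{DC_NS}}}{field}")
    if elem is None:
        # Field missing: append a new dc:<field> element to <metadata>
        elem = ET.SubElement(metadata, f"{{{DC_NS}}}{field}")
    elem.text = value
    return ET.tostring(root, encoding="unicode")
```

After editing, the files must be zipped again, with `mimetype` stored first and uncompressed, for the result to remain a valid EPUB.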
