[DH] Manage Document Formats with Pandoc

Bernice Choy

Previously...

In my previous article in this series, I shared how Markdown documents and Pandoc can help with note-taking and writing documentation. Projects commonly require documents to follow a standardised format.

In this article, I will be sharing how to create a custom reference document to use for style management.

Getting Started

I will be using the same Markdown document, sample.md, to demonstrate the difference between using Pandoc's default reference document and a custom reference document.

The document contains the following elements.

  • Paragraphs of content (dummy text)

  • Images

  • Tables
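The sample.md used in this series is not reproduced here, so the snippet below is only a hypothetical stand-in for readers who want something to experiment with. The file name sample-sketch.md, the image path and all of the content are assumptions chosen for illustration. It covers the three element types listed above and uses Pandoc's `Table:` caption line for the table.

```shell
# Write a small, hypothetical Markdown file covering the three element types.
# The file name, image path and contents are illustrative assumptions.
cat > sample-sketch.md <<'EOF'
# Sample Report

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

![Example figure caption](images/diagram.png)

| Item  | Quantity |
|-------|----------|
| Alpha | 1        |
| Beta  | 2        |

Table: Example table caption
EOF
```

The image path does not point to a real file, so replace it with an image of your own before converting.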

Pandoc base reference document

  • To take a look at the style specifications used by Pandoc, you can retrieve the base reference document, which is reference.docx by default.

  • You will need to redirect the output to a file, as this command only prints the content of reference.docx to standard output.

  • From the image below, you can see the styles applied for the different headings, captions, tables and others.

pandoc --print-default-data-file reference.docx > pandoc-reference.docx
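If you prefer to inspect the styles from the terminal rather than in Word, the following is a minimal sketch. It relies on a .docx file being a ZIP archive whose style definitions live in word/styles.xml. The helper name style_ids is hypothetical, and the sketch assumes unzip is installed and that pandoc-reference.docx was created by the command above.

```shell
# Print the unique style IDs found in a styles.xml stream read from stdin.
# The function name style_ids is a hypothetical helper for this sketch.
style_ids() {
  grep -o 'w:styleId="[^"]*"' | sed 's/^w:styleId="//; s/"$//' | sort -u
}

# List the styles of the extracted reference document, if it is present.
if [ -f pandoc-reference.docx ] && command -v unzip >/dev/null 2>&1; then
  unzip -p pandoc-reference.docx word/styles.xml | style_ids
fi
```

The IDs printed are the internal identifiers, which may differ slightly from the display names shown in Word's style pane (for example, an ID without the spaces of the display name).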

Sample Report

Without a custom reference document, this is how the Word document looks using Pandoc's base reference document.


Create Custom Reference Document

  • You can first create a reference document with a different name from the base reference style document.

  • In this case, I named mine custom-reference.docx.

pandoc -o custom-reference.docx --print-default-data-file reference.docx
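Because this command writes a fresh copy of the default styles, running it again after you have customised the file would replace your changes. A small guard, sketched below, avoids that. The function name init_reference_doc is hypothetical and not part of Pandoc.

```shell
# Create a reference document only if it does not exist yet, so that
# styles already customised in Word are not overwritten by a re-run.
# The function name init_reference_doc is hypothetical.
init_reference_doc() {
  target="${1:-custom-reference.docx}"
  if [ -e "$target" ]; then
    echo "Keeping existing $target"
  else
    pandoc -o "$target" --print-default-data-file reference.docx
  fi
}

# Example call, skipped when pandoc is not installed.
if command -v pandoc >/dev/null 2>&1; then
  init_reference_doc custom-reference.docx
fi
```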

Manage Custom Reference Document

There are different ways to modify the styles in the reference document. The changes I would like to make are:

  • Change the heading font colour from blue to black

  • Add borders to tables

  • Support block quotes

Copy styles from existing document

The steps below are for a Mac. Please refer to this post for the steps on a Windows machine.

  1. In the top navigation bar, go to "Format" > "Style"

  2. When the "Style" prompt appears, click on "Organizer..."

  3. In the "Organizer..." prompt, you can close the "Normal.dotm" file and open the desired file to copy the styles between the selected documents.

Create/Modify styles from scratch

In general, it is recommended to only modify styles used by pandoc. The styles are:

  • Paragraph styles

  • Character styles

  • Table style

You can refer to the official Pandoc documentation for more information [here](https://pandoc.org/MANUAL.html#option--reference-doc).

Modify Table Style

One of the trickier changes to make is to the table style. By default, tables do not have any borders. To modify the table borders in Microsoft Word, take the following steps:

  1. Highlight the target table. The "Table Design" tab will become available in the toolbar.

  2. Select "Modify Table Style".

  3. In the "Modify Style" prompt, make the desired changes and click OK.

  4. Remember to save the changes to the reference document.

Convert Markdown to Word Document using Reference Document

This command includes an extra option --reference-doc, where you specify the path to a custom reference document for Pandoc to use for styling during conversion.

# Note the --reference-doc option should be provided
# with the filepath to the reference document
pandoc sample.md -t docx+native_numbering \
  --reference-doc=custom-reference.docx \
  -o sample-report.docx --trace
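If you run this conversion often, it can be convenient to wrap it in a small function that checks its inputs first. The sketch below is one possible wrapper, not a Pandoc feature. The function name md_to_docx and its argument order are hypothetical, and the --trace flag is left out to keep the output quiet.

```shell
# Convert a Markdown file to .docx with a custom reference document.
# Fails early with a clear message when an input file is missing.
# The function name md_to_docx and its argument order are hypothetical.
md_to_docx() {
  input="$1"
  reference="$2"
  output="${3:-${input%.md}.docx}"
  for f in "$input" "$reference"; do
    if [ ! -f "$f" ]; then
      echo "Missing file: $f" >&2
      return 1
    fi
  done
  pandoc "$input" -t docx+native_numbering \
    --reference-doc="$reference" \
    -o "$output"
}

# Example call, skipped when the inputs or pandoc are not available.
if command -v pandoc >/dev/null 2>&1 && [ -f sample.md ] && [ -f custom-reference.docx ]; then
  md_to_docx sample.md custom-reference.docx sample-report.docx
fi
```

When the third argument is omitted, the output name is derived from the input by replacing the .md suffix with .docx.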

Final Output

Here's the end result of converting the same Markdown document with the custom reference document.

Caveats

Type of Word Processor

Your mileage may vary if you are not using the typical word processor, i.e. Microsoft Word.

  • Alternatives such as WPS Office, LibreOffice and Google Docs may have different ways of handling styles.

  • For example, WPS Office does not seem to have an equivalent of "Table Style". So even with the custom reference document, the table style will not be applied as intended.

Tables and Captions

  • If you have very large tables, you will have to manually redistribute the width of the columns and/or rows to make the tables look more presentable.

  • The annotations used to generate the table captions are specific to Pandoc's conversion. Depending on your version control platform, e.g. GitLab, the syntax may not be rendered as valid Markdown. So do take note if the Markdown document is intended for a wider audience.

References

  • While the table of figures and table of contents can be generated programmatically, I feel it's more intuitive to generate them through Microsoft Word's native features.

Written by

Bernice Choy

A fledgling engineer dabbling into areas of DevOps, AWS and automation. I enjoy tinkering with technology frameworks and tools to understand and gain visibility in the underlying mechanisms of the "magic" in them. In the progress of accumulating nuggets of wisdom in the different software engineering disciplines!