Checking and fixing HTML conversion outcomes
Background
The first step in word2canvas is converting your Word document to HTML. To achieve this, word2canvas uses:
- the Mammoth .docx to HTML converter; and,
- a custom style map.
After attempting to convert your Word document, word2canvas displays both the Messages and the HTML generated by Mammoth. Each is shown in an accordion that can be opened or closed, so you can check the outcome of the conversion and work out how to fix any issues.
Checking the conversion outcomes
The following image shows the conversion outcomes of the sample w2c.docx file. The two most common checks are:
- Do the Messages include anything of concern?
- Are there any problems with the generated HTML?
Check the Messages
Mammoth messages follow a common format, typically including a type and a message.
The type will be either a warning or an error. Warnings tend to mean the conversion happened, but perhaps with unexpected results.
As illustrated above, the most common warning indicates that the document includes a Word style that Mammoth does not recognise.
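When a document produces many messages, it can help to summarise them before deciding what to fix. The following is a minimal sketch in Python, not part of word2canvas itself. It assumes each message is a record with a type and a message, as described above, and that an unrecognised-style warning reads like "Unrecognised paragraph style: 'Aside' (Style ID: Aside)". That wording, the function name summarise_messages and the style names Aside and Shout are illustrative assumptions; adjust the pattern to match what your Messages accordion shows.

```python
import re
from collections import Counter

# Assumed wording of an unrecognised-style warning; change this pattern if
# the Messages accordion shows different text.
UNRECOGNISED = re.compile(r"Unrecognised (paragraph|run) style: '([^']*)'")


def summarise_messages(messages):
    """Return (error texts, Counter of unrecognised styles) for a message list."""
    errors = [m["message"] for m in messages if m["type"] == "error"]
    styles = Counter()
    for m in messages:
        match = UNRECOGNISED.search(m["message"])
        if m["type"] == "warning" and match:
            # Key is (kind, style name), e.g. ("paragraph", "Aside").
            styles[(match.group(1), match.group(2))] += 1
    return errors, styles


# Hypothetical messages copied by hand from the Messages accordion.
sample = [
    {"type": "warning", "message": "Unrecognised paragraph style: 'Aside' (Style ID: Aside)"},
    {"type": "warning", "message": "Unrecognised paragraph style: 'Aside' (Style ID: Aside)"},
    {"type": "warning", "message": "Unrecognised run style: 'Shout' (Style ID: Shout)"},
]
errors, styles = summarise_messages(sample)
```

Any entry in errors deserves attention first. The counts in styles show which unrecognised styles are used most often, and so which are most worth handling.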
Handling unrecognised Word styles
There are three ways you can handle a Word style unrecognised by Mammoth:
- Ignore it - generally the text will still appear in the HTML. However, it may have lost some of the intended styling.
- Remove the style from the Word document - you can search Word documents for specific styles and then choose to apply a different style or to remove the style entirely.
- Add the style to the word2canvas custom map - the custom map is in the word converter model. If you're not comfortable coding, you can request an update of the custom style map via the word2canvas Issues.
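For the third option, it helps to know what a style map entry looks like. Mammoth style maps are plain text, one mapping per line, with a Word style matcher on the left and the HTML to generate on the right. The sketch below, in Python for illustration only, builds such lines. The helper style_map_entry, the style names Aside and Shout, and the chosen HTML targets are all hypothetical, and how the word2canvas custom map is stored in its source is not shown here.

```python
def style_map_entry(kind, style_name, target):
    """Build one Mammoth style map line for a paragraph or run style."""
    prefixes = {"paragraph": "p", "run": "r"}
    if kind not in prefixes:
        raise ValueError(f"unsupported style kind: {kind!r}")
    if "'" in style_name:
        # Quoting rules are not handled in this sketch; write such entries by hand.
        raise ValueError("style names containing a quote need manual handling")
    return f"{prefixes[kind]}[style-name='{style_name}'] => {target}"


# Hypothetical styles and targets, chosen only to show the syntax.
entries = [
    style_map_entry("paragraph", "Aside", "div.aside > p:fresh"),
    style_map_entry("run", "Shout", "strong"),
]
style_map = "\n".join(entries)
print(style_map)
```

The resulting lines are the kind of entry you could propose when requesting an update to the custom style map.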
Handling problems with the generated HTML
If the generated HTML does not meet your expectations, the only real solution is to modify the Word document (e.g. change the style used or modify an image) and test the conversion again.