Convert HTML to Word
Namespace: Clippit.Html
Convert an XHTML document to a Word document, with configurable CSS cascading and page layout settings.
    public class HtmlToWmlConverter
    {
        public static WmlDocument ConvertHtmlToWml(
            string defaultCss, string authorCss, string userCss,
            XElement xhtml, HtmlToWmlConverterSettings settings)
        {...}

        public static WmlDocument ConvertHtmlToWml(
            string defaultCss, string authorCss, string userCss,
            XElement xhtml, HtmlToWmlConverterSettings settings,
            WmlDocument emptyDocument, string annotatedHtmlDumpFileName)
        {...}

        public static HtmlToWmlConverterSettings GetDefaultSettings()
        {...}

        public static string CleanUpCss(string css)
        {...}
    }
The converter accepts XHTML as an XElement and produces a WmlDocument. CSS is applied in three
layers following the CSS cascade: default (browser-like defaults), author (document styles), and
user (overrides). The GetDefaultSettings() method returns a pre-configured
HtmlToWmlConverterSettings with sensible page layout defaults.
HtmlToWmlConverterSettings
| Field | Type | Description |
|---|---|---|
| MajorLatinFont | string | Major (heading) Latin font |
| MinorLatinFont | string | Minor (body) Latin font |
| DefaultFontSize | double | Default font size |
| DefaultSpacingElement | XElement | Default paragraph spacing |
| DefaultSpacingElementForParagraphsInTables | XElement | Default spacing in tables |
| SectPr | XElement | Section properties (page size, margins) |
| DefaultBlockContentMargin | string | Default margin for block content |
| BaseUriForImages | string | Base URI for resolving image paths |
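
As a rough illustration of how these fields might be adjusted, the sketch below starts from GetDefaultSettings() and overrides a few values. The font names and the base URI are arbitrary placeholders, and the unit of DefaultFontSize (assumed here to be points) is not stated on this page, so treat the values as assumptions rather than recommended settings.

```csharp
using Clippit.Html;

// Start from the library defaults and override only selected fields.
var settings = HtmlToWmlConverter.GetDefaultSettings();

// Placeholder font names chosen for illustration.
settings.MajorLatinFont = "Calibri Light"; // headings
settings.MinorLatinFont = "Calibri";       // body text

// Assumption: the size is interpreted in points; this page does not state the unit.
settings.DefaultFontSize = 11d;

// Hypothetical base location used when resolving image paths.
settings.BaseUriForImages = "/images/";
```

Starting from the defaults leaves SectPr and the spacing elements populated, so only the fields that need to differ have to be touched.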
The settings also expose read-only properties derived from SectPr:
PageWidthTwips, PageMarginLeftTwips, PageMarginRightTwips,
PageWidthEmus, PageMarginLeftEmus, PageMarginRightEmus.
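
For orientation, twips and EMUs are both fixed fractions of an inch: 1 inch = 1440 twips = 914400 EMUs, so 1 twip = 635 EMUs. The self-contained sketch below reads the page width and side margins from a hand-written w:sectPr element and converts them. The class PageMetricsSketch and its methods are hypothetical helpers written for this illustration, not part of Clippit, and the library may derive its own properties differently.

```csharp
using System;
using System.Xml.Linq;

// A US Letter page (8.5 in wide) with 1-inch margins, expressed in twips.
var sectPr = XElement.Parse(@"
<w:sectPr xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
  <w:pgSz w:w='12240' w:h='15840'/>
  <w:pgMar w:top='1440' w:right='1440' w:bottom='1440' w:left='1440'/>
</w:sectPr>");

long usable = PageMetricsSketch.UsableWidthTwips(sectPr);
Console.WriteLine(PageMetricsSketch.PageWidthTwips(sectPr)); // 12240
Console.WriteLine(usable);                                   // 9360
Console.WriteLine(PageMetricsSketch.TwipsToEmus(usable));    // 5943600

// Hypothetical helper, not a Clippit type.
public static class PageMetricsSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // 914400 EMUs per inch / 1440 twips per inch = 635 EMUs per twip.
    public const long EmusPerTwip = 635;

    public static long TwipsToEmus(long twips) => twips * EmusPerTwip;

    // w:pgSz/@w:w holds the page width in twips.
    public static long PageWidthTwips(XElement sectPr) =>
        (long)sectPr.Element(W + "pgSz").Attribute(W + "w");

    // w:pgMar/@w:left and @w:right hold the side margins in twips.
    public static long MarginLeftTwips(XElement sectPr) =>
        (long)sectPr.Element(W + "pgMar").Attribute(W + "left");

    public static long MarginRightTwips(XElement sectPr) =>
        (long)sectPr.Element(W + "pgMar").Attribute(W + "right");

    // Width left for content once both side margins are subtracted.
    public static long UsableWidthTwips(XElement sectPr) =>
        PageWidthTwips(sectPr) - MarginLeftTwips(sectPr) - MarginRightTwips(sectPr);
}
```

The sketch throws if w:pgSz or w:pgMar is missing; a real SectPr should be checked before use.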
Static Properties
| Property | Type | Description |
|---|---|---|
| EmptyDocument | WmlDocument | A minimal empty Word document used as a template |
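
The seven-parameter overload of ConvertHtmlToWml takes the template document and a dump file name explicitly. The sketch below assumes that EmptyDocument is a static member of HtmlToWmlConverter (this page does not say which type exposes it) and that the last argument names a file that receives an annotated copy of the HTML. The file names are placeholders.

```csharp
using System.IO;
using System.Xml.Linq;
using Clippit.Html;

var settings = HtmlToWmlConverter.GetDefaultSettings();

var xhtml = XElement.Parse(
    "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hello</p></body></html>");

var doc = HtmlToWmlConverter.ConvertHtmlToWml(
    File.ReadAllText("default.css"),   // placeholder path: default layer
    File.ReadAllText("styles.css"),    // placeholder path: author layer
    "",                                // no user layer
    xhtml,
    settings,
    HtmlToWmlConverter.EmptyDocument,  // assumption: exposed on HtmlToWmlConverter
    "annotated-html.txt");             // hypothetical dump file name

doc.SaveAs("output.docx");
```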
HtmlToWmlConverter Sample
    using System.IO;
    using System.Xml.Linq;
    using Clippit.Html;

    // Start from the default settings and set the base URI for image paths.
    var settings = HtmlToWmlConverter.GetDefaultSettings();
    settings.BaseUriForImages = "/images/";

    // Load the three CSS layers; the user layer is left empty here.
    var defaultCss = File.ReadAllText("default.css");
    var authorCss = File.ReadAllText("styles.css");
    var userCss = "";

    // The input must be well-formed XHTML in the XHTML namespace.
    var xhtml = XElement.Parse(@"
        <html xmlns='http://www.w3.org/1999/xhtml'>
          <head><title>Sample</title></head>
          <body>
            <h1>Hello World</h1>
            <p>This is a <strong>sample</strong> document converted from HTML.</p>
            <table>
              <tr><td>Cell 1</td><td>Cell 2</td></tr>
              <tr><td>Cell 3</td><td>Cell 4</td></tr>
            </table>
          </body>
        </html>");

    // Convert and write the resulting Word document to disk.
    var doc = HtmlToWmlConverter.ConvertHtmlToWml(
        defaultCss, authorCss, userCss, xhtml, settings);
    doc.SaveAs("output.docx");