    Convert Word to HTML

    Namespace: Clippit.Word

    Converts a Word document to HTML, returned as an XElement, with configurable CSS generation and image handling.

    public static class WmlToHtmlConverter {
        public static XElement ConvertToHtml(
            WmlDocument doc, WmlToHtmlConverterSettings htmlConverterSettings)
        {...}
    
        public static XElement ConvertToHtml(
            WordprocessingDocument wordDoc, WmlToHtmlConverterSettings htmlConverterSettings)
        {...}
    }
    

    The converter produces a complete HTML document as an XElement (XHTML). It generates CSS classes for paragraph and character styles, handles numbering/lists, and processes images through a configurable ImageHandler callback.

    An extension method is also available directly on WmlDocument:

    WmlDocument doc = new WmlDocument("input.docx");
    XElement html = doc.ConvertToHtml(settings);
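
Because the result is an ordinary XElement, it can be serialized with LINQ to XML. The sketch below is one possible way to turn it into a string with a doctype in front. `ToHtmlString` is a hypothetical helper, not part of Clippit, and the hand-built `html` element is only a stand-in for the converter's return value so that the snippet runs without a .docx file. `SaveOptions.DisableFormatting` is used on the assumption that indentation added by the default formatting is unwanted, since whitespace inserted between inline elements can render as extra spaces.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: serialize an XHTML root element with an HTML doctype in front.
// DisableFormatting keeps XElement from adding indentation whitespace.
string ToHtmlString(XElement htmlRoot) =>
    "<!DOCTYPE html>\n" + htmlRoot.ToString(SaveOptions.DisableFormatting);

// Stand-in for the converter's return value (an assumption for illustration only).
XNamespace xhtml = "http://www.w3.org/1999/xhtml";
var html = new XElement(xhtml + "html",
    new XElement(xhtml + "head", new XElement(xhtml + "title", "My Document")),
    new XElement(xhtml + "body",
        new XElement(xhtml + "p",
            new XElement(xhtml + "span", "Hello, "),
            new XElement(xhtml + "span", "world"))));

Console.WriteLine(ToHtmlString(html));
```

With a real document, the same helper would be applied to the XElement returned by ConvertToHtml.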
    

    WmlToHtmlConverterSettings

    Field                                Type                       Default
    PageTitle                            string                     ""
    CssClassPrefix                       string                     "pt-"
    FabricateCssClasses                  bool                       true
    GeneralCss                           string                     "span { white-space: pre-wrap; }"
    AdditionalCss                        string                     ""
    RestrictToSupportedLanguages         bool                       false
    RestrictToSupportedNumberingFormats  bool                       false
    ImageHandler                         Func<ImageInfo, XElement>  null

    ImageInfo

    The ImageHandler callback receives an ImageInfo object for each image in the document:

    Field              Type                        Description
    Image              SixLabors.ImageSharp.Image  The decoded image
    ImgStyleAttribute  XAttribute                  The computed style attribute (width/height)
    ContentType        string                      The image MIME type
    DrawingElement     XElement                    The source OpenXml drawing element
    AltText            string                      Alternative text for the image
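
As an alternative to inline data URIs, a handler could write each image to disk and return an img element that points at the file. The sketch below covers only the parts that need nothing beyond System.Xml.Linq: picking a file extension from a MIME type such as the one in ContentType, and building the element to return. `ExtensionFor` and `BuildImg` are hypothetical helpers, not part of Clippit, and the MIME-to-extension mapping and the PNG fallback are assumptions made for illustration.

```csharp
using System;
using System.Xml.Linq;

// XHTML namespace, so the element sits in the same namespace as XHTML output.
XNamespace xhtml = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: choose a file extension from a MIME type.
// Unknown types fall back to ".png", assuming the decoded image is re-encoded as PNG.
string ExtensionFor(string contentType) => contentType switch
{
    "image/png" => ".png",
    "image/jpeg" => ".jpg",
    "image/gif" => ".gif",
    _ => ".png",
};

// Hypothetical helper: build the <img> element an ImageHandler would return.
XElement BuildImg(string src, XAttribute style, string altText) =>
    new XElement(xhtml + "img",
        new XAttribute("src", src),
        style,                                  // a null attribute is ignored by XElement
        new XAttribute("alt", altText ?? ""));

// Stand-in values in place of a real ImageInfo, so the sketch runs on its own.
var img = BuildImg(
    "images/image1" + ExtensionFor("image/jpeg"),
    new XAttribute("style", "width: 2in; height: 1in;"),
    "Company logo");

Console.WriteLine(img);
```

Inside a real ImageHandler, the handler would save imageInfo.Image to the chosen path first, then return BuildImg(path, imageInfo.ImgStyleAttribute, imageInfo.AltText).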

    WmlToHtmlConverter Sample

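    using System;
    using System.IO;
    using System.Xml.Linq;
    using Clippit;              // assumed namespace for WmlDocument and Xhtml
    using Clippit.Word;         // WmlToHtmlConverter, WmlToHtmlConverterSettings
    using SixLabors.ImageSharp; // SaveAsPng extension method
    
    var doc = new WmlDocument("input.docx");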
    
    var settings = new WmlToHtmlConverterSettings
    {
        PageTitle = "My Document",
        CssClassPrefix = "doc-",
        FabricateCssClasses = true,
        AdditionalCss = "body { font-family: Calibri, sans-serif; }",
        ImageHandler = imageInfo =>
        {
            // Convert images to inline base64 data URIs
            using var stream = new MemoryStream();
            imageInfo.Image.SaveAsPng(stream);
            var base64 = Convert.ToBase64String(stream.ToArray());
            var imgElement = new XElement(
                Xhtml.img,
                imageInfo.ImgStyleAttribute,
                new XAttribute("src", $"data:image/png;base64,{base64}"),
                new XAttribute("alt", imageInfo.AltText ?? "")
            );
            return imgElement;
        }
    };
    
    var html = WmlToHtmlConverter.ConvertToHtml(doc, settings);
    File.WriteAllText("output.html", html.ToString());
    