    Simplify Markup

    Namespace: Clippit.Word

    Strip unnecessary markup from Word documents to simplify the underlying XML structure.

    public static class MarkupSimplifier {
        public static WmlDocument SimplifyMarkup(
            WmlDocument doc, SimplifyMarkupSettings settings)
        {...}
    
        public static void SimplifyMarkup(
            WordprocessingDocument doc, SimplifyMarkupSettings settings)
        {...}
    }
    

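    MarkupSimplifier removes categories of markup that are often unnecessary for document processing, comparison, or conversion, such as revision save IDs, proofing marks, and bookmarks. Each category is controlled by a flag in SimplifyMarkupSettings. The WmlDocument overload returns a new, simplified WmlDocument; the WordprocessingDocument overload returns nothing and applies the changes to the open document.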

    An instance method is also available directly on WmlDocument:

    var simplified = wmlDoc.SimplifyMarkup(settings);
    

    SimplifyMarkupSettings

    All fields are bool and default to false.

    Field                              Description
    AcceptRevisions                    Accept all tracked revisions before simplifying
    NormalizeXml                       Normalize the XML structure
    RemoveBookmarks                    Remove bookmark start/end elements
    RemoveComments                     Remove comments and comment references
    RemoveContentControls              Remove structured document tags (content controls)
    RemoveEndAndFootNotes              Remove endnote and footnote references and content
    RemoveFieldCodes                   Remove field codes, keeping field results
    RemoveGoBackBookmark               Remove the _GoBack bookmark that Word inserts at the last edit position
    RemoveHyperlinks                   Remove hyperlink wrappers
    RemoveLastRenderedPageBreak        Remove lastRenderedPageBreak elements
    RemoveMarkupForDocumentComparison  Remove markup that interferes with document comparison (implies RemoveRsidInfo)
    RemovePermissions                  Remove permission start/end elements
    RemoveProof                        Remove proofing markup (spelling and grammar error marks)
    RemoveRsidInfo                     Remove revision save ID (rsid) attributes
    RemoveSmartTags                    Remove smart tag elements
    RemoveSoftHyphens                  Remove soft hyphen characters
    RemoveWebHidden                    Remove web-hidden paragraph marks
    ReplaceTabsWithSpaces              Replace tab characters with spaces
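
    Several of these flags come down to deleting particular WordprocessingML elements. As a rough sketch, assuming you only have a single w:body element in memory, the code below approximates RemoveGoBackBookmark, RemoveLastRenderedPageBreak, RemoveProof and RemoveSoftHyphens with LINQ to XML. StripSimpleMarkup is a hypothetical helper, not part of Clippit and not how MarkupSimplifier is implemented; for proofing markup it handles only w:proofErr.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

const string ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = ns;

// Hypothetical helper (not part of Clippit): approximates four SimplifyMarkupSettings
// flags on a copy of a w:body element and returns the copy; the input is left unchanged.
XElement StripSimpleMarkup(XElement source)
{
    var result = new XElement(source);

    // RemoveGoBackBookmark: drop the _GoBack bookmarkStart and the bookmarkEnd sharing its w:id.
    var goBackIds = result.Descendants(w + "bookmarkStart")
        .Where(b => (string?)b.Attribute(w + "name") == "_GoBack")
        .Select(b => (string?)b.Attribute(w + "id"))
        .ToHashSet();
    result.Descendants()
        .Where(e => (e.Name == w + "bookmarkStart" || e.Name == w + "bookmarkEnd")
                    && goBackIds.Contains((string?)e.Attribute(w + "id")))
        .Remove();

    // RemoveLastRenderedPageBreak, RemoveProof (w:proofErr only) and RemoveSoftHyphens:
    // each is a plain element deletion.
    result.Descendants()
        .Where(e => e.Name == w + "lastRenderedPageBreak"
                 || e.Name == w + "proofErr"
                 || e.Name == w + "softHyphen")
        .Remove();

    return result;
}

var body = XElement.Parse(
    $"<w:body xmlns:w=\"{ns}\"><w:p>" +
    "<w:bookmarkStart w:id=\"0\" w:name=\"_GoBack\"/><w:bookmarkEnd w:id=\"0\"/>" +
    "<w:bookmarkStart w:id=\"1\" w:name=\"Intro\"/>" +
    "<w:proofErr w:type=\"spellStart\"/>" +
    "<w:r><w:lastRenderedPageBreak/><w:t>Helo</w:t></w:r>" +
    "<w:proofErr w:type=\"spellEnd\"/>" +
    "<w:r><w:t>co</w:t><w:softHyphen/><w:t>operate</w:t></w:r>" +
    "<w:bookmarkEnd w:id=\"1\"/>" +
    "</w:p></w:body>");

var simplified = StripSimpleMarkup(body);
Console.WriteLine(simplified);
```

    Other bookmarks (Intro in the sample) and all w:t text survive; only the _GoBack pair and the cached-layout and proofing elements are dropped.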

    Additional Methods

    Method                                                  Description
    MergeAdjacentSuperfluousRuns(XElement)                  Merge adjacent runs with identical formatting
    TransformElementToSingleCharacterRuns(XElement)         Split runs so each contains a single character
    TransformPartToSingleCharacterRuns(OpenXmlPart)         Apply single-character run transform to a part
    TransformToSingleCharacterRuns(WordprocessingDocument)  Apply single-character run transform to entire document
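
    The run transforms work in opposite directions: splitting gives every character its own w:r, which can make character-level processing simpler, while merging folds neighbouring runs with identical formatting back together. The sketch below illustrates both on a single w:p with LINQ to XML. SplitToSingleCharacterRuns and MergeAdjacentRuns are hypothetical stand-ins for TransformElementToSingleCharacterRuns and MergeAdjacentSuperfluousRuns, not their actual implementations, and they deliberately handle only runs made of an optional w:rPr plus w:t elements; tabs, breaks, field characters and other run content are left untouched.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

const string ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = ns;

// Concatenated text of all w:t children of a run.
string RunText(XElement run) => string.Concat(run.Elements(w + "t").Select(t => t.Value));

// A w:t element; xml:space="preserve" keeps leading/trailing spaces when Word reads the file.
XElement TextElement(string text) =>
    new XElement(w + "t",
        text.Length > 0 && (char.IsWhiteSpace(text[0]) || char.IsWhiteSpace(text[^1]))
            ? new XAttribute(XNamespace.Xml + "space", "preserve")
            : null,
        text);

// True for runs made only of an optional w:rPr plus w:t elements (the only kind handled here).
bool IsTextRun(XElement e) =>
    e.Name == w + "r" && e.Elements().All(c => c.Name == w + "rPr" || c.Name == w + "t");

// Hypothetical split: one run per character, each carrying a copy of the run properties.
void SplitToSingleCharacterRuns(XElement paragraph)
{
    foreach (var run in paragraph.Elements().Where(IsTextRun).ToList())
    {
        var rPr = run.Element(w + "rPr");
        var pieces = RunText(run)
            .Select(c => new XElement(w + "r",
                rPr is null ? null : new XElement(rPr),
                TextElement(c.ToString())))
            .ToList();
        run.ReplaceWith(pieces);
    }
}

// Hypothetical merge: adjacent text runs whose w:rPr are deep-equal (or both absent)
// are folded into the first run of the sequence.
void MergeAdjacentRuns(XElement paragraph)
{
    XElement? previous = null;
    foreach (var element in paragraph.Elements().ToList())
    {
        if (!IsTextRun(element))
        {
            previous = null; // any other element breaks the sequence
            continue;
        }
        if (previous is not null &&
            XNode.DeepEquals(previous.Element(w + "rPr"), element.Element(w + "rPr")))
        {
            var merged = RunText(previous) + RunText(element);
            previous.Elements(w + "t").Remove();
            previous.Add(TextElement(merged));
            element.Remove();
        }
        else
        {
            previous = element;
        }
    }
}

var p = XElement.Parse(
    $"<w:p xmlns:w=\"{ns}\">" +
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Hi</w:t></w:r>" +
    "<w:r><w:rPr><w:b/></w:rPr><w:t xml:space=\"preserve\"> there</w:t></w:r>" +
    "<w:r><w:t>!</w:t></w:r>" +
    "</w:p>");

SplitToSingleCharacterRuns(p);
int afterSplit = p.Elements(w + "r").Count(); // 9: one run per character of "Hi there!"
MergeAdjacentRuns(p);
int afterMerge = p.Elements(w + "r").Count(); // 2: bold "Hi there" and plain "!"
Console.WriteLine($"{afterSplit} runs after splitting, {afterMerge} after merging");
```

    Splitting and then merging brings the paragraph down to two runs, because the bold characters share deep-equal w:rPr elements while the trailing "!" has none.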

    SimplifyMarkup Sample

    using Clippit.Word;
    
    var wmlDoc = new WmlDocument("input.docx");
    
    var settings = new SimplifyMarkupSettings
    {
        RemoveComments = true,
        RemoveRsidInfo = true,
        RemoveProof = true,
        RemoveBookmarks = true,
        RemoveGoBackBookmark = true,
        RemoveSoftHyphens = true,
        RemoveLastRenderedPageBreak = true,
        RemoveContentControls = true,
        RemoveSmartTags = true
    };
    
    var simplified = wmlDoc.SimplifyMarkup(settings);
    simplified.SaveAs("simplified.docx");
    

    Prepare for Comparison Sample

    var settings = new SimplifyMarkupSettings
    {
        RemoveMarkupForDocumentComparison = true,
        AcceptRevisions = true
    };
    
    var doc1 = new WmlDocument("doc1.docx").SimplifyMarkup(settings);
    var doc2 = new WmlDocument("doc2.docx").SimplifyMarkup(settings);
    
    // Documents are now ready for structural comparison
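
    Revision save IDs are a common reason two documents with identical text still differ structurally: Word stamps paragraphs and runs with w:rsid* attributes identifying the editing session, so the same content saved in different sessions carries different values. The sketch below approximates RemoveRsidInfo on a single element with LINQ to XML and compares the results with XNode.DeepEquals. StripRsids is a hypothetical helper, not Clippit's implementation, and it only processes the element passed to it rather than every part of a package.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

const string ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = ns;

// Hypothetical helper (not Clippit's API): returns a copy of the element with every
// w:rsid* attribute (w:rsidR, w:rsidRPr, w:rsidRDefault, w:rsidP, ...) removed.
XElement StripRsids(XElement element)
{
    var copy = new XElement(element);
    copy.DescendantsAndSelf()
        .Attributes()
        .Where(a => a.Name.Namespace == w && a.Name.LocalName.StartsWith("rsid", StringComparison.Ordinal))
        .Remove();
    return copy;
}

// The same paragraph text, saved in two different editing sessions.
var fromDoc1 = XElement.Parse(
    $"<w:p xmlns:w=\"{ns}\" w:rsidR=\"00A1B2C3\" w:rsidRDefault=\"00A1B2C3\">" +
    "<w:r w:rsidRPr=\"00D4E5F6\"><w:t>Same text</w:t></w:r></w:p>");
var fromDoc2 = XElement.Parse(
    $"<w:p xmlns:w=\"{ns}\" w:rsidR=\"00778899\" w:rsidRDefault=\"00778899\">" +
    "<w:r><w:t>Same text</w:t></w:r></w:p>");

bool equalBefore = XNode.DeepEquals(fromDoc1, fromDoc2);
bool equalAfter = XNode.DeepEquals(StripRsids(fromDoc1), StripRsids(fromDoc2));
Console.WriteLine($"Equal before: {equalBefore}, after: {equalAfter}"); // Equal before: False, after: True
```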
    