Word / Docx

Read and generate Word documents with correct structure, styles, and cross-platform compatibility

Word / Docx is a community skill for Microsoft Word document processing, covering document reading and parsing, content generation with styles, table creation and formatting, image insertion, and cross-platform file compatibility for automated document workflows.

What Is This?

Overview

Word / Docx gives AI agents the ability to read and generate Microsoft Word documents programmatically with correct structure and formatting. It covers five areas. Document reading extracts text, paragraphs, tables, and images from existing DOCX files while preserving structure. Content generation creates new documents with headings, paragraphs, bullet lists, and styled text. Table operations build structured tables with merged cells, borders, and data alignment. Image handling inserts pictures with control over size and position. Cross-platform compatibility keeps documents working across Windows, Mac, and Linux systems. Together these automate document creation and processing tasks.

Who Should Use This

This skill serves document automation developers building report generation systems, AI agents creating contracts and proposals from templates, and businesses automating repetitive document production workflows like invoices and certificates.

Why Use It?

Problems It Solves

Creating Word documents manually is time-consuming when many similar documents, such as contracts or reports, differ only in their variable data. Extracting data from existing Word documents means parsing complex XML structures and handling formatting edge cases. Keeping styling consistent across templates by hand is error-prone and difficult to maintain at scale. Integrating document generation into automated workflows requires a library that handles the DOCX format reliably and produces files that open cleanly on every platform.

Core Highlights

The document reader extracts text, tables, and images from DOCX files while preserving structure. The content generator creates documents with headings, paragraphs, and styled text. The table builder constructs structured tables with formatting options. The image handler inserts pictures with control over size and position.

How to Use It?

Basic Usage

from docx import Document

# Read an existing document and print the text of each paragraph.
doc = Document('input.docx')
for para in doc.paragraphs:
    print(para.text)

# Build a new document: level 0 is the title, level 1 a top-level heading.
new_doc = Document()
new_doc.add_heading('Report Title', 0)
new_doc.add_paragraph('This is the introduction.')
new_doc.add_heading('Section 1', 1)
new_doc.add_paragraph('Content here.')

new_doc.save('output.docx')

Real-World Examples

from docx import Document
from docx.shared import Inches

doc = Document()
doc.add_heading('Sales Report', 0)
doc.add_paragraph('Q1 2025 Summary', style='Heading 2')

# One header row plus three data rows.
table = doc.add_table(rows=4, cols=3)
table.style = 'Light Grid Accent 1'

headers = ['Month', 'Revenue', 'Growth']
for i, h in enumerate(headers):
    table.rows[0].cells[i].text = h

data = [
    ['Jan', '$50K', '10%'],
    ['Feb', '$55K', '15%'],
    ['Mar', '$60K', '12%'],
]
# Data starts at row 1 because row 0 holds the headers.
for row_idx, row_data in enumerate(data, start=1):
    for col_idx, value in enumerate(row_data):
        table.cell(row_idx, col_idx).text = value

# chart.png must already exist; setting only the width keeps the aspect ratio.
doc.add_picture('chart.png', width=Inches(4))

doc.save('sales_report.docx')

Advanced Tips

Use document templates with predefined styles and layouts as a base for generating consistent documents. Apply paragraph and character styles programmatically to maintain formatting consistency across all generated documents. Leverage the document properties API to set metadata like author, title, and creation date for better document management and searchability.

When to Use It?

Use Cases

Generate personalized contracts and proposals by filling templates with customer data and terms automatically. Create monthly reports that combine data from databases with charts and narrative text in Word format. Build document processing pipelines that extract data from submitted Word forms for validation and storage.

Related Topics

Document automation, Microsoft Word, DOCX format, template processing, report generation, and office file handling.

Important Notes

Requirements

Python with the python-docx library, installed via pip (pip install python-docx), is required for reading and writing Word documents. Microsoft Word or compatible software is useful for viewing and editing generated documents but is not required for programmatic generation. A working understanding of Word document structure, including paragraphs, runs, and styles, is needed for effective formatting.

Usage Recommendations

Do: use predefined styles from templates rather than applying direct formatting for consistency. Test generated documents across different platforms and Word versions to ensure compatibility. Handle image sizing explicitly to prevent layout issues in the final document.

Don't: assume all Word features are supported, since some advanced formatting is not available through the library. Don't modify document XML directly instead of using the library API, since this can corrupt files. Don't generate extremely large documents with thousands of elements in a single pass, since this hurts performance; split the output into smaller batches or separate files instead.

Limitations

The python-docx library does not cover every Word feature: SmartArt, charts, and complex drawing objects are not supported. Documents in the older binary DOC format must be converted to DOCX with an external tool before they can be read. Complex table operations like diagonal borders and advanced cell merging may have limited support in the library.