Analyze Microsoft Office documents for security, malware, and forensics.

DOCX, XLSX, PPTX, and more.

OfficeDissector is a Python toolkit for analyzing Microsoft Office Open XML (OOXML) documents, the format used by Microsoft Word, Excel, and PowerPoint. It is the most powerful tool for security analysis of Office documents.

Quick start (v1.0)

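import officedissector

# Open an OOXML document (here a Word file) for analysis
doc = officedissector.doc.Document('word.docx')
# The main part holds the primary content (typically word/document.xml for Word)
mp = doc.main_part()
print(mp.content_type())
# application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml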

Built for security from the ground up

OfficeDissector was built for MIT Lincoln Lab's Cyber Systems Assessment Group, some of the best security analysts in the world. It is designed to be the tool of choice for security analysts, malware reverse engineers, and forensic investigators who analyze Office documents.

Tested on close to a gigabyte of Office documents

Security tools need to work, no matter what. OfficeDissector's unit test suite achieves nearly 100% code coverage, and the tool has been tested on close to a gigabyte of Office documents, including both reference documents and samples gathered in the wild.

Parse everything

OfficeDissector is a complete parser for Office Open XML documents (the specification runs to 6546 pages!). It parses document properties, parts, content types, relationships, embedded objects, multimedia, and comments, and exposes them through a Python interface.
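Under the hood, an OOXML file is a ZIP package laid out according to the Open Packaging Conventions: `[Content_Types].xml` maps each part to a content type, and `_rels/.rels` holds the package-level relationships. As a rough, standard-library-only illustration of what parsing parts, content types, and relationships involves, the sketch below reads both files. It is not OfficeDissector's API; the helper names `content_types` and `package_relationships` are hypothetical, and the sketch simplifies the spec (part names are matched case-sensitively, and per-part `.rels` files are ignored).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the Open Packaging Conventions (OPC).
CT_NS = '{http://schemas.openxmlformats.org/package/2006/content-types}'
REL_NS = '{http://schemas.openxmlformats.org/package/2006/relationships}'


def content_types(zf):
    """Return {part name: content type} for every part in an open ZipFile.

    Override entries name a single part; Default entries apply to all parts
    with a given extension. Simplification: matching is case-sensitive.
    """
    root = ET.fromstring(zf.read('[Content_Types].xml'))
    defaults = {d.get('Extension').lower(): d.get('ContentType')
                for d in root.iter(CT_NS + 'Default')}
    overrides = {o.get('PartName'): o.get('ContentType')
                 for o in root.iter(CT_NS + 'Override')}
    result = {}
    for name in zf.namelist():
        # Skip directory entries and the content-types file itself.
        if name.endswith('/') or name == '[Content_Types].xml':
            continue
        part = '/' + name
        base = name.rsplit('/', 1)[-1]
        ext = base.rsplit('.', 1)[1].lower() if '.' in base else ''
        result[part] = overrides.get(part, defaults.get(ext))
    return result


def package_relationships(zf):
    """Return (type, target, target mode) tuples from _rels/.rels only."""
    root = ET.fromstring(zf.read('_rels/.rels'))
    return [(r.get('Type'), r.get('Target'), r.get('TargetMode', 'Internal'))
            for r in root.iter(REL_NS + 'Relationship')]
```

For a typical Word file, `content_types(zipfile.ZipFile('word.docx'))` would map `/word/document.xml` to the same main-part content type shown in the quick start.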

Export anything into JSON

Every aspect of the document is available in JSON, the digital lingua franca. Parse a document with OfficeDissector, then analyze the results in any language or toolset that can read JSON.
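To illustrate the kind of downstream analysis a JSON export enables, here is a minimal sketch that flags two security-relevant features: a VBA macro project part and relationships whose target is external. The JSON layout it reads (`parts`, `relationships`, `content_type`, `target_mode`) is a hypothetical schema chosen for the example, not OfficeDissector's actual export format; adapt the key names to the real output. The content type `application/vnd.ms-office.vbaProject` is the one used for `vbaProject.bin` in macro-enabled files.

```python
import json

# Content type of the VBA project part in macro-enabled OOXML files.
VBA_CONTENT_TYPE = 'application/vnd.ms-office.vbaProject'


def flag_suspicious(export_text):
    """Return human-readable findings from a JSON export.

    Assumes a HYPOTHETICAL schema:
      {"parts": [{"name": ..., "content_type": ...}, ...],
       "relationships": [{"type": ..., "target": ..., "target_mode": ...}, ...]}
    """
    data = json.loads(export_text)
    findings = []
    for part in data.get('parts', []):
        # Macro-enabled documents carry their VBA code in a dedicated part.
        if part.get('content_type') == VBA_CONTENT_TYPE:
            findings.append('macro project: %s' % part['name'])
    for rel in data.get('relationships', []):
        # External targets can pull remote content when the file is opened.
        if rel.get('target_mode') == 'External':
            findings.append('external target: %s' % rel['target'])
    return findings
```

Because the input is plain JSON, the same check could just as easily be written with `jq` or in any other language.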

High performance

Tools shouldn't slow you down. OfficeDissector can parse 100 MB of documents in under 20 seconds.

Interactive tool, batch analysis, or Python library

Use OfficeDissector the way you like: as an interactive exploratory tool in an IPython console; as a high-performance batch analysis system (optionally as a MASTIFF plugin); or as a Python 2.7 library.
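As a rough illustration of library-driven batch use, the sketch below walks a directory, runs an analysis function on every file that looks like an OOXML document, and records failures instead of aborting, since samples from the wild are often malformed. It is not OfficeDissector's own batch system or its MASTIFF plugin: `find_ooxml_files`, `batch_analyze`, `main_content_type`, and the extension list are hypothetical. The per-file analyzer reuses only the calls from the quick start and imports `officedissector` lazily, so the driver itself runs without it.

```python
import os

# Hypothetical extension list: common OOXML formats, including macro-enabled ones.
OOXML_EXTENSIONS = ('.docx', '.docm', '.xlsx', '.xlsm', '.pptx', '.pptm')


def find_ooxml_files(root):
    """Yield paths under root whose extension looks like an OOXML document."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(OOXML_EXTENSIONS):
                yield os.path.join(dirpath, name)


def batch_analyze(root, analyze):
    """Run analyze(path) on every OOXML file under root.

    Returns (results, errors): results maps path -> analyze() output and
    errors maps path -> exception message, so one corrupt or hostile
    sample does not stop the whole batch.
    """
    results, errors = {}, {}
    for path in find_ooxml_files(root):
        try:
            results[path] = analyze(path)
        except Exception as exc:  # deliberately broad: keep the batch running
            errors[path] = str(exc)
    return results, errors


def main_content_type(path):
    """Example analyzer built from the quick-start calls (needs officedissector)."""
    import officedissector
    doc = officedissector.doc.Document(path)
    return doc.main_part().content_type()
```

With OfficeDissector installed, `batch_analyze('samples/', main_content_type)` would return the main-part content type of each readable file alongside a map of the files that failed to parse.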

Quality code, open source

Security analysts are paranoid: they need to know what's going on inside their tools. OfficeDissector is 100% open source, with ample documentation and clear, well-commented code. Its innovative architecture draws on the economy of mechanism within OOXML and builds on industrial-strength tools like libxml2.

OfficeDissector: The most powerful document analysis tool for Microsoft Office security, malware analysis, and forensics.