doc – OfficeDissector - Document Class

An OOXML Document.

class officedissector.doc.Document(filepath)

An OOXML document.

Variables:
  • filepath – The path to the OOXML document.
  • type – The OOXML document type, e.g. Word.
  • is_macro_enabled – True if document is macro enabled.
  • is_template – True if document is a template.
  • parts – List of all Parts in the Document.
  • parts_by_name – Dictionary of all Parts with name as the key.
  • root_part – Singleton of the RootPart class. Used to represent the virtual root part as the source of a Relationship.
  • relationships – List of Relationships in the Document.
  • relationships_dict – Dictionary of all Relationships with the full Relationship Type as the key.
  • features – Object which contains all Features of the Document.
  • core_properties – Object which contains all Core Properties of the Document.
__init__(filepath)

Initialize attributes. Build collections of Parts and Relationships. Parse and provide access to Features and CoreProperties.

Parameters:filepath (string) – the path to the document file
find_relationships_by_type(reltype)

Return a list of all Relationships whose Relationship Type ends with reltype.

For example:

>>> rel = doc.find_relationships_by_type('ships/officeDocument')
>>> rel[0].type
'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'
Parameters:reltype (string) – Relationship type. Matched against the end of the full Relationship Type.
Returns:list of all Relationships whose Relationship Type ends with reltype
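
The suffix matching described above can be illustrated without the library. The following is a minimal sketch, assuming the match is a plain string suffix comparison; `match_by_suffix` and the sample list `REL_TYPES` are hypothetical stand-ins and are not part of OfficeDissector.

```python
# Hypothetical stand-in data: full Relationship Type URIs as they appear
# in OOXML packages. A real Document would supply these via its Relationships.
REL_TYPES = [
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
    "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties",
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties",
]


def match_by_suffix(rel_types, reltype):
    """Return every type in rel_types that ends with reltype (assumed semantics)."""
    return [t for t in rel_types if t.endswith(reltype)]


# A short suffix is enough to select the main-document relationship.
matches = match_by_suffix(REL_TYPES, "ships/officeDocument")
```

Because only the end of the type is compared, a short suffix such as 'ships/officeDocument' saves typing the full URI, while a suffix that is too generic (for example 'properties') can match more than one type.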
main_part()

Get the main document Part.

Returns:the main document Part
parts_by_content_type(contype)

Return a list of all Parts with the given content-type.

Parameters:contype (string) – content-type
Returns:list of all Parts with content-type
parts_by_content_type_regex(exp)

Return a list of all Parts whose content-type matches a regular expression.

Parameters:exp (string) – regular expression to match content-type of parts
Returns:list of all Parts with content-type matching regex
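
Filtering by a content-type pattern can be sketched with the standard `re` module. This is an illustration under stated assumptions: the mapping `CONTENT_TYPES` and the helper `names_by_content_type_regex` are hypothetical, and the use of `search` rather than `match` is an assumption, since this page does not say which one the library applies.

```python
import re

# Hypothetical stand-in: part name -> content-type, as a package might declare them.
CONTENT_TYPES = {
    "/word/document.xml": "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    "/word/media/image1.png": "image/png",
    "/word/media/image2.jpeg": "image/jpeg",
    "/docProps/core.xml": "application/vnd.openxmlformats-package.core-properties+xml",
}


def names_by_content_type_regex(content_types, exp):
    """Return the sorted part names whose content-type matches exp (assumed: re.search)."""
    pattern = re.compile(exp)
    return sorted(name for name, ctype in content_types.items() if pattern.search(ctype))


# Anchoring with ^ keeps the result the same under either match or search.
images = names_by_content_type_regex(CONTENT_TYPES, r"^image/")
```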
parts_by_relationship_type(reltype)

Return a list of all Parts that have an incoming Relationship whose Relationship Type ends with reltype.

For example:

>>> part = doc.parts_by_relationship_type('ships/officeDocument')
>>> print part[0]
[Part [/word/document.xml]]
>>> part[0].relationships_in()[0].type
'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'
Parameters:reltype (string) – Relationship type. Matched against the end of the full Relationship Type.
Returns:list of all Parts with an incoming Relationship whose Relationship Type ends with reltype
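
The idea of selecting Parts by their incoming Relationships can be sketched with a small model. Everything here is hypothetical: the `Rel` tuple, the sample `RELS` list, and `targets_by_relationship_type` only imitate the behavior described above and do not reflect the library's internals.

```python
from collections import namedtuple

# Hypothetical stand-in for a Relationship: source part, target part, full type URI.
Rel = namedtuple("Rel", ["source", "target", "type"])

BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

RELS = [
    Rel("/", "/word/document.xml", BASE + "officeDocument"),
    Rel("/word/document.xml", "/word/styles.xml", BASE + "styles"),
    Rel("/word/document.xml", "/word/media/image1.png", BASE + "image"),
]


def targets_by_relationship_type(rels, reltype):
    """Return targets of relationships whose type ends with reltype (assumed semantics)."""
    return [rel.target for rel in rels if rel.type.endswith(reltype)]


# The part targeted by the officeDocument relationship is the main document part.
main_targets = targets_by_relationship_type(RELS, "ships/officeDocument")
```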
to_json(include_stream=False)

Export this object to JSON.

Parameters:include_stream (bool) – Optional. If True, include the base64-encoded stream of every Part (default: False).
Returns:a JSON encoded string
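
The effect of a flag like include_stream can be sketched with the standard `json` and `base64` modules. The JSON layout produced by the library is not described on this page, so the field names `name` and `stream_b64` and the helper `part_to_json` are purely hypothetical.

```python
import base64
import json


def part_to_json(name, stream, include_stream=False):
    """Serialize a hypothetical part record; add the raw bytes only on request."""
    record = {"name": name}
    if include_stream:
        # JSON cannot hold raw bytes, so the stream is base64-encoded to ASCII text.
        record["stream_b64"] = base64.b64encode(stream).decode("ascii")
    return json.dumps(record, sort_keys=True)


without_stream = part_to_json("/word/document.xml", b"<w:document/>")
with_stream = part_to_json("/word/document.xml", b"<w:document/>", include_stream=True)
```

Leaving the stream out by default keeps the output small; base64 encoding grows binary data by roughly one third, which adds up quickly for documents with embedded media.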
zip()

Return a Zip object for the OOXML file located at self.filepath.

Returns:Zip object
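
An OOXML file is a ZIP package, which is why a Zip object is available. The Zip class returned here belongs to OfficeDissector and its interface is not described on this page, so the sketch below uses the standard library's `zipfile` instead. The package is built in memory with placeholder XML (not valid OOXML) purely to show the layout.

```python
import io
import zipfile

# Build a tiny in-memory ZIP that mimics the layout of an OOXML package.
# The member contents are placeholders, not valid OOXML.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("_rels/.rels", "<Relationships/>")
    package.writestr("word/document.xml", "<w:document/>")

# Reopen the same bytes for reading, list the members, and read one stream.
with zipfile.ZipFile(buffer) as package:
    member_names = package.namelist()
    main_xml = package.read("word/document.xml")
```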
