Mirror of https://github.com/python-pillow/Pillow.git
synced 2025-01-26 09:14:27 +03:00
Deprecate vendored version of olefile Python package in favor of upstream
Pillow now requires the olefile Python package through setup.py. Instead of maintaining its own vendored copy of this library, Pillow now relies on and reuses the upstream version, so the vendored package and its docs no longer need regular updates, and olefile bug fixes and features can go directly upstream. During Travis tests, the Pillow package is now installed before the tests run; this also installs all of its dependencies (currently only olefile).
parent 2e178d7dbb
commit 52b1f46b12
@@ -42,6 +42,8 @@ install:
   # libimagequant
   - pushd depends && ./install_imagequant.sh && popd
 
+  - travis_retry pip install -e .
+
 before_script:
   # Qt needs a display for some of the tests, and it's only run on the system site packages install
   - "export DISPLAY=:99.0"

@@ -16,11 +16,14 @@
 #
 
 
-from PIL import Image, ImageFile
-from PIL.OleFileIO import i8, i32, MAGIC, OleFileIO
+from PIL import Image, ImageFile, _binary
+
+import olefile
 
 __version__ = "0.1"
 
+i32 = _binary.i32le
+i8 = _binary.i8
+
 # we map from colour field tuples to (mode, rawmode) descriptors
 MODES = {
@@ -42,7 +45,7 @@ MODES = {
 # --------------------------------------------------------------------
 
 def _accept(prefix):
-    return prefix[:8] == MAGIC
+    return prefix[:8] == olefile.MAGIC
 
 
 ##
@@ -59,7 +62,7 @@ class FpxImageFile(ImageFile.ImageFile):
         # to be a FlashPix file
 
         try:
-            self.ole = OleFileIO(self.fp)
+            self.ole = olefile.OleFileIO(self.fp)
         except IOError:
             raise SyntaxError("not an FPX file; invalid OLE file")
 

@@ -18,7 +18,8 @@
 
 
 from PIL import Image, TiffImagePlugin
-from PIL.OleFileIO import MAGIC, OleFileIO
+
+import olefile
 
 __version__ = "0.1"
 
@@ -28,7 +29,7 @@ __version__ = "0.1"
 
 
 def _accept(prefix):
-    return prefix[:8] == MAGIC
+    return prefix[:8] == olefile.MAGIC
 
 
 ##
@@ -45,7 +46,7 @@ class MicImageFile(TiffImagePlugin.TiffImageFile):
         # to be a Microsoft Image Composer file
 
         try:
-            self.ole = OleFileIO(self.fp)
+            self.ole = olefile.OleFileIO(self.fp)
         except IOError:
             raise SyntaxError("not an MIC file; invalid OLE file")
 

@@ -1,180 +0,0 @@
-olefile (formerly OleFileIO_PL)
-===============================
-
-[olefile](http://www.decalage.info/olefile) is a Python package to parse, read and write
-[Microsoft OLE2 files](http://en.wikipedia.org/wiki/Compound_File_Binary_Format)
-(also called Structured Storage, Compound File Binary Format or Compound Document File Format),
-such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer
-and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files,
-etc.
-
-
-**Quick links:** [Home page](http://www.decalage.info/olefile) -
-[Download/Install](https://bitbucket.org/decalage/olefileio_pl/wiki/Install) -
-[Documentation](https://bitbucket.org/decalage/olefileio_pl/wiki) -
-[Report Issues/Suggestions/Questions](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open) -
-[Contact the author](http://decalage.info/contact) -
-[Repository](https://bitbucket.org/decalage/olefileio_pl) -
-[Updates on Twitter](https://twitter.com/decalage2)
-
-
-News
-----
-
-Follow all updates and news on Twitter: <https://twitter.com/decalage2>
-
-- **2015-01-25 v0.42**: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8
-instead of Latin-1), fixed bug in listdir with empty storages.
-- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for
-python 3, added support for Jython (Niko Ehrenfeuchter)
-- 2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams >4K, updated doc and
-license, improved the setup script.
-- 2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and
-Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)
-- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.
-- 2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed
-parsing of direntry timestamps
-- 2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed
-[issue #12](https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole)
-- 2013-05-07 v0.24: new features to extract metadata (get\_metadata method and OleMetadata class), improved
-getproperties to convert timestamps to Python datetime
-- 2012-10-09: published [python-oletools](http://www.decalage.info/python/oletools), a package of analysis tools based
-on OleFileIO_PL
-- 2012-09-11 v0.23: added support for file-like objects, fixed [issue #8](https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object)
-- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)
-- 2011-10-20: code hosted on bitbucket to ease contributions and bug tracking
-- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.
-- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.
-- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)
-- see changelog in source code for more info.
-
-Download/Install
-----------------
-
-If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run **pip install olefile**
-or **easy_install olefile** for the first installation.
-
-To update olefile, run **pip install -U olefile**.
-
-Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install
-
-Features
---------
-
-- Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls,
-PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes,
-Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc
-- List all the streams and storages contained in an OLE file
-- Open streams as files
-- Parse and read property streams, containing metadata of the file
-- Portable, pure Python module, no dependency
-
-olefile can be used as an independent package or with PIL/Pillow.
-
-olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially
-for security purposes such as malware analysis and forensics), then please also check my
-[python-oletools](http://www.decalage.info/python/oletools), which are built upon olefile and provide a higher-level interface.
-
-
-History
--------
-
-olefile is based on the OleFileIO module from [PIL](http://www.pythonware.com/products/pil/index.htm), the excellent
-Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but
-since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust
-design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate
-its 9 years and its new write features.
-
-As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on
-several operating systems. (please tell me if you know other similar Python modules)
-
-Since 2014 olefile/OleFileIO_PL has been integrated into [Pillow](http://python-pillow.org), the friendly fork
-of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow
-regularly.
-
-
-Main improvements over the original version of OleFileIO in PIL:
------------------------------------------------------------------
-
-- Compatible with Python 3.x and 2.6+
-- Many bug fixes
-- Support for files larger than 6.8MB
-- Support for 64 bits platforms and big-endian CPUs
-- Robust: many checks to detect malformed files
-- Runtime option to choose if malformed files should be parsed or raise exceptions
-- Improved API
-- Metadata extraction, stream/storage timestamps (e.g. for document forensics)
-- Can open file-like objects
-- Added setup.py and install.bat to ease installation
-- More convenient slash-based syntax for stream paths
-- Write features
-
-Documentation
--------------
-
-Please see the [online documentation](https://bitbucket.org/decalage/olefileio_pl/wiki) for more information,
-especially the [OLE overview](https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview) and the
-[API page](https://bitbucket.org/decalage/olefileio_pl/wiki/API) which describe how to use olefile in Python applications.
-A copy of the same documentation is also provided in the doc subfolder of the olefile package.
-
-
-## Real-life examples ##
-
-A real-life example: [using OleFileIO_PL for malware analysis and forensics](http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/).
-
-See also [this paper](https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879) about python tools for forensics, which features olefile.
-
-
-License
--------
-
-olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec
-([http://www.decalage.info](http://www.decalage.info))
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
------------
-
-olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik
-Lundh under the following license:
-
-The Python Imaging Library (PIL) is
-
-Copyright © 1997-2011 by Secret Labs AB
-Copyright © 1995-2011 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read,
-understood, and will comply with the following terms and conditions:
-
-Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and
-without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that
-copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or
-the author not be used in advertising or publicity pertaining to distribution of the software without specific, written
-prior permission.
-
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES
-OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR
-CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
-CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
-SOFTWARE.

2307 PIL/OleFileIO.py
File diff suppressed because it is too large

@@ -1,147 +0,0 @@
-from helper import unittest, PillowTestCase
-
-import datetime
-
-import PIL.OleFileIO as OleFileIO
-
-
-class TestOleFileIo(PillowTestCase):
-
-    def test_isOleFile(self):
-        ole_file = "Tests/images/test-ole-file.doc"
-
-        self.assertTrue(OleFileIO.isOleFile(ole_file))
-        with open(ole_file, 'rb') as fp:
-            self.assertTrue(OleFileIO.isOleFile(fp))
-            self.assertTrue(OleFileIO.isOleFile(fp.read()))
-
-        non_ole_file = "Tests/images/flower.jpg"
-
-        self.assertFalse(OleFileIO.isOleFile(non_ole_file))
-        with open(non_ole_file, 'rb') as fp:
-            self.assertFalse(OleFileIO.isOleFile(fp))
-            self.assertFalse(OleFileIO.isOleFile(fp.read()))
-
-    def test_exists_worddocument(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        exists = ole.exists('worddocument')
-
-        # Assert
-        self.assertTrue(exists)
-        ole.close()
-
-    def test_exists_no_vba_macros(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        exists = ole.exists('macros/vba')
-
-        # Assert
-        self.assertFalse(exists)
-        ole.close()
-
-    def test_get_type(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        entry_type = ole.get_type('worddocument')
-
-        # Assert
-        self.assertEqual(entry_type, OleFileIO.STGTY_STREAM)
-        ole.close()
-
-    def test_get_size(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        size = ole.get_size('worddocument')
-
-        # Assert
-        self.assertGreater(size, 0)
-        ole.close()
-
-    def test_get_rootentry_name(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        root = ole.get_rootentry_name()
-
-        # Assert
-        self.assertEqual(root, "Root Entry")
-        ole.close()
-
-    def test_meta(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        meta = ole.get_metadata()
-
-        # Assert
-        self.assertEqual(meta.author, b"Laurence Ipsum")
-        self.assertEqual(meta.num_pages, 1)
-        ole.close()
-
-    def test_gettimes(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-        root_entry = ole.direntries[0]
-
-        # Act
-        ctime = root_entry.getctime()
-        mtime = root_entry.getmtime()
-
-        # Assert
-        self.assertIsNone(ctime)
-        self.assertIsInstance(mtime, datetime.datetime)
-        self.assertEqual(mtime.year, 2014)
-        ole.close()
-
-    def test_listdir(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-
-        # Act
-        dirlist = ole.listdir()
-
-        # Assert
-        self.assertIn(['WordDocument'], dirlist)
-        ole.close()
-
-    def test_debug(self):
-        # Arrange
-        ole_file = "Tests/images/test-ole-file.doc"
-        ole = OleFileIO.OleFileIO(ole_file)
-        meta = ole.get_metadata()
-
-        # Act
-        OleFileIO.set_debug_mode(True)
-        ole.dumpdirectory()
-        meta.dump()
-
-        OleFileIO.set_debug_mode(False)
-        ole.dumpdirectory()
-        meta.dump()
-
-        # Assert
-        # No assert, just check they run ok
-        ole.close()
-
-
-if __name__ == '__main__':
-    unittest.main()

@@ -1,364 +0,0 @@
-.. py:module:: PIL.OleFileIO
-.. py:currentmodule:: PIL.OleFileIO
-
-:py:mod:`OleFileIO` Module
-===========================
-
-The :py:mod:`OleFileIO` module reads Microsoft OLE2 files (also called
-Structured Storage or Microsoft Compound Document File Format), such
-as Microsoft Office documents, Image Composer and FlashPix files, and
-Outlook messages.
-
-This module is the `OleFileIO\_PL`_ project by Philippe Lagadec, v0.42,
-merged back into Pillow.
-
-.. _OleFileIO\_PL: http://www.decalage.info/python/olefileio
-
-How to use this module
-----------------------
-
-For more information, see also the file **PIL/OleFileIO.py**, sample
-code at the end of the module itself, and docstrings within the code.
-
-About the structure of OLE files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-An OLE file can be seen as a mini file system or a Zip archive: It
-contains **streams** of data that look like files embedded within the
-OLE file. Each stream has a name. For example, the main stream of a MS
-Word document containing its text is named "WordDocument".
-
-An OLE file can also contain **storages**. A storage is a folder that
-contains streams or other storages. For example, a MS Word document with
-VBA macros has a storage called "Macros".
-
-Special streams can contain **properties**. A property is a specific
-value that can be used to store information such as the metadata of a
-document (title, author, creation date, etc). Property stream names
-usually start with the character '05'.
-
-For example, a typical MS Word document may look like this:
-
-::
-
-    \x05DocumentSummaryInformation (stream)
-    \x05SummaryInformation (stream)
-    WordDocument (stream)
-    Macros (storage)
-        PROJECT (stream)
-        PROJECTwm (stream)
-        VBA (storage)
-            Module1 (stream)
-            ThisDocument (stream)
-            _VBA_PROJECT (stream)
-            dir (stream)
-    ObjectPool (storage)
-
-Test if a file is an OLE container
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Use isOleFile to check if the first bytes of the file contain the Magic
-for OLE files, before opening it. isOleFile returns True if it is an OLE
-file, False otherwise.
-
-.. code-block:: python
-
-    assert OleFileIO.isOleFile('myfile.doc')
-
-Open an OLE file from disk
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Create an OleFileIO object with the file path as parameter:
-
-.. code-block:: python
-
-    ole = OleFileIO.OleFileIO('myfile.doc')
-
-Open an OLE file from a file-like object
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This is useful if the file is not on disk, e.g. already stored in a
-string or as a file-like object.
-
-.. code-block:: python
-
-    ole = OleFileIO.OleFileIO(f)
-
-For example the code below reads a file into a string, then uses BytesIO
-to turn it into a file-like object.
-
-.. code-block:: python
-
-    data = open('myfile.doc', 'rb').read()
-    f = io.BytesIO(data) # or StringIO.StringIO for Python 2.x
-    ole = OleFileIO.OleFileIO(f)
-
-How to handle malformed OLE files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default, the parser is configured to be as robust and permissive as
-possible, allowing to parse most malformed OLE files. Only fatal errors
-will raise an exception. It is possible to tell the parser to be more
-strict in order to raise exceptions for files that do not fully conform
-to the OLE specifications, using the raise\_defect option:
-
-.. code-block:: python
-
-    ole = OleFileIO.OleFileIO('myfile.doc', raise_defects=DEFECT_INCORRECT)
-
-When the parsing is done, the list of non-fatal issues detected is
-available as a list in the parsing\_issues attribute of the OleFileIO
-object:
-
-.. code-block:: python
-
-    print('Non-fatal issues raised during parsing:')
-    if ole.parsing_issues:
-        for exctype, msg in ole.parsing_issues:
-            print('- %s: %s' % (exctype.__name__, msg))
-    else:
-        print('None')
-
-Syntax for stream and storage path
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Two different syntaxes are allowed for methods that need or return the
-path of streams and storages:
-
-1) Either a **list of strings** including all the storages from the root
-   up to the stream/storage name. For example a stream called
-   "WordDocument" at the root will have ['WordDocument'] as full path. A
-   stream called "ThisDocument" located in the storage "Macros/VBA" will
-   be ['Macros', 'VBA', 'ThisDocument']. This is the original syntax
-   from PIL. While hard to read and not very convenient, this syntax
-   works in all cases.
-
-2) Or a **single string with slashes** to separate storage and stream
-   names (similar to the Unix path syntax). The previous examples would
-   be 'WordDocument' and 'Macros/VBA/ThisDocument'. This syntax is
-   easier, but may fail if a stream or storage name contains a slash.
-
-Both are case-insensitive.
-
-Switching between the two is easy:
-
-.. code-block:: python
-
-    slash_path = '/'.join(list_path)
-    list_path = slash_path.split('/')
-
-Get the list of streams
-~~~~~~~~~~~~~~~~~~~~~~~
-
-listdir() returns a list of all the streams contained in the OLE file,
-including those stored in storages. Each stream is listed itself as a
-list, as described above.
-
-.. code-block:: python
-
-    print(ole.listdir())
-
-Sample result:
-
-.. code-block:: python
-
-    [['\x01CompObj'], ['\x05DocumentSummaryInformation'], ['\x05SummaryInformation']
-    , ['1Table'], ['Macros', 'PROJECT'], ['Macros', 'PROJECTwm'], ['Macros', 'VBA',
-    'Module1'], ['Macros', 'VBA', 'ThisDocument'], ['Macros', 'VBA', '_VBA_PROJECT']
-    , ['Macros', 'VBA', 'dir'], ['ObjectPool'], ['WordDocument']]
-
-As an option it is possible to choose if storages should also be listed,
-with or without streams:
-
-.. code-block:: python
-
-    ole.listdir (streams=False, storages=True)
-
-Test if known streams/storages exist:
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-exists(path) checks if a given stream or storage exists in the OLE file.
-
-.. code-block:: python
-
-    if ole.exists('worddocument'):
-        print("This is a Word document.")
-        if ole.exists('macros/vba'):
-            print("This document seems to contain VBA macros.")
-
-Read data from a stream
-~~~~~~~~~~~~~~~~~~~~~~~
-
-openstream(path) opens a stream as a file-like object.
-
-The following example extracts the "Pictures" stream from a PPT file:
-
-.. code-block:: python
-
-    pics = ole.openstream('Pictures')
-    data = pics.read()
-
-
-Get information about a stream/storage
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Several methods can provide the size, type and timestamps of a given
-stream/storage:
-
-get\_size(path) returns the size of a stream in bytes:
-
-.. code-block:: python
-
-    s = ole.get_size('WordDocument')
-
-get\_type(path) returns the type of a stream/storage, as one of the
-following constants: STGTY\_STREAM for a stream, STGTY\_STORAGE for a
-storage, STGTY\_ROOT for the root entry, and False for a non existing
-path.
-
-.. code-block:: python
-
-    t = ole.get_type('WordDocument')
-
-get\_ctime(path) and get\_mtime(path) return the creation and
-modification timestamps of a stream/storage, as a Python datetime object
-with UTC timezone. Please note that these timestamps are only present if
-the application that created the OLE file explicitly stored them, which
-is rarely the case. When not present, these methods return None.
-
-.. code-block:: python
-
-    c = ole.get_ctime('WordDocument')
-    m = ole.get_mtime('WordDocument')
-
-The root storage is a special case: You can get its creation and
-modification timestamps using the OleFileIO.root attribute:
-
-.. code-block:: python
-
-    c = ole.root.getctime()
-    m = ole.root.getmtime()
-
-Extract metadata
-~~~~~~~~~~~~~~~~
-
-get\_metadata() will check if standard property streams exist, parse all
-the properties they contain, and return an OleMetadata object with the
-found properties as attributes.
-
-.. code-block:: python
-
-    meta = ole.get_metadata()
-    print('Author:', meta.author)
-    print('Title:', meta.title)
-    print('Creation date:', meta.create_time)
-    # print all metadata:
-    meta.dump()
-
-Available attributes include:
-
-::
-
-    codepage, title, subject, author, keywords, comments, template,
-    last_saved_by, revision_number, total_edit_time, last_printed, create_time,
-    last_saved_time, num_pages, num_words, num_chars, thumbnail,
-    creating_application, security, codepage_doc, category, presentation_target,
-    bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
-    scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
-    chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
-    version, dig_sig, content_type, content_status, language, doc_version
-
-See the source code of the OleMetadata class for more information.
-
-Parse a property stream
-~~~~~~~~~~~~~~~~~~~~~~~
-
-get\_properties(path) can be used to parse any property stream that is
-not handled by get\_metadata. It returns a dictionary indexed by
-integers. Each integer is the index of the property, pointing to its
-value. For example in the standard property stream
-'05SummaryInformation', the document title is property #2, and the
-subject is #3.
-
-.. code-block:: python
-
-    p = ole.getproperties('specialprops')
-
-By default as in the original PIL version, timestamp properties are
-converted into a number of seconds since Jan 1,1601. With the option
-convert\_time, you can obtain more convenient Python datetime objects
-(UTC timezone). If some time properties should not be converted (such as
-total editing time in '05SummaryInformation'), the list of indexes can
-be passed as no\_conversion:
-
-.. code-block:: python
-
-    p = ole.getproperties('specialprops', convert_time=True, no_conversion=[10])
-
-Close the OLE file
-~~~~~~~~~~~~~~~~~~
-
-Unless your application is a simple script that terminates after
-processing an OLE file, do not forget to close each OleFileIO object
-after parsing to close the file on disk.
-
-.. code-block:: python
-
-    ole.close()
-
-Use OleFileIO as a script
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-OleFileIO can also be used as a script from the command-line to
-display the structure of an OLE file and its metadata, for example:
-
-::
-
-    PIL/OleFileIO.py myfile.doc
-
-You can use the option -c to check that all streams can be read fully,
-and -d to generate very verbose debugging information.
-
-How to contribute
------------------
-
-The code is available in `a Mercurial repository on
-bitbucket <https://bitbucket.org/decalage/olefileio_pl>`_. You may use
-it to submit enhancements or to report any issue.
-
-If you would like to help us improve this module, or simply provide
-feedback, please `contact me <http://decalage.info/contact>`_. You can
-help in many ways:
-
-- test this module on different platforms / Python versions
-- find and report bugs
-- improve documentation, code samples, docstrings
-- write unittest test cases
-- provide tricky malformed files
-
-How to report bugs
-------------------
-
-To report a bug, for example a normal file which is not parsed
-correctly, please use the `issue reporting
-page <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`_,
-or if you prefer to do it privately, use this `contact
-form <http://decalage.info/contact>`_. Please provide all the
-information about the context and how to reproduce the bug.
-
-If possible please join the debugging output of OleFileIO. For this,
-launch the following command :
-
-::
-
-    PIL/OleFileIO.py -d -c file >debug.txt
-
-
-Classes and Methods
--------------------
-
-.. automodule:: PIL.OleFileIO
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :noindex:

@@ -27,7 +27,6 @@ Reference
    ImageWin
    ExifTags
    TiffTags
-   OleFileIO
    PSDraw
    PixelAccess
    PyAccess