Merge pull request #2269 from wiredfool/design-docs

Design docs
Hugo 2016-12-09 11:01:04 +02:00 committed by GitHub
commit 7dff4e5a8f
4 changed files with 175 additions and 0 deletions


@ -33,3 +33,4 @@ Reference
PyAccess
../PIL
plugins
internal_design


@ -0,0 +1,8 @@
Internal Reference Docs
=======================
.. toctree::
  :maxdepth: 2

  open_files
  limits

docs/reference/limits.rst Normal file

@ -0,0 +1,41 @@
Limits
------
This page documents the various fundamental size limits in
the Pillow implementation.
Internal Limits
===============
* Image sizes cannot be negative. These are checked both in
``Storage.c`` and ``Image.py``.
* Image sizes may be 0. (At least, prior to 3.4)
* Maximum pixel dimensions are limited to INT32, or 2^31, by the sizes
in the image header.
* Individual allocations are limited to 2GB in ``Storage.c``.
* The 2GB allocation limit puts an upper bound on the xsize of the
image of either 2^31 for 'L' or 2^29 for 'RGB'.
* Individual memory mapped segments are limited to 2GB in ``map.c``,
based on the overflow checks. This requires that any memory mapped
image is smaller than 2GB, as calculated by ``y*stride`` (so 2Gpx for
'L' images, and .5Gpx for 'RGB').
* The sizes returned by internal Python size functions for buffers or
strings are currently handled as int32, not ``Py_ssize_t``. This
limits the maximum buffer to 2GB for operations like ``frombytes``
and ``frombuffer``.
* This also limits the size of buffers converted using a
decoder. (decode.c:127)
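
The arithmetic behind these figures can be sketched in a few lines. This is an illustration only, not Pillow code: the helper names are hypothetical, and it assumes 'L' uses 1 byte per pixel and 'RGB' uses 4 bytes per pixel in memory, which is what the 2^31 and 2^29 figures above imply.

```python
# Assumption: 1 byte per pixel for 'L', 4 bytes per pixel for 'RGB'.
# These values are inferred from the limits quoted above.
ALLOCATION_LIMIT = 2 ** 31  # 2GB, in bytes
BYTES_PER_PIXEL = {"L": 1, "RGB": 4}


def max_xsize(mode):
    """Largest row width, in pixels, that fits in one 2GB allocation."""
    return ALLOCATION_LIMIT // BYTES_PER_PIXEL[mode]


def fits_in_mapped_segment(mode, xsize, ysize):
    """True if y * stride stays below the 2GB memory mapping limit."""
    stride = xsize * BYTES_PER_PIXEL[mode]
    return ysize * stride < ALLOCATION_LIMIT


print(max_xsize("L"), max_xsize("RGB"))
print(fits_in_mapped_segment("RGB", 16384, 16384))
```

Under these assumptions, a 16384x16384 'RGB' image needs 2^30 bytes and fits, while a 32768x16384 'RGB' image needs exactly 2^31 bytes and does not.
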
Format Size Limits
==================
* ICO: Max size is 256x256
* WebP: 16383x16383 (underlying library size limit:
https://developers.google.com/speed/webp/docs/api)
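
A caller could check these limits before attempting a save. The sketch below is a hypothetical helper, not part of Pillow; the table holds only the two limits listed above, and any format not in the table is treated as unchecked.

```python
# Hypothetical lookup table built from the limits listed above.
FORMAT_MAX_SIZE = {
    "ICO": (256, 256),
    "WEBP": (16383, 16383),
}


def check_format_size(fmt, size):
    """Raise ValueError if size exceeds the known limit for fmt."""
    limit = FORMAT_MAX_SIZE.get(fmt.upper())
    if limit is None:
        return  # no limit recorded for this format
    width, height = size
    if width > limit[0] or height > limit[1]:
        raise ValueError(
            "%s images are limited to %dx%d, got %dx%d"
            % (fmt, limit[0], limit[1], width, height)
        )


check_format_size("ICO", (256, 256))  # within the limit, no error
```
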


@ -0,0 +1,125 @@
File Handling in Pillow
=======================
When opening a file as an image, Pillow requires a filename,
pathlib.Path object, or a file-like object. Pillow uses the filename
or Path to open a file, so for the rest of this article, they will all
be treated as a file-like object.
The first four of these items are equivalent; the last is dangerous
and may fail::

    from PIL import Image
    import io
    import pathlib

    im = Image.open('test.jpg')

    im2 = Image.open(pathlib.Path('test.jpg'))

    f = open('test.jpg', 'rb')
    im3 = Image.open(f)

    with open('test.jpg', 'rb') as f:
        im4 = Image.open(io.BytesIO(f.read()))

    # Dangerous FAIL:
    with open('test.jpg', 'rb') as f:
        im5 = Image.open(f)
    im5.load() # FAILS, closed file

The documentation specifies that the file will be closed after the
``Image.Image.load()`` method is called. This is an aspirational
specification rather than an accurate reflection of the state of the
code.
Pillow cannot in general close and reopen a file, so any access to
that file must happen before it is closed.
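
The failure mode comes from lazy loading, which can be modeled without Pillow. The ``LazyImage`` class below is a toy stand-in written for this illustration, not Pillow's implementation: it reads a fixed-size header when constructed and defers the rest of the data until ``load()`` is called.

```python
import io


class LazyImage:
    """Toy stand-in for a lazily loaded image; not Pillow's implementation."""

    HEADER_SIZE = 4  # arbitrary size chosen for this sketch

    def __init__(self, fp):
        self.fp = fp
        self.header = fp.read(self.HEADER_SIZE)  # "open": metadata only
        self.data = None

    def load(self):
        if self.data is None:
            self.data = self.fp.read()  # pixel data is read on demand
        return self.data


payload = b"HDR0" + b"pixels"

# Safe: load() runs while the file is still open.
with io.BytesIO(payload) as f:
    safe = LazyImage(f)
    safe.load()

# Unsafe: the file is closed before load() runs.
with io.BytesIO(payload) as f:
    unsafe = LazyImage(f)
try:
    unsafe.load()
    failed = False
except ValueError:  # reading from a closed BytesIO raises ValueError
    failed = True
```

Once ``load()`` has run, the toy image no longer needs the file, which mirrors the requirement that every read happens before the close.
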
Issues
------
The current open file handling is inconsistent at best:
* Most of the image plugins do not close the input file.
* Multi-frame images behave badly when seeking through the file: it
is legal to seek backward in the file until the last image is read,
but not afterwards.
* Using the file context manager to provide a file-like object to
Pillow is dangerous unless the context of the image is limited to
the context of the file.
Image Lifecycle
---------------
* ``Image.open()`` is called. Path-like objects are opened as a
file. Metadata is read from the open file. The file is left open for
further usage.
* ``Image.Image.load()`` is called when the pixel data from the image
is required. The current frame is read into memory. The image can
now be used independently of the underlying image file.
* ``Image.Image.seek()`` is called. In the case of multi-frame images
(e.g. multipage TIFF and animated GIF), the image file is left open
so that seek can load the appropriate frame. When the last frame is
read, the image file is closed (at least in some image plugins), and
no more seeks can occur.
* ``Image.Image.close()`` closes the file pointer and destroys the
core image object. This is used in the Pillow context manager
support, e.g.::

    with Image.open('test.jpg') as img:
        ... # image operations here.

The lifecycle of a single frame image is relatively simple. The file
must remain open until the ``load()`` or ``close()`` function is
called.
Multi-frame images are more complicated. The ``load()`` method is not
a terminal method, so it should not close the underlying file. The
current behavior of ``seek()`` closing the underlying file on
accessing the last frame is presumably a heuristic for closing the
file after iterating through the entire sequence. In general, Pillow
does not know if there are going to be any requests for additional
data until the caller has explicitly closed the image.
Complications
-------------
* TiffImagePlugin has some code to pass the underlying file descriptor
into libtiff (if working on an actual file). Since libtiff closes
the file descriptor internally, it is duplicated prior to passing it
into libtiff.
* ``decoder.handles_eof``: this slightly misnamed flag indicates that
the decoder wants to be called with a 0 length buffer when reads are
done. Despite the comments in ``ImageFile.load()``, the only decoder
that actually uses this flag is the Jpeg2K decoder. The use of this
flag in Jpeg2K predated the change to the decoder that added the
``pulls_fd`` flag, and is therefore not used.
* I don't think that there's any way to make this safe without
changing the lazy loading::

    # Dangerous FAIL:
    with open('test.jpg', 'rb') as f:
        im5 = Image.open(f)
    im5.load() # FAILS, closed file

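
The descriptor duplication mentioned in the TiffImagePlugin item can be illustrated with the standard library alone. In this sketch, ``consume_and_close`` is a hypothetical stand-in for a library such as libtiff that closes whatever descriptor it receives; it is not the actual plugin code.

```python
import os
import tempfile


def consume_and_close(fd):
    """Hypothetical consumer that closes the descriptor it is given."""
    data = os.read(fd, 5)
    os.close(fd)
    return data


fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello world")
    os.lseek(fd, 0, os.SEEK_SET)

    # Hand over a duplicate so that our own descriptor survives the close.
    dup_fd = os.dup(fd)
    head = consume_and_close(dup_fd)

    # Duplicates share one file offset, so rewind before reading again.
    os.lseek(fd, 0, os.SEEK_SET)
    everything = os.read(fd, 100)
finally:
    os.close(fd)
    os.remove(path)
```

The original descriptor stays usable after the consumer closes its copy, which is the property the duplication is there to provide.
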
Proposed File Handling
----------------------
* ``Image.Image.load()`` should close the image file, unless there are
multiple frames.
* ``Image.Image.seek()`` should never close the image file.
* Users of the library should call ``Image.Image.close()`` on any
multi-frame image to ensure that the underlying file is closed.
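
The three proposed rules can be expressed as a small toy model. ``ProposedImageFile`` is a hypothetical class written for this illustration; it tracks only the file handle and the frame index and does not reflect how Pillow would implement the change.

```python
import io


class ProposedImageFile:
    """Toy model of the proposed rules; a hypothetical class, not Pillow code."""

    def __init__(self, fp, n_frames=1):
        self.fp = fp
        self.n_frames = n_frames
        self.frame = 0

    def load(self):
        # load() closes the file unless there are multiple frames.
        if self.n_frames == 1:
            self.fp.close()

    def seek(self, frame):
        # seek() never closes the file, even when it reaches the last frame.
        if not 0 <= frame < self.n_frames:
            raise EOFError("no such frame")
        self.frame = frame

    def close(self):
        # Callers close multi-frame images explicitly.
        self.fp.close()


single = ProposedImageFile(io.BytesIO(b"one frame"))
single.load()  # single frame: the file is closed here

multi = ProposedImageFile(io.BytesIO(b"three frames"), n_frames=3)
multi.seek(2)
multi.load()
open_after_last_frame = not multi.fp.closed
multi.seek(0)  # seeking backward still works after the last frame
multi.close()
```

In this model the multi-frame file stays open until ``close()`` is called, so the order of ``seek()`` and ``load()`` calls no longer affects whether later seeks succeed.
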