diff --git a/docs/reference/index.rst b/docs/reference/index.rst index 555bd2a57..1dffcc9e2 100644 --- a/docs/reference/index.rst +++ b/docs/reference/index.rst @@ -33,3 +33,4 @@ Reference PyAccess ../PIL plugins + internal_design diff --git a/docs/reference/internal_design.rst b/docs/reference/internal_design.rst new file mode 100644 index 000000000..a8d6e2284 --- /dev/null +++ b/docs/reference/internal_design.rst @@ -0,0 +1,8 @@ +Internal Reference Docs +======================= + +.. toctree:: + :maxdepth: 2 + + open_files + limits diff --git a/docs/reference/limits.rst b/docs/reference/limits.rst new file mode 100644 index 000000000..6f81c9e65 --- /dev/null +++ b/docs/reference/limits.rst @@ -0,0 +1,41 @@ +Limits +------ + +This page is documentation to the various fundamental size limits in +the Pillow implementation. + +Internal Limits +=============== + +* Image sizes cannot be negative. These are checked both in + ``Storage.c`` and ``Image.py`` + +* Image sizes may be 0. (At least, prior to 3.4) + +* Maximum pixel dimensions are limited to INT32, or 2^31 by the sizes + in the image header. + +* Individual allocations are limited to 2GB in ``Storage.c`` + +* The 2GB allocation puts an upper limit to the xsize of the image of + either 2^31 for 'L' or 2^29 for 'RGB' + +* Individual memory mapped segments are limited to 2GB in map.c based + on the overflow checks. This requires that any memory mapped image + is smaller than 2GB, as calculated by ``y*stride`` (so 2Gpx for 'L' + images, and .5Gpx for 'RGB' + +* Any call to internal python size functions for buffers or strings + are currently returned as int32, not py_ssize_t. This limits the + maximum buffer to 2GB for operations like frombytes and frombuffer. + +* This also limits the size of buffers converted using a + decoder. (decode.c:127) + +Format Size Limits +================== + +* ICO: Max size is 256x256 + +* Webp: 16383x16383 (underlying library size limit: + https://developers.google.com/speed/webp/docs/api) diff --git a/docs/reference/open_files.rst b/docs/reference/open_files.rst new file mode 100644 index 000000000..143eb7209 --- /dev/null +++ b/docs/reference/open_files.rst @@ -0,0 +1,125 @@ +File Handling in Pillow +======================= + +When opening a file as an image, Pillow requires a filename, +pathlib.Path object, or a file-like object. Pillow uses the filename +or Path to open a file, so for the rest of this article, they will all +be treated as a file-like object. + +The first four of these items are equivalent, the last is dangerous +and may fail:: + + from PIL import Image + import io + import pathlib + + im = Image.open('test.jpg') + + im2 = Image.open(pathlib.Path('test.jpg')) + + f = open('test.jpg', 'rb') + im3 = Image.open(f) + + with open('test.jpg', 'rb') as f: + im4 = Image.open(io.BytesIO(f.read())) + + # Dangerous FAIL: + with open('test.jpg', 'rb') as f: + im5 = Image.open(f) + im5.load() # FAILS, closed file + +The documentation specifies that the file will be closed after the +``Image.Image.load()`` method is called. This is an aspirational +specification rather than an accurate reflection of the state of the +code. + +Pillow cannot in general close and reopen a file, so any access to +that file needs to be prior to the close. + +Issues +------ + +The current open file handling is inconsistent at best: + +* Most of the image plugins do not close the input file. +* Multi-frame images behave badly when seeking through the file, as + it's legal to seek backward in the file until the last image is + read, and then it's not. +* Using the file context manager to provide a file-like object to + Pillow is dangerous unless the context of the image is limited to + the context of the file. + +Image Lifecycle +--------------- + +* ``Image.open()`` called. Path-like objects are opened as a + file. Metadata is read from the open file. The file is left open for + further usage. + +* ``Image.Image.load()`` when the pixel data from the image is + required, ``load()`` is called. The current frame is read into + memory. The image can now be used independently of the underlying + image file. + +* ``Image.Image.seek()`` in the case of multi-frame images + (e.g. multipage TIFF and animated GIF) the image file left open so + that seek can load the appropriate frame. When the last frame is + read, the image file is closed (at least in some image plugins), and + no more seeks can occur. + +* ``Image.Image.close()`` Closes the file pointer and destroys the + core image object. This is used in the Pillow context manager + support. e.g.:: + + with Image.open('test.jpg') as img: + ... # image operations here. + + +The lifecycle of a single frame image is relatively simple. The file +must remain open until the ``load()`` or ``close()`` function is +called. + +Multi-frame images are more complicated. The ``load()`` method is not +a terminal method, so it should not close the underlying file. The +current behavior of ``seek()`` closing the underlying file on +accessing the last frame is presumably a heuristic for closing the +file after iterating through the entire sequence. In general, Pillow +does not know if there are going to be any requests for additional +data until the caller has explicitly closed the image. + + +Complications +------------- + +* TiffImagePlugin has some code to pass the underlying file descriptor + into libtiff (if working on an actual file). Since libtiff closes + the file descriptor internally, it is duplicated prior to passing it + into libtiff. + +* ``decoder.handles_eof`` This slightly misnamed flag indicates that + the decoder wants to be called with a 0 length buffer when reads are + done. Despite the comments in ``ImageFile.load()``, the only decoder + that actually uses this flag is the Jpeg2K decoder. The use of this + flag in Jpeg2K predated the change to the decoder that added the + pulls_fd flag, and is therefore not used. + +* I don't think that there's any way to make this safe without + changing the lazy loading:: + + # Dangerous FAIL: + with open('test.jpg', 'rb') as f: + im5 = Image.open(f) + im5.load() # FAILS, closed file + + +Proposed File Handling +---------------------- + +* ``Image.Image.load()`` should close the image file, unless there are + multiple frames. + +* ``Image.Image.seek()`` should never close the image file. + +* Users of the library should call ``Image.Image.close()`` on any + multi-frame image to ensure that the underlying file is closed. +