mirror of
				https://github.com/python-pillow/Pillow.git
				synced 2025-11-04 01:47:47 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			87 lines
		
	
	
		
			3.0 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			87 lines
		
	
	
		
			3.0 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
.. _arrow-support:
 | 
						|
 | 
						|
=============
 | 
						|
Arrow support
 | 
						|
=============
 | 
						|
 | 
						|
`Arrow <https://arrow.apache.org/>`__
 | 
						|
is an in-memory data exchange format that is the spiritual
 | 
						|
successor to the NumPy array interface. It provides for zero-copy
 | 
						|
access to columnar data, which in our case is ``Image`` data.
 | 
						|
 | 
						|
The goal with Arrow is to provide native zero-copy interoperability
 | 
						|
with any Arrow provider or consumer in the Python ecosystem.
 | 
						|
 | 
						|
.. warning:: Zero-copy does not mean zero allocation -- the internal
 | 
						|
  memory layout of Pillow images contains an allocation for row
 | 
						|
  pointers, so there is a non-zero, but significantly smaller than a
 | 
						|
  full-copy memory cost to reading an Arrow image.
 | 
						|
 | 
						|
 | 
						|
Data formats
 | 
						|
============
 | 
						|
 | 
						|
Pillow currently supports exporting Arrow images in all modes.
 | 
						|
 | 
						|
For single-band images, the exported array is width*height elements,
 | 
						|
with each pixel corresponding to the appropriate Arrow type.
 | 
						|
 | 
						|
For multiband images, the exported array is width*height fixed-length
 | 
						|
four-element arrays of uint8. This is memory compatible with the raw
 | 
						|
image storage of four bytes per pixel.
 | 
						|
 | 
						|
Mode ``1`` images are exported as one uint8 byte/pixel, as this is
 | 
						|
consistent with the internal storage.
 | 
						|
 | 
						|
Pillow will accept, but not produce, one other format. For any
 | 
						|
multichannel image with 32-bit storage per pixel, Pillow will accept
 | 
						|
an array of width*height int32 elements, which will then be
 | 
						|
interpreted using the mode-specific interpretation of the bytes.
 | 
						|
 | 
						|
The image mode must match the Arrow band format when reading single
 | 
						|
channel images.
 | 
						|
 | 
						|
Memory allocator
 | 
						|
================
 | 
						|
 | 
						|
Pillow's default memory allocator, the :ref:`block_allocator`,
 | 
						|
allocates up to a 16 MB block for images by default. Larger images
 | 
						|
overflow into additional blocks. Arrow requires a single continuous
 | 
						|
memory allocation, so images allocated in multiple blocks cannot be
 | 
						|
exported in the Arrow format.
 | 
						|
 | 
						|
To enable the single block allocator::
 | 
						|
 | 
						|
  from PIL import Image
 | 
						|
  Image.core.set_use_block_allocator(1)
 | 
						|
 | 
						|
Note that this is a global setting, not a per-image setting.
 | 
						|
 | 
						|
Unsupported features
 | 
						|
====================
 | 
						|
 | 
						|
* Table/dataframe protocol. We support a single array.
 | 
						|
* Null markers, producing or consuming. Null values are inferred from
 | 
						|
  the mode, e.g. RGB images are stored in the first three bytes of
 | 
						|
  each 32-bit pixel, and the last byte is an implied null.
 | 
						|
* Schema negotiation. There is an optional schema for the requested
 | 
						|
  datatype in the Arrow source interface. We ignore that
 | 
						|
  parameter.
 | 
						|
* Array metadata.
 | 
						|
 | 
						|
Internal details
 | 
						|
================
 | 
						|
 | 
						|
Python Arrow C interface:
 | 
						|
https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
 | 
						|
 | 
						|
The memory that is exported from the Arrow interface is shared -- not
 | 
						|
copied, so the lifetime of the memory allocation is no longer strictly
 | 
						|
tied to the life of the Python object.
 | 
						|
 | 
						|
The core imaging struct now has a refcount associated with it, and the
 | 
						|
lifetime of the core image struct is now divorced from the Python
 | 
						|
image object. Creating an arrow reference to the image increments the
 | 
						|
refcount, and the imaging struct is only released when the refcount
 | 
						|
reaches zero.
 |