diff --git a/CHANGES.SIMD.rst b/CHANGES.SIMD.rst new file mode 100644 index 000000000..b5cce2787 --- /dev/null +++ b/CHANGES.SIMD.rst @@ -0,0 +1,101 @@ +Changelog (Pillow-SIMD) +======================= + +4.3.0.post0 +----------- + +- Float-based filters, single-band: 3x3 SSE4, 5x5 SSE4 +- Float-based filters, multi-band: 3x3 SSE4 & AVX2, 5x5 SSE4 +- Int-based filters, multi-band: 3x3 SSE4 & AVX2, 5x5 SSE4 & AVX2 +- Box blur: fast path for radius < 1 +- Alpha composite: fast div approximation +- Color conversion: RGB to L SSE4, fast div in RGBa to RGBA +- Resampling: optimized coefficients loading +- Split and get_channel: SSE4 + +3.4.1.post1 +----------- + +- Critical memory error for some combinations of source/destination + sizes is fixed. + +3.4.1.post0 +----------- + +- A lot of optimizations in resampling including 16-bit + intermediate color representation and heavy unrolling. + +3.3.2.post0 +----------- + +- Maintenance release + +3.3.0.post2 +----------- + +- Fixed error in RGBa -> RGBA conversion + +3.3.0.post1 +----------- + +Alpha compositing +~~~~~~~~~~~~~~~~~ + +- SSE4 and AVX2 fixed-point full loading implementation. + Up to 4.6x faster. + +3.3.0.post0 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 fixed-point full loading horizontal pass. +- SSE4 and AVX2 fixed-point full loading vertical pass. + +Conversion +~~~~~~~~~~ + +- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations. + Up to 2.6x faster. +- RGBa -> RGBA AVX2 implementation using gather instructions. + Up to 5x faster. + + +3.2.0.post3 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float full loading horizontal pass. +- SSE4 float full loading vertical pass. + + +3.2.0.post2 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float full loading horizontal pass. +- SSE4 float per-pixel loading vertical pass. + + +2.9.0.post1 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float per-pixel loading horizontal pass. 
+- SSE4 float per-pixel loading vertical pass. +- SSE4: Up to 2x for downscaling. Up to 3.5x for upscaling. +- AVX2: Up to 2.7x for downscaling. Up to 3.5x for upscaling. + + +Box blur +~~~~~~~~ + +- Simple SSE4 fixed-point implementations with per-pixel loading. +- Up to 2.1x faster. diff --git a/MANIFEST.in b/MANIFEST.in index 40b2ef5d7..38850bf3a 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -15,6 +15,7 @@ graft depends graft winbuild graft docs prune docs/_static +prune Tests # build/src control detritus exclude .appveyor.yml diff --git a/PyPI.rst b/PyPI.rst new file mode 100644 index 000000000..e63270f75 --- /dev/null +++ b/PyPI.rst @@ -0,0 +1,6 @@ + +`Pillow-SIMD repo and readme `_ + +`Pillow-SIMD changelog `_ + +`Pillow documentation `_ diff --git a/README.md b/README.md new file mode 100644 index 000000000..21b0eca66 --- /dev/null +++ b/README.md @@ -0,0 +1,127 @@ +# Pillow-SIMD + +Pillow-SIMD is "following" [Pillow][original-docs]. +Pillow-SIMD versions are 100% compatible +drop-in replacements for Pillow of the same version. +For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for +`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` is one for `Pillow 3.3.3`. + +For more information on the original Pillow, you can +[read the documentation][original-docs], +[check the changelog][original-changelog] and +[find out how to contribute][original-contribute]. + + +## Why SIMD + +There are multiple ways to tweak image processing performance. +These include using better algorithms, optimizing existing implementations, +and using more processing power or resources. +A good example of using a more efficient algorithm is [replacing][gaussian-blur-changes] +a convolution-based Gaussian blur with a sequential-box one. + +Such examples are rather rare, though. Some operations can also be sped up +by running the respective routines in parallel.
+But a more practical route to optimization is making things work faster +using the resources at hand. SIMD computing is one such case. + +SIMD stands for "single instruction, multiple data" and its essence is +in performing the same operation on multiple data points simultaneously +by using multiple processing elements. +Common CPU SIMD instruction sets are MMX, SSE-SSE4, AVX, AVX2, AVX512, and NEON. + +Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) or AVX2 support. + + +## Status + +The Pillow-SIMD project is production-ready. +The project is supported by Uploadcare, a SaaS for cloud-based image storage and processing. + +[![Uploadcare][uploadcare.logo]][uploadcare.com] + +In fact, Uploadcare has been running Pillow-SIMD for about three years now. + +The following image operations are currently SIMD-accelerated: + +- Resize (convolution-based resampling): SSE4, AVX2 +- Gaussian and box blur: SSE4 +- Alpha composition: SSE4, AVX2 +- RGBA → RGBa (alpha premultiplication): SSE4, AVX2 +- RGBa → RGBA (division by alpha): SSE4, AVX2 +- RGB → L (grayscale): SSE4 +- 3x3 and 5x5 kernel filters: SSE4, AVX2 +- Split and get_channel: SSE4 + + +## Benchmarks + +Tons of tests can be found on the [Pillow Performance][pillow-perf-page] page. +There are benchmarks against different versions of Pillow and Pillow-SIMD +as well as ImageMagick, Skia, OpenCV and IPP. + +The results show that for resizing Pillow is always faster than ImageMagick, +and Pillow-SIMD, in turn, is faster than the original Pillow by a factor of 4-6. +In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than +ImageMagick and outperforms Skia, the high-speed graphics library used in Chromium. + + +## Why Pillow itself is so fast + +No cheats involved. We've used identical high-quality resize and blur methods for the benchmark. +Outcomes produced by different libraries are in almost pixel-perfect agreement.
+The difference in measured rates comes only from the performance of each algorithm involved. + + +## Why Pillow-SIMD is even faster + +Because of the SIMD computing, of course. But there's more to it: +heavy loop unrolling and specific instructions that aren't available for scalar data types. + + +## Why not contribute SIMD to the original Pillow + +Well, it's not that simple. First of all, the original Pillow supports +a large number of architectures, not just x86. +But even for x86 platforms, Pillow is often distributed via precompiled binaries. +In order for us to integrate SIMD into the precompiled binaries +we'd need to perform runtime CPU capability checks. +To compile the code this way we need to pass the `-mavx2` option to the compiler. +But with the option included, a compiler will inject AVX instructions even +for SSE functions (i.e. interchange them) since every SSE instruction has its AVX equivalent. +So there is no easy way to compile such a library, especially with setuptools. + + +## Installation + +If there's a copy of the original Pillow installed, it has to be removed first +with `$ pip uninstall -y pillow`. +The installation itself is as simple as running `$ pip install pillow-simd`, +and if you're using an SSE4-capable CPU everything should run smoothly. +If you'd like to install the AVX2-enabled version, +you need to pass an additional flag to the C compiler. +The easiest way to do so is to define the `CC` variable during the compilation. + +```bash +$ pip uninstall pillow +$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd +``` + + +## Contributing to Pillow-SIMD + +Please be aware that Pillow-SIMD and Pillow are two separate projects. +Please submit bugs and improvements not related to SIMD to the [original Pillow][original-issues]. +All bugfixes to the original Pillow will then be transferred to the next Pillow-SIMD version automatically.
+ + + [original-homepage]: https://python-pillow.org/ + [original-docs]: https://pillow.readthedocs.io/ + [original-issues]: https://github.com/python-pillow/Pillow/issues/new + [original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst + [original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md + [gaussian-blur-changes]: https://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask + [pillow-perf-page]: https://python-pillow.github.io/pillow-perf/ + [pillow-perf-repo]: https://github.com/python-pillow/pillow-perf + [uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd + [uploadcare.logo]: https://ucarecdn.com/74c4d283-f7cf-45d7-924c-fc77345585af/uploadcare.svg diff --git a/README.rst b/README.rst deleted file mode 100644 index b88a103b0..000000000 --- a/README.rst +++ /dev/null @@ -1,77 +0,0 @@ -Pillow -====== - -Python Imaging Library (Fork) ------------------------------ - -Pillow is the friendly PIL fork by `Alex Clark and Contributors `_. PIL is the Python Imaging Library by Fredrik Lundh and Contributors. - -.. start-badges - -.. list-table:: - :stub-columns: 1 - - * - docs - - |docs| - * - tests - - |linux| |macos| |windows| |coverage| - * - package - - |zenodo| |version| - * - social - - |gitter| |twitter| - -.. |docs| image:: https://readthedocs.org/projects/pillow/badge/?version=latest - :target: https://pillow.readthedocs.io/?badge=latest - :alt: Documentation Status - -.. |linux| image:: https://img.shields.io/travis/python-pillow/Pillow/master.svg?label=Linux%20build - :target: https://travis-ci.org/python-pillow/Pillow - :alt: Travis CI build status (Linux) - -.. |macos| image:: https://img.shields.io/travis/python-pillow/pillow-wheels/latest.svg?label=macOS%20build - :target: https://travis-ci.org/python-pillow/pillow-wheels - :alt: Travis CI build status (macOS) - -.. 
|windows| image:: https://img.shields.io/appveyor/ci/python-pillow/Pillow/master.svg?label=Windows%20build - :target: https://ci.appveyor.com/project/python-pillow/Pillow - :alt: AppVeyor CI build status (Windows) - -.. |coverage| image:: https://coveralls.io/repos/python-pillow/Pillow/badge.svg?branch=master&service=github - :target: https://coveralls.io/github/python-pillow/Pillow?branch=master - :alt: Code coverage - -.. |zenodo| image:: https://zenodo.org/badge/17549/python-pillow/Pillow.svg - :target: https://zenodo.org/badge/latestdoi/17549/python-pillow/Pillow - -.. |version| image:: https://img.shields.io/pypi/v/pillow.svg - :target: https://pypi.org/project/Pillow/ - :alt: Latest PyPI version - -.. |gitter| image:: https://badges.gitter.im/python-pillow/Pillow.svg - :target: https://gitter.im/python-pillow/Pillow?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge - :alt: Join the chat at https://gitter.im/python-pillow/Pillow - -.. |twitter| image:: https://img.shields.io/badge/tweet-on%20Twitter-00aced.svg - :target: https://twitter.com/PythonPillow - :alt: Follow on https://twitter.com/PythonPillow - -.. 
end-badges - - - -More Information ----------------- - -- `Documentation `_ - - - `Installation `_ - - `Handbook `_ - -- `Contribute `_ - - - `Issues `_ - - `Pull requests `_ - -- `Changelog `_ - - - `Pre-fork `_ diff --git a/setup.py b/setup.py index 15d81e465..6c2e78e73 100755 --- a/setup.py +++ b/setup.py @@ -134,7 +134,7 @@ except (ImportError, OSError): # pypy emits an oserror _tkinter = None -NAME = 'Pillow' +NAME = 'Pillow-SIMD' PILLOW_VERSION = get_version() JPEG_ROOT = None JPEG2K_ROOT = None @@ -630,7 +630,8 @@ class pil_build_ext(build_ext): exts = [(Extension("PIL._imaging", files, libraries=libs, - define_macros=defs))] + define_macros=defs, + extra_compile_args=['-msse4']))] # # additional libraries @@ -767,10 +768,10 @@ try: setup(name=NAME, version=PILLOW_VERSION, description='Python Imaging Library (Fork)', - long_description=_read('README.rst').decode('utf-8'), + long_description=_read('PyPI.rst').decode('utf-8'), author='Alex Clark (Fork Author)', author_email='aclark@aclark.net', - url='http://python-pillow.org', + url='https://github.com/uploadcare/pillow-simd', classifiers=[ "Development Status :: 6 - Mature", "Topic :: Multimedia :: Graphics", diff --git a/src/PIL/_version.py b/src/PIL/_version.py index b5e4f0d75..eee9c701d 100644 --- a/src/PIL/_version.py +++ b/src/PIL/_version.py @@ -1,2 +1,2 @@ # Master version for Pillow -__version__ = '5.3.0' +__version__ = '5.3.0.post0'