SIMD. Rewritten the Pillow-SIMD readme

SIMD. Updated according to the review

SIMD. fix markup
Elijah 2016-10-07 16:54:22 +05:00 committed by Alexander Karpinsky
parent 5628c31bb3
commit 9f511d459a

README.md

@@ -1,12 +1,12 @@
# Pillow-SIMD
Pillow-SIMD is a "following" fork of Pillow (which is itself a fork of PIL).
"Following" here means that Pillow-SIMD versions are 100%-compatible
drop-in replacements for Pillow of the same version.
For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for
`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.

For more information on the original Pillow, please
[read the documentation][original-docs],
[check the changelog][original-changelog] and
[find out how to contribute][original-contribute].
@@ -14,35 +14,35 @@ For more information about original Pillow, please
## Why SIMD
There are multiple ways to tweak image processing performance,
to name a few: utilizing better algorithms, optimizing existing
implementations, or using more processing power and/or resources.

One of the great examples of switching to a more efficient algorithm is
[replacing][gaussian-blur-changes] a convolution-based Gaussian blur
with a sequential-box one. Such examples are rather rare, though.
Certain operations can also be sped up through parallel processing,
but a more practical key to optimization is making things work faster
on the resources at hand. That is where SIMD computing comes in.

SIMD stands for "single instruction, multiple data": the same operation
is performed on multiple data points simultaneously by multiple
processing elements. Common CPU SIMD instruction sets are
MMX, SSE-SSE4, AVX, AVX2, AVX512, and NEON.
Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
or AVX2 support.
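As a rough conceptual sketch of the idea in pure Python (real SIMD runs on hardware vector registers; the 4-wide lane here is just an assumption mirroring four 32-bit values in a 128-bit SSE register):

```python
# Conceptual illustration of SIMD: one "instruction" applied to a whole
# vector of lanes at once, instead of one scalar value at a time.
# Pure-Python sketch only; real SIMD uses hardware vector registers.

def scalar_add(a, b):
    # Scalar processing: conceptually one addition per step.
    return [x + y for x, y in zip(a, b)]

def simd_add_4(a, b):
    # "SIMD" processing: the same add applied to a 4-wide lane per step,
    # conceptually len(a) / 4 steps (e.g. one SSE instruction can add
    # four packed 32-bit values at once).
    out = []
    for i in range(0, len(a), 4):
        out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return out
```

Both functions produce identical results; the difference is only in how many "steps" the hardware would need.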
## Status
The Pillow-SIMD project is production-ready.
The project is supported by Uploadcare, a SaaS for cloud-based image
storing and processing.

[![Uploadcare][uploadcare.logo]][uploadcare.com]

In fact, Uploadcare has been running Pillow-SIMD for about two years now.
The following image operations are currently SIMD-accelerated:
- Resize (convolution-based resampling): SSE4, AVX2
- Gaussian and box blur: SSE4
@@ -50,14 +50,17 @@ Currently, following operations are accelerated:
- RGBA → RGBa (alpha premultiplication): SSE4, AVX2
- RGBa → RGBA (division by alpha): AVX2
See [CHANGES](CHANGES.SIMD.rst) for more information.
## Benchmarks
To show the effect of SIMD acceleration on Pillow image processing,
we ran a number of benchmarks; the results can be found in the table
below (higher is better). The numbers represent processing rates
in megapixels per second (Mpx/s). For instance, a 2560x1600 RGB image
processed in 0.5 seconds corresponds to a rate of 8.2 Mpx/s.
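The quoted rate follows directly from the definition; a quick sanity check in Python:

```python
# Processing rate in megapixels per second (Mpx/s) for a 2560x1600 image
# processed in 0.5 seconds, as in the example above.
width, height = 2560, 1600
seconds = 0.5

megapixels = width * height / 1e6  # 4.096 Mpx
rate = megapixels / seconds        # 8.192 Mpx/s

print(round(rate, 1))  # → 8.2
```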
Here are the libraries and their versions used in the benchmarks:
- Skia 53 - Skia 53
- ImageMagick 6.9.3-8 Q8 x86_64 - ImageMagick 6.9.3-8 Q8 x86_64
@@ -83,89 +86,84 @@ Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53
| 100px | 0.34| 16.93| 35.53| |
### A brief conclusion
The results show that Pillow is always faster than ImageMagick, and
Pillow-SIMD is in turn faster than the original Pillow by a factor of 4-5.
In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than
ImageMagick and outperforms Skia, the high-speed graphics library used
in Chromium.
### Methodology
All rates were measured using the following setup: Ubuntu 14.04 64-bit,
running on a single thread of an AVX2-capable Intel Core i5-4258U CPU.

ImageMagick performance was measured with the `convert` command-line tool
and the `-verbose` and `-bench` arguments. The command line was used
because it is the easiest way to test the latest software versions.

All the tested operations produced identical results.
Resizing filters compliance:

- PIL.Image.BILINEAR == Triangle
- PIL.Image.BICUBIC == Catrom
- PIL.Image.LANCZOS == Lanczos
In ImageMagick, the Gaussian blur operation takes two parameters:
the first is called 'radius' and the second 'sigma'.
In fact, a true Gaussian blur needs no parameter other than sigma:
when the radius is too small, the result is no longer a Gaussian blur,
and when it is excessively big, the operation just gets slower with
no benefit in exchange. For the benchmarks, the radius was set
to `sigma × 2.5`.
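For illustration, here is how such an ImageMagick invocation can be built from a given sigma; `in.png` and `out.png` are placeholder file names, and the actual benchmarking script is linked below:

```python
# Derive the ImageMagick radius from sigma, as in the benchmarks
# (radius = sigma * 2.5), and build a `convert` command line.
sigma = 3.0
radius = sigma * 2.5  # 7.5

# ImageMagick's -gaussian-blur takes a "{radius}x{sigma}" geometry
# argument; "in.png" and "out.png" are placeholder file names.
cmd = f"convert in.png -gaussian-blur {radius}x{sigma} out.png"
print(cmd)  # → convert in.png -gaussian-blur 7.5x3.0 out.png
```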
The following script was used for the benchmarking procedure:
https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63
## Why Pillow itself is so fast
No cheats involved: high-quality resize and blur methods were used for all
benchmarks, and the outcomes produced by the different libraries are in
almost pixel-perfect agreement. The difference in measured rates comes
solely from the algorithms involved. Resampling in Pillow was rewritten
in version 2.7 with minimal use of floating-point numbers, precomputed
coefficients, and cache-aware transposition. The results were further
improved in 3.3 and 3.4 with integer-only arithmetic and other
optimizations.
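To illustrate the precomputed-coefficients idea, here is a simplified pure-Python sketch (not Pillow's actual code): the contributing input range and normalized filter weights for every output pixel are computed once, then reused for each row of the image.

```python
# Simplified sketch of precomputed resampling coefficients. For each
# output pixel, the contributing input range and normalized filter
# weights are computed once and then reused for every row, instead of
# being recomputed per pixel. Illustration only, not Pillow's code.

def triangle(x):
    # Bilinear (triangle) filter with support 1.0.
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def precompute_coeffs(in_size, out_size):
    scale = in_size / out_size
    support = max(scale, 1.0)  # widen the filter when downscaling
    coeffs = []
    for out_x in range(out_size):
        center = (out_x + 0.5) * scale
        left = max(int(center - support), 0)
        right = min(int(center + support) + 1, in_size)
        weights = [triangle((x + 0.5 - center) / max(scale, 1.0))
                   for x in range(left, right)]
        total = sum(weights) or 1.0
        coeffs.append((left, [w / total for w in weights]))  # normalized
    return coeffs

def resample_row(row, coeffs):
    # The same coefficients are reused for every row of the image.
    return [sum(w * row[left + i] for i, w in enumerate(weights))
            for left, weights in coeffs]

coeffs = precompute_coeffs(8, 4)        # downscale an 8-pixel row to 4
row = resample_row([10.0] * 8, coeffs)  # constant input stays constant
```

Since the weights are normalized, a constant input row maps to a constant output row, which is an easy correctness check for a resampler.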
## Why Pillow-SIMD is even faster
Because of the SIMD computing, of course. But there's more to it:
heavy loop unrolling and the use of instructions that are not available
for scalar data types.
## Why not contribute SIMD to the original Pillow
Well, it's not that simple. First of all, the original Pillow supports
a large number of architectures, not just x86. But even for x86 platforms,
Pillow is often distributed via precompiled binaries, and integrating SIMD
into them would require runtime checks of CPU capabilities.
To compile the code with such checks we would need to pass the `-mavx2`
option to the compiler. But with that option included, the compiler
injects AVX instructions even into SSE functions, since every SSE
instruction has an AVX equivalent. So there is no easy way to compile
such a library, especially with setuptools.
## Installation
If there's a copy of the original Pillow installed, it has to be removed
first, e.g. with `$ pip uninstall -y pillow`.
The installation itself is as simple as running `$ pip install pillow-simd`,
and if you're using an SSE4-capable CPU everything should run smoothly.

If you'd like to install the AVX2-enabled version, you need to pass
an additional flag to the C compiler. The easiest way to do so is to
define the `CC` variable during compilation.
```bash
$ pip uninstall pillow
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
```
## Contributing to Pillow-SIMD
Please be aware that Pillow-SIMD and Pillow are two separate projects.
Please submit bugs and improvements not related to SIMD to the
[original Pillow][original-issues]. All fixes to the original Pillow
will then be transferred to the next Pillow-SIMD version automatically.
[original-docs]: http://pillow.readthedocs.io/