mirror of
https://github.com/python-pillow/Pillow.git
synced 2025-08-20 20:24:45 +03:00
SIMD. Rewritten the Pillow-SIMD readme
SIMD. Updated according to the review SIMD. fix markup
parent
5628c31bb3
commit
9f511d459a
152
README.md
@@ -1,12 +1,12 @@
 # Pillow-SIMD

-Pillow-SIMD is "following" Pillow fork (which is PIL fork itself).
-"Following" means than Pillow-SIMD versions are 100% compatible
-drop-in replacement for Pillow with the same version number.
-For example, `Pillow-SIMD 3.2.0.post3` is drop-in replacement for
-`Pillow 3.2.0` and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.
+Pillow-SIMD is "following" Pillow (which is itself a PIL fork).
+"Following" here means that Pillow-SIMD versions are 100% compatible
+drop-in replacements for Pillow of the same version.
+For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for
+`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.

-For more information about original Pillow, please
+For more information on the original Pillow, please refer to:
 [read the documentation][original-docs],
 [check the changelog][original-changelog] and
 [find out how to contribute][original-contribute].
@@ -14,35 +14,35 @@ For more information about original Pillow, please

 ## Why SIMD

-There are many ways to improve the performance of image processing.
-You can use better algorithms for the same task, you can make better
-implementation for current algorithms, or you can use more processing unit
-resources. It is perfect when you can just use more efficient algorithm like
-when gaussian blur based on convolutions [was replaced][gaussian-blur-changes]
-by sequential box filters. But a number of such improvements are very limited.
-It is also very tempting to use more processor unit resources
-(via parallelization) when they are available. But it is handier just
-to make things faster on the same resources. And that is where SIMD works better.
+There are multiple ways to tweak image processing performance.
+To name a few, such ways can be: utilizing better algorithms, optimizing existing implementations,
+using more processing power and/or resources.
+One of the great examples of using a more efficient algorithm is [replacing][gaussian-blur-changes]
+a convolution-based Gaussian blur with a sequential-box one.

-SIMD stands for "single instruction, multiple data". This is a way to perform
-same operations against the huge amount of homogeneous data.
-Modern CPU have different SIMD instructions sets like
-MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.
+Such examples are rather rare, though. It is also known that certain processes might be optimized
+by using parallel processing to run the respective routines.
+But a more practical key to optimizations might be making things work faster
+using the resources at hand. SIMD computing is one way to do that.

-Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
-and AVX2 support.
+SIMD stands for "single instruction, multiple data" and its essence is
+in performing the same operation on multiple data points simultaneously
+by using multiple processing elements.
+Common CPU SIMD instruction sets are MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.

+Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) or AVX2 support.


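To make the "single instruction, multiple data" idea concrete, here is a minimal pure-Python sketch. Python has no SIMD registers, so the four-value chunk below only models what one 128-bit instruction would do on four 32-bit lanes. The function names and the lane count are assumptions for illustration; this is not Pillow-SIMD code.

```python
from typing import List, Sequence

LANES = 4  # assumption: a 128-bit register holding four 32-bit values


def add_scalar(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Scalar model: one addition per loop iteration."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


def add_simd_model(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """SIMD model: each loop iteration handles up to LANES values at once.

    The generator expression stands in for a single vector instruction.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out: List[int] = []
    for i in range(0, len(a), LANES):
        chunk_a = a[i:i + LANES]
        chunk_b = b[i:i + LANES]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    return out
```

Both functions return the same values; the difference is that the second one needs roughly a quarter of the loop iterations, which is where real SIMD code gets its speed-up.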
 ## Status

+The Pillow-SIMD project is production-ready.
+The project is supported by Uploadcare, a SaaS for cloud-based image storage and processing.

 [![Uploadcare][uploadcare.logo]][uploadcare.com]

-Pillow-SIMD can be used in production. Pillow-SIMD has been operating on
-[Uploadcare][uploadcare.com] servers for more than 1 year.
-Uploadcare is SAAS for image storing and processing in the cloud
-and the main sponsor of Pillow-SIMD project.
+In fact, Uploadcare has been running Pillow-SIMD for about two years now.

-Currently, following operations are accelerated:
+The following image operations are currently SIMD-accelerated:

 - Resize (convolution-based resampling): SSE4, AVX2
 - Gaussian and box blur: SSE4
@@ -50,14 +50,17 @@ Currently, following operations are accelerated:
 - RGBA → RGBa (alpha premultiplication): SSE4, AVX2
 - RGBa → RGBA (division by alpha): AVX2

-See [CHANGES](CHANGES.SIMD.rst).
+See [CHANGES](CHANGES.SIMD.rst) for more information.



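As an illustration of what the RGBA → RGBa and RGBa → RGBA conversions listed above compute, here is a scalar Python sketch for a single pixel. The rounding rule is an assumption chosen for the example and may differ from what Pillow does internally; the function names are hypothetical.

```python
from typing import Tuple

Pixel = Tuple[int, int, int, int]


def premultiply(pixel: Pixel) -> Pixel:
    """RGBA -> RGBa: scale each colour channel by alpha / 255."""
    r, g, b, a = pixel

    def scale(c: int) -> int:
        # Round-to-nearest integer division; assumed for illustration only.
        return (c * a + 127) // 255

    return (scale(r), scale(g), scale(b), a)


def unpremultiply(pixel: Pixel) -> Pixel:
    """RGBa -> RGBA: divide each colour channel by alpha / 255."""
    r, g, b, a = pixel
    if a == 0:
        # Colour is undefined for fully transparent pixels; return as-is.
        return pixel

    def unscale(c: int) -> int:
        return min(255, (c * 255 + a // 2) // a)

    return (unscale(r), unscale(g), unscale(b), a)
```

The same arithmetic is applied independently to every pixel, which is why these conversions map well onto SIMD instructions.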
 ## Benchmarks

-The numbers in the table represent processed megapixels of source RGB 2560x1600
-image per second. For example, if resize of 2560x1600 image is done
-in 0.5 seconds, the result will be 8.2 Mpx/s.
+To help you assess the performance gain from adding SIMD computing to Pillow image processing,
+we ran a number of benchmarks. The results can be found in the table below (higher is better).
+The numbers represent processing rates in megapixels per second (Mpx/s).
+For instance, a 2560x1600 RGB image processed in 0.5 seconds gives a rate of 8.2 Mpx/s.
+Here is the list of libraries and their versions used in the benchmarks:

 - Skia 53
 - ImageMagick 6.9.3-8 Q8 x86_64
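The Mpx/s figure quoted in the benchmark description can be reproduced with a few lines of Python. The helper name is hypothetical; the arithmetic is simply source pixels divided by one million, divided by elapsed seconds.

```python
def megapixels_per_second(width: int, height: int, seconds: float) -> float:
    """Processing rate in Mpx/s for a source image of the given size."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (width * height) / 1_000_000 / seconds


# 2560 * 1600 = 4,096,000 px = 4.096 Mpx; 4.096 / 0.5 s = 8.192 Mpx/s
rate = megapixels_per_second(2560, 1600, 0.5)
print(round(rate, 1))  # 8.2
```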
@@ -83,89 +86,84 @@ Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53
 | 100px | 0.34| 16.93| 35.53| |


-### Some conclusion
+### A brief conclusion

-Pillow is always faster than ImageMagick. And Pillow-SIMD is faster
-than Pillow in 4—5 times. In general, Pillow-SIMD with AVX2 always
-**16-40 times faster** than ImageMagick and overperforms Skia,
-high-speed graphics library used in Chromium, up to 2 times.
+The results show that Pillow is always faster than ImageMagick, and
+Pillow-SIMD, in turn, is faster than the original Pillow by a factor of 4-5.
+In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than
+ImageMagick and outperforms Skia, the high-speed graphics library used in Chromium.

 ### Methodology

-All tests were performed on Ubuntu 14.04 64-bit running on
-Intel Core i5 4258U with AVX2 CPU on the single thread.

-ImageMagick performance was measured with command-line tool `convert` with
-`-verbose` and `-bench` arguments. I use command line because
-I need to test the latest version and this is the easiest way to do that.

-All operations produce exactly the same results.
+All rates were measured using the following setup: Ubuntu 14.04 64-bit,
+single-thread AVX2-enabled Intel i5 4258U CPU.
+ImageMagick performance was measured with the `convert` command-line tool
+followed by `-verbose` and `-bench` arguments.
+This approach was used because the latest software versions needed to be tested,
+and the command line is the easiest way to do that.
+All the routines involved in the testing procedure produced identical results.
 Resizing filters compliance:

 - PIL.Image.BILINEAR == Triangle
 - PIL.Image.BICUBIC == Catrom
 - PIL.Image.LANCZOS == Lanczos

-In ImageMagick, the radius of gaussian blur is called sigma and the second
-parameter is called radius. In fact, there should not be additional parameters
-for *gaussian blur*, because if the radius is too small, this is *not*
-gaussian blur anymore. And if the radius is big this does not give any
-advantages but makes operation slower. For the test, I set the radius
-to sigma × 2.5.
+In ImageMagick, the Gaussian blur operation takes two parameters:
+the first is called 'radius' and the second is called 'sigma'.
+In fact, in order for the blur operation to be Gaussian, there should be no additional parameters.
+When the radius value is too small, the blur ceases to be Gaussian, and
+if the value is excessively large, the operation slows down with no benefit in exchange.
+For benchmarking purposes, the radius was set to `sigma × 2.5`.

-Following script was used for testing:
+The following script was used for the benchmarking procedure:
 https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63


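The radius rule described above can be sketched as a small helper that builds the `{radius}x{sigma}` geometry string accepted by ImageMagick blur options such as `-gaussian-blur`. The helper name is hypothetical, and which blur option the benchmark actually used is not stated here; the linked gist is the authoritative source for the exact command lines.

```python
def blur_geometry(sigma: float, radius_factor: float = 2.5) -> str:
    """Return an ImageMagick-style '{radius}x{sigma}' geometry string.

    radius_factor=2.5 mirrors the 'sigma x 2.5' rule described in the text.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = sigma * radius_factor
    # The 'g' format drops a trailing '.0' so 10.0 is rendered as '10'.
    return f"{radius:g}x{sigma:g}"


print(blur_geometry(4))  # 10x4
```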
 ## Why Pillow itself is so fast

-There are no cheats. High-quality resize and blur methods are used for all
-benchmarks. Results are almost pixel-perfect. The difference is only effective
-algorithms. Resampling in Pillow was rewritten in version 2.7 with
-minimal usage of floating point numbers, precomputed coefficients and
-cache-awareness transposition. This result was improved in 3.3 & 3.4 with
-integer-only arithmetics and other optimizations.

+No cheats involved. We've used identical high-quality resize and blur methods for the benchmark.
+Outcomes produced by different libraries are in almost pixel-perfect agreement.
+The difference in measured rates comes only from the performance of each algorithm involved.

 ## Why Pillow-SIMD is even faster

-Because of SIMD, of course. But this is not all. Heavy loops unrolling,
-specific instructions, which not available for scalar.
+Because of SIMD computing, of course. But there's more to it:
+heavy loop unrolling and specific instructions that aren't available for scalar data types.


 ## Why do not contribute SIMD to the original Pillow

-Well, that's not simple. First of all, Pillow supports a large number
-of architectures, not only x86. But even for x86 platforms, Pillow is often
-distributed via precompiled binaries. To integrate SIMD in precompiled binaries
-we need to do runtime checks of CPU capabilities.
-To compile the code with runtime checks we need to pass `-mavx2` option
-to the compiler. But with that option compiller will inject AVX instructions
-enev for SSE functions, because every SSE instruction has AVX equivalent.
+Well, it's not that simple. First of all, the original Pillow supports
+a large number of architectures, not just x86.
+But even for x86 platforms, Pillow is often distributed via precompiled binaries.
+In order for us to integrate SIMD into the precompiled binaries
+we'd need to perform runtime checks of CPU capabilities.
+To compile the code this way we need to pass the `-mavx2` option to the compiler.
+But with the option included, the compiler will emit AVX instructions even
+for SSE functions (i.e. substitute them), since every SSE instruction has an AVX equivalent.
 So there is no easy way to compile such library, especially with setuptools.


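To show what a "runtime check of CPU capabilities" amounts to, here is a hedged Python sketch that reads feature flags from `/proc/cpuinfo`-style text on Linux and picks a code path. A real implementation would live in C and use CPUID; the function names and the three-level fallback are assumptions for illustration only.

```python
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from /proc/cpuinfo-style text (Linux, x86)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def best_simd_level(flags: Set[str]) -> str:
    """Pick the most capable code path the CPU reports support for."""
    if "avx2" in flags:
        return "avx2"
    if "sse4_1" in flags or "sse4_2" in flags:
        return "sse4"
    return "scalar"


# Hypothetical sample; on Linux, pass open("/proc/cpuinfo").read() instead.
sample = "processor : 0\nflags : fpu sse2 sse4_1 sse4_2 avx avx2\n"
print(best_simd_level(parse_cpu_flags(sample)))  # avx2
```

The difficulty described above is not the check itself but the build: each code path has to be compiled with different compiler flags and then selected at run time.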
 ## Installation

-In general, you need to do `pip install pillow-simd` as always and if you
-are using SSE4-capable CPU everything should run smoothly.
-Do not forget to remove original Pillow package first.

-If you want the AVX2-enabled version, you need to pass the additional flag to C
-compiler. The easiest way to do that is define `CC` variable while compilation.
+If there's a copy of the original Pillow installed, it has to be removed first
+with `$ pip uninstall -y pillow`.
+The installation itself is as simple as running `$ pip install pillow-simd`,
+and if you're using an SSE4-capable CPU everything should run smoothly.
+If you'd like to install the AVX2-enabled version,
+you need to pass an additional flag to the C compiler.
+The easiest way to do so is to define the `CC` variable during compilation.

 ```bash
 $ pip uninstall pillow
 $ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
 ```


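After installing, it can be useful to confirm which distribution is actually present, since the original Pillow has to be removed first. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); the helper names are hypothetical, and the distribution names `pillow` and `pillow-simd` are taken from the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_install() -> str:
    """Summarize which of the two distributions is installed."""
    pillow = installed_version("pillow")
    simd = installed_version("pillow-simd")
    if pillow and simd:
        return "both pillow and pillow-simd are installed; uninstall pillow first"
    if simd:
        return f"pillow-simd {simd} is installed"
    if pillow:
        return f"original pillow {pillow} is installed"
    return "neither package is installed"


print(describe_install())
```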
 ## Contributing to Pillow-SIMD

-Pillow-SIMD and Pillow are two separate projects.
-Please submit bugs and improvements not related to SIMD to
-[original Pillow][original-issues]. All bugs and fixes in Pillow
-will appear in next Pillow-SIMD version automatically.
+Please be aware that Pillow-SIMD and Pillow are two separate projects.
+Please submit bugs and improvements not related to SIMD to the [original Pillow][original-issues].
+All bugfixes to the original Pillow will then be transferred to the next Pillow-SIMD version automatically.


 [original-docs]: http://pillow.readthedocs.io/