SIMD. Rewritten the Pillow-SIMD readme
SIMD. Updated according to the review
SIMD. Fix markup

# Pillow-SIMD

Pillow-SIMD is a "following" fork of Pillow (which is itself a fork of PIL).
"Following" here means that Pillow-SIMD versions are 100% compatible
drop-in replacements for Pillow of the same version.
For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for
`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.
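
Because Pillow-SIMD keeps the same package layout and `PIL` import namespace,
existing Pillow code runs unchanged once it is installed.
A minimal sketch (the file names are placeholders):

```python
from PIL import Image  # the same import as with the original Pillow

im = Image.open("input.jpg")             # hypothetical input file
im.thumbnail((256, 256), Image.LANCZOS)  # resampling transparently uses the SIMD code path
im.save("thumbnail.jpg")
```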

For more information on the original Pillow, please
[read the documentation][original-docs],
[check the changelog][original-changelog] and
[find out how to contribute][original-contribute].

## Why SIMD

There are multiple ways to improve image processing performance:
using better algorithms, optimizing existing implementations,
or using more processing power and resources.
One good example of switching to a more efficient algorithm is [replacing][gaussian-blur-changes]
a convolution-based Gaussian blur with sequential box filters.

Such opportunities are rather rare, though. Certain operations can also be sped up
by running them in parallel, but a more practical route is often to make things
run faster on the resources already at hand, and that is exactly where SIMD helps.

SIMD stands for "single instruction, multiple data": the same operation is performed
on multiple data points simultaneously by multiple processing elements.
Common CPU SIMD instruction sets are MMX, SSE-SSE4, AVX, AVX2, AVX512, and NEON.
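
As a rough illustration of the idea only (this is not how Pillow-SIMD itself is implemented),
here is the same hypothetical per-pixel operation written as a scalar loop and as a single
data-parallel expression, which is the kind of work SIMD instructions do in hardware:

```python
import numpy as np

pixels = np.arange(8, dtype=np.uint8)   # a tiny stand-in for a row of image data

# Scalar processing: one data point handled per step.
halved_scalar = np.empty_like(pixels)
for i, value in enumerate(pixels):
    halved_scalar[i] = value // 2

# Data-parallel processing: the same operation applied to the whole block at once.
halved_parallel = pixels // 2

assert (halved_scalar == halved_parallel).all()
```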
|
Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) or AVX2 support.


## Status

The Pillow-SIMD project is production-ready.
It is supported by Uploadcare, a SaaS for cloud-based image storing and processing.

[![Uploadcare][uploadcare.logo]][uploadcare.com]

In fact, Uploadcare has been running Pillow-SIMD for about two years now.

The following image operations are currently SIMD-accelerated
(a usage sketch follows the list):

- Resize (convolution-based resampling): SSE4, AVX2
- Gaussian and box blur: SSE4
- RGBA → RGBa (alpha premultiplication): SSE4, AVX2
- RGBa → RGBA (division by alpha): AVX2
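
A brief sketch of how these operations are invoked through the regular Pillow API
(the image, its size and the parameter values are purely illustrative, and the
premultiplication step assumes direct mode conversion via `convert`):

```python
from PIL import Image, ImageFilter

im = Image.new("RGBA", (2560, 1600))               # illustrative source image

resized = im.resize((320, 200), Image.LANCZOS)     # convolution-based resampling
gaussian = im.filter(ImageFilter.GaussianBlur(2))  # Gaussian blur
boxed = im.filter(ImageFilter.BoxBlur(2))          # box blur
premultiplied = im.convert("RGBa")                 # RGBA → RGBa (alpha premultiplication)
restored = premultiplied.convert("RGBA")           # RGBa → RGBA (division by alpha)
```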

See [CHANGES](CHANGES.SIMD.rst) for more information.


## Benchmarks

To give a clear picture of what SIMD computing brings to Pillow image processing,
we ran a number of benchmarks. The results are listed in the table below (higher is better).
The numbers represent processing rates in megapixels of source image per second (Mpx/s).
For instance, a 2560x1600 RGB image processed in 0.5 seconds corresponds to a rate of 8.2 Mpx/s.
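
The rate is simply the number of source megapixels divided by the processing time;
a quick check of the figure above:

```python
width, height = 2560, 1600  # source image size used in the benchmarks
seconds = 0.5               # example processing time

megapixels = width * height / 1e6  # 4.096 Mpx in the source image
rate = megapixels / seconds        # 8.192, reported as 8.2 Mpx/s
print(round(rate, 1))
```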
|
Here is the list of libraries and versions used in the benchmarks:

- Skia 53
- ImageMagick 6.9.3-8 Q8 x86_64

Operation | Filter | IM   | Pillow | SIMD SSE4 | SIMD AVX2 | Skia 53
----------|--------|------|--------|-----------|-----------|--------
          | 100px  | 0.34 | 16.93  | 35.53     |           |


### A brief conclusion

The results show that Pillow is always faster than ImageMagick,
and Pillow-SIMD, in turn, is faster than the original Pillow by a factor of 4-5.
In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than
ImageMagick and outperforms Skia, the high-speed graphics library used in Chromium.

### Methodology

All rates were measured on the following setup: Ubuntu 14.04 64-bit,
a single thread on an AVX2-capable Intel Core i5 4258U CPU.
ImageMagick performance was measured with the `convert` command-line tool
run with the `-verbose` and `-bench` arguments.
This approach was chosen because there is usually a need to test
the latest software versions, and the command line is the easiest way to do that.
All the routines involved in the testing produced identical results.

Resizing filter compliance:

- PIL.Image.BILINEAR == Triangle
- PIL.Image.BICUBIC == Catrom
- PIL.Image.LANCZOS == Lanczos

In ImageMagick, the Gaussian blur operation takes two parameters:
the first is called 'radius' and the second is called 'sigma'.
In fact, for the operation to remain a true Gaussian blur, there should be no additional parameters:
when the radius is too small, the result is no longer a Gaussian blur,
and when it is excessively big, the operation only gets slower with nothing gained in exchange.
For the benchmarks, the radius was set to `sigma × 2.5`.
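
To make the parameter mapping concrete, here is a tiny hypothetical sketch of how the two
blur settings relate (the sigma value is arbitrary, and the geometry string merely mirrors
ImageMagick's `radius x sigma` convention):

```python
sigma = 2.0           # Pillow's GaussianBlur takes just this single value as its radius
radius = sigma * 2.5  # the extra ImageMagick parameter, set per the rule above

geometry = "{}x{}".format(radius, sigma)  # "5.0x2.0": radius first, then sigma
print(geometry)
```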

The following script was used for the benchmarking procedure:
https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63

## Why Pillow itself is so fast

No cheats involved: the same high-quality resize and blur methods were used in every benchmark,
and the outcomes produced by the different libraries are in almost pixel-perfect agreement.
The difference in measured rates comes purely from the efficiency of the underlying algorithms.
Resampling in Pillow was rewritten in version 2.7 with minimal use of floating-point numbers,
precomputed coefficients, and cache-aware transposition, and the result was further improved
in versions 3.3 and 3.4 with integer-only arithmetic and other optimizations.

## Why Pillow-SIMD is even faster

Because of SIMD computing, of course. But there's more to it:
heavy loop unrolling and the use of specific instructions that aren't available for scalar data types.

## Why not contribute SIMD to the original Pillow

Well, it's not that simple. First of all, the original Pillow supports
a large number of architectures, not just x86.
And even for x86 platforms, Pillow is often distributed via precompiled binaries.
To integrate SIMD into those precompiled binaries, we would need to perform
runtime checks of CPU capabilities.
To compile the code that way, we need to pass the `-mavx2` option to the compiler.
But with that option enabled, the compiler will emit AVX instructions even
for SSE functions, since every SSE instruction has an AVX equivalent.
So there is no easy way to compile such a library, especially with setuptools.
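
For illustration only, a runtime capability check of the kind mentioned above might look like
the following sketch (Linux-specific and entirely hypothetical; this is not code from Pillow
or Pillow-SIMD):

```python
def cpu_flags():
    """Return the CPU feature flags reported by the Linux kernel."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
# A SIMD-enabled binary build would pick its code path based on checks like these.
print("sse4_1" in flags, "avx2" in flags)
```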

## Installation

If there's a copy of the original Pillow installed, it has to be removed first
with `$ pip uninstall -y pillow`.
The installation itself is as simple as running `$ pip install pillow-simd`,
and if you're using an SSE4-capable CPU, everything should run smoothly.
If you'd like to install the AVX2-enabled version,
you need to pass an additional flag to the C compiler.
The easiest way to do so is to define the `CC` variable during compilation:

```bash
$ pip uninstall pillow
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
```


## Contributing to Pillow-SIMD

Please be aware that Pillow-SIMD and Pillow are two separate projects.
Please submit bugs and improvements not related to SIMD to the [original Pillow][original-issues].
All bugfixes to the original Pillow will then be transferred to the next Pillow-SIMD version automatically.


[original-docs]: http://pillow.readthedocs.io/