From 9f511d459ab588b7bb340c5016a7bd00f2d1b5f6 Mon Sep 17 00:00:00 2001
From: Elijah
Date: Fri, 7 Oct 2016 16:54:22 +0500
Subject: [PATCH] SIMD. Rewritten the Pillow-SIMD readme

SIMD. Updated according to the review

SIMD. fix markup
---
 README.md | 152 +++++++++++++++++++++++++++--------------------------
 1 file changed, 75 insertions(+), 77 deletions(-)

diff --git a/README.md b/README.md
index e37d51f53..5742d40d4 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 # Pillow-SIMD
 
-Pillow-SIMD is "following" Pillow fork (which is PIL fork itself).
-"Following" means than Pillow-SIMD versions are 100% compatible
-drop-in replacement for Pillow with the same version number.
-For example, `Pillow-SIMD 3.2.0.post3` is drop-in replacement for
-`Pillow 3.2.0` and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.
+Pillow-SIMD is a "following" fork of Pillow (which is itself a fork of PIL).
+"Following" here means that Pillow-SIMD versions are 100% compatible
+drop-in replacements for Pillow of the same version.
+For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for
+`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`.
 
-For more information about original Pillow, please
+For more information on the original Pillow, please
 refer to: [read the documentation][original-docs],
 [check the changelog][original-changelog]
 and [find out how to contribute][original-contribute].
@@ -14,35 +14,35 @@ For more information about original Pillow, please
 
 ## Why SIMD
 
-There are many ways to improve the performance of image processing.
-You can use better algorithms for the same task, you can make better
-implementation for current algorithms, or you can use more processing unit
-resources. It is perfect when you can just use more efficient algorithm like
-when gaussian blur based on convolutions [was replaced][gaussian-blur-changes]
-by sequential box filters. But a number of such improvements are very limited.
-It is also very tempting to use more processor unit resources
-(via parallelization) when they are available. But it is handier just
-to make things faster on the same resources. And that is where SIMD works better.
+There are multiple ways to improve image processing performance.
+To name a few: using better algorithms, optimizing existing implementations,
+or simply using more processing resources.
+A great example of switching to a more efficient algorithm is [replacing][gaussian-blur-changes]
+a convolution-based Gaussian blur with a sequence of box filters.
 
-SIMD stands for "single instruction, multiple data". This is a way to perform
-same operations against the huge amount of homogeneous data.
-Modern CPU have different SIMD instructions sets like
-MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.
+Such examples are rather rare, though. Another option is parallel processing,
+i.e. spreading the work across several cores.
+A more practical approach, however, is to make things work faster
+on the very same resources, and that is exactly where SIMD computing comes in.
 
-Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
-and AVX2 support.
+SIMD stands for "single instruction, multiple data": the same operation
+is performed on multiple data points simultaneously
+by multiple processing elements.
+Common CPU SIMD instruction sets are MMX, SSE-SSE4, AVX, AVX2, AVX512, and NEON.
+
+Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) or AVX2 support.
 
 ## Status
 
+The Pillow-SIMD project is production-ready.
+It is supported by Uploadcare, a SaaS for storing and processing images in the cloud.
+
 [![Uploadcare][uploadcare.logo]][uploadcare.com]
 
-Pillow-SIMD can be used in production. Pillow-SIMD has been operating on
-[Uploadcare][uploadcare.com] servers for more than 1 year.
-Uploadcare is SAAS for image storing and processing in the cloud
-and the main sponsor of Pillow-SIMD project.
+In fact, Uploadcare has been running Pillow-SIMD for about two years now.
 
-Currently, following operations are accelerated:
+The following image operations are currently SIMD-accelerated:
 
 - Resize (convolution-based resampling): SSE4, AVX2
 - Gaussian and box blur: SSE4
@@ -50,14 +50,17 @@ Currently, following operations are accelerated:
 - RGBA → RGBa (alpha premultiplication): SSE4, AVX2
 - RGBa → RGBA (division by alpha): AVX2
 
-See [CHANGES](CHANGES.SIMD.rst).
+See [CHANGES](CHANGES.SIMD.rst) for more information.
+
 
 ## Benchmarks
 
-The numbers in the table represent processed megapixels of source RGB 2560x1600
-image per second. For example, if resize of 2560x1600 image is done
-in 0.5 seconds, the result will be 8.2 Mpx/s.
+To help you assess the benefits of bringing SIMD computing into Pillow image processing,
+we ran a number of benchmarks. The results can be found in the table below (higher is better).
+The numbers represent processing rates in megapixels per second (Mpx/s).
+For instance, if a 2560x1600 RGB image is resized in 0.5 seconds, the rate equals 8.2 Mpx/s.
+Here are the libraries and versions used in the benchmarks:
 
 - Skia 53
 - ImageMagick 6.9.3-8 Q8 x86_64
@@ -83,89 +86,84 @@ Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53
 | 100px | 0.34| 16.93| 35.53| |
 
-### Some conclusion
+### A brief conclusion
 
-Pillow is always faster than ImageMagick. And Pillow-SIMD is faster
-than Pillow in 4—5 times. In general, Pillow-SIMD with AVX2 always
-**16-40 times faster** than ImageMagick and overperforms Skia,
-high-speed graphics library used in Chromium, up to 2 times.
+The results show that Pillow is always faster than ImageMagick, while
+Pillow-SIMD, in turn, is 4 to 5 times faster than the original Pillow.
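As a side note, the Mpx/s rates quoted in the Benchmarks section are easy to sanity-check by hand. A minimal Python sketch, using the example figures from the text (illustrative only, not part of the benchmark suite):

```python
# Illustrative only: how a Mpx/s rate is derived from image size
# and processing time, per the example in the Benchmarks section.
width, height = 2560, 1600   # source RGB image size used in the benchmarks
elapsed = 0.5                # example processing time, in seconds

megapixels = width * height / 1e6   # 4.096 megapixels per image
rate = megapixels / elapsed         # megapixels per second (Mpx/s)

print(round(rate, 1))  # 8.2
```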
+In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than
+ImageMagick and outperforms Skia, the high-speed graphics library used in Chromium,
+by up to 2 times.
 
 ### Methodology
 
-All tests were performed on Ubuntu 14.04 64-bit running on
-Intel Core i5 4258U with AVX2 CPU on the single thread.
-
-ImageMagick performance was measured with command-line tool `convert` with
-`-verbose` and `-bench` arguments. I use command line because
-I need to test the latest version and this is the easiest way to do that.
-
-All operations produce exactly the same results.
+All rates were measured using the following setup: Ubuntu 14.04 64-bit
+running a single thread on an AVX2-capable Intel Core i5-4258U CPU.
+ImageMagick performance was measured with the `convert` command-line tool
+using the `-verbose` and `-bench` arguments.
+This approach was chosen because it is the easiest way to benchmark
+the latest version of the software.
+All the libraries involved in the benchmark produced identical results.
 
 Resizing filters compliance:
 
 - PIL.Image.BILINEAR == Triangle
 - PIL.Image.BICUBIC == Catrom
 - PIL.Image.LANCZOS == Lanczos
 
-In ImageMagick, the radius of gaussian blur is called sigma and the second
-parameter is called radius. In fact, there should not be additional parameters
-for *gaussian blur*, because if the radius is too small, this is *not*
-gaussian blur anymore. And if the radius is big this does not give any
-advantages but makes operation slower. For the test, I set the radius
-to sigma × 2.5.
+In ImageMagick, the Gaussian blur operation takes two parameters:
+the first is called 'radius' and the second is called 'sigma'.
+Strictly speaking, a Gaussian blur needs no parameter other than sigma:
+if the radius is too small, the result ceases to be a Gaussian blur, and
+if it is excessively big, the operation only gets slower with no gain in quality.
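A quick back-of-the-envelope check illustrates the point about large radii: the weight a Gaussian kernel carries beyond a few standard deviations is negligible. A small pure-Python sketch (illustrative only, not part of the benchmark):

```python
import math

# Fraction of a 1-D Gaussian that lies within k standard deviations
# of the center; the remainder is what a truncated kernel throws away.
def coverage(k):
    return math.erf(k / math.sqrt(2))

# A radius of sigma * 1.0 keeps only ~68% of the kernel's weight...
print(round(coverage(1.0), 3))  # 0.683
# ...while a radius of sigma * 2.5 already keeps ~98.8% of it,
# so growing the radius further buys essentially nothing.
print(round(coverage(2.5), 3))  # 0.988
```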
+For the benchmarks, the radius was set to `sigma × 2.5`.
 
-Following script was used for testing:
+The following script was used for the benchmarking procedure:
 
 https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63
 
 ## Why Pillow itself is so fast
 
-There are no cheats. High-quality resize and blur methods are used for all
-benchmarks. Results are almost pixel-perfect. The difference is only effective
-algorithms. Resampling in Pillow was rewritten in version 2.7 with
-minimal usage of floating point numbers, precomputed coefficients and
-cache-awareness transposition. This result was improved in 3.3 & 3.4 with
-integer-only arithmetics and other optimizations.
-
+No cheats involved. We've used identical high-quality resize and blur methods
+for the benchmark.
+Outcomes produced by different libraries are in almost pixel-perfect agreement.
+The difference in measured rates comes solely from the efficiency of the underlying algorithms.
 
 ## Why Pillow-SIMD is even faster
 
-Because of SIMD, of course. But this is not all. Heavy loops unrolling,
-specific instructions, which not available for scalar.
+Because of SIMD computing, of course. But there's more to it:
+heavy loop unrolling and specific instructions that aren't available for scalar data types.
 
 ## Why do not contribute SIMD to the original Pillow
 
-Well, that's not simple. First of all, Pillow supports a large number
-of architectures, not only x86. But even for x86 platforms, Pillow is often
-distributed via precompiled binaries. To integrate SIMD in precompiled binaries
-we need to do runtime checks of CPU capabilities.
-To compile the code with runtime checks we need to pass `-mavx2` option
-to the compiler. But with that option compiller will inject AVX instructions
-enev for SSE functions, because every SSE instruction has AVX equivalent.
+Well, it's not that simple. First of all, the original Pillow supports
+a large number of architectures, not just x86.
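For a flavor of what checking x86 CPU capabilities at runtime involves, here is a hedged Python sketch. It is Linux-specific and illustrative only; real dispatch code in a precompiled binary would query the CPUID instruction from C instead:

```python
# Hedged illustration: on Linux, the SIMD extensions a CPU supports
# are listed on the "flags" line of /proc/cpuinfo. Production code
# would use CPUID directly from C rather than parse this file.
def cpu_supports(flag, cpuinfo="/proc/cpuinfo"):
    try:
        with open(cpuinfo) as f:
            for line in f:
                if line.startswith("flags"):
                    return flag in line.split(":", 1)[1].split()
    except OSError:
        pass
    return False

# e.g. cpu_supports("sse4_2") or cpu_supports("avx2")
```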
+But even for x86 platforms, Pillow is often distributed via precompiled binaries.
+In order to integrate SIMD into the precompiled binaries,
+we'd need to perform runtime checks of CPU capabilities.
+To compile the code with such checks, the `-mavx2` option has to be passed to the compiler.
+But with that option enabled, the compiler will inject AVX instructions even
+into SSE functions, since every SSE instruction has an AVX equivalent.
 So there is no easy way to compile such library, especially with setuptools.
 
 ## Installation
 
-In general, you need to do `pip install pillow-simd` as always and if you
-are using SSE4-capable CPU everything should run smoothly.
-Do not forget to remove original Pillow package first.
-
-If you want the AVX2-enabled version, you need to pass the additional flag to C
-compiler. The easiest way to do that is define `CC` variable while compilation.
+If there's a copy of the original Pillow installed, it has to be removed first
+with `$ pip uninstall -y pillow`.
+The installation itself is as simple as running `$ pip install pillow-simd`,
+and if you're using an SSE4-capable CPU, everything should run smoothly.
+If you'd like to install the AVX2-enabled version,
+you need to pass an additional flag to the C compiler.
+The easiest way to do so is to define the `CC` variable during compilation.
 
 ```bash
 $ pip uninstall pillow
 $ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
 ```
 
-
 ## Contributing to Pillow-SIMD
 
-Pillow-SIMD and Pillow are two separate projects.
-Please submit bugs and improvements not related to SIMD to
-[original Pillow][original-issues]. All bugs and fixes in Pillow
-will appear in next Pillow-SIMD version automatically.
+Please be aware that Pillow-SIMD and Pillow are two separate projects.
+Submit bugs and improvements not related to SIMD to the [original Pillow][original-issues].
+All fixes made to the original Pillow will then automatically appear
+in the next Pillow-SIMD version.
 
 [original-docs]: http://pillow.readthedocs.io/