Commit Graph

7794 Commits

Author SHA1 Message Date
Alexander
d9cc462106 Merge branch 'simd/rgba-convert' into simd/5.3.x 2018-10-17 14:52:37 +03:00
Alexander
b646ac278f Merge branch 'simd/resample' into simd/5.3.x 2018-10-17 14:52:32 +03:00
Alexander
dd99b65d78 Merge branch 'simd/filters' into simd/5.3.x 2018-10-17 14:52:26 +03:00
Alexander
32f3dff6f5 Merge branch 'simd/box-blur' into simd/5.3.x 2018-10-17 14:51:58 +03:00
Alexander
0fc8680360 Merge branch 'simd/alpha-composite' into simd/5.3.x 2018-10-17 14:51:45 +03:00
Alexander
8c38010f7d Speedup other 2L convertions 2018-10-05 13:57:28 +03:00
Alexander
87385595ce RGB → L 2.2 times faster 2018-10-05 13:57:28 +03:00
Alexander
ca27f8197b fix rounding and speedup a bit 2018-10-05 13:57:28 +03:00
Alexander
7c7d7018b1 use 16bit arithmetics 2018-10-05 13:57:28 +03:00
Alexander
7f2b368e85 sse4 version (still 1.4x faster than previous avx2 implementation) 2018-10-05 13:57:28 +03:00
Alexander
89ddb0d95a use float div instead of gather 2018-10-05 13:57:28 +03:00
homm
a92659f65c fix RGBa → RGBA conversion on AVX2 2018-10-05 13:57:28 +03:00
homm
a880dd08e9 RGBa → RGBA convert using gather 2018-10-05 13:57:28 +03:00
homm
880fede485 avx2 implementation 2018-10-05 13:57:28 +03:00
homm
096aaa1e6c faster implementation 2018-10-05 13:57:28 +03:00
homm
fdef92c60a sse4 implementation 2018-10-05 13:57:28 +03:00
Alexander
adc2e0302d move files 2018-10-05 13:55:10 +03:00
Alexander
ef1692649d add parentheses around var declarations 2018-10-05 13:55:10 +03:00
Alexander
80a64c013e optimize coefficients loading for horizontal pass; wtf is xmax / 2; optimize coefficients loading for vertical pass 2018-10-05 13:55:10 +03:00
homm
b7b3b26483 SIMD resample: unrolled SSE4 & AVX2 2018-10-05 13:55:10 +03:00
Alexander
ff5ed4f6d5 move files 2018-10-05 13:52:49 +03:00
Alexander
1713b71c0a fix memory access for: 3x3f_u8, 3x3i_4u8, 5x5i_4u8 2018-10-05 13:52:49 +03:00
Alexander
94ea64c416 5x5i_4u8 AVX2 2018-10-05 13:52:49 +03:00
Alexander
3da294ca21 advanced 5x5i_4u8 SSE4 2018-10-05 13:52:49 +03:00
Alexander
96b367c571 5x5i_4u8 SSE4 2018-10-05 13:52:49 +03:00
Alexander
7bd48c8f63 finish 3x3i_4u8 2018-10-05 13:52:49 +03:00
Alexander
cb68d00256 avx2 version 2018-10-05 13:52:49 +03:00
Alexander
3b7b833f45 rearrange operations 2018-10-05 13:52:49 +03:00
Alexander
c4085db81e reduce number of registers 2018-10-05 13:52:49 +03:00
Alexander
0b3550c24f Rearrange instruction for speedup 2018-10-05 13:52:49 +03:00
Alexander
e4c9528d55 better loading 2018-10-05 13:52:49 +03:00
Alexander
8695387f05 better macros 2018-10-05 13:52:49 +03:00
Alexander
3e8574ae26 3x3i 2018-10-05 13:52:49 +03:00
Alexander
44c56befbd move ImagingFilterxxx functions to separate files 2018-10-05 13:52:49 +03:00
Alexander
98bed5abae fix offset 2018-10-05 13:52:49 +03:00
Alexander
db69139906 5x5 single channel SSE4 (tests failed) 2018-10-05 13:52:49 +03:00
Alexander
cdde46ae17 consider last pixel in AVX 2018-10-05 13:52:49 +03:00
Alexander
5ca47243f8 unroll AVX (with no profit) 2018-10-05 13:52:49 +03:00
Alexander
c30554ca64 Macros for AVX 2018-10-05 13:52:49 +03:00
Alexander
0d36fd05ee unroll AVX 2 times 2018-10-05 13:52:49 +03:00
Alexander
3c3623265c First AVX try 2018-10-05 13:52:49 +03:00
Alexander
ee7158d8d5 3x3 SSE4 singleband: 2 lines 2018-10-05 13:52:49 +03:00
Alexander
9966e832e0 reuse loaded values 2018-10-05 13:52:49 +03:00
Alexander
32c372a616 faster 3x3 singleband SSE4 2018-10-05 13:52:49 +03:00
Alexander
86c8aac6f8 3x3 SSE4 singleband 2018-10-05 13:52:49 +03:00
Alexander
bef019f9cf use macros in 3x3 2018-10-05 13:52:49 +03:00
Alexander
78e99deaef use macros 2018-10-05 13:52:49 +03:00
Alexander
328bf4593e rearrange 3x3 filter to match 5x5 2018-10-05 13:52:48 +03:00
Alexander
8a351e1e31 improve locality in 5x5 filter 2018-10-05 13:52:48 +03:00
Alexander
9c8a9014c4 a bit faster 5x5 filter 2018-10-05 13:52:48 +03:00