Alexander
|
b646ac278f
|
Merge branch 'simd/resample' into simd/5.3.x
|
2018-10-17 14:52:32 +03:00 |
|
Alexander
|
dd99b65d78
|
Merge branch 'simd/filters' into simd/5.3.x
|
2018-10-17 14:52:26 +03:00 |
|
Alexander
|
32f3dff6f5
|
Merge branch 'simd/box-blur' into simd/5.3.x
|
2018-10-17 14:51:58 +03:00 |
|
Alexander
|
0fc8680360
|
Merge branch 'simd/alpha-composite' into simd/5.3.x
|
2018-10-17 14:51:45 +03:00 |
|
Alexander
|
adc2e0302d
|
move files
|
2018-10-05 13:55:10 +03:00 |
|
Alexander
|
ef1692649d
|
add parentheses around var declarations
|
2018-10-05 13:55:10 +03:00 |
|
Alexander
|
80a64c013e
|
optimize coefficients loading for horizontal pass
wtf is xmax / 2
optimize coefficients loading for vertical pass
|
2018-10-05 13:55:10 +03:00 |
|
homm
|
b7b3b26483
|
SIMD resample: unrolled SSE4 & AVX2
|
2018-10-05 13:55:10 +03:00 |
|
Alexander
|
ff5ed4f6d5
|
move files
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
1713b71c0a
|
fix memory access for:
3x3f_u8
3x3i_4u8
5x5i_4u8
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
94ea64c416
|
5x5i_4u8 AVX2
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
3da294ca21
|
advanced 5x5i_4u8 SSE4
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
96b367c571
|
5x5i_4u8 SSE4
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
7bd48c8f63
|
finish 3x3i_4u8
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
cb68d00256
|
avx2 version
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
3b7b833f45
|
rearrange operations
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
c4085db81e
|
reduce number of registers
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
0b3550c24f
|
Rearrange instruction for speedup
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
e4c9528d55
|
better loading
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
8695387f05
|
better macros
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
3e8574ae26
|
3x3i
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
44c56befbd
|
move ImagingFilterxxx functions to separate files
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
98bed5abae
|
fix offset
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
db69139906
|
5x5 single channel SSE4 (tests failed)
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
cdde46ae17
|
consider last pixel in AVX
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
5ca47243f8
|
unroll AVX (with no profit)
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
c30554ca64
|
Macros for AVX
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
0d36fd05ee
|
unroll AVX 2 times
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
3c3623265c
|
First AVX try
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
ee7158d8d5
|
3x3 SSE4 singleband: 2 lines
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
9966e832e0
|
reuse loaded values
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
32c372a616
|
faster 3x3 singleband SSE4
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
86c8aac6f8
|
3x3 SSE4 singleband
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
bef019f9cf
|
use macros in 3x3
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
78e99deaef
|
use macros
|
2018-10-05 13:52:49 +03:00 |
|
Alexander
|
328bf4593e
|
rearrange 3x3 filter to match 5x5
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
8a351e1e31
|
improve locality in 5x5 filter
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
9c8a9014c4
|
a bit faster 5x5 filter
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
9e9a1a493b
|
fast 3x3 filter
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
72f5b73df0
|
5x5 implementation
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
f79e583365
|
3x3 implementation
|
2018-10-05 13:52:48 +03:00 |
|
Alexander
|
cd8f9c64e7
|
faster box blur for radius < 1
|
2018-10-05 13:52:29 +03:00 |
|
Alexander
|
ba42e0b201
|
add parentheses around var declarations
|
2018-10-05 13:52:29 +03:00 |
|
homm
|
f9c162b34a
|
sse4 ImagingBoxBlur implementation
|
2018-10-05 13:52:29 +03:00 |
|
Alexander
|
c76f541dad
|
fast div approximation
|
2018-10-05 13:52:06 +03:00 |
|
homm
|
cae99973db
|
move declarations to beginning of the blocks
|
2018-10-05 13:52:06 +03:00 |
|
homm
|
786fd3d64d
|
fix bugs
|
2018-10-05 13:52:05 +03:00 |
|
homm
|
01563e732e
|
speedup avx2 by using _mm256_mullo_epi16 instead of _mm256_mullo_epi32
|
2018-10-05 13:52:05 +03:00 |
|
homm
|
46d274a7d9
|
speedup sse4 by using _mm_mullo_epi16 instead of _mm_mullo_epi32
|
2018-10-05 13:52:05 +03:00 |
|
homm
|
ab46181de5
|
increase precision
|
2018-10-05 13:52:05 +03:00 |
|