SIMD AlphaComposite. avx2 implementation
SIMD AlphaComposite. increase precision
SIMD AlphaComposite. speed up sse4 by using _mm_mullo_epi16 instead of _mm_mullo_epi32
SIMD AlphaComposite. speed up avx2 by using _mm256_mullo_epi16 instead of _mm256_mullo_epi32
SIMD AlphaComposite. fix bugs
SIMD AlphaComposite. move declarations to beginning of the blocks
SIMD AlphaComposite. fast div approximation
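
The commit subjects above do not say which division is replaced or how. As a rough illustration only, here is a scalar C sketch of one widely used trick in 8-bit compositing: dividing by 255 with shifts and adds instead of a real divide. The function names are hypothetical and are not taken from the commits, and the actual change may use a different technique. The intermediate values stay below 65536, so the same arithmetic also fits in 16-bit lanes, which is the lane width the `_mm_mullo_epi16` commits switch to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the commits): x / 255 rounded to nearest,
 * valid for x in [0, 255 * 255], i.e. any product of two 8-bit values. */
static inline uint32_t div255_round(uint32_t x)
{
    uint32_t t = x + 128;        /* bias so the result rounds to nearest */
    return (t + (t >> 8)) >> 8;  /* two shifts and two adds, no divide */
}

/* Reference result computed with a real integer division. */
static inline uint32_t div255_reference(uint32_t x)
{
    return (2 * x + 255) / 510;  /* round(x / 255) */
}

/* Compare both versions over the whole 8-bit x 8-bit product range. */
static void check_div255_exhaustive(void)
{
    uint32_t x;
    for (x = 0; x <= 255u * 255u; x++) {
        assert(div255_round(x) == div255_reference(x));
    }
}
```

Calling `check_div255_exhaustive()` once from a test compares the shift-based version against the reference for every input in range. For this input range the shift form matches rounded division, so a true approximation (for example, one based on a reciprocal estimate) would need its own error bound checked in the same way.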