While thinking about optimizing for ARM NEON SIMD, after a lot of hesitation I decided to create a "common encapsulation": basically a set of SIMD types common to SSE2 and ARM NEON. I then rewrote all SIMD-optimised algorithms with it. It is in Core/SIMD_SSE2.h (I have yet to create the NEON implementation).
While mostly intended for internal use, it could make developing SIMD code easier. It basically defines four 128-bit types and operations on them...
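
To give a rough idea of what such an encapsulation can look like, here is a minimal sketch. It is not the actual content of Core/SIMD_SSE2.h: the type name `Float4` and the helpers `load4`, `store4` and `multiply_add` are hypothetical names made up for illustration, and only one 128-bit type (four packed floats) is shown. The intrinsics themselves are the standard SSE/SSE2 and NEON ones, and a plain scalar fallback is included so the sketch compiles on targets with neither.

```cpp
#include <cstddef>

// Pick a backend at compile time. The macro names SIMD_BACKEND_* are
// assumptions made for this sketch, not names from the real header.
#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>
  #define SIMD_BACKEND_SSE2 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  #define SIMD_BACKEND_NEON 1
#endif

// Hypothetical common type: four 32-bit floats in one 128-bit value.
struct Float4 {
#if defined(SIMD_BACKEND_SSE2)
    __m128 v;
#elif defined(SIMD_BACKEND_NEON)
    float32x4_t v;
#else
    float v[4];  // scalar fallback
#endif
};

// Unaligned load of four floats.
inline Float4 load4(const float* p) {
    Float4 r;
#if defined(SIMD_BACKEND_SSE2)
    r.v = _mm_loadu_ps(p);
#elif defined(SIMD_BACKEND_NEON)
    r.v = vld1q_f32(p);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
    return r;
}

// Unaligned store of four floats.
inline void store4(float* p, Float4 a) {
#if defined(SIMD_BACKEND_SSE2)
    _mm_storeu_ps(p, a.v);
#elif defined(SIMD_BACKEND_NEON)
    vst1q_f32(p, a.v);
#else
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
#endif
}

// Lane-wise addition.
inline Float4 operator+(Float4 a, Float4 b) {
    Float4 r;
#if defined(SIMD_BACKEND_SSE2)
    r.v = _mm_add_ps(a.v, b.v);
#elif defined(SIMD_BACKEND_NEON)
    r.v = vaddq_f32(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// Lane-wise multiplication.
inline Float4 operator*(Float4 a, Float4 b) {
    Float4 r;
#if defined(SIMD_BACKEND_SSE2)
    r.v = _mm_mul_ps(a.v, b.v);
#elif defined(SIMD_BACKEND_NEON)
    r.v = vmulq_f32(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
#endif
    return r;
}

// An algorithm written once against the common type:
// out[i] = a[i] * b[i] + c[i] for i in [0, n).
inline void multiply_add(const float* a, const float* b, const float* c,
                         float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        store4(out + i, load4(a + i) * load4(b + i) + load4(c + i));
    }
    for (; i < n; ++i) {  // scalar tail for the remaining 0..3 elements
        out[i] = a[i] * b[i] + c[i];
    }
}
```

The point of the design is that `multiply_add` contains no intrinsics at all, so the same algorithm source should build for SSE2 or NEON once both backends of the wrapper exist.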