What my code does is read a struct instance from the vector and edit the two integer fields in its sub-struct. What I have done now, as also stated in my first post, is copy Vector.GetCount() / 8 structs from the vector into an array and perform the operations there. Afterwards I copy them back into the vector at their original positions.
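To make the description concrete, here is a minimal sketch of that copy-out / edit / copy-back step. It assumes std::vector in place of the Vector from my code; Item, Inner, the fields a and b, and ProcessChunk are hypothetical names, and the two increments only stand in for the real edits.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical layout: the real struct and field names are not shown here.
struct Inner { int a; int b; };
struct Item  { Inner inner; };

// Copies `count` structs starting at `begin` into a local buffer,
// edits the two integer fields there, then copies them back to the
// same positions in the original vector.
void ProcessChunk(std::vector<Item>& items, std::size_t begin, std::size_t count)
{
    // Working copy that only the calling thread touches.
    std::vector<Item> local(items.begin() + begin,
                            items.begin() + begin + count);

    for (Item& it : local) {
        it.inner.a += 1;   // placeholder for the real edit
        it.inner.b += 2;   // placeholder for the real edit
    }

    // Write the results back at the original positions.
    std::copy(local.begin(), local.end(), items.begin() + begin);
}
```

Each worker would call something like ProcessChunk(items, i * (items.size() / 8), items.size() / 8). Note that the copy back still writes into the shared vector, so if the chunk boundaries are not aligned to cache lines, two neighbouring chunks can still touch the same line at that point.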
As I stated, it seems to be just as fast, and the profiler shows the same. Can I conclude anything from that finding about whether preventing cache-line sharing makes this operation faster?