Re: Vector performance on a specific situation [message #40135 is a reply to message #40131] |
Tue, 18 June 2013 19:11 |
Novo
Messages: 1361 Registered: December 2006
|
Ultimate Contributor |
|
|
crydev wrote on Tue, 18 June 2013 03:01 | Hello,
I have a question about the Vector's performance in a specific situation. I have a program that uses 8 threads on new systems and relies heavily on parallelism. Say I have a Vector containing 300 items. I split the indexes of those items over 8 threads, meaning the Vector will be accessed from 8 threads simultaneously, but every thread accesses a different item. No two threads ever modify the same memory location.
I have read something about Vectors and cache lines. What is the performance of the U++ implementation of the Vector in this situation? I tried copying the thread-specific data into arrays and passing them into the functions, but that seems to be just as fast.
If there is a better way to do this, I would appreciate any suggestions.
|
If you are only reading data, there will be no problem. But if you write to elements (even ones that are not shared among threads), you run into the false sharing problem. Basically, the CPU doesn't work with individual words, it works with cache lines, so writes from different threads to different elements on the same cache line still contend for that line. The simplest way to fix that is to add padding to your data. For example, instead of a raw int you can use the structure below.
struct MyData {
    int data;                        // the value a single thread writes to
    char padding[64 - sizeof(int)];  // fill the rest of an assumed 64-byte cache line
};
A cache line is usually 64 bytes, so you need to add padding to make your data land on different cache lines.
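As a minimal sketch of the same idea, assuming a C++11 compiler: `alignas` can let the compiler do the padding. The names `kCacheLine`, `PaddedInt` and `on_different_lines` are made up for this illustration, and 64 bytes is an assumption about the target CPU, not a guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is common on x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// alignas raises the alignment of the struct to kCacheLine, and the compiler
// rounds its size up to a multiple of that alignment, so no manual padding
// array is needed.
struct alignas(kCacheLine) PaddedInt {
    int data;
};

static_assert(sizeof(PaddedInt) == kCacheLine, "one element per cache line");
static_assert(alignof(PaddedInt) == kCacheLine, "elements start on a line boundary");

// Hypothetical helper: true if the two addresses fall into different
// kCacheLine-sized blocks of memory.
bool on_different_lines(const void* a, const void* b) {
    std::uintptr_t line_a = reinterpret_cast<std::uintptr_t>(a) / kCacheLine;
    std::uintptr_t line_b = reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
    return line_a != line_b;
}
```

With this layout, neighbouring elements of a `PaddedInt` array start on different cache lines, while neighbouring elements of a plain `int` array usually share one. Note that heap containers only honour the extra alignment automatically from C++17 on; with older standards the elements are still 64 bytes apart, but the first one may not start on a line boundary.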
Regards,
Novo
[Updated on: Fri, 21 June 2013 02:06]
Re: Vector performance on a specific situation [message #40139 is a reply to message #40131] |
Wed, 19 June 2013 09:37 |
crydev
Messages: 151 Registered: October 2012 Location: Netherlands
|
Experienced Member |
|
|
The computation on these elements is not very heavy, but the information in these structs is used to read through gigabytes of memory and compare every byte. If I use only one thread to do that, it will be busy for a few minutes, whereas 8 threads will handle it in a few seconds.
The number of elements differs per process running on a Windows machine. A small process has around 300 pages, which makes the Vector contain 300 elements, but bigger processes can contain over 2000 pages, which increases the workload a lot.
I have not yet benchmarked it for one thread, because I think it doesn't matter. If you use only one thread, you read sequentially from 0 to the end, so this problem does not really apply. When 8 threads operate on the Vector, the first one operates on 0-49, the next on 50-99, and so on.
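The split described above can be sketched as a small helper that gives each thread one contiguous block of indexes. This is only an illustration in standard C++, not the actual code of the program; `chunk_range` and its parameter names are hypothetical.

```cpp
#include <cstddef>
#include <utility>

// Returns the half-open index range [first, second) that worker `t`
// (0-based) should process when `count` items are divided over `workers`
// threads. When count is not a multiple of workers, the first
// (count % workers) workers each take one extra item.
std::pair<std::size_t, std::size_t> chunk_range(std::size_t count,
                                                std::size_t workers,
                                                std::size_t t) {
    std::size_t base = count / workers;   // minimum items per worker
    std::size_t extra = count % workers;  // leftover items to hand out
    std::size_t begin = t * base + (t < extra ? t : extra);
    std::size_t end = begin + base + (t < extra ? 1 : 0);
    return {begin, end};
}
```

For 400 items and 8 threads this gives 0-49 for the first thread and 50-99 for the second. Because each thread gets a contiguous block, only the elements at the block boundaries can end up sharing a cache line with another thread's elements.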