Joined: Sep 2006
Posts: 28,201
Legend
Quote:
So I've been researching wait-free algorithms like wait-free stacks, where you can place objects into a LIFO structure across N threads without locks and still be thread safe.
Nowadays, with processors like the Core i7 having 8 or more logical cores, not knowing how to distribute your tasks is going to leave you in the dust performance-wise, so I'm learning it, and I've got some good personal code running on it with good results so far. You can still deadlock or livelock in .NET, but it does help catch many of the other gotchas you may run into, like editing Windows controls from multiple thread contexts.
Can you share some resources that you've found that were helpful? I've done some threading in a few utilities I've written, but I've mostly relied on locks to this point to help control access to objects.
Browns is the Browns
... there goes Joe Thomas, the best there ever was in this game.
Joined: Sep 2006
Posts: 28,201
Legend
Groovy, thanks 
Browns is the Browns
... there goes Joe Thomas, the best there ever was in this game.
Joined: Nov 2006
Posts: 3,259
Hall of Famer
no problem!
As an example I wrote a thread job that could iterate over elements in an array, where each thread (8 total) ran the same function like so, with the same counter:
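volatile LONG m_Counter = 0; // shared by all worker threads
while( true ) {
    // InterlockedIncrement returns the incremented value, so every thread gets a unique 1-based index
    LONG index = InterlockedIncrement( &m_Counter );
    if( index > arraySize ) break; // past the end of the array: no work left
    operateOnArrayElement( mArray[index-1] );
}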
My sample data took 3ms to complete in single-threaded mode; distributing it across 8 threads dropped that down to 1ms, and it's one of those things where, as the data gets heavier, your overall speed improvements get better and better! It's basically a C++ version of the .NET Parallel.For method.
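For anyone not on Windows, here is a rough portable sketch of the same shared-counter idea using std::atomic and std::thread instead of InterlockedIncrement. The name parallelForEach and its parameters are made up for illustration and are not from the code above. One difference to watch: fetch_add returns the value before the increment, so the index is already zero-based.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the post): calls fn(element) once for every
// element of data, handing out indices through one shared atomic counter.
template <typename T, typename Fn>
void parallelForEach(std::vector<T>& data, unsigned threadCount, Fn fn) {
    assert(threadCount > 0);
    std::atomic<std::size_t> next{0};  // plays the role of m_Counter

    auto worker = [&]() {
        for (;;) {
            // fetch_add returns the value *before* the increment, so each
            // thread receives a unique zero-based index (no "- 1" needed).
            std::size_t index = next.fetch_add(1);
            if (index >= data.size()) break;  // no work left
            fn(data[index]);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();  // wait until every element is done
}
```

Something like parallelForEach(values, 8, [](int& x) { x *= 2; }); would double every element across 8 threads. Whether it beats a plain loop depends on how heavy the per-element work is.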
#gmstrong
Joined: Sep 2006
Posts: 28,201
Legend
very cool... and because InterlockedIncrement() is handling the sync'ing for you, you don't have to sweat the details about two threads perhaps getting the same index? Very nice. Of course, it seems that you could just write a quick-n-dirty lock function that does the same thing: block other threads by taking out a lock on the counter, increment the counter, unblock and return the value. I would think that the performance of the two would be nearly identical.
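To make the comparison concrete, here is a minimal sketch of both approaches side by side, written with std::mutex and std::atomic as portable stand-ins (an assumption, since the thread is talking about the Win32 calls). LockedCounter, AtomicCounter and handsOutUniqueValues are hypothetical names for illustration. It only shows that both hand out each value exactly once; it says nothing about which is faster.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// The "quick-n-dirty lock function": lock, increment, unlock, return.
class LockedCounter {
public:
    long next() {
        std::lock_guard<std::mutex> guard(mutex_);  // unlocks on return
        return ++value_;
    }
private:
    std::mutex mutex_;
    long value_ = 0;
};

// The atomic version: one read-modify-write, no lock object.
class AtomicCounter {
public:
    long next() { return value_.fetch_add(1) + 1; }  // +1: return the new value
private:
    std::atomic<long> value_{0};
};

// Hypothetical check: every thread draws perThread values; returns true if
// the values 1..N were each handed out exactly once.
template <typename Counter>
bool handsOutUniqueValues(unsigned threadCount, long perThread) {
    assert(threadCount > 0 && perThread > 0);
    Counter counter;
    std::vector<std::vector<long>> drawn(threadCount);  // one bucket per thread
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&, t]() {
            for (long i = 0; i < perThread; ++i) drawn[t].push_back(counter.next());
        });
    }
    for (auto& th : threads) th.join();

    std::vector<long> all;
    for (auto& bucket : drawn) all.insert(all.end(), bucket.begin(), bucket.end());
    std::sort(all.begin(), all.end());
    for (std::size_t i = 0; i < all.size(); ++i)
        if (all[i] != static_cast<long>(i) + 1) return false;  // gap or duplicate
    return all.size() == static_cast<std::size_t>(threadCount) * perThread;
}
```

Both counters give the same guarantee to the caller; the difference is only in what each call costs.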
Browns is the Browns
... there goes Joe Thomas, the best there ever was in this game.
Joined: Sep 2006
Posts: 4,480
Hall of Famer
Here is a pretty good article comparing normal sequential processing vs. Parallel.For and CUDA GPU if you are interested: http://www.c-sharpcorner.com/UploadFile/rafaelwo/4398/
I recoded the C portions using a CUDA .NET class I found, but the results are similar.
#gmstrong
Joined: Sep 2006
Posts: 28,201
Legend
That's something I'd like to play with, but I need a project that holds my interest to give me a goal to work toward.
It strikes me that CUDA is best used with massive parallelism and not so much something that only has a few things going on.
Browns is the Browns
... there goes Joe Thomas, the best there ever was in this game.
Joined: Nov 2006
Posts: 3,259
Hall of Famer
To Prp: you can do the same thing with critical sections of course, but when you call the interlocked functions, they compile down to asm: http://www.codemaestro.com/reviews/8
The interlocked functions are basically macros or compiler intrinsics, depending on whether you're targeting x86, x64, or Itanium (for Windows at least). A critical section can end up in a kernel wait when threads contend for it. You can do the same thing with critical sections, but under contention you're going to be costing yourself something like an order of magnitude more cycles to do it. For a game to run at 60fps you need to complete your game loop in about 16ms (1000ms / 60). That's not a lot of time, so any time you can save cycles like that is a big win.
And ya, while a CPU-based thread system these days is what, 12 threads max? In GPGPU code you need to scale to at least 1024 threads, otherwise you're wasting a ton of pipelines running your work. That's why you don't see the throughput gains until you can get all the pipelines running at once.
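If you want to measure the gap on your own machine rather than take the order-of-magnitude figure on faith, a rough timing sketch could look like the following. It uses std::atomic and std::mutex as portable stand-ins for the interlocked functions and a critical section (an assumption, they are not the same primitives), and timeContended, Timings and compareCounters are hypothetical names. The numbers will vary a lot with the CPU, the compiler and the thread count, so treat it as a starting point only.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: calls increment() perThread times on each of
// threadCount threads and returns the elapsed wall-clock time.
template <typename Fn>
std::chrono::microseconds timeContended(unsigned threadCount, long perThread, Fn increment) {
    assert(threadCount > 0);
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&]() {
            for (long i = 0; i < perThread; ++i) increment();
        });
    }
    for (auto& th : threads) th.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}

struct Timings {
    std::chrono::microseconds atomicTime{0};
    std::chrono::microseconds mutexTime{0};
    long atomicTotal = 0;  // final counter values, to confirm that
    long mutexTotal = 0;   // both versions did the same amount of work
};

Timings compareCounters(unsigned threadCount, long perThread) {
    std::atomic<long> atomicCounter{0};
    long lockedCounter = 0;
    std::mutex lock;

    Timings result;
    result.atomicTime = timeContended(threadCount, perThread,
        [&]() { atomicCounter.fetch_add(1); });
    result.mutexTime = timeContended(threadCount, perThread,
        [&]() { std::lock_guard<std::mutex> guard(lock); ++lockedCounter; });
    result.atomicTotal = atomicCounter.load();
    result.mutexTotal = lockedCounter;
    return result;
}
```

Calling compareCounters(8, 1000000) and printing atomicTime.count() next to mutexTime.count() gives a ballpark comparison; thread start-up cost is included in both measurements.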
#gmstrong
Joined: Sep 2006
Posts: 30,826
Legend
j/c
There seems to be an issue with the board. It only happens in this thread, but for some reason I can barely make out any English here; the rest seems to be some foreign language I've never heard of.....
Joined: Sep 2006
Posts: 15,015
Legend
We don't have to agree with each other, to respect each others opinion.
Joined: Nov 2006
Posts: 3,259
Hall of Famer
it is a foreign language, C++ 
#gmstrong
Joined: Sep 2006
Posts: 14,248
Legend
If witty response then witty response else ignore end if