What is TPL (Task Parallel Library) and how does it differ from threads? (C# interview questions)

In this video we will talk about what exactly the Task Parallel Library (TPL) is and how it differs from normal threading. Before I get into TPL and start demonstrating it, let's first try to understand the problems that revolve around threading.

The whole point of using the System.Threading namespace and creating thread objects is that we want our logic to really run in parallel. For example, you can see that I have a very simple function here called RunMillionIterations. It runs a for loop a million times and does a simple concatenation inside the loop. I have invoked this method inside a thread called O1: I created a thread object called O1, and inside that thread object I am executing RunMillionIterations.

A quick note before I proceed: in case you have come to this video directly, or you do not have any idea of threading, my suggestion would be to go and see the C# Threading Q & A videos, where we talk about how to apply locks in threading; how to use Mutex, Semaphore and SemaphoreSlim; how to use AutoResetEvent and ManualResetEvent; how to do thread pooling; how to debug threads; and so on.
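The thread setup described above can be sketched roughly as follows. This is a minimal reconstruction, not the code from the video: the body of RunMillionIterations is an assumption, and it appends to a StringBuilder instead of concatenating strings so that the sketch finishes quickly.

```csharp
using System;
using System.Text;
using System.Threading;

class ThreadDemo
{
    // Assumed stand-in for the method in the video: loops a million
    // times and appends one character per iteration.
    public static int RunMillionIterations()
    {
        var x = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++)
        {
            x.Append('x');
        }
        return x.Length;
    }

    static void Main()
    {
        // One extra thread runs the whole loop from start to finish.
        var o1 = new Thread(() => Console.WriteLine(RunMillionIterations()));
        o1.Start();
        o1.Join(); // wait for the worker before the program exits
    }
}
```

Note that the entire loop lives on the single thread o1, so at any moment the loop occupies at most one core.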
Before you continue with this video, I would suggest you run through those videos once, so that you can understand what I am saying here about threading.

Let's go back to our console application. Our expectation as programmers is that when this RunMillionIterations method runs inside O1, it should utilize the CPU power to the maximum. For example, let's say you have a 2-core or a 4-core machine (what I am calling a "P2" or "P4" machine here). To make this really multi-threaded, my expectation would be that half of the million iterations execute on core 1, the other half execute on core 2, and finally all the data is concatenated into this X and displayed. In other words, when RunMillionIterations runs, I would like it to utilize the complete hardware power of my computer.

In reality this does not happen. Even though I have invoked RunMillionIterations inside this thread, the complete for loop runs on just one processor at a time.

Let me run the perfmon tool. Perfmon, or Performance Monitor, is a tool that helps you monitor different aspects of your computer, such as the processor, memory or any application, from different perspectives. Currently we are interested in how our application utilizes all four processors. So I will delete the default counter that we have here and add the counters that are important from the processor perspective. We want to know how much processor time is being utilized, so I am going to add all four instances of my processor. You can see 0, 1, 2 and 3; each of these instances represents one processor. I add them, press OK, and there they are.
These blue, red, yellow and green lines indicate our processors. Let me also make these lines a bit thicker, because they look very thin here. I go to the properties and increase the width so that we can see things more clearly, and apply it everywhere. There we are; now we can clearly see how our processors are loaded.

Now let's run our application and see how these processors behave. I run the application, quickly switch back to Performance Monitor, clear everything and start monitoring. You can clearly see that the processors are not being optimally utilized. For example, processor 1, indicated by the red line, is among the least utilized; in fact three of the processors are the least utilized. And look at the blue one: that is my processor 3, core 3, and it has been utilized to the maximum. In other words, it is getting loaded more than any other processor. Because I spawned a thread, I was thinking that all the processors would be optimally utilized, which is not happening.

So when we say we are doing threading like this, we are not really doing parallel execution; we are doing time slicing. What do I mean by that? Time slicing means context switching. For instance, let's say you have a single processor, and on this single processor you have thread 1 and thread 2 running. Because we have only one processor and multiple threads running on it, the processor distributes its time between those threads. It first starts with thread 1, and during that time it does not run thread 2. It gives time to thread 1 and executes some of thread 1's logic. After that it switches to thread 2 and starts executing thread 2, and while it is executing thread 2 it does not execute thread 1. Then it stops executing thread 2 and switches back to thread 1. In this way the processor distributes time between the threads and keeps switching from thread 1 to thread 2 and from thread 2 to thread 1. In other words, true multi-threading is not happening; time slicing, that is context switching, is happening.

Also, one important FYI: if the processor keeps switching like this between thread 1 and thread 2, then rather than increasing performance it can actually decrease performance.

What we really want is thread 1 executing on processor 1 and thread 2 executing on processor 2. In simple words, this is what we are looking at. Let's assume it is a 2-core machine (my "P2" machine; such machines have become quite old now, but let's assume it). If I have two threads running, then depending on the scenario and on how loaded the processors are, I would like thread 1 to be executed on processor 1 and thread 2 on processor 2. If you remember the million-iteration loop that I ran when I showed you the perfmon counters, you saw that only one processor was getting loaded and all the other processors were less loaded. In other words, even though I executed my million-iteration loop in a thread, multi-threading was not happening; time slicing was happening.

So in order to achieve parallelism, what we really want is for our logic to be executed optimally on all the processors. If you look at today's world, the number of cores in our computers keeps going up, right from a single core.
We have now literally come to eight cores; I have seen 8-core processors, and it will probably go beyond that. So the hardware is going to be available at a cheaper cost, and it will keep growing. Our applications should have the capability to utilize that hardware, and that is exactly what TPL does. When you execute that for loop using the Task Parallel Library, it smartly executes the logic across the processors depending on which ones are loaded and which are not. Let me show you a demo of it.

In order to use TPL, you first need to import the System.Threading.Tasks namespace. You can see that TPL belongs to the threading namespace, and that is logical, because at the end of the day a task is an encapsulation over threading. Once you have imported this namespace, you are ready to use TPL.

Let me first comment out the threading code. In order to run RunMillionIterations a million times, we can use Parallel.For from TPL. You can see that there are lots of flavors of Parallel.For, but for now, to quickly demonstrate parallelism using TPL, we will use the simplest one. It needs three important things: first, where you want to start the iteration, which is 0; second, where you want to go up to, which is one million; and third, the action you want to run, which is the RunMillionIterations method. So I am going to run RunMillionIterations using the Parallel.For method from TPL.

Because I am using TPL here, my expectation is that RunMillionIterations should optimally use the cores of my machine. For example, if I have a 2-core machine, it should probably run half a million loops on core 1 and half a million loops on core 2. So I would like to see optimal utilization of the processors of my machine.

I am going to run this application and track the processor utilization using the perfmon tool. Let me run it and quickly jump to Performance Monitor. You can see that there are amazing results here. If you remember our previous perfmon data, three cores were utilized less and only one core was getting loaded. But look over here: all the cores are getting optimally utilized. That is exactly what TPL is for. TPL takes your task, breaks it into pieces, checks which processors are currently less loaded, and then tries to execute the logic on those processors. In other words, the big benefit of TPL over threading is that it takes the maximum use from all the processors, compared to threading, where the work had affinity with one processor.

Now let me qualify a statement I made previously. I said that threads have core affinity. By core affinity I mean that the work of a single thread runs on one core at a time; it is not spread across cores for you. As a developer you can write your own logic to run your work on different cores, but then you are responsible for checking which core is less loaded, dividing your logic into pieces and executing those pieces on those cores. Whatever data you get back, you are also responsible for aggregating it and returning it as a result to the program. That is definitely a lot of work for the developer: seeing which cores are less loaded, querying them, dividing the task, executing it, syncing the data and so on.
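To make the contrast above concrete, here is a minimal sketch of both approaches on the same job. It is not the code from the video: the work (summing the loop indices) and the names SumWithThreads and SumWithParallelFor are assumptions chosen so the result is easy to check. The manual version divides the range, runs one thread per chunk and aggregates the partial results itself; the TPL version hands the range to Parallel.For, using the overload with a per-worker local value. Unlike the demo, where Parallel.For invokes the whole RunMillionIterations method once per index, this sketch puts only the per-iteration work in the loop body.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class PartitionDemo
{
    // Manual approach: split [0, total) into one chunk per worker thread,
    // run each chunk on its own thread, then aggregate the partial sums.
    public static long SumWithThreads(int total, int workerCount)
    {
        long[] partial = new long[workerCount];
        Thread[] workers = new Thread[workerCount];
        int chunk = total / workerCount;

        for (int w = 0; w < workerCount; w++)
        {
            int index = w; // private copy for the closure
            int from = index * chunk;
            int to = (index == workerCount - 1) ? total : from + chunk;
            workers[index] = new Thread(() =>
            {
                long sum = 0;
                for (int i = from; i < to; i++) sum += i;
                partial[index] = sum; // each thread writes only its own slot
            });
            workers[index].Start();
        }

        long result = 0;
        for (int w = 0; w < workerCount; w++)
        {
            workers[w].Join(); // wait for the worker, then aggregate
            result += partial[w];
        }
        return result;
    }

    // TPL approach: Parallel.For partitions the range across worker threads;
    // each worker keeps a local subtotal that is merged once at the end.
    public static long SumWithParallelFor(int total)
    {
        long result = 0;
        Parallel.For<long>(0, total,
            () => 0L,                                  // initial local subtotal
            (i, loopState, subtotal) => subtotal + i,  // work for one index
            subtotal => Interlocked.Add(ref result, subtotal));
        return result;
    }

    static void Main()
    {
        int total = 1_000_000;
        Console.WriteLine(SumWithThreads(total, Environment.ProcessorCount));
        Console.WriteLine(SumWithParallelFor(total));
    }
}
```

Both calls should print the same total. In neither version does the code pin work to a particular core; which core runs which thread is left to the operating system scheduler.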
TPL encapsulates all of that for you. So the first big benefit of TPL over threads is that TPL encapsulates multi-core execution: you just concentrate on the functions and methods you want to execute, and parallelism across cores is taken care of by TPL.

The second benefit of TPL is thread pooling. This video has become quite long, so I am going to divide it into two parts. In this first part we talked about how TPL does multi-core execution. In the second part we will talk about how TPL automatically does thread pooling: in case you are running thousands of threads, rather than creating those threads from scratch again and again it pools them, and with that it also utilizes your memory properly. So in the second part we will see how TPL automatically does thread pooling. Thank you so much.
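As a small preview of the pooling idea, the following sketch (an illustration under assumptions, not code from the video; the class and method names are hypothetical) starts many short tasks with Task.Run and records which managed thread ran each one. Because tasks are queued to the thread pool, the number of distinct threads is typically far smaller than the number of tasks, though the exact count varies from run to run and machine to machine.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class PoolingDemo
{
    // Starts taskCount tasks and returns the managed thread id
    // that each task happened to run on.
    public static int[] RunTasksAndCollectThreadIds(int taskCount)
    {
        Task<int>[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() => Environment.CurrentManagedThreadId))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }

    static void Main()
    {
        int[] ids = RunTasksAndCollectThreadIds(1000);
        Console.WriteLine(
            $"{ids.Length} tasks ran on {ids.Distinct().Count()} distinct threads");
    }
}
```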

  1. You should have put the Parallel.For inside RunMillionIterations… technically you are showing two different examples.

  2. You are so right. But the TPL example was loaded with a million times more work
    and still performed better. So it still conveys the same message: TPL is better.

    So sorry, I was so engrossed in the demo… I should avoid night recordings.

  3. "P2" is typically used to refer to the "Pentium 2" processor. A processor with 2 cores is referred to as a "dual core" processor and a system with 2 processors is referred to as a "dual processor" system (a dual processor system can have more than 2 cores).

  4. To increase utilization, you should create multiple threads. As you have 4 cores, creating 4 threads with each running the RunMillionIterations would show you a high CPU utilization.

  5. Good video showing the concept of TPL, but the diagram shows multiple cores each running half a million iterations, while the code runs a million iterations on multiple cores, which I don't quite follow. How do you know that the computing tasks are spread equally?

  6. How can the code below be implemented using TPL? Any idea?

    foreach (GridViewRow row in grdSearch.Rows)
                { //do smth
    }

  7. You are awesome, sir. You always explain complex things in a simple way that a layman can understand. Thank you for sharing.

  8. Correct me if I am wrong: Parallel.For runs from 0 to a million, and each time it calls the function RunMillionIterations(). This means that the function calls will be divided among processors/available threads. However, the explanation says the whole iteration is divided among processors, which is kind of confusing.

  9. I think we do not need the for loop in the method when we are using the Parallel class, as in the example shown in the video.

  10. The author seems to be under the impression that a PII processor has two cores. In fact PI to PIV are all single-core processors; only after the arrival of dual-core Pentiums did we have multiple cores, later followed by Core 2, i3, i5 and i7.

  11. After inserting the Parallel.For call, the total number of iterations is 1,000,000 * 1,000,000. For inexperienced viewers, leaving the old for loop in the program could be confusing.

  12. How do thread branches execute? If I spawn a thread from a thread does it time slice or does it parallelize. I guess I could experiment and find out.

  13. If you spawn two or more threads manually it will parallelize, but you will need callbacks unless it's fire and forget.

  14. Questpond has top notch videos and the explanations are among the best anywhere. Especially the design patterns series – simple to grasp some complex subjects! Excellent job Questpond! I'll likely sign up with them shortly.

  15. There seems to be a misconception here. When we spawn a new thread as shown in the first part of the video, it is not for completing the task faster; it is just for freeing the calling thread and assigning the task to another thread. So it is obvious that the cores of the CPU will not all take the same load, because the spawned thread running on one core is processing the assigned task while the calling thread finishes the main function.

  16. The explanation here is not correct. He took a single-threaded for loop and placed it on a single thread. Well, that is in fact multi-threading. Your first thread is the thread of your application itself. The second thread is the for loop, for a total of 2 threads. What you are really talking about is parallelizing work that is "normally" single-threaded, which is distinctly different.

  17. Is this code not doing 1 million * RunMillionIterations()? I think you can drop one of the for loops… probably the one inside RunMillionIterations.

  18. Fast-forward to 8:31 if you already understand threading and using perfmon and actually wish to get to TPL.

  19. I have to say that concatenating a string with s = s+"x" one million times is an absolute killer for memory and GC.

  20. Nice video. thanks to the creator. However, it raises a few questions:

    1. What if I wish to run a task forever according to its own logic
    (in which case the 'For" of the Parallel.For is not required).
    Thus, I need a one time launch … and let the thread run as long as its own logic says so.

    2. How do I enquire a core's overload to correctly set task-core affinity?

    3. Which API is used to set affinity?

    4. I don't understand why the Parallel.For is used … there is a for () loop statement in the task's code?

    5. It is unclear to me from the core overload chart whether we see 4 identical tasks running on 4 cores concurrently?

    If this is a single task running on all 4 cores, then:
    (a) what exactly controls splitting its work among cores and then combine the results as if it was executed on a single core.
    (b) how come the overload of all cores is similar to the overload of the execution of a single core. Where is the benefit?
    in the execution period length?
