
Parallel Programming in .NET Framework 4.0

Denis Rechkunov

The 4th version of the Microsoft framework brings many improvements in the field of parallel programming: fundamentally new facilities were added to .NET, and the existing mechanisms that developers had used before were improved.

Starting with version 4.0, the framework includes a separate library for parallel work, the so-called TPL (Task Parallel Library), which provides high-level facilities for executing, scheduling, and synchronizing parallel actions.

Let’s consider the general scheme of a program that uses parallel algorithms in .NET Framework 4.0:

Figure 1 Parallel Programming in the .NET Framework (from MSDN)

The scheme shows that any program written by a .NET developer that uses parallel algorithms ultimately relies on three modules of the .NET Framework:

  • Data Structures for Coordination (a set of thread-safe data structures, synchronization primitives, implementations of parallel programming patterns, and managed wrappers over operating system threads)
  • Task Parallel Library (a high-level library of parallel tasks, built on top of the previous module)
  • PLINQ Execution Engine (mechanisms for high-level parallel data processing, ultimately implemented on top of the TPL)

The last two modules appeared only in .NET Framework 4.0, and the first has been greatly improved in this version. A brief overview of each module follows.

Data Structures for Coordination

All elements of this module are contained in the System.Threading namespace and can be divided into several groups:

  • Synchronization primitives (Semaphore, Mutex, Monitor, Barrier, ReaderWriterLock, LazyInitializer, SpinLock, SpinWait)
  • Tools for operating system threads (Thread, ThreadPool, ThreadLocal)
  • Atomic operations (Interlocked)
  • Scheduling of actions (CountdownEvent, Timer)
  • Data structures for concurrent use (BlockingCollection, ConcurrentBag, ConcurrentDictionary, ConcurrentQueue, ConcurrentStack)


Semaphore

The Semaphore class implements a classic synchronization primitive that allows a specified number of threads to be synchronized on it. There are two basic operations: take the semaphore (WaitOne) and release it (Release). When creating a semaphore, you specify how many threads can call WaitOne without waiting for a Release from another thread — in other words, how many threads can take the semaphore before synchronization kicks in. Besides the thread count, you can specify a system-wide identifier, i.e. make a named semaphore that is used not within one process but for interprocess synchronization. To obtain a named semaphore (even one created in another process), simply use the OpenExisting method.

When using the Semaphore class, several features must be taken into account:

  • A semaphore does not identify threads: any thread can release it, even if it was taken by another thread (internally it is just an increment/decrement of the semaphore’s counter).
  • If, for example, two threads call WaitOne and then one of them calls Release twice, the second thread will get a SemaphoreFullException when it tries to release the semaphore.
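The usage described above can be sketched as follows. This is a minimal illustration, not code from the article; the thread count and sleep duration are arbitrary:

```csharp
using System;
using System.Threading;

class SemaphoreDemo
{
    static void Main()
    {
        // At most 2 threads may be inside the protected section at once.
        var semaphore = new Semaphore(2, 2);

        for (int i = 0; i < 4; i++)
        {
            int id = i;
            new Thread(() =>
            {
                semaphore.WaitOne();               // take the semaphore
                try
                {
                    Console.WriteLine("Thread {0} is inside", id);
                    Thread.Sleep(100);             // simulate work
                }
                finally
                {
                    semaphore.Release();           // release the semaphore
                }
            }).Start();
        }
    }
}
```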


Mutex

Mutex is also a classic synchronization primitive — in effect a special case of a semaphore with the thread count equal to 1, but with one difference. It is used for exclusive synchronized access to a resource. In contrast to the semaphore, Mutex identifies the threads that take and release it, so it can be released only by the thread that holds it. If the thread that took the Mutex completes without releasing it, ownership passes to the next thread in the waiting queue, and an AbandonedMutexException is raised in that thread.
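A minimal sketch of exclusive access via Mutex; the mutex name used here is an arbitrary example, needed only if you want interprocess (named) synchronization:

```csharp
using System;
using System.Threading;

class MutexDemo
{
    // A named mutex is visible across processes; the name is arbitrary here.
    static readonly Mutex mutex = new Mutex(false, "Global\\MyAppSingleResource");

    static void Main()
    {
        mutex.WaitOne();                  // only the owning thread may release it
        try
        {
            Console.WriteLine("Exclusive access to the shared resource");
        }
        finally
        {
            mutex.ReleaseMutex();         // always release in finally
        }
    }
}
```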


Monitor

Monitor is one more classic synchronization primitive, similar in use to Mutex, but Monitor itself is not a synchronization object — it is just a static class. Monitor requires you to specify an object that serves as the synchronization token. To simplify the use of monitors, C# provides the lock statement, which delimits a block of code with guaranteed exclusive synchronized access to the object and releases the monitor on leaving the block, even in case of an exception.

It is worth remembering several Monitor features:

  • The primitive works only with reference types
  • It is not recommended to synchronize on publicly accessible objects, because this can lead to deadlocks
  • Wherever possible, it is recommended to use the lock statement instead of calling the class methods explicitly
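Putting those recommendations together, a typical use of lock on a private token object might look like this (a sketch with an arbitrary workload):

```csharp
using System;
using System.Threading;

class LockDemo
{
    // A private object used solely as the synchronization token.
    private static readonly object sync = new object();
    private static int counter;

    static void Main()
    {
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 1000; j++)
                {
                    lock (sync)          // Monitor.Enter/Exit with guaranteed release
                    {
                        counter++;
                    }
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(counter);      // always 4000
    }
}
```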


Barrier

Imagine that you are a project manager with several employees at your disposal, each solving some problem. You have a new, immense task that can be started only when all your employees have completed their current tasks. In that case you need the Barrier synchronization primitive. Its job is to register participants (AddParticipant) who work on their tasks and, upon completion, report it and block (SignalAndWait) until all participants have finished. Then all the threads are released and proceed to the next “round” of work following the same scheme; Barrier counts these “rounds”, or phases.
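The “project manager” scenario can be sketched like this; the participant count and the number of phases are arbitrary for illustration:

```csharp
using System;
using System.Threading;

class BarrierDemo
{
    static void Main()
    {
        // 3 participants; the callback runs once per completed phase.
        var barrier = new Barrier(3, b =>
            Console.WriteLine("Phase {0} finished", b.CurrentPhaseNumber));

        for (int i = 0; i < 3; i++)
        {
            int id = i;
            new Thread(() =>
            {
                for (int phase = 0; phase < 2; phase++)
                {
                    Console.WriteLine("Worker {0} done with phase {1}", id, phase);
                    barrier.SignalAndWait();   // block until all participants arrive
                }
            }).Start();
        }
    }
}
```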


LazyInitializer

As the name suggests, the class performs thread-safe lazy object initialization (EnsureInitialized(ref target)). Do not confuse this primitive with the Singleton pattern: the class does not guarantee that the object is created only once — under concurrent access several objects may be created, but only one of them is stored in target and seen by all threads.
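A minimal sketch of lazy initialization; ExpensiveObject is a hypothetical type used only for illustration:

```csharp
using System;
using System.Threading;

class LazyInitDemo
{
    class ExpensiveObject
    {
        public ExpensiveObject() { Console.WriteLine("Constructed"); }
    }

    private static ExpensiveObject instance;

    static ExpensiveObject Instance
    {
        get
        {
            // Racing threads may construct several objects, but every
            // thread will observe the same single instance in 'instance'.
            return LazyInitializer.EnsureInitialized(ref instance);
        }
    }

    static void Main()
    {
        Console.WriteLine(ReferenceEquals(Instance, Instance)); // True
    }
}
```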


ReaderWriterLock

ReaderWriterLock implements the parallel programming pattern that controls access to an object according to the scheme “many readers OR one writer”. Threads reading from the object should take a read lock (AcquireReaderLock) and release it when done (ReleaseReaderLock); writer threads follow the same discipline (AcquireWriterLock, ReleaseWriterLock). ReaderWriterLock identifies threads, so one thread cannot hold both kinds of lock simultaneously.

When using the primitive, several features must be taken into account:

  • A write lock is granted only when no read locks are held, so under heavy reading writes may be delayed
  • If a lock request cannot be satisfied within its timeout, the thread gets an ApplicationException
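The read/write discipline described above can be sketched as follows; the timeouts and the shared value are arbitrary:

```csharp
using System;
using System.Threading;

class RwLockDemo
{
    static readonly ReaderWriterLock rwLock = new ReaderWriterLock();
    static int sharedValue;

    static void Read()
    {
        rwLock.AcquireReaderLock(1000);   // may throw ApplicationException on timeout
        try
        {
            Console.WriteLine("Read {0}", sharedValue);
        }
        finally
        {
            rwLock.ReleaseReaderLock();
        }
    }

    static void Write(int value)
    {
        rwLock.AcquireWriterLock(1000);   // exclusive: waits for all readers to leave
        try
        {
            sharedValue = value;
        }
        finally
        {
            rwLock.ReleaseWriterLock();
        }
    }

    static void Main()
    {
        Write(42);
        Read();
    }
}
```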

SpinLock and SpinWait

These structures implement low-level locking and waiting. It is not recommended to use them without need — only when performance is critical and you have carefully read everything MSDN says about these structures.

Thread, ThreadLocal, ThreadPool

Thread is the .NET wrapper for operating system threads; it allows you to perform an operation asynchronously and control its execution. Execution priority, pausing, interruption, and cross-thread synchronization are supported (calling the Join method blocks the calling thread until the thread whose Join was called completes).
ThreadLocal provides a separate data store for each thread: if a thread initializes this storage with some data, the data is visible only from that thread, and in other threads the object remains uninitialized.
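The per-thread isolation can be sketched like this; the values are arbitrary:

```csharp
using System;
using System.Threading;

class ThreadLocalDemo
{
    // Each thread gets its own copy, created on first access in that thread.
    static readonly ThreadLocal<int> counter = new ThreadLocal<int>(() => 0);

    static void Main()
    {
        var t = new Thread(() =>
        {
            counter.Value = 10;
            Console.WriteLine("Worker sees {0}", counter.Value);  // 10
        });
        t.Start();
        t.Join();
        Console.WriteLine("Main sees {0}", counter.Value);        // 0 — its own copy
    }
}
```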

ThreadPool embodies an idea similar to the semaphore: a limited number of threads can exist in the pool, and each thread is given a task to execute. When the next task arrives (QueueUserWorkItem) and there are no free threads in the pool, the task is queued until one of the threads is released.

When using ThreadPool, several features must be taken into account:

  • All threads in the pool are background threads (IsBackground = true)
  • Timers and registered wait operations in .NET also use the ThreadPool
  • One thread pool is created per process; by default it contains 250 worker threads per processor and 1,000 threads for I/O completion
  • When all free threads are busy, a thread for the next task is created with a delay of 0.5 seconds
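Queuing work into the pool can be sketched as follows. Because pool threads are background threads, the example waits on a CountdownEvent before exiting (the item count is arbitrary):

```csharp
using System;
using System.Threading;

class ThreadPoolDemo
{
    static void Main()
    {
        using (var done = new CountdownEvent(5))
        {
            for (int i = 0; i < 5; i++)
            {
                int id = i;
                // The work item is queued; a pool thread picks it up when free.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Console.WriteLine("Item {0} on pool thread {1}",
                        id, Thread.CurrentThread.ManagedThreadId);
                    done.Signal();
                });
            }
            done.Wait(); // pool threads are background, so wait before exiting
        }
    }
}
```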

Concurrent Collections

Among the presented data structures, BlockingCollection and ConcurrentBag deserve special attention; the rest are previously existing collections merely adapted for concurrent use.


BlockingCollection

The aim of this collection is to provide a size-limited store: if a thread tries to write to a full collection, it is blocked until space is freed, and if a thread tries to read an element from an empty collection, it is blocked until at least one element appears.
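A minimal producer/consumer sketch over a bounded BlockingCollection (capacity and item count are arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class BlockingCollectionDemo
{
    static void Main()
    {
        // Bounded to 2 items: Add blocks when full, Take blocks when empty.
        var queue = new BlockingCollection<int>(2);

        var producer = new Thread(() =>
        {
            for (int i = 0; i < 5; i++) queue.Add(i);
            queue.CompleteAdding();             // no more items will come
        });
        var consumer = new Thread(() =>
        {
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine("Consumed {0}", item);
        });

        producer.Start();
        consumer.Start();
        producer.Join();
        consumer.Join();
    }
}
```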


ConcurrentBag

This collection has high performance because it does not bother preserving the order of the elements added to it. It is useful when you need to gather data from different threads and pass the resulting set as a parameter to the next method.
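Gathering per-thread results into an unordered bag can be sketched like this (the workload is arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentBagDemo
{
    static void Main()
    {
        var results = new ConcurrentBag<int>();

        // Worker threads add results without any ordering guarantee.
        Parallel.For(0, 100, i => results.Add(i * i));

        Console.WriteLine(results.Count);   // 100 — nothing is lost, order is not
    }
}
```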

Task Parallel Library

The main namespace of the library is System.Threading.Tasks. The library contains high-level tools for parallel tasks that free the developer from thinking about synchronization, locks, and the creation and management of threads.

The library can roughly be divided into two parts:

  • Parallel – a class implementing parallel loop operators, with variations, and the Invoke method for quickly running an action asynchronously via Task
  • Task, TaskFactory, TaskScheduler – classes related to scheduling and starting parallel tasks


Task

The Task class is a high-level analog of the Thread class, but in reality it runs as a delegate on the ThreadPool. Besides the advanced tools for composing and scheduling tasks, Microsoft claims that the TPL distributes and controls resources more effectively, using a ThreadPool improved with algorithms that adapt the number of threads optimally. Another obvious advantage is more flexible handling of results and exceptions in tasks.

When using Task, several features must be taken into account:

  • If the task returns a result, it is placed in the Result property, and access to it is synchronized with completion of the task
  • If an exception occurs during task execution, it is wrapped in an AggregateException and placed in the Exception property, which is also synchronized; the IsFaulted flag is set as well
  • If there are threads waiting on the Result property, this exception is automatically rethrown in them
  • One parallel task can have many subtasks, so building a tree structure of tasks is possible
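The Result and exception behavior described above can be sketched as follows (the computation itself is an arbitrary example):

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
    static void Main()
    {
        Task<int> sum = Task.Factory.StartNew(() =>
        {
            int total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            return total;
        });

        try
        {
            // Reading Result blocks until the task completes;
            // a failed task rethrows its AggregateException here.
            Console.WriteLine(sum.Result);   // 5050
        }
        catch (AggregateException ex)
        {
            foreach (var inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```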

When starting a task, the developer can specify its execution parameters via TaskCreationOptions:

  • None – the default behavior
  • PreferFairness – a hint to the TaskScheduler to schedule tasks preserving, as far as possible, the order in which they were started (started earlier – completed earlier)
  • LongRunning – notifies the scheduler that the task will run for a long time; such tasks are not placed in the ThreadPool
  • AttachedToParent – notifies the scheduler that the task is attached to a parent task and is placed in the parent task’s local queue, not in the ThreadPool


CancellationToken

This structure is an implementation of the cooperative cancellation pattern. In the TPL it is the accepted way to cancel parallel tasks; it is supported by the Task class and by the PLINQ mechanisms.

Access to the structure is synchronized: you can pass it into a task and then call the Cancel() method, setting the state that the task should be canceled. Inside the task, the developer decides when to check whether the task has been canceled and how to stop it. There is no abnormal termination in the middle of data processing; the developer has the opportunity to handle the cancellation and leave the task in a consistent state (for example, complete the current loop iteration).

Figure 2 Example of using the task cancellation mechanism in TPL
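The figure itself is not reproduced in this text version; a minimal sketch of cooperative cancellation with CancellationTokenSource, under arbitrary timings, might look like:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CancellationDemo
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;

        Task worker = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                if (token.IsCancellationRequested)
                {
                    Console.WriteLine("Cancelled at iteration {0}", i);
                    return;          // finish the iteration gracefully and exit
                }
                Thread.Sleep(10);    // simulate a unit of work
            }
        }, token);

        Thread.Sleep(50);
        cts.Cancel();                // request cooperative cancellation
        worker.Wait();
    }
}
```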


PLINQ

Besides the TPL, in the new version of the .NET Framework Microsoft decided to improve LINQ and presented its parallel implementation – PLINQ (Parallel Language Integrated Query). The LINQ familiar to .NET developers received an implementation that processes data in parallel without the developer having to think about thread safety, synchronization, gathering the resulting set from different threads, or the optimal size of the subsets into which the data is divided for parallel processing. PLINQ provides high-level approaches to parallel data processing but, if necessary, gives the developer the freedom to configure the basic parameters of the process independently, which makes PLINQ quite flexible as well.

PLINQ can be used with all .NET collections that implement System.Collections.IEnumerable, because .NET Framework 4.0 introduced the ParallelEnumerable class of extension methods, including the AsParallel extension for IEnumerable, which returns a wrapper of type ParallelQuery over the collection and allows working with it in parallel through almost the same interface as LINQ. The data is split into segments, the type of the collection is taken into account for optimal partitioning, and when parallel processing is not possible, a serial mode is used automatically. As already mentioned, you can use a CancellationToken to cancel the processing of a collection.
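A minimal AsParallel sketch over an arbitrary data set:

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000).ToArray();

        // AsParallel wraps the sequence in a ParallelQuery;
        // the rest of the query looks exactly like ordinary LINQ.
        int[] squaresOfEvens = numbers
            .AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();

        Console.WriteLine(squaresOfEvens.Length);   // 500
    }
}
```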

Before using PLINQ, you should weigh the mechanism’s overhead for segmenting, gathering, and ordering the data; consider the degree of parallelism in your system (how many cores/processors); and decide whether you have a large enough amount of data to process in parallel, or whether sequential processing would be cheaper. Another important factor is a good knowledge of the PLINQ documentation, which describes the cases in which the mechanism falls back to serial mode. If you do not trust PLINQ’s default partitioning, you can supply your own partitioner by creating it with the Partitioner.Create method.
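One way to take partitioning into your own hands is range partitioning, which hands each worker a contiguous chunk; a sketch with an arbitrary data set:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class PartitionerDemo
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 10000).ToArray();

        // Range partitioning yields (fromInclusive, toExclusive) tuples,
        // avoiding per-element synchronization overhead.
        var partitioner = Partitioner.Create(0, data.Length);

        long sum = partitioner.AsParallel()
            .Select(range =>
            {
                long local = 0;
                for (int i = range.Item1; i < range.Item2; i++) local += data[i];
                return local;
            })
            .Sum();

        Console.WriteLine(sum);   // 49995000
    }
}
```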


Conclusion

Without any doubt, a huge amount of work was done in the 4th version of the .NET Framework. The result makes it easier to use parallel algorithms and all the benefits of the multi-core systems already installed by the majority of users, and since these tools are very high-level, they decrease the possibility of error when developing applications that run in several threads.

© Copyright 2001-2012 Enterra Inc. All rights reserved.
