Task Parallelism is a parallel programming model where computation is decomposed into discrete tasks with dependency relationships — a directed acyclic graph (DAG) of work units is executed by a runtime scheduler that assigns tasks to threads as dependencies are satisfied.
Task vs. Data Parallelism
- Data Parallelism: Same operation on all elements (SIMD, GPU kernels). Regular structure.
- Task Parallelism: Different operations with complex dependencies. Irregular structure.
- Real applications: Often both — outer task parallelism, inner data parallelism.
Task DAG (Directed Acyclic Graph)
- Each node = task (function, lambda, coroutine).
- Directed edge A→B = B cannot start until A completes (data dependency).
- Critical path: Longest chain of dependent tasks — limits parallelism.
- Span (D): Critical path length. Work (T1): Total computation.
- Parallelism: T1/D — ideal speedup with infinite processors.
Work Stealing Scheduler
- Each thread maintains a deque (double-ended queue) of ready tasks.
- Thread pops tasks from its own deque bottom (LIFO — cache friendly).
- Idle thread "steals" task from another thread's deque top (FIFO — older work).
- Efficiency: Work stealing achieves near-optimal load balance; with P threads, expected execution time is T1/P + O(D), and the expected number of steals is O(P × D) (Blumofe–Leiserson bound).
- Used by: Intel TBB, OpenMP tasks, Cilk, Java's ForkJoinPool (std::async's scheduling policy is implementation-defined and not necessarily work stealing).
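The deque discipline above can be sketched in a few lines. This is a deliberately simplified illustration guarded by a single mutex; production schedulers (TBB, Cilk) use lock-free Chase–Lev deques. WorkStealingDeque is a made-up name, not a library type.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <functional>
#include <utility>

using Task = std::function<void()>;

class WorkStealingDeque {
    std::deque<Task> tasks_;
    std::mutex m_;
public:
    // Owner pushes new tasks at the bottom.
    void push(Task t) {
        std::lock_guard<std::mutex> lk(m_);
        tasks_.push_back(std::move(t));
    }
    // Owner pops from the bottom: LIFO order, cache-hot work.
    std::optional<Task> pop() {
        std::lock_guard<std::mutex> lk(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    // Thief steals from the top: FIFO order, oldest work.
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lk(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
};
```

The asymmetry is the point: the owner reuses the most recently created (cache-hot) work, while a thief takes the oldest task, which in a recursive decomposition tends to represent a large subtree, so one steal moves a lot of work.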
TBB (Threading Building Blocks) Example
```cpp
#include <tbb/task_group.h>

tbb::task_group tg;
tg.run([&]{ task_A(); }); // Launch task A
tg.run([&]{ task_B(); }); // Launch task B in parallel
tg.wait();                // Wait for both to finish
task_C();                 // C runs after A and B
```
Frameworks
- Intel TBB: C++ task group, parallel_for, pipeline.
- OpenMP Tasks: #pragma omp task with depend clauses.
- Cilk: cilk_spawn / cilk_sync — fork-join model.
- Python Dask: Task graph for distributed data processing.
- C++ Taskflow: Header-only, GPU-CPU heterogeneous task graph.
Task parallelism is the key to exploiting irregular parallelism in real-world workloads — compilers, data analytics, simulation pipelines, and AI inference all benefit from task DAG scheduling over static partitioning.