Task Parallelism is a parallel programming model where computation is decomposed into discrete tasks with dependency relationships — a directed acyclic graph (DAG) of work units is executed by a runtime scheduler that assigns tasks to threads as dependencies are satisfied.
Task vs. Data Parallelism
- Data Parallelism: Same operation on all elements (SIMD, GPU kernels). Regular structure.
- Task Parallelism: Different operations with complex dependencies. Irregular structure.
- Real applications: Often both — outer task parallelism, inner data parallelism.
Task DAG (Directed Acyclic Graph)
- Each node = task (function, lambda, coroutine).
- Directed edge A→B = B cannot start until A completes (data dependency).
- Critical path: Longest chain of dependent tasks — limits parallelism.
- Span (D): Critical path length. Work (T1): Total computation.
- Parallelism: T1/D — ideal speedup with infinite processors.
Work Stealing Scheduler
- Each thread maintains a deque (double-ended queue) of ready tasks.
- Thread pops tasks from its own deque bottom (LIFO — cache friendly).
- Idle thread "steals" task from another thread's deque top (FIFO — older work).
- Efficiency: Work stealing achieves near-optimal load balance; with P threads, expected execution time is T1/P + O(D), and the expected number of steals is O(P × D) (Blumofe–Leiserson bound).
- Used by: Intel TBB, OpenMP tasks, Cilk, Java's ForkJoinPool (std::async's scheduling policy is implementation-defined and not necessarily work stealing).
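The deque discipline above can be sketched in a few lines. This is a deliberately simplified illustration guarded by a single mutex; production schedulers (TBB, Cilk) use lock-free Chase–Lev deques. WorkStealingDeque is a made-up name, not a library type.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <functional>
#include <utility>

using Task = std::function<void()>;

class WorkStealingDeque {
    std::deque<Task> tasks_;
    std::mutex m_;
public:
    // Owner pushes new tasks at the bottom.
    void push(Task t) {
        std::lock_guard<std::mutex> lk(m_);
        tasks_.push_back(std::move(t));
    }
    // Owner pops from the bottom: LIFO order, cache-hot work.
    std::optional<Task> pop() {
        std::lock_guard<std::mutex> lk(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    // Thief steals from the top: FIFO order, oldest work.
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lk(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
};
```

The asymmetry is the point: the owner reuses the most recently created (cache-hot) work, while a thief takes the oldest task, which in a recursive decomposition tends to represent a large subtree, so one steal moves a lot of work.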
TBB (Threading Building Blocks) Example
```cpp
#include <tbb/task_group.h>

tbb::task_group tg;
tg.run([&]{ task_A(); }); // Launch task A
tg.run([&]{ task_B(); }); // Launch task B in parallel
tg.wait();                // Wait for both to finish
task_C();                 // C runs after A and B
```
Frameworks
- Intel TBB: C++ task group, parallel_for, pipeline.
- OpenMP Tasks: #pragma omp task with depend clauses.
- Cilk: cilk_spawn / cilk_sync — fork-join model.
- Python Dask: Task graph for distributed data processing.
- C++ Taskflow: Header-only, GPU-CPU heterogeneous task graph.
Task parallelism is the key to exploiting irregular parallelism in real-world workloads — compilers, data analytics, simulation pipelines, and AI inference all benefit from task DAG scheduling over static partitioning.