Home Knowledge Base Task Graph Execution and Runtime Systems

Task Graph Execution and Runtime Systems is the parallel computing model where computation is expressed as a directed acyclic graph (DAG) of tasks with explicit data dependencies — enabling runtime systems to automatically schedule, parallelize, and manage heterogeneous resources (CPU cores, GPUs, remote nodes) by exploiting dependency analysis to identify which tasks can execute in parallel, while deferring scheduling decisions to runtime to adapt to actual resource availability and data locality. Task-based runtimes like Legion, StarPU, OpenMP tasks, and Intel TBB implement this model.

Why Task Graphs Over Fork-Join

Task DAG Structure

Task A → Task B → Task D → Task F (Output)
      ↘ Task C ↗

- A must complete before B and C start
- B and C run in parallel (no dependency between them)
- D waits for both B and C to complete
- Critical path: A → B → D → F (or A → C → D → F)

Critical Path and Span

OpenMP Task-Based Parallelism

#pragma omp parallel
#pragma omp single
{
    #pragma omp task depend(out: A_data)
    compute_A();
    
    #pragma omp task depend(in: A_data) depend(out: B_data)
    compute_B();  // B depends on A
    
    #pragma omp task depend(in: A_data) depend(out: C_data)
    compute_C();  // C depends on A, runs parallel with B
    
    #pragma omp task depend(in: B_data, C_data)
    compute_D();  // D depends on both B and C
}

Intel TBB (Threading Building Blocks)

Legion Runtime

StarPU

Task Graph for Deep Learning (MLCommons)

Scheduling Algorithms

AlgorithmStrategyGood For
List schedulingSort tasks by HEFT priority → schedule greedilyHeterogeneous tasks
Work-stealingIdle workers steal from busy onesDynamic, irregular
Critical-pathPrioritize tasks on critical pathBalanced DAGs
HEFTHeterogeneous Earliest Finish TimeMulti-device scheduling

Task graph execution is the runtime intelligence that makes heterogeneous parallel computing tractable — by expressing computation as an explicit dependency graph and delegating scheduling to a runtime that can observe actual resource availability, data locality, and task performance, task-based systems adaptively achieve parallelism that static fork-join programs would require manual tuning to approach, enabling scientific codes and AI frameworks to scale naturally across the diverse mix of CPUs, GPUs, and memory hierarchies that characterize modern heterogeneous computing infrastructure.

task graph executiondag executiontasking runtimelegion runtimestarputask-based parallelism

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.