Concolic execution

Keywords: concolic execution,software engineering

Concolic execution (concrete + symbolic) is a hybrid program analysis technique that combines concrete execution with symbolic execution — running programs with actual input values while simultaneously tracking symbolic constraints, enabling more scalable path exploration than pure symbolic execution while maintaining systematic coverage.

What Is Concolic Execution?

- Concolic = Concrete + Symbolic: Execute program both concretely and symbolically at the same time.
- Concrete Execution: Run with actual input values — handles complex operations naturally.
- Symbolic Tracking: Track symbolic constraints on the concrete execution path.
- Iterative Exploration: Use constraints to generate new inputs that explore different paths.

How Concolic Execution Works

1. Initial Input: Start with a random or user-provided concrete input.

2. Concrete Execution: Run the program with the concrete input.

3. Symbolic Tracking: Simultaneously track symbolic constraints along the executed path.

4. Path Constraint Collection: Collect the sequence of branch conditions that led to this execution.

5. Constraint Negation: Negate one branch condition to explore an alternative path.

6. Constraint Solving: Solve the modified constraints to generate a new concrete input.

7. Iteration: Execute with the new input, repeat the process.

8. Coverage: Continue until desired coverage is achieved or time limit is reached.

Example: Concolic Execution

``python
def test_function(x, y):
if x > 0: # Branch 1
if y < 10: # Branch 2
return "A"
else:
return "B"
else:
return "C"

# Iteration 1: Concrete input x=5, y=3
# Concrete execution: x=5 > 0 (true), y=3 < 10 (true) → "A"
# Symbolic constraints: α > 0 AND β < 10
# Path explored: True, True

# Iteration 2: Negate last branch
# New constraints: α > 0 AND β >= 10
# Solve: α=5, β=10
# Concrete execution: x=5 > 0 (true), y=10 < 10 (false) → "B"
# Path explored: True, False

# Iteration 3: Negate first branch
# New constraints: α <= 0
# Solve: α=0, β=0
# Concrete execution: x=0 > 0 (false) → "C"
# Path explored: False

# Result: All 3 paths covered with 3 test inputs!
`

Concolic vs. Pure Symbolic Execution

- Pure Symbolic Execution:
- Pros: Explores all paths systematically, no concrete values needed.
- Cons: Path explosion, complex constraints, environment modeling challenges.

- Concolic Execution:
- Pros: Handles complex operations concretely, more scalable, easier environment interaction.
- Cons: Explores one path at a time (slower than forking), may miss some paths.

Advantages of Concolic Execution

- Handles Complex Operations: Concrete execution naturally handles operations that are hard to model symbolically.
- Example: Hash functions, encryption, floating-point arithmetic.
- Symbolic execution struggles with these; concolic execution just executes them.

- Environment Interaction: Concrete execution can interact with real environment.
- Example: File I/O, network, system calls.
- No need for complex symbolic models.

- Scalability: More scalable than pure symbolic execution.
- Explores one path at a time — no exponential path explosion.
- Constraint solving is simpler — constraints from single path, not merged paths.

- Practical: Works on real programs with libraries and system dependencies.

Concolic Execution Tools

- DART: The original concolic execution tool.
- CUTE: Concolic unit testing engine for C.
- SAGE: Microsoft's concolic fuzzer for x86 binaries — found many Windows bugs.
- jCUTE: Concolic execution for Java.
- Driller: Combines fuzzing with concolic execution.

Applications

- Automated Test Generation: Generate test inputs that achieve high coverage.
- Bug Finding: Find crashes, assertion violations, security vulnerabilities.
- Fuzzing Enhancement: Use concolic execution to get past complex checks that block fuzzers.
- Exploit Generation: Generate inputs that trigger specific vulnerabilities.

Example: Finding Buffer Overflow

`c
void process(char *input) {
if (input[0] == 'M' &&
input[1] == 'A' &&
input[2] == 'G' &&
input[3] == 'I' &&
input[4] == 'C') {
// Magic string found
char buffer[10];
strcpy(buffer, input + 5); // Potential overflow
}
}

// Random fuzzing struggles to find "MAGIC" prefix
// Concolic execution:
// Iteration 1: input = "AAAAA..." → fails first check
// Constraints: input[0] != 'M'
// Negate: input[0] == 'M', solve → input = "MAAAA..."

// Iteration 2: input = "MAAAA..." → fails second check
// Constraints: input[0] == 'M' AND input[1] != 'A'
// Negate: input[0] == 'M' AND input[1] == 'A', solve → input = "MAAAA..."

// ... continues until "MAGIC" is found ...
// Then explores overflow path with long input after "MAGIC"
``

Hybrid Fuzzing (Fuzzing + Concolic)

- Driller Approach:
1. Start with coverage-guided fuzzing (fast, explores many paths).
2. When fuzzing gets stuck (no new coverage), use concolic execution.
3. Concolic execution generates inputs to get past complex checks.
4. Return to fuzzing with new inputs.

- Benefits: Combines speed of fuzzing with precision of concolic execution.

Challenges

- Constraint Complexity: Even single-path constraints can be complex.
- Path Selection: Which path to explore next? Heuristics needed.
- Loops: Unbounded loops create infinitely many paths.
- Symbolic Pointers: Pointer arithmetic and dereferencing can be challenging.
- Floating Point: Floating-point constraints are difficult for SMT solvers.

Optimization Techniques

- Incremental Solving: Reuse solver state across iterations.
- Path Prioritization: Explore paths likely to find bugs or increase coverage first.
- Constraint Caching: Cache constraint solving results.
- Symbolic Simplification: Simplify constraints before solving.

LLMs and Concolic Execution

- Path Selection: LLMs can suggest which paths to explore based on code analysis.
- Seed Input Generation: LLMs can generate good initial inputs.
- Constraint Interpretation: LLMs can explain what constraints mean and why paths are infeasible.
- Bug Triage: LLMs can analyze bugs found by concolic execution and prioritize them.

Benefits

- Systematic Coverage: Explores paths systematically, not randomly.
- Handles Complexity: Concrete execution handles operations that symbolic execution struggles with.
- Practical: Works on real programs with libraries and system calls.
- Effective Bug Finding: Finds deep bugs requiring specific input sequences.

Limitations

- One Path at a Time: Slower than pure symbolic execution's path forking.
- Incomplete: May not explore all paths due to time/resource limits.
- Constraint Solving: Still requires SMT solver — can be slow for complex constraints.

Concolic execution is a practical and effective program analysis technique — it combines the best of concrete and symbolic execution to achieve systematic path exploration while handling real-world program complexity, making it widely used in automated testing and security analysis.

Want to learn more?

Search 13,225+ semiconductor and AI topics or chat with our AI assistant.

Search Topics Chat with CFSGPT