Data Clumps

Data Clumps are a code smell in which the same group of three or more data items repeatedly appears together across function parameter lists, class fields, and object initializations. The repetition indicates a missing domain abstraction: the group should be encapsulated in a named object, turning scattered parallel variables into a coherent concept with its own identity, validation logic, and behavior.

What Are Data Clumps?

A data clump can be recognized by a simple test: removing one member of the group renders the others meaningless or incomplete.
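As a small illustration of this test, consider the sketch below. The function names are hypothetical and chosen only for the example; the clump is the `year`, `month`, `day` triple, which Python's standard library already models as a single `date` or `datetime` object.

```python
from datetime import datetime

# Hypothetical helpers: year, month, and day always travel together.
def is_weekend(year: int, month: int, day: int) -> bool:
    # weekday() returns 0 for Monday through 6 for Sunday.
    return datetime(year, month, day).weekday() >= 5

def days_until(year: int, month: int, day: int, today: datetime) -> int:
    return (datetime(year, month, day) - today).days

# Dropping any one member leaves the rest unusable: (year, month) without a
# day cannot identify a calendar date, so the three form a single concept.
```

Neither function can do anything useful with only two of the three values, which is the signal that they belong in one named object.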

Why Data Clumps Matter

Refactoring: Introduce Parameter Object / Value Object

1. Identify the recurring group of data items.
2. Create a new class (Value Object) encapsulating them.
3. Add validation in the constructor.
4. Add behavior that naturally belongs with the data (often migrating Feature Envy methods).
5. Replace all parameter clumps with the new object.

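# Before: Data Clump (six loose parameters, two hidden addresses)
def send_package(from_street, from_city, from_zip,
                 to_street, to_city, to_zip):
    ...

# After: Introduce Parameter Object
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str
    zip_code: str

    def __post_init__(self):
        # Run validation as part of construction (step 3)
        self.validate()

    def validate(self): ...

def send_package(from_address: Address, to_address: Address):
    ...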
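A fuller sketch of steps 3 to 5 might look like the following. The validation rules (non-empty fields, five-digit ZIP) and the `label` method are assumptions made for illustration, not part of the original example or of any real postal specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Value object replacing the (street, city, zip) clump."""
    street: str
    city: str
    zip_code: str

    def __post_init__(self) -> None:
        # Step 3: validate on construction. These rules are illustrative
        # assumptions, not a real postal specification.
        if not self.street.strip() or not self.city.strip():
            raise ValueError("street and city must be non-empty")
        if not (len(self.zip_code) == 5 and self.zip_code.isdigit()):
            raise ValueError(f"invalid zip code: {self.zip_code!r}")

    def label(self) -> str:
        # Step 4: behavior that belongs with the data (hypothetical helper).
        return f"{self.street}\n{self.city} {self.zip_code}"


def send_package(from_address: Address, to_address: Address) -> str:
    # Step 5: callers pass two objects instead of six loose strings.
    return f"FROM:\n{from_address.label()}\nTO:\n{to_address.label()}"


sender = Address("1 Main St", "Springfield", "12345")
recipient = Address("9 Elm Ave", "Shelbyville", "54321")
print(send_package(sender, recipient))
```

Making the dataclass frozen is a design choice rather than a requirement: it gives the value object immutability and equality by value, so two `Address` instances with the same fields compare equal.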

Detection

Automated tools detect Data Clumps by analyzing code for groups of data items that recur together.
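One plausible approach, sketched here as an assumption rather than a description of any particular tool, is to parse function signatures and count groups of parameter names that recur across functions. The function `find_parameter_clumps` and its thresholds are hypothetical.

```python
import ast
from collections import Counter
from itertools import combinations


def find_parameter_clumps(source: str, min_size: int = 3,
                          min_occurrences: int = 2) -> dict:
    """Return parameter-name groups of `min_size` shared by several functions."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Only positional-or-keyword parameters are considered here.
            names = sorted({a.arg for a in node.args.args} - {"self", "cls"})
            # Count every size-`min_size` subset of this function's parameters.
            for group in combinations(names, min_size):
                counts[group] += 1
    return {g: n for g, n in counts.items() if n >= min_occurrences}


sample = '''
def create_user(name, street, city, zip_code): ...
def update_address(user_id, street, city, zip_code): ...
def ship_order(order_id, street, city, zip_code): ...
'''
print(find_parameter_clumps(sample))
# prints {('city', 'street', 'zip_code'): 3}
```

This sketch matches names exactly, so prefixed variants such as `from_street` and `to_street` would need normalization before they count as the same clump. It also enumerates every subset, which grows quickly for long parameter lists.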

Tools

Data Clumps are the fingerprints of missing objects: recurring groups of data that travel together and should be recognized as a domain concept, named, encapsulated, and given the validation logic and behavior that belong with the data they represent.
