nlpaug

nlpaug is a Python library for augmenting text data in NLP pipelines. It provides character-level (typo simulation, keyboard errors), word-level (synonym replacement via WordNet or word embeddings, random insertion/deletion/swap), and sentence-level (back-translation, contextual word replacement using BERT/GPT-2) augmentation techniques. These generate diverse synthetic training examples that can reduce overfitting and improve model robustness on text classification, named entity recognition, and other NLP tasks.

What Is nlpaug?

Three Augmentation Levels

| Level | Technique | Example | Preserves Meaning? |
|---|---|---|---|
| Character | Keyboard error | "hello" → "heklo" | Mostly (simulates typos) |
| Character | OCR error | "hello" → "he11o" | Mostly (simulates scan errors) |
| Character | Random insert/delete | "hello" → "helllo" | Mostly |
| Word | Synonym (WordNet) | "The quick fox" → "The fast fox" | Yes |
| Word | Word embedding (Word2Vec) | "happy" → "joyful" | Yes |
| Word | TF-IDF based | Replace low-TF-IDF words | Yes |
| Word | Random swap | "I love cats" → "love I cats" | Partial |
| Sentence | Back-translation | "I love cats" → "J'adore les chats" → "I adore cats" | Yes |
| Sentence | Contextual (BERT) | "The [MASK] fox" → "The brown fox" | Usually |
| Sentence | Abstractive summarization | Rephrase entire sentence | Yes |
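
To make the character-level and word-level rows concrete, here is a minimal, dependency-free sketch of a keyboard typo and a random swap. It is an illustration of the two techniques only, not nlpaug's implementation: `KEYBOARD_NEIGHBOURS`, `keyboard_typo`, and `random_swap` are hypothetical names, and the neighbour map is deliberately truncated.

```python
import random

# Hypothetical, heavily truncated QWERTY neighbour map (illustration only).
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "l": "kop",
    "n": "bhjm", "o": "ipkl", "s": "awedxz", "t": "rfgy",
}


def keyboard_typo(word, rng):
    """Replace one character with an adjacent key, if a neighbour is known."""
    positions = [i for i, ch in enumerate(word) if ch.lower() in KEYBOARD_NEIGHBOURS]
    if not positions:
        return word  # nothing we know how to perturb
    i = rng.choice(positions)
    replacement = rng.choice(KEYBOARD_NEIGHBOURS[word[i].lower()])
    return word[:i] + replacement + word[i + 1:]


def random_swap(words, rng):
    """Swap one pair of adjacent words and return a new list."""
    if len(words) < 2:
        return list(words)
    i = rng.randrange(len(words) - 1)
    swapped = list(words)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


rng = random.Random(0)  # fixed seed so repeated runs give the same result
typo = keyboard_typo("hello", rng)
swapped = random_swap("I love cats".split(), rng)
print(typo)
print(" ".join(swapped))
```

The typo changes exactly one character and keeps the word length, which is why character-level noise is usually safe for meaning. The swap keeps the same words but changes their order, which is why the table rates it only "Partial".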

Code Examples

import nlpaug.augmenter.word as naw
import nlpaug.augmenter.char as nac

# The outputs shown are illustrative only: augmentation is random, and recent
# nlpaug releases return a list of strings from augment(), not a single string.

# Synonym replacement (WordNet); needs the NLTK WordNet corpus to be installed
aug = naw.SynonymAug(aug_src='wordnet')
print(aug.augment("The quick brown fox jumps over the lazy dog."))
# e.g. ['The fast brown fox leaps over the lazy dog.']

# Contextual word replacement (BERT); needs transformers and torch,
# and downloads the model on first use
aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased', action='substitute'
)
print(aug.augment("The weather is nice today."))
# e.g. ['the weather is pleasant today.']

# Character-level keyboard errors
aug = nac.KeyboardAug()
print(aug.augment("Machine learning is powerful."))
# e.g. ['Machone learning is powerfyl.']
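
In practice the augmenters above are applied to a whole labeled dataset, with each augmented copy inheriting the label of its source text. The following is a minimal sketch of that loop under stated assumptions: `expand_dataset` and `drop_last_word` are hypothetical helpers, not part of nlpaug, and the stand-in augmenter exists only so the sketch runs without extra dependencies.

```python
def expand_dataset(texts, labels, augment_fn, n_per_example=2):
    """Return the original (text, label) pairs plus augmented copies.

    augment_fn takes one string and returns either a string or a list of
    strings; both shapes are handled because return types can differ
    between augmentation libraries and versions.
    """
    pairs = list(zip(texts, labels))
    seen = set(pairs)
    for text, label in zip(texts, labels):
        for _ in range(n_per_example):
            result = augment_fn(text)
            variants = [result] if isinstance(result, str) else list(result)
            for variant in variants:
                candidate = (variant, label)
                if candidate not in seen:  # skip no-ops and duplicates
                    seen.add(candidate)
                    pairs.append(candidate)
    return pairs


# Stand-in augmenter (hypothetical): drops the last word of the text.
def drop_last_word(text):
    words = text.split()
    return [" ".join(words[:-1])] if len(words) > 1 else [text]


data = expand_dataset(
    ["good movie overall", "bad"], ["pos", "neg"], drop_last_word, n_per_example=1
)
for text, label in data:
    print(label, "|", text)
```

With nlpaug installed, `augment_fn` could be the bound `augment` method of any augmenter from the examples above, for instance `naw.SynonymAug(aug_src='wordnet').augment`. Augmenting only the training split, never the validation or test split, keeps evaluation honest.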

nlpaug vs Alternatives

| Library | Strengths | Limitations |
|---|---|---|
| nlpaug | Unified API, three levels, transformer support | Slower for BERT-based augmentation |
| TextAttack | Adversarial examples + augmentation | More complex API |
| EDA (Easy Data Augmentation) | Dead simple, 4 operations | No embedding/transformer support |
| AugLy (Meta) | Multi-modal (text + image + audio) | Heavier dependency |
| Custom Back-Translation | Highest quality paraphrases | Requires translation API/model |

When to Use nlpaug

| Scenario | Recommended Augmenter | Why |
|---|---|---|
| Small dataset (<1K examples) | Synonym + Back-translation | Maximum diversity with meaning preservation |
| Typo robustness | Character-level keyboard aug | Train model to handle real-world typos |
| Text classification | Word-level synonym + contextual | Diverse lexical variation |
| NER / Token classification | Character-level only | Word-level changes can shift entity boundaries |
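
The NER row depends on one property: the number of tokens must not change, so that each label still lines up with its token. A minimal sketch of that constraint follows, assuming pre-tokenized input with BIO labels. `augment_tokens_keep_labels` and `swap_adjacent_chars` are hypothetical helpers, not nlpaug functions, and `char_aug` can be any callable that maps one token to one token.

```python
import random


def augment_tokens_keep_labels(tokens, labels, char_aug, p=0.3, rng=None):
    """Perturb individual tokens while keeping the label sequence aligned."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = rng or random.Random()
    new_tokens = []
    for token in tokens:
        if rng.random() < p:
            candidate = char_aug(token)
            # Reject results that are empty or contain whitespace, since
            # either would change the token count and break alignment.
            if candidate and not any(ch.isspace() for ch in candidate):
                token = candidate
        new_tokens.append(token)
    return new_tokens, list(labels)


# Stand-in character-level augmenter (hypothetical): swaps the 2nd and 3rd characters.
def swap_adjacent_chars(token):
    if len(token) < 3:
        return token
    return token[0] + token[2] + token[1] + token[3:]


tokens = ["Angela", "Merkel", "visited", "Berlin"]
labels = ["B-PER", "I-PER", "O", "B-LOC"]
# p=1.0 perturbs every token, which makes the effect easy to see.
noisy_tokens, noisy_labels = augment_tokens_keep_labels(
    tokens, labels, swap_adjacent_chars, p=1.0, rng=random.Random(0)
)
print(list(zip(noisy_tokens, noisy_labels)))
```

Applying the augmenter token by token, rather than to the joined sentence, is the design choice that guarantees alignment. Word-level insertion, deletion, or swapping would need the labels to be edited in the same way.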

nlpaug is a widely used Python library for NLP data augmentation. It provides a clean, unified API across character-, word-, and sentence-level augmentation that generates linguistically diverse training examples. Its transformer-based contextual augmenters (BERT, GPT-2) tend to produce the most fluent synthetic text, which makes them a good fit for improving model robustness on small NLP datasets.
