SELFIES (Self-Referencing Embedded Strings)

SELFIES (Self-Referencing Embedded Strings) is a molecular string representation designed so that every string built from its symbol alphabet decodes to a valid molecular graph. This removes the validity problem that plagues SMILES-based generation, where many generated strings fail to parse or violate valence rules. SELFIES achieves the guarantee with a context-free grammar whose derivation rules depend on the decoder's current state, so a syntactically or chemically invalid molecule cannot be derived. Generative models can therefore explore string space without constraints and still produce 100% valid molecular output.
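The idea of state-dependent derivation rules can be illustrated with a deliberately small sketch. This is not the real SELFIES grammar and it does not use the `selfies` package: the token set, the valence table, and the function names `decode_chain`, `to_smiles`, and `respects_valence` are all assumptions made for illustration, and only unbranched chains without rings are handled. The decoder keeps one piece of state, the free valence left on the last atom, and lowers each requested bond order until it fits.

```python
# Toy sketch only: NOT the real SELFIES grammar or the `selfies` package.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}  # assumed maximum valences

# Hypothetical tokens: (requested bond order to the previous atom, element).
TOKENS = {
    "[C]": (1, "C"), "[=C]": (2, "C"), "[#C]": (3, "C"),
    "[N]": (1, "N"), "[=N]": (2, "N"), "[#N]": (3, "N"),
    "[O]": (1, "O"), "[=O]": (2, "O"),
    "[F]": (1, "F"),
}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def decode_chain(tokens):
    """Decode tokens into (atoms, bonds) for an unbranched chain.

    bonds[i] is the bond order between atoms[i] and atoms[i + 1].
    """
    atoms, bonds = [], []
    free = None  # free valence on the last atom; None means no atom yet
    for token in tokens:
        order, element = TOKENS[token]
        if free is None:
            # First atom: there is no previous atom, so the bond request is ignored.
            atoms.append(element)
            free = VALENCE[element]
            continue
        if free == 0:
            break  # chain is saturated; remaining tokens are ignored
        # Cap the requested order by what both atoms can still accept.
        order = min(order, free, VALENCE[element])
        bonds.append(order)
        atoms.append(element)
        free = VALENCE[element] - order
    return atoms, bonds


def to_smiles(atoms, bonds):
    """Write the decoded chain as a SMILES-style string."""
    if not atoms:
        return ""
    return atoms[0] + "".join(BOND_SYMBOL[b] + a for b, a in zip(bonds, atoms[1:]))


def respects_valence(atoms, bonds):
    """Check that no atom uses more bond order than its maximum valence."""
    for i, element in enumerate(atoms):
        used = (bonds[i - 1] if i > 0 else 0) + (bonds[i] if i < len(bonds) else 0)
        if used > VALENCE[element]:
            return False
    return True


if __name__ == "__main__":
    for tokens in (["[C]", "[=O]", "[=C]", "[F]"], ["[F]", "[#C]", "[#N]", "[O]"]):
        atoms, bonds = decode_chain(tokens)
        print("".join(tokens), "->", to_smiles(atoms, bonds))
    # [C][=O][=C][F] -> C=O   (oxygen is saturated, so the tail is dropped)
    # [F][#C][#N][O] -> FC#N  (the triple bond to fluorine is lowered to single)
```

In this sketch an over-ambitious token is never rejected; it is reinterpreted so that it fits the current state, which is why no sequence of tokens from the alphabet can produce a valence violation.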

What Is SELFIES?

Why SELFIES Matters

SELFIES vs. SMILES Comparison

Property | SMILES | SELFIES
Validity guarantee | No; many strings are invalid | Yes; every string is valid
Random string validity | ~0.1% of random strings are valid | 100% of random strings are valid
Mutation robustness | Mutations often break validity | All mutations produce valid molecules
Readability | Human-readable | Less intuitive for humans
Grammar | Context-sensitive (brackets, digits) | Context-free (self-referencing)
Adoption | Universal standard in chemistry | Growing adoption in ML for molecules
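The random-string and mutation rows can be checked on a toy scale. The sketch below is an illustration under stated assumptions, not a benchmark of real SMILES or SELFIES: the nine-token alphabet, the valence table, and the two readers are hypothetical. `literal_bonds` takes every requested bond order at face value, loosely mimicking how a SMILES string is read, while `capped_bonds` applies a SELFIES-style rule that lowers each bond order to fit the free valence. Only unbranched chains are modelled, so the fractions it reports say nothing about the figures in the table.

```python
import itertools
import random

VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}  # assumed maximum valences
# Hypothetical toy tokens: (requested bond order to the previous atom, element).
ALPHABET = [(o, e) for e, v in VALENCE.items() for o in (1, 2, 3) if o <= v]


def literal_bonds(tokens):
    """SMILES-like reading: accept every requested bond order as written."""
    return [e for _, e in tokens], [o for o, _ in tokens[1:]]


def capped_bonds(tokens):
    """SELFIES-like reading: cap bond orders by free valence, stop when saturated."""
    atoms, bonds, free = [], [], None
    for order, element in tokens:
        if free is None:  # first atom has no previous atom to bond to
            atoms.append(element)
            free = VALENCE[element]
            continue
        if free == 0:
            break
        order = min(order, free, VALENCE[element])
        bonds.append(order)
        atoms.append(element)
        free = VALENCE[element] - order
    return atoms, bonds


def respects_valence(atoms, bonds):
    """True if no atom exceeds its maximum valence."""
    for i, element in enumerate(atoms):
        used = (bonds[i - 1] if i > 0 else 0) + (bonds[i] if i < len(bonds) else 0)
        if used > VALENCE[element]:
            return False
    return True


def valid_fraction(reader, length):
    """Fraction of ALL token strings of the given length that are valid."""
    strings = list(itertools.product(ALPHABET, repeat=length))
    ok = sum(respects_valence(*reader(list(s))) for s in strings)
    return ok / len(strings)


def mutate(tokens, rng):
    """Point mutation: replace one randomly chosen token with a random one."""
    out = list(tokens)
    out[rng.randrange(len(out))] = rng.choice(ALPHABET)
    return out


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, valid_fraction(literal_bonds, n), valid_fraction(capped_bonds, n))
    rng = random.Random(0)
    chain = [(1, "C"), (2, "C"), (1, "O")]
    for _ in range(1000):
        chain = mutate(chain, rng)
        assert respects_valence(*capped_bonds(chain))
```

Because the enumeration is exhaustive for each length, the capped reader's fraction of 1.0 is a property of the rule itself and not of a lucky sample; the literal reader falls below 1.0 as soon as a token requests more bond order than an atom can hold.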

SELFIES is crash-proof chemistry: a molecular representation language engineered so that any string of its tokens decodes to a valid molecule. This turns molecular generation from a constrained optimization problem (generate only valid molecules) into an unconstrained one (generate any string, and it will be valid).
