ST.25 vs. ST.26 Sequence Listings

September 13, 2022 / Biologics & Biosimilars, BioPharma, Chemistry & Nanotechnology, Additional Topics

For the second article in our Big Bang Blog Series, we break down the differences between the WIPO Standards ST.25 and ST.26. Read along for details including a direct comparison of the old and the new.

What are the benefits of the new WIPO Standard ST.26?

ST.26 allows standardization of sequence listing filing across multiple patent offices.
Data could be lost when ST.25 sequence listings were transferred to public sequence databases, whereas ST.26 is compliant with current public sequence database requirements.
With ST.26 sequence listings, Offices and applicants will benefit from automated validation and comprehensive searching capabilities.

What are the main differences between ST.25 and ST.26?

ST.25-compliant sequence listings could be filed as TXT or PDF files, while ST.26-compliant sequence listings are to be filed in XML (extensible markup language) format.
ST.25 does not require inclusion of D-amino acids, linear portions of branched sequences, or nucleotide analogs, while ST.26 does.
ST.25 permits inclusion of sequences with fewer than 10 specifically defined nucleotides or fewer than 4 specifically defined amino acids, while such sequences are prohibited in ST.26.
In ST.26, DNA and RNA molecule types must be further described using a mandatory mol_type qualifier.
For more details on ST.25 to ST.26 changes, see the helpful table below, reproduced from WIPO’s ST.26 Introduction Webinar (WIPO ST.26: Introduction).
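To make the XML format and the mol_type qualifier more concrete, here is a minimal Python sketch that builds a single sequence entry. The element names (SequenceData, INSDSeq, INSDQualifier, and so on) reflect our reading of the ST.26 DTD and should be checked against the current version of the standard before use. The helper name build_sequence_data and the example sequence are hypothetical, and the output is only a fragment, not a complete or validated sequence listing.

```python
import xml.etree.ElementTree as ET


def build_sequence_data(seq_id, residues, moltype, mol_type_value, organism):
    """Build one <SequenceData> element (illustrative fragment only).

    Element names are assumed from the ST.26 DTD; verify against the standard.
    """
    data = ET.Element("SequenceData", {"sequenceIDNumber": str(seq_id)})
    seq = ET.SubElement(data, "INSDSeq")
    ET.SubElement(seq, "INSDSeq_length").text = str(len(residues))
    ET.SubElement(seq, "INSDSeq_moltype").text = moltype  # DNA, RNA, or AA
    ET.SubElement(seq, "INSDSeq_division").text = "PAT"

    # The mandatory "source" feature carries the mol_type and organism qualifiers.
    table = ET.SubElement(seq, "INSDSeq_feature-table")
    feature = ET.SubElement(table, "INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "source"
    ET.SubElement(feature, "INSDFeature_location").text = f"1..{len(residues)}"
    quals = ET.SubElement(feature, "INSDFeature_quals")
    for name, value in (("mol_type", mol_type_value), ("organism", organism)):
        qual = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qual, "INSDQualifier_name").text = name
        ET.SubElement(qual, "INSDQualifier_value").text = value

    ET.SubElement(seq, "INSDSeq_sequence").text = residues
    return data


# Hypothetical 12-nucleotide synthetic DNA sequence.
element = build_sequence_data(
    1, "atgcatgcatgc", "DNA", "other DNA", "synthetic construct"
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

In practice, applicants would typically generate the listing with an authoring tool such as WIPO Sequence rather than by hand. The sketch only shows how the molecule type and its mol_type qualifier sit together in the XML.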

| WIPO ST.25 | WIPO ST.26 |
| --- | --- |
| ASCII .txt with numeric identifiers | XML with elements and attributes |
| Not required to include: D-amino acids; linear portions of branched sequences; nucleotide analogs | Must include: D-amino acids; linear portions of branched sequences; nucleotide analogs |
| Annotation of sequences: feature keys only | Annotation of sequences: feature keys and qualifiers |
| Permitted to include sequences: < 10 specifically defined nucleotides; < 4 specifically defined amino acids | Prohibited sequences: < 10 specifically defined nucleotides; < 4 specifically defined amino acids |
| ALL priority application information may be included | ONLY the earliest priority application can be included |
| ALL applicant and inventor names may be included | ONLY one applicant AND optionally ONE inventor may be included |
| One invention title permitted | Multiple invention titles permitted, each one in a different language |
| Applicant/inventor names and invention titles must be in basic Latin characters | Applicant/inventor names may be included using any valid Unicode character along with a basic Latin translation or transliteration |
| Sequences identified as DNA, RNA, or PRT only | Sequences identified as DNA, RNA, or AA along with a mandatory mol_type qualifier to further describe the molecule |
| Organism names: Latin genus/species; virus name; “artificial sequence”; “unknown” | Organism names: Latin genus/species; virus name; “synthetic construct”; “unidentified” |
| “u” represents uracil in nucleotide sequences | “t” represents uracil in RNA sequences and thymine in DNA sequences |
| Amino acid sequences represented by three-letter abbreviations | Amino acid sequences represented by one-letter abbreviations |
| “n” and “Xaa” variables must have a definition provided in a feature | Default value assumed for “n” and “X” variables with no definition |
| Feature location format not clearly defined | Strictly defined feature location formats; permits use of “<” and “>” in all sequence types, and “^”, “join”, “order”, and “complement” in nucleotide sequences |
| “Mixed mode” sequences permitted: nucleotide sequence with amino acid translation shown below | NO “mixed mode”; nucleotide translations are included in “translation” qualifiers only |
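Several of the differences above are mechanical enough to sketch in code. The Python snippet below is a simplified illustration, not a conversion tool: it maps three-letter amino acid abbreviations to one-letter codes, replaces “u” with “t” in RNA sequences, and checks the minimum number of specifically defined residues. All function names are hypothetical. The lookup table covers only the 20 standard amino acids plus Xaa, and we assume that “specifically defined” means any nucleotide other than “n” and any amino acid other than “X”.

```python
# Three-letter to one-letter amino acid codes (20 standard residues plus Xaa).
# Simplified: ambiguity codes and modified residues are not handled here.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def protein_to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split())


def rna_to_st26(seq):
    """Represent uracil as 't', as ST.26 does for RNA sequences."""
    return seq.lower().replace("u", "t")


def count_defined(seq, moltype):
    """Count residues other than the variable symbol ('n' or 'X')."""
    if moltype == "AA":
        return sum(1 for residue in seq if residue != "X")
    return sum(1 for residue in seq.lower() if residue != "n")


def meets_minimum_length(seq, moltype):
    """True if the sequence has at least 10 defined nucleotides or 4 defined amino acids."""
    minimum = 4 if moltype == "AA" else 10
    return count_defined(seq, moltype) >= minimum


print(protein_to_one_letter("Met Ala Xaa Lys"))     # MAXK
print(rna_to_st26("augcuu"))                        # atgctt
print(meets_minimum_length("atgnnnnnnnnn", "DNA"))  # False: only 3 defined nucleotides
print(meets_minimum_length("MAKK", "AA"))           # True: 4 defined amino acids
```

A real ST.25 to ST.26 conversion involves much more than residue symbols, including feature annotations, qualifiers, and applicant data, so this sketch covers only the sequence-level rules from the table.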