Abbreviation Handling

How abbreviation detection works

Snapper uses Unicode UAX #29 sentence boundary detection as a baseline. UAX #29 sometimes splits at periods that belong to abbreviations rather than sentence endings. Snapper post-processes the split results, merging segments where the break occurred at a known abbreviation.
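
The merge step described above can be sketched roughly as follows. This is a minimal illustration, not Snapper's actual implementation: naive_split is a crude stand-in for a real UAX #29 segmenter, and ABBREVIATIONS, ends_with_abbreviation, and merge_abbreviation_breaks are hypothetical names chosen for this example.

```python
import re

# Hypothetical subset of an abbreviation list, for illustration only.
ABBREVIATIONS = {"Dr.", "Fig.", "e.g.", "et al.", "vs."}


def naive_split(text):
    """Stand-in for UAX #29: break after every period followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]


def ends_with_abbreviation(segment):
    """True if the segment is, or ends with, a known abbreviation."""
    return any(
        segment == abbr or segment.endswith(" " + abbr)
        for abbr in ABBREVIATIONS
    )


def merge_abbreviation_breaks(segments):
    """Rejoin a segment with its successor when the break followed an abbreviation."""
    merged = []
    carry = ""
    for seg in segments:
        current = f"{carry} {seg}" if carry else seg
        if ends_with_abbreviation(current):
            carry = current  # break was spurious; keep accumulating
        else:
            merged.append(current)
            carry = ""
    if carry:
        merged.append(carry)  # text ended on an abbreviation
    return merged


def split_sentences(text):
    return merge_abbreviation_breaks(naive_split(text))
```

With this sketch, split_sentences("See Fig. 3 for details. Dr. Smith agrees.") first produces four raw segments and then merges them back into two sentences.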

Select a language with --lang to use the appropriate abbreviation set:

snapper --lang de paper.tex    # German abbreviations
snapper --lang fr article.md   # French abbreviations

Available languages: en (default), de, fr, is, pl. To set the language in the config file instead of on the command line, add lang = "de" to .snapperrc.toml.

Built-in abbreviations (English, default)

Titles and honorifics

Mr., Mrs., Ms., Dr., Prof., Sr., Jr., St., Rev., Gen., Gov., Sgt., Cpl., Pvt., Capt., Lt., Col., Maj., Cmdr., Adm.

Academic and scientific

Fig., Figs., Eq., Eqs., Ref., Refs., Tab., Sec., Ch., Vol., No., Nos., Ed., Eds., Trans., Dept., Thm., Lem., Prop., Def., Cor., Rem., Ex.

Latin

e.g., i.e., et al., cf., etc., viz., ibid., ca., approx., v.s.

Time and dates

Jan., Feb., Mar., Apr., Jun., Jul., Aug., Sep., Oct., Nov., Dec., Mon., Tue., Wed., Thu., Fri., Sat., Sun., a.m., p.m.

Common

vs., misc., est., govt., dept., univ., inc., corp., ltd., Ave., Blvd., Rd., pp., pg., pt., pts.

Single-letter initials

A. through Z. (for names like J. K. Rowling).

German abbreviations (--lang de)

Hr., Fr., Dr., Prof., Abb., Bd., Hrsg., Kap., Nr., S., Verl., Aufl., Jg., Anm., Anh., Beil., Tab., Gl., Abschn., Bsp., Str., Pl., bzw., ca., etc., evtl., ggf., vgl., usw.

Multi-word: z.B., d.h., u.a., o.g., s.o., u.U.

French abbreviations (--lang fr)

M., Mme., Mlle., Dr., Prof., Me., fig., eq., chap., vol., p., pp., ed., trad., n., t., av., apr., env., cf., etc.

Multi-word: c.-a-d., p.ex.

Icelandic abbreviations (--lang is)

Hr., Fr., Dr., sbr., frk., sk., nr.

Multi-word: m.a., o.fl.

Polish abbreviations (--lang pl)

dr., mgr., prof., doc., rys., tab., wyd., red., t., s., nr., poz., zob., por., ul., al., pl., os.

Multi-word: m.in., t.j., j.w., t.zw., b.r.

Adding project-specific abbreviations

Create a .snapperrc.toml in your project root:

extra_abbreviations = ["GROMACS", "LAMMPS", "DFT", "VASP", "Abstr", "Suppl"]

These merge with the built-in list at runtime.

Inline token protection

Periods inside inline tokens never trigger sentence breaks, regardless of abbreviation lists:

  • Org links: [[https://example.com][Ex. Site]]

  • LaTeX math: $x = 3.14$

  • LaTeX commands: \cite{smith.2024}

  • Markdown links: [Example Inc.](url)

  • Inline code: ~std.io.Read~

These tokens get replaced with safe placeholders before sentence detection and restored afterward.
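
The placeholder technique can be sketched as below. This is only an illustration under stated assumptions: the regular expressions in TOKEN_PATTERNS are simplified guesses at the token shapes listed above, the NUL-delimited placeholder format is invented for this example, and protect and restore are hypothetical names rather than Snapper functions.

```python
import re

# Hypothetical, simplified patterns for the inline token types listed above.
TOKEN_PATTERNS = [
    r"\[\[[^\]]*\]\[[^\]]*\]\]",  # Org link: [[target][label]]
    r"\[[^\]]*\]\([^)]*\)",       # Markdown link: [label](url)
    r"\$[^$]+\$",                 # LaTeX inline math: $...$
    r"\\[A-Za-z]+\{[^}]*\}",      # LaTeX command with one braced argument
    r"~[^~]+~",                   # Org inline code: ~...~
]
TOKEN_RE = re.compile("|".join(TOKEN_PATTERNS))


def protect(text):
    """Swap each inline token for a period-free placeholder; return both."""
    tokens = []

    def stash(match):
        tokens.append(match.group(0))
        return f"\x00{len(tokens) - 1}\x00"

    return TOKEN_RE.sub(stash, text), tokens


def restore(text, tokens):
    """Put the original tokens back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: tokens[int(m.group(1))], text)
```

Sentence detection would run on the masked text returned by protect, and restore would then be applied to each resulting sentence. Because the placeholders contain no periods, nothing inside a protected token can be mistaken for a sentence ending.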