Abbreviation Handling¶
How abbreviation detection works¶
Snapper uses Unicode UAX #29 sentence boundary detection as a baseline. UAX #29 sometimes splits at periods that belong to abbreviations rather than sentence endings. Snapper post-processes the split results, merging segments where the break occurred at a known abbreviation.
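The merge step can be sketched in a few lines of Python. This is a minimal illustration, not Snapper's actual implementation: `naive_split` is a regex stand-in for a real UAX #29 segmenter, and `ABBREVIATIONS`, `ends_with_abbreviation`, and `merge_abbreviation_breaks` are hypothetical names chosen for this example.

```python
import re

# Small illustrative subset; the full built-in lists appear in the sections below.
ABBREVIATIONS = {"Dr.", "Prof.", "Fig.", "e.g.", "i.e.", "et al.", "vs."}


def naive_split(text):
    """Stand-in for UAX #29: break at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def ends_with_abbreviation(segment, abbreviations):
    """Return True if the segment ends in a known abbreviation as a whole token."""
    for abbr in abbreviations:
        if segment.endswith(abbr):
            prefix = segment[: -len(abbr)]
            # Require a token boundary so that "Refig." does not match "Fig.".
            if not prefix or not prefix[-1].isalnum():
                return True
    return False


def merge_abbreviation_breaks(segments, abbreviations):
    """Rejoin segments whose break fell directly after a known abbreviation."""
    merged = []
    for segment in segments:
        if merged and ends_with_abbreviation(merged[-1], abbreviations):
            # The previous break was spurious; join with a single space.
            merged[-1] = merged[-1] + " " + segment
        else:
            merged.append(segment)
    return merged


text = "See Fig. 3 for details. Dr. Smith et al. disagree. The end."
for sentence in merge_abbreviation_breaks(naive_split(text), ABBREVIATIONS):
    print(sentence)
```

Note that this sketch always merges after an abbreviation, so a sentence that genuinely ends in one (for example "... and so on, etc.") would be joined to the sentence that follows it.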
Select a language with --lang to use the appropriate abbreviation set:
snapper --lang de paper.tex # German abbreviations
snapper --lang fr article.md # French abbreviations
Available languages: en (default), de, fr, is, pl.
To set the language in the config file instead, add lang = "de" to .snapperrc.toml.
Built-in abbreviations (English, default)¶
Titles and honorifics¶
Mr., Mrs., Ms., Dr., Prof., Sr., Jr., St., Rev., Gen., Gov., Sgt., Cpl., Pvt., Capt., Lt., Col., Maj., Cmdr., Adm.
Academic and scientific¶
Fig., Figs., Eq., Eqs., Ref., Refs., Tab., Sec., Ch., Vol., No., Nos., Ed., Eds., Trans., Dept., Thm., Lem., Prop., Def., Cor., Rem., Ex.
Latin¶
e.g., i.e., et al., cf., etc., viz., ibid., ca., approx., v.s.
Time and dates¶
Jan., Feb., Mar., Apr., Jun., Jul., Aug., Sep., Oct., Nov., Dec., Mon., Tue., Wed., Thu., Fri., Sat., Sun., a.m., p.m.
Common¶
vs., misc., est., govt., dept., univ., inc., corp., ltd., Ave., Blvd., Rd., pp., pg., pt., pts.
Single-letter initials¶
A. through Z. (for names like J. K. Rowling).
German abbreviations (--lang de)¶
Hr., Fr., Dr., Prof., Abb., Bd., Hrsg., Kap., Nr., S., Verl., Aufl., Jg., Anm., Anh., Beil., Tab., Gl., Abschn., Bsp., Str., Pl., bzw., ca., etc., evtl., ggf., vgl., usw.
Multi-word: z.B., d.h., u.a., o.g., s.o., u.U.
French abbreviations (--lang fr)¶
M., Mme., Mlle., Dr., Prof., Me., fig., eq., chap., vol., p., pp., ed., trad., n., t., av., apr., env., cf., etc.
Multi-word: c.-a-d., p.ex.
Icelandic abbreviations (--lang is)¶
Hr., Fr., Dr., sbr., frk., sk., nr.
Multi-word: m.a., o.fl.
Polish abbreviations (--lang pl)¶
dr., mgr., prof., doc., rys., tab., wyd., red., t., s., nr., poz., zob., por., ul., al., pl., os.
Multi-word: m.in., t.j., j.w., t.zw., b.r.
Adding project-specific abbreviations¶
Create a .snapperrc.toml in your project root:
extra_abbreviations = ["GROMACS", "LAMMPS", "DFT", "VASP", "Abstr", "Suppl"]
These merge with the built-in list at runtime.
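Conceptually, the merge is a set union of the built-in list and the project's extras. The Python sketch below illustrates that idea only; `BUILTIN` and `merge_abbreviations` are hypothetical names, and appending a trailing period to extras written without one (as in the config example above) is an assumption about how matching might work, not documented Snapper behavior.

```python
# Illustrative subset of a built-in list.
BUILTIN = {"Dr.", "Fig.", "e.g."}


def merge_abbreviations(builtin, extra):
    """Union the built-in set with project extras.

    Assumption: extras are written without the trailing period, so one is
    appended here to make them comparable with the built-in entries.
    """
    normalized = {a if a.endswith(".") else a + "." for a in extra}
    return set(builtin) | normalized


# Values as they would appear after parsing extra_abbreviations from the config.
extras = ["GROMACS", "LAMMPS", "DFT", "VASP", "Abstr", "Suppl"]
active = merge_abbreviations(BUILTIN, extras)
print(sorted(active))
```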
Inline token protection¶
Periods inside inline tokens never trigger sentence breaks, regardless of abbreviation lists:
Org links: [[https://example.com][Ex. Site]]
LaTeX math: $x = 3.14$
LaTeX commands: \cite{smith.2024}
Markdown links: [Example Inc.](url)
Inline code: ~std.io.Read~
These tokens get replaced with safe placeholders before sentence detection and restored afterward.
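The placeholder technique can be sketched as follows. This is a simplified illustration under stated assumptions, not Snapper's actual code: the regular expressions, the NUL-delimited placeholder format, and the names `TOKEN_PATTERN`, `protect`, and `restore` are all hypothetical.

```python
import re

# Hypothetical patterns for the token kinds listed above.
# Org links come first so that "[[...][...]]" is not matched piecemeal.
TOKEN_PATTERN = re.compile(
    r"\[\[[^\]]*\]\[[^\]]*\]\]"   # Org link: [[url][label]]
    r"|\[[^\]]*\]\([^)]*\)"       # Markdown link: [label](url)
    r"|\$[^$]+\$"                 # inline LaTeX math: $...$
    r"|\\[A-Za-z]+\{[^}]*\}"      # LaTeX command with one braced argument
    r"|~[^~]+~"                   # Org inline code: ~...~
)


def protect(text):
    """Swap each inline token for a period-free placeholder."""
    tokens = []

    def stash(match):
        tokens.append(match.group(0))
        # NUL-delimited index: contains no '.', so it cannot trigger a break.
        return f"\x00{len(tokens) - 1}\x00"

    return TOKEN_PATTERN.sub(stash, text), tokens


def restore(text, tokens):
    """Put the original tokens back after sentence detection."""
    return re.sub(r"\x00(\d+)\x00", lambda m: tokens[int(m.group(1))], text)


source = r"See \cite{smith.2024} and $x = 3.14$. Visit [Example Inc.](url) now."
protected, saved = protect(source)
# Sentence detection would run on `protected` here.
assert restore(protected, saved) == source
```

Because the placeholders contain no periods, only the two real sentence-ending periods in the example remain visible to the splitter.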