Supported Formats¶

snapper classifies text into prose regions (reflowed at sentence boundaries), structure regions (passed through unchanged), and code regions (fenced or delimited source blocks). Code regions keep fence/open/close lines as structure; the body reflows comment lines when [code.<lang>] is configured, and may run an external formatter when --format-code is set. The classification depends on the format.

Org-mode (`--format org`)¶

Structure regions (preserved)¶

Non-source #+BEGIN_* … #+END_* blocks (example, quote, etc.)
:PROPERTIES: … :END: drawers
#+KEYWORD: directives (TITLE, AUTHOR, DATE, OPTIONS, etc.)
Table rows (lines starting with |)
Comment lines (starting with # but not #+ )
Full headline lines (stars, optional TODO keyword, and title text)
List item markers (-, +, 1.)
LaTeX environments (\begin{equation} … \end{equation}, \begin{align}, etc.)
Display math (\[ … \])
Inline export snippets (@@latex:\newpage@@, @@html:<br>@@)

Code regions (`#+BEGIN_SRC` … `#+END_SRC`)¶

Header and footer lines are structure
Body non-comment lines pass through verbatim
Body comment lines reflow at sentence boundaries when the language has line_comment and/or block_comment under [code.<lang>]
With --format-code, optional formatter argv runs on the body (graceful fallback on failure)
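The split between comment lines and non-comment lines in a code body can be illustrated with a minimal sketch. It assumes a single line-comment prefix, which is what a `line_comment` setting would supply. The function name `tag_body_lines` is hypothetical and not part of snapper, and block comments are left out.

```python
def tag_body_lines(lines, line_comment="#"):
    """Tag each code-body line as 'comment' or 'code'.

    Hypothetical helper: a line counts as a comment when its first
    non-blank characters are the configured line-comment prefix.
    """
    tagged = []
    for line in lines:
        is_comment = line.lstrip().startswith(line_comment)
        tagged.append(("comment" if is_comment else "code", line))
    return tagged
```

Under this sketch, only lines tagged `comment` would be candidates for reflow; lines tagged `code` pass through verbatim.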

Prose regions (reflowed)¶

Paragraph text
List item text (after the marker)

Inline tokens (kept atomic)¶

These tokens within prose are not split across lines:

Links: [[url][description]]
Emphasis: *bold*, /italic/, _underline_, +strike+
Inline code: ~code~, =verbatim=
Inline export snippets: @@backend:value@@
URLs: https://... (trailing sentence punctuation not swallowed)
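One way to keep such tokens atomic is to tokenize prose so that a whole link or export snippet becomes a single unit, even when it contains spaces. The sketch below is illustrative only. The function name `atomic_tokens` and the regular expression are assumptions, not snapper's implementation, and it covers only links and export snippets.

```python
import re

# Order matters: try the multi-word atomic forms before plain words.
_TOKEN = re.compile(
    r"\[\[[^\]]+\](?:\[[^\]]+\])?\]"  # Org link: [[url]] or [[url][description]]
    r"|@@[\w-]+:.*?@@"                # inline export snippet: @@backend:value@@
    r"|\S+"                           # any other whitespace-delimited word
)

def atomic_tokens(text):
    """Split prose into tokens, keeping links and export snippets whole."""
    return _TOKEN.findall(text)
```

For example, `atomic_tokens("See [[https://example.org][the project page]] for details.")` returns four tokens, with the link (including the space-containing description) as one of them. A line breaker working on these tokens cannot place a break inside the link.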

LaTeX (`--format latex`)¶

Structure regions (preserved)¶

Preamble (everything before \begin{document})
Non-prose environments: equation, align, figure, table, tabular, tikzpicture, and their starred variants (plus other non-code envs)
Display math: \[...\]
Comment lines (starting with %)
\end{document}
Full sectioning command lines (\section{...}, \subsection{...}, and friends, including title text)

Code regions (`minted`, `lstlisting`, `verbatim`)¶

\begin{...} / \end{...} lines are structure
Body follows the same comment-reflow and optional --format-code rules as other formats when language is known (minted language arg, lstlisting language= option)
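As a rough illustration of how the language could be read from the opening line, the sketch below handles the two cases named above: the mandatory argument of `minted` and the `language=` option of `lstlisting`. The function name `latex_code_language` is hypothetical, and lowercasing the result is an assumption made here so that the value could match a `[code.<lang>]` key.

```python
import re

# \begin{minted}[optional opts]{lang}
_MINTED = re.compile(r"\\begin\{minted\}(?:\[[^\]]*\])?\{([^}]+)\}")
# \begin{lstlisting}[..., language=Lang, ...]
_LSTLISTING = re.compile(r"\\begin\{lstlisting\}\[[^\]]*?language=([^,\]]+)")

def latex_code_language(begin_line):
    """Return the language named on a \\begin line, or None if there is none."""
    for pattern in (_MINTED, _LSTLISTING):
        match = pattern.search(begin_line)
        if match:
            return match.group(1).strip().lower()
    return None
```

A `verbatim` environment carries no language, so this sketch returns `None` for it.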

Prose regions (reflowed)¶

Body text between structural elements

Markdown (`--format markdown`)¶

Structure regions (preserved)¶

Front matter (--- or +++ delimited at file start)
Full ATX heading lines (# … ###### including title text)
Setext headings (title line plus ==== or ---- underline)
List item markers (-, *, +, 1.)
Pipe tables

Code regions (fenced ``` / `~~~`)¶

Opening and closing fence lines are structure
Indented fence bodies preserve indentation on reflowed comment lines
Language from the fence info string selects [code.<lang>]; an unknown or missing language passes the body through unchanged (--format-code does not apply without a formatter entry)
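The fence handling above can be sketched as a small line classifier. This is a simplified illustration, not snapper's parser: the name `classify_markdown` is hypothetical, only fenced code is recognized, and every line outside a fence is reported as prose.

```python
import re

# Leading indentation, a run of 3+ backticks or tildes, then an optional language.
_FENCE = re.compile(r"^\s*(`{3,}|~{3,})\s*([\w+#.-]*)")

def classify_markdown(lines):
    """Yield (kind, lang, line) with kind in {'structure', 'code', 'prose'}."""
    open_fence = None  # fence marker of the currently open block, if any
    lang = None
    for line in lines:
        match = _FENCE.match(line)
        if open_fence is None and match:
            # Opening fence: structure; the info string selects the language.
            open_fence, lang = match.group(1), (match.group(2) or None)
            yield ("structure", lang, line)
        elif (
            open_fence is not None
            and match
            and match.group(1)[0] == open_fence[0]
            and len(match.group(1)) >= len(open_fence)
            and line.strip() == match.group(1)
        ):
            # Closing fence: same marker character, at least as long, nothing else.
            yield ("structure", lang, line)
            open_fence, lang = None, None
        elif open_fence is not None:
            yield ("code", lang, line)
        else:
            yield ("prose", None, line)
```

A body whose `lang` is `None` would be passed through unchanged under the rule above, since there is no `[code.<lang>]` table to consult.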

Prose regions (reflowed)¶

Paragraph text
List item text (after the marker)

reStructuredText (`--format rst`)¶

Structure regions (preserved)¶

Non-code directives (.. math::, .. image::, etc.) and their indented bodies
Literal blocks (text after :: with indented content)
Section titles and underlines (===, -----, etc.)
Field lists (:Author:, :Date:, etc.)
Comments (.. without a directive)
Grid and simple tables (lines starting with | or +)

Code regions (`.. code-block:: LANG`)¶

The directive line and the trailing blank lines stay structure
Indented body uses the language token for [code.<lang>] comment reflow and optional --format-code

Prose regions (reflowed)¶

Paragraph text between structural elements

Auto-detection¶

Extensions: .rst, .rest

Plaintext (`--format plaintext`)¶

Everything is prose. Blank lines are preserved as paragraph separators.
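A minimal sketch of this behavior: split on blank lines, then put each sentence of a paragraph on its own line. The function name `reflow_plaintext` is hypothetical, and the sentence split is a deliberately naive regular expression standing in for the real boundary detection, so it will mis-split abbreviations.

```python
import re

def reflow_plaintext(text):
    """Reflow each blank-line-separated paragraph to one sentence per line."""
    paragraphs = re.split(r"\n[ \t]*\n", text.strip("\n"))
    reflowed = []
    for paragraph in paragraphs:
        # Collapse the paragraph's existing line breaks into single spaces.
        joined = " ".join(paragraph.split())
        # Naive stand-in: break after . ! ? when an uppercase letter follows.
        sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", joined)
        reflowed.append("\n".join(sentences))
    return "\n\n".join(reflowed)
```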

Sentence Detection¶

snapper uses Unicode UAX #29 sentence boundary detection as a baseline (or, optionally, --neural / nnsplit), then applies the same post-processing pipeline in either case: abbreviation merges, followed by delimiter-span rejoin, so that dialogue and balanced ()[]{} spans are not fractured.

Delimiter-span policy (residual cases)¶

Balanced ASCII/curly/guillemet quotes, LaTeX ``…'', and ()[]{} must not gain a semantic line break mid-span (see tests/sentence_delim_props.rs).
Unclosed " (or open ‘ without ’) glues the rest of the paragraph; snapper does not invent closers.
Nested semantic ASCII quotes (say "hi" now with an inner pair) remain toggle-ambiguous; prefer typographic quotes or escapes in source.
Apostrophes in contractions (don't, it's) are not treated as dialogue openers.
Markdown fences (```) are not treated as LaTeX `` openers.
--neural runs the same abbreviation + span post-pipeline after the model proposes cuts (English papers can still prefer the rules path for fully offline, deterministic CI).
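The rejoin step for bracket spans can be sketched as follows. This is an illustration under simplifying assumptions, not the code exercised by tests/sentence_delim_props.rs. It tracks only ()[]{}, ignores quotes entirely, and extends the "glue the rest" behavior described for unclosed quotes to unclosed brackets. The names `open_depth` and `rejoin_spans` are hypothetical.

```python
_PAIRS = {"(": ")", "[": "]", "{": "}"}

def open_depth(text):
    """Count bracket spans that are opened in text but not yet closed."""
    expected = []  # stack of closers still awaited
    for ch in text:
        if ch in _PAIRS:
            expected.append(_PAIRS[ch])
        elif expected and ch == expected[-1]:
            expected.pop()
    return len(expected)

def rejoin_spans(candidates):
    """Merge candidate sentences until every opened bracket span is closed."""
    out, buffer = [], ""
    for piece in candidates:
        buffer = f"{buffer} {piece}" if buffer else piece
        if open_depth(buffer) == 0:
            out.append(buffer)
            buffer = ""
    if buffer:
        # An unclosed opener glues everything that follows; no closer is invented.
        out.append(buffer)
    return out
```

Given the candidates `["The method (see Sec.", "3. It is fast) works well.", "Next sentence."]`, the first two pieces are merged because the parenthesis is still open after the first one.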

snapper merges false splits caused by known abbreviations:

Titles¶

Mr., Mrs., Ms., Dr., Prof., Sr., Jr., St., Rev., Gen., etc.

Academic¶

Fig., Figs., Eq., Eqs., Ref., Refs., Tab., Sec., Ch., Vol., No., Thm., Lem., Prop., Def., Cor., Rem., Ex.

Latin¶

e.g., i.e., et al., cf., etc., viz., ibid., ca., approx.

Single initials¶

A., B., C., … Z.

Date and time¶

Jan., Feb., …, Dec., Mon., Tue., …, Sun., a.m., p.m.
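The abbreviation merge can be pictured as a pass over candidate sentences that glues a piece onto the previous one whenever the previous piece ends in a known abbreviation. The sketch below uses a small subset of the lists above. The names `ABBREVIATIONS` and `merge_abbreviation_splits` are hypothetical, and the exact matching rules snapper applies are not shown in this section.

```python
# Illustrative subset of the abbreviations listed above.
ABBREVIATIONS = {"Dr.", "Prof.", "Fig.", "Eq.", "Sec.", "e.g.", "i.e.", "cf."}

def merge_abbreviation_splits(candidates):
    """Rejoin candidate sentences that were split right after an abbreviation."""
    merged = []
    for piece in candidates:
        words = merged[-1].split() if merged else []
        # Ignore an opening bracket attached to the word, as in "(cf.".
        last_word = words[-1].lstrip("([") if words else ""
        if last_word in ABBREVIATIONS:
            merged[-1] = f"{merged[-1]} {piece}"
        else:
            merged.append(piece)
    return merged
```

With the input `["See Fig.", "3 for details.", "It works."]`, the first two pieces are merged and the third stays separate.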

Quoted and parenthesized punctuation¶

Sentence punctuation inside quotes or parentheses does not trigger a false split when the next word starts with a lowercase letter. For example, He said "wow!" and left. stays on one line because "!" followed by a lowercase word signals a continuation, not a new sentence. Patterns handled: !", ?", .", !), ?), .), and similar combinations with single quotes or brackets.
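The rule can be expressed as a check on the two sides of a proposed split. The sketch below is a simplified illustration: the function name `is_continuation` and the exact set of closing characters are assumptions, chosen to cover the patterns listed above.

```python
_SENTENCE_PUNCT = ".!?"
_CLOSERS = "\"'”’)]"

def is_continuation(left, right):
    """Return True when a split between left and right looks like a false split.

    The split is rejected when left ends with sentence punctuation followed
    by a closing quote or bracket, and right starts with a lowercase letter.
    """
    left = left.rstrip()
    ends_with_closed_punct = (
        len(left) >= 2 and left[-2] in _SENTENCE_PUNCT and left[-1] in _CLOSERS
    )
    return ends_with_closed_punct and right.lstrip()[:1].islower()
```

For the example sentence, `is_continuation('He said "wow!"', 'and left.')` is true, so the two pieces stay on one line. With `'Then he left.'` on the right the check is false and the split stands.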
