Supported Formats

snapper classifies text into prose regions (reflowed at sentence boundaries) and structure regions (passed through unchanged). The classification depends on the format.
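The region model can be sketched in a few lines of Python. This is a minimal illustration, not snapper's implementation: the `classify` and `render` functions, the per-format `is_structure` predicate, and the choice to pass blank lines through as structure are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical region type: ("prose" | "structure", lines in the region).
Region = tuple[str, list[str]]


def classify(lines: list[str], is_structure: Callable[[str], bool]) -> list[Region]:
    """Group consecutive lines into prose and structure regions.

    Blank lines count as structure so they separate paragraphs and are
    passed through unchanged (an assumption of this sketch).
    """
    regions: list[Region] = []
    for line in lines:
        kind = "structure" if not line.strip() or is_structure(line) else "prose"
        if regions and regions[-1][0] == kind:
            regions[-1][1].append(line)      # extend the current region
        else:
            regions.append((kind, [line]))   # start a new region
    return regions


def render(regions: list[Region], reflow: Callable[[list[str]], list[str]]) -> list[str]:
    """Reflow prose regions; emit structure regions exactly as they were."""
    out: list[str] = []
    for kind, lines in regions:
        out.extend(reflow(lines) if kind == "prose" else lines)
    return out
```

For example, with a predicate that treats table rows as structure, `render(classify(["| a | b |", "", "First line", "second line."], lambda l: l.startswith("|")), lambda ls: [" ".join(ls)])` keeps the table row and blank line and joins the two prose lines. The per-format sections below amount to different `is_structure` rules.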

Org-mode (`--format org`)

Structure regions (preserved)

#+BEGIN_* … #+END_* blocks (source, example, quote, etc.)
:PROPERTIES: … :END: drawers
#+KEYWORD: directives (TITLE, AUTHOR, DATE, OPTIONS, etc.)
Table rows (lines starting with |)
Comment lines (starting with # but not #+)
Headline stars and TODO keywords
List item markers (-, +, 1.)
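The whole-line rules above can be sketched as a per-line mask with a small state machine for blocks and drawers. `org_structure_mask` is a hypothetical name, and headline and list-item lines deliberately return `False` here: their text is reflowed, so a real classifier would still need to split off the stars, TODO keyword, or list marker first.

```python
import re

KEYWORD = re.compile(r"^\s*#\+[A-Za-z_]+:")                 # #+TITLE:, #+OPTIONS:, ...
BEGIN = re.compile(r"^\s*#\+BEGIN_(\w+)", re.IGNORECASE)    # #+BEGIN_SRC, #+begin_quote, ...
END = re.compile(r"^\s*#\+END_(\w+)", re.IGNORECASE)
DRAWER_START = re.compile(r"^\s*:PROPERTIES:\s*$")
DRAWER_END = re.compile(r"^\s*:END:\s*$")


def org_structure_mask(lines: list[str]) -> list[bool]:
    """Return one bool per line: True if the line is preserved verbatim."""
    mask: list[bool] = []
    in_block = in_drawer = False
    for line in lines:
        s = line.strip()
        if in_block:
            mask.append(True)
            if END.match(line):           # #+END_* closes the block
                in_block = False
        elif in_drawer:
            mask.append(True)
            if DRAWER_END.match(line):    # :END: closes the drawer
                in_drawer = False
        elif BEGIN.match(line):
            in_block = True
            mask.append(True)
        elif DRAWER_START.match(line):
            in_drawer = True
            mask.append(True)
        else:
            # Keywords, table rows, and comments (# but not #+) are structure.
            mask.append(
                bool(KEYWORD.match(line))
                or s.startswith("|")
                or (s.startswith("#") and not s.startswith("#+"))
            )
    return mask
```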

Prose regions (reflowed)

Paragraph text
Headline text (after the stars and keyword)
List item text (after the marker)

Inline tokens (kept atomic)

These tokens within prose are not split across lines:

Links: [[url][description]]
Inline code: ~code~, =verbatim=
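One way to keep these tokens atomic is to split prose into words only at whitespace that falls outside a link or inline-markup span. This sketch assumes Org's standard `~code~` and `=verbatim=` markers, with non-space characters just inside both markers; `ATOMIC` and `atomic_words` are hypothetical names, not part of snapper.

```python
import re

# Spans that must stay on one line: [[url]] / [[url][description]] links,
# ~code~ and =verbatim= (content must not start or end with whitespace).
ATOMIC = re.compile(
    r"\[\[[^\]]*\](?:\[[^\]]*\])?\]"
    r"|~[^\s~](?:[^~\n]*[^\s~])?~"
    r"|=[^\s=](?:[^=\n]*[^\s=])?="
)


def atomic_words(text: str) -> list[str]:
    """Split on whitespace, except whitespace inside an atomic span."""
    protected = [m.span() for m in ATOMIC.finditer(text)]
    words: list[str] = []
    start = None  # index where the current word began
    for i, ch in enumerate(text):
        inside = any(a <= i < b for a, b in protected)
        if ch.isspace() and not inside:
            if start is not None:
                words.append(text[start:i])
                start = None
        elif start is None:
            start = i
    if start is not None:
        words.append(text[start:])
    return words
```

Trailing punctuation stays attached to its token, so a reflow that rejoins these words with single spaces never breaks a link description or inline code across lines. The border rule also keeps ordinary prose such as `x = 1 and y = 2` from being mistaken for verbatim text.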

LaTeX (`--format latex`)

Structure regions (preserved)

Preamble (everything before \begin{document})
Non-prose environments: equation, align, figure, table, tabular, lstlisting, verbatim, minted, tikzpicture, and their starred variants
Display math: \[...\]
Comment lines (starting with %)
\end{document}
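These LaTeX rules can be sketched as a per-line mask, assuming environments and display math open at the start of a line. `latex_structure_mask` is hypothetical; it also treats the `\begin{document}` line itself as part of the preamble, which is an assumption, and it does not handle a non-prose environment nested inside another of the same name.

```python
import re

# Environments listed above; starred variants are matched via the optional "*".
NON_PROSE = {"equation", "align", "figure", "table", "tabular",
             "lstlisting", "verbatim", "minted", "tikzpicture"}
BEGIN_ENV = re.compile(r"\\begin\{(\w+)(\*?)\}")


def latex_structure_mask(lines: list[str]) -> list[bool]:
    """Return True for each line that must be preserved verbatim."""
    mask: list[bool] = []
    in_preamble = True   # everything up to and including \begin{document}
    env_end = None       # closing command we are waiting for, e.g. \end{align*}
    in_display = False   # inside \[ ... \]
    for line in lines:
        s = line.strip()
        if in_preamble:
            mask.append(True)
            in_preamble = not s.startswith(r"\begin{document}")
        elif env_end is not None:
            mask.append(True)
            if env_end in line:
                env_end = None
        elif in_display:
            mask.append(True)
            in_display = r"\]" not in line
        elif (m := BEGIN_ENV.match(s)) and m.group(1) in NON_PROSE:
            mask.append(True)
            env_end = "\\end{" + m.group(1) + m.group(2) + "}"
            if env_end in line:            # opened and closed on one line
                env_end = None
        elif s.startswith(r"\["):
            mask.append(True)
            in_display = r"\]" not in s    # still open unless closed here
        else:
            # Comment lines and \end{document} are structure; the rest is prose.
            mask.append(s.startswith("%") or s.startswith(r"\end{document}"))
    return mask
```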

Prose regions (reflowed)

Body text between structural elements

Markdown (`--format markdown`)

Structure regions (preserved)

Fenced code blocks (``` or ~~~)
Front matter (delimited by --- or +++ at the start of the file)
Heading markers (#, ##, etc.)
List item markers (-, *, +, 1.)
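Fenced code blocks and front matter both need state carried across lines. This sketch follows CommonMark's fence rules (a closing fence uses the same character, is at least as long as the opening fence, and carries no info string). The function name, and the requirement that front matter have a matching closing delimiter before it counts, are assumptions.

```python
import re

# An opening or closing fence: up to three spaces, then 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def markdown_structure_mask(lines: list[str]) -> list[bool]:
    """Return True for front-matter lines and lines inside fenced code blocks."""
    mask = [False] * len(lines)
    start = 0
    # Front matter: the first line is --- or +++ and a matching line closes it.
    if lines and lines[0].strip() in ("---", "+++"):
        delim = lines[0].strip()
        for j in range(1, len(lines)):
            if lines[j].strip() == delim:
                mask[: j + 1] = [True] * (j + 1)
                start = j + 1
                break
    fence = None  # the opening fence string while inside a code block
    for j in range(start, len(lines)):
        m = FENCE.match(lines[j])
        if fence is None:
            if m:
                fence = m.group(1)
                mask[j] = True
        else:
            mask[j] = True
            # Close on the same character, at least as long, with no info string.
            if (m and m.group(1)[0] == fence[0]
                    and len(m.group(1)) >= len(fence)
                    and not lines[j].strip().lstrip(fence[0])):
                fence = None
    return mask
```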

Prose regions (reflowed)

Paragraph text
Heading text (after the marker)
List item text (after the marker)
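For headings and list items, the marker is passed through and only the text after it is reflowed. A hypothetical `split_marker` helper, assuming ATX headings (`#` to `######` followed by a space) and the list markers named above:

```python
import re

# An ATX heading marker (# to ######) or a list marker (-, *, +, 1.),
# with any leading indentation and the whitespace that follows it.
MARKER = re.compile(r"^(\s*(?:#{1,6}|[-*+]|\d+\.)\s+)(.*)$")


def split_marker(line: str) -> tuple[str, str]:
    """Return (marker, text); the marker is '' for an ordinary prose line."""
    m = MARKER.match(line)
    return (m.group(1), m.group(2)) if m else ("", line)
```

Because a space is required after the marker, lines such as `-1 degrees` or `#hashtag` are left as ordinary prose.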

reStructuredText (`--format rst`)

Structure regions (preserved)

Directives (.. code-block::, .. math::, .. image::, etc.) and their indented bodies
Literal blocks (indented content following a paragraph that ends with ::)
Section titles and underlines (===, -----, etc.)
Field lists (:Author:, :Date:, etc.)
Comments (.. without a directive)
Grid and simple tables (lines starting with | or +)
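Directives and literal blocks both end when indentation returns to the level of the line that opened them. A sketch of that rule, assuming a directive is recognized by `.. name::`; comments, section titles, field lists, and tables are left out to keep it short, and `rst_block_mask` is a hypothetical name.

```python
import re

# ".. name::" opens a directive (code-block, math, image, ...).
DIRECTIVE = re.compile(r"^\s*\.\.\s+[\w:-]+::")


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip())


def rst_block_mask(lines: list[str]) -> list[bool]:
    """Mark directive lines, their bodies, and literal-block bodies."""
    mask = [False] * len(lines)
    opener_indent = None  # indentation of the line that opened the current block
    for i, line in enumerate(lines):
        if opener_indent is not None:
            # Blank lines and deeper-indented lines belong to the block body.
            if not line.strip() or indent_of(line) > opener_indent:
                mask[i] = True
                continue
            opener_indent = None
        if DIRECTIVE.match(line):
            mask[i] = True
            opener_indent = indent_of(line)
        elif line.rstrip().endswith("::"):
            # The paragraph ending in "::" stays prose; only its body is literal.
            opener_indent = indent_of(line)
    return mask
```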

Prose regions (reflowed)

Paragraph text between structural elements

Auto-detection

Extensions: .rst, .rest
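Extension-based detection amounts to a lookup table. This sketch covers only the two extensions listed here, since the extensions for the other formats are not given in this section; `detect_format` and its `default` parameter are hypothetical, and the lookup is case-sensitive by assumption.

```python
from pathlib import Path
from typing import Optional

# Only the extensions documented above; other formats' extensions are not
# listed in this section, so they are deliberately left out.
EXTENSIONS = {".rst": "rst", ".rest": "rst"}


def detect_format(path: str, default: Optional[str] = None) -> Optional[str]:
    """Map a file name to a --format value by its extension (hypothetical helper)."""
    return EXTENSIONS.get(Path(path).suffix, default)
```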

Plaintext (`--format plaintext`)

Everything is prose. Blank lines are preserved as paragraph separators.
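For plaintext, a reflow pass only needs paragraph boundaries. In this sketch the sentence split is a naive regular expression standing in for the Unicode-based detection described in the next section, and the output assumes one sentence per line; `reflow_plaintext` is a hypothetical name.

```python
import re


def reflow_plaintext(text: str) -> str:
    """Put each sentence of each paragraph on its own line; keep blank lines."""
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        if para:
            joined = " ".join(line.strip() for line in para)
            # Naive stand-in for real sentence detection.
            out.extend(re.split(r"(?<=[.!?])\s+", joined))
            para.clear()

    for line in text.split("\n"):
        if line.strip():
            para.append(line)
        else:
            flush()
            out.append(line)  # blank separator passes through unchanged
    flush()
    return "\n".join(out)
```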

Sentence Detection

snapper uses Unicode UAX #29 sentence boundary detection as a baseline, then merges false splits caused by known abbreviations:

Titles

Mr., Mrs., Ms., Dr., Prof., Sr., Jr., St., Rev., Gen., etc.

Academic

Fig., Figs., Eq., Eqs., Ref., Refs., Tab., Sec., Ch., Vol., No., Thm., Lem., Prop., Def., Cor., Rem., Ex.

Latin

e.g., i.e., et al., cf., etc., viz., ibid., ca., approx.

Single initials

A., B., C., … Z.

Date and time

Jan., Feb., …, Dec., Mon., Tue., …, Sun., a.m., p.m.
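The merge step can be sketched as a pass over candidate sentences: whenever a segment ends in a known abbreviation, it is joined with the next one. Python's standard library has no UAX #29 segmenter, so `naive_split` below is a stand-in regular expression, and `ABBREVIATIONS` holds only a subset of the lists above; both, like `merge_abbreviations`, are assumptions of the sketch.

```python
import re
import string

# A subset of the abbreviation lists above. "et al." is matched on its last
# token, "al.", because the merge check looks at one token at a time.
ABBREVIATIONS = {
    "Mr.", "Mrs.", "Ms.", "Dr.", "Prof.",        # titles
    "Fig.", "Eq.", "Ref.", "Sec.", "Thm.",        # academic
    "e.g.", "i.e.", "al.", "cf.", "etc.",         # Latin
    "Jan.", "Feb.", "a.m.", "p.m.",               # date and time
} | {c + "." for c in string.ascii_uppercase}     # single initials A. to Z.


def naive_split(text: str) -> list[str]:
    """Stand-in for UAX #29: break after . ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def merge_abbreviations(segments: list[str]) -> list[str]:
    """Join a segment to the previous one when the previous ends in an abbreviation."""
    merged: list[str] = []
    for seg in segments:
        tokens = merged[-1].split() if merged else []
        last = tokens[-1].lstrip("([{\"'") if tokens else ""
        if last in ABBREVIATIONS:
            merged[-1] += " " + seg
        else:
            merged.append(seg)
    return merged
```

In this sketch, a sentence that genuinely ends with an abbreviation (such as one ending in "p.m.") is also merged with the sentence after it; that is the trade-off of deciding on the abbreviation alone.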
