Supported formats

snapper classifies text into three kinds of regions: prose regions (reflowed at sentence boundaries), structure regions (passed through unchanged), and code regions (fenced or delimited source blocks). In a code region, the fence or open/close lines are kept as structure. Within the body, comment lines are reflowed when [code.<lang>] is configured for the language, and an external formatter may run on the body when --format-code is set. Which lines fall into each region depends on the input format, as described below.

Org-mode (`--format org`)

Structure regions (preserved)

Non-source #+BEGIN_* … #+END_* blocks (example, quote, etc.)
:PROPERTIES: … :END: drawers
#+KEYWORD: directives (TITLE, AUTHOR, DATE, OPTIONS, etc.)
Table rows (lines starting with |)
Comment lines (starting with #, but not #+)
Full headline lines (stars, optional TODO keyword, and title text)
List item markers (-, +, 1.)
LaTeX environments (\begin{equation} … \end{equation}, \begin{align}, etc.)
Display math (\[ … \])
Inline export snippets (@@latex:\newpage@@, @@html:<br>@@)
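The line-level rules in the list above can be sketched as a function that splits one Org line into a structure part and a prose part. This is a minimal Python sketch. It is not snapper's own code, and `split_org_line` is a hypothetical name. It only handles rules that can be decided from a single line. Multi-line constructs such as #+BEGIN_* blocks, drawers, LaTeX environments and \[ … \] need extra state and are left out.

```python
import re

# Hypothetical helper, not snapper's implementation. Single lines only;
# multi-line blocks and drawers need state and are out of scope here.
HEADLINE = re.compile(r"\*+\s")                          # * Title, ** TODO Task
LIST_ITEM = re.compile(r"(\s*(?:[-+]|\d+\.)\s+)(.*)")    # -, +, 1.


def split_org_line(line: str) -> tuple[str, str]:
    """Return (structure, prose) for one Org line."""
    stripped = line.lstrip()
    if stripped.startswith("#+"):      # #+TITLE:, #+OPTIONS:, #+BEGIN_QUOTE, ...
        return line, ""
    if stripped.startswith("#"):       # comment line
        return line, ""
    if stripped.startswith("|"):       # table row
        return line, ""
    if HEADLINE.match(line):           # headline: stars must start at column 0
        return line, ""
    m = LIST_ITEM.match(line)
    if m:                              # marker is structure, item text is prose
        return m.group(1), m.group(2)
    return "", line                    # ordinary paragraph text
```

Returning a pair rather than a boolean fits list items, where only the marker is preserved and the text after it is reflowed.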

Code regions (`#+BEGIN_SRC` … `#+END_SRC`)

The #+BEGIN_SRC header and #+END_SRC footer lines are structure
Body lines that are not comments pass through verbatim
Body comment lines are reflowed at sentence boundaries when [code.<lang>] defines line_comment and/or block_comment for the language
With --format-code, the optional formatter argv runs on the body, with a graceful fallback if the formatter fails
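Below is a minimal sketch of the comment-reflow rule. It assumes the language's [code.<lang>] entry supplies a single-line comment prefix, passed here as `line_comment` (for example "#"). The regex splitter is a naive stand-in for UAX #29, and `reflow_comments` is a hypothetical name. The sketch joins each run of consecutive comment lines and re-emits one sentence per comment line at the run's original indentation. Code lines are copied unchanged.

```python
import re

# Hypothetical sketch, not snapper's implementation. The splitter is a naive
# regex stand-in for UAX #29.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def reflow_comments(body: list[str], line_comment: str) -> list[str]:
    """One sentence per comment line; code lines pass through verbatim."""
    out: list[str] = []
    run: list[str] = []   # text of the current run of comment lines
    indent = ""

    def flush() -> None:
        if run:
            for sentence in SENTENCE_END.split(" ".join(run)):
                out.append(f"{indent}{line_comment} {sentence}")
            run.clear()

    for line in body:
        stripped = line.lstrip()
        if stripped.startswith(line_comment):
            if not run:
                indent = line[: len(line) - len(stripped)]   # keep indentation
            run.append(stripped[len(line_comment):].strip())
        else:
            flush()
            out.append(line)
    flush()
    return out
```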

Prose regions (reflowed)

Paragraph text
List item text (after the marker)

Inline tokens (kept atomic)

These tokens are never split across lines inside prose:

Links: [[url][description]]
Emphasis: *bold*, /italic/, _underline_, +strike+
Inline code: ~code~, =verbatim=
Inline export snippets: @@backend:value@@
URLs: https://... (trailing sentence punctuation is not absorbed into the URL)
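One way to picture the atomic-token rule is as a set of protected character ranges that the line breaker must avoid. The regexes in this sketch are simplified assumptions. They cover links, export snippets, ~code~/=verbatim= and URLs, and leave out emphasis for brevity. `atomic_spans` and `safe_break_points` are hypothetical names.

```python
import re

# Hypothetical patterns for a subset of the Org inline tokens listed above
# (emphasis is omitted for brevity).
ATOMIC = re.compile(
    r"\[\[[^\]]*\](?:\[[^\]]*\])?\]"     # [[url]] or [[url][description]]
    r"|@@[a-z-]+:.*?@@"                  # @@backend:value@@
    r"|~[^~\n]+~|=[^=\n]+="              # ~code~, =verbatim=
    r"|https?://\S*[^\s.,;:!?)\]]"       # URL without trailing punctuation
)


def atomic_spans(text: str) -> list[tuple[int, int]]:
    """Character ranges inside which no line break may be inserted."""
    return [m.span() for m in ATOMIC.finditer(text)]


def safe_break_points(text: str) -> list[int]:
    """Indices of spaces that lie outside every atomic span."""
    spans = atomic_spans(text)
    return [i for i, ch in enumerate(text)
            if ch == " " and not any(s <= i < e for s, e in spans)]
```

The URL pattern requires the last character to be something other than sentence punctuation. In "see https://example.com/a." this leaves the final period outside the URL.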

LaTeX (`--format latex`)

Structure regions (preserved)

Preamble (everything before \begin{document})
Non-prose environments: equation, align, figure, table, tabular, tikzpicture, their starred variants, and other non-code environments
Display math: \[...\]
Comment lines (starting with %)
\end{document}
Full sectioning command lines (\section{...}, \subsection{...}, and friends, including title text)
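Here is a rough sketch of applying these LaTeX rules line by line. It assumes at most one \begin or \end per line and uses a hard-coded environment list. The sectioning regex covers only a few commands. `latex_prose_mask` is a hypothetical name, and snapper's real classifier may work quite differently.

```python
import re

# Hypothetical sketch, not snapper's implementation: one \begin/\end per line.
NON_PROSE = {"equation", "align", "figure", "table", "tabular", "tikzpicture"}
BEGIN = re.compile(r"\\begin\{([A-Za-z]+)\*?\}")
END = re.compile(r"\\end\{([A-Za-z]+)\*?\}")
SECTION = re.compile(r"\\(?:(?:sub)*section|chapter|paragraph)\*?\{")


def latex_prose_mask(lines: list[str]) -> list[bool]:
    """True for each line that would be reflowed as prose."""
    mask: list[bool] = []
    in_body, depth, in_math = False, 0, False
    for line in lines:
        s = line.strip()
        prose = False
        m = BEGIN.match(s) or END.match(s)
        if not in_body:                        # preamble
            in_body = s.startswith(r"\begin{document}")
        elif s.startswith(r"\end{document}"):
            in_body = False
        elif m:                                # \begin/\end lines are structure
            if m.group(1) in NON_PROSE:
                depth += 1 if s.startswith(r"\begin") else -1
        elif in_math or s.startswith(r"\["):   # display math \[ ... \]
            in_math = not s.endswith(r"\]")
        elif depth == 0 and s and not s.startswith("%") and not SECTION.match(s):
            prose = True
        mask.append(prose)
    return mask
```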

Code regions (`minted`, `lstlisting`, `verbatim`)

\begin{...} / \end{...} lines are structure
When the language is known (from the minted language argument or the lstlisting language= option), the body follows the same comment-reflow and optional --format-code rules as in other formats
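The language lookup for LaTeX code environments might look like the following sketch. It lowercases the name on the assumption that [code.<lang>] keys are lowercase. That is an assumption, not documented behavior. `code_language` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical sketch, not snapper's implementation.
MINTED = re.compile(r"\\begin\{minted\}(?:\[[^\]]*\])?\{([^}]+)\}")
LSTLISTING = re.compile(r"\\begin\{lstlisting\}\[([^\]]*)\]")


def code_language(begin_line: str) -> Optional[str]:
    """Language of a LaTeX code environment, or None if unknown."""
    s = begin_line.strip()
    m = MINTED.match(s)
    if m:
        return m.group(1).strip().lower()   # lowercasing is an assumption
    m = LSTLISTING.match(s)
    if m:
        for option in m.group(1).split(","):
            key, _, value = option.partition("=")
            if key.strip() == "language" and value.strip():
                return value.strip().lower()
    return None   # verbatim, or no language given: nothing to look up
```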

Prose regions (reflowed)

Body text between structural elements

Markdown (`--format markdown`)

Structure regions (preserved)

Front matter (--- or +++ at the start of the file)
Full ATX heading lines (# through ######, including the title text)
Setext headings (the title line plus its ==== or ---- underline)
List item markers (-, *, +, 1.)
Pipe tables
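This sketch detects the two Markdown heading styles listed above. It assumes front matter has already been removed and that setext titles fit on a single line. The regexes follow CommonMark loosely. `heading_lines` is a hypothetical name.

```python
import re

# Hypothetical sketch; assumes front matter was already removed.
ATX = re.compile(r" {0,3}#{1,6}(?:\s|$)")
SETEXT_UNDERLINE = re.compile(r" {0,3}(?:=+|-+)\s*$")


def heading_lines(lines: list[str]) -> set[int]:
    """Indices of ATX heading lines and single-line setext headings."""
    found: set[int] = set()
    for i, line in enumerate(lines):
        if ATX.match(line):
            found.add(i)
        elif (i > 0 and SETEXT_UNDERLINE.match(line)
              and lines[i - 1].strip() and i - 1 not in found):
            found.update({i - 1, i})   # title line plus its underline
    return found
```

A --- line that follows a blank line is not treated as an underline here. In CommonMark, such a line is a thematic break.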

Code regions (fenced ``` / `~~~`)

Opening and closing fence lines are structure
In indented fences, reflowed comment lines keep the body's indentation
The language in the fence info string selects [code.<lang>]; with an unknown or missing language, the body passes through unchanged (--format-code does not apply without a formatter entry)
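A sketch of locating fenced code regions and reading the language from the info string. For each block it returns the indices of the opening and closing fence lines, with len(lines) standing in for an unclosed fence. It also returns the lowercased first word of the info string. A caller would look that name up in [code.<lang>] and leave the body untouched when no entry exists. `code_regions` is a hypothetical name, and the lowercasing is an assumption.

```python
import re
from typing import Optional

# Hypothetical sketch: an opening fence is 3+ backticks or tildes (up to 3
# spaces of indentation) followed by an optional info string.
OPEN_FENCE = re.compile(r"( {0,3})(`{3,}|~{3,})\s*([^\s`]*)")


def code_regions(lines: list[str]) -> list[tuple[int, int, Optional[str]]]:
    """(open_index, close_index, lang) for each fenced block."""
    regions = []
    i = 0
    while i < len(lines):
        m = OPEN_FENCE.match(lines[i])
        if not m:
            i += 1
            continue
        fence = m.group(2)
        lang = m.group(3).lower() or None      # first word of the info string
        close = i + 1
        while close < len(lines):
            s = lines[close].strip()
            # A closing fence uses the same character and is at least as long.
            if s.startswith(fence) and s.strip(fence[0]) == "":
                break
            close += 1
        regions.append((i, close, lang))       # close == len(lines) if unclosed
        i = close + 1
    return regions
```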

Prose regions (reflowed)

Paragraph text
List item text (after the marker)

reStructuredText (`--format rst`)

Structure regions (preserved)

Non-code directives (.. math::, .. image::, etc.) and their indented bodies
Literal blocks (text after :: with indented content)
Section titles and underlines (===, -----, etc.)
Field lists (:Author:, :Date:, etc.)
Comments (.. without a directive)
Grid and simple tables (lines starting with | or +)
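A sketch of literal-block detection under the simple rule listed above. A non-directive line ending in :: is followed by lines indented more deeply than it. The sketch ignores the required blank line after :: and the quoted-literal form. `literal_block_lines` is a hypothetical name.

```python
# Hypothetical sketch, not snapper's implementation.
def literal_block_lines(lines: list[str]) -> set[int]:
    """Indices of non-blank lines inside reST literal blocks."""
    def indent(s: str) -> int:
        return len(s) - len(s.lstrip())

    literal: set[int] = set()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.rstrip().endswith("::") and not line.lstrip().startswith(".."):
            j = i + 1
            # Blank lines and lines indented deeper than the :: line belong to the block.
            while j < len(lines) and (not lines[j].strip() or indent(lines[j]) > indent(line)):
                if lines[j].strip():
                    literal.add(j)
                j += 1
            i = j
        else:
            i += 1
    return literal
```

Lines starting with .. are skipped because a directive such as .. note:: also ends in :: but is handled by the directive rule.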

Code regions (`.. code-block:: LANG`)

The directive line and trailing blank lines are kept as structure
The indented body uses the LANG argument to select [code.<lang>] for comment reflow and optional --format-code
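A sketch of extracting the language and body of a code-block directive. It assumes options such as :linenos: sit directly under the directive line. Leading option lines and surrounding blank lines are dropped, which leaves only the code. `code_block` is a hypothetical name, and it returns None for lines that are not code-block directives.

```python
import re
from typing import Optional

# Hypothetical sketch, not snapper's implementation.
CODE_BLOCK = re.compile(r"(\s*)\.\. code-block::\s*(\S*)\s*$")
OPTION = re.compile(r"\s+:[\w-]+:")               # e.g. :linenos:


def code_block(lines: list[str], start: int) -> Optional[tuple[Optional[str], list[str]]]:
    """(lang, body) for a code-block directive at lines[start], else None."""
    m = CODE_BLOCK.match(lines[start])
    if not m:
        return None
    base = len(m.group(1))
    body: list[str] = []
    for line in lines[start + 1:]:
        if line.strip() and len(line) - len(line.lstrip()) <= base:
            break                                  # dedent ends the directive
        body.append(line)
    while body and OPTION.match(body[0]):
        body.pop(0)                                # directive options
    while body and not body[0].strip():
        body.pop(0)                                # blank line after the options
    while body and not body[-1].strip():
        body.pop()                                 # trailing blanks stay structure
    return m.group(2) or None, body
```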

Prose regions (reflowed)

Paragraph text between structural elements

Auto-detection

Extensions: .rst, .rest

Plain text (`--format plaintext`)

The whole file is treated as prose. Blank lines are preserved as paragraph separators.
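Plain-text mode can be sketched in a few lines. Each paragraph is unwrapped, each sentence goes on its own line, and blank lines are kept exactly as they are. The regex splitter is a naive stand-in for UAX #29 with none of the abbreviation or delimiter handling. `reflow_plaintext` is a hypothetical name.

```python
import re

# Hypothetical sketch; the regex is a naive stand-in for UAX #29.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def reflow_plaintext(text: str) -> str:
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        if para:
            joined = " ".join(" ".join(para).split())   # unwrap hard line breaks
            out.extend(SENTENCE_END.split(joined))        # one sentence per line
            para.clear()

    for line in text.splitlines():
        if line.strip():
            para.append(line)
        else:
            flush()
            out.append("")          # blank line kept as a paragraph separator
    flush()
    return "\n".join(out)
```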

Sentence detection

snapper uses Unicode UAX #29 sentence boundary detection as its baseline (or, optionally, --neural / nnsplit). It then applies the same post-pipeline in either case. Abbreviation merges come first, followed by a delimiter-span rejoin so that dialogue and balanced ()[]{} spans are not broken apart.

Delimiter-span policy (residual cases)

No semantic line break is inserted inside balanced ASCII, curly, or guillemet quotes, LaTeX ``…'' quotes, or ()[]{} pairs (see tests/sentence_delim_props.rs).
An unclosed " (or an opening ‘ without a matching ’) glues the rest of the paragraph together; snapper does not invent closers.
Nested ASCII quotes (for example, say "hi" now inside an outer quoted pair) remain ambiguous, because a straight quote can either open or close a span; prefer typographic quotes or escapes in the source.
Apostrophes in contractions (don't, it's) are not treated as dialogue openers.
Markdown fences (```) are not treated as LaTeX `` openers.
--neural runs the same abbreviation and span post-pipeline after the model proposes cuts. For English papers, the rules path may still be preferable when CI must be fully offline and deterministic.
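The rejoin step can be sketched as a pass over the baseline sentences. While a bracket or quote is still open, the next sentence is glued on. This simplified version tracks ()[]{}, curly double quotes and guillemets with a stack, and treats straight double quotes as a toggle. It ignores single quotes, which keeps contractions safe, and it ignores LaTeX ``…'' quotes. Both names are hypothetical. The sketch is not snapper's implementation.

```python
# Hypothetical sketch of the delimiter-span rejoin, not snapper's code.
PAIRS = {"(": ")", "[": "]", "{": "}", "“": "”", "«": "»"}
OPENER_FOR = {close: open_ for open_, close in PAIRS.items()}


def is_balanced(text: str) -> bool:
    stack: list[str] = []
    in_ascii_quote = False
    for ch in text:
        if ch == '"':
            in_ascii_quote = not in_ascii_quote    # straight quotes toggle
        elif ch in PAIRS:
            stack.append(ch)
        elif ch in OPENER_FOR and stack and stack[-1] == OPENER_FOR[ch]:
            stack.pop()
    return not stack and not in_ascii_quote


def rejoin_spans(sentences: list[str]) -> list[str]:
    out: list[str] = []
    buf = ""
    for sentence in sentences:
        buf = f"{buf} {sentence}" if buf else sentence
        if is_balanced(buf):                       # every span closed: emit
            out.append(buf)
            buf = ""
    if buf:
        out.append(buf)   # unclosed delimiter glues the rest; no closer invented
    return out
```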

snapper merges false splits caused by known abbreviations:

Titles

Mr., Mrs., Ms., Dr., Prof., Sr., Jr., St., Rev., Gen., etc.

Scientific

Fig., Figs., Eq., Eqs., Ref., Refs., Tab., Sec., Ch., Vol., No., Thm., Lem., Prop., Def., Cor., Rem., Ex.

Latin

e.g., i.e., et al., cf., etc., viz., ibid., ca., approx.

Single initials

A., B., C., … Z.

Date and time

Jan., Feb., …, Dec., Mon., Tue., …, Sun., a.m., p.m.
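The abbreviation merge can be sketched using only a subset of the lists above. When a candidate sentence ends with a known abbreviation or a single capital initial, the next candidate is appended to it. The merge here is unconditional, which is a simplifying assumption. snapper may use more context before merging. Both names are hypothetical.

```python
# Hypothetical sketch using a small subset of the abbreviation lists above.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "Prof.", "Fig.", "Eq.", "Sec.",
                 "e.g.", "i.e.", "cf.", "et al.", "Jan.", "a.m.", "p.m."}


def ends_with_abbreviation(text: str) -> bool:
    words = text.split()
    if not words:
        return False
    last = words[-1].lstrip("([{\"'“«")           # ignore an opening bracket/quote
    if len(last) == 2 and last[0].isupper() and last[1] == ".":
        return True                               # single initial such as J.
    return last in ABBREVIATIONS or " ".join(words[-2:]) in ABBREVIATIONS


def merge_abbreviations(sentences: list[str]) -> list[str]:
    out: list[str] = []
    for sentence in sentences:
        if out and ends_with_abbreviation(out[-1]):
            out[-1] = f"{out[-1]} {sentence}"       # false split: glue back
        else:
            out.append(sentence)
    return out
```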

Quoted and parenthesized punctuation

Sentence punctuation inside quotes or parentheses does not trigger a false split when the next word starts with a lowercase letter. For example, `He said "wow!" and left.` stays on one line. The !" is followed by the lowercase word "and", which signals a continuation rather than a new sentence. Handled patterns include !", ?", .", !), ?), .), and similar combinations with single quotes or brackets.
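This sketch illustrates the lowercase-continuation rule with a splitter. It skips a candidate boundary when the punctuation is followed by a closing quote or bracket and the next character is lowercase. It is a simplified regex illustration, not snapper's UAX #29-based splitter. `split_sentences` is a hypothetical name.

```python
import re

# Hypothetical sketch, not snapper's splitter: a candidate boundary is
# sentence punctuation, optional closing quotes/brackets, then whitespace.
CANDIDATE = re.compile(r"[.!?][\"')\]”’]*\s+")


def split_sentences(text: str) -> list[str]:
    out: list[str] = []
    start = 0
    for m in CANDIDATE.finditer(text):
        closers = m.group().rstrip()[1:]
        next_char = text[m.end():m.end() + 1]
        if closers and next_char.islower():
            continue          # e.g. "wow!" and left: a continuation, not a break
        out.append(text[start:m.end()].rstrip())
        start = m.end()
    if start < len(text):
        out.append(text[start:].strip())
    return out
```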
