A Minimal, Dependency‑Free, CommonMark‑Oriented Parser in Rust
A brief introduction to bh_cmark — a minimal, dependency‑free, CommonMark‑oriented parser that I have implemented in Rust.
🚀 The code for this post is in the following GitHub repository: bh_cmark.
In the ongoing
polyglot_pdf project,
there are two ad hoc Markdown “parsers” that support emphasis, headers, and a
simple image‑block construct.
The polyglot_pdf project relies primarily on the
Pango and
Cairo libraries, and indirectly on the
HarfBuzz text‑shaping engine.
As with the previous ad hoc Markdown “parsers,” the output of this
bh_cmark library is intended to be consumed by the text‑styling and
rendering APIs provided by Pango and Cairo.
The features supported by this parser are still limited to emphasis, headers, and image blocks. In the next iteration, URLs and possibly lists will be added, depending on how complex list handling becomes.
To generate HTML from the parser output, see the
tests/test_commonmark_spec.rs module.
In this iteration, HTML rendering was not one of the essential goals.
Parser behaviour, that is, the output, is validated against Dingus, the online demo of the CommonMark reference
implementation, at
https://spec.commonmark.org/dingus/.
The 146 Markdown samples defined in the
tests/common/test_constant.rs module were fed through
Dingus to verify the results of the corresponding parser tests.
💡 I initially validated results against https://markdownlivepreview.com/. I managed to match most of the tests, but for certain pathological cases the outputs diverged — and fixing one case would break another.
Consider a test case that required significant effort to match:
***Mount Fuji* / *富士山, ふじさ, Fujisan* / *Núi Phú Sỹ***
Markdown Live Preview produces:
<em><strong>Mount Fuji* / <em>富士山, ふじさ, Fujisan</em> / *Núi Phú Sỹ</strong></em>
Dingus renders it as:
<strong><em>Mount Fuji</em> / <em>富士山, ふじさ, Fujisan</em> / <em>Núi Phú Sỹ</em></strong>
For a second test case:
*****Hello*world****
Markdown Live Preview’s result:
*<em><strong><em>Hello</em>world</strong></em>*
Dingus’s result:
**<em><strong>Hello<em>world</em></strong></em>
I cannot explain Markdown Live Preview’s result. Eventually, I realised that
Markdown Live Preview does not strictly follow the
CommonMark specification.
I removed all heuristic code and adhered strictly to CommonMark’s deterministic rules.
It was a steep learning curve.
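One of those deterministic rules is enough to account for the Dingus output in the second test case: the "multiple of 3" restriction in rules 9 and 10 of the CommonMark emphasis section. The sketch below is not taken from bh_cmark. DelimRun and can_pair are hypothetical names used only for illustration, and the flanking flags are assumed to be filled in by the caller rather than computed from the surrounding text.

```rust
/// A run of `*` delimiters, reduced to what the pairing check needs.
/// Hypothetical type for illustration; not part of bh_cmark.
#[derive(Debug, Clone, Copy)]
struct DelimRun {
    /// Number of delimiter characters in the original run.
    len: usize,
    /// True if the run is left-flanking, so it may open emphasis.
    can_open: bool,
    /// True if the run is right-flanking, so it may close emphasis.
    can_close: bool,
}

/// Returns true if `opener` and `closer` may be paired.
///
/// Both runs are assumed to use the same delimiter character.
/// If either run can both open and close, the pair is rejected when the
/// sum of the two lengths is a multiple of 3, unless both lengths are
/// themselves multiples of 3.
fn can_pair(opener: DelimRun, closer: DelimRun) -> bool {
    if !opener.can_open || !closer.can_close {
        return false;
    }
    let either_is_both = (opener.can_open && opener.can_close)
        || (closer.can_open && closer.can_close);
    if either_is_both && (opener.len + closer.len) % 3 == 0 {
        return opener.len % 3 == 0 && closer.len % 3 == 0;
    }
    true
}
```

In *****Hello*world**** the three runs have lengths 5, 1, and 4. The lone middle asterisk can both open and close, so it cannot close against the leading run (5 + 1 is a multiple of 3), but it can open against the trailing run (1 + 4 is not). That is why world ends up in its own <em>, and why two leading asterisks are left over as literal text in the Dingus result.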
There are nine test modules. See the test suite area for the recommended order of reading or running the tests. Most of the modules are fairly long, but this is mainly due to the setup of test data. The actual test code is short and concise.
We are not going to discuss the code in detail here; the inline documentation is sufficient to understand the implementation. The Test Suite also demonstrates how the parser is used. The provided example shows concisely how to instantiate the scanner and parser, and how to interpret the parser output. You can extend this example to process Markdown files on disk.
To build the parser example:
cargo build --example example_01
To run the parser example:
cargo run --example example_01
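As a starting point for extending the example to Markdown files on disk, here is a minimal sketch that uses only the standard library. The function name parse_markdown_file is hypothetical, and the parse closure is a stand-in for whatever scanner and parser calls example_01 makes; the actual bh_cmark API is not assumed here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a Markdown file and hands its contents to a caller-supplied parser.
/// The closure stands in for the scanner and parser calls from the example;
/// it is a placeholder, not the bh_cmark API.
fn parse_markdown_file<T, F>(path: &Path, parse: F) -> io::Result<T>
where
    F: FnOnce(&str) -> T,
{
    // read_to_string returns an error for a missing file or invalid UTF-8.
    let source = fs::read_to_string(path)?;
    Ok(parse(&source))
}
```

Passing the parser as a closure keeps the file handling separate from the parsing, so the same helper can be reused in tests with a trivial closure such as one that counts lines.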
I have already replaced the two previously mentioned
ad hoc “parsers” with this bh_cmark library.
It works great, and the code changes were minimal. I will describe this in the next
polyglot_pdf article.
For now, we conclude this post here. If you are interested in learning about Markdown parsing, I hope you find this post helpful and encouraging. It took me nearly three months to implement these features. I had some failures along the way, but I learned a great deal as well.
Thanks for reading! And as always—stay curious, stay safe 🦊
✿✿✿
Feature image sources:
- https://www.omgubuntu.co.uk/2024/03/ubuntu-24-04-wallpaper
- https://in.pinterest.com/pin/337277459600111737/
- https://medium.com/analytics-vidhya/rust-adventures-from-java-class-to-rust-struct-1d63b66890cf/
- https://www.pngitem.com/download/ibmJoR_rust-language-hd-png-download/
- https://commons.wikimedia.org/wiki/File:Markdown-mark.svg
