This post replaces the two ad hoc Markdown “parsers” used in the last post with the proper, minimal, dependency‑free, CommonMark‑oriented parser described in the bh_cmark post. Since the new parser supports thematic breaks, the PDF rendering has been extended to draw horizontal lines, a new feature implemented in this post.

🦀 Index of the Complete Series.

Rust: PDFs — Pango and Cairo Layout — Integrating the bh_cmark Parser

🚀 The code for this post is in the following GitHub repository: pdf_10_bh_cmark.

Repository Layout

💡 Please note: on both Windows and Ubuntu, I’m running Rust version rustc 1.90.0 (1159e78c4 2025-09-14).

This is once again a one‑off project — I don’t plan to update it in future development. I want to keep a log of progress exactly as it occurred. Future code may copy this and make changes to it. I’ve placed the project under the pdf_10_bh_cmark directory. The structure is:

/rust/bh_cmark/
│
/rust/polyglot_pdf/pdf_10_bh_cmark/
├── Cargo.toml
├── set_env.bat
├── config
│   └── config.toml
├── img
│   ├── Readme.md
│   └── unscalable.png
├── src
│   ├── config.rs
│   ├── document.rs
│   ├── font_utils.rs
│   ├── image_layout.rs
│   ├── main.rs
│   ├── page_geometry.rs
│   └── text_layout.rs
├── text
│   ├── essay.txt
│   └── img
│       ├── ho-chi-minh-fontainebleau-19460914.png
│       └── ho-chi-minh-marius-moutet-19460914.png
└── .vscode
    └── launch.json

The screenshot below shows the above two directories under /home/behai/rust/:

Please take note of the following about the environment.

⓵ 💡 The code requires the Pango, HarfBuzz, Cairo, etc. libraries. 🐧 On Ubuntu, all required libraries are globally recognised. 🪟 On Windows, I haven’t added the paths for the libraries’ DLLs to the PATH environment variable. In each new Windows terminal session, I run the following once:

set PATH=C:\PF\harfbuzz\dist\bin\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
set PATH=C:\PF\pango\dist\bin;C:\PF\cairo-1.18.4\dist\bin;C:\PF\fribidi\dist\bin;%PATH%

Alternatively, you can simply run set_env.bat. After that, cargo run works as expected.

⓶ 💡 In the fifth article, we discussed the PKG_CONFIG_PATH user environment variable. This setting applies to all later articles. I did not mention it again from the sixth article onward. In the set_env.bat above, I include setting this variable so that we don’t forget it and avoid potential surprises.

⓷ The bh_cmark parser was introduced in the bh_cmark post. Its usage in this project is briefly described in the section The bh_cmark Local Crate.

⓸ As before, other modules are copied from the immediately preceding post. src/page_geometry.rs and src/text_layout.rs remain unchanged. Most other modules have been refactored. We discuss these in the sections that follow.

The pdf_10_bh_cmark/text/essay.txt and Associated Image Files pdf_10_bh_cmark/text/img/*.*

They are copied from the last post. The following modifications were made.

⓵ Fixed a title to include a space between the last # and the text.

⓶ Removed a duplicate paragraph (a copy‑and‑paste error).

⓷ Replaced the two occurrences of =================================================== with ____ (4 underscores) and ---- (4 dashes or hyphens).
These are Thematic breaks, which create horizontal lines in the PDF.

⓸ Toward the end of the document, added a new Thematic break using the 4‑asterisk format:

### Tác Giả: Hứa Hoành.
****

🦀 Note that there is no blank line between the two lines. This is to demonstrate how the new block‑spacing configuration works.
This will be shown toward the end of this post.
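
For reference, CommonMark recognises a thematic break as a line with up to three spaces of indentation followed by three or more matching -, _ or * characters, optionally interspersed with spaces or tabs. The sketch below is a minimal line‑level check written purely for illustration. The is_thematic_break name is hypothetical, this is not how bh_cmark implements the rule, and the function assumes the line has already had its trailing newline removed. It also ignores context, such as a ---- line directly under paragraph text being read as a heading underline instead.

```rust
/// Returns true when `line` looks like a CommonMark thematic break.
/// Hypothetical helper for illustration; not part of bh_cmark.
pub fn is_thematic_break(line: &str) -> bool {
    // Up to three leading spaces are allowed; four or more start a code block.
    let indent = line.chars().take_while(|c| *c == ' ').count();
    if indent > 3 {
        return false;
    }

    let mut chars = line[indent..].chars();

    // The first non-indent character fixes the marker for the whole line.
    let marker = match chars.next() {
        Some(c @ ('-' | '_' | '*')) => c,
        _ => return false,
    };

    let mut count = 1;
    for c in chars {
        if c == marker {
            count += 1;
        } else if c != ' ' && c != '\t' {
            // Any other character, including a different marker, disqualifies the line.
            return false;
        }
    }

    count >= 3
}
```

With this check, the three forms used in essay.txt (____, ---- and ****) all qualify, while a line of = characters does not.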

The bh_cmark Local Crate

The code for the bh_cmark local crate is in the following GitHub repository: bh_cmark. It can be cloned using the command:

$ git clone https://github.com/behai-nguyen/bh_cmark

Based on the project directory structure shown in the Repository Layout section, Cargo.toml imports it as:

[dependencies]
bh_cmark = { path = "../../bh_cmark" }
...

Adjust the path accordingly for your directory structure.

Refactored Modules

We discuss the refactored modules in the sections that follow.

The pdf_10_bh_cmark/src/document.rs Module

It was first introduced in the eighth post, and has been in use ever since. In this iteration, it has also been refactored quite significantly.

● Related Markdown Abstract Syntax Tree (AST) entities were moved to the bh_cmark/src/ast.rs module:

● Other refactorings include:

Configuration Updates — the Configuration pdf_10_bh_cmark/config/config.toml File and the pdf_10_bh_cmark/src/config.rs Module

There were a few additions and one removal, but overall the changes were straightforward.

The pdf_10_bh_cmark/config/config.toml File

● Added the sections [block_spacing.heading], [block_spacing.paragraph], [block_spacing.image], [block_spacing.thematic], and [horizontal_break].

The [horizontal_break] section enables external configuration of the horizontal line’s width and colour.

The [block_spacing.*] sections enable external configuration of the vertical spacing above and below block elements. In previous iterations, the ad hoc “parsers” preserved newlines in the final text, and those newlines naturally became blank lines in the PDF. A CommonMark parser, however, treats blank lines only as block separators and does not carry them into the parsed output. Hence the introduction of [block_spacing.*], a more flexible and precise approach that provides far greater control than relying on raw newlines.

[block_spacing.image].after renders [layout].image_block_spacing obsolete.
The entire [layout] section has been removed.

The pdf_10_bh_cmark/src/config.rs Module

Changes in this module correspond directly to the updates in config.toml described above.
The modifications are straightforward and should be self‑explanatory.
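
As a rough picture of how the new configuration sections might map onto Rust types, here is a minimal sketch. All type, field and method names below (Spacing, BlockSpacing, HorizontalBreak, line_width, colour, uniform, total) are assumptions made for illustration and are not taken from config.rs. Only the before and after keys, and the fact that the horizontal line has a configurable width and colour, come from the text above. Deserialisation from config.toml is deliberately left out.

```rust
/// Vertical spacing applied above and below one kind of block.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Spacing {
    pub before: f64,
    pub after: f64,
}

impl Spacing {
    /// Combined extra vertical space that this block kind adds.
    pub fn total(&self) -> f64 {
        self.before + self.after
    }
}

/// One entry per [block_spacing.*] section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockSpacing {
    pub heading: Spacing,
    pub paragraph: Spacing,
    pub image: Spacing,
    pub thematic: Spacing,
}

impl BlockSpacing {
    /// Same spacing for every block kind; a convenient starting point.
    pub fn uniform(before: f64, after: f64) -> Self {
        let s = Spacing { before, after };
        BlockSpacing { heading: s, paragraph: s, image: s, thematic: s }
    }
}

/// Stands in for [horizontal_break]; the field names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HorizontalBreak {
    pub line_width: f64,
    /// Red, green and blue components, each in the range 0.0 to 1.0.
    pub colour: [f64; 3],
}
```

Keeping a separate before and after value per block kind is what allows, for example, [block_spacing.thematic] to be tuned without affecting paragraphs or headings.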

The pdf_10_bh_cmark/src/image_layout.rs Module

This module was first introduced in the eleventh post, and was formalised into APIs in the twelfth post.

In the twelfth post, the vertical space after an image block was configurable via [layout].image_block_spacing. This is now superseded by [block_spacing.image].after, as described previously. The underlying vertical‑spacing calculation remains unchanged.

🦀 A new [block_spacing.image].before setting has also been added.
The scaling logic and vertical‑space calculations were updated to incorporate this value.
This was a minor refactoring, and the changes are documented throughout the code.
We will not discuss it in detail here.

The unit test mod has also been updated to reflect the changes in the implementation.

The pdf_10_bh_cmark/src/font_utils.rs Module

The only change here is an import update. As discussed previously, AST‑related entities were moved to the bh_cmark library, so the import is now:
use bh_cmark::ast::{Span, SpanStyle};

The pdf_10_bh_cmark/src/main.rs Module

This module brings everything together, just as in previous iterations, and is therefore the most complicated one. Let’s discuss the refactorings carried out on it.

⓵ The PreparedBlock enum: at the definition level, its variants were refactored in the same way as those of PositionedBlock.
The pipeline first creates a vector of PreparedBlock, and from this vector constructs a vector of PositionedBlock, whose blocks are then rendered onto the PDF.

⓶ The next refactored item is the parse_blocks_from_file() function — it is now much simpler: it reads the entire input Markdown text file into a string and passes this string to the bh_cmark parser. Its API remains essentially unchanged.

⓷ The next function that has been extended is prepare_blocks() — it now includes the newly added AstBlock::Thematic block element.
For this variant, the total height of the horizontal line is the sum of the stroke width and the configured preceding and succeeding vertical spacing.
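
The height calculation for a thematic break can be sketched as follows. This is a minimal illustration under stated assumptions: the ThematicGeometry type and the thematic_geometry function are hypothetical names, not those used in main.rs. The sketch also returns the vertical offset at which the line would be drawn, relying on the fact that Cairo centres a stroke on its path.

```rust
/// Vertical geometry of one thematic break, relative to the top of its block.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ThematicGeometry {
    /// Total vertical space the block occupies.
    pub height: f64,
    /// Offset of the line's centre from the top of the block.
    pub line_y: f64,
}

/// `before` and `after` stand for the [block_spacing.thematic] values;
/// `stroke_width` stands for the configured line width.
pub fn thematic_geometry(stroke_width: f64, before: f64, after: f64) -> ThematicGeometry {
    ThematicGeometry {
        // Spacing above, the stroke itself, then spacing below.
        height: before + stroke_width + after,
        // A stroke is centred on its path, so the path sits half a stroke
        // width below the end of the leading space.
        line_y: before + stroke_width / 2.0,
    }
}
```

For example, a stroke width of 1.0 with before = 10.0 and after = 50.0 gives a block height of 61.0, with the line centred 10.5 below the top of the block.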

⓸ Three new helper functions have been added:

  1. header()
  2. paragraph()
  3. text_block() — this function is refactored from the PreparedBlock::Text logic in the measure_block() function from the last post.

⓹ The next function in the pipeline is measure_block() — the changes are: first, block‑height calculations now include the [block_spacing.*] configuration values as described; second, the function has been extended to handle the new block variants.

● For the PreparedBlock::Header and PreparedBlock::Paragraph variants, the function reads the corresponding block‑spacing configuration values and passes them to the new text_block() helper function to perform the calculations. 🦀 Since both Header and Paragraph may span multiple lines, [block_spacing.*].before applies to the first line, and [block_spacing.*].after applies to the last line. This logic is easy to follow in the text_block() function.

● The PreparedBlock::Image variant remains unchanged, since all calculations are performed by the pdf_10_bh_cmark/src/image_layout.rs module, as previously discussed.

● The calculation for the new PreparedBlock::Thematic variant is self‑explanatory.
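
The first‑line and last‑line rule described above for Header and Paragraph blocks can be sketched as follows. This is only an illustration of the idea, not the text_block() implementation: the function names are hypothetical, and line heights are passed in as plain numbers rather than being measured with Pango.

```rust
/// Adds `before` to the first line and `after` to the last line of a block.
/// A single-line block receives both. Hypothetical helper for illustration.
pub fn apply_block_spacing(line_heights: &[f64], before: f64, after: f64) -> Vec<f64> {
    let mut out = line_heights.to_vec();
    if let Some(first) = out.first_mut() {
        *first += before;
    }
    if let Some(last) = out.last_mut() {
        *last += after;
    }
    out
}

/// Total height of the block, spacing included.
pub fn block_height(line_heights: &[f64], before: f64, after: f64) -> f64 {
    apply_block_spacing(line_heights, before, after).iter().sum()
}
```

With three lines of height 12.0, before = 4.0 and after = 6.0, the per‑line heights become 16.0, 12.0 and 18.0, for a total of 46.0. The lines in between are unaffected, however many there are.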

⓺ The final function in the pipeline is output_positioned_block() — it follows the same refactoring pattern.

● At this point, both the Header and Paragraph variants are effectively identical: they are simply blocks of text. The text() closure is refactored from the PositionedBlock::Text and PreparedBlock::Text logic in the output_positioned_block() function from the last post.
Header and Paragraph simply call this closure with identical parameters to render text using the pre‑computed geometry.

● For the new Thematic arm, Cairo methods are called to draw the horizontal line, whose properties are defined by the external configuration attributes.

Since the colour of the horizontal line may differ from the rest of the PDF document, the current drawing context must be saved, the colour applied, and then the context restored for subsequent blocks. This process was first discussed in the tenth post.

● The existing Image arm remains unchanged.
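
The overall flow described in the items above, from measured blocks to positioned blocks, can be sketched in a heavily simplified form. Everything here is an assumption made for illustration: the real PreparedBlock and PositionedBlock are enums carrying layout data, whereas this sketch flattens them into structs with a Kind tag and a pre‑computed height. It also moves a whole block to the next page when it does not fit, which may differ from how the project paginates long paragraphs.

```rust
/// The block kinds named in this post.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Header,
    Paragraph,
    Image,
    Thematic,
}

/// A block whose total height (spacing included) is already known.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PreparedBlock {
    pub kind: Kind,
    pub height: f64,
}

/// A block with its final page index and vertical offset on that page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PositionedBlock {
    pub kind: Kind,
    pub page: usize,
    pub y: f64,
}

/// Stacks blocks top to bottom, starting a new page when one does not fit.
pub fn position_blocks(blocks: &[PreparedBlock], page_height: f64) -> Vec<PositionedBlock> {
    let mut out = Vec::with_capacity(blocks.len());
    let mut page = 0;
    let mut y = 0.0;

    for block in blocks {
        // Never break before the first block on a page, even if it is too tall.
        if y > 0.0 && y + block.height > page_height {
            page += 1;
            y = 0.0;
        }
        out.push(PositionedBlock { kind: block.kind, page, y });
        y += block.height;
    }

    out
}
```

A renderer would then walk the resulting vector and match on each block's kind, which corresponds to the role output_positioned_block() plays in this project.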

With the removal of a duplicate paragraph from the input file, and the introduction of the [block_spacing.*] configuration, the total number of pages is now 35 (thirty‑five), one page fewer than in the previous iteration.

As previously mentioned, the screenshot below demonstrates the following [block_spacing.thematic] setting:

[block_spacing.thematic]
before = 10.0
after = 10.0

And this next screenshot shows the same page with the following [block_spacing.thematic] setting:

[block_spacing.thematic]
before = 10.0
after = 50.0

What’s Next

That wraps up this article… Aside from the cosmetic Thematic break, no major feature has been added compared to the last iteration. Lists and URLs are still not available. Were the last two ad hoc “parsers” a waste of time? I tend not to think so: they were part of the learning curve. This proper Markdown parser is solid progress for the project. It is now ready to be extended to support lists and URLs, and I am looking forward to that…

Thanks for reading! I hope this post helps others who are looking to deepen their understanding of PDF technology. As always—stay curious, stay safe 🦊

✿✿✿
