Rust: PDFs — Pango and Cairo Layout — Integrating the bh_cmark Parser
Replacing the two ad hoc Markdown “parsers” used in the last post with the proper, minimal, dependency‑free, CommonMark‑oriented parser described in the bh_cmark post. Since the new parser supports thematic breaks, the PDF rendering has been extended to support horizontal line drawing — a new feature implemented in this post.
🦀 Index of the Complete Series.
🚀 The code for this post is in the following GitHub repository: pdf_10_bh_cmark.
💡 Please note: on both Windows and Ubuntu, I’m running Rust version
rustc 1.90.0 (1159e78c4 2025-09-14).
This is once again a one‑off project — I don’t plan to update it in future development. I want to keep a log of progress exactly as it occurred. Future code may copy this and make changes to it. I’ve placed the project under the pdf_10_bh_cmark directory. The structure is:
/rust/bh_cmark/
│
/rust/polyglot_pdf/pdf_10_bh_cmark/
├── Cargo.toml
├── set_env.bat
├── config
│ └── config.toml
├── img
│ ├── Readme.md
│ └── unscalable.png
├── src
│ ├── config.rs
│ ├── document.rs
│ ├── font_utils.rs
│ ├── image_layout.rs
│ ├── main.rs
│ ├── page_geometry.rs
│ └── text_layout.rs
├── text
│ ├── essay.txt
│ └── img
│ ├── ho-chi-minh-fontainebleau-19460914.png
│ └── ho-chi-minh-marius-moutet-19460914.png
└── .vscode
└── launch.json
The screenshot below shows the above two directories under /home/behai/rust/:

Please take note of the following about the environment.
⓵ 💡 The code requires the Pango, HarfBuzz, Cairo,
etc. libraries. 🐧 On Ubuntu, all required libraries are globally recognised. 🪟 On Windows,
I haven’t added the paths for the libraries’ DLLs to the PATH environment
variable. In each new Windows terminal session, I run the following once:
set PATH=C:\PF\harfbuzz\dist\bin\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
set PATH=C:\PF\pango\dist\bin;C:\PF\cairo-1.18.4\dist\bin;C:\PF\fribidi\dist\bin;%PATH%
Alternatively, you can simply run
set_env.bat.
After that, cargo run works as expected.
⓶ 💡 In the fifth article, we discussed the
PKG_CONFIG_PATH user environment variable. This setting applies to all
later articles. I did not mention it again from the sixth article onward. In the
set_env.bat above, I include setting this variable so that we don’t forget
it and avoid potential surprises.
⓷ The bh_cmark crate was introduced in the bh_cmark post. Its usage in
this project is described briefly in a later section.
⓸ As before, other modules are copied from the immediately
preceding post. src/page_geometry.rs
and src/text_layout.rs remain unchanged.
Most other modules have been refactored. We discuss these in the
sections that follow.
❷ The pdf_10_bh_cmark/text/essay.txt and Associated Image Files pdf_10_bh_cmark/text/img/*.*
They are copied from the last post. The following modifications were made.
⓵ Fixed a title to include a space between the last # and the text.
⓶ Removed a duplicate paragraph (a copy‑and‑paste error).
⓷ Replaced the two occurrences of
===================================================
with ____ (4 underscores) and ---- (4 dashes or hyphens).
These are thematic breaks, which are rendered as horizontal lines in the PDF.
⓸ Toward the end of the document, added a new thematic break using
the 4‑asterisk format:
### Tác Giả: Hứa Hoành.
****
🦀 Note that there is no blank line between the two lines. This is deliberate:
it demonstrates how the new block‑spacing configuration works, as shown
toward the end of this post.
The code for the bh_cmark local crate is in the following GitHub repository:
bh_cmark.
It can be cloned using the command:
$ git clone https://github.com/behai-nguyen/bh_cmark
Based on the project directory structure shown in the
Repository Layout section,
Cargo.toml declares it as a path dependency:
[dependencies]
bh_cmark = { path = "../../bh_cmark" }
...
Adjust the path accordingly for your directory structure.
We discuss the refactored modules in the sections that follow.
⓵ The pdf_10_bh_cmark/src/document.rs Module
It was first introduced in the eighth post, and has been in use ever since. In this iteration, it has also been refactored quite significantly.
● Related Markdown Abstract Syntax Tree (AST) entities were moved to the
bh_cmark/src/ast.rs module:
- pub const MAX_HEADER_LEVEL: usize = 6
- pub enum SpanStyle
- pub struct Span
- pub enum Block was renamed to pub enum AstBlock. 🦀 Please also note the following additions:
  - The payload for Header, Paragraph, and Image was refactored to use the new pub struct InlineContent instead of String and Vec<Span>.
  - Added Thematic and Code enum variants.
● Other refactorings include:
- In ImageBlockMeasuredInfo, spacing_after: f64 replaced image_block_spacing: f64 due to a configuration refactoring.
- For PositionedBlock, the following changes were made:
  - Added the PositionedBlock::Header variant and its creation function PositionedBlock::header(). The PDF rendering code needs to recognise header blocks to apply specific attributes; see configuration block spacing.
  - Added the PositionedBlock::Thematic variant and its creation function PositionedBlock::thematic(). The PDF rendering code needs to recognise thematic breaks in order to draw horizontal lines.
  - PositionedBlock::Text was renamed to PositionedBlock::Paragraph, and the associated creation function PositionedBlock::text() was renamed to PositionedBlock::paragraph().
  - Both PositionedBlock::block_index() and PositionedBlock::page() were refactored to account for the new variants.
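To make the PositionedBlock changes more concrete, here is a minimal sketch of what such an enum could look like. The variant and function names come from the list above; the fields (block_index, page, y) are my assumptions for illustration and are not taken from the actual document.rs.

```rust
/// Hypothetical, simplified stand-in for the project's `PositionedBlock`.
/// Variant names follow the post; every field is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum PositionedBlock {
    Header { block_index: usize, page: usize, y: f64 },
    Paragraph { block_index: usize, page: usize, y: f64 },
    Image { block_index: usize, page: usize, y: f64 },
    Thematic { block_index: usize, page: usize, y: f64 },
}

impl PositionedBlock {
    /// Creation function for a header block.
    pub fn header(block_index: usize, page: usize, y: f64) -> Self {
        PositionedBlock::Header { block_index, page, y }
    }

    /// Creation function for a paragraph block (formerly `text()`).
    pub fn paragraph(block_index: usize, page: usize, y: f64) -> Self {
        PositionedBlock::Paragraph { block_index, page, y }
    }

    /// Creation function for a thematic break.
    pub fn thematic(block_index: usize, page: usize, y: f64) -> Self {
        PositionedBlock::Thematic { block_index, page, y }
    }

    /// Index of the source block, regardless of variant.
    pub fn block_index(&self) -> usize {
        match self {
            PositionedBlock::Header { block_index, .. }
            | PositionedBlock::Paragraph { block_index, .. }
            | PositionedBlock::Image { block_index, .. }
            | PositionedBlock::Thematic { block_index, .. } => *block_index,
        }
    }

    /// Page the block was placed on, regardless of variant.
    pub fn page(&self) -> usize {
        match self {
            PositionedBlock::Header { page, .. }
            | PositionedBlock::Paragraph { page, .. }
            | PositionedBlock::Image { page, .. }
            | PositionedBlock::Thematic { page, .. } => *page,
        }
    }
}
```

The or-pattern in each accessor is why adding a variant forces these two functions to change: the match must stay exhaustive.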
⓶ Configuration Updates: the Configuration pdf_10_bh_cmark/config/config.toml File and the pdf_10_bh_cmark/src/config.rs Module
There were a few additions and one removal, but overall the changes were straightforward.
⒈ The pdf_10_bh_cmark/config/config.toml File
● Added the sections [block_spacing.heading],
[block_spacing.paragraph], [block_spacing.image],
[block_spacing.thematic], and [horizontal_break].
The [horizontal_break] section enables external configuration of the
horizontal line’s width and colour.
The [block_spacing.*] sections enable external configuration of the
vertical spacing above and below block elements. In previous iterations, the ad hoc
“parsers” preserved newlines in the final text, and those newlines naturally became
blank lines in the PDF. However, CommonMark treats blank lines between blocks
only as separators and does not carry them into the parsed output.
Hence the introduction of [block_spacing.*] — a more flexible and precise
approach that provides far greater control than relying on raw newlines.
● [block_spacing.image].after renders
[layout].image_block_spacing obsolete.
The entire [layout] section has been removed.
⒉ The pdf_10_bh_cmark/src/config.rs Module:
Changes in this module correspond directly to the updates in
config.toml described above.
The modifications are straightforward and should be self‑explanatory.
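For readers who have not opened the code, the following is a rough sketch of how the new sections could be modelled in Rust. It is not the actual config.rs: the struct names, the BlockKind enum, the for_kind() helper, and the HorizontalBreak field names are all assumptions of mine. Only the before and after keys and the four section names come from the configuration described above.

```rust
/// Space above and below one kind of block, in layout units.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Spacing {
    pub before: f64,
    pub after: f64,
}

/// One field per `[block_spacing.*]` table named in the post.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockSpacing {
    pub heading: Spacing,
    pub paragraph: Spacing,
    pub image: Spacing,
    pub thematic: Spacing,
}

/// Stand-in for `[horizontal_break]`; both field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HorizontalBreak {
    pub line_width: f64,
    pub colour: (f64, f64, f64),
}

/// Hypothetical selector used to look up the spacing for a block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockKind {
    Heading,
    Paragraph,
    Image,
    Thematic,
}

impl BlockSpacing {
    /// Return the spacing configured for the given kind of block.
    pub fn for_kind(&self, kind: BlockKind) -> Spacing {
        match kind {
            BlockKind::Heading => self.heading,
            BlockKind::Paragraph => self.paragraph,
            BlockKind::Image => self.image,
            BlockKind::Thematic => self.thematic,
        }
    }
}
```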
⓷ The pdf_10_bh_cmark/src/image_layout.rs Module
This module was first introduced in the eleventh post, and was formalised into APIs in the twelfth post.
In the twelfth post, the vertical space after an image block was configurable via
[layout].image_block_spacing. This is now superseded by
[block_spacing.image].after, as described
previously.
The underlying vertical‑spacing calculation remains unchanged.
🦀 A new [block_spacing.image].before setting has also been added.
The scaling logic and vertical‑space calculations were updated to incorporate this value.
This was a minor refactoring, and the changes are documented throughout the code.
We will not discuss it in detail here.
The unit test module has also been updated to reflect the changes in the
implementation.
⓸ The pdf_10_bh_cmark/src/font_utils.rs Module
Only a single use import changed. As discussed
previously, AST‑related entities were moved
to the bh_cmark
library, so the import is now:
use bh_cmark::ast::{Span, SpanStyle};
❺ The pdf_10_bh_cmark/src/main.rs Module
This module brings everything together, just as in previous iterations, and is therefore the most complicated one. Let’s discuss the refactorings carried out on it.
⓵ The PreparedBlock enum: at the definition level, its variants were
refactored to mirror those of PositionedBlock.
The pipeline first creates a vector of PreparedBlock, and from this vector
constructs a vector of PositionedBlock, whose blocks are then rendered
onto the PDF.
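The step from measured blocks to positioned blocks can be pictured with the deliberately simplified sketch below. It assumes each block has a single measured height and is never split across pages, which is not how the real pipeline treats multi-line text. The Placed struct and the place_blocks() function are hypothetical names, not part of the project.

```rust
/// Hypothetical result of positioning one block.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Placed {
    pub block_index: usize,
    pub page: usize,
    pub y: f64,
}

/// Stack blocks top to bottom; start a new page when a block no longer fits.
/// `heights` holds one measured height per block, `page_height` the usable
/// height of a page. Blocks are treated as unsplittable in this sketch.
pub fn place_blocks(heights: &[f64], page_height: f64) -> Vec<Placed> {
    let mut placed = Vec::with_capacity(heights.len());
    let mut page = 1;
    let mut y = 0.0;
    for (block_index, &height) in heights.iter().enumerate() {
        // Only break the page if something is already on it, so an
        // oversized block still gets placed instead of looping forever.
        if y > 0.0 && y + height > page_height {
            page += 1;
            y = 0.0;
        }
        placed.push(Placed { block_index, page, y });
        y += height;
    }
    placed
}
```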
⓶ The next refactored item is the
parse_blocks_from_file() function — it is now much simpler: it reads
the entire input Markdown text file into a string and passes this string to the
bh_cmark
parser. Its API remains essentially unchanged.
⓷ The next function that has been extended is
prepare_blocks() — it now includes the newly added
AstBlock::Thematic block element.
For this variant, the total height of the horizontal line is the sum of the stroke
width and the configured preceding and succeeding vertical spacing.
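The arithmetic is small enough to show in full. The sketch below uses hypothetical function names. The second function additionally assumes that the line is stroked along its centre, which is how Cairo strokes a path, so the path sits half a stroke width below the "before" spacing. Whether the project positions the line exactly this way is an assumption on my part.

```rust
/// Total vertical space taken by a thematic break:
/// spacing before + stroke width + spacing after.
pub fn thematic_block_height(stroke_width: f64, before: f64, after: f64) -> f64 {
    before + stroke_width + after
}

/// Y coordinate of the path to stroke, given the top of the block.
/// Assumes the stroke is centred on the path, so the path is placed
/// half a stroke width below the end of the `before` spacing.
pub fn thematic_line_y(block_top: f64, stroke_width: f64, before: f64) -> f64 {
    block_top + before + stroke_width / 2.0
}
```

With a stroke width of 1.0, before = 10.0 and after = 50.0, the block occupies 61.0 units, and a block starting at y = 100.0 strokes its path at y = 110.5.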
⓸ Three new helper functions have been added:
- header()
- paragraph()
- text_block(): this function is refactored from the PreparedBlock::Text logic in the measure_block() function from the last post.
⓹ The next function in the pipeline is
measure_block() — the changes are: first, block‑height
calculations now include the [block_spacing.*] configuration values as
described; second, the function has been extended to handle
the new block variants.
● For the PreparedBlock::Header and PreparedBlock::Paragraph
variants, the function reads the corresponding block‑spacing configuration values and
passes them to the new
text_block() helper function to perform
the calculations. 🦀 Since both Header and Paragraph may span
multiple lines, [block_spacing.*].before applies to the first line, and
[block_spacing.*].after applies to the last line. This logic is easy to
follow in the
text_block() function.
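The first-line and last-line rule can be illustrated with a small standalone sketch. It is not the project's text_block() function: the name padded_line_heights() and the idea of folding the spacing into per-line heights are assumptions made for illustration only.

```rust
/// Fold block spacing into per-line heights: `before` is added to the
/// first line only, `after` to the last line only. A single-line block
/// receives both. An empty block yields an empty vector.
pub fn padded_line_heights(line_heights: &[f64], before: f64, after: f64) -> Vec<f64> {
    let last = line_heights.len().saturating_sub(1);
    line_heights
        .iter()
        .enumerate()
        .map(|(i, &height)| {
            let mut total = height;
            if i == 0 {
                total += before;
            }
            if i == last {
                total += after;
            }
            total
        })
        .collect()
}
```

Attaching the spacing to individual lines means a block that continues onto the next page does not repeat its "before" spacing at the top of that page.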
● The PreparedBlock::Image variant remains unchanged, since all calculations
are performed by the pdf_10_bh_cmark/src/image_layout.rs module, as
previously discussed.
● The calculation for the new PreparedBlock::Thematic variant is
self‑explanatory.
⓺ The final function in the pipeline is
output_positioned_block() — it follows the same refactoring pattern.
● At this point, both the Header and Paragraph variants
are effectively identical: they are simply blocks of text. The
text() closure is refactored from the
PositionedBlock::Text and PreparedBlock::Text logic in the
output_positioned_block() function from the
last post.
Header and Paragraph simply call this closure with identical
parameters to render text using the pre‑computed geometry.
● The new Thematic arm calls Cairo methods to draw the horizontal line,
whose properties are defined by the external configuration attributes.
Since the colour of the horizontal line may differ from the rest of the PDF document, the current drawing context must be saved, the colour applied, and then the context restored for subsequent blocks. This process was first discussed in the tenth post.
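The save, apply, restore order can be sketched without linking against Cairo. The Canvas trait below is a hypothetical stand-in whose method names mirror the Cairo calls of the same names. It is not the cairo-rs API, and the real binding's signatures differ. The Recorder type exists only so the call order can be inspected.

```rust
/// Hypothetical minimal drawing surface. Method names mirror the Cairo
/// calls of the same names; this trait is NOT the cairo-rs API.
pub trait Canvas {
    fn save(&mut self);
    fn restore(&mut self);
    fn set_source_rgb(&mut self, r: f64, g: f64, b: f64);
    fn set_line_width(&mut self, width: f64);
    fn move_to(&mut self, x: f64, y: f64);
    fn line_to(&mut self, x: f64, y: f64);
    fn stroke(&mut self);
}

/// Draw a horizontal line without leaking its colour or width into
/// whatever is drawn afterwards.
pub fn draw_horizontal_break<C: Canvas>(
    cr: &mut C,
    x_left: f64,
    x_right: f64,
    y: f64,
    line_width: f64,
    colour: (f64, f64, f64),
) {
    cr.save(); // remember the current colour and line width
    cr.set_source_rgb(colour.0, colour.1, colour.2);
    cr.set_line_width(line_width);
    cr.move_to(x_left, y);
    cr.line_to(x_right, y);
    cr.stroke();
    cr.restore(); // later blocks see the original settings again
}

/// Records call names so the ordering can be checked without Cairo.
#[derive(Default)]
pub struct Recorder {
    pub calls: Vec<&'static str>,
}

impl Canvas for Recorder {
    fn save(&mut self) { self.calls.push("save"); }
    fn restore(&mut self) { self.calls.push("restore"); }
    fn set_source_rgb(&mut self, _r: f64, _g: f64, _b: f64) { self.calls.push("set_source_rgb"); }
    fn set_line_width(&mut self, _width: f64) { self.calls.push("set_line_width"); }
    fn move_to(&mut self, _x: f64, _y: f64) { self.calls.push("move_to"); }
    fn line_to(&mut self, _x: f64, _y: f64) { self.calls.push("line_to"); }
    fn stroke(&mut self) { self.calls.push("stroke"); }
}
```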
● The existing Image arm remains unchanged.
With the removal of a duplicate paragraph from the input
file, and the introduction of the [block_spacing.*]
configuration, the total number of pages is now 35 (thirty‑five), one page fewer than in
the previous iteration.
As previously mentioned, the screenshot below:

demonstrates the following [block_spacing.thematic] setting:
[block_spacing.thematic]
before = 10.0
after = 10.0
And this next screenshot:

shows the same page with the following [block_spacing.thematic] setting:
[block_spacing.thematic]
before = 10.0
after = 50.0
That wraps up this article… Aside from the cosmetic thematic break,
no major feature has been added since the last iteration. Lists and URLs
are still not available. Were the last two ad hoc “parsers” a waste of time?
I don’t think so; they were part of the learning curve. This proper Markdown
parser is solid progress for the project. It is now ready to be extended to support
lists and URLs, and I am looking forward to that…
Thanks for reading! I hope this post helps others who are looking to deepen their understanding of PDF technology. As always—stay curious, stay safe 🦊
✿✿✿
Feature image sources:
- https://www.omgubuntu.co.uk/2024/03/ubuntu-24-04-wallpaper
- https://in.pinterest.com/pin/337277459600111737/
- https://medium.com/analytics-vidhya/rust-adventures-from-java-class-to-rust-struct-1d63b66890cf/
- https://www.pngitem.com/download/ibmJoR_rust-language-hd-png-download/
- https://en.wikipedia.org/wiki/Cairo_%28graphics%29#/media/File:Cairo_banner_closeup.svg
- https://ur.wikipedia.org/wiki/%D9%81%D8%A7%D8%A6%D9%84:HarfBuzz.svg
- https://en.wikipedia.org/wiki/Pango
- https://commons.wikimedia.org/wiki/File:Markdown-mark.svg
