Rust: PDFs — Pango and Cairo Layout — Supporting Image Blocks
This article focuses on enabling the Markdown parser we last discussed to support image blocks, where images are specified using relative paths. Only the basic image‑block syntax is supported.
🦀 Index of the Complete Series.
🚀 The code for this post is in the following GitHub repository: pdf_09_image_block.
💡 Please note: on both Windows and Ubuntu, I’m running Rust version rustc 1.90.0 (1159e78c4 2025-09-14).
This is once again a one‑off project — I don’t plan to update it in future development. I want to keep a log of progress exactly as it occurred. Future code may copy this and make changes to it. I’ve placed the project under the pdf_09_image_block directory. The structure is:
.
├── Cargo.toml
├── set_env.bat
├── config
│   └── config.toml
├── img
│   ├── Readme.md
│   └── unscalable.png
├── src
│   ├── config.rs
│   ├── document.rs
│   ├── font_utils.rs
│   ├── image_block_parser.rs
│   ├── image_layout.rs
│   ├── inline_parser.rs
│   ├── main.rs
│   ├── page_geometry.rs
│   └── text_layout.rs
├── text
│   ├── essay.txt
│   └── img
│       ├── ho-chi-minh-fontainebleau-19460914.png
│       └── ho-chi-minh-marius-moutet-19460914.png
└── .vscode
    └── launch.json
We describe some entries in the following subsections. The rest will be covered in the sections that follow.
⓵ The src/page_geometry.rs module is copied unchanged from the eleventh article. 👉 Changing any margin value in the A4_DEFAULT_MARGINS constant will change the layout of the text in the PDF. This module has remained unmodified for several articles.
⓶ The src/font_utils.rs and src/text_layout.rs modules are also copied unchanged from the eleventh article.
⓷ The src/config.rs module and its associated config/config.toml file are also copied unchanged from the eleventh article. They were first introduced in the eighth article, with significant additions made in the eleventh.
⓸ The src/inline_parser.rs module is copied unchanged from the ninth article, where it was first introduced.
⓹ 💡 The code requires the Pango, HarfBuzz, Cairo, and related libraries. 🐧 On Ubuntu, all required libraries are globally recognised. 🪟 On Windows, I haven’t added the paths for the libraries’ DLLs to the PATH environment variable. Instead, in each new Windows terminal session, I run the following once:
set PATH=C:\PF\harfbuzz\dist\bin\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
set PATH=C:\PF\pango\dist\bin;C:\PF\cairo-1.18.4\dist\bin;C:\PF\fribidi\dist\bin;%PATH%
Alternatively, you can simply run set_env.bat.
After that, cargo run works as expected.
⓺ 💡 In the fifth article, we discussed the PKG_CONFIG_PATH user environment variable. This setting applies to all later articles, although I did not mention it again from the sixth article onward. The set_env.bat file above also sets this variable, so that we don’t forget it and run into surprises.
❷ The pdf_09_image_block/src/document.rs Module
This module is copied from the eleventh article; it was first introduced in the eighth. This iteration makes several significant refactorings necessary to support image blocks:
⓵ Block enum: it has been extended to support image blocks. At this stage, it should be self‑explanatory; we will encounter it again in a later section.
⓶ ImageBlockLayoutInfo struct: this struct was first introduced in the eleventh article’s src/image_layout.rs module. Its implementation has been extended in this article.
⓷ ImageBlockMeasuredInfo struct: this is a new struct, whose usage will be discussed in a later section.
⓸ PositionedBlock enum: the original struct has been refactored into an enum to support image blocks. At this stage, it should be self‑explanatory; we will encounter it again in a later section.
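To make the shape of this refactoring concrete, here is a minimal sketch of a struct turned into an enum with a text variant and an image variant. The variant names, fields, and the total_height() helper are assumptions for illustration only; the real Block and PositionedBlock definitions in src/document.rs hold Pango and Cairo data and may look quite different.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a parsed source block.
/// The variant names and fields are assumptions, not the repository's definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Paragraph { text: String },
    Image { caption: String, path: PathBuf },
}

/// Hypothetical stand-in for a block that has been given a page and a position.
#[derive(Debug, Clone, PartialEq)]
pub enum PositionedBlock {
    Text { page: usize, y: f64, height: f64 },
    Image { page: usize, y: f64, image_height: f64, caption_height: f64 },
}

impl PositionedBlock {
    /// Vertical space the block occupies on its page.
    pub fn total_height(&self) -> f64 {
        match self {
            PositionedBlock::Text { height, .. } => *height,
            PositionedBlock::Image { image_height, caption_height, .. } => {
                image_height + caption_height
            }
        }
    }
}
```

The point of the enum is that every later stage can match on the variant, so adding image support means adding one arm per stage rather than threading optional fields through a single struct.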
❸ The pdf_09_image_block/src/image_layout.rs Module
In the previous (eleventh) article, we discussed the image block layout algorithm; the layout_image_block() function implements that algorithm. On its own, this function is not suitable for integration into the PDF generation pipeline. It has therefore been refactored into two functions:
⓵ measure_image_block(): this function effectively implements the algorithm, but no longer renders images or captions. Instead, on success, it returns the information required for pagination and later rendering of the image block: the ImageBlockMeasuredInfo struct mentioned previously. It is a simplified version of layout_image_block(), and we will not cover it in detail.
🦀 The tests have been rewritten accordingly for this function. The image img/unscalable.png is not strictly required for the tests; it is included in the codebase for completeness.
⓶ render_image_block(): this is also a refactored and simplified version of layout_image_block(). It is responsible solely for rendering the image block. It makes no layout decisions: it simply renders the loaded PNG based on the information it is given. There is no test for this function.
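The measure/render split can be sketched without any Pango or Cairo types. Everything below is an assumption made for illustration: the MeasuredImage struct, both function names, the 100% to 50% scale ladder in 10% steps, and the horizontal centring. The actual step‑scaling rules are those of the eleventh article and are not reproduced here.

```rust
/// Hypothetical measuring result; the article's ImageBlockMeasuredInfo
/// struct may carry different fields.
#[derive(Debug, Clone, PartialEq)]
pub struct MeasuredImage {
    pub scale_percent: u32,
    pub width: f64,
    pub height: f64,
}

/// Measuring pass: decides the scale and draws nothing.
/// Returns None when no step on the assumed ladder fits.
pub fn measure_image_sketch(
    image_width: f64,
    image_height: f64,
    caption_height: f64,
    available_width: f64,
    available_height: f64,
) -> Option<MeasuredImage> {
    // Assumed ladder: 100%, 90%, ..., 50%.
    for scale_percent in (50..=100u32).rev().step_by(10) {
        let scale = f64::from(scale_percent) / 100.0;
        let width = image_width * scale;
        let height = image_height * scale;
        if width <= available_width && height + caption_height <= available_height {
            return Some(MeasuredImage { scale_percent, width, height });
        }
    }
    None
}

/// Rendering pass: takes the measured values as given and makes no decisions.
/// Returns the rectangle (x, y, width, height) where the image would be drawn,
/// centred horizontally (an assumption) within the available width.
pub fn image_rect_sketch(
    measured: &MeasuredImage,
    left: f64,
    top: f64,
    available_width: f64,
) -> (f64, f64, f64, f64) {
    let x = left + (available_width - measured.width) / 2.0;
    (x, top, measured.width, measured.height)
}
```

Keeping the measuring pass free of drawing calls is what lets pagination run first: the caller can learn how tall a block will be, decide which page it belongs to, and only then render it.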
❹ The pdf_09_image_block/src/image_block_parser.rs Module
This module is the parser for image‑block text. As mentioned in the introduction, only the basic syntax is supported.
🦀 The API is the parse_image_block() function. The comprehensive test suite illustrates how to call this API and demonstrates invalid edge cases.
The code itself is fewer than 130 lines and is fully documented. Let’s reprint the assumptions and limitations from the module documentation.
- The image‑block text must be at least a minimal, well‑formed image block.
- The image‑block text represents only a single image block.
- Captions containing ] or [ are not supported. For example, ![A caption with \]](path) will be treated as invalid.
- Paths containing ) are not supported. Such a path ends up invalid, because the parser stops at the first ).
- Multiple image blocks on one line are not supported. The parser handles only one image block per line.
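The limitations above can be illustrated with a small hand‑written parser. This is only a sketch under stated assumptions: parse_image_block_sketch() and ImageRef are hypothetical names, the signature of the real parse_image_block() is not shown in this article, and accepting an empty caption while rejecting an empty path is my own choice rather than documented behaviour.

```rust
/// Hypothetical result type for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageRef {
    pub caption: String,
    pub path: String,
}

/// Parses one image block of the form ![caption](path).
/// Returns None for anything outside the limits listed above.
pub fn parse_image_block_sketch(text: &str) -> Option<ImageRef> {
    // The text must open with "![".
    let rest = text.trim().strip_prefix("![")?;

    // The caption runs up to the first ']'; a '[' inside it is rejected.
    let close = rest.find(']')?;
    let caption = &rest[..close];
    if caption.contains('[') {
        return None;
    }

    // The '(' must follow the ']' immediately.
    let after = rest[close + 1..].strip_prefix('(')?;

    // The path runs up to the first ')'; nothing may follow it.
    let end = after.find(')')?;
    let path = &after[..end];
    if path.trim().is_empty() || !after[end + 1..].is_empty() {
        return None;
    }

    Some(ImageRef {
        caption: caption.to_string(),
        path: path.to_string(),
    })
}
```

With this sketch, ![A caption](img/photo.png) parses, while a caption with an escaped ], a path containing ), and two image blocks on one line are all rejected, because the scan always stops at the first ] and the first ).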
❺ The pdf_09_image_block/text/essay.txt and Associated Image Files pdf_09_image_block/text/img/*.*
The pdf_09_image_block/text/essay.txt file is copied from the ninth article. Two modifications were made:
- Removed an extra marker **, which would otherwise appear as a literal string in the final PDF.
- Added two image‑block texts, one for each of the images under text/img/.
❻ The pdf_09_image_block/src/main.rs Module
🙏 Before proceeding with this module, it is worth reviewing the discussion of the pdf_09_image_block/src/document.rs module.
This module brings everything together, just as in previous iterations, and is therefore the most complicated one. It was originally copied from the ninth article, specifically from the pdf_06_text_styling/src/main.rs module. Let’s discuss the refactorings carried out on it.
⓵ PreparedBlock enum: the original struct has been refactored into an enum to support image blocks.
⓶ The next function requiring refactoring is parse_blocks_from_file(): this function reads the text file and turns each line into its corresponding Block enum representation.
It now also detects image‑block text via the new detect_image_block_text() helper function, then invokes the parse_image_block() API to parse the text. If the syntax is invalid, the image‑block text is treated as a normal paragraph. Otherwise, a Block::Image is created.
In either case, a Block enum representation of the line is produced and stored in the returned vector.
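The detect, parse, and fall back flow for a single line might look like the sketch below. The Block variants, the looks_like_image_block() rule (a line starting with ![), and line_to_block() are all assumptions for illustration; the parser is passed in as a closure so the sketch does not presume the signature of parse_image_block().

```rust
/// Hypothetical stand-in for the article's Block enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Paragraph(String),
    Image { caption: String, path: String },
}

/// Assumed detection rule: the line opens with "![".
/// The real detect_image_block_text() may use a different test.
pub fn looks_like_image_block(line: &str) -> bool {
    line.trim_start().starts_with("![")
}

/// Turns one line into a Block. `parse` stands in for the image-block parser
/// and returns (caption, path) on success.
pub fn line_to_block<F>(line: &str, parse: F) -> Block
where
    F: Fn(&str) -> Option<(String, String)>,
{
    if looks_like_image_block(line) {
        if let Some((caption, path)) = parse(line) {
            return Block::Image { caption, path };
        }
        // Invalid image-block syntax: fall through and keep the line as text.
    }
    Block::Paragraph(line.to_string())
}
```

Falling back to a paragraph means a malformed image block still shows up in the PDF as literal text, which makes the mistake easy to spot in the output.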
⓷ The new, more unified function create_layout_for_block() replaces the previous prepare_layout_text() function.
⓸ The next function that has been extended is prepare_blocks(): it converts each Block enum into its corresponding PreparedBlock enum variant.
🙏 In the new PreparedBlock::Image arm, it prepares and caches both the ready‑to‑render image caption and the image PNG stream as a pango::Layout and a cairo::ImageSurface, respectively. 🦀 The cairo::ImageSurface provides both the PNG’s physical width and height, values required for step‑scaling.
⓹ The next function in the pipeline is measure_block(): in the new PreparedBlock::Image arm, it first calls measure_image_block() to perform step‑scaling and collect pagination, positioning, and measurement information. Using the data returned from measure_image_block(), it then creates and stores a PositionedBlock::Image in the returned vector. Finally, it updates the positioning and pagination for the next block, mirroring the existing PositionedBlock::Text arm.
⓺ The final function in the chain is output_positioned_block(): following the same pattern, a new PositionedBlock::Image arm was added to render in‑memory image blocks. This new arm is very simple: it gathers all cached information and passes it to render_image_block() to render the image and its caption.
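The positioning and pagination bookkeeping described for measure_block() can be sketched as a cursor that moves down the page. The Cursor and Placement types, place_block(), and the rule that an image block moves to the next page as a whole when it does not fit are assumptions for illustration, not the code in src/main.rs.

```rust
/// Hypothetical record of where a block was placed.
#[derive(Debug, Clone, PartialEq)]
pub struct Placement {
    pub page: usize,
    pub y: f64,
}

/// Hypothetical running position shared by consecutive blocks.
#[derive(Debug, Clone, PartialEq)]
pub struct Cursor {
    pub page: usize,
    pub y: f64,
}

/// Places a block of `height` at the cursor and advances the cursor.
/// `top` and `bottom` are the y coordinates of the usable page area.
pub fn place_block(
    cursor: &mut Cursor,
    height: f64,
    spacing: f64,
    top: f64,
    bottom: f64,
) -> Placement {
    // Assumption: a block that does not fit moves to the next page as a whole,
    // unless the cursor is already at the top of a fresh page.
    if cursor.y + height > bottom && cursor.y > top {
        cursor.page += 1;
        cursor.y = top;
    }
    let placed = Placement { page: cursor.page, y: cursor.y };
    // Update the position for the next block.
    cursor.y += height + spacing;
    placed
}
```

Because the placement is computed before anything is drawn, the rendering stage only has to switch to the recorded page and draw at the recorded y position.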
With the addition of the two image blocks, the total number of pages is now 36 (thirty‑six), an increase of 1 (one) page. The screenshots below show the two pages containing image blocks:
We are using exactly the same configuration on both Ubuntu 🐧 and Windows 10 🪟, resulting in identical final PDF output on both platforms.
That is all for this article… I am glad I decided to write the image‑block parser rather than rely on regular expressions; I genuinely enjoyed implementing it. There are still several features I would like to support. One is lists, both bulleted and numbered, which I understand can be challenging to implement. Another is correctly recognising and rendering URLs so that they become clickable links within the PDF.
At the moment, I am not sure which one I will tackle next…
Thanks for reading! I hope this post helps others who are looking to deepen their understanding of PDF technology. As always—stay curious, stay safe 🦊
✿✿✿
Feature image sources:
- https://www.omgubuntu.co.uk/2024/03/ubuntu-24-04-wallpaper
- https://in.pinterest.com/pin/337277459600111737/
- https://medium.com/analytics-vidhya/rust-adventures-from-java-class-to-rust-struct-1d63b66890cf/
- https://www.pngitem.com/download/ibmJoR_rust-language-hd-png-download/
- https://en.wikipedia.org/wiki/Cairo_%28graphics%29#/media/File:Cairo_banner_closeup.svg
- https://ur.wikipedia.org/wiki/%D9%81%D8%A7%D8%A6%D9%84:HarfBuzz.svg
- https://en.wikipedia.org/wiki/Pango
