Rust: PDFs — Pango and Cairo Layout — Supporting Bold, Italic, and Bold Italic Text
Implementing support for bold, italic, and bold italic text in paragraphs. Following Markdown, these three indicators — **, *, and *** — are used. Adjacent and nested Markdown syntaxes, as well as escapes such as \* and \\, are supported. This article continues and extends the work from the eighth article. In addition to rendering all natural headers, the final PDF now styles paragraph text according to the Markdown instructions in the source text file.
🦀 Index of the Complete Series.
🚀 The code for this post is in the following GitHub repository: pdf_06_text_styling.
💡 Please note that Pango also supports HTML markup. I am not taking
that route because I prefer to retain as much control as possible over how the
input text is processed. For the same reason, I choose not to use any of the Rust
Markdown parser crates, and instead implement a minimal parser that
provides only the required support.
We describe the features the parser supports and some of its known limitations. The pdf_06_text_styling/src/inline_parser.rs test suite, in particular the test Markdown constants, should illustrate the parser’s capabilities. Also, the pdf_06_text_styling/text/essay.txt file provides a complete example of the supported Markdown.
💡 Please note, the term marker event is used to refer to a valid opening marker followed by a valid closing marker.
● Adjacent marker: a sequence of marker events.
For example, — **Tưởng Vĩnh Kính**, Hồ Chí Minh Tại *Trung Quốc*, Thượng Huyền
dịch, ***trang 339***.
● Nested marker: some marker events are enclosed within an outer
marker event. For example, **Không đọc *sử* không đủ tư cách nói chuyện
*chính trị*.**
● Escaped: the character \ signifies that the character
following it is escaped. For example, \*not bold\* is interpreted
as the literal string *not bold*. \\Úc Đại Lợi\\
is interpreted as \Úc Đại Lợi\.
● Uneven marker indicators: the result may not be what we expect.
- **Tưởng Vĩnh Kính***: results in Tưởng Vĩnh Kính, followed by *.
- ***Tưởng Vĩnh Kính**: results in * followed by Tưởng Vĩnh Kính.
- ***Tưởng Vĩnh Kính*: results in ** followed by Tưởng Vĩnh Kính.
● Bold nested inside italic: for example, *-- **Sir John Seeley**,
1885* is not supported. I discovered this at the last minute; it results in
-- Sir John Seeley, 1885.
To get the intended effect of -- Sir John Seeley, 1885,
use adjacent marker events: *--* **Sir John Seeley***, 1885*.
💥 It is best to construct marker events as cleanly as possible; ambiguous marker events can produce unexpected results.
Some software such as
Visual Studio Code
and https://markdownlivepreview.com/
do not suffer from these limitations. Bringing this parser up to par with such software
is not my objective, and is beyond my capabilities as well. I only aim to support a
subset of Markdown that is sufficient for creating presentable PDFs.
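To make the escape rule above concrete, here is a minimal sketch of backslash handling. It is not the code from inline_parser.rs: the function name resolve_escapes is hypothetical, and it writes the literal character straight to the output, whereas the real parser keeps escaped asterisks as a placeholder until rendering.

```rust
/// Resolves backslash escapes in a line of text (hypothetical helper).
/// Only the two escapes named in this article are recognised: `\*` and `\\`.
fn resolve_escapes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // `\*` and `\\` yield the escaped character itself.
            Some(next @ ('*' | '\\')) => out.push(next),
            // Any other character: keep both the backslash and the character.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // A lone trailing backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    println!("{}", resolve_escapes(r"\*not bold\*"));
    println!("{}", resolve_escapes(r"\\Úc Đại Lợi\\"));
}
```

Unknown escapes are passed through unchanged in this sketch, which is one possible choice among several.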
💡 Please note: on both Windows and Ubuntu, I’m running Rust version
rustc 1.90.0 (1159e78c4 2025-09-14).
This is once again a one‑off project—I don’t plan to update it in future development. I want to keep a log of progress exactly as it occurred. Future code may copy this and make changes to it. I’ve placed the project under the pdf_06_text_styling directory. The structure is:
.
├── Cargo.toml
├── set_env.bat
├── config
│ └── config.toml
├── src
│ ├── config.rs
│ ├── document.rs
│ ├── font_utils.rs
│ ├── inline_parser.rs
│ ├── main.rs
│ ├── main_start_01.rs
│ ├── main_start_02.rs
│ └── page_geometry.rs
├── text
│ └── essay.txt
└── .vscode
  └── launch.json
We describe some modules in the following subsections. The rest will be covered in the sections that follow.
⓵ The src/page_geometry.rs module is copied unchanged from the
Rust: PDFs — Text Rotation with Cairo and Pango article.
👉 Changing any margin value in the A4_DEFAULT_MARGINS constant will
change the layout of the text in the PDF.
⓶ The src/config.rs module is copied unchanged from the
Rust: PDFs — Pango and Cairo Layout — Supporting Headers article.
⓷ 💡 The code requires the Pango, HarfBuzz, Cairo,
etc. libraries. 🐧 On Ubuntu, all required libraries are globally recognised. 🪟 On Windows,
I haven’t added the paths for the libraries’ DLLs to the PATH environment
variable. In each new Windows terminal session, I run the following once:
set PATH=C:\PF\harfbuzz\dist\bin\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
set PATH=C:\PF\pango\dist\bin;C:\PF\cairo-1.18.4\dist\bin;C:\PF\fribidi\dist\bin;%PATH%
Alternatively, you can simply run
set_env.bat.
After that, cargo run works as expected.
⓸ 💡 In the fifth article, we discussed the
PKG_CONFIG_PATH user environment variable. This setting applies to all
later articles. I did not mention it again from the sixth article onward. In the
set_env.bat above, I include setting this variable so that we don’t forget
it and avoid potential surprises.
⓹ The text/essay.txt file — copied from the
last article,
with Markdown added to text in paragraphs.
Pango provides a powerful and straightforward approach
to text styling. We can summarise it as follows: first, apply the base font
as usual; next, determine the byte‑range of the sub‑text you want to style, and apply
attributes to those byte‑ranges to achieve the desired effects. 🦀 To get
bold italic text, apply both bold and italic attributes to the
same byte‑range.
We demonstrate this Pango approach in the
pdf_06_text_styling/src/main_start_01.rs module. For the sake of
simplicity, we use only single‑byte text: xy, bc, de. The new text‑styling
code:
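// Imports needed by this fragment.
use pango::{AttrInt, Style, Weight};

// The text is "xy, bc, de": 10 single-byte characters at byte indices 0 to 9.
// Pango excludes the byte at the end index, so each range ends one past its last byte.
let attrs = pango::AttrList::new();

// Bold over the whole text: bytes 0..10.
let mut bold = AttrInt::new_weight(Weight::Bold);
bold.set_start_index(0);
bold.set_end_index(10);
attrs.insert(bold);

// Italic over "bc": bytes 4..6.
let mut italic = AttrInt::new_style(Style::Italic);
italic.set_start_index(4);
italic.set_end_index(6);
attrs.insert(italic);

// Italic over "de": bytes 8..10.
let mut italic = AttrInt::new_style(Style::Italic);
italic.set_start_index(8);
italic.set_end_index(10);
attrs.insert(italic);

layout.set_attributes(Some(&attrs));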
💡 Please note: the index parameters passed to AttrInt::set_start_index() and AttrInt::set_end_index() are byte indices, not character indices. UTF‑8 characters may span multiple bytes.
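The difference between byte indices and character indices is easy to see with Vietnamese text. The sketch below uses only the standard library; the helper name byte_range_of is hypothetical, and its end offset is exclusive, as with Rust slices.

```rust
/// Returns the byte range `(start, end)` of the first occurrence of
/// `needle` in `text`, with `end` exclusive, or `None` if it is absent.
fn byte_range_of(text: &str, needle: &str) -> Option<(usize, usize)> {
    // `str::find` returns a byte offset, not a character offset.
    let start = text.find(needle)?;
    Some((start, start + needle.len()))
}

fn main() {
    // "Đọc sử", written with escapes so the code points are unambiguous:
    // Đ (U+0110) is 2 bytes, ọ (U+1ECD) and ử (U+1EED) are 3 bytes each.
    let text = "\u{0110}\u{1ECD}c s\u{1EED}";
    let needle = "s\u{1EED}";

    println!("characters: {}", text.chars().count()); // 6
    println!("bytes:      {}", text.len()); // 11

    // "sử" is the 5th and 6th character, but it occupies bytes 7..11.
    if let Some((start, end)) = byte_range_of(text, needle) {
        println!("byte range: {start}..{end}");
        println!("slice:      {}", &text[start..end]);
    }
}
```

Passing character positions (4 and 6 here) to an API that expects byte positions would style the wrong part of the text.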
In the code above:
- Bold is applied to the entire text, from x to e, inclusive.
- Italic is applied to bc and de. Because bold already applies to the whole string, these segments become bold italic: bc and de.
The parser identifies these byte‑ranges automatically based on the positions of the marker events. Next, we look at the parser from an overview perspective.
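As a rough illustration of how byte-ranges can be derived from marker positions, here is a toy toggle-based scanner. It is a sketch under simplifying assumptions, not the actual parser: the names toy_parse, StyledRange, and Style are hypothetical, escapes are ignored, and markers that are never closed are silently dropped.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Style {
    Bold,
    Italic,
}

#[derive(Debug, PartialEq, Eq)]
struct StyledRange {
    start: usize, // byte offset into the clean text, inclusive
    end: usize,   // byte offset into the clean text, exclusive
    style: Style,
}

/// Opens the style at `pos` if it is closed; otherwise closes it and records the range.
fn toggle(open: &mut Option<usize>, style: Style, pos: usize, ranges: &mut Vec<StyledRange>) {
    match open.take() {
        Some(start) => ranges.push(StyledRange { start, end: pos, style }),
        None => *open = Some(pos),
    }
}

/// Strips `*`, `**` and `***` markers and returns the clean text plus styled byte-ranges.
fn toy_parse(input: &str) -> (String, Vec<StyledRange>) {
    let mut text = String::new();
    let mut ranges = Vec::new();
    let (mut bold_open, mut italic_open) = (None, None);
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '*' {
            text.push(c);
            continue;
        }
        // Count a run of up to three asterisks: * italic, ** bold, *** both.
        let mut run = 1;
        while run < 3 && chars.peek() == Some(&'*') {
            chars.next();
            run += 1;
        }
        // Markers are not copied, so this is an offset into the clean text.
        let pos = text.len();
        if run >= 2 {
            toggle(&mut bold_open, Style::Bold, pos, &mut ranges);
        }
        if run != 2 {
            toggle(&mut italic_open, Style::Italic, pos, &mut ranges);
        }
    }
    (text, ranges)
}

fn main() {
    let (text, ranges) = toy_parse("**x *y* z** and ***w***");
    println!("{text}");
    for r in &ranges {
        println!("{:?} {:?}", r.style, &text[r.start..r.end]);
    }
}
```

Because the ranges index the clean text rather than the Markdown source, they can be applied directly once the markers are gone.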
The parser lives in the
pdf_06_text_styling/src/inline_parser.rs module. Its API is simple:
pub fn parse_inline(markdown_text: &str) -> InlineParseResult
InlineParseResult encapsulates the result of parsing a single line
(paragraph) of Markdown text. It exposes two pieces of data.
The first field is text: String. This is the text with all marker indicators
(i.e. *) removed. Escaped asterisks are still represented by the 3‑byte
character \u{E000}. Call the
reserve_asterisk() function on this text to restore
escaped * characters before giving it to Pango.
The second field is spans: Vec<Span>. The Span struct is defined in the src/document.rs module. Each Span represents a byte‑range, as discussed earlier, of a slice in text and its associated style. Recall that ***bold italic*** produces two Spans over the same byte‑range: one for bold and one for italic, resulting in bold italic.
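The shape of this result can be pictured with the sketch below. The field names start, end, and style, the two SpanStyle variants, and the helper restore_placeholders are assumptions made for illustration; the real definitions live in the repository and may differ.

```rust
/// Stand-in for an escaped asterisk; U+E000 is a private-use code point.
const PLACEHOLDER: char = '\u{E000}';

// Hypothetical variants; the real enum may be shaped differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanStyle {
    Bold,
    Italic,
}

// Hypothetical fields: a byte-range into the clean text, end exclusive.
#[derive(Debug, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
    style: SpanStyle,
}

/// Swaps every placeholder back to a literal asterisk.
fn restore_placeholders(text: &str) -> String {
    text.replace(PLACEHOLDER, "*")
}

fn main() {
    // What parsing r"***bold italic*** \*" might yield under these assumptions.
    let text = format!("bold italic {}", PLACEHOLDER);
    let spans = vec![
        Span { start: 0, end: 11, style: SpanStyle::Bold },
        Span { start: 0, end: 11, style: SpanStyle::Italic },
    ];

    // The placeholder takes 3 bytes in UTF-8; an asterisk takes 1.
    println!("placeholder bytes: {}", PLACEHOLDER.len_utf8());
    for span in &spans {
        println!("{:?}: {:?}", span.style, &text[span.start..span.end]);
    }
    println!("{}", restore_placeholders(&text));
}
```

In this sketch, restoring a placeholder shortens the text by two bytes, so a range that starts after a placeholder would no longer line up. The example sidesteps that by placing the escaped asterisk at the end.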
Stripping out all inline documentation and test‑related code, the actual parser is fewer than 300 lines. Given the amount of inline documentation, we will not discuss the parser code in detail here. The documentation and the test methods should be sufficient to guide your understanding of the implementation.
❺ A Simple Example On Using the Parser
We now look at a simple example of how to apply the parser. The code is intentionally
minimal: it parses a single line of Markdown text and writes it to a PDF.
It assumes that the final clean text fits on a single line, so no measurement or layout
logic is required.
This example is the
pdf_06_text_styling/src/main_start_02.rs module, which is a
refactored version of the earlier
pdf_06_text_styling/src/main_start_01.rs example:
● create_font_attrs(): a generic method that creates the styling attributes for the text. It is based on the code shown in a previous discussion.
● And in the main() function:
let markdown_text = r"**Không đọc *sử* không đủ tư cách nói chuyện *chính trị*.** \*";
// let markdown_text = "***Không đọc sử không đủ tư cách nói chuyện chính trị.***";
// let markdown_text = "( **Chính Ðạo, *Việt Nam Niên Biểu*, *Tập 1A***, trang 347 )";
let res = parse_inline(markdown_text);
let attrs = pango::AttrList::new();
for span in res.spans() {
for attr in create_font_attrs(span) {
attrs.insert(attr);
}
}
layout.set_attributes(Some(&attrs));
layout.set_text(&reserve_asterisk(res.text()));
- Calls parse_inline() to parse the Markdown text.
- Uses the resulting Spans to create the appropriate styles for each byte‑range, and applies those styles.
- Calls reserve_asterisk() on the resulting clean text to restore any escaped asterisks, then gives Pango this final text to render using the selected font and applied styles.
Before we discuss the final main code, let’s briefly cover the auxiliary modules.
⓵ The pdf_06_text_styling/src/document.rs module: copied from the Rust: PDFs — Pango and Cairo Layout — Supporting Headers article, with some refactorings:
- Added enum SpanStyle and struct Span: we covered these in the Overview of the Parser and A Simple Example On Using the Parser sections.
- struct Block: in the Paragraph variant, a new field spans: Vec<Span> has been added, which we will discuss in a later section.
- Removed line_height from struct PositionedBlock: we will discuss this in a later section.
⓶ The pdf_06_text_styling/src/font_utils.rs module: the code here is not new:
- The previous article’s to_pango_description() function is copied over.
- The create_font_attrs() function discussed in A Simple Example On Using the Parser.
We have now covered all the groundwork. Next, we discuss integrating the parser into the PDF creation process.
The final module,
pdf_06_text_styling/src/main.rs, is a copy of the
previous article’s
pdf_05_header/src/main.rs module, with some refactoring. We discuss
those changes in the sections that follow.
● The
parse_blocks_from_file() function — for paragraph text, we
now assume it is Markdown and parse it accordingly:
} else {
let InlineParseResult { text, spans } = parse_inline(&line);
blocks.push(Block::Paragraph { text, spans });
}
We discussed spans in a previous section.
With this information available, we now have all the data required for measuring and
pagination.
● The new
prepare_layout_text() function replaces the previous
block_text() function. The code in this new function follows the
approach we have already discussed, and should be self‑explanatory.
● The previous
measure_block() and
output_positioned_block() functions repeatedly create
pango::Layout objects, set the font, and set the text in order to measure
line heights, perform pagination, and finally render the output. In this article, we
prepare everything once and cache it. The two methods mentioned above then use this
cached data to perform their work, rather than recalculating everything on the fly.
We discuss this caching implementation next.
● The caching mechanism is made possible by the new
struct PreparedBlock and the
prepare_blocks() function, which returns a vector of
PreparedBlock.
- PreparedBlock: this struct represents a Pango ready‑to‑render version of the semantic Block. The layout field contains complete layout data: individual lines derived from the Block::Paragraph’s text field that fit within the page width, right‑justified, and with font family, font size, and styling attributes already applied. The Block::Paragraph’s line_heights vector stores the height of each individual line. Styling can cause line heights to vary, which is why we removed the line_height field from struct PositionedBlock, as previously discussed.
- The new prepare_blocks() function is a simplified version of the earlier measure_block() function. For each semantic Block, it computes a Pango-ready PreparedBlock and finally returns a vector of PreparedBlock.
It follows naturally that the total number of PreparedBlocks should always
match the number of Blocks, while there may be more
PositionedBlocks.
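The relationship between the counts can be seen in a small pagination sketch that works only from cached line heights. The struct fields and the paginate function are hypothetical stand-ins, not the code in main.rs, and heights are in arbitrary units.

```rust
/// Hypothetical stand-in for a prepared block: only the cached line heights.
struct PreparedBlock {
    line_heights: Vec<f64>,
}

/// Hypothetical positioned slice of a block: the lines that land on one page.
#[derive(Debug, PartialEq)]
struct PositionedBlock {
    block_index: usize,
    page: usize,
    first_line: usize,
    last_line: usize, // exclusive
}

/// Assigns lines to pages using cached heights; nothing is measured again.
fn paginate(prepared: &[PreparedBlock], page_height: f64) -> Vec<PositionedBlock> {
    let mut out = Vec::new();
    let mut page = 0;
    let mut y = 0.0;
    for (block_index, block) in prepared.iter().enumerate() {
        let mut first_line = 0;
        for (i, &height) in block.line_heights.iter().enumerate() {
            // The line does not fit: flush the lines so far and start a new page.
            if y + height > page_height && y > 0.0 {
                if i > first_line {
                    out.push(PositionedBlock { block_index, page, first_line, last_line: i });
                }
                first_line = i;
                page += 1;
                y = 0.0;
            }
            y += height;
        }
        // Flush whatever remains of this block on the current page.
        if block.line_heights.len() > first_line {
            let last_line = block.line_heights.len();
            out.push(PositionedBlock { block_index, page, first_line, last_line });
        }
    }
    out
}

fn main() {
    let prepared = vec![
        PreparedBlock { line_heights: vec![10.0, 10.0, 10.0] },
        // A styled line can be taller than its neighbours.
        PreparedBlock { line_heights: vec![10.0, 14.0, 10.0] },
    ];
    for positioned in paginate(&prepared, 40.0) {
        println!("{positioned:?}");
    }
}
```

With a page height of 40, the second block is split across two pages, so two prepared blocks yield three positioned blocks.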
● The new
measure_block() function now receives, as its parameter, a
reference to the vector of PreparedBlock returned by the
prepare_blocks() function. It performs its
measurements based on this vector.
● The new
output_positioned_block() function now receives a reference
to a PreparedBlock. The overall flow of the code remains largely unchanged.
The screenshots below show some PDF pages generated on 🐧 Ubuntu:
Implementing the parser took a while, but it was satisfying to see it completed. The next feature I would like to support is images with captions, where images are specified using relative paths, similar to how it is done in LaTeX.
Thanks for reading! I hope this post helps others who are looking to deepen their understanding of PDF technology. As always—stay curious, stay safe 🦊
✿✿✿
Feature image sources:
- https://www.omgubuntu.co.uk/2024/03/ubuntu-24-04-wallpaper
- https://in.pinterest.com/pin/337277459600111737/
- https://medium.com/analytics-vidhya/rust-adventures-from-java-class-to-rust-struct-1d63b66890cf/
- https://www.pngitem.com/download/ibmJoR_rust-language-hd-png-download/
- https://en.wikipedia.org/wiki/Cairo_%28graphics%29#/media/File:Cairo_banner_closeup.svg
- https://ur.wikipedia.org/wiki/%D9%81%D8%A7%D8%A6%D9%84:HarfBuzz.svg
- https://en.wikipedia.org/wiki/Pango
