Rust FFI Font Subsetting Using the HarfBuzz Text Shaping Engine
Loosely speaking, font subsetting involves extracting only the characters we need from a font program, such as a TrueType .ttf file. The Arial Unicode MS font program is around 20MB. If we need only a few Vietnamese characters, we can extract and use those, resulting in a font subset of just a few kilobytes.
This article focuses on font subsetting on Windows and Ubuntu as a standalone process. We begin by installing a few standalone font tools on Windows, then explore the font subsetting workflow using the HarfBuzz library.
🦀 Index of the Complete Series.
🚀 The code for this post is in the following GitHub repository: harfbuzz_font_subset.
This article continues from the first article. It also presents a brand-new standalone program, whose boilerplate code is based on the program that accompanied that article.
❶ Install the BirdFont GUI Application
You can download the installer from BirdFont. It is free software, and donations to the developers are welcome.
Originally, ChatGPT recommended FontForge, which I did install and use for a bit. However, it’s quite old, and the UI feels a bit awkward.
❷ Install the Python fontTools Package
This is its repository:
https://github.com/fonttools/fonttools.
We can use it to convert a font program (such as a TrueType font file)
into XML to inspect internal information that would otherwise be hidden.
💡 If you’re unfamiliar with Python Virtual Environments or how to set one up, please see this post: Python: Virtual Environment virtualenv for multiple Python versions, which I wrote a fair while back.
I already have a generic Python development area under F:\pydev\. After
changing to the F:\pydev\ directory, activate the virtual environment
venv, then install the fonttools package using the following commands:
F:\pydev>venv\Scripts\activate
(venv) F:\pydev>pip install fonttools[ufo,lxml,woff,unicode]
Once completed, we can verify the installation with the following command:
(venv) F:\pydev>ttx --version
My installation reports 4.60.1.
🐧 On Ubuntu, the installation we performed in the
first article also installed this tool: ttx --version
reports 4.46.0.
🪟 On Windows, I can run it as follows:
(venv) F:\pydev>ttx path\to\font\program\font_file.ttf
🐧 On Ubuntu, the CLI syntax is the same.
The resulting path\to\font\program\font_file.ttx is an XML file that
lists the internal structure of the .ttf font program.
❸ Optionally Install HarfBuzz CLI Tools
Recall that the build we performed in the first article also produced some CLI tools for Windows and Ubuntu.
If desired, we can download a prebuilt version of these CLI tools from
https://sourceforge.net/projects/harfbuzz.mirror/. I downloaded
harfbuzz-win64-12.1.0.zip and extracted its contents to
C:\PF\harfbuzz-win64\. I believe this build is provided by the author of HarfBuzz.
❹ Font Programs Used In This Article
🪟 On Windows, we use the standard Arial Unicode MS font, located at
C:/Windows/Fonts/arialuni.ttf. 🐧 On Ubuntu, I downloaded
Noto_Sans_SC, Noto_Sans_TC,Noto_Serif_TC.zip
from
Google Fonts, and extracted the contents to
/home/behai/Noto_Sans_TC. The font program we are using is
NotoSansTC-Regular.ttf.
To keep things simple, we will refer to these fonts using absolute paths.
❺ The hb-shape and hb-subset CLIs
💡 Please note: based on the build and installation process discussed in the first article, 🪟 on Windows, before running these CLIs we need to set the library paths once per terminal session:
set PATH=C:\PF\harfbuzz\build\src\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
A glyph ID is an unsigned integer
used by a font program to represent a specific visual shape of a character—for example,
Ề. The same character can have different glyph IDs in different font programs.
The text we are using for font subsetting throughout this article is “Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。” — a single verse in Vietnamese and Chinese from a poem by General Đặng Dung of the Later Trần Dynasty, shortly before the general took his own life in 1413, rather than be executed by his Ming-Chinese captors. This verse means “Countless times I have sharpened my battle sword under the moonlight.”
The hb-shape CLI is a shaping diagnostics tool. Among its many options,
we can use it to obtain the glyph IDs for a given text and font program.
🪟 – In its simplest form:
C:\PF\harfbuzz\build\util\hb-shape.exe ^
C:\Windows\Fonts\arialuni.ttf ^
"Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。"
The output looks like:
[gid46=0+1366|gid2985=1+1024|gid3=2+569|gid211=3+1139^
|gid2955=4+1139|gid3=5+569|gid47=6+1139|gid82=7+1139|^
gid81=8+1139|gid74=9+1139|gid3=10+569|gid55=11+1251|^
gid88=12+1139|gid92=13+1024|gid2931=14+1139|gid81=^
15+1139|gid3=16+569|gid211=17+1139|gid2957=18+1139|^
gid76=19+455|gid3=20+569|gid81=21+1139|gid74=22+1139|^
gid88=23+1139|gid92=24+1024|gid2937=25+1139|gid87=^
26+569|gid3=27+569|gid80=28+1706|gid68=29+1139|gid17^
=30+569|gid3=31+569|gid12557=32+2048|gid12597=33+2048^
|gid29212=34+2048|gid16216=35+2048|gid13507=36+2048|^
gid14743=37+2048|gid19319=38+2048|gid4589=39+2048]
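💡 Note: each entry has the form glyph=cluster+advance, and the entries follow the order of the shaped text, so the same glyph ID can appear more than once (gid3, the space, appears seven times here). For a given font program and text, the output is the same on every run.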
We can feed the unique gids—46, 2985, … 4589—
into hb-subset to create a font subset program if we choose to.
To get CLI options for hb-shape, we can run:
C:\PF\harfbuzz\build\util\hb-shape.exe --help
To get just the glyph IDs for the text:
C:\PF\harfbuzz\build\util\hb-shape.exe --no-glyph-names ^
--no-positions --no-clusters C:\Windows\Fonts\arialuni.ttf ^
"Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。"
The output is now simplified to:
[46|2985|3|211|2955|3|47|82|81|74|3|55|88|92|2931|^
81|3|211|2957|76|3|81|74|88|92|2937|87|3|80|68|17|^
3|12557|12597|29212|16216|13507|14743|19319|4589]
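Since the simplified output repeats glyph IDs, here is a minimal Rust sketch of how it could be reduced to a unique, sorted list ready for the `--glyphs=` option. The function names `unique_glyph_ids` and `glyphs_argument` are hypothetical helpers for illustration; they are not part of HarfBuzz or of the accompanying repository.

```rust
use std::collections::BTreeSet;

/// Parse hb-shape output produced with --no-glyph-names --no-positions
/// --no-clusters (for example "[46|2985|3]") into a sorted set of unique
/// glyph IDs. Hypothetical helper, for illustration only.
fn unique_glyph_ids(shape_output: &str) -> Result<BTreeSet<u32>, String> {
    shape_output
        .trim()
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split('|')
        .map(|tok| {
            tok.trim()
                .parse::<u32>()
                .map_err(|e| format!("bad glyph ID {tok:?}: {e}"))
        })
        .collect()
}

/// Join the IDs with commas, the form used after --glyphs= in this article.
fn glyphs_argument(ids: &BTreeSet<u32>) -> String {
    ids.iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    // The simplified Windows output shown above, joined into one line.
    let output = "[46|2985|3|211|2955|3|47|82|81|74|3|55|88|92|2931|81|3|211|\
                  2957|76|3|81|74|88|92|2937|87|3|80|68|17|3|12557|12597|29212|\
                  16216|13507|14743|19319|4589]";
    let ids = unique_glyph_ids(output).expect("well-formed hb-shape output");
    println!("{} unique glyph IDs", ids.len());
    println!("--glyphs={}", glyphs_argument(&ids));
}
```

A `BTreeSet` is used so that duplicates are dropped and the IDs come out in ascending order, which makes the resulting argument easy to compare between runs.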
🐧 – Using a different font program, we get different glyph IDs:
$ hb-shape /home/behai/Noto_Sans_TC/NotoSansTC-Regular.ttf \
"Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。"
Output:
[gid44=0+621|gid460=1+521|gid1=2+224|gid194=3+620|^
gid430=4+606|gid1=5+224|gid45=6+527|gid80=7+606|gid79^
=8+610|gid72=9+564|gid1=10+224|gid53=11+544|gid86=^
12+607|gid90=13+514|gid406=14+554|gid79=15+610|gid1^
=16+224|gid194=17+620|gid432=18+606|gid74=19+275|^
gid1=20+224|gid79=21+610|gid72=22+564|gid86=23+607|^
gid90=24+514|gid412=25+540|gid85=26+377|gid1=27+224|^
gid78=28+926|gid66=29+563|gid15=30+278|gid1=31+224|^
gid5794=32+1000|gid5822=33+1000|gid17998=34+1000|^
gid8575=35+1000|gid6506=36+1000|gid7442=37+1000|^
gid11001=38+1000|gid20341=39+1000]
And:
$ hb-shape /home/behai/Noto_Sans_TC/NotoSansTC-Regular.ttf \
--no-glyph-names --no-positions --no-clusters \
"Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。"
gives:
[44|460|1|194|430|1|45|80|79|72|1|53|86|90|406|79|^
1|194|432|74|1|79|72|86|90|412|85|1|78|66|15|1|5794|^
5822|17998|8575|6506|7442|11001|20341]
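To make the difference between the two font programs concrete, the sketch below pairs the first five characters of the text ("Kỷ độ") with the glyph IDs reported above for each font. It assumes one glyph per character, which holds for this text (the cluster values run 0 to 39 without gaps) but is not true of shaping in general. The constant and function names are made up for this illustration.

```rust
/// First five characters of the sample text, "Kỷ độ", written with escapes
/// so the code points are unambiguous.
const SAMPLE: &str = "K\u{1EF7} \u{0111}\u{1ED9}";

/// Glyph IDs reported by hb-shape above for those five characters.
const ARIAL_UNICODE_MS: [u32; 5] = [46, 2985, 3, 211, 2955];
const NOTO_SANS_TC: [u32; 5] = [44, 460, 1, 194, 430];

/// Build (character, Arial Unicode MS glyph ID, Noto Sans TC glyph ID) rows.
/// Assumes exactly one glyph per character, as in the outputs above.
fn glyph_table() -> Vec<(char, u32, u32)> {
    SAMPLE
        .chars()
        .zip(ARIAL_UNICODE_MS)
        .zip(NOTO_SANS_TC)
        .map(|((ch, arial), noto)| (ch, arial, noto))
        .collect()
}

fn main() {
    println!("code point  arialuni  NotoSansTC");
    for (ch, arial, noto) in glyph_table() {
        println!("U+{:04X}      {:<8}  {}", u32::from(ch), arial, noto);
    }
}
```

Every row carries a different ID in each column, which is why a glyph ID list taken from one font program cannot be reused to subset another.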
The hb-subset CLI extracts only the characters we need from a font
program. We can either provide the text directly or supply a unique list of glyph IDs
that represent the text.
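When text is the input, what matters for subsetting is the set of distinct Unicode code points it contains. The following is a small sketch of that reduction in Rust; `unique_code_points` and `hex_list` are hypothetical helpers, and the hexadecimal formatting is only for display.

```rust
use std::collections::BTreeSet;

/// Collect the distinct Unicode code points of `text`, in ascending order.
fn unique_code_points(text: &str) -> BTreeSet<u32> {
    text.chars().map(u32::from).collect()
}

/// Render the code points as comma-separated hexadecimal values.
fn hex_list(code_points: &BTreeSet<u32>) -> String {
    code_points
        .iter()
        .map(|cp| format!("{cp:04X}"))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let text = "Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。";
    let code_points = unique_code_points(text);
    println!("{} distinct code points", code_points.len());
    println!("{}", hex_list(&code_points));
}
```

Repeated characters, such as the spaces between words, collapse to a single entry, so the set is usually smaller than the text.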
🪟 – Let’s take a look at glyph ID input. I’m using the list above as-is; it contains duplicates, which hb-subset tolerates:
C:\PF\harfbuzz\build\util\hb-subset.exe --glyphs=46,^
2985,3,211,2955,3,47,82,81,74,3,55,88,92,2931,81,3,^
211,2957,76,3,81,74,88,92,2937,87,3,80,68,17,3,^
12557,12597,29212,16216,13507,14743,19319,4589 ^
C:/Windows/Fonts/arialuni.ttf --output-file=subset.ttf
The screenshot below shows subset.ttf as viewed in
BirdFont:
![]() |
|---|
| Windows |
🐧 – Now let’s look at text input:
$ hb-subset /home/behai/Noto_Sans_TC/NotoSansTC-Regular.ttf \
--text="Kỷ độ Long Tuyền đới nguyệt ma. 幾度龍泉戴月磨。" \
--output-file=subset.ttf
The screenshot below shows subset.ttf as viewed in
BirdFont:
![]() |
|---|
| Ubuntu |
❻ Rust FFI Font Subsetting Using HarfBuzz’s Functions
We replicate the subsetting functionality of the hb-subset CLI
using Rust FFI to call HarfBuzz’s functions. The structure
of this program is quite similar to the
hb_version_string() program discussed in the
first article; here, we simply call more of HarfBuzz’s functions.
🦀 This program is also a one-off standalone; we’ll use the illustrated code to develop
future programs, but we won’t be making changes to it.
We note the following:
⓵ In harfbuzz_font_subset/build.rs: added hb-subset.h.
The build script will then generate a new bindings.rs module, which
includes the required HarfBuzz subset functions.
⓶ In
harfbuzz_font_subset/src/main.rs:
● Lines 11–23: We import the functions needed for subsetting.
● Lines 53–55: We use character Unicode values, effectively subsetting based on the actual text. The text is hardcoded for simplicity.
● Note the lack of error handling — this is intentional to keep the code simple for illustration purposes.
● Lines 65–72:
let result = slice.to_vec();
unsafe { hb_blob_destroy(blob) };
unsafe { hb_face_destroy(subset_face) };
unsafe { hb_subset_input_destroy(input) };
unsafe { hb_face_destroy(face) };
fs::write(output_font_file, result).unwrap();
🙏 I purposely write result last, after freeing all
HarfBuzz-allocated memory. The call to slice.to_vec() has already
copied the subset bytes into a Vec owned by Rust, so result stays
valid once the blob, faces, and subset input have been destroyed.
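The copy-then-free pattern can be tried without HarfBuzz at all. In the sketch below, `foreign_alloc` and `foreign_free` are made-up stand-ins for a C library's allocate and destroy calls (they are not HarfBuzz functions); the point is only that `to_vec()` copies the bytes, so the `Vec` remains usable after the original buffer is released.

```rust
use std::{ptr, slice};

/// Stand-in for a foreign allocator: hand out a raw pointer and a length
/// for a heap buffer that Rust no longer tracks.
fn foreign_alloc(bytes: &[u8]) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = bytes.to_vec().into_boxed_slice();
    let len = boxed.len();
    (Box::into_raw(boxed) as *mut u8, len)
}

/// Stand-in for the matching destroy call.
///
/// Safety: `ptr` and `len` must come from `foreign_alloc`, and the buffer
/// must not have been freed already.
unsafe fn foreign_free(ptr: *mut u8, len: usize) {
    unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, len))) };
}

/// Copy the foreign bytes into a Rust-owned Vec, then release the original.
fn copy_then_free(ptr: *mut u8, len: usize) -> Vec<u8> {
    // Borrow the foreign memory as a slice and copy it immediately.
    let owned = unsafe { slice::from_raw_parts(ptr, len) }.to_vec();
    // The copy does not point into the foreign buffer, so freeing is safe.
    unsafe { foreign_free(ptr, len) };
    owned
}

fn main() {
    let (ptr, len) = foreign_alloc(b"subset bytes");
    let result = copy_then_free(ptr, len);
    // `result` is still valid here, after the foreign buffer is gone.
    println!("{}", String::from_utf8_lossy(&result));
}
```

Had the slice itself been kept instead of the copy, using it after the free would be undefined behaviour, which is why the copy has to happen before the destroy calls.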
🐧 On Ubuntu, all required libraries are globally recognized. 🪟 On Windows, I haven’t added the paths for harfbuzz.dll, harfbuzz-subset.dll, and their dependencies to the PATH environment variable. So in each new Windows terminal session, I run the following once:
set PATH=C:\PF\harfbuzz\build\src\;%PATH%
set PATH=C:\PF\vcpkg\installed\x64-windows\bin\;%PATH%
After that, cargo run works as expected.
Viewing win_subset.ttf and linux_subset.ttf in
BirdFont should show the same results as
the font subset files created by the hb-subset
CLI.
💡 Subset Using Glyph IDs
For an illustration of how to subset using glyph IDs, please refer to
harfbuzz_font_subset/src/main_glyph.rs — rename it to
main.rs to build and run. It uses the
harfbuzz_font_subset/src/glyph.rs module.
I’m aware of other Rust crates related to HarfBuzz, notably the
hb-subset crate. In fact, my initial attempt at subsetting
was through this crate. However, it is outdated, so I chose the FFI route rather than
patching the crate.
That wraps up the font subsetting illustration process. There are still more than 1,300
warnings, but I’m not too worried about them. In the next instalment, we’ll extend
the code in the
harfbuzz_font_subset/src/main.rs module into a generic function:
pub fn get_font_subset(input_font_file: &str,
face_index: u32,
text: &str
) -> Result<Vec<u8>, String>
to perform font subsetting generically, as part of a polyglot (multilingual) PDF creation workflow using the lopdf crate.
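The Result<Vec<u8>, String> return type suggests that failures will be reported as strings. As a minimal sketch only, and not the implementation from the next instalment, this is one way an I/O failure could be turned into such an error. The helper name `read_font_file` is hypothetical, and reading the file through std::fs is an assumption made for the example.

```rust
use std::fs;

/// Read a font program from disk, reporting problems as a String so the
/// error type matches `Result<Vec<u8>, String>`. Hypothetical helper.
fn read_font_file(input_font_file: &str) -> Result<Vec<u8>, String> {
    let bytes = fs::read(input_font_file)
        .map_err(|e| format!("cannot read {input_font_file}: {e}"))?;
    if bytes.is_empty() {
        return Err(format!("{input_font_file} is empty"));
    }
    Ok(bytes)
}

fn main() {
    match read_font_file("C:/Windows/Fonts/arialuni.ttf") {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(message) => eprintln!("{message}"),
    }
}
```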
Thanks for reading! I hope this post helps others on the same journey.
As always—stay curious, stay safe 🦊
✿✿✿
Feature image sources:
- https://www.omgubuntu.co.uk/2024/03/ubuntu-24-04-wallpaper
- https://in.pinterest.com/pin/337277459600111737/
- https://www.rust-lang.org/
- https://www.pngitem.com/download/ibmJoR_rust-language-hd-png-download/
- https://ur.wikipedia.org/wiki/%D9%81%D8%A7%D8%A6%D9%84:HarfBuzz.svg


