Python / ReportLab: How Is UTF-8 Text Displayed by Browsers and PDF Creation Tools?
A new question popped into my head recently, triggered by working with ReportLab.
When we declare an HTML page as UTF-8, we can display all kinds of human languages
using the “font-family that we specify in the CSS” (my erroneous assumption!).
With PDF tools, however, we need to select appropriate fonts for the human
languages that we want to display. The question, therefore, is: how do browsers
manage that?
The following HTML page illustrates the UTF-8 declaration mentioned in the
introduction. I purposely specify only the Arial font for the page:
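A minimal sketch of such a page, generated from Python so that the examples stay in one language. The sample text, the `build_page` helper and the output file name are assumptions for illustration, not the page used in this post. What matters is the `<meta charset="UTF-8">` declaration and a `font-family` that names only Arial.

```python
from html import escape
from pathlib import Path

# Hypothetical sample text; any mix of scripts will do.
SAMPLE_LINES = [
    "English: Hello, world",
    "Vietnamese: Xin chào thế giới",
    "Chinese: 你好，世界",
    "Japanese: こんにちは世界",
]


def build_page(lines):
    """Return an HTML5 page that declares UTF-8 and names only Arial in its CSS."""
    paragraphs = "\n".join(f"    <p>{escape(line)}</p>" for line in lines)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '    <meta charset="UTF-8">\n'
        "    <title>UTF-8 test page</title>\n"
        "    <style>body { font-family: Arial; }</style>\n"
        "</head>\n"
        "<body>\n"
        f"{paragraphs}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Write the file as UTF-8 so the bytes on disk match the declared charset.
    Path("utf8_arial.html").write_text(build_page(SAMPLE_LINES), encoding="utf-8")
```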
We understand that, within Windows, available fonts are in the “Fonts” folder
(directory), directly under the Windows installation directory. In my case, it is:
C:\Windows\Fonts
-- I had always assumed that the Arial fonts shipped with Windows are capable
of displaying all human languages available under the UTF-8 character encoding,
as defined by the Unicode Standard!
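The assumption mixes up two separate things. UTF-8 only defines how a Unicode code point is stored as bytes; whether a glyph exists for that code point depends on the font. A small sketch of the encoding half, using only the standard library (the `describe` helper is a name made up for this example):

```python
import unicodedata


def describe(ch):
    """Return (code point, UTF-8 bytes as hex, Unicode name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("utf-8").hex(" "),
        unicodedata.name(ch, "<unnamed>"),
    )


if __name__ == "__main__":
    # One, two and three byte sequences; no font is involved at any point.
    for ch in "A\u00e9\u4e2d":
        print(describe(ch))
```

Every character here encodes without error, yet nothing in the output says whether Arial, or any other font, can draw it.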
ReportLab User Guide, section 3.6 Asian Font Support, page 53, spells out clearly
that we need to load appropriate fonts for the languages that we want to work with.
Under the aforementioned assumption, I thought loading the Windows Arial font would
give me PDF text similar to the HTML above:
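A minimal sketch of that attempt, assuming `arial.ttf` has been copied into a “fonts” sub-directory next to the script. The helper names, the sample text and the output file name are assumptions for illustration; the ReportLab calls used are `pdfmetrics.registerFont`, `TTFont` and the `canvas.Canvas` drawing methods.

```python
from pathlib import Path

FONTS_DIR = Path("fonts")  # the script's "fonts" sub-directory


def font_file(file_name, fonts_dir=FONTS_DIR):
    """Build the path to a font file inside the fonts sub-directory."""
    return Path(fonts_dir) / file_name


def write_pdf(pdf_path, font_name, ttf_path, lines):
    """Register one TrueType font and draw each line of text with it."""
    # Imported here so the path helper above works without ReportLab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfgen import canvas

    pdfmetrics.registerFont(TTFont(font_name, str(ttf_path)))
    page = canvas.Canvas(str(pdf_path), pagesize=A4)
    page.setFont(font_name, 14)
    y = A4[1] - 72  # start one inch below the top edge
    for line in lines:
        page.drawString(72, y, line)
        y -= 20
    page.save()


if __name__ == "__main__":
    # Hypothetical sample text and output file name.
    write_pdf(
        "arial_only.pdf",
        "Arial",
        font_file("arial.ttf"),
        ["Hello, world", "你好，世界"],
    )
```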
-- Note: to work out the file name arial.ttf, I just copy the “Arial icon” in my
C:\Windows\Fonts folder to the Python script's “fonts” sub-directory. The
sub-directory will then list several ttf files. Double-click on any one of them,
and Windows will bring up the font sample dialog; the name of the font is listed
in this dialog. I repeat this process for other fonts.
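To see which files the copy produced without opening Explorer, a short standard-library sketch can list them. The `list_font_files` helper is a name made up for this example. It reports file names only; it does not read the font name shown in the sample dialog.

```python
from pathlib import Path


def list_font_files(folder, extensions=(".ttf", ".ttc")):
    """Return the sorted names of font files found directly in `folder`."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() in extensions
    )


if __name__ == "__main__":
    # Assumes a "fonts" sub-directory next to the script.
    for name in list_font_files("fonts"):
        print(name)
```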
The result was not what I assumed:
In Acrobat Reader, File | Properties… | Font tab shows all embedded fonts in the
document:
ArialMT is used in the document.
ArialMT is loaded and used.
-- Firefox loads other fonts of its own accord, as necessary, to display the content correctly.
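This behaviour is usually called font fallback: for each character, use the first font in a list that has a glyph for it. The sketch below illustrates the idea only; it is not how Firefox implements it. The coverage rules are made-up code point ranges, whereas real coverage comes from each font file's character map.

```python
from itertools import groupby


def pick_font(ch, fonts):
    """Return the name of the first font whose coverage includes `ch`, else None."""
    for name, covers in fonts:
        if covers(ch):
            return name
    return None


def split_into_runs(text, fonts):
    """Split `text` into (font name, substring) runs, one font per run."""
    return [
        (name, "".join(chars))
        for name, chars in groupby(text, key=lambda ch: pick_font(ch, fonts))
    ]


# Hypothetical coverage rules, for illustration only.
FONTS = [
    ("Arial", lambda ch: ord(ch) < 0x0250),
    ("Microsoft YaHei", lambda ch: 0x4E00 <= ord(ch) <= 0x9FFF),
]

if __name__ == "__main__":
    print(split_into_runs("Hello 你好", FONTS))
```

With a PDF library, each run would then be drawn with its own registered font, which is roughly the step a browser takes care of automatically.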
I do the same:
Microsoft YaHei font files have the ttc extension. I use https://transfonter.org/
to convert msyhl.ttc to ttf, and store the resulting files in the Python script’s
“fonts” sub-directory. This time, the result is what I anticipated:
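As background on the ttc extension: a TrueType Collection packs several fonts into one file, which is why one ttc can turn into more than one ttf. A minimal sketch that reads the collection header with the standard library only; the `ttc_font_count` helper and the file location are assumptions for illustration.

```python
import struct


def ttc_font_count(data):
    """Return the number of fonts in a TrueType Collection, or None if
    `data` does not start with the collection tag."""
    if len(data) < 12 or data[:4] != b"ttcf":
        return None
    # Header: tag (4 bytes), major version (uint16), minor version (uint16),
    # number of fonts (uint32), all big-endian.
    _major, _minor, num_fonts = struct.unpack(">HHI", data[4:12])
    return num_fonts


if __name__ == "__main__":
    # Hypothetical location: the ttc file copied next to the script.
    with open("fonts/msyhl.ttc", "rb") as handle:
        print(ttc_font_count(handle.read(12)))
```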
Fonts are a very large subject. I was just trying to answer my own question.
I am happy with what I have found. I hope you find this post useful,
and thank you for visiting.