• Re: pdf4tcl and Chinese characters

    From Rich@21:1/5 to Harald Oehlmann on Wed Jan 24 18:41:28 2024
    Harald Oehlmann <[email protected]> wrote:
    Thanks for the great pdf4tcl!

    I have a string with Chinese characters.
    I output them with pdf4tcl:

    pdf setFont {9 p} Helvetica
    pdf setFillColor black
    pdf text "实地"

    I only get question marks.
    The interesting ::pdf4tcl::createFont command is meant to select up to
    256 glyphs. Chinese, however, has orders of magnitude more characters.

    Has anybody solved this issue?

    Thanks for any hint,
    Harald


    pdf4tcl 0.9.4 on Tcl 8.6.13...

    You are bumping into a PDF limitation.

    Each simple (single-byte) "font" within a PDF can address at most 256
    characters. This is a limit from very early in PDF's lifetime, and it
    makes using non-ASCII characters a real pain.

    Basically you have to create a "custom" font in the PDF using
    ::pdf4tcl::createFontSpecEnc, with a custom encoding that maps code
    points (the byte values) to actual character glyphs. Then you have to
    "change font" to your custom font in order to draw these characters, and
    use your custom assigned code point value for the glyph you want output.

    For example, ASCII assigns 65 decimal to capital A. Using
    ::pdf4tcl::createFontSpecEnc you can assign 65 decimal to output the
    glyph 实, and then when you want to output that glyph, you "change font"
    to your custom font and output 65 decimal as the "character".
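
    A minimal sketch of the first step in plain Tcl: collecting the distinct
    code points of the text to be printed. The proc name uniqueCodepoints is
    made up for illustration. The resulting list is the kind of value that
    would be handed to ::pdf4tcl::createFontSpecEnc as its glyph subset, but
    that hand-off is an assumption here; check the pdf4tcl manual for the
    exact arguments.

```tcl
# Hypothetical helper (not part of pdf4tcl): return the distinct Unicode
# code points of a string, in order of first appearance.
proc uniqueCodepoints {text} {
    set seen [dict create]
    foreach ch [split $text ""] {
        # scan %c yields the code point of the character as a decimal integer
        dict set seen [scan $ch %c] 1
    }
    # dict keys preserves insertion order, so first appearance wins
    return [dict keys $seen]
}

# "\u5b9e\u5730" is the string from the question, written with escapes so
# the script does not depend on the encoding of the source file.
set subset [uniqueCodepoints "\u5b9e\u5730\u5b9e"]

# A single custom font can only hold up to 256 of these.
if {[llength $subset] > 256} {
    error "text needs [llength $subset] glyphs, more than one font can hold"
}
puts [lmap cp $subset {format U+%04X $cp}]
```

    The repeated character in the sample string is counted once, so the
    subset has two entries.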

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Harald Oehlmann@21:1/5 to All on Wed Jan 24 19:15:43 2024
    Thanks for the great pdf4tcl!

    I have a string with Chinese characters.
    I output them with pdf4tcl:

    pdf setFont {9 p} Helvetica
    pdf setFillColor black
    pdf text "实地"

    I only get question marks.
    The interesting ::pdf4tcl::createFont command is meant to select up to
    256 glyphs. Chinese, however, has orders of magnitude more characters.

    Has anybody solved this issue?

    Thanks for any hint,
    Harald


    pdf4tcl 0.9.4 on Tcl 8.6.13...

  • From Harald Oehlmann@21:1/5 to All on Wed Jan 24 20:18:25 2024
    Thank you, Rich. That is what I feared.
    Is there nobody out there who has automated this?
    I suppose this is not easy...
    You also want to draw one text field with one font; otherwise the text
    is interrupted, I suppose.

    So I will try to write a function that collects the glyphs of one
    text, creates a font for them, and then outputs the text.
    In a second step, an optimization could look for one font of 256
    characters that covers as many text snippets as possible.

    I have to sleep on this...

    Thanks,
    Harald

  • From Rich@21:1/5 to Harald Oehlmann on Wed Jan 24 21:17:54 2024
    Harald Oehlmann <[email protected]> wrote:
    On 24.01.2024 at 19:41, Rich wrote:
    Harald Oehlmann <[email protected]> wrote:
    Thanks for the great pdf4tcl!

    I have a string with Chinese characters.
    I output them with pdf4tcl:
    ...

    I only get question marks.
    The interesting ::pdf4tcl::createFont command is meant to select up to
    256 glyphs. Chinese, however, has orders of magnitude more characters.

    You are bumping into a PDF limitation.

    Each simple (single-byte) "font" within a PDF can address at most 256
    characters. This is a limit from very early in PDF's lifetime, and it
    makes using non-ASCII characters a real pain.

    Basically you have to create a "custom" font in the PDF using
    ::pdf4tcl::createFontSpecEnc with a custom encoding of codepoints (the
    ...

    Thank you, Rich. That is what I feared.
    Is there nobody out there who has automated this?

    Not that I'm aware of for pdf4tcl. Possibly for some other library for
    some other language.

    I suppose this is not easy...

    Not trivial, not rocket science either.

    You also want to draw one text field with one font; otherwise the text
    is interrupted, I suppose.

    Depending upon what you mean by text field, you can switch fonts before
    drawing each glyph if you like and it will have no impact on the final
    viewing of the pdf. If by field you mean a data entry field, then I
    have no idea there.

    When you delve down into the PDF internals, you find that PDF is
    nothing more than instructions to place glyphs at x,y positions on a
    sheet of virtual paper. I.e., internally it is very much like the
    Tcl canvas widget. Which is why 'font switches' don't cause problems
    with the render (unless you, the creator, create vastly different
    actual fonts for 'effect'). But if the plural "fonts" are all of the
    same size and all from the same base, font switches are invisible in
    the final render.

    So I will try to write a function that collects the glyphs of one
    text, creates a font for them, and then outputs the text.

    Yes, you either have to decide what glyphs you want ahead of time and
    "pre-create" fonts to draw those glyphs, or you have to analyze the
    characters you want to "print" for the PDF (or for the current page)
    and create a custom font for those characters.

    The one advantage of the second method is that most Unicode TTF font
    files are huge, and if you create a custom internal font for only the
    used characters, pdf4tcl embeds only the glyphs for the characters you
    actually use. If you use only 1% of the glyphs, you store only roughly
    1% of the font file in the PDF, which makes the PDF smaller.
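
    The second method can be sketched in plain Tcl as follows. This is only
    an illustration under assumptions: the proc names assignFonts and
    splitRuns are invented, and the "font index" stands for one custom font
    of at most 256 glyphs that would still have to be created with pdf4tcl.
    The sketch assigns each distinct character to a font slot and then cuts
    the text into runs that can each be drawn with a single font.

```tcl
# Hypothetical helper: assign every distinct character of $text to a font
# slot holding at most $limit glyphs.
# Returns a dict: character -> font index (0, 1, 2, ...).
proc assignFonts {text {limit 256}} {
    set fontOf [dict create]
    foreach ch [split $text ""] {
        if {![dict exists $fontOf $ch]} {
            # The n-th distinct character goes into slot n / limit
            dict set fontOf $ch [expr {[dict size $fontOf] / $limit}]
        }
    }
    return $fontOf
}

# Hypothetical helper: break $text into runs of consecutive characters that
# share a font index. Returns a flat list: fontIndex run fontIndex run ...
proc splitRuns {text fontOf} {
    set runs {}
    set current ""
    set run ""
    foreach ch [split $text ""] {
        set idx [dict get $fontOf $ch]
        if {$idx ne $current && $run ne ""} {
            # Font changes here: close the run collected so far
            lappend runs $current $run
            set run ""
        }
        set current $idx
        append run $ch
    }
    if {$run ne ""} {
        lappend runs $current $run
    }
    return $runs
}

# Demo with an artificially small limit of 2 glyphs per font.
set text "abcab"
set fontOf [assignFonts $text 2]
foreach {idx run} [splitRuns $text $fontOf] {
    puts "font $idx: $run"
}
```

    In real use, each run would be drawn after switching to the custom font
    for its index. As long as all those fonts share the same base font and
    size, the switches should not be visible in the rendered page.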

    In a second step, an optimization could look for one font of 256
    characters that covers as many text snippets as possible.

    Yes, it will be possible to do so, sometimes. For Chinese, given the
    huge number of total characters, this may be difficult to do in a
    general sense for all possibilities, but you might come close.
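
    One simple heuristic for that second step is greedy first-fit packing,
    sketched below in plain Tcl. It is not an optimal solution and not
    something pdf4tcl provides; the proc name packSnippets and the returned
    structure are invented for illustration. Each snippet goes into the
    first font whose glyph set can absorb it without exceeding the limit, so
    every snippet is drawn with exactly one font.

```tcl
# Hypothetical helper: greedy first-fit packing of text snippets into fonts
# of at most $limit glyphs. Returns a list of fonts; each font is a dict with
# the keys "glyphs" (a dict used as a set of characters) and "snippets" (the
# texts to draw with that font).
proc packSnippets {snippets {limit 256}} {
    set fonts {}
    foreach snippet $snippets {
        set placed 0
        set i 0
        foreach font $fonts {
            # Glyph set this font would have after adding the snippet
            set glyphs [dict get $font glyphs]
            foreach ch [split $snippet ""] { dict set glyphs $ch 1 }
            if {[dict size $glyphs] <= $limit} {
                dict set font glyphs $glyphs
                dict lappend font snippets $snippet
                lset fonts $i $font
                set placed 1
                break
            }
            incr i
        }
        if {!$placed} {
            # No existing font has room: open a new one for this snippet
            set glyphs [dict create]
            foreach ch [split $snippet ""] { dict set glyphs $ch 1 }
            if {[dict size $glyphs] > $limit} {
                error "snippet needs more than $limit glyphs; split it first"
            }
            lappend fonts [dict create glyphs $glyphs snippets [list $snippet]]
        }
    }
    return $fonts
}

# Demo with an artificially small limit of 3 glyphs per font.
foreach font [packSnippets {ab bc de a} 3] {
    puts "[dict size [dict get $font glyphs]] glyphs: [dict get $font snippets]"
}
```

    First-fit depends on the order of the snippets, so sorting them (for
    example longest first) may reduce the number of fonts. A snippet with
    more distinct characters than the limit cannot be packed this way and
    would have to be split into runs instead.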

  • From Harald Oehlmann@21:1/5 to All on Thu Jan 25 08:44:03 2024
    Many thanks, Alejandro,
    this looks promising,
    Harald

    On 25.01.2024 at 00:27, [email protected] wrote:
    Harald,
    take a look at tclfpdf (https://github.com/lamuzzachiodi/tclfpdf).
    There is an example (utf8.tcl, pasted below) with Chinese characters
    using the font simhei.ttf.
    Maybe this helps you.
    Regards,

    Alejandro

    #--- utf8.tcl -----------
    package require tclfpdf
    namespace import ::tclfpdf::*

    Init;
    AddPage;
    # Add a Unicode font (uses UTF-8)
    AddFont "DejaVu" "" "DejaVuSansCondensed.ttf" 1;
    SetFont "DejaVu" "" 14;
    Write 8 " -----
    English: Hello World
    Greek: Γειά σου κόσμος
    Polish: Witaj świecie
    Portuguese: Olá mundo
    Spanish: Hola mundo
    Russian: Здравствулте мир
    Vietnamese: Xin chào thế giới
    ------";
    Ln 10;
    AddFont "simhei" "" "simhei.ttf" 1;
    SetFont "simhei" "" 20;
    Write 10 "Chinese: 你好世界";
    #Select a standard font (uses windows-1252)
    SetFont "Arial" "" 14;
    Ln 10;
    Write 5 "The file size of this PDF is only 16 KB.";
    Output "utf8.pdf";

  • From Harald Oehlmann@21:1/5 to All on Wed Feb 28 13:38:41 2024
    Dear team,

    please allow me to give a status update on this.
    I looked into tclfpdf: great stuff.
    The Error routine, which calls "exit", is quite weird, though.
    But now this issue is solved using pdf4tcl, as described at the very
    end of: https://wiki.tcl-lang.org/page/pdf4tcl

    It is quite manual, and an automated process like in tclfpdf would be
    great. The pdf4tcl ticket tracker got 6 new tickets.

    But anyway, thank you all and take care,
    Harald