Discussion:
[fontforge-users] SetFontNames Unicode
akovia
2015-12-17 18:55:28 UTC
I wrote a script to fix some naming errors, such as leading spaces in
names, and ran across something I don't understand.

In this instance I have a font with the following PS Names

Fontname    = Abaddon
Familyname  = AbaddonTM
Fullname    = Abaddon™

The relevant part of the script ...

Print("familyname   = "+"["+fmly+"]")
Print("$fullname    = "+"["+$fullname+"]")
Print("fulfixed     = "+"["+fulfixed+"]")
SetFontNames($fontname,fmly,fulfixed)
Print("familyname   = "+"["+fmly+"]")
Print("$fullname    = "+"["+$fullname+"]")
Print("fulfixed     = "+"["+fulfixed+"]")

and the result ...

familyname   = [AbaddonTM]
$fullname    = [Abaddon™]
fulfixed     = [Abaddon™]

familyname   = [AbaddonTM]
$fullname    = [Abaddon"]
fulfixed     = [Abaddon™]

(Output separated for readability and brackets were for identifying
leading and trailing spaces)

So it seems obvious that it is converting the Unicode symbol to a double
quote, but is this the intended behavior, or am I doing something wrong?
Personally I'd rather have it removed altogether if it can't set it to
the Unicode character, or at least have a hash table to convert to regular
ASCII.

Any input would be most appreciated.
--
  akovia
m***@ansuz.sooke.bc.ca
2015-12-17 22:19:28 UTC
Post by akovia
familyname   = [AbaddonTM]
$fullname    = [Abaddon"]
fulfixed     = [Abaddon™]
So it seems obvious that it is converting the unicode symbol to a double
quote, but is this the intended behavior, or am I doing something wrong?
I think it's taking the low byte of the Unicode value. The trademark
symbol is U+2122 and the quote is U+0022, and on trying with some other
non-ASCII values, it seems to consistently take the low byte of others as
well. I think it's likely that the field cannot contain non-ASCII values,
but this certainly seems like the wrong way to enforce that requirement.
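
The low-byte arithmetic described above can be checked with a few lines of Python. This is only an illustration of the observed truncation, not FontForge's actual code; the helper name low_byte is made up for the example.

```python
def low_byte(ch: str) -> str:
    """Return the character whose code point is the low 8 bits of ch's code point."""
    return chr(ord(ch) & 0xFF)


# U+2122 (trade mark sign) keeps only 0x22, which is the ASCII double quote.
for ch in ["\u2122", "\u20ac", "\u0100"]:
    truncated = low_byte(ch)
    print(f"U+{ord(ch):04X} -> U+{ord(truncated):04X} {truncated!r}")
```

Note that a character such as U+0100 would truncate to U+0000, which is one more reason this is a poor way to enforce an ASCII-only field.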

In several other places, FontForge attempts to "translate" non-ASCII
characters to ASCII using hardcoded (undocumented, inconsistent)
lookup tables, and that was my first guess as to what might be going on,
but there doesn't seem to be such a translation going on in this
particular bit of code.
--
Matthew Skala
***@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/
akovia
2015-12-18 00:47:02 UTC
Post by m***@ansuz.sooke.bc.ca
familyname   = [AbaddonTM]
$fullname    = [Abaddon"]
fulfixed     = [Abaddon™]
So it seems obvious that it is converting the unicode symbol to a double
quote, but is this the intended behavior, or am I doing something wrong?
I think it's taking the low byte of the Unicode value. The trademark
symbol is U+2122 and the quote is U+0022, and on trying with some other
non-ASCII values, it seems to consistently take the low byte of others as
well.
That's interesting and sounds probable.
Post by m***@ansuz.sooke.bc.ca
I think it's likely that the field cannot contain non-ASCII
values,
True.
It won't let you generate fonts via the GUI unless the name conforms to
the spec.
Post by m***@ansuz.sooke.bc.ca
but this certainly seems like the wrong way to enforce that requirement.
Agreed.
Post by m***@ansuz.sooke.bc.ca
In several other places, FontForge attempts to "translate" non-ASCII
characters to ASCII using hardcoded (undocumented, inconsistent)
lookup tables,
I didn't know this. I was actually trying to figure out how to use my
own lookup table but I saw no easy way to script it. Access to some form
of regex would be helpful.

Are you aware of any other command-line tools that can modify these
values in a TTF or OTF, or any other solution?
--
akovia

------------------------------------------------------------------------------
m***@ansuz.sooke.bc.ca
2015-12-18 04:21:35 UTC
Post by akovia
Post by m***@ansuz.sooke.bc.ca
In several other places, FontForge attempts to "translate" non-ASCII
characters to ASCII using hardcoded (undocumented, inconsistent)
lookup tables,
I didn't know this. I was actually trying to figure out how to use my
own lookup table but I saw no easy way to script it. Access to some form
of regex would be helpful.
In native scripting, there is some support for plain substring search via
Strstr() and Strcasestr(). Python scripting can use Python's standard re
module for regular expressions.
Adding native-script support for regular expressions would probably
require either adding another library dependency, or implementing regex
from scratch (which is the sort of thing George would have done, but
fortunately didn't before he left).
Post by akovia
Are you aware of any other command-line tools that can modify these
values in a TTF or OTF, or any other solution?
If I didn't want to do the string processing in a .pe script I think I'd
do it externally in Perl and then pass the results into the script as a
command-line parameter, for the script to attach the value to the font.
But one could also use TTX to convert the font file to XML, edit the XML
with whatever scripting tools are convenient, and then convert back.
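
As a rough sketch of doing the string processing outside the .pe script, the same idea works in Python instead of Perl. The replacement table and the function name fix_name are hypothetical; the table holds only a few sample mappings and anything not listed is simply dropped.

```python
import re

# Hypothetical replacement table: extend it with whatever mappings you want.
REPLACEMENTS = {
    "\u2122": "TM",  # trade mark sign
    "\u00ae": "R",   # registered sign
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
}

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def fix_name(name: str) -> str:
    """Replace non-ASCII characters via the table, drop unknown ones, trim spaces."""
    replaced = NON_ASCII.sub(lambda m: REPLACEMENTS.get(m.group(0), ""), name)
    return replaced.strip()


print(fix_name(" Abaddon\u2122 "))  # AbaddonTM
```

The cleaned string could then be handed to the native script as a command-line parameter, as suggested above, so the script only has to attach the value to the font.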
--
Matthew Skala
***@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/

------------------------------------------------------------------------------
akovia
2015-12-18 15:54:54 UTC
Post by m***@ansuz.sooke.bc.ca
Post by akovia
Post by m***@ansuz.sooke.bc.ca
In several other places, FontForge attempts to "translate" non-ASCII
characters to ASCII using hardcoded (undocumented, inconsistent)
lookup tables,
I didn't know this. I was actually trying to figure out how to use my
own lookup table but I saw no easy way to script it. Access to some form
of regex would be helpful.
In native scripting, there is some support of plain substring search via
Strstr() and Strcasestr(). Python probably contains some regex stuff.
Adding native-script support for regular expressions would probably
require either adding another library dependency, or implementing regex
from scratch (which is the sort of thing George would have done, but
fortunately didn't before he left).
I played around with Strstr a bit. I wasn't able to figure out an
efficient way to handle all possible cases, like multiple illegal glyphs,
without some huge code for each character. I also found it odd that I
couldn't match a string via some sort of notation (0x2122, U+2122, etc.)
instead of the actual glyph.
Post by m***@ansuz.sooke.bc.ca
Post by akovia
Are you aware of any other command-line tools that can modify these
values in a TTF or OTF, or any other solution?
If I didn't want to do the string processing in a .pe script I think I'd
do it externally in Perl and then pass the results into the script as a
command-line parameter, for the script to attach the value to the font.
But one could also use TTX to convert the font file to XML, edit the XML
with whatever scripting tools are convenient, and then convert back.
Well I am still a newbie to scripting in general. This whole endeavour
is as much about learning as it is the end result.
I've dabbled with both Perl and Python, but I'm still in the Bash bush
league for now. Whenever I run into snags like these I always assume
that I'm just missing something so it was great of you to confirm what I
thought was happening. I'll just keep plugging away and see if I can
find a solution that is within my grasp.

Cheers!
--
akovia

------------------------------------------------------------------------------
m***@ansuz.sooke.bc.ca
2015-12-18 17:12:27 UTC
Post by akovia
I played around with Strstr a bit. I wasn't able to figure out an
efficient way to handle all possible cases, like multiple illegal glyphs,
without some huge code for each character. I also found it odd that I
couldn't match a string via some sort of notation (0x2122, U+2122, etc.)
instead of the actual glyph.
It should be possible to use the Utf8() function to convert a code to a
string consisting of only that character, for use in Strstr(). And
although I mentioned Strstr() as a possible partial replacement for the
missing regex support, what I'd really do if I were specifically looking
to find and fix non-ASCII characters is use the Ucs4() function to convert
the string into an array of integers, loop through the integers (looking
in particular for any outside the range 0 to 127) and make any desired
edits, and then use Utf8() to convert it back to a string. That's probably
the most painless way to do it within native scripting.
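
The convert, loop, and convert-back approach described above might look roughly like this in Python. It is an analogue only: ord() and chr() stand in for the native Ucs4() and Utf8() functions, and the function name replace_non_ascii and its substitute parameter are invented for the example.

```python
def replace_non_ascii(name: str, substitute: str = "") -> str:
    """Swap every code point outside 0..127 for the substitute string."""
    codes = [ord(ch) for ch in name]  # string -> list of integer code points
    kept = []
    for code in codes:
        if 0 <= code <= 127:
            kept.append(code)  # plain ASCII, keep as-is
        else:
            kept.extend(ord(ch) for ch in substitute)  # edit the offending value
    return "".join(chr(code) for code in kept)  # code points -> string


# Building a one-character search string from a numeric code,
# then using it in a plain substring search.
TRADE_MARK = chr(0x2122)
print("Abaddon\u2122".find(TRADE_MARK))
print(replace_non_ascii("Abaddon\u2122", "TM"))
```

Working on integers avoids writing a separate substring search for each problem character, since a single range test catches them all.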
--
Matthew Skala
***@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/

------------------------------------------------------------------------------
akovia
2015-12-18 18:50:19 UTC
Post by m***@ansuz.sooke.bc.ca
Post by akovia
I played around with Strstr a bit. I wasn't able to figure out an
efficient way to handle all possible cases, like multiple illegal
glyphs, without some huge code for each character. I also found it
odd that I couldn't match a string via some sort of notation (0x2122,
U+2122, etc.) instead of the actual glyph.
It should be possible to use the Utf8() function to convert a code to
a string consisting of only that character, for use in Strstr().  And
although I mentioned Strstr() as a possible partial replacement for
the missing regex support, really, I think what I'd do if I were
specifically looking to find and fix non-ASCII characters, would be
use the Ucs4() function to convert the string into an array of
integers, loop through the integers (looking in particular for any
outside the range 0 to 127) and make any desired edits, and then use
Utf8 to convert it back to a string. That's probably the most painless
way to do it within native scripting.
Thanks for the tips! I will check them out for sure.

I was looking through the source and found what you were talking about.
https://github.com/fontforge/fontforge/blob/master/Unicode/unialt.c

It's a shame they couldn't expose this table for scripting use.
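
For anyone doing the cleanup in Python rather than native scripting, the standard unicodedata module offers a comparable ready-made mapping. This is not FontForge's unialt table, just Unicode compatibility decomposition, and the function name to_ascii is made up for the sketch.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Decompose compatibility characters, then drop anything still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Abaddon\u2122"))  # AbaddonTM
```

Characters with no ASCII decomposition are silently removed, so a custom table may still be needed for symbols where a specific replacement is wanted.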

Thanks again for all the help!
--
  akovia
