UFO 4 File Structure

Tal Leming

Apr 4, 2015, 10:31:19 PM
to ufo-...@googlegroups.com
Hi Everyone,

As has been discussed in passing on this list, we are considering a big file structure change for the UFO in UFO 4. This is in the very early stages and we need community involvement. We had a UFO file structure meeting after Robothon last month and began the planning for our research work. If you are interested in the task ahead, please read on...

The current UFO file structure has served us well, but it is experiencing some growing pains. Dropbox and Dropbox-like services aren't tuned to handle the high volume of files inside the UFO structure. That makes these popular and useful services painfully slow or broken when working with large numbers of small UFOs or small numbers of large UFOs. It's also not possible to email a UFO without an additional archiving/compression step that often confuses users. Plus, Windows doesn't recognize the UFO as a file type. These problems are all related to the package structure that the UFO utilizes. Switching to a single file, rather than a package, might solve these problems. Rather than pick one format and hope that it will work, we're going to investigate several candidates with systematic "apples to apples" comparisons.

# Possible Formats

We have discussed several formats so far and have some strong candidates to test:

- Zip: A compressed version of the current UFO structure.
- "Database": A simple key/value database, possibly with some additional data for glyphs that can be queried (Unicode value, etc.). SQLite emerged as a favorite at our meeting, but others like MongoDB may be worth exploring.
- Single XML File: A single, flat XML representation of the current package structure. This could be regular XML or something like KML.
- Shallower Package: A package, but without single files for individual glyphs. So, fewer files overall.
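
To make the "Database" candidate a little more concrete, here is a minimal sketch of a key/value store with one queryable glyph column, using the `sqlite3` module from the Python Standard Library. The table name, column names and function names are all hypothetical and chosen only for illustration; nothing here is a proposed schema.

```python
import sqlite3

# Hypothetical schema: one row per file of the current package, keyed by its
# relative path, plus an optional code point column that can be queried.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    path      TEXT PRIMARY KEY,
    codepoint INTEGER,
    data      TEXT NOT NULL
)
"""


def open_store(location=":memory:"):
    """Open (or create) the store and make sure the table exists."""
    connection = sqlite3.connect(location)
    connection.execute(SCHEMA)
    return connection


def write_entry(connection, path, data, codepoint=None):
    """Store the XML text for one path, replacing any previous version."""
    connection.execute(
        "INSERT OR REPLACE INTO entries (path, codepoint, data) VALUES (?, ?, ?)",
        (path, codepoint, data),
    )
    connection.commit()


def read_entry(connection, path):
    """Return the stored XML text for a path, or None if it is missing."""
    row = connection.execute(
        "SELECT data FROM entries WHERE path = ?", (path,)
    ).fetchone()
    return None if row is None else row[0]


def paths_for_codepoint(connection, codepoint):
    """Return the paths of all glyph entries mapped to a code point."""
    rows = connection.execute(
        "SELECT path FROM entries WHERE codepoint = ? ORDER BY path", (codepoint,)
    )
    return [row[0] for row in rows]
```

The XML text is stored unchanged, so existing GLIF and Property List parsing code would not need to change under this assumption; only the place the text is read from and written to would.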

# Evaluation Criteria

We have some criteria that we can use to evaluate the candidates. Some of these are absolute requirements:

- Human readable: Inaccessible binary files are not acceptable.
- Extensible: The format must be easy to add to in the future.
- Implementation ease: Adding support for this new format must not be difficult for developers. It needs to more or less work with an off-the-shelf code library, preferably one that is part of a core language distribution (e.g., the Python Standard Library).
- History: The file structure specification needs to be established and maintained, and it needs a track record of significant real-world use. It would be catastrophic if we picked some brand new, awesome thing and then the developer of that file structure got bored and left it to rot.

Some of these are "it would be great if" criteria:

- Speed: How long does it take to read and write? Faster would be better.
- Improved version control compatibility: UFO has been very useful in version control situations. We need to make sure that we don't break that and ideally we will be able to improve it.
- Inter-app communication: Users have become accustomed to having the same UFO open in multiple applications at the same time. It's not a practice that I'm 100% comfortable with, but users love the flexibility it gives them. It would be good if we could maintain this functionality. It would be even better if the new file structure enabled more stability in this practice.

# Evaluation Plan

Evaluating these things in a systematic way with hard data is going to be key. We will need to test the candidates by implementing them, evaluating file sizes and complexity, and timing read/write speed in different real-world environments. That last thing is going to require input from as many designers and foundries as will humor us. We will build a tool that will allow these folks to run the environment tests on their own specific hardware setups and report the results back to us. Essentially, the user would run the tool, the tool would ask for a directory where it can write some temporary files, it would chug along for probably a long time, then output some data, and the user would send us the output.

I have started the infrastructure for all of this. It is here:

https://github.com/unified-font-object/ufo4Research/tree/master/fileStructureTests

I won't bore you with all of the details, so here's the bird's eye view: Each of the candidate file structures will be expressed as a subclass of a base "file system" class (called BaseFileSystem). BaseFileSystem sets the required I/O API needed for UFOs and implements all of this except the direct file structure I/O behavior. In other words, it knows how to read from/write to XML and where the files should be located, but it doesn't know how to actually store the data in a file structure. The candidate file structure subclasses must implement that because, obviously, that is going to be file structure specific. On top of all of this is a basic UFO reader/writer (forked from RoboFab and simplified for our purposes) and a set of font/layer/glyph/... objects. The reader/writer takes a file system during init and blissfully reads from/writes to it without needing to know anything about the actual file structure.
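
The division of labor described above can be sketched roughly as follows. This is not the code in the repository: the class names, method names and the in-memory subclass are all hypothetical, and the location scheme skips the glyph-name-to-file-name conversion that real UFOs use. It only illustrates the idea of a base class that handles XML and locations while a subclass handles storage.

```python
import xml.etree.ElementTree as ET


class SketchBaseFileSystem:
    """Shared logic: knows about XML and locations, not about storage."""

    def glyph_location(self, layer_name, glyph_name):
        # Simplified for illustration; real UFOs convert glyph names to
        # file names with a dedicated algorithm.
        return f"{layer_name}/{glyph_name}.glif"

    def write_glyph(self, layer_name, glyph_name, element):
        data = ET.tostring(element, encoding="utf-8")
        self.write_bytes(self.glyph_location(layer_name, glyph_name), data)

    def read_glyph(self, layer_name, glyph_name):
        data = self.read_bytes(self.glyph_location(layer_name, glyph_name))
        return ET.fromstring(data)

    # Storage hooks: each candidate file structure implements only these.
    def write_bytes(self, location, data):
        raise NotImplementedError

    def read_bytes(self, location):
        raise NotImplementedError


class InMemoryFileSystem(SketchBaseFileSystem):
    """A stand-in candidate that stores everything in a dictionary."""

    def __init__(self):
        self._storage = {}

    def write_bytes(self, location, data):
        self._storage[location] = data

    def read_bytes(self, location):
        return self._storage[location]
```

A zip, SQLite or single-XML candidate would, under this sketch, replace only the two storage hooks, which is what keeps the comparison between candidates fair.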

This will allow subclasses to be implemented easily. For example, as a control for the experiments I've written a UFO3FileSystem that reads/writes the UFO 3 package structure. Here are all 51 lines of the code:

https://github.com/unified-font-object/ufo4Research/blob/master/fileStructureTests/ufo3.py

This subclass system will also allow us to make "apples to apples" comparisons since the only thing that the file structure candidates will need to have implemented is their own specific I/O. The UFO behavior, XML interpretation and all of that is centralized and shared.

This is still a work in progress, but it is nearly complete. I'll put full documentation on the UFO 4 Research wiki as soon as the code is ready for others to work with.

## Benchmarks

We will need to establish a set of common interactions between authoring tools and UFOs that we can test. Here are some:

- reading an entire UFO
- writing an entire UFO
- reading a small number of glyphs from a UFO
- writing a small number of glyphs into an existing UFO
- reading a large number of glyphs, but not all, from a UFO
- writing a large number of glyphs, but not all, into an existing UFO

We will need to test all of these with a large collection of UFOs. And, we will need to test these in various hardware configurations. Our test suite can be expressed as an equation like this:

each candidate file structure * authoring tool interactions * test UFOs * hardware configurations = results

It's probably going to be a lot of data. That will be useful.
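
The equation above can be read as a nested loop over every combination, run once per hardware configuration. Here is a minimal sketch of that loop, assuming each candidate is a zero-argument factory, each interaction is a callable that takes a file system and a test UFO, and each test UFO is a dictionary with a "name" key. All of these shapes are hypothetical, not the API of the test suite.

```python
import itertools
import time


def run_matrix(candidates, interactions, test_ufos, hardware_label):
    """Time every candidate * interaction * test UFO combination.

    candidates:   dict mapping a name to a factory returning a file system
    interactions: dict mapping a name to a callable(file_system, ufo)
    test_ufos:    list of dicts, each with at least a "name" key
    """
    results = []
    combinations = itertools.product(
        candidates.items(), interactions.items(), test_ufos
    )
    for (candidate_name, make_file_system), (interaction_name, interaction), ufo in combinations:
        file_system = make_file_system()
        start = time.perf_counter()
        interaction(file_system, ufo)
        elapsed = time.perf_counter() - start
        results.append({
            "candidate": candidate_name,
            "interaction": interaction_name,
            "ufo": ufo["name"],
            "hardware": hardware_label,
            "seconds": elapsed,
        })
    return results
```

The number of result rows is the product of the three list lengths, so even a modest set of candidates, interactions and test UFOs adds up quickly once several machines report back.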

## Test Cases

Fonts come in all different shapes and sizes these days and workflows vary quite drastically. We need representative tests of as many of these as we can find. We need tiny UFOs, large UFOs, production UFOs, in-progress UFOs, etc. We need as many designers and foundries to contribute to this collection as possible. Obviously, we can't place all of these fonts in a public repository. However, for our purposes, we don't need UFOs that contain pretty outlines, meaningful kerning pairs, etc. We just need UFOs that contain representative amounts of data. There is an easy solution to this. We can develop a tool that will dump abstract, anonymous statistical data (the number of glyphs, the number of contours, the number of points and so on) about a foundry or designer's typefaces and we can use these reports to generate UFOs. I've started a Python script that does this:

https://github.com/unified-font-object/ufo4Research/blob/master/fontSizeReporter/fontSizeReporter.py

This is not complete and I think of new things that it should do every hour or so. It only works on OTFs, but it should also work with UFOs in RoboFont, Glyphs files and any other environment where we can automate file reading. We also need to figure out the exact data that we want to extract. I need some help with all of that. But, I digress. Here is what the current tool generates from my own fonts:

https://github.com/unified-font-object/ufo4Research/blob/master/fileStructureTests/core/fontData.txt

From these reports, we can generate representative UFOs. This also prevents us from needing to store a huge number of actual, huge UFOs in our Git repository. When we are running our tests, we can generate the UFOs on the fly as needed and tear them down when the testing is complete. Here is a generator that works with the report format from above:

https://github.com/unified-font-object/ufo4Research/blob/master/fileStructureTests/core/fonts.py
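
To illustrate the idea of generating representative data from counts alone, here is a minimal sketch that builds a GLIF-style glyph element from a contour count and a point count. It does not use the report format linked above; the function name, its parameters, the fixed advance width and the coordinate range are all assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET


def make_glyph(name, contour_count, points_per_contour, seed=0):
    """Build a glyph element with the requested amount of outline data.

    The coordinates are meaningless; only the volume of data matters.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    glyph = ET.Element("glyph", name=name, format="2")
    ET.SubElement(glyph, "advance", width="500")
    outline = ET.SubElement(glyph, "outline")
    for _ in range(contour_count):
        contour = ET.SubElement(outline, "contour")
        for _ in range(points_per_contour):
            ET.SubElement(
                contour,
                "point",
                x=str(rng.randint(0, 1000)),
                y=str(rng.randint(0, 1000)),
                type="line",
            )
    return glyph
```

Seeding the random generator means every tester would generate identical UFOs from the same report, which keeps timing results comparable across machines.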

# In Conclusion

So, we have a lot of work ahead of us, but we have a plan. If anyone has any suggestions for possible file structures, please let us know. If anyone wants to help with building the infrastructure or core test suite, please let us know. If you have thoughts on the testing process, please let us know. Your input is important.

# One Last Thing

I've heard the complaints about XML, Property Lists*, GLIF and all of that. If you think we should use a format/file structure/whatever that is completely different from what we have used in UFO 1-UFO 3, please post it to this list. The directions that we are currently looking into are pragmatic. There is a decade's worth of code in multiple codebases that reads and writes the data formats that are currently in the UFO spec. If we dramatically change the file structure, the code in these codebases will have to be rewritten. I am reluctant to make that kind of a spec change because it would mean a significant amount of work for many developers and, thus, chaos. I'd rather figure out how to handle the growing pains that UFO is facing with the current XML formats. That said, if you have a file structure that you think is good enough to warrant dramatic code changes, please present it.

Thanks,
Tal

*Seriously. I got it. This is the last time I'm going to write about Property Lists... It was 2003 and we needed a stable, text-based, programming language neutral, archive quality object serializer that was easy to implement. Property Lists were what we had, so we used them. No, they are not human readable if your definition of "human readable" is "open it up in a plain text editor and very quickly see and understand everything without any knowledge of the data structures." Yes, they are "human readable" if your definition of "human readable" is that a human can open one in a text editor, read the text, change the text, save and have the file work. The bottom line is that what we had at the time were undocumented binary files that regularly went corrupt. Property Lists may not be pretty, but they are documented and relatively easy to fix when something goes wrong. So, we solved the problem that we were facing then. Now, please stop complaining about Property Lists. It's a waste of everyone's time. What's done is done. :)

rrob...@adobe.com

Apr 6, 2015, 1:33:00 PM
to ufo-...@googlegroups.com
If you want a large test UFO font, you can use the AFDKO 'tx' tool to convert one of the open source Source Han Sans faces to UFO: this gets you 65535 real .glif files ('tx -ufo SourceHanSans-Bold.otf test.ufo'; takes about 4 minutes).

Working in a development group, my contacts have no problem with compressing/decompressing UFO fonts, and source code control is a major issue. Having many small text files is a big advantage with many source code control systems, so I am much happier with the current structure than with a single file. I suggest that the next version support both the current package structure and a single file version. People who care which is used will mostly know how to convert between them as needed.

I am fond of zip: it is widely supported, and popular text editors like BBEdit allow you to open a zip archive file and edit individual files directly (although you do have to add a mapping for '.glif' to XML in Preferences->Languages).
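
For anyone who wants to try the zip idea from Python, here is a minimal sketch using the standard `zipfile` module. The function names and the dictionary-of-paths representation are hypothetical. One assumption is worth stating loudly: `zipfile` has no call for replacing a member in place, so this sketch changes a single file by rewriting the whole archive, which is relevant to any "write a few glyphs" timing test.

```python
import io
import zipfile


def write_zip_ufo(entries):
    """Pack a dict of {relative path: XML text} into zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, text in entries.items():
            archive.writestr(path, text)
    return buffer.getvalue()


def read_zip_entry(zip_bytes, path):
    """Read one file out of the archive without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(path).decode("utf-8")


def replace_zip_entry(zip_bytes, path, text):
    """Return new zip bytes with one entry added or replaced.

    Every entry is read and written again, because the archive is rebuilt.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        entries = {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }
    entries[path] = text
    return write_zip_ufo(entries)
```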