New Leo "Package" File Format?

Thomas Passin

Mar 29, 2023, 1:48:09 PM
to leo-editor
There is a perennial problem when one wants to give a  Leo outline to someone else.  It happens when an outline contains external files, or images to display, or any other data files that might be needed.  For example, an article written with the Viewrendered3 plugin in mind, or for a Sphinx document, must have its resources available or it cannot work.

If the outline contains @file trees and these external files aren't included in, e.g., a zip file, those files will be blank when the recipient opens the outline.  Yes, one can change the @files to @clean and re-save them all.  But that is awkward, and negates the reason for having them be @files in the first place.

Otherwise, one is forced to create a package file - usually a zip file - that contains the outline and any required external files and subdirectories.

Current software, such as LibreOffice or Word, handles this by saving its files as archives that contain all the external resources a document needs.  I suggest that Leo needs a similar capability.  This would not replace the existing Leo file format nor the existing outline save commands.  It would add new Save/Open Archive commands.

How might this work?  For a save, Leo would check each external file and each @rst tree to get their paths, and then compress the external resources and at-files into the archive.  For other resources, such as images in, say, an images directory, an outline could have a new kind of node, perhaps with an @resources headline, that specifies what subdirectories and files to include.  Perhaps there could be more than one @resources node in the document.
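
A minimal sketch of that save step, in Python since Leo is written in it. Everything here is an assumption for illustration: `save_archive` and its parameters are hypothetical names, it does not use Leo's API, and it expects the caller to have already collected the external file paths and the entries of any @resources node as paths relative to the outline's directory.

```python
import zipfile
from pathlib import Path


def save_archive(archive_path, base_dir, outline_name, external_files, resource_paths):
    """Zip the outline, its external files, and listed resources (hypothetical helper).

    All paths except archive_path are relative to base_dir. A resource path
    may name a single file or a directory; directories are added recursively.
    Returns the archive member names in the order they were written.
    """
    base = Path(base_dir)
    members = [base / outline_name]
    members += [base / name for name in external_files]
    for resource in resource_paths:
        path = base / resource
        if path.is_dir():
            # Include every file below the directory, in a stable order.
            members += sorted(p for p in path.rglob("*") if p.is_file())
        else:
            members.append(path)

    names = [m.relative_to(base).as_posix() for m in members]
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member, name in zip(members, names):
            # Store paths relative to the outline so the tree can be rebuilt elsewhere.
            zf.write(member, arcname=name)
    return names
```

Storing relative paths is the design choice that matters: it lets the recipient unpack the tree anywhere and still have the outline find its files.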

To open an archive, Leo would expand the archive, which would create all the directories and files. Then it would load the .leo outline contained in the archive.
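
The open step could be sketched like this. Again, this is only an illustration under assumptions: `open_archive` is a hypothetical name, the archive is assumed to be a zip file holding exactly one .leo outline, and actually loading the outline into Leo is left to the caller.

```python
import zipfile
from pathlib import Path


def open_archive(archive_path, target_dir):
    """Expand a package and return the path of its outline (hypothetical helper)."""
    target = Path(target_dir)
    with zipfile.ZipFile(archive_path) as zf:
        outlines = [name for name in zf.namelist() if name.endswith(".leo")]
        if len(outlines) != 1:
            raise ValueError(f"expected exactly one .leo file, found {len(outlines)}")
        # Recreates the directories and files stored in the archive.
        zf.extractall(target)
    return target / outlines[0]
```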

David Szent-Györgyi

Apr 1, 2023, 5:04:41 PM
to leo-editor
I have seen this function referred to as "Pack and Go" in other software that is widely used - SolidWorks, AutoCAD for starters. 

Mike Hodson

Apr 1, 2023, 10:06:06 PM
to leo-e...@googlegroups.com


On Wed, Mar 29, 2023, 11:48 Thomas Passin <tbp1...@gmail.com> wrote:
There is a perennial problem when one wants to give a  Leo outline to someone else.  It happens when an outline contains external files, or images to display, or any other data files that might be needed.  For example, an article written with the Viewrendered3 plugin in mind, or for a Sphinx document, must have its resources available or it cannot work.

<Snip>

This is the main reason why Audacity decided to move to SQLite as its file saving format in v3.0. Someone would have an AUP file and not every one of the .au additional files to go along with it and wonder where their project went when they tried to move it.

I assume they wanted something that was simply usable in the program through an extant API. However, this does have the issue of not releasing space from the save file until an unsaved session is completely closed. [Truth be told I'm not a huge fan of the SQLite format myself, and would much prefer a MySQL server in almost every use case.]

It seems it took Audacity ~2 years to implement this as a lasting format change, from the point it was mentioned as a pull request until it became cemented in version 3 of the program.

Having a zip file of all file assets would work as well with some implementation of a VFS by the program utilizing it I presume.



Mike

David Szent-Györgyi

Apr 2, 2023, 3:24:02 PM
to leo-editor
On Saturday, April 1, 2023 at 10:06:06 PM UTC-4 mys...@gmail.com wrote:
Having a zip file of all file assets would work as well with some implementation of a VFS by the program utilizing it I presume.

Given use of the appropriate codec, Zip files are easily packed or unpacked. 

The zlib codec provides for superior compression, and is open source; it can be built for operating systems that do not bundle it. 
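
For what it's worth, Python's standard `zipfile` module uses zlib for its `ZIP_DEFLATED` method, so packing and unpacking need nothing beyond the standard library. A small in-memory round trip, with `pack` and `unpack` as hypothetical helper names:

```python
import io
import zipfile


def pack(files):
    """Compress a {name: bytes} mapping into zip-format bytes using deflate (zlib)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def unpack(blob):
    """Return the {name: bytes} mapping stored in zip-format bytes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```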

Thomas Passin

Apr 2, 2023, 3:55:31 PM
to leo-editor
That's good to know.  It seems to me that the main challenge would be for Leo to know just what to have in the package.  External files would be easy, but for example image files - how to know about them could be a real challenge.  I'm thinking that an outline could contain an @resources node, where the user could add anything that Leo didn't know about.  Not ideal, but perhaps necessary.

Edward K. Ream

Apr 2, 2023, 4:44:13 PM
to leo-e...@googlegroups.com
On Sun, Apr 2, 2023 at 2:55 PM Thomas Passin <tbp1...@gmail.com> wrote:
That's good to know.  It seems to me that the main challenge would be for Leo to know just what to have in the package.  External files would be easy, but for example image files - how to know about them could be a real challenge.  I'm thinking that an outline could contain an @resources node, where the user could add anything that Leo didn't know about.  Not ideal, but perhaps necessary.

I'll create a new command soon. A first draft will include all @<file> nodes.

Edward

David Szent-Györgyi

Apr 2, 2023, 5:18:24 PM
to leo-editor
On Sunday, April 2, 2023 at 3:55:31 PM UTC-4 tbp1...@gmail.com wrote:
It seems to me that the main challenge would be for Leo to know just what to have in the package.  External files would be easy, but for example image files - how to know about them could be a real challenge.  I'm thinking that an outline could contain an @resources node, where the user could add anything that Leo didn't know about.  Not ideal, but perhaps necessary.
 
If you limit the scope of your work to compression and decompression of files, you might consider the libraries available for 7-Zip - support for operating systems other than Windows requires one of the variants described there. If you care about handling individual images or metadata from them, your task is much greater and a great challenge.

I know something of that challenge, since I earn my living supporting software for life science microscopy. The number of formats used in that field is enormous, and the requirements that must be met during acquisition are distinct from those that apply thereafter for retrieval and analysis. Acquisition can involve a great number of individual images, enough that efficient writing to disk and reading back from disk can require a number of individual files, with a separate file that describes the entire dataset.

Not that you would necessarily wish to use the formats designed for life science microscopy or the open source software available for reading and writing them, but here are links that might be of interest.

OME-TIFF and OME-Big-TIFF: these support individual files with a great number of images; the OME-Big-TIFF variant supports files larger than four gigabytes. These, among others, are described under "OME Model and File Formats". Information specific to OME-TIFF is available; documentation for the OME-TIFF file structure is available also. 

Bio-Formats is a standalone Java library for reading and writing life sciences image file formats. It is capable of parsing both pixels and metadata for a large number of formats, as well as writing to several formats. C++ code is available; I cannot speak to its condition and compliance with the current standard for the format. 

Thomas Passin

Apr 2, 2023, 5:41:00 PM
to leo-editor
Now this is interesting!  I wasn't considering huge image files, mostly just "ordinary" ones like photos, screenshots, or graphs that would be common images to want to include with, for example, a markdown document.  I'll read up on your links.  Thanks!

Edward K. Ream

Apr 2, 2023, 11:21:42 PM
to leo-e...@googlegroups.com
On Sat, Apr 1, 2023 at 4:04 PM David Szent-Györgyi <das...@gmail.com> wrote:
I have seen this function referred to as "Pack and Go" in other software that is widely used - SolidWorks, AutoCAD for starters. 

See #3238. I'm going to keep this simple for now. The new command will only add the .leo file and its external files to the .zip file.

Edward

David Szent-Györgyi

Apr 3, 2023, 12:18:51 PM
to leo-editor
Now this is interesting!  I wasn't considering huge image files, mostly just "ordinary" ones like photos, screenshots, or graphs that would be common images to want to include with, for example, a markdown document.  I'll read up on your links.  Thanks!

A consideration that didn't occur to me initially: the archive file format might have a limit on file size. The archive file library might also have a limit on that. The original ZIP file format limited archive size to 4 GB; I read that the ZIP64 format extension raises that limit to 16 exabytes(!), and that Windows Vista and its successors build support for ZIP64-sized archives into Windows Explorer/File Explorer, and that macOS Sierra's built-in Archive Utility does not support ZIP64. 
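
As a rough illustration of where those limits bite: the classic ZIP format stores sizes and offsets in 32-bit fields and the entry count in a 16-bit field, so a packaging command could check up front whether a package would need ZIP64. The sketch below is deliberately conservative and its names are hypothetical; it uses uncompressed sizes as an upper-bound estimate rather than inspecting a real archive.

```python
MAX_32BIT = 0xFFFFFFFF   # largest size or offset a classic ZIP field can hold
MAX_ENTRIES = 0xFFFF     # largest entry count a classic ZIP field can hold


def needs_zip64(file_sizes):
    """Estimate whether files of these sizes (in bytes) would require ZIP64.

    Conservative: treats the sum of uncompressed sizes as the archive size,
    so it may answer True for data that would compress below the limit.
    """
    sizes = list(file_sizes)
    return (
        len(sizes) > MAX_ENTRIES
        or any(size > MAX_32BIT for size in sizes)
        or sum(sizes) > MAX_32BIT
    )
```

A command could use such a check to warn the user that the resulting package may not open with tools that lack ZIP64 support.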

Limits imposed by the file system can also be a problem. NTFS has no issues, but FAT32 limits file size to 4 GB; I reformat FAT32 thumb drives to exFAT format, which lifts that to 2^64-1 bytes(!).

I don't want to think about cross-platform issues handling metadata such as NTFS attributes. 

I find that Windows 7 Windows Explorer and Windows 10 File Explorer clutter the Windows file cache when I ask them to process Zip archives of many hundreds of files or multiple gigabyte Zip files; basic operations slow to a crawl as a result, leading me to reboot Windows to recover the performance lost. I don't see such performance losses when I use 7-Zip instead. 

Offray Vladimir Luna Cárdenas

Apr 24, 2023, 4:48:43 PM
to leo-e...@googlegroups.com

Hi,

What we do is package the files associated with a data narrative as a Fossil[1] repository, with versioned and unversioned files. Versioned files are used for the ones where we want to track the history, and unversioned files are used for raster files (for example, PDF outputs for the data stories or PNG/JPG images). This gives us a pretty simple infrastructure to exchange and publish our resources, which can travel in a single file. It's like .zip but with history.

[1] https://fossil-scm.org

I think that, via Fossil, SQLite as an application format is a pretty powerful tool. Some related docs and videos below:

My 100 pesos,

Offray
