Application X-gzip File Extension


Rita Seliba

Aug 4, 2024, 6:25:55 PM
to verbstearulhot
Hypothetically, if a tarball were an official media type that followed conventions, its MIME type would be application/tar (file extension .tar) and its compressed version would be application/tar+gzip (file extensions .tar.gz and .tgz).

When the user right-clicks and saves this file (in Chrome) I would like the resulting file's name to be download.js.gz. This way when the user double clicks this file (in Mac OS X), it gets decompressed and renamed correctly to download.js, and they can view the contents easily.


The problem is that when I use the content type text/javascript (or application/x-javascript), the file gets saved as download.js. And when I use the content type application/x-gzip, the file gets saved as download.gz.


There doesn't seem to be a way to do this in browsers besides Chrome. Therefore, the best option I could think of is to tell the user to save the file normally (so that the browser automatically adds the extension based on the content type), and then have them go into Finder and rename the file to append the '.gz' extension.
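One thing that may help with the file name problem above is the Content-Disposition response header, which lets the server suggest a file name independently of the content type. The sketch below is only an illustration under assumptions: the helper name `download_headers`, the handler class, and the sample payload are hypothetical, and whether a given browser honors the suggested name for a right-click save (as opposed to a normal download) is not something this thread confirms.

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer


def download_headers(filename: str, body: bytes) -> list[tuple[str, str]]:
    """Build headers that ask the browser to save `body` under `filename`."""
    return [
        # application/gzip is the registered media type for gzip data.
        ("Content-Type", "application/gzip"),
        # The suggested file name is independent of the content type.
        ("Content-Disposition", f'attachment; filename="{filename}"'),
        ("Content-Length", str(len(body))),
    ]


class GzipDownloadHandler(BaseHTTPRequestHandler):
    # Hypothetical payload: a small gzipped JavaScript file.
    payload = gzip.compress(b"console.log('hello');\n")

    def do_GET(self):
        self.send_response(200)
        for name, value in download_headers("download.js.gz", self.payload):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(self.payload)


# To try it locally, uncomment the next line and open http://127.0.0.1:8000/
# HTTPServer(("127.0.0.1", 8000), GzipDownloadHandler).serve_forever()
```

With the header in place the server can keep a gzip content type while still proposing the double extension.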


gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU (from where the "g" of gzip is derived). Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.


gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. DEFLATE was intended as a replacement for LZW and other patent-encumbered data compression algorithms which, at the time, limited the usability of the compress utility and other popular archivers.


Although the gzip file format allows multiple compressed streams to be concatenated (the concatenated members are simply decompressed as if they were originally one file),[5] gzip is normally used to compress single files.[6] Compressed archives are typically created by assembling collections of files into a single tar archive (also called a tarball),[7] and then compressing that archive with gzip. The final compressed file usually has the extension .tar.gz or .tgz.
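The concatenation behavior can be checked with Python's standard `gzip` module. This is a minimal sketch; the two sample byte strings are made up for illustration.

```python
import gzip

# Compress two pieces separately; each result is a complete gzip member.
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world\n")

# Concatenating the members yields a valid multi-member gzip file.
combined = first + second

# Decompression returns the members' contents joined together.
restored = gzip.decompress(combined)
print(restored)
```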


gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression). The gzip file format is also not to be confused with that of the compress utility, based on LZW, with extension .Z; however, the gunzip utility is able to decompress .Z files.[8]


Various implementations of the program have been written. The most commonly known is the GNU Project's implementation using Lempel-Ziv coding (LZ77). OpenBSD's version of gzip is actually the compress program, to which support for the gzip format was added in OpenBSD 3.4. The 'g' in this specific version stands for gratis.[9] FreeBSD, DragonFly BSD and NetBSD use a BSD-licensed implementation instead of the GNU version; it is actually a command-line interface for zlib intended to be compatible with the GNU implementations' options.[10] These implementations originally come from NetBSD, and support decompression of bzip2 and the Unix pack format.


An alternative compression program achieving 3-8% better compression is Zopfli. It achieves gzip-compatible compression using more exhaustive algorithms, at the expense of compression time required. It does not affect decompression time.


Data in blocks prior to the first damaged part of the archive is usually fully readable. Data from blocks not demolished by damage that are located afterward may be recoverable through difficult workarounds.[12]


The tar utility included in most Linux distributions can extract .tar.gz files by passing the z option, e.g., tar -zxf file.tar.gz, where -z instructs decompression, -x means extraction, and -f specifies the name of the compressed archive file to extract from. Optionally, -v (verbose) lists files as they are being extracted.[13]
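For comparison, the same create-and-extract cycle can be sketched with Python's standard `tarfile` module, where the `"w:gz"` and `"r:gz"` modes play the role of the `-z` option. The directory and file names below are hypothetical, and `extractall` should only be used on archives you trust.

```python
import tarfile
import tempfile
from pathlib import Path


def roundtrip(workdir: Path) -> tuple[list[str], str]:
    """Create file.tar.gz from a small directory, then extract it again."""
    src = workdir / "src"
    src.mkdir()
    (src / "a.txt").write_text("alpha\n")
    (src / "b.txt").write_text("beta\n")

    archive = workdir / "file.tar.gz"
    # "w:gz" writes a tar archive compressed with gzip.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="src")

    out = workdir / "out"
    out.mkdir()
    # "r:gz" reads it back, like `tar -zxf file.tar.gz`.
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(out)

    return names, (out / "src" / "a.txt").read_text()


with tempfile.TemporaryDirectory() as tmp:
    names, text = roundtrip(Path(tmp))
print(names)
```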


zlib is an abstraction of the DEFLATE algorithm in library form which includes support both for the gzip file format and a lightweight data stream format in its API. The zlib stream format, DEFLATE, and the gzip file format were standardized respectively as RFC 1950, RFC 1951, and RFC 1952.
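Python's `zlib` module exposes the three formats through the `wbits` parameter, which makes the relationship easy to see. A small sketch, with made-up sample data and a hypothetical helper name `deflate`:

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress `data`; `wbits` selects the container around DEFLATE."""
    compressor = zlib.compressobj(level=9, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


data = b"gzip, zlib and raw DEFLATE share one algorithm. " * 20

raw = deflate(data, -15)  # raw DEFLATE stream (RFC 1951), no header
zl = deflate(data, 15)    # zlib stream format (RFC 1950)
gz = deflate(data, 31)    # gzip file format (RFC 1952)

# Each container decompresses back to the same bytes.
assert zlib.decompress(raw, wbits=-15) == data
assert zlib.decompress(zl, wbits=15) == data
assert zlib.decompress(gz, wbits=31) == data

# gzip data starts with the magic bytes 1f 8b; zlib data starts with 0x78
# when the default 32 KiB window is used.
print(gz[:2].hex(), hex(zl[0]))
```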


The gzip format is used in HTTP compression, a technique used to speed up the sending of HTML and other content on the World Wide Web. It is one of the three standard formats for HTTP compression as specified in RFC 2616. This RFC also specifies a zlib format (called "DEFLATE"), which is equal to the gzip format except that gzip adds eleven bytes of overhead in the form of headers and trailers. Still, the gzip format is sometimes recommended over zlib because Internet Explorer does not implement the standard correctly and cannot handle the zlib format as specified in RFC 1950.[14]


Since the late 1990s, bzip2, a file compression utility based on a block-sorting algorithm, has gained some popularity as a gzip replacement. It produces considerably smaller files (especially for source code and other structured text), but at the cost of memory and processing time (up to a factor of 4).[15]


Research published in 2023 showed that simple lossless compression techniques such as gzip could be combined with a k-nearest-neighbor classifier to create an attractive alternative to deep neural networks for text classification in natural language processing. This approach has been shown to equal and in some cases outperform conventional approaches such as BERT due to low resource requirements, e.g. no requirement for GPU hardware.[16]
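The general idea can be sketched in a few lines: use compressed length as a rough measure of shared information, turn it into a distance, and let the nearest labeled examples vote. The code below is an illustrative approximation, not the published implementation; the function names, the toy training set, and the choice of `k` are all assumptions.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length of the gzip-compressed text, in bytes."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training texts."""
    nearest = sorted(training, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, far too small for meaningful accuracy.
training = [
    ("the team won the match with a late goal", "sports"),
    ("the striker scored twice in the final game", "sports"),
    ("the new laptop ships with a faster processor", "tech"),
    ("the phone update improves battery and camera software", "tech"),
]
label = classify("the goalkeeper saved a penalty in the match", training)
print(label)
```

No training step or GPU is involved; the cost is one compression per comparison, which is where the low resource requirements come from.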


I would like to check with you guys here in AWS forum the issue that I discovered a while ago. Basically I'm trying to test our backup recovery process. Below is my simple flow of our backing up process.


However, when I downloaded the tar.gz file from S3, I noticed a problem. In the S3 file list on the AWS Console the file has the correct tar.gz extension (e.g. backup1.tar.gz), but after the download my backup file becomes backup1.tar (not tar.gz), so when I try to decompress the backup it is not usable or readable.

I'm not sure whether this is an error at my OS level, but either way it leaves the user very confused about whether the backup process is working.


I did aws s3 sync some/local/folder s3://my-special-bucket/some/local/folder

and some of the files that got uploaded were .tar.gz files. Well, AWS decided that those should receive the metadata Content-Type: application/x-tar instead of Content-Type: application/x-gzip. Highly aggravating. I had to write a script to list every file in my bucket and then fix the metadata for any file with the .tar.gz extension.
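A script along those lines might look like the sketch below. It is not the poster's script: the function name is hypothetical, and it assumes `s3` is a boto3 S3 client (for example `boto3.client("s3")`). It relies on the `list_objects_v2` paginator and on `copy_object` with `MetadataDirective="REPLACE"`, which rewrites an object's metadata by copying it onto itself. Note that REPLACE discards any metadata not passed again (including Content-Encoding), and that a single `copy_object` call only handles objects up to 5 GB.

```python
def fix_tar_gz_content_type(s3, bucket: str, prefix: str = "") -> list[str]:
    """Set Content-Type: application/x-gzip on every .tar.gz key.

    `s3` is assumed to be a boto3 S3 client. Returns the keys that were fixed.
    """
    fixed = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".tar.gz"):
                continue
            # Copy the object onto itself, replacing its metadata.
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                ContentType="application/x-gzip",
                MetadataDirective="REPLACE",
            )
            fixed.append(key)
    return fixed
```

Usage would be something like `fix_tar_gz_content_type(boto3.client("s3"), "my-special-bucket")`, ideally tried on a test prefix first.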


I have yet to see anyone explain how to download a tar.gz from an S3 bucket without AWS changing the format to a .tar and changing the config of the files. I need help figuring out how to download a large (1.7 GB) tar.gz file.


If you wouldn't mind creating an issue on the AWS-CLI's GitHub ( -cli/), that will help us track this better. Including the version of your AWS CLI (aws --version) and brief setup steps for reproduction would be really useful.


When I took meta$mime_type, parsed it, and appended it to the extracted file name, the wrong extension appeared. For instance, in the log file (matching on the file ID to make sure it was the same file) the MIME type was jpeg, and the extracted file was indeed an image, but the file name contained html: it was extract_ID_html when it should have been extract_ID_jpeg. So I need the exact MIME type of the file. I also tried to take it from the log file using a global value, but that did not give the right type either; it seems to be the same as in the extracted file name.


I may have suggested this before, but have you looked into the hosom/file-extraction package, or installed it via zkg to see whether it provides the behavior you want? It contains a table mapping some known MIME types to file extensions, and I wonder if this is what you need.


I have few relatively large datasets as gzipped JSON files, with extension .json.gz. The hugo development server serves these files with the application/x-gzip MIME type. But, I would like them to be served as application/json files, because then the browser will automatically decompress them and consider them as normal JSON files. This plays more nicely with jQuery AJAX calls etc.
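What the poster describes corresponds to sending the stored bytes unchanged with Content-Type: application/json plus Content-Encoding: gzip, so the client treats gzip as a transfer detail rather than as the file type. The sketch below shows only that header pairing; it is not Hugo configuration, and the helper name and sample data are hypothetical.

```python
import gzip
import json


def gzipped_json_response(stored: bytes) -> tuple[list[tuple[str, str]], bytes]:
    """Headers and body for serving the bytes of a .json.gz file as JSON."""
    headers = [
        # The media type describes the data that was encoded...
        ("Content-Type", "application/json"),
        # ...and the encoding tells the client to gunzip it first.
        ("Content-Encoding", "gzip"),
        ("Content-Length", str(len(stored))),
    ]
    return headers, stored


# Hypothetical contents of a dataset.json.gz file.
stored = gzip.compress(json.dumps({"rows": [1, 2, 3]}).encode("utf-8"))
headers, body = gzipped_json_response(stored)

# A browser does this step automatically when it sees Content-Encoding: gzip.
decoded = json.loads(gzip.decompress(body))
print(decoded)
```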


Additional question: what's the correct MIME type for tar.bz2/tbz2 files? Different sources gave me all kinds of different answers: application/x-gtar, application/x-compressed-tar, application/x-bzip-compressed-tar, application/x-tar-bz2, application/x-bzip2, etc. Same for tar.gz/tgz.


Note: Compression schemes like "gzip", "bzip", and "compress" are not actually "mime-types". They are "encodings" and hence must not have entries in this file to map their extensions. The "mime-type" of an encoded file refers to the type of data that has been encoded, not the type of encoding.


MinIO Server supports compressing objects to reduce disk usage. Objects are compressed on PUT before writing to disk, and uncompressed on GET before they are sent to the client. This makes the compression process transparent to client applications and services.


Depending on the type of data, compression may also increase overall throughput. Write throughput for a production deployment is generally 500 MB per second or greater per available CPU core in the system. Decompression is approximately 1 GB per second or greater for each CPU core.


Some data cannot be effectively compressed, such as video or already compressed data. MinIO does not compress common incompressible file types, even if they are specified in the compression configuration.


MinIO supports encrypting compressed objects but recommends against combining compression and encryption without a prior risk assessment. Before enabling encryption for compressed objects, carefully consider the security needs of your environment.
