Downloading and Extracting TGN explicit.zip from http://vocab.getty.edu/doc/#Export_Files

Bob St. Clair

Oct 13, 2016, 7:57:20 PM
to Getty Vocabularies as Linked Open Data
I've downloaded this file. The page I downloaded this from reports that the extracted size would be 13.8GB. However, looking at the zip file, it looks like the extracted size would be in the exabyte range (if I've got my decimal places correct). Is there something wrong with the zip file, or is the reported extract size woefully out of date?

Thanks!

Vladimir Alexiev

Oct 14, 2016, 4:39:45 AM
to Getty Vocabularies as Linked Open Data
The page is a bit out of date: it says the zip is 661 MB, but it's in fact 1024 MB. Assuming the same compression ratio, the extracted size would be 21.4 GB. If the zip says exabytes, it may be corrupted. I'm downloading it now and will check.
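The estimate above is a simple proportion, and can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation assumes the compression ratio has not changed since the page was written:

```python
def estimate_extracted_gb(listed_zip_mb, listed_extracted_gb, actual_zip_mb):
    """Scale the listed extracted size by the growth of the zip file.

    Assumes the compression ratio is the same as when the page was written.
    """
    gb_per_zip_mb = listed_extracted_gb / listed_zip_mb
    return gb_per_zip_mb * actual_zip_mb


# Figures from the thread: page lists 661 MB zip and 13.8 GB extracted;
# the zip is actually 1024 MB.
estimate = estimate_extracted_gb(661, 13.8, 1024)
print(f"{estimate:.1f} GB")  # prints "21.4 GB"
```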

BTW consider downloading the Total Exports (see last section). They are 1.9x bigger for TGN, but otherwise you'll be missing the inferred statements (see http://vocab.getty.edu/doc/#Inference).

Vladimir Alexiev

Oct 14, 2016, 5:29:33 AM
to Getty Vocabularies as Linked Open Data
The extracted size as of Apr 2015 was 6 GB. This means very significant growth for TGN: 3-4 times in 18 months.

Vladimir Alexiev

Oct 14, 2016, 5:33:17 AM
to Getty Vocabularies as Linked Open Data
TGN had 1475679 places in Apr 2015, and now has 2495100: a growth of 1.7x in 18 months.

Vladimir Alexiev

Oct 14, 2016, 12:19:52 PM
to Getty Vocabularies as Linked Open Data
The zip extracts to 25.2 GB, so the file is good.

Bob St. Clair

Oct 17, 2016, 1:52:31 PM
to Getty Vocabularies as Linked Open Data
Hi Vladimir,
Thanks for looking into this. When I download from the link on the page, it's still trying to extract a 1.43 exabyte set of files. Did you download from the page link, or from the backend behind the scenes? I'm using Windows and its built-in extract capability.

Thanks,
Bob

Vladimir Alexiev

Oct 19, 2016, 12:22:23 PM
to Getty Vocabularies as Linked Open Data
I have downloaded from the link listed in the page: http://vocab.getty.edu/dataset/tgn/explicit.zip. Guess you can call that "the backend".
Try with a proper extractor, e.g. WinRAR.

Bob St. Clair

Oct 19, 2016, 12:47:32 PM
to Getty Vocabularies as Linked Open Data
Hi Vladimir,
Thanks - that worked. I'll let Satya Nadella know his extractor is "improper" and maybe he can fix it in the next version of Windows.

Vladimir Alexiev

Nov 7, 2016, 10:06:19 AM
to Getty Vocabularies as Linked Open Data
> his extractor is "improper"

See what the British Library says (https://data.bl.uk/lodbnb/bnb1.html):
> Due to the extreme sizes of some of the zip archives (often well in excess of 4GB), we recommend using a dedicated zip archive application, such as 7-zip, to open and extract these datasets.
> For example, the built-in zip archive handling in Microsoft Windows (ie Right-click to 'Extract Here') is not designed to handle these sizes and will throw errors, even suggesting (falsely) that the archive is corrupt.
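
As an alternative way to check what size an archive really declares, without relying on the Windows built-in handler, the sizes recorded in the zip's directory can be summed with Python's standard `zipfile` module, which reads ZIP64 archives (the format used beyond the 4 GB limit). This is only a minimal sketch and was not part of the original exchange; the function names are hypothetical.

```python
import zipfile


def total_uncompressed_bytes(zip_path):
    """Sum the uncompressed sizes recorded for every member of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def human_readable(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} EB"
```

Calling `human_readable(total_uncompressed_bytes("explicit.zip"))` on the downloaded file (the local path here is an assumption) reports the size the archive itself declares, which can then be compared with what an extractor claims.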