Sorry for the slow update, I was busy having a relatively non-
weekend. Anyway, I've now placed the current version of my HOTU
rebuild online at http://hotu.pratyeka.org/
So I guess the major question is - what makes this one different?
Well, as well as making everything multilingual-friendly (no small feat), I've
made URL-friendly 'codes' for each of the games, platforms, companies and
people, which enables pretty / SEO friendly / human readable URLs.
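For anyone curious what generating such a 'code' might look like, here is a minimal sketch. It is not the site's actual code: the function name make_code and the exact rules (lowercase, hyphens, ASCII only) are assumptions made for illustration.

```python
import re
import unicodedata


def make_code(name: str) -> str:
    """Turn a display name into a lowercase, hyphen-separated URL code."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
```

Note that this simple version returns an empty string for names written entirely in a non-Latin script, so a multilingual site would need a fallback (for example a numeric ID) for those.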
The site uses AJAX/JSON to load and cache the lists of platforms,
caching, the memcache daemon and lighttpd to speed up load times and
make the database easily searchable via any aspect.
Gone are the multi-screen results of the previous site: if you search for
a certain group of games, they will all show immediately on one results page.
Data import so far includes everything from the Excel sheet, plus all
static text data and box/screenshots remaining on the Swiss mirror
to reformat / fix old links, mostly normalise formatting, etc).
so far inaccessible, but I will add them ASAP.
During import I used regular expressions to further categorise game
into various types to make them a little more useful / delineated.
we can also track which games have or don't have various types of
additional resources, so that we can more easily locate or write them.
The list is: map/patch/source/original/remake/tools/crack/demo/guide
(includes walkthroughs, solutions, guides)/reference.
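As a rough illustration of regex-based categorisation along these lines (a sketch only: the keyword patterns below are guesses, not the ones used during the import, and only some of the types are covered):

```python
import re

# Hypothetical keyword patterns, checked in order; the first match wins.
RESOURCE_PATTERNS = [
    ("map", re.compile(r"\bmaps?\b", re.IGNORECASE)),
    ("patch", re.compile(r"\bpatch(es)?\b", re.IGNORECASE)),
    ("source", re.compile(r"\bsource\b", re.IGNORECASE)),
    ("remake", re.compile(r"\bremake\b", re.IGNORECASE)),
    ("crack", re.compile(r"\bcrack\b", re.IGNORECASE)),
    ("demo", re.compile(r"\bdemo\b", re.IGNORECASE)),
    ("guide", re.compile(r"\b(walkthrough|solution|guide)s?\b", re.IGNORECASE)),
    ("reference", re.compile(r"\breference\b", re.IGNORECASE)),
]


def categorise(description):
    """Return the first matching resource type, or None if nothing matches."""
    for label, pattern in RESOURCE_PATTERNS:
        if pattern.search(description):
            return label
    return None
```

Anything that comes back as None would be left uncategorised for a human to look at.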
I also similarly categorised URLs which were clearly one of: reviews,
I've also added an experimental placeholder for DOSBox compatibility
information; in the future this will be auto-harvested from the DOSBox
page (data ripper only needs to be run once per DOSBox release,
compatibility information will auto-generate an appropriate DOSBox
version download link).
Ideally we could also add DOSBox configurations if required; there are
libraries of these out there already...
In testing I noticed that the boxshots archive linked to on the
of the group was definitely incomplete, so I ripped the entire Swiss mirror
(>800MB!) and then used Unix tools to gather all of the jpg files and put
them in one directory, which mostly fixed the problem.
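The gathering step was done with Unix tools; for anyone who wants to repeat it without a shell, a rough Python equivalent might look like this. The function name and the keep-the-first-copy rule for duplicate file names are assumptions, not a description of what was actually run.

```python
import shutil
from pathlib import Path


def gather_jpgs(mirror_root, dest_dir):
    """Copy every .jpg/.jpeg under mirror_root into dest_dir; return the count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(Path(mirror_root).rglob("*")):
        # Skip directories and anything that is not a JPEG by extension.
        if not path.is_file() or path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        target = dest / path.name
        if target.exists():
            # Same file name found in two places: keep the first copy.
            continue
        shutil.copy2(path, target)
        copied += 1
    return copied
```

The destination should live outside the mirrored tree, otherwise the walk will also visit the files it has just copied.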
There are still some missing box shots, which have filenames in the
Excel sheet but which were not preserved on the Swiss mirror. I will
publish a list of the affected games should anyone want to go hunting.
Another problem was incorrect naming. In the Excel sheet, 'related
games' are sometimes not the exact, current names of the corresponding
games. Coupled with the fact that related game IDs are missing, this
is a bit of a pain. I've used a partial-match mechanism to resolve most
of these, but there is still a shortlist I've detected that still needs to
be manually fixed.
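To give an idea of how a partial match like this can work, here is a small sketch using Python's standard difflib module. This is a stand-in technique rather than the mechanism actually used for the import; the function name, the substring step and the 0.6 cutoff are all assumptions.

```python
import difflib


def resolve_related(name, known_titles, cutoff=0.6):
    """Return the best-matching known title, or None to flag for manual review."""
    # Compare case-insensitively, but return the title in its stored form.
    by_lower = {title.lower(): title for title in known_titles}
    wanted = name.lower().strip()
    if wanted in by_lower:
        return by_lower[wanted]
    # A unique substring hit catches shortened names.
    containing = [title for low, title in by_lower.items() if wanted and wanted in low]
    if len(containing) == 1:
        return containing[0]
    # Otherwise fall back to fuzzy matching, which tolerates small typos.
    close = difflib.get_close_matches(wanted, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None
```

Names that come back as None are the ones that would end up on the manual-fix shortlist.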
Over the next few days I will try to find the time to write a crawler
through the mirrored mirror's files (!!) and extract 'real dog'
rating, # rating votes, company intros, and anything else that appears
to be missing from the other data sources. This will hopefully
the import. (I'm sure this won't be difficult, it's just a matter of
There are a few known issues: platform information is somewhat
corrupt due to the Excel dump being textual (ie: "Windows X, Windows Y"
vs "Windows X" vs "Windows Y" entries being a pain to break up).
Around five different game records have weird issues due to some
kind of parsing bug, which causes a few strange bits of data to appear.
Finally, there is a problem with subgenre allocation so selecting a
subgenre only shows one game. These should all be fixed soon.
Please let me know what you all think about the rebuild so far, and let me
know if you can see any errors, since it's important to catch any
at the earliest possible stage (ie: before manual additions /
to the current database).
Also, I really liked Siddhartha's idea of considering the project as a
modern-day cultural archivism. It's probably fair to say that having
multilingual support is very important for the project if we are going to
consider the goal of the project to be preserving and sharing cultural
Amusingly, work on HOTU today was made possible by China's
'grave sweeping holiday' - a Confucian tradition about respect for
your ancestors. See http://en.wikipedia.org/wiki/Qingming_Festival
(one of the most famous ancient Chinese paintings, of the ancient
Kaifeng - with a prominent alcohol shop next to the bridge in the
OK, so that's it for now.
Stay well, be happy and respect your ancestors! :)