Andromeda 2020

0 views

Skip to first unread message

Cloris Sopha

unread,

Aug 3, 2024, 11:08:05 AM8/3/24

to jeyloutygec

I also commented out the line self.vocab = self.doctags in _upconvert_old_d2vkv, because the very next line calls a function which destroys self.vocab, whereas this line just triggers a NotImplementedError in attempting to reference self.vocab, which should render the assignment moot.

A screen (preceding any human interviews) via Triplebyte, which presented short blocks of code and asked multiple-choice questions about their outputs, flaws, et cetera. Questions were timed. Googling, documentation, REPLs, et cetera were disallowed.

I was worried this might be like Triplebyte, but no; they presented me with a description of a function they wanted to implement. This function was a simplified version of something they encounter in their day-to-day work, and they took the time to explain the context. The description included sample input and expected output.

Once upon a time in 2017, my colleague Andy Dorner, who is awesome at devops, made a magical deploy script for HAMLET. I was like, ugh, Heroku is not a good solution, I have to AWS, I regret my life choices, and he was like, never fear! I will throw together a script which will make it all test/build/deploy every time you git push, using magic.

(This is the linear, simplified version. In practice, picture two months of spare time flailing, git pushing to redeploy and find out what broke, leaving an absolute wreck of a git history, which it kills me not to clean up, but which I am leaving so that you feel less alone in your own desperate sallies against AWS.)

To my great shock, the sample application that Amazon provides for your brand-new Python instance is a pleasant-looking page that links to AWS Python and Django documentation. It was weirdly hospitable.

The most helpful thing here was connecting directly to the EC2 instance (you can do this with eb ssh but I just used the web console) and hand-running commands to see what happened, rather than making an .ebextensions or .platform/hooks change, deploying it, and then hunting through logs. This was also particularly helpful for dealing with packages issues; instructions for various install processes usually reference apt-get and Debian/Ubuntu, but I have yum and AL2, and packages often have slightly different names, and oy, just logging into the instance and doing some yum list is so much easier than guessing and hoping.

I needed to repoint my CNAME record for hamlet.andromedayelton.com from my old Beanstalk environment to the new one. The cryptic URL is clearly visible in the AWS web console for Beanstalk, and the hardest part of this was remembering where my CNAME record lived.

When last we met I was turning a perfectly innocent neural net into a terribly ineffective one, in an attempt to get it to be better at face recognition in archival photos. I was also (what cultural heritage technology experience would be complete without this?) being foiled by metadata.

What if I train yet another ML system to do some kind of k-means clustering on the embeddings? This is both a promising approach and some really first-rate yak-shaving, combining all the question-begging concerns of the previous paragraph with all the crystalline clarity of black box ML.

Sadly, because we cannot have nice things, the data sets used for pretrained face recognition embeddings are things like lots of modern photos of celebrities, a corpus which wildly underrepresents 1) archival photos and 2) Black people. So the results of the face recognition process are not all that great.

Gotcha 1: If you fetch a page from the API and assume you can treat its contents as an image, you will be sad. You have to treat them as a raw data stream and interpret that as an image, thusly:

Step 4: I realized that, honestly, almost everything in nst_utils wanted to be an ImageUtility, which was initialized with metadata about the content and style files (height, width, channels, paths), and carried the globals (shudder) originally in nst_utils as class data. This meant that my new dpla_cats script only had to import ImageUtility rather than * (from X import * is, of course, deeply unnerving), and that utility could pingpong around knowing how to do the things it knew how to do, whenever I needed to interact with image-y functions (like creating a generated image or saving outputs) rather than neural-net-ish stuff. Everything in nst_utils that properly belonged in an ImageUtility got moved, step by step, into that class; I think one or two functions remained, and they got moved into the main script.

content is the path to the content image; style is the path to the style image; iterations and learning_rate are the usual; layer_weights is the value of STYLE_LAYERS in the original code, i.e. how much to weight each layer; run_until_steady is a bad API because it means to ignore the value of the iterations parameter and instead run until there is no longer significant change in cost; and noisy_start is whether to use the content image plus static as the first input or just the plain content image.

Step 6a: Correspondingly, I configured the output filenames to record some of the metadata used to create the image (content, style, layer_weights), to make it easier to keep track of which images came from which script runs.