2020.01.26

Autoencoder


Source.

I did some more Keras dabbling since there was cold weather and time off. I continued down the autoencoder thread that I'd explored a bit previously.

To recap




Source.

Autoencoders are primarily used to perform image denoising (and as ML examples). AE compression does not stand up to purpose-built mathematical compression algorithms, but neural networks can learn graphical features and draw them into an image. An autoencoder can also generate/repair images by tweaking the 'latent space', which is basically the network's abstract idea of what is in the image.


I tried to draw my best understanding of a convolutional autoencoder on a cocktail napkin (with oversimplifications described later). An autoencoder might learn what a type of ball looks like and use that information to redraw it, given partial or complete information. The latent layer amounts to a classification or set of classifications.

The encoding portion consists of convolutional 'kernels' (m-by-n filters) that get dragged over the image and produce a 'feature map' that is basically how the image is seen by the kernel's understanding of the world. Convolution is necessary to see features in a position-independent way.
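As a rough illustration of a kernel being dragged over an image, here is a minimal NumPy sketch. It is not the Keras implementation; the function name `feature_map` and the toy image are made up for the example. It slides one kernel over a single-channel image with no padding and shows that the same blob gives the same response wherever it sits.

```python
import numpy as np

def feature_map(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            # Multiply the patch by the kernel elementwise and sum the result.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Toy 6x6 image with the same 2x2 bright blob in two places.
image = np.zeros((6, 6))
image[0:2, 0:2] = 1.0
image[3:5, 3:5] = 1.0

kernel = np.ones((2, 2))  # a hypothetical kernel that "recognizes" a 2x2 blob
fmap = feature_map(image, kernel)
print(fmap.shape)              # (5, 5)
print(fmap[0, 0], fmap[3, 3])  # 4.0 4.0, the same peak at both blob positions
```

A real convolutional layer does this for many kernels and many input channels at once, and learns the kernel values rather than having them hand-picked.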

The latent space reads all of these filters and decides what is in the image; kernels that don't recognize anything will not be as loud as ones that do. The decoder then takes this information and redraws the image using transpose convolution (roughly the reverse of the encoding portion, though not a true mathematical inverse). By comparing the input and output, the network can learn without classification or additional supervision.

The above drawing uses kernels that see an entire ball. In practice, they only recognize certain portions of each object. By having many layers (left to right), the interpretation of the image gets more abstract.

Troubleshooting



I don't quite remember where I left off with my previous attempt, but in giving it another go, I hit something that felt familiar: the network would produce a flat gray output image that seemed to be shooting the median pixel value for all of the training data. So I was starting from scratch.

I don't think using RMS loss was my issue; this is a pretty standard measure of error for networks that produce images. The optimizer matters, so I switched between Adagrad and RMSprop.

Most autoencoder examples use images no larger than ~120 pixels per side. This makes sense for a lot of applications (e.g. the standard MNIST number recognizer) and certainly cuts down on training time (especially relevant since most available code is sample code). My original goal was a bit higher: 512x512 squares with larger convolutional kernels. I was aiming to have a network with 40k-1M weights, and perhaps this simply isn't enough to handle such large input. Then again, the point of convolution is to be highly independent of input dimensions, so why would a 64x64 input be any worse than four 32x32s? Among other things, the answer is that each kernel produces a same-size feature map and that quickly eats up available memory.
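# Imports assumed (the snippet omits them); input_shape, input_channels,
# and output_channels are defined elsewhere.
from tensorflow.keras import Sequential, layers

m = Sequential()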
# 512x512
m.add(layers.Conv2D(8, (12, 12), activation='relu', padding='same', input_shape=input_shape))
m.add(layers.GaussianNoise(0.1))
m.add(layers.Conv2D(8, kernel_size=(8, 8), padding='same'))
m.add(layers.SpatialDropout2D(0.05))
m.add(layers.Dense(input_channels, activation='relu', kernel_initializer='glorot_uniform'))
m.add(layers.MaxPooling2D(2, 2))

# 256x256
m.add(layers.Conv2D(8, kernel_size=(6, 6), padding='same'))
m.add(layers.Conv2D(16, kernel_size=(4, 4), padding='same'))
m.add(layers.SpatialDropout2D(0.04))
m.add(layers.Dense(input_channels, kernel_initializer='glorot_uniform'))
m.add(layers.MaxPooling2D(2, 2))

# 128x128
m.add(layers.Conv2D(8, kernel_size=(4, 4), padding='same'))
m.add(layers.Conv2D(16, kernel_size=(3, 3), padding='same'))
m.add(layers.SpatialDropout2D(0.03))
m.add(layers.Dense(input_channels, activation='relu', kernel_initializer='glorot_uniform'))
m.add(layers.MaxPooling2D(2, 2))

# 64x64
m.add(layers.Conv2D(8, kernel_size=(3, 3), padding='same'))
m.add(layers.Conv2D(8, kernel_size=(2, 2), padding='same'))
m.add(layers.SpatialDropout2D(0.02))
m.add(layers.Dense(input_channels, activation='relu', kernel_initializer='glorot_uniform'))

# Flatten to latent space
m.add(layers.Flatten())

# Reshape to 64x64x(L or RGB), transpose convolution will then upscale.
m.add(layers.Reshape((64, 64, output_channels)))

# 64x64
m.add(layers.Conv2DTranspose(64, (1, 1), strides=(2, 2), activation='relu', kernel_initializer='glorot_uniform'))
m.add(layers.BatchNormalization())
m.add(layers.Dense(output_channels, activation='relu', kernel_initializer='glorot_uniform'))

# 128x128
m.add(layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), activation='relu', kernel_initializer='glorot_uniform'))      
m.add(layers.Cropping2D(((1, 1),(1, 1))))
m.add(layers.SpatialDropout2D(0.02))
m.add(layers.BatchNormalization())
m.add(layers.Dense(output_channels, activation='relu', kernel_initializer='glorot_uniform'))

# 256x256
m.add(layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), activation='relu', kernel_initializer='glorot_uniform'))      
m.add(layers.Cropping2D(((1, 1),(1, 1))))
m.add(layers.SpatialDropout2D(0.02))
m.add(layers.BatchNormalization())
m.add(layers.Dense(output_channels, activation='relu', kernel_initializer='glorot_uniform'))

# 512x512
m.add(layers.Conv2DTranspose(64, (1, 1), strides=(1, 1), activation='relu', kernel_initializer='glorot_uniform'))      
m.add(layers.BatchNormalization())

m.add(layers.Dense(output_channels, name='output'))
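To put rough numbers on the point that each kernel produces a same-size feature map, here is a back-of-the-envelope sketch. The helper name `feature_map_bytes` is made up, and it assumes float32 activations and 'same' padding. It ignores gradients, optimizer state, and framework overhead, so treat the results as a lower bound.

```python
def feature_map_bytes(height, width, filters, batch_size=1, bytes_per_value=4):
    """Memory for one layer's output: one same-size map per filter (float32)."""
    return height * width * filters * batch_size * bytes_per_value

# One 8-filter, 'same'-padded layer at the two tile sizes discussed here.
big = feature_map_bytes(512, 512, filters=8)
small = feature_map_bytes(32, 32, filters=8)

print(big / 2**20)    # 8.0   (MiB for a single 512x512 sample)
print(small / 2**10)  # 32.0  (KiB for a single 32x32 sample)
print(big // small)   # 256
```

Every layer's activations are kept for backpropagation and the total scales with batch size, so a 256x difference per layer per sample adds up quickly.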
Proceeding, for a time, with that image size naturally meant smaller training batches. I had hoped to address this simply by doing overnight runs, but after hours my output was still monotone gray and occasionally a checkerboard. I was expecting to see something resembling progress (e.g. blotchy noise) early on.

I pingponged between sample code. Everyone seems to do things a little differently - except for the people who just copy and paste the official Keras example and call it their own. It really doesn't help that so many examples are hardwired to use existing models or input data. Swapping snippets isn't hard; it just takes time to get everything rewired for a given method of doing input/batching/display/normalization.

I eventually dialed it down to 32x32 so I could hasten the trial and error process. Previous attempts used a bunch of noise/dropout layers that are generally good practice, but I was concerned some of these might be cranked up too high. Certainly upon re-reading SpatialDropout2D, I realized that cutting an entire feature map from a layer with eight kernels might be a bit heavy-handed. I had quite a few batch normalizers in there as well.
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 32, 32, 16)        784
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 16)        1040
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 32, 32, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 16)        2320
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 32, 16)        1040
_________________________________________________________________
dropout (Dropout)            (None, 32, 32, 16)        0
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 16)        64
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 35, 35, 32)        8224
_________________________________________________________________
cropping2d (Cropping2D)      (None, 33, 33, 32)        0
_________________________________________________________________
batch_normalization_1 (Batch (None, 33, 33, 32)        128
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 36, 36, 32)        16416
_________________________________________________________________
cropping2d_1 (Cropping2D)    (None, 32, 32, 32)        0
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 32)        128
_________________________________________________________________
dense (Dense)                (None, 32, 32, 512)       16896
_________________________________________________________________
dense_1 (Dense)              (None, 32, 32, 3)         1539
=================================================================
Total params: 48,579
Trainable params: 48,419
Non-trainable params: 160
I wasn't sure about the latent layer so I removed that, having seen a number of examples that simply went from convolution to transpose convolution.

What I really noodled on was the output layer(s). The decoder portion of the network is a bunch of transpose convolutional layers that take abstract information and redraw the features that were abstracted by the encoder. So as the decoder traces out a bunch of feature maps from latent space (and maybe enlarges them), you'll end up with a bunch of semi-images that must be recombined. This seems like a crucial step and all of the examples I saw seemed to have a different approach but not much in the way of an explanation of how.
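For the shape bookkeeping in the decoder, here is a small sketch of how a transpose convolution followed by cropping changes the image size. The formula is my understanding of what Keras does for `padding='valid'` with no `output_padding`; the helper names are made up. A stride of 1 for the first transpose layer and a one-pixel crop are assumptions inferred from the 35 and 33 shapes in the 32x32 model summary above.

```python
def conv_transpose_length(size, kernel, stride):
    """Output length of a 'valid'-padded transpose convolution."""
    return size * stride + max(kernel - stride, 0)

def cropped(size, before, after):
    """Output length after cropping pixels from each end."""
    return size - before - after

# Walk the decoder shapes reported in the 32x32 model summary.
size = 32
size = conv_transpose_length(size, kernel=4, stride=1)  # 35
size = cropped(size, 1, 1)                               # 33
size = conv_transpose_length(size, kernel=4, stride=1)  # 36
size = cropped(size, 2, 2)                               # 32
print(size)  # 32
```

With a stride of 2 the same formula roughly doubles the size, which is how the larger model steps from 64 up to 512.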

A common approach is an n-channel dense or convolutional layer that takes all the feature maps (width, height, channels, kernels) and spits out an image. This makes sense - the layer uses each kernel's output to decide whether or not to fire. I have, however, been wary of the model summaries that show dense layers having not a lot of weights, particularly when troubleshooting an autoencoder that produces no detail.

I found another implementation that used a single-kernel (per channel) 3x3 convolutional layer. While this makes sense for output shape, it seems like you're producing an output image by dragging a 3x3 conv box across a bunch of feature maps. This feels like a massive dumbing down of the elaborately-constructed data.
m.add(layers.Conv2DTranspose(32, (4, 4), strides=(1, 1), activation='relu', kernel_initializer='glorot_uniform'))
m.add(layers.Cropping2D(((2, 2),(2, 2))))
m.add(layers.BatchNormalization())
m.add(layers.Dense(512, activation='relu', kernel_initializer='glorot_uniform'))
m.add(layers.Dense(output_channels, activation='linear', kernel_initializer='glorot_uniform'))
What I settled on for the back end of the transpose convolutional block was a dense layer with a lot of units. Typically you'll see units = 3 for RGB and 1 for monochrome, since units sets the dimensionality of the output space (here, the channel count). I'm sure this is the right approach for many cases and takes advantage of the magic of a dense (weight-shared, not fully-connected) layer, but I was worried about bottlenecks and concerned by the number of parameters associated with the dense(3) layer.

More precisely, I used dense(units=a lot, relu) followed by dense(channels, linear). The first, large, relu layer was meant to combine all of the transpose convolutional feature maps in a manner that allowed a large number of knobs to be turned by the magic of neuroscience, training, and CUDA. Then an output-size final layer to decide how to interpret the mess of activations before it. Linear activation is scary since it's combining a bunch of inputs, but makes sense for a 0-255 output.
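The parameter counts behind that worry are easy to check by hand. A dense layer applied to an image-shaped tensor acts on the last (channel) axis only and reuses the same weights at every pixel, much like a 1x1 convolution, so its size does not depend on the image dimensions. A minimal sketch, with a made-up helper name:

```python
def dense_params(in_features, units):
    """Weights plus biases for a dense layer applied to the last axis."""
    return in_features * units + units

wide = dense_params(32, 512)   # dense(512) fed by 32 feature maps
out = dense_params(512, 3)     # dense(3) producing RGB from those 512 units
direct = dense_params(32, 3)   # the more typical dense(3) straight off the maps

print(wide, out, direct)  # 16896 1539 99
```

The first two numbers match the `dense` and `dense_1` rows of the model summary; the third shows how few knobs a lone dense(3) would have had.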

Trying it out




I have a few data sets to choose from:
Using my Java graphics library, I wrote an image sampler and set it upon my Horizon: Zero Dawn screencap directory to produce 1000ish 32x32 squares. It started noisy, then got better and better:

40/40 [==============================] - 66s 2s/step - loss: 0.0131 - val_loss: 0.0096
Epoch 12/16
40/40 [==============================] - 66s 2s/step - loss: 0.0132 - val_loss: 0.0096
Epoch 13/16
40/40 [==============================] - 65s 2s/step - loss: 0.0132 - val_loss: 0.0096
Epoch 14/16
40/40 [==============================] - 65s 2s/step - loss: 0.0131 - val_loss: 0.0096
Epoch 15/16
40/40 [==============================] - 67s 2s/step - loss: 0.0131 - val_loss: 0.0095
Epoch 16/16
40/40 [==============================] - 66s 2s/step - loss: 0.0130 - val_loss: 0.0095
After an hour or so with a batch size of 160, I was getting down into the 0.00x loss territory. In retrospect, this was despite the fact that my input/output training sets were being independently rotated/mirrored. Whoops. Regardless, output from these small pieces started looking like input, but neither really looked like anything since they were 1/2000th of an image.
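One way to avoid the independent rotate/mirror mistake is to pick a single random transform and apply it to both the input and the target. This is a minimal NumPy sketch, not the sampler or generator actually used here; `augment_pair` is a made-up name and square tiles are assumed.

```python
import numpy as np

def augment_pair(x, y, rng):
    """Apply one randomly chosen rotation/mirror to both input and target."""
    k = int(rng.integers(0, 4))      # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))  # whether to mirror left-right

    def transform(img):
        img = np.rot90(img, k, axes=(0, 1))
        return np.fliplr(img) if flip else img

    return transform(x), transform(y)

rng = np.random.default_rng(0)
tile = rng.random((32, 32, 3))       # stand-in for one sampled square
x_aug, y_aug = augment_pair(tile, tile.copy(), rng)
print(np.array_equal(x_aug, y_aug))  # True: input and target stay aligned
```

For a plain autoencoder the target is the input, so it is simpler still to augment once and use the result for both.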


So the next thing was to write a script to apply the autoencoder to an entire image, square-by-square. Not only would this reconstruct something that actually looked like a thing, but it'd be a good test of whether the autoencoder was overfitting the training set. The output looked way better than a flat gray png.


The square-by-square application left obvious borders in the decoded tiles. This was likely because edges of each tile have less data than the middle portions. The simple solution was to modify the stitcher to crop the few boundary pixels and change the step size to remove the gaps. The right way to do this, however, is to have a 32x32 autoencoder only produce a 24x24 output, thereby saving on network size and computation.
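The crop-and-overlap fix can be sketched as follows. This is not the actual stitcher script; `stitch`, `model_fn`, and the 4-pixel margin are assumptions, with the margin chosen so that a 32x32 tile contributes a 24x24 center. An identity function stands in for the trained autoencoder.

```python
import numpy as np

def stitch(image, model_fn, tile=32, margin=4):
    """Run `model_fn` on overlapping tiles and keep only each tile's center.

    Assumes the image size lets the tiles fit exactly; the outer `margin`
    pixels of the image are never written and stay zero.
    """
    step = tile - 2 * margin  # 24 for a 32x32 tile with a 4-pixel margin
    out = np.zeros_like(image)
    h, w = image.shape[:2]
    for y in range(0, h - tile + 1, step):
        for x in range(0, w - tile + 1, step):
            decoded = model_fn(image[y:y + tile, x:x + tile])
            center = decoded[margin:tile - margin, margin:tile - margin]
            out[y + margin:y + tile - margin,
                x + margin:x + tile - margin] = center
    return out

# 80 = 32 + 2 * 24, so three tiles per side fit exactly.
image = np.random.default_rng(0).random((80, 80, 3))
result = stitch(image, model_fn=lambda t: t)
print(np.array_equal(result[4:-4, 4:-4], image[4:-4, 4:-4]))  # True
```

Stepping by the center size instead of the tile size is what removes the gaps: neighboring tiles overlap by twice the margin, and only the better-informed middle of each one is kept.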

Next steps




There are a few things to try next:

Other people's (neural) fails




My phone automatically tags images (thanks). Pretty much everything on it is a memebank (because using gif services is cheating). This has resulted in a few awesome fails. Apparently weapons look a lot like musical instruments and a heart looks like a fruit. Stormtroopers are basically mannequins - we all knew that anyway - and jazz addicts use their (jazz?) hands.


On the flip side, I could go for a sculpture of Christian Pulisic riding a dragon.


Fox is clearly using a deepfake merge of Donald Trump and Rudy Giuliani for its contributors. Nice try.


Perhaps I simply don't appreciate it, but it seems like some of these neural algorithms are pretty overhyped. We've already covered neural style transfer, though I kind of like that one. I ran the default example of Deep Dream; it's... a crappy kaleidoscope. And both kind of cheat by modifying the original image with an activated style layer - escaping the problems with generating a wholly new image and making the application of their algorithm extremely slow.
Tool


Lee scored some tickets to Tool earlier this month. It was a great show. They were strict about use of cellphones (which was kind of nice) until the encore. I grabbed a couple shots and then went back to enjoying it.



Gloomhaven


The Unnatural Ones' list of exploits keeps growing.

GBES


The Society explored the Miralani Makers District this month.


Thunderhawk Alements was standard Miramar fare, Serpentine Cider was - well - cider. I wasn't especially fond of most of the sakes at Setting Sun, but the place had style.


I happened upon a poster and wanted to know more. From Katherina Michael's Amazon review:
Ben Adams is a surfer who finds himself responsible for neutralizing a terrorist threat. Well, obviously he’s not just any surfer… he’s a former FBI agent and also felon, who’s now retired and living a life of surfing and drinking beers on the beach. Trying to stay off the radar, he keeps to himself, with few — if any — friends, and a loyal dog. But the FBI has some leverage from his past, and has decided that his recklessness is a special talent, so has forced him to go undercover on an insane mission.

If he doesn’t comply, ISIS may get a critical nuclear weapon trigger and Ben would certainly go back to jail. But doing the job will involve high-speeds, high-stakes, and Ben’s own high-jinks thrown into the mix.

Late Apex is the third novel in Jeremy DeConcini’s Ben Adams trilogy. It’s an action-packed and fast paced page-turner in its own right — you don’t need to have read the first two novels to dive right into this one — with a rough and tumble hero you’ll love for his irreverence.

Thanks to his work as a Special Agent in the Department of Homeland Security, DeConcini’s experiences and personal political opinions permeate his plot and convince the reader that Late Apex’s storyline is truly plausible. All this to say… this highly enjoyable and suspenseful read seems realistic enough, even as we cringe at the American negligence.
Motorcycle, dog, fallout zone, racing phrase, surfer, FBI... it's like a perfect storm of awesome things. I don't think I'm going to read it though; it sounds like it's trying to be too awesome and can't possibly make it work.
Travel


What's better than an east coast trip during an impeachment trial?


You thought that was rhetorical. Answer: finding a goatse-inspired lamp. If you don't know what that is, don't look it up.


This trip included stops at Dog Money brewery (great graphic design, not great beer), Vanish (good beer, great spot out in the boonies), the Potomac locks, and some other places. J and I finally vanquished Wotan thanks in small part to perfected builds and in large part to the DLC being scaled to player count.
Karakin


It's here, the new speed map. Complete with IEDs, rocket strikes, and...


... RPGs???


Also I perfected the genius tactic of spike-stripping a door. Then camping it. The spikes are mostly for entertainment.


2020.01.04

Pako - Car Chase Simulator


Cattle's xmas Steam gift this year was Pako - Car Chase Simulator. It had me pretty well captivated for my first two-hour session. I think I worked out a little bit of strategy:


Anyway, it's a fun arcadey experience.
Pony Island


And Pokey got everyone Pony Island. I expected a rainbow sidescroller or maybe a weird rpg.


I won't spoil it, but the name is pretty misleading. It's a really cool puzzle game like Myst with a Lost (tv show) vibe - this is something you find out almost immediately so I promise I haven't spoiled it.


Oh yeah, and part of it is basically a debugger, so that's fun.
Post-Scriptum


Some of the PUBG crew bought in-development Post-Scriptum about a year ago. It was on the shelf for a while, but Cattle and I decided to give it another go. It doesn't seem to have changed much - it still has all of the complex, deliberate gameplay that is its hallmark. Of course this makes the game hard to play casually and it's hard to go deep into a game you're not sure you're going to continue playing.

I (calmly) ragequit the last session after 60 minutes, a dozen deaths, and zero enemies seen. Unless I'm missing something, it feels like the one sniper you're allowed to have per squad is just super OP. And if that's historically accurate, then it fits the game. But other than a few iconic scenes from the silver screen, I don't remember having seen or read about marksmen being decisive tactical elements in WWII. Perhaps that has to do with the sheer numbers involved in actual battles vs 40/side Post-Scriptum.

So at the end of the day, all of the amazing graphics and team mechanics are - for me - defeated by the fact that as soon as you get near the front, you'll be shot long before you see an enemy infantryman. PS is still(!) in (active?) development, but at this point if I want tactical MMOFPS action, I'll fire up Planetside; it has half the depth but infinitely more approachability.
PUBG


With increased stability and finally MMR (temporarily), we went back to PUBG over the holidays and had a good time doing it.



Cattle and I had a very Silver League shootout. To narrate the above:

Anyway, Cattle requested a video, so it was a good time to try the updated (months ago) PUBG video editor.

Previously you could just watch replays and had to do a screen record; this actually has cameras and video output. That really opens up the editing options, where previously you needed a stationary camera or numerous tries to get a moving camera (with no pan).

The key frame settings are buggy, so you frequently have to cut and retry key frames. It takes an hour or so to figure out how to work around the issues.

Export creates a strangely-encoded (Firefox hates it, Chrome plays it) webm with no mpeg or avi option. I never had sound on any of mine and found nothing when searching for solutions. I'm not sure if I have something wonky going on or if everyone just sets their videos to music. My workaround was to do a screen record and add the audio as a track for my webm converted to mp4. This, of course, means timing the audio start and not using slow mo: not fun.

The other major weakness is that you can't jump back in time. Doing so can be confusing, but if it's done right it can be really neat. If you're going to jump between perspectives, it's good to rewind a couple seconds so the viewer can track the camera change. And as can be vaguely derived from the video below, sometimes there are two cool paths to follow. I knew from memory that we had a neat buggy vs motorbike race to the island, but didn't realize there was a Kenny Powers-style jump from one of our adversaries. I could have generated two videos and cut them in, but I'd rather have multiple timelines and bounce back to the chase after watching that majestic bike jump.


The Hacienda Heist


The new meme strat is to drop into Hacienda to nab the gold camaro (mirado). After one incident where I ran Cattle over, we decided it's best to have one person grab the car while someone else loots a cold warehouse on the outskirts. You still end up having to drive through a town where people are already looted up, so it's perilous. But meme strats aren't meant to be easy.


The old meme strat is to go to a Vikendi church and pray to RNGesus