Our Neural Net Dreams of Cars

One of the technologies we've built at Orbital Insight is a state-of-the-art car detector. We use this detector to process 50 centimeter satellite imagery and extract insights like consumer shopping behavior—for more on that, see our earlier blog post. Under the hood, our car detector is implemented using a convolutional neural network, or "conv net" for short, that we've trained on thousands of manually marked cars.


An example of imagery we process. The sequence shows several images of the same parking lot. The images are 50 centimeter resolution, panchromatic band. Courtesy of DigitalGlobe and Airbus.

Training neural networks is often as much an art as a science. Being able to debug the training process and understand which features the network learns to fire on is critically important. The most straightforward way to visualize the internal structure of a conv net is to look at the filters themselves—if you discover "dead neurons" or noisy filters, these can be symptoms of problems in your hyperparameter settings.

More recently, another set of methods for visualizing the internal structure of a neural network has become popular. These techniques sometimes go by the names "DeepDream" or "Inceptionism", and, after a blog post by Google on the topic, they have permeated not only academic circles but broader internet culture as well (there is even a Reddit community dedicated to sharing these visualizations). The technique boils down to an optimization problem: find an input image that produces as large a response as possible from a particular neuron in the network[1]. Just like the neural net training process, this optimization is solved via gradient descent and backpropagation, except that instead of optimizing the parameters of the neural net, we optimize the input image itself. To initialize the procedure, you can either use an image of your choice or generate random noise.
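To make the idea concrete, here is a minimal sketch of the optimization loop in PyTorch. The tiny conv net below is a hypothetical stand-in (our actual car detector's architecture isn't shown here), and the "car" class index is assumed for illustration—the point is that the optimizer updates the *image*, not the model weights:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained detector; assume class index 1 is "car".
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

# Initialize the input with random noise; mark it as the thing we optimize.
image = torch.randn(1, 1, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

initial_score = model(image)[0, 1].item()

for _ in range(100):
    optimizer.zero_grad()
    car_score = model(image)[0, 1]
    # Optimizers minimize, so negate the response we want to maximize.
    loss = -car_score
    loss.backward()       # gradients flow back to the image pixels
    optimizer.step()      # update the image, leaving the weights untouched
```

After the loop, the "car" response on the image is higher than at initialization—the same mechanics, applied to a real detector and run for longer, yield the dream-like visualizations below.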

We regularly use these techniques to analyze our conv nets, and today we wanted to share some visualizations of our car detection conv net:


These images were generated with random initializations, and they're produced by maximizing the response of the "car" neuron. You might notice that this conv net hallucinates not only a car in the center of the receptive field, but its neighbors as well—this is because this particular model was optimized for counting cars in parking lots and was trained on a representative dataset (i.e. cars in parking lots, which often have neighbors).

We're hiring engineers to help us solve the next set of challenges in this space. Please reach out!

[1] In reality the procedure ends up being a little more complicated: if you follow the naive recipe, you'll end up with an image that just looks like noise. To produce "nice"-looking images, some sort of prior needs to be factored into the optimization as well.
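One common choice of prior (an assumption here, not necessarily the one we use) is a total-variation penalty, which discourages the high-frequency noise the naive recipe produces by penalizing differences between neighboring pixels:

```python
import torch

def total_variation(img):
    # Mean absolute difference between vertically and horizontally
    # adjacent pixels; smooth images score low, noisy images score high.
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

# In the optimization loop, the penalty is simply added to the objective,
# e.g.: loss = -car_score + tv_weight * total_variation(image)
```

The weight on the penalty trades off fidelity to the neuron's preferred pattern against visual smoothness.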