Applying Computer Vision to Railcar Detection

Application of Object Detection using Orbital Insight GO 

By Courtney Layman

At Orbital Insight, we have built a platform called GO that combines remote sensing data with machine learning. It allows users to select any location in the world and analyze activity over time using our built-in computer vision and geolocation algorithms.

Our computer vision team at Orbital Insight builds algorithms that detect different objects in remote sensing imagery across the world. For example, in a recent project, we used high-resolution satellite imagery from Planet, provider of global, daily data and insights about Earth, to build a railcar detector. This model was requested by one of our customers and, in combination with ship and aircraft detectors, is intended to provide information about economic activity across different countries.

This project came with exciting challenges, such as highly clustered objects, limited training data, and railcars that look much like other objects seen from overhead. Since SkySat imagery is optical, we also had to account for different weather patterns, atmospheric conditions, and shadows in our images. We were able to work around these obstacles and develop a model that correctly detected 74% of the railcars in our test set.

The image on the left shows our ground truth polygons at a rail yard in Brooklyn, Illinois. The image on the right shows our model’s detections at this rail yard. The model produced 150 true positives, 6 false positives, and 6 false negatives.
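As a worked example, the counts quoted for this scene can be turned into precision, recall, and F1-score. This is a minimal sketch using the standard definitions; the post does not say how detections were matched to ground truth (for instance, which overlap threshold was used), so the function name and its handling of zero counts are illustrative assumptions.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    # Guard against division by zero when a scene has no detections or no objects.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts reported for the Brooklyn, Illinois rail yard.
precision, recall, f1 = detection_scores(tp=150, fp=6, fn=6)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.962 recall=0.962 f1=0.962
```

These numbers describe a single scene only; scores for one rail yard can differ considerably from scores over a whole test set.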


The railcar detector was trained on Planet's high-resolution SkySat imagery, which is available at 50 cm resolution globally. We used the red, green, and blue bands from their ortho visual product, which is pansharpened and orthorectified. We used only images from after April 2020, for which the assets have a pixel resolution of 0.5 meters. The ground sampling distance (GSD) in these images ranges from 0.5 meters to 0.85 meters depending on the off-nadir angle.

We generated our training data in-house using a global dataset of rail yard locations as our starting point. Polygons were drawn around each railcar and labels were separated into three categories: passenger, flatcar, and railcar. There are more types of railcars than this, including locomotives, open toppers, and tank/coil cars, but annotators found it difficult to distinguish between these types in SkySat imagery, so we grouped them into the general “railcar” category.


The TensorFlow Object Detection API was used to train the model. We had some flexibility in determining whether to treat this as a keypoint or bounding box detection model, so we experimented with both CenterNet and Mask R-CNN algorithms. In the end, we had better results with Mask R-CNN, and its rotated bounding box output provided more flexibility for different use cases, so this was our final model. Due to high class imbalance, the model had trouble distinguishing between different railcar types (flatcars, passenger cars, etc.), so we ended up treating this as a single-class problem.
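The post does not describe how rotated bounding boxes are derived from the model output. One common approach, shown here purely as an assumption, is to fit a minimum-area rectangle around the vertices of each predicted mask polygon. The sketch below uses only the standard library; `convex_hull` and `min_area_rect` are hypothetical helper names, not part of the TensorFlow Object Detection API.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (area, length, width, angle_radians) of the smallest enclosing rotated rectangle.

    The optimal rectangle always shares a side with the convex hull, so it is
    enough to try the orientation of every hull edge.
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the hull by -theta so the current edge lies along the x-axis.
        xs = [x * c + y * s for x, y in hull]
        ys = [-x * s + y * c for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0]:
            best = (w * h, max(w, h), min(w, h), theta)
    return best


# Hypothetical mask outline: a 4 x 2 rectangle rotated by 30 degrees, plus its centre point.
a = math.radians(30)
corners = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
polygon = [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)) for x, y in corners]
area, length, width, angle = min_area_rect(polygon)
```

The returned angle is the orientation of the hull edge the rectangle is aligned with, which may be either the long or the short side of the box.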


Railcars tend to be highly clustered at rail yards, which made annotation a challenge: some of the most densely packed scenes could take over an hour to annotate. Our dataset consisted of 30,000 railcars in 335 SkySat scenes. Some of these scenes showed the same rail yard at different times of the year, so our dataset only covered 141 distinct rail yards. Though our dataset had a lot of railcars, it contained relatively few images and limited geographic diversity.

This image shows an annotated rail yard in Budapest, Hungary that contains 848 railcars. Railcars are annotated in green and flatcars are annotated in red.

When using optical imagery for computer vision, the weather is always an obstacle. Objects tend to blend together when the atmosphere is full of water vapor and other particles, which makes it difficult for models to pick them out.

Shadows and haze made it hard to differentiate railcars from tracks in some images in our dataset. Even with high-resolution imagery, it was difficult to determine the division between railcars in certain images. If a human eye cannot make out the railcar, then a computer vision algorithm will also likely have trouble.


These images show a rail yard in Livorno, Italy with and without annotations. Railcars are annotated in green and flatcars are annotated in red. It is difficult to make out the railcar boundaries in these images.

Another challenge with railcars is that they don’t have a distinctive shape or color when compared with many other overhead objects. Because trains are frequently used to transport materials, containers and long-haul trucks are often located within rail yards. These look very similar to railcars at 0.5-meter resolution, leading to false positives.


The image on the left shows our ground truth polygons at a rail yard in Armenia. The image on the right shows our model’s detections at this rail yard, including several long-haul trucks falsely detected as railcars.


To deal with the challenges of clustered objects, weather obstacles, and false positives, we ran three label campaigns over the course of the project to obtain more training data and increase the diversity of the dataset. Each campaign gave a significant boost in model performance, a 3-5 point jump in F1-score.

With low dataset diversity, it is critical not only to obtain as much training data as possible, but also to make sure that image metadata, railcar dimensions, weather, and seasonality parameters are spread evenly across train, validation, and test sets. We split the dataset by rail yard rather than image scene to make sure rail yards weren’t split across sets, which would have led to data leakage.
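Splitting by rail yard rather than by scene can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the function name, the 70/15/15 fractions, the seed, and the scene and yard identifiers are all assumptions, and the sketch does not attempt the additional balancing of metadata, weather, and seasonality described above.

```python
import random
from collections import defaultdict


def split_by_rail_yard(scenes, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole rail yards to train/val/test so that no yard spans two sets.

    scenes: iterable of (scene_id, yard_id) pairs.
    Returns a dict mapping split name to a list of scene ids.
    """
    by_yard = defaultdict(list)
    for scene_id, yard_id in scenes:
        by_yard[yard_id].append(scene_id)

    # Shuffle yards, not scenes, so every scene of a yard lands in the same split.
    yards = sorted(by_yard)
    random.Random(seed).shuffle(yards)

    n_train = int(len(yards) * fractions[0])
    n_val = int(len(yards) * fractions[1])
    yard_sets = {
        "train": yards[:n_train],
        "val": yards[n_train:n_train + n_val],
        "test": yards[n_train + n_val:],
    }
    return {name: [s for y in ys for s in by_yard[y]] for name, ys in yard_sets.items()}


# Hypothetical data: 30 scenes covering 10 rail yards, three scenes per yard.
scenes = [(f"scene_{i}", f"yard_{i % 10}") for i in range(30)]
splits = split_by_rail_yard(scenes)
```

A per-scene random split of the same data would almost certainly put different dates of the same yard into both training and test sets, which is the leakage the yard-level split avoids.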

Due to our image tiling scheme, the images in our dataset had a consistent height, but variable width depending on how far they were from the equator. Varying image sizes can be a problem since computer vision algorithms will resize images in a batch to the same size. With anchor-based models such as Mask R-CNN, the resize can cause objects in the image to appear much larger or smaller depending on the original size of the image, which means the specified anchor boxes may no longer fit the resized objects. Because of this, we used a grid-based partitioning method to crop the scenes into chips with equal pixel dimensions, so all of our images and objects maintained a consistent size.
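A grid-based partition into equal-sized chips might look like the sketch below. The chip size of 512 pixels, the optional overlap, and the choice to shift edge chips inward (rather than pad or shrink them) are assumptions for illustration; the post does not state the values or edge handling actually used.

```python
def chip_windows(width, height, chip_size=512, overlap=0):
    """Return (x0, y0, x1, y1) pixel windows of identical size that cover a scene.

    Chips in the last row and column are shifted inward so that every chip is
    exactly chip_size x chip_size, at the cost of some extra overlap at the edges.
    """
    if not 0 <= overlap < chip_size:
        raise ValueError("overlap must be in [0, chip_size)")
    if width < chip_size or height < chip_size:
        raise ValueError("scene is smaller than one chip")
    stride = chip_size - overlap

    def starts(length):
        positions = list(range(0, length - chip_size + 1, stride))
        if positions[-1] != length - chip_size:
            positions.append(length - chip_size)  # final chip flush with the edge
        return positions

    return [
        (x, y, x + chip_size, y + chip_size)
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical scene of 1300 x 1024 pixels: 3 columns x 2 rows of 512-pixel chips.
windows = chip_windows(width=1300, height=1024)
print(len(windows))  # prints: 6
```

Because every chip has the same pixel dimensions, no per-image rescaling is needed at batch time, so a railcar spans roughly the same number of pixels in every chip regardless of how wide the source scene was.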

Through several label campaigns, many model tuning experiments, and innovative ideas from the computer vision team at Orbital Insight, the railcar detector reached an F1-score of 0.755 on the test set. Imagine what the Orbital Insight team could detect for you: airplanes, vehicles, tanks, and more!

To see how this model performs in the wild, please visit our GO platform and book a personalized demo!