Leveraging Deep Learning for Vehicle Detection And Classification

By Wenchen Wu

If you work in defense intelligence, traffic monitoring, or surveillance and reconnaissance, you have likely already encountered a vehicle detection use case. As awareness of geospatial intelligence grows, computer vision is making inroads into the commercial sector. Today vehicle detection can be found in industries ranging from retail and healthcare to tourism and real estate. Whether it is counting cars in parking lots for demand forecasting, monitoring supply chains for economic activity analysis, or supporting urban planning, object detection has become a core enabling technology. In this blog, we discuss the mechanics of object detection for vehicles, explore novel modeling techniques, and address common challenges.

Orbital Insight GO offers several vehicle detection algorithms that are widely used by our customers, and our computer vision scientists are researching many more advanced capabilities.

For example, the current vehicle detection suite is a point detector. While this is sufficient for counting vehicles and understanding their spatio-temporal distributions, it lacks details about individual vehicles. With the increasing spatial resolution of satellite imagery, it has become possible to develop detectors that provide richer descriptions of each vehicle. This modeling effort lays the groundwork for fine-grained vehicle detection, anticipating that the spatial resolution of satellite imagery will continue to improve over time.

Point vs. segmentation detector

Illustration of detection results: point vs. instance segmentation detectors

With this in mind, our goal is to expand our existing vehicle detection suite in two ways: including new imagery and offering detectors with more descriptive results. The former allows us to leverage additional available imagery for persistent monitoring of areas of interest. The latter enables descriptive vehicle detection outputs, such as vehicle length and orientation, for more detailed geospatial intelligence gathering.

As a result, our initial algorithm is a two-class object detection model that detects and differentiates cars and trucks. One purpose of developing this algorithm was to help monitor activity around various parking regions such as those near retail stores, manufacturing facilities, ports, etc., which can, in turn, provide insight into the foot traffic and economic implications across various regions on Earth. Another objective is to automate the process of monitoring tens of thousands of sites daily for detecting abnormal and suspicious events.


The vehicle detector can be trained on imagery from globally available satellite sensors such as Airbus and Planet. Our initial focus is on detecting stationary vehicles in parking areas. Hence, we select scenes through an automated method that utilizes metadata from maps, weather conditions, imaging parameters, etc., to find available SkySat images containing sufficient parking areas. Unlike data collection for typical object detection tasks, data collection for vehicle detection has a few unique aspects:

  • Representative sampling across various object densities in a region (sparse vs. crowded lots) is more important than sampling across different geospatial locations: At current imaging resolution, vehicles are very small. The appearance of typical vehicles does not vary much across different geo-locations at such low resolution. Hence the need for a variety of object densities in the samples may outweigh the need for a variety of objects from different geo-locations.
  • Representative sampling across contextual/scene variety is critical: Unlike large object detection, where objects can be recognized with little help of scene context, vehicle detection requires more contextual information. For example, a parked car or a rectangular structure on a building may look similar in a satellite image. In such scenarios, knowing the object is in a parking area or on a building (contextual information) can make the difference between a successful detection and an erroneous one.

These fine differences affect our data collection and data split strategies compared to other object detection work.

After sufficient scenes and images were collected, we labeled parking areas and vehicles in these areas in-house. An example of a labeled image is shown below (red = parking areas; green = trucks; blue = cars).

Vehicle labeling

Illustration of our vehicle labeling on a satellite image of a rail yard in Los Angeles.

As shown above, instance segmentation requires labeling each object with a polygon rather than a point. This implies a 4-5x increase in labeling resources compared to our prior vehicle detection models. To alleviate this extra cost, we labeled only the vehicles in parking areas. Since the images are thus only partially labeled, special care is needed in our modeling to handle the incomplete labels. We discuss how we address these challenges later.

For the data split, we break images into non-overlapping patches and group them by unique geolocation. We then split the groups into training, validation, and test sets by randomly sampling whole groups, ensuring no data leakage across the split. Some examples of images from our dataset are shown below:

Low occupancy lots

Images with low occupancy lots

Truck lots

Images with truck parking lots

High occupancy lots

Images with high occupancy lots
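The geolocation-grouped split described earlier can be sketched in a few lines of Python. The patch dictionaries and the `geo_id` key are hypothetical stand-ins for whatever metadata identifies a unique site; the point is that whole groups, not individual patches, are assigned to a set, so patches from one location never leak across the split.

```python
import random
from collections import defaultdict

def split_by_geolocation(patches, ratios=(0.7, 0.15, 0.15), seed=0):
    """Group image patches by geolocation, then split whole groups so that
    patches from the same site never appear in more than one set."""
    groups = defaultdict(list)
    for p in patches:
        groups[p["geo_id"]].append(p)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n = len(keys)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = [p for k in keys[:n_train] for p in groups[k]]
    val = [p for k in keys[n_train:n_train + n_val] for p in groups[k]]
    test = [p for k in keys[n_train + n_val:] for p in groups[k]]
    return train, val, test
```

Splitting at the group level rather than the patch level is what prevents two nearly identical patches of the same lot from landing in both training and test.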


We used Detectron2, a PyTorch-based library, to train various models. After experimenting with several network architectures, we eventually chose Mask R-CNN for the detection.
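As a rough sketch, a Detectron2 training setup for a two-class (car/truck) Mask R-CNN might look like the following config fragment. The dataset names, class count, and the small anchor sizes are illustrative assumptions, not our production configuration; Detectron2's model zoo configs and `DefaultTrainer` are used as-is.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Start from a standard Mask R-CNN instance-segmentation config.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
# Transfer learning: initialize from COCO-pretrained weights.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("vehicles_train",)  # hypothetical registered dataset
cfg.DATASETS.TEST = ("vehicles_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2       # car, truck
# Tiny objects: shrink the RPN anchor sizes (illustrative values).
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[4], [8], [16], [32], [64]]

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```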


Due to the limited spatial resolution of satellite imagery, vehicle detection is a tiny-object detection task. A tiny object has fewer intrinsic features that a model can use to recognize it, and at this resolution many man-made objects resemble vehicles. Hence it is harder to differentiate vehicles from other similarly sized objects without contextual information. Another challenge is that the density of vehicles in an image can vary dramatically: model training behaves differently when vehicles are tightly packed versus sparsely distributed in a region. Therefore, novel model settings and careful hyperparameter tuning are necessary. Finally, we also encountered challenges common to deep learning, such as the need for large amounts of labeled data, variations in atmospheric and lighting conditions, data imbalance, and variations in scenery (contextual differences).


As part of the development of this model, we explored several novel techniques in modeling as well as in data labeling to address the challenges. 

From the modeling perspective, to overcome the difficulty of detecting tiny and potentially densely clustered objects, we experimented with various hyperparameter tuning and upsampling techniques. We found image upsampling, tuning of parameters related to the Region Proposal Network (RPN), and transfer learning to be the three most effective factors in our modeling. To deal with the partial labeling in our dataset (no labels outside of parking areas), we applied pseudo-labeling techniques to non-parking regions. This helps regularize learning and select a model that generalizes better to non-parking areas.
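To illustrate the pseudo-labeling idea in a simplified, hypothetical form: detections the current model makes outside the labeled parking polygons can be kept as extra training targets only when the model is confident, so unlabeled regions contribute signal without flooding training with noise. The detection dicts and the `in_parking` callable below are assumptions for this sketch, not our actual data structures.

```python
def select_pseudo_labels(predictions, in_parking, threshold=0.9):
    """Keep high-confidence detections outside labeled parking regions.

    predictions: list of dicts with "center" (x, y) and "score".
    in_parking:  callable (x, y) -> bool for the labeled parking mask.
    """
    pseudo = []
    for det in predictions:
        x, y = det["center"]
        # Inside parking areas we already have human labels; outside,
        # only confident predictions are promoted to pseudo-labels.
        if not in_parking(x, y) and det["score"] >= threshold:
            pseudo.append(det)
    return pseudo
```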

From the data labeling perspective, to enable efficient labeling and robust learning for tiny-object detection, we developed a few tools and algorithms. For example, a large portion of the labeling effort was spent on regions with high vehicle occupancy, where objects tend to have similar sizes and are arranged in an organized fashion. Utilizing this observation, we developed an efficient tool that turns a polygon label covering an entire row of vehicles into one-dimensional labeling of the individual vehicles. This speeds up our labeling drastically and improves performance in detecting densely packed vehicles in areas such as car manufacturers' sites. As another example, we introduced a human-in-the-loop pseudo-labeling tool, where a human labels each vehicle with a point and the pseudo-labeling model refines it to a rotated bounding box representing that vehicle.
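The row-labeling idea can be illustrated with a toy geometry routine: given the long-axis endpoints of a rectangle that covers a row of n equally sized vehicles, it emits one rotated box per vehicle. The function name and output format are hypothetical; the real tool operates on annotator polygons.

```python
import math

def split_row(p0, p1, width, n):
    """Subdivide a row rectangle (long axis p0 -> p1) into n rotated boxes."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    heading = math.atan2(dy, dx)   # shared orientation of the whole row
    step = math.hypot(dx, dy) / n  # per-vehicle length along the row
    boxes = []
    for i in range(n):
        t = (i + 0.5) / n          # fractional position of the i-th slot center
        boxes.append({
            "center": (p0[0] + t * dx, p0[1] + t * dy),
            "length": step,
            "width": width,
            "angle": heading,
        })
    return boxes
```

One annotated row polygon plus a vehicle count thus replaces n individual polygon labels, which is where the labeling speedup comes from.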

Through several labeling campaigns, with the assistance of our additional vehicle labeling suites, many model tuning experiments, and innovative ideas from the computer vision team at Orbital Insight, our vehicle detector achieved F1-scores of 0.80, 0.76, and 0.85 for car, truck, and overall vehicle detection, respectively. Our ultimate goal is fine-grained classification for all sorts of vehicles. We are not done yet, but we are off to a good start. Imagine what the Orbital Insight team could detect for you: cars, trucks, tanks, and more!
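As a reminder of how such scores are computed, F1 is the harmonic mean of precision and recall over matched detections. The counts in the example below are illustrative, not our evaluation numbers.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)  # fraction of detections that are real vehicles
    recall = tp / (tp + fn)     # fraction of real vehicles that were detected
    return 2 * precision * recall / (precision + recall)
```

For instance, 80 correct detections with 20 false alarms and 20 misses gives precision 0.80, recall 0.80, and thus F1 = 0.80.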

Below are some visualizations highlighting the strengths and limitations of our current model (red = marked parking regions; blue = cars; green = trucks). Since ground truth is only available within the parking regions, the model is only expected to perform well there; yet we see promising results for non-parking areas as well.

Ground truth polygon

These images show that our vehicle detector detected most cars in low occupancy lots with some false positives from shadow-like blobs. Although the model was trained solely from labeled vehicles within the parking regions, it still detected most vehicles outside the parking areas.

Ground truth polygon set 2

These images show that our vehicle detector detected most cars in low/medium occupancy lots. There are a few false positives from rooftop structures and bright blobs in parking areas. We expect to reduce these false positives by including better contextual information and more training samples.

Ground truth polygon set 3

These images show that our vehicle detector detected most cars in medium/high occupancy lots for imagery of the same region acquired at different times.

Ground truth polygon set 4

These images show that our vehicle detector detected most vehicles in the parking regions but had difficulty differentiating smaller trucks from cars. Part of the confusion came from label inconsistency, since it is hard even for humans to differentiate them at this imaging resolution.

What could you learn with Orbital Insight's vehicle detection algorithms? Discover more about our platform here.