Tuesday, Aug 8: 8:50 AM - 9:05 AM
Metro Toronto Convention Centre
Automatic object detection in remote sensing imagery flags objects of interest in a scene. We are interested in multi-label classification of images in the FAIR1M benchmark dataset, which provides "ground truth" labels drawn from a hierarchy of 5 coarse and 37 fine object classes summarizing scene content. We propose a SwinV2 visual backbone that feeds into a transformer to produce multi-label taxonomic classifications. The flexibility of deep learning models for computer vision makes it possible to identify targets that would otherwise be difficult to characterize explicitly. However, when applied naïvely, a multi-task model may produce classifications that violate the known label hierarchy, for example predicting a fine class without its coarse parent. Common approaches to respecting hierarchical label structure include adding penalties for inconsistent predictions to the cost function, using different features to predict coarse and fine classes, and performing classification tasks of different granularities at different depths in the model. We implement the proposed model with additional hierarchy-aware modifications and compare it to naïve flat classification.
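To make the first of those approaches concrete, the sketch below shows one generic way to add a hierarchy-consistency penalty to a flat multi-label objective. It is an illustration under stated assumptions, not the method used in this work: the class names are a hypothetical four-class subset standing in for the FAIR1M taxonomy, each probability is assumed to be an independent sigmoid output, and the hinge term `max(0, p_fine - p_parent)` and weight `lam` are illustrative choices. Setting `lam=0` recovers the naïve flat objective.

```python
import math

# Hypothetical toy hierarchy (illustrative subset, not the full FAIR1M taxonomy):
# each fine class maps to its coarse parent.
PARENT = {
    "boeing737": "airplane",
    "a220": "airplane",
    "dry_cargo_ship": "ship",
    "small_car": "vehicle",
}
COARSE = ["airplane", "ship", "vehicle"]
FINE = list(PARENT)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one sigmoid probability p and a 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def hierarchy_penalty(p_coarse, p_fine):
    """Sum of max(0, p_fine - p_parent): nonzero only when a fine class is
    predicted more confidently than its coarse parent."""
    return sum(max(0.0, p_fine[f] - p_coarse[PARENT[f]]) for f in FINE)


def total_loss(p_coarse, p_fine, y_coarse, y_fine, lam=1.0):
    """Flat multi-label BCE over both levels plus a weighted consistency penalty.
    With lam=0 this reduces to the naive flat objective."""
    flat = sum(bce(p_coarse[c], y_coarse[c]) for c in COARSE)
    flat += sum(bce(p_fine[f], y_fine[f]) for f in FINE)
    return flat + lam * hierarchy_penalty(p_coarse, p_fine)


def is_consistent(p_coarse, p_fine, threshold=0.5):
    """True if every fine label predicted present also has its parent predicted present."""
    return all(
        p_coarse[PARENT[f]] >= threshold
        for f in FINE
        if p_fine[f] >= threshold
    )
```

For instance, with `p_coarse = {"airplane": 0.4, "ship": 0.1, "vehicle": 0.8}` and `p_fine["boeing737"] = 0.9`, the model asserts a Boeing 737 while doubting there is an airplane; `is_consistent` flags this, and the penalty adds `0.9 - 0.4 = 0.5` to the loss. A soft penalty like this discourages, but does not forbid, inconsistent outputs, which is why it is often paired with the architectural approaches listed above.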
Section on Statistics in Defense and National Security