
Million-AID

Posted on 2018-12-12 | Edited on 2019-01-31

Million-AID: A Million Aerial Image Database for Scene Classification by Automatic Labeling


Introduction

The Million-AID dataset contains images of 256×256 and 512×512 pixels, with spatial resolutions ranging from 0.5 m to 11.4 m. The images have three RGB bands, are derived from multiple remote sensing sensors, and were all acquired through Google Earth. To make the dataset better reflect the distribution of scenes in the real world, we collected global coordinate data for the scene categories as widely as possible. To visualize the global distribution of the images, we computed the geographic coordinates of all data blocks, divided the world map into a large number of square cells, and counted the number of data-block coordinates falling in each cell. The resulting distribution map is shown in Figure 1.

Figure 1. The global distribution of the image coordinates.
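A minimal sketch of the grid-based counting described above, assuming the block coordinates are available as (latitude, longitude) pairs; the cell size and all names are illustrative, not the authors' actual tooling.

```python
import numpy as np

def grid_counts(coords, cell_deg=5.0):
    """Count how many (lat, lon) coordinates fall in each square cell.

    coords   : iterable of (latitude, longitude) pairs in degrees
    cell_deg : side length of each square cell, in degrees (illustrative value)
    """
    n_rows = int(180 / cell_deg)          # latitude bins:  -90 .. 90
    n_cols = int(360 / cell_deg)          # longitude bins: -180 .. 180
    counts = np.zeros((n_rows, n_cols), dtype=int)
    for lat, lon in coords:
        row = min(int((lat + 90) / cell_deg), n_rows - 1)
        col = min(int((lon + 180) / cell_deg), n_cols - 1)
        counts[row, col] += 1
    return counts

# Example: three data blocks counted on a 5-degree grid
demo = [(30.5, 114.3), (30.6, 114.4), (48.8, 2.3)]
print(grid_counts(demo).sum())  # 3
```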

The histogram below shows that the image resolutions are mainly concentrated at 0.298 m and 0.597 m. This is because most of our scene categories require resolutions finer than 1 m. Since different scene types need to be displayed at different resolutions, the dataset contains multiple image resolutions, which also reflects the rich scene categories and large image diversity of our dataset.

[Figure: histogram of image resolutions]

The benchmark

Here, we describe two subsets of the Million-AID database that serve as benchmarks: Million-AID-Standard and Million-AID-32.

Million-AID-Standard has 320,557 training images covering 46 scene categories, with between 4,000 and 20,000 images per category. The validation set contains 20,034 images and the test set contains 60,104 images. The experiments in this article are mainly carried out on Million-AID-Standard.

Million-AID-32 contains the 32 scene categories that Million-AID shares with NWPU and AID. It is drawn from Million-AID-Standard, with 2,000 images per category. Similarly, we extracted the corresponding 32-category subsets from NWPU and AID. These subsets are used to compare the image diversity of the three datasets and will be used in Section 4.3.
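A minimal sketch of how such a fixed-size per-category subset could be assembled, assuming images are stored in one folder per category; the paths, file pattern, and helper names are illustrative, not the authors' actual tooling (only the 2,000-image budget comes from the text).

```python
import random
from pathlib import Path

def sample_subset(dataset_root, categories, per_class=2000, seed=0):
    """Pick a fixed number of images from each category folder.

    dataset_root : root directory with one sub-folder per scene category (assumed layout)
    categories   : the 32 category names shared by Million-AID, NWPU and AID
    per_class    : images kept per category (2,000 in Million-AID-32)
    """
    rng = random.Random(seed)
    subset = {}
    for name in categories:
        files = sorted((Path(dataset_root) / name).glob("*.jpg"))
        rng.shuffle(files)
        subset[name] = files[:per_class]
    return subset

# Hypothetical usage:
# subset = sample_subset("Million-AID-Standard/train", ["airport", "beach"])
```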

Download here

The Construction of Million-AID Dataset

Posted on 2018-12-12 | Edited on 2019-01-31

Point data

For scene categories with small footprints, such as “storage tank”, “wind turbine”, and “pool”, coordinate points are generally used to represent the geographic location of these scenes, and the attached text carries their semantic information. By retrieving the semantic information in the attached text, the coordinates of the point data matching a retrieval category within a region can be acquired, which completes the acquisition of the geographic information of the target category in that region. To accomplish this, we used the Google Maps Platform to create a JavaScript-based scene coordinate acquirer. The Radar Search API of the Google Maps Platform can search an area's geo-information labeling database by input semantic information and return the coordinates of the points matching that input. Built on this API, the acquirer can switch the map area to carry out large-area searches and save the coordinates to a file in real time, which improves the efficiency of coordinate acquisition. Fig. 2 shows the coordinates of “baseball field” as red marks. Plenty of scene coordinates were obtained with the acquirer, such as “baseball field”, “soccer field”, “church”, and “tennis court”.

Figure 2. Coordinates of “baseball field” scenes shown as red marks.
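The Radar Search API used above has since been retired, so as a rough illustration only, the sketch below queries the Google Places Nearby Search web service for a keyword around a coordinate; the endpoint and parameter names come from the public Places API, while the API key, radius, and keyword are placeholders.

```python
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def search_scene_coordinates(keyword, lat, lng, radius_m, api_key):
    """Return (lat, lng) pairs of places whose labels match `keyword`.

    Illustrative stand-in for the Radar Search based acquirer described
    in the text; `api_key`, `radius_m`, and the keyword are placeholders.
    """
    params = {
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "keyword": keyword,
        "key": api_key,
    }
    response = requests.get(PLACES_URL, params=params, timeout=30)
    response.raise_for_status()
    results = response.json().get("results", [])
    return [(r["geometry"]["location"]["lat"],
             r["geometry"]["location"]["lng"]) for r in results]

# Hypothetical usage: baseball fields within 5 km of a point
# coords = search_scene_coordinates("baseball field", 30.5, 114.3, 5000, "YOUR_KEY")
```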

Line and area data

Some scene categories in the category hierarchy, such as business districts, cultivated land, and rivers, need an area to represent them. A single point is not enough to describe their geometry and position; lines and areas must be used to represent this scene information accurately. To obtain the line and area information of these scene categories, we use the open map project OpenStreetMap, a freely editable online map. OpenStreetMap data consists of three elements: node, way, and relation. A node is the basic element representing the geometric information of a scene; a single node represents a point in space, consists of a longitude, a latitude, and a node id, and usually carries a tag describing its semantic information. A way is an ordered list of nodes and can be an open way, a closed way, an area, and so on. Each way contains at least one tag, which carries the semantic information attached to the way. Ways represent the geometric information of many scene categories, so when extracting the line and area information of a scene category from OpenStreetMap, we can directly extract the way data: the way data of the desired scene category is obtained by retrieving the semantic information of its tags.
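A minimal sketch of pulling way data out of an OpenStreetMap XML extract by tag, assuming a small `.osm` file; the file name and the waterway=river tag are illustrative examples, and large extracts would normally be handled with dedicated tools such as Osmosis (below) rather than a plain XML parser.

```python
import xml.etree.ElementTree as ET

def extract_ways(osm_path, key, value):
    """Return the node coordinates of every way tagged key=value.

    osm_path   : path to an OpenStreetMap XML (.osm) extract (illustrative)
    key, value : the tag to match, e.g. "waterway", "river"
    """
    root = ET.parse(osm_path).getroot()
    # Index node coordinates by id so way node references can be resolved.
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.iter("node")}
    ways = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if tags.get(key) == value:
            coords = [nodes[nd.get("ref")] for nd in way.findall("nd")
                      if nd.get("ref") in nodes]
            ways.append(coords)
    return ways

# Hypothetical usage:
# rivers = extract_ways("extract.osm", "waterway", "river")
```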


OpenStreetMap provides a data retrieval tool, Osmosis, which can retrieve tag information and extract the matching way data. Figure 4 shows the way data of a “commercial area” extracted from OpenStreetMap (the way data is converted to KML and displayed in Google Earth), and Figure 5 shows the extracted “river” way data of the Yangtze River basin. It can be seen from the figures that the extracted way data matches the real remote sensing imagery well; the labeling accuracy is high and of great value.

Figure 4. Way data of a “commercial area” extracted from OpenStreetMap, displayed in Google Earth.
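As a rough sketch of the kind of Osmosis extraction described above, the call below filters river ways out of an OSM extract; the file names are placeholders, and the exact Osmosis flags should be checked against its documentation rather than taken from here.

```python
import subprocess

# Assumed Osmosis tag-filter invocation (flag names per the Osmosis docs;
# file names are placeholders): keep only ways tagged waterway=river,
# keep the nodes they reference, and write the result back out as XML.
subprocess.run(
    [
        "osmosis",
        "--read-xml", "region-extract.osm",
        "--tag-filter", "accept-ways", "waterway=river",
        "--used-node",
        "--write-xml", "rivers.osm",
    ],
    check=True,
)
```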

OpenStreetMap contains a large amount of data with wide coverage and relatively high precision. Its tag information is rich and varied, covering a large number of scenes of human production and life and most scene categories in the category hierarchy. Thousands of ways can be obtained with the data retrieval tool, and this information is of great help for acquiring scene images and position information.

Figure 5. Extracted “river” way data of the Yangtze River basin.
