In an article published in the journal Scientific Data, researchers from Duke University developed an original large-scale, multi-class dataset of above-ground storage tanks (ASTs) from high-resolution aerial imagery across the contiguous United States. They presented a validation procedure to ensure the quality and reliability of the annotations.
Background
ASTs are large containers or vessels used to store liquids, typically industrial chemicals, petroleum products, or water, in various industries, including chemical and petroleum production, processing, refining, and transport. These tanks are situated above the ground surface, in contrast to underground storage tanks, and are vulnerable to natural and anthropogenic hazards such as hurricanes, floods, fires, explosions, and sabotage.
However, suitable data for checking system vulnerabilities and failures, calculating current production and capacity, and evaluating the state of energy and other infrastructure are not easily available to regulators, researchers, and other decision-makers. In some cases, remotely sensed imagery has been used to develop AST datasets for specialized purposes. Yet these datasets are limited by sparse annotations, narrow geographic coverage, missing location data, restricted availability, and simplified classifications. Therefore, there is a need for a publicly available AST dataset built on high-resolution aerial imagery.
About the Research
In the present paper, the authors built the AST dataset from high-resolution aerial imagery collected by the National Agriculture Imagery Program (NAIP) of the United States Department of Agriculture (USDA). The NAIP imagery covered the contiguous US during the agricultural growing season and had a minimum ground sampling distance of 60 cm, allowing the identification of objects ranging from 3 to 69 m in diameter. The study selected NAIP tiles based on the presence of relevant infrastructure and identifiable objects. The selected tiles were then split into 512-by-512-pixel images for annotation.
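The tiling step described above can be illustrated with a minimal sketch. It assumes the tile is already loaded as a NumPy array and that partial chips at the right and bottom edges are simply dropped; the article does not say how the authors handled edges, and the function name `tile_image` is hypothetical.

```python
import numpy as np

def tile_image(image, tile_size=512):
    """Yield (top, left, chip) for every full tile_size x tile_size chip.

    Assumption: partial chips at the right/bottom edges are skipped.
    """
    height, width = image.shape[:2]
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            yield top, left, image[top:top + tile_size, left:left + tile_size]

# Synthetic stand-in for an aerial tile with shape (rows, cols, bands).
demo_tile = np.zeros((1300, 1100, 3), dtype=np.uint8)
chips = list(tile_image(demo_tile))
print(len(chips), chips[0][2].shape)
```

Keeping the pixel offsets (`top`, `left`) alongside each chip makes it possible to map an annotation in a chip back to its position in the parent tile.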
The authors used a graphical user interface (GUI) tool called LabelImg to manually annotate and classify the objects of interest in the images. The objects were categorized by type: external floating roof tanks, closed roof tanks, narrow closed roof tanks, spherical pressure tanks, sedimentation tanks, and water towers. The researchers implemented a validation procedure to minimize errors and ensure the quality and consistency of the annotations.
The procedure involved three steps: (1) checking for missed objects, (2) adjusting the bounding boxes, and (3) confirming the class labels. Furthermore, the validation procedure was evaluated by comparing it with a ground truth dataset created by an expert. The researchers also obtained geospatial information for each tank, image, and tile, as well as the diameter of each object, developed tile-level annotations, and compiled the full dataset.
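For readers who want to work with annotations of this kind, here is a minimal sketch of reading bounding boxes and deriving an approximate diameter. It assumes the Pascal VOC XML layout that LabelImg writes by default; whether the released files follow exactly this layout is not stated in the article. The sample file name, the label `closed_roof_tank`, and the diameter rule (mean of box width and height times a 0.6 m pixel size) are illustrative assumptions, not the authors' method.

```python
import xml.etree.ElementTree as ET

GSD_M = 0.6  # assumed metres per pixel, taken from the 60 cm figure in the article

# Hypothetical annotation in Pascal VOC style (LabelImg's default output).
SAMPLE_XML = """<annotation>
  <filename>example_chip.jpg</filename>
  <object>
    <name>closed_roof_tank</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>150</xmax><ymax>172</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return a list of dicts with label, pixel bbox, and estimated diameter."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        # Assumption: diameter is the mean of box width and height, scaled by GSD.
        diameter_m = GSD_M * ((xmax - xmin) + (ymax - ymin)) / 2
        boxes.append({
            "label": obj.findtext("name"),
            "bbox": (xmin, ymin, xmax, ymax),
            "diameter_m": diameter_m,
        })
    return boxes

boxes = read_boxes(SAMPLE_XML)
print(boxes[0]["label"], boxes[0]["diameter_m"])
```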
Research Findings
The authors compiled the individual images and corresponding annotations, along with metadata, into a single dataset. The dataset consists of 142,107 objects distributed among seven classes, with narrow closed roof tanks and closed roof tanks comprising 35% and 51% of the dataset, respectively. It covers 48 states and is consistent with existing petroleum datasets that were not used to create it.
The final dataset is publicly available in a Figshare repository in various formats, such as JavaScript Object Notation (JSON), GeoJSON, Environmental Systems Research Institute (Esri) shapefile, Extensible Markup Language (XML), and Joint Photographic Experts Group (JPG) images.
The study reported the results of the validation procedure and the comparison with the expert-created ground truth dataset. It found that the validation procedure improved the coverage and quality of the annotations, as well as the accuracy of the class labels. The research also found that the validation procedure achieved an average precision of 0.99 and recall of 0.952 across tank classes, indicating that nearly all bounding boxes corresponded to actual objects of interest and that fewer than 5% of objects were missed. Moreover, it analyzed the causes of missed objects and found that smaller tanks were harder to identify.
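To make the two metrics concrete, the sketch below computes precision and recall by matching annotated boxes to reference boxes. The greedy matching rule and the intersection-over-union (IoU) threshold of 0.5 are assumptions chosen for illustration; the article does not describe the matching criterion the authors used, and the boxes are made-up examples.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    overlap_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(annotated, reference, iou_threshold=0.5):
    """Greedily match each annotated box to its best unmatched reference box."""
    unmatched = list(reference)
    true_positives = 0
    for box in annotated:
        best = max(unmatched, key=lambda ref: iou(box, ref), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each reference box can be matched once
    precision = true_positives / len(annotated) if annotated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Made-up example: three reference tanks, two of which were annotated.
reference = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
annotated = [(1, 1, 10, 10), (20, 20, 30, 30)]
precision, recall = precision_recall(annotated, reference)
print(precision, recall)
```

In this toy case every annotated box matches a real object, so precision is perfect, while one reference tank has no annotation, which lowers recall. A recall of 0.952, as reported above, corresponds to a miss rate of 4.8%.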
Applications
The dataset can support a range of applications. Potential uses include:
- Training and testing data for object detection algorithms, particularly for remotely sensed imagery and ASTs.
- Providing geospatial data for AST risk assessments, facilitating the estimation of potential impacts from natural and anthropogenic hazards on ASTs and their surroundings.
- Generating petroleum storage and net capacity estimates by considering the location, size, and type of ASTs.
- Supporting petrochemical market evaluations and economic assessments based on the distribution and characteristics of ASTs.
- Contributing to machine learning or computer vision tasks.
Conclusion
In summary, the novel dataset contains high-resolution orthorectified aerial imagery, geospatial coordinates, and border vertices for over 130,000 ASTs from five labeled classes. It is publicly accessible and serves a variety of purposes, including production and capacity estimation, risk and hazard assessment, and infrastructure evaluation. Additionally, the study introduced a quality-checking approach to maintain the reliability of the annotations.
The authors suggested that future work could include updating the dataset with more recent imagery, expanding geographic coverage to other areas, and adding more classes and features to the dataset. They argued that the novel dataset can be a valuable resource for regulators and decision-makers interested in ASTs and associated risks and benefits.