In a paper published in the journal Scientific Data, researchers categorized printed circuit board (PCB) surface defects into nine distinct types based on causes, locations, and morphologies. They developed an openly accessible PCB surface defect dataset (DsPCBSD+), which includes 20,276 manually annotated defects across 10,259 images. The dataset aims to accelerate research on deep learning (DL)-based detection of PCB surface defects, a task that is essential for quality control in manufacturing.
Related Work
Previous work on PCB surface defect detection has produced several datasets to support DL model training. Still, these datasets have limitations such as synthetic defect generation, limited categorization, and validation challenges. Additionally, many datasets lack real-world variability, leading to poor generalization in practical applications, and often fail to address the handling of defect-free or duplicate instances in cropped images. These challenges hinder the accurate validation and deployment of DL models in real manufacturing environments.
DsPCBSD+ Dataset Construction
The construction process for the DsPCBSD+ dataset is divided into three main steps: defect image collection, defect classification and preprocessing, and defect labeling and dataset partitioning. The images were collected exclusively from actual PCB defects found on the inner and outer layers of boards after etching, using the equipment at Guangzhou FastPrint Technology Co., Ltd.
This equipment features a multi-group controllable light emitting diode (LED) spotlight system and a 16K high-resolution line scan system, with four cameras on each side to capture both faces of the PCB. The captured images undergo preprocessing tasks like noise removal and contrast enhancement before key features are extracted and compared against reference images to detect defects. A total of 32,259 images were retrieved from the management system, each formatted as a 226 x 226 pixel JPG (Joint Photographic Experts Group) file, to form the basis of the DsPCBSD+ dataset.
PCB surface defects were reclassified based on their causes, locations, and morphologies. The resulting defect types fall under four main categories: copper residue, copper deficiency, conductor scratch, and foreign object defects.
The dataset underwent rigorous screening to eliminate defect-free images, duplicate defect images, incomplete defect images, and other irrelevant categories. Special care was taken to ensure a balanced distribution of defect types, particularly focusing on including critical defects like open and short, which can potentially lead to PCB scrapping. This balanced approach helps avoid bias in the model training process, ensuring that DL models can detect all defect types effectively.
For the labeling process, defects were annotated using labeling software in Pascal Visual Object Classes (VOC) format, with a bounding box drawn around each defect. The annotations were then converted into You Only Look Once (YOLO) and Common Objects in Context (COCO) formats for training DL models. A total of 20,276 defects were annotated across 10,259 images. The team analyzed the distribution of defect labels across the nine categories to ensure the dataset's robustness and the effective training of models on varied defect sizes and types.
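The box arithmetic behind such a format conversion can be sketched as follows. This is a minimal illustration, not the authors' conversion script: the function names are hypothetical, and it assumes the usual conventions (VOC stores pixel corners, YOLO stores a normalized center plus width and height, COCO stores the top-left corner plus width and height in pixels).

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a VOC corner box (pixels) to a YOLO box.

    Returns (x_center, y_center, width, height), each normalized
    by the image dimensions so the values lie in [0, 1].
    """
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def voc_to_coco(xmin, ymin, xmax, ymax):
    """Convert a VOC corner box to a COCO box [x, y, width, height] in pixels."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]
```

For a 226 x 226 image, a defect box spanning the top-left quadrant would be passed as voc_to_yolo(0, 0, 113, 113, 226, 226).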
Dataset Overview Summary
The DsPCBSD+ dataset includes YOLO and COCO format annotations, reflecting their widespread adoption due to their simplicity, flexibility, and universality. Images and annotations are organized in separate folders within the dataset. One folder stores image and label data, with further divisions for training and validation purposes.
The YOLO-format label files include file names, label categories, and bounding box coordinates, with measurements normalized relative to image dimensions. The COCO-format folder contains subfolders with .json files holding comprehensive label information, including image IDs, defect categories, and bounding box coordinates. To ensure the reliability of the DsPCBSD+ dataset, five PCB manufacturing experts conducted a thorough manual review of images and annotations. This review addressed issues such as annotation discrepancies and overlapping defects, with discussions held to determine accurate labels and positions.
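As an illustration of what normalized coordinates mean in practice, the sketch below reads one label line and maps it back to pixel corners. It assumes the common YOLO text layout of "class x_center y_center width height" per line; the exact layout of the DsPCBSD+ label files is not specified here, and parse_yolo_line is a hypothetical helper name.

```python
def parse_yolo_line(line, img_w, img_h):
    """Parse one YOLO-style label line into (class_id, pixel corner box).

    Assumed line layout: "class x_center y_center width height",
    with the four box values normalized to [0, 1].
    """
    class_id, xc, yc, w, h = line.split()
    # Scale normalized values back to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
    return int(class_id), box
```

Because the coordinates are relative, the same label file remains valid when an image is resized for training.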
To validate the dataset, the team trained DL detection models on it. Training involved adjusting hyperparameters and resizing input images to match each model's recommendations. The team evaluated precision-recall curves and mean average precision (mAP) metrics, which highlighted where the models performed well and where they faced challenges, particularly with small defects and complex backgrounds.
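For readers unfamiliar with these metrics, the sketch below shows how intersection over union (IoU) and a single-class average precision (AP) can be computed. It is a simplified illustration using all-point interpolation of the precision-recall curve, not the evaluation code used in the paper; the function names are hypothetical, and COCO-style tooling uses a different interpolation and averages over several IoU thresholds.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    is_true_positive: one bool per detection, sorted by descending
    confidence; True if the detection matched a ground-truth box
    (for example, IoU at or above a chosen threshold).
    """
    if num_ground_truth == 0:
        return 0.0
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision with the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of the per-class AP values, here taken over the nine defect categories.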
The DsPCBSD+ dataset offers advantages over existing PCB defect datasets by including commonly overlooked defects, such as hole breakout, foreign objects, and scratches. It also provides detailed classification standards, addressing gaps found in other datasets. However, the dataset has limitations: it lacks 3D depth information, it excludes defect images taken after solder masking, and its local image crops still need to be integrated into the full board context for practical applications. These considerations are important for users applying the dataset to real-world quality inspection tasks.
Conclusion
To sum up, the work successfully categorized PCB surface defects into nine distinct categories based on their causes, locations, and morphologies, leading to the development of the DsPCBSD+ dataset. This dataset, which comprises 20,276 manually annotated defects across 10,259 images, was created to support the training and advancement of DL models for accurate and efficient detection of PCB surface defects. By making this dataset accessible, the study aimed to accelerate research and foster progress in DL-based PCB defect detection.