DEEPPATENT2: A Comprehensive Dataset for Advancing Technical Drawing Understanding

In a recent publication in the journal Scientific Data, researchers introduced DEEPPATENT2, a sizable dataset designed to address the challenge of producing accurate descriptions for the sketched images found in technical documents.

Study: DEEPPATENT2: A Comprehensive Dataset for Advancing Technical Drawing Understanding. Image credit: Generated using DALL·E 3

Background

Technical illustrations, sketches, and drawings serve as visual aids to convey information efficiently. In computer vision, the challenge lies in comprehending the intricacies of these images, encompassing object recognition, attribute determination, and contextual understanding.

While conventional datasets such as Microsoft Common Objects in Context (MS COCO) and ImageNet focus on natural images, technical drawings found in design patents offer a unique set of challenges. These drawings, though lacking the color and environmental details of natural images, provide essential abstraction, emphasizing strokes and lines that maintain human recognizability.

Despite their significance, technical drawings remain understudied in computer vision and information retrieval. Existing sketch datasets, such as QuickDraw and the Cross-Language Evaluation Forum-Intellectual Property (CLEF-IP) 2011 collection, fall short of capturing the rich semantic information present in technical drawings, and DEEPPATENT, a prior dataset, lacked object identification and viewpoint descriptions. DEEPPATENT2 was developed to fill these gaps.

Crafting the DEEPPATENT2 dataset

DEEPPATENT2 is an extensive dataset of more than two million technical drawings derived from design patent documents published by the United States Patent and Trademark Office (USPTO) between 2007 and 2020. It expands upon the earlier DEEPPATENT dataset in size, content, and metadata richness.

Scale and Composition: DEEPPATENT2 surpasses DEEPPATENT with more than a five-fold increase in volume and includes both original and segmented patent drawings. The metadata for each drawing incorporates object names and viewpoints extracted with high precision by a supervised sequence-tagging model.

Data Creation Pipeline: The process involves three key components: data acquisition, text processing, and image processing. Patent documents are acquired in eXtensible Markup Language (XML) and Tag Image File Format (TIFF), with each TIFF file potentially containing multiple figures, referred to as compound figures. Text processing entails extracting human-readable object names from figure captions, overcoming the challenges posed by compound figures. Image processing involves figure segmentation and metadata alignment, where a transfer learning method based on the Medical Transformer (MedT) proves effective in segmenting compound figures.
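As a rough illustration of how such a three-stage pipeline could be organized, the Python sketch below wires data acquisition, text processing, and image processing together for a single patent document. The helper functions, data fields, and example values are placeholder assumptions for illustration, not the authors' actual implementation.

```python
# Illustrative sketch of a three-stage pipeline; the helper bodies are stubs.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class FigureRecord:
    """One segmented figure plus its aligned metadata (fields mirror the
    dataset description: patent ID, object names, viewpoints, bounding box)."""
    patent_id: str
    object_names: list = field(default_factory=list)
    viewpoints: list = field(default_factory=list)
    bounding_box: Optional[tuple] = None


def parse_captions(xml_path: Path) -> list:
    """Data acquisition stub: pull figure captions out of the patent XML."""
    return ["FIG. 1 is a perspective view of a chair;"]


def extract_entities(caption: str) -> dict:
    """Text processing stub: a trained sequence tagger would go here."""
    return {"objects": ["chair"], "viewpoints": ["perspective view"]}


def segment_figures(tiff_path: Path) -> list:
    """Image processing stub: compound-figure segmentation (e.g. with MedT)."""
    return [(0, 0, 512, 512)]


def process_patent(patent_id: str, xml_path: Path, tiff_path: Path) -> list:
    """Run one design patent document through the three-stage pipeline."""
    records = []
    captions = parse_captions(xml_path)
    boxes = segment_figures(tiff_path)
    # Metadata alignment: pair each segmented figure with the entities
    # extracted from its caption.
    for caption, box in zip(captions, boxes):
        entities = extract_entities(caption)
        records.append(FigureRecord(patent_id, entities["objects"],
                                    entities["viewpoints"], box))
    return records


# Example with placeholder paths (the stubs above never actually read them).
print(process_patent("USD000001", Path("patent.xml"), Path("drawings.tif")))
```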

Text Processing (Entity Recognition): Entity recognition involves tokenizing and encoding text using pre-trained models such as Distil Bidirectional Encoder Representations from Transformers (DistilBERT). The sequence-tagging model, which employs a bidirectional Long Short-Term Memory network with a conditional random field (BiLSTM-CRF) architecture, recognizes object names and viewpoints with high accuracy, achieving an overall entity-recognition F1-measure of 0.960.
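The snippet below sketches what such a tagger might look like: DistilBERT supplies contextual token embeddings, and a bidirectional LSTM scores each token against a small BIO tag set for object names and viewpoints. A CRF decoding layer (for example, via the pytorch-crf package) would normally sit on top and is omitted here for brevity. The tag set, model sizes, and example caption are illustrative assumptions, and the untrained model will of course emit arbitrary tags.

```python
# Sketch of a caption tagger: DistilBERT embeddings -> BiLSTM -> per-token tags.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

TAGS = ["O", "B-OBJECT", "I-OBJECT", "B-VIEWPOINT", "I-VIEWPOINT"]


class CaptionTagger(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("distilbert-base-uncased")
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, len(TAGS))

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from DistilBERT.
        embeddings = self.encoder(input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(embeddings)
        return self.classifier(lstm_out)  # per-token tag scores


tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
caption = "FIG. 2 is a top plan view of the lamp shown in FIG. 1;"
batch = tokenizer(caption, return_tensors="pt")
model = CaptionTagger()
with torch.no_grad():
    scores = model(batch["input_ids"], batch["attention_mask"])
predicted = [TAGS[i] for i in scores.argmax(-1)[0].tolist()]
tokens = tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist())
print(list(zip(tokens, predicted)))  # untrained, so tags are placeholders
```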

Image Processing (Figure Segmentation and Metadata Alignment): Figure labels are identified using Amazon Rekognition, which outperformed alternative optical character recognition (OCR) engines in precision, recall, and F1 score. Compound figures are segmented with the MedT model, which outperformed baseline methods such as the point-shooting approach, U-Net, HR-Net, and the Detection Transformer (DETR).
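The following sketch shows how figure labels such as "FIG. 3" could be pulled from a drawing sheet with Amazon Rekognition's text-detection API via boto3. The label pattern, filtering logic, and file name are assumptions made for illustration rather than the authors' exact procedure, and AWS credential setup and error handling are omitted.

```python
# Sketch: detect figure labels on a drawing sheet with Amazon Rekognition.
import re

import boto3

FIG_LABEL = re.compile(r"^FIG\.?\s*\d+", re.IGNORECASE)


def detect_figure_labels(image_path: str) -> list:
    client = boto3.client("rekognition")
    with open(image_path, "rb") as f:
        response = client.detect_text(Image={"Bytes": f.read()})
    labels = []
    for det in response["TextDetections"]:
        # WORD-level hits are noisy; LINE-level detections give whole labels.
        if det["Type"] == "LINE" and FIG_LABEL.match(det["DetectedText"]):
            labels.append({
                "text": det["DetectedText"],
                "confidence": det["Confidence"],
                "box": det["Geometry"]["BoundingBox"],  # normalized coordinates
            })
    return labels


# Example (assumes AWS credentials are configured; file name is hypothetical):
# print(detect_figure_labels("design_patent_sheet.png"))
```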

Data Records: The final dataset comprises two million compound PNG figures, 2.7 million segmented PNG figures, and JSON (JavaScript Object Notation) metadata organized by year. Metadata includes patent ID, original figure file, object names, viewpoints, figure labels, bounding boxes, and document-level information.
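A short sketch of how the year-organized metadata might be consumed is shown below. The directory layout and field names ("object_names", "viewpoints") are assumptions based on the description above, so the dataset's own documentation should be treated as authoritative.

```python
# Sketch: walk the JSON metadata and tally object names and viewpoints.
import json
from collections import Counter
from pathlib import Path


def summarize_metadata(metadata_dir: str):
    objects, viewpoints = Counter(), Counter()
    for path in Path(metadata_dir).glob("*.json"):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
        for record in records:
            # Field names are assumed; adjust to the published schema.
            objects.update(record.get("object_names", []))
            viewpoints.update(record.get("viewpoints", []))
    return objects, viewpoints


# Example usage (hypothetical directory name):
# objs, views = summarize_metadata("deeppatent2/metadata/2020")
# print(objs.most_common(10), views.most_common(10))
```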

Semantic Information Extraction: The dataset yields 132,890 unique object names and 22,394 viewpoints. Analysis reveals that the viewpoints are diverse but unevenly distributed, which poses challenges for 3D reconstruction from 2D sketches.

DEEPPATENT2 is a comprehensive resource poised to propel advancements in diverse research areas, including 3D image reconstruction and image retrieval for technical drawings.

Technical validation of DEEPPATENT2

The data, generated through machine learning and deep learning methods, undergoes a careful validation process addressing potential errors in figure label detection, compound image segmentation, label association, and entity recognition (ER). The overall error rate, averaging 7.5 percent, is approximated from errors such as label-association mismatches and can equivalently be expressed as a precision value.

While all figures are retained in the dataset, those with mismatches are flagged in their filenames for reference. Error rates were verified by manually inspecting 1,400 compound figures, which confirmed consistency with the estimates. These rates are comparable to those reported for other computer vision datasets, reflecting the inherent challenges of automated tagging.
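As a back-of-the-envelope illustration of how such a check relates an observed error rate to the reported estimate and its precision equivalent, consider the sketch below. The 7.5 percent figure and the 1,400-figure sample come from the description above, while the number of mismatches found is purely hypothetical.

```python
# Relate a manually observed error rate to the reported estimate and precision.
inspected = 1400                      # compound figures checked by hand
mismatches = 105                      # hypothetical count of figures with errors
observed_error_rate = mismatches / inspected
estimated_error_rate = 0.075          # reported overall estimate (7.5 percent)
precision = 1 - observed_error_rate   # error rate expressed as a precision value
print(f"observed {observed_error_rate:.1%}, "
      f"estimated {estimated_error_rate:.1%}, precision {precision:.1%}")
```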

To demonstrate the dataset's utility, the researchers showcase a conceptual captioning task, employing a variant of the residual network (ResNet-152) for image captioning on technical drawings. The dataset's potential extends to tasks such as technical drawing retrieval, summarization of scholarly and technical corpora, 3D image reconstruction, figure segmentation, and technical drawing classification.
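To make the captioning setup concrete, the sketch below uses a pretrained ResNet-152 from torchvision as an image encoder that turns a segmented drawing into a 2048-dimensional feature vector for a downstream caption decoder (not shown). This mirrors the general ResNet-152-based approach described above rather than reproducing the authors' exact model, and the example file name is hypothetical.

```python
# Sketch: ResNet-152 as the image encoder for a captioning pipeline.
import torch
from PIL import Image
from torchvision import models, transforms

# Drop the final classification layer to expose 2048-d pooled features.
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])


def encode_drawing(image_path: str) -> torch.Tensor:
    # Patent drawings are grayscale; convert to RGB to match the encoder input.
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        features = encoder(preprocess(image).unsqueeze(0))
    return features.flatten(1)  # shape: (1, 2048), fed to a caption decoder


# Example (hypothetical file name):
# feats = encode_drawing("segmented_figure_001.png")
```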

Furthermore, the dataset could contribute to the creation of generative and multimodal design models for innovation by combining generative adversarial networks (GANs) and diffusion models. Its wealth of detailed technical drawings enhances its value for training accurate multimodal generative models.

Conclusion

In summary, researchers introduced the DEEPPATENT2 dataset. Enriched with semantic details such as object names and multiple views, the dataset effectively addresses the shortcomings observed in its predecessors.

Built with a pipeline that integrates natural language processing and computer vision methods, the dataset demonstrates its utility through improved conceptual captioning performance. This expansive resource is poised to contribute to advances in tasks such as 3D image reconstruction and image retrieval for technical drawings.


Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

