Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network

In an article published in the journal Nature, researchers introduced a dictionary-based matching graph network for biomedical named entity recognition (BioNER) to improve a computer's ability to recognize and understand biomedical terms. The approach combines a matching graph method with a bidirectional graph convolutional network (BiGCN). Rather than applying dictionary information through simple masking, it encodes dictionary matches as a graph that the model can exploit.

Study: Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network. Image credit: SOMKID THONGDEE/Shutterstock

Background

Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. It is based on ideas from various fields, such as natural language processing (NLP), bioinformatics, medical informatics, and computational linguistics. It supports applications for identifying documents and concepts matching search queries. Search engines such as PubMed search allow users to query literature databases with words or phrases present in documents, metadata, or indices.

Traditional biomedical text mining used feature-based methods. These were later improved upon by deep neural architectures such as Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Convolutional Neural Networks (CNN), and Transformer-based models such as BERT.

BERT models pre-trained on a biomedical corpus performed well, but some issues remained unresolved. One was the lack of integrated human knowledge, particularly for entities that were not well represented in the corpus.

The researchers proposed incorporating biomedical dictionaries through position features and external dictionary information. The proposed solution, the Dictionary-Based Matching Graph Network (DBGN), implements a matching graph method that marks each entity from its start point to its endpoint. Both BiLSTM and BioBERT were used as basic encoders for text representation. By making effective use of dictionary information, this method significantly outperformed masked approaches and addressed several open issues in entity recognition.
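The idea of a matching graph that links the start and end of each dictionary match can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_matching_graph`, the `max_span` limit, the lowercase exact-match rule, and the toy dictionary are all assumptions made for illustration. It only shows how dictionary hits might be turned into forward (start to end) and reverse (end to start) edges over token positions.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def build_matching_graph(
    tokens: List[str], dictionary: Set[str], max_span: int = 5
) -> Tuple[List[Edge], List[Edge]]:
    """Return (forward, reverse) edges for every dictionary match.

    A forward edge (i, j) means tokens[i..j] (inclusive) matched a
    dictionary entry; the reverse edge is simply (j, i).
    Hypothetical helper: the matching rule here is an assumption.
    """
    forward: List[Edge] = []
    n = len(tokens)
    for start in range(n):
        # Try every span beginning at `start`, up to max_span tokens long.
        for end in range(start, min(n, start + max_span)):
            candidate = " ".join(tokens[start:end + 1]).lower()
            if candidate in dictionary:
                forward.append((start, end))
    reverse = [(end, start) for start, end in forward]
    return forward, reverse


tokens = "Mutations in BRCA1 raise breast cancer risk".split()
dictionary = {"brca1", "breast cancer", "cancer"}  # toy dictionary
forward, reverse = build_matching_graph(tokens, dictionary)
print(forward)  # [(2, 2), (4, 5), (5, 5)]
print(reverse)  # [(2, 2), (5, 4), (5, 5)]
```

Note that overlapping matches ("breast cancer" and "cancer") are both kept as edges, so the downstream model, rather than the dictionary lookup, decides which boundary is correct.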

Approach

The model architecture can be broken down into five layers. The first is the input layer, which receives a sequence of biomedical text. The second layer splits the text into tokens, which simplifies the subsequent operations. These tokens are fed into encoders based on BiLSTM and BioBERT, followed by the BiGCN, which encodes both the forward and reverse versions of the dictionary-based matching graph. The BiGCN step can be stacked and repeated T times. It is followed by the activation function layer and, finally, the output layer, which delivers the label sequence corresponding to the input text.
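To make the output layer's "label sequences" concrete, the sketch below converts a per-token label sequence into entity spans. The article does not state which tagging scheme the model uses; the BIO scheme (B- for the first token of an entity, I- for continuation, O for outside) is assumed here because it is common for these datasets, and `decode_bio` is a hypothetical helper name.

```python
from typing import List, Tuple

Entity = Tuple[str, str, int, int]  # (text, type, start index, end index)


def decode_bio(tokens: List[str], labels: List[str]) -> List[Entity]:
    """Turn BIO labels into entity spans (assumed tagging scheme)."""
    entities: List[Entity] = []
    start, ent_type = None, None
    # A trailing "O" sentinel closes any entity that runs to the last token.
    for i, label in enumerate(labels + ["O"]):
        continues = label.startswith("I-") and ent_type == label[2:]
        if start is not None and not continues:
            entities.append((" ".join(tokens[start:i]), ent_type, start, i - 1))
            start, ent_type = None, None
        if label.startswith("B-"):
            start, ent_type = i, label[2:]
    return entities


tokens = ["Mutations", "in", "BRCA1", "raise", "breast", "cancer", "risk"]
labels = ["O", "O", "B-Gene", "O", "B-Disease", "I-Disease", "O"]
entities = decode_bio(tokens, labels)
print(entities)
# [('BRCA1', 'Gene', 2, 2), ('breast cancer', 'Disease', 4, 5)]
```

Stray I- labels that do not continue an open entity of the same type are ignored in this sketch; a production decoder might handle them differently.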

The BioBERT and BiLSTM encoders provided contextual representations, and the PieceTokenizer further split words into subwords when required. The BiGCN transformed the matched entities into directed graph connections and encoded the graph information in both the forward and backward directions, using one Graph Convolutional Network (GCN) for the forward graph and another for the reverse graph.
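The two-direction encoding described above can be sketched as a single layer in plain Python. This is a minimal illustration under stated assumptions, not the paper's formulation: the row normalization with self-loops, the ReLU activation, and the fusion by element-wise averaging are all choices made here for simplicity, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges: List[Tuple[int, int]], n: int) -> Matrix:
    """Adjacency with self-loops, row-normalized; node dst aggregates from src."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for src, dst in edges:
        adj[dst][src] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def gcn(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def bigcn_layer(edges, feats: Matrix, w_fwd: Matrix, w_rev: Matrix) -> Matrix:
    """Run one GCN on the forward graph and one on the reversed graph,
    then fuse the two by averaging (the fusion rule is an assumption)."""
    n = len(feats)
    fwd = gcn(normalized_adjacency(edges, n), feats, w_fwd)
    rev = gcn(normalized_adjacency([(d, s) for s, d in edges], n), feats, w_rev)
    return [[(a + b) / 2 for a, b in zip(rf, rr)] for rf, rr in zip(fwd, rev)]


# Three tokens with one dictionary match spanning token 0 to token 2.
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = bigcn_layer([(0, 2)], feats, identity, identity)
print(out)  # [[1.25, 0.5], [0.0, 1.0], [1.75, 1.5]]
```

With identity weights, the start and end tokens of the match each absorb part of the other's representation, while the unmatched middle token is left unchanged. Stacking this layer T times would propagate boundary information further along the graph.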

Experiments

The experiments were conducted on five biomedical text datasets covering gene mention recognition, chemical entity mention recognition, disease mention recognition, and general biomedical entity recognition.
The datasets used were as follows:

  • BC2GM: The BioCreative II gene mention recognition dataset, aimed at labeling proteins and genes.
  • BC4CHEMD: The BioCreative IV chemical entity mention recognition dataset, aimed at labeling chemicals and drugs.
  • BC5CDR: The BioCreative V chemical and disease mention recognition dataset, which combines the BC5CDR-chem and BC5CDR-disease datasets.
  • NCBI-Disease: A dataset introduced for disease name recognition and normalization, with a wide range of applications.
  • JNLPBA: The biomedical entity recognition dataset for labeling protein/genes, RNA, DNA, cell line, and cell types.

Biomedical entity data were gathered for three types of entities (proteins/genes, diseases, and chemicals) from the Comparative Toxicogenomics Database and a biomedical data website. DBGN was compared with methods such as MTM, BERT, BioBERT, and CollaboNet, all of which were enhanced with a conditional random field (CRF).

The neural network models were trained on a GeForce RTX 2080 Ti graphics processing unit. The pre-trained BioBERT contained 12 hidden layers with 768 hidden units each, and Adam was the optimizer used for both BioBERT and DBGN. For all experiments, the BiGCN layer size was set to two. The performance metrics used to evaluate the model were Precision, Recall, and Macro-averaged F1.
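As a reminder of how these metrics are typically computed for entity recognition, the sketch below scores predicted entity spans against gold spans using exact matching. The article does not specify the matching criterion or what the macro average is taken over; exact span-and-type matching and averaging F1 over entity types are assumptions here, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start index, end index, entity type)


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # spans correct in both boundary and type
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(gold: Iterable[Span], pred: Iterable[Span]) -> float:
    """Unweighted mean of per-type F1 (assumed definition of macro average)."""
    gold, pred = list(gold), list(pred)
    types = sorted({s[2] for s in gold} | {s[2] for s in pred})
    scores = [
        span_prf([g for g in gold if g[2] == t], [p for p in pred if p[2] == t])[2]
        for t in types
    ]
    return sum(scores) / len(scores) if scores else 0.0


gold = [(2, 2, "Gene"), (4, 5, "Disease")]
pred = [(2, 2, "Gene"), (5, 5, "Disease")]  # disease boundary is wrong
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
print(macro_f1(gold, pred))  # 0.5
```

Under exact matching, a prediction with the right type but a wrong boundary counts as both a false positive and a false negative, which is why boundary errors are costly in BioNER evaluation.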

Results

The original BERT showed no notable improvement in performance. BioBERT, however, performed better across all the datasets thanks to its domain-specific representation. With the help of the dictionary-based matching graph, DBGN outperformed all of its competitors on every dataset. The proposed method also achieved noteworthy performance gains with less training time than both BERT and BioBERT.

With respect to layer size, two BiGCN layers delivered the best performance on all the datasets. The BiGCN improved performance by using both forward and backward information, and the Fuse Layer improved it further by fusing the two GCNs in each layer. Although residual connections did not improve results, they reduced the number of training epochs. BiLSTM improved performance by capturing bidirectional long-range dependencies.

Conclusion

In conclusion, DBGN significantly advanced biomedical entity recognition, outperforming state-of-the-art models. The BiGCN module contributed to the model's success with minimal training time increase. Future research could extend this approach to enhance various NLP applications and address entity boundary challenges, promising a coherent system for the efficient recognition of diverse biomedical entities.

Journal reference:

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2023, December 14). Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network. AZoAi. Retrieved on December 22, 2024 from https://www.azoai.com/news/20231214/Enhancing-Biomedical-Named-Entity-Recognition-with-Dictionary-Based-Matching-Graph-Network.aspx.

  • MLA

    Nandi, Soham. "Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network". AZoAi. 22 December 2024. <https://www.azoai.com/news/20231214/Enhancing-Biomedical-Named-Entity-Recognition-with-Dictionary-Based-Matching-Graph-Network.aspx>.

  • Chicago

    Nandi, Soham. "Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network". AZoAi. https://www.azoai.com/news/20231214/Enhancing-Biomedical-Named-Entity-Recognition-with-Dictionary-Based-Matching-Graph-Network.aspx. (accessed December 22, 2024).

  • Harvard

    Nandi, Soham. 2023. Enhancing Biomedical Named Entity Recognition with Dictionary-Based Matching Graph Network. AZoAi, viewed 22 December 2024, https://www.azoai.com/news/20231214/Enhancing-Biomedical-Named-Entity-Recognition-with-Dictionary-Based-Matching-Graph-Network.aspx.
