In a paper published in the journal Applied Sciences, researchers presented a novel architecture for abstractive summarization. They addressed the limitations of large language models (LLMs) like GPT-3/4 and ChatGPT by incorporating knowledge graph (KG) information and structured semantics.
The researchers improved summarization quality by enhancing the encoder architecture with multi-source transformer modules based on BART. Comparative experiments on the Wiki-Sum dataset show that the proposed approach excels at generating informative summaries and addresses the difficulty LLMs have in verifying factual accuracy.
Background
Abstractive summarization is a challenging task in natural language processing. It requires models that understand long document contexts and generate informative summaries. While existing models such as sequence-to-sequence networks with attention, BART, and bidirectional encoder representations from transformers (BERT) perform well, they can struggle with wordplay and factual accuracy.
Generating abstracts from information-sparse, crowdsourced articles like those on Wikipedia is challenging compared to concise and information-dense news articles from sources like the NYT Corpus and the CNN/DailyMail dataset. To address this information sparsity, the researchers introduced the Wiki-Sum dataset and extract the articles' internal semantics to prioritize entities and relations. Additionally, they propose the encoder-decoder model MultiBART-GAT, which combines transformers and knowledge graphs to enhance accuracy and coherence in text summarization. The evaluation demonstrates improved performance on both the Wiki-Sum and CNN/DailyMail datasets, highlighting the potential of integrating verified, factual information into large language models for text summarization.
Related work
Abstractive text summarization employs a neural encoder-decoder model to encode source documents and generate summaries. Model training maximizes the conditional likelihood of the reference summary given the source. BART, which pairs a BERT-like bidirectional encoder with an autoregressive decoder, excels in abstractive summarization.
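Concretely, maximizing the conditional likelihood amounts to minimizing the negative log-likelihood of each summary token given the source document and the previously generated tokens (the standard sequence-to-sequence objective, written here in our own notation rather than the paper's):

$$\mathcal{L}_{\mathrm{NLL}} = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right),$$

where $x$ is the source document, $y_1, \dots, y_T$ the reference summary, and $\theta$ the model parameters.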
Knowledge graphs with entities and relations aid representation learning. Graph Attention Networks (GAT) capture global context effectively. Extracting internal structures such as knowledge graphs, dependency parsing trees, and sentence structures enhances text representation. These structures facilitate text generation and have been explored in diverse models for various tasks. Existing research confirms the value of knowledge graphs in providing latent semantics for abstractive summarization.
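For reference, a single GAT layer computes attention coefficients between a node $i$ and each neighbor $j$ and aggregates their transformed features; this is the standard formulation from the GAT literature, not a detail specific to this paper:

$$\alpha_{ij} = \operatorname{softmax}_j\!\left(\mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right)\right), \qquad \mathbf{h}_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Big),$$

where $\mathbf{h}_i$ is the embedding of node $i$, $\mathbf{W}$ and $\mathbf{a}$ are learned parameters, and $\Vert$ denotes concatenation.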
Proposed model: MultiBART-GAT
The researchers proposed MultiBART-GAT, an encoder-decoder architecture model based on BART, which features modified decoders for abstractive summarization that accept multiple inputs.
The architecture utilizes a multi-head transformer introduced in BART, with Gaussian error linear units (GeLUs) as the activation function. In the decoder, cross-attention is performed over the final layer of the encoder, eliminating the need for additional fully connected layers for word prediction.
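As a rough illustration of this decoder design, the following minimal PyTorch sketch (our own code, not the authors'; the dimensions, head count, and the omission of layer norms and causal masking are simplifying assumptions) shows a decoder block that attends over the final encoder states and projects directly to vocabulary logits:

```python
import torch.nn as nn

class DecoderBlockSketch(nn.Module):
    """Simplified decoder block: self-attention, cross-attention over the
    final encoder layer, GeLU feed-forward, and a direct word-prediction head.
    Layer norms and causal masking are omitted for brevity."""
    def __init__(self, d_model=768, n_heads=12, vocab_size=50265):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, dec_states, enc_states):
        x = dec_states + self.self_attn(dec_states, dec_states, dec_states)[0]
        # Cross-attention: queries from the decoder, keys/values from the
        # final encoder layer only.
        x = x + self.cross_attn(x, enc_states, enc_states)[0]
        x = x + self.ffn(x)
        # Project hidden states straight to vocabulary logits; in BART this
        # projection is tied to the input embedding matrix, so no extra
        # fully connected layers are needed for word prediction.
        return self.lm_head(x)
```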
The graph encoder employs two GAT layers with residual connections, converting contextual information from the document-level KG into embeddings for entities and relations. These embeddings are then fed into the decoder.
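A minimal sketch of such a graph encoder, written with PyTorch Geometric's GATConv (our illustration; the layer sizes, head counts, and activation are assumptions rather than the paper's exact configuration):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class GraphEncoderSketch(nn.Module):
    """Two GAT layers with residual connections that map entity/relation
    embeddings from the document-level KG into the decoder's hidden size."""
    def __init__(self, in_dim=100, hidden_dim=768, heads=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden_dim)      # lift pretrained KG embeddings
        self.gat1 = GATConv(hidden_dim, hidden_dim // heads, heads=heads)
        self.gat2 = GATConv(hidden_dim, hidden_dim // heads, heads=heads)

    def forward(self, node_feats, edge_index):
        h = self.proj(node_feats)
        h = h + torch.relu(self.gat1(h, edge_index))   # residual connection 1
        h = h + torch.relu(self.gat2(h, edge_index))   # residual connection 2
        return h                                       # graph-aware node embeddings
```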
Encoder-decoder network: The base model configuration consists of six layers for both the encoder and decoder, similar to BART. During training, the input includes the tokenized text and extracted KG triples linked to pretrained embeddings via the TransE model. GAT layers pre-process the graph embeddings, which are concatenated with the token embeddings to form the encoder input. Both the encoder and decoder operate on these embeddings; during training, the decoder performs multi-source BART decoding with a cross-entropy loss, and during inference its output distribution can be further processed with greedy decoding or beam search.
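The multi-source setup can be pictured with Hugging Face's BART as a stand-in. The sketch below is our own approximation under several assumptions (a public bart-base checkpoint, random placeholders for the GAT-processed graph embeddings, and beam search with four beams); the paper's actual implementation may differ:

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Token embeddings for the source document.
enc = tokenizer("Document text ...", return_tensors="pt")
tok_emb = model.get_input_embeddings()(enc.input_ids)        # (1, seq_len, d_model)

# Placeholder for GAT-processed TransE entity/relation embeddings.
graph_emb = torch.randn(1, 16, model.config.d_model)

# Multi-source encoder input: graph embeddings concatenated with token embeddings.
inputs_embeds = torch.cat([graph_emb, tok_emb], dim=1)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

# Training: cross-entropy loss against the reference abstract.
labels = tokenizer("Reference abstract ...", return_tensors="pt").input_ids
loss = model(inputs_embeds=inputs_embeds,
             attention_mask=attention_mask,
             labels=labels).loss

# Inference: run the encoder once, then decode with beam search.
encoder_outputs = model.get_encoder()(inputs_embeds=inputs_embeds,
                                      attention_mask=attention_mask)
summary_ids = model.generate(encoder_outputs=encoder_outputs,
                             attention_mask=attention_mask,
                             num_beams=4, max_length=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```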
The BART model is fine-tuned based on the specific sub-task and input/output structure. The loss function initially incorporates a textual-side loss, comparing the generated abstract with the ground truth using cross-entropy loss. Additionally, an entity salience objective is introduced to predict entity presence in the abstract. The loss function includes a masking mechanism based on ground truth values.
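One plausible reading of this objective (our own notation; the weighting and the exact form of the salience term are assumptions) is a weighted sum of the token-level cross-entropy and a masked entity-salience prediction loss:

$$\mathcal{L} = -\sum_{t} \log p_\theta\left(y_t \mid y_{<t},\, x,\, \mathcal{G}\right) \;-\; \lambda \sum_{e \in \mathcal{E}} m_e \left[\, s_e \log \hat{s}_e + (1 - s_e)\log\left(1 - \hat{s}_e\right) \right],$$

where $\mathcal{G}$ is the document-level KG, $\hat{s}_e$ the predicted probability that entity $e$ appears in the abstract, $s_e$ the ground-truth indicator, $m_e$ the mask derived from the ground truth, and $\lambda$ a weighting coefficient.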
Wiki-Sum dataset
The researchers introduced the Wiki-Sum dataset, aiming to generate longer and more informative abstracts than general news text. The dataset comprises automatically tokenized and processed data from Wikipedia articles. To create a representative dataset, they selected the 94,000 most widely-read articles, excluding certain page types such as home pages, lists, and pages with only a couple of lines. Further analysis showed that the articles largely align with the interests of the general public, although some focus on technological topics.
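As a rough illustration of this kind of curation step (entirely our own sketch with hypothetical field names and thresholds; the paper does not publish its filtering code), an article dump might be filtered by readership and page type like so:

```python
def is_candidate(article, pageviews, min_sentences=5):
    """Hypothetical filter: keep widely-read, substantive articles only."""
    title = article["title"]
    if title.startswith(("List of", "Main Page")):      # drop lists and home pages
        return False
    if len(article["sentences"]) < min_sentences:       # drop pages with only a few lines
        return False
    return title in pageviews                           # must have readership data

def select_top_articles(articles, pageviews, n=94_000):
    """Keep the n most widely-read candidates (n = 94,000 in the paper)."""
    candidates = [a for a in articles if is_candidate(a, pageviews)]
    candidates.sort(key=lambda a: pageviews[a["title"]], reverse=True)
    return candidates[:n]
```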
The articles in the dataset are notably longer, posing challenges for neural models. To address this, the researchers selected the first three sentences of each paragraph as inputs. They then constructed knowledge graphs with OpenIE, extracting entity-relation triples with coreference resolution.
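A hedged preprocessing sketch along these lines is shown below. It uses NLTK for sentence splitting and Stanford CoreNLP's OpenIE annotator through the stanza client; this is one workable pipeline, not necessarily the authors' exact tooling, and coreference resolution would be layered on top via the corresponding annotator:

```python
import nltk
from stanza.server import CoreNLPClient  # requires a local CoreNLP installation

nltk.download("punkt", quiet=True)

def truncate_paragraphs(article_text):
    """Keep only the first three sentences of each paragraph as model input."""
    paragraphs = [p for p in article_text.split("\n\n") if p.strip()]
    return " ".join(" ".join(nltk.sent_tokenize(p)[:3]) for p in paragraphs)

def extract_triples(text):
    """Extract (subject, relation, object) triples with OpenIE."""
    annotators = ["tokenize", "ssplit", "pos", "lemma", "depparse", "natlog", "openie"]
    with CoreNLPClient(annotators=annotators, be_quiet=True) as client:
        ann = client.annotate(text)
        return [(t.subject, t.relation, t.object)
                for sent in ann.sentence for t in sent.openieTriple]
```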
Experiments and results
The researchers evaluated the MultiBART-GAT model by comparing it with baseline models, ASGARD and BART. The models were trained without cloze reward or reinforcement losses for 10 epochs. The evaluation utilized the ROUGE score as the main metric, measuring the quality of the generated summaries.
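ROUGE can be computed along these lines with the rouge-score package (an illustrative sketch, not the authors' evaluation script):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "The ground-truth abstract of the article."
candidate = "The summary generated by the model."

scores = scorer.score(reference, candidate)
for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f}, "
          f"recall={result.recall:.3f}, f1={result.fmeasure:.3f}")
```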
To handle computational constraints, the Wiki-Sum dataset was reduced by selecting the first three sentences from each subsection, while the CNN/DailyMail dataset was downscaled by 10%.
MultiBART-GAT outperformed BART in generating longer and more informative summaries but struggled with topical information. ASGARD's LSTM-based encoder performed better, owing to the limited context of the smaller dataset. Results on the Wiki-Sum dataset were slightly lower, indicating a potential information gap in Wikipedia articles, and the researchers identify enhancing the dataset as a direction for future research.
Conclusions
In summary, the researchers propose a novel approach to abstractive summarization that overcomes limitations of large language models (LLMs). By incorporating knowledge graph information and structured semantics, they enhanced factual correctness. The MultiBART-GAT model, based on BART, encodes both textual and graph inputs, resulting in higher-quality summaries. Evaluation on the Wiki-Sum dataset shows superior performance. To further improve the factual accuracy of LLM-generated summaries, the researchers suggest cleaning the dataset, considering other graph encoders, improving model convergence speed, and incorporating multi-hop generation for aligned and accurate summaries.