Automatic text summarization refers to a group of methods that use algorithms to condense a body of text while preserving its key points. Although it may not receive as much attention as other machine learning successes, the field has advanced steadily. Systems that can extract the key concepts from a text while maintaining its overall meaning have the potential to change how work is done in a variety of industries, including banking, law, and healthcare.

Types of Text Summarization

There are typically two basic methods for automatic text summarization: extractive and abstractive.
Extractive Summarization

Extractive summarization algorithms generate a summary by selecting and combining key passages from the source material. Unlike human writers, these models focus on picking out the most essential sentences of the original text rather than generating new ones. A common choice for this is the TextRank algorithm, a graph-based ranking method that is well suited to text summarization tasks. Let’s explore how it functions by considering a sample text summarization scenario.

Utilizing the TextRank Algorithm for Extractive Text Summarization

TextRank is available as a spaCy pipeline component. spaCy is a Python library for natural language processing, and pytextrank is a spaCy extension that implements the TextRank algorithm.

TextRank can produce reasonably satisfactory results. Nevertheless, extractive techniques only return a cut-down version of the original text, made up of the phrases and sentences that were not eliminated, instead of generating new text to summarize the information it contains.

Prerequisites

spaCy

To install spaCy, run the command below. The leading ! is notebook syntax; omit it when running in a regular terminal.

    !pip install spacy

To download the English language model:

    !python3 -m spacy download en_core_web_lg

TextRank

To install pytextrank:

    !pip install pytextrank

Text Summarization

This code uses spaCy and PyTextRank to automatically summarize a given text. It first installs the required packages, downloads a spaCy language model, and loads the model with the TextRank summarization pipeline. It then processes a lengthy text and generates a summary of the text’s key phrases and sentences. The summary is limited to 2 phrases and 2 sentences.

Python3
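The original listing is not reproduced here, so the following is only a minimal sketch of what such a pipeline might look like, not the article's own code. It assumes `spacy`, `pytextrank`, and the `en_core_web_lg` model are installed. The helper names `build_pipeline`, `summarize`, and `report` are hypothetical, and measuring document size in characters is an assumption.

```python
def build_pipeline(model_name="en_core_web_lg"):
    """Load a spaCy model and append pytextrank's "textrank" component."""
    # Imported lazily so the helpers below can be used or tested
    # without the heavy dependencies being installed.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the "textrank" factory)

    nlp = spacy.load(model_name)
    nlp.add_pipe("textrank")
    return nlp


def summarize(doc, limit_phrases=2, limit_sentences=2):
    """Return the top-ranked sentences of a processed spaCy Doc as strings.

    limit_phrases caps how many top-ranked phrases are used to score
    sentences; limit_sentences caps how many sentences are returned.
    """
    sentences = doc._.textrank.summary(
        limit_phrases=limit_phrases,
        limit_sentences=limit_sentences,
    )
    return [sent.text.strip() for sent in sentences]


def report(text, summary):
    """Format a short report; sizes are character counts (an assumption)."""
    summary_text = " ".join(summary)
    return (
        f"Original Document Size: {len(text)}\n"
        f"Summary Length: {len(summary_text)}\n"
        f"Summary: {summary_text}"
    )
```

With the dependencies installed, `summarize(build_pipeline()(text))` returns at most two sentences for any string `text`, and `report(text, summary)` formats the sizes. The numbers printed depend entirely on the input, so they will not necessarily match any figures quoted in this article.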
Output:

Original Document Size: 1808

Abstractive Summarization

Abstractive summarization techniques emulate human writing by generating entirely new sentences to convey key concepts from the source text, rather than merely copying portions of it. These fresh sentences distill the vital information while eliminating irrelevant details, often using vocabulary that does not appear in the original text. Abstractive models initially relied on designs based on recurrent neural networks (RNNs), but Transformers have recently come to dominate the natural language processing field.

What Are Transformers?

In their original form, Transformers use an encoder-decoder architecture to transform an input sequence into an output sequence. What sets them apart is the “self-attention” mechanism, along with other enhancements such as positional encoding.

NOTE: Not all Transformers are intended for use in text summarization.

Let’s look at PEGASUS, a model that appears to excel in output quality for text summarization. PEGASUS is similar to other transformer models; its primary distinction is the approach used during pre-training. Specifically, the most important sentences in the training corpora are “masked” (hidden from the model) during PEGASUS pre-training, and the model is then tasked with generating these concealed sentences as a single output sequence.

Prerequisites

To run the text summarization code below, first install the required Python libraries and frameworks.

    !pip install git+https://github.com/PyTorchLightning/pytorch-lightning

This code uses the Hugging Face Transformers library to summarize text using the PEGASUS model. It installs the necessary packages, selects the model, tokenizes the input text, generates a summary, and prints it. It also demonstrates using the summarization pipeline for text summarization.

Python3
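The original listing is not reproduced here, so what follows is a hedged sketch of the tokenize, generate, and decode steps described above, not the article's own code. It assumes `transformers`, `torch`, and `sentencepiece` are installed. The checkpoint name `google/pegasus-xsum` is an assumption, since the article does not say which PEGASUS checkpoint it used, and the helper names `load_pegasus` and `summarize_with_pegasus` are hypothetical.

```python
def load_pegasus(model_name="google/pegasus-xsum"):
    """Download (or load from cache) a PEGASUS tokenizer and model."""
    # Imported lazily so the summarization helper below can be used
    # or tested without the heavy dependencies being installed.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize_with_pegasus(text, tokenizer, model, max_new_tokens=64):
    """Tokenize the text, generate summary token ids, and decode them."""
    # Truncate inputs longer than the model's maximum length and
    # return PyTorch tensors, which the model's generate() expects.
    batch = tokenizer(
        text,
        truncation=True,
        padding="longest",
        return_tensors="pt",
    )
    # The encoder reads the input; the decoder writes the summary tokens.
    summary_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence markers when converting to text.
    decoded = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
    return decoded[0]
```

With the dependencies installed, `summarize_with_pegasus(text, *load_pegasus())` returns a single summary string. The Transformers library also offers a shorter route through `pipeline("summarization", model="google/pegasus-xsum")`, which wraps the same three steps.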
Output:

Original Document Size: 1825

Conclusion

The future of text summarization seems bright. Using extractive and abstractive approaches, along with powerful models like PEGASUS, it is becoming possible to summarize text with greater accuracy and more human-like intuition. This work continues to transform how we condense massive volumes of information into succinct insights, and it promises a future in which we will be able to distil knowledge more effectively than before. The development of text summarization is evidence of the expanding potential of AI to improve human comprehension.
Referred: https://www.geeksforgeeks.org
Uploaded by: | Admin |