Enhance Your Summarization: Multi-Input App Built with Hugging Face API and Gradio

🧠 Introduction

Reading long documents, blogs, or articles can be time-consuming. What if you could get the gist of any content, from a textbook paragraph to a full web page, in just a few seconds?

Thanks to Hugging Face's pre-trained models, loaded through the transformers pipeline, combined with Gradio, summarizing large text inputs is easier than ever. In this article, we'll build a clean, simple web app where we can:

  • Type or paste raw text

  • Upload a .txt file

  • Enter a webpage URL

All with a single goal: generate a concise summary instantly.

📚 What Is Text Summarization?

Text summarization is the process of automatically generating a shorter version of a longer text while retaining its essential meaning.

There are two main types:

  • Extractive: Picks and rearranges key sentences.

  • Abstractive: Generates new, shorter phrasing, much as a human summarizer would.

This app uses abstractive summarization through the Hugging Face model: facebook/bart-large-cnn.
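To make the contrast concrete, here is a toy extractive summarizer, pure Python and no model, that scores sentences by word frequency and returns the top one. (This is only an illustration of the extractive idea; the app itself relies on BART's abstractive output.)

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Toy extractive summarizer: rank sentences by word-frequency score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the total corpus frequency of its words
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    return " ".join(scored[:n_sentences])

text = (
    "BART is a sequence-to-sequence model. "
    "It is trained by corrupting text and learning to reconstruct it. "
    "The model works well for summarization."
)
print(extractive_summary(text))
```

Because the output is copied verbatim from the input, it is always grammatical but never rephrased; abstractive models like BART remove that limitation.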

🤝 Why Hugging Face + Gradio?

  • Hugging Face gives access to powerful pre-trained models with no training required (this app loads one locally through the transformers pipeline)

  • Gradio lets you build interfaces with just a few lines of Python code.

  • You can deploy the entire app in Hugging Face Spaces and get a public URL to share.

Useful for:

  • Summarizing notes or articles

  • Extracting key points from papers

  • Integrating NLP into projects

🧠 What is BART?

BART (Bidirectional and Auto-Regressive Transformers) is a hybrid model introduced by Facebook AI. It merges two popular architectures:

  • BERT → Good at understanding text by reading it bidirectionally

  • GPT → Good at generating text by predicting the next word in a sequence

🧩 How Does BART Work?

BART learns by intentionally corrupting its input and training itself to reconstruct the original, much like solving a jumbled puzzle.
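The "damage and repair" idea can be illustrated with two of BART's actual noising transforms, sentence permutation and token masking, sketched here in plain Python. (The real corruption happens on token IDs during pre-training; this is only a toy illustration.)

```python
import random

def corrupt(text, mask_token="<mask>", seed=0):
    """Apply two BART-style noising transforms to a short text."""
    rng = random.Random(seed)
    # 1. Sentence permutation: shuffle the order of sentences
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    # 2. Token masking: replace one random word per sentence with <mask>
    noised = []
    for s in sentences:
        words = s.split()
        words[rng.randrange(len(words))] = mask_token
        noised.append(" ".join(words))
    return ". ".join(noised) + "."

original = "BART reads the whole input. Then it learns to rebuild it."
print(corrupt(original))
# During pre-training, the model sees the corrupted version and is
# optimized to reproduce the original text.
```

Summarization is, in a sense, the same skill applied deliberately: produce fluent text that preserves the source's meaning.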

πŸ“ Why BART?

Because BART is trained to reconstruct meaningful text from noisy input, it excels at summarization tasks. It can:

  • Extract the core idea of a paragraph

  • Rewrite it fluently

  • Keep the original meaning intact

💻 Summarize from Any Input

Our app lets users choose from three input types:

  • ✍️ Typed Text: Paste any paragraph

  • πŸ“ .txt File Upload: Summarize content inside uploaded .txt files

  • 🌍 Webpage URL: Enter any article/blog URL

    πŸ” Working Code

import gradio as gr
from bs4 import BeautifulSoup
import requests
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Function to extract text from a webpage
def fetch_url_text(url):
    try:
        headers_req = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=headers_req, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        text = soup.get_text(separator=" ", strip=True)
        text = " ".join(text.split())
        if len(text) < 100:
            return None, "❌ Extracted text from the webpage is too short to summarize."
        return text, None
    except Exception as e:
        return None, f"❌ URL error: {e}"

# Summarization function
def summarize_text(text_input, file_upload, url_input):
    text = ""

    if file_upload:
        try:
            # Depending on the Gradio version, gr.File returns a file path
            # string or a tempfile-like object with a .name attribute
            path = file_upload if isinstance(file_upload, str) else file_upload.name
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        except Exception as e:
            return f"❌ File read error: {e}"

    elif url_input:
        text, error_msg = fetch_url_text(url_input)
        if error_msg:
            return error_msg

    elif text_input:
        text = text_input

    else:
        return "⚠️ Please provide some input."

    try:
        # BART's input limit is ~1024 tokens; slicing 1024 characters is a rough safeguard
        summary = summarizer(text[:1024], max_length=150, min_length=30, do_sample=False)
        return summary[0]["summary_text"]
    except Exception as e:
        return f"❌ Summarization error: {e}"

# Gradio Interface
demo = gr.Interface(
    fn=summarize_text,
    inputs=[
        gr.Textbox(label="✍️ Enter Text", lines=4, placeholder="Paste or type text here..."),
        gr.File(label="📄 Upload a .txt File", file_types=[".txt"]),
        gr.Textbox(label="🌐 Enter Webpage URL", placeholder="https://example.com/article")
    ],
    outputs="text",
    title="🧠 Multi-Input Text Summarizer",
    description="Summarize content from text, uploaded files, or web URLs using the BART model."
)

demo.launch()

πŸ” Code Explanation

  1. Imports

    • gradio: For building the UI

    • BeautifulSoup & requests: For extracting text from webpages

    • pipeline from transformers: To load the summarization model

  2. Summarization Pipeline

     summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    

    Loads the pre-trained BART model optimized for summarizing long news-like articles.

  3. Webpage Text Extraction

     fetch_url_text(url)
    
    • Sends an HTTP request to the webpage

    • Uses BeautifulSoup to extract all visible text

    • Cleans up extra whitespace

    • Returns error if text is too short

  4. Summarization Function

     summarize_text(text_input, file_upload, url_input)
    
    • Determines the input type (text box, file, or URL)

    • Extracts text accordingly

    • Passes the text (max 1024 characters) to the summarizer

    • Returns the generated summary

  5. Gradio Interface

     gr.Interface(...)
    
    • Creates an app with three inputs and a single text output

    • Launches a web app where users can try the summarizer easily
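The webpage-extraction step can be tried in isolation on an inline HTML snippet. One refinement worth knowing: get_text() also returns the contents of <script> and <style> tags, so stripping them first gives the summarizer cleaner input. A small sketch, not the app's exact code:

```python
from bs4 import BeautifulSoup

html = ("<html><body><h1>Sample Article</h1>"
        "<p>First   paragraph.</p>"
        "<script>var x = 1;</script></body></html>")

soup = BeautifulSoup(html, "html.parser")
# Drop <script>/<style> contents, which get_text() would otherwise include
for tag in soup(["script", "style"]):
    tag.decompose()
# Same cleanup as fetch_url_text: extract text, then collapse whitespace runs
text = " ".join(soup.get_text(separator=" ", strip=True).split())
print(text)  # Sample Article First paragraph.
```

Adding the same decompose() loop to fetch_url_text would keep JavaScript source out of the summaries.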

📂 Supported Inputs

  • Text Box: up to 1024 characters summarized per input

  • File Upload: only .txt files supported, UTF-8 encoded

  • Web URL: must be a clean, HTML-readable webpage with enough content
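Since only the uploaded file's path reaches the function, reading it is plain Python file I/O. A slightly more defensive reader, sketched here rather than taken from the app, could fall back to Latin-1 when the file is not valid UTF-8:

```python
import os
import tempfile

def read_text_file(path):
    """Read a .txt file, preferring UTF-8 but falling back to Latin-1."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this read cannot fail
        with open(path, "r", encoding="latin-1") as f:
            return f.read()

# Quick check with a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Résumé of the article.")
    path = tmp.name
print(read_text_file(path))
os.remove(path)
```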

⚠️ Limitations

  • If no Torch backend is available, the pipeline won’t run (use Spaces with PyTorch or Colab)

  • URLs with dynamic content (like JavaScript-based pages) may fail

  • The summarizer is trained on English text and performs poorly on other languages
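Related to the 1024-character cap noted above: everything past that point is simply dropped. A common workaround, sketched here (chunk size and joining strategy are design choices, not part of the app), is to split the text on word boundaries and summarize chunk by chunk:

```python
def chunk_text(text, max_chars=1024):
    """Split text into chunks of at most max_chars, breaking on word boundaries."""
    words = text.split()
    chunks, current, length = [], [], 0
    for w in words:
        # Flush the current chunk before it would exceed the limit
        if length + len(w) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(w)
        length += len(w) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

long_text = "word " * 600  # roughly 3000 characters
chunks = chunk_text(long_text)
print(len(chunks), [len(c) for c in chunks])
# Each chunk could then be passed to summarizer(...) and the results joined,
# optionally summarizing the joined summaries once more for a final pass.
```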

📦 How to Deploy on Hugging Face Spaces

  1. Create a new Gradio Space

  2. Upload app.py with the above code

  3. Add a requirements.txt with:

     gradio
     transformers
     torch
     requests
     beautifulsoup4
    

    Set the hardware to GPU if available for faster results

  4. Click Commit and Deploy

Try out the sample app at:
https://huggingface.co/spaces/divivetri/text_summarization

πŸ”Sample Code for JavaScript-rendered webpages

Most websites today are built with JavaScript, meaning their content is rendered dynamically after the page loads. Standard Python tools like requests + BeautifulSoup can only access the raw HTML and often miss key content. To fix this, we use Selenium, a headless browser automation tool, to fully load JavaScript-powered pages and extract visible text, making our summarization app much more powerful and real-world ready.

If you want to try summarization on JS-rendered webpages, use the code below. It benefits from a GPU, so the free Spaces tier will not be of much help; run it in Colab and choose a GPU runtime instead.
TIP: Run the code as separate cells for easier execution

!pip install gradio transformers torch selenium beautifulsoup4
!apt-get update
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import requests
import gradio as gr
from transformers import pipeline
import time
# Load BART summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Function to extract text from JS-enabled webpages
def fetch_url_text(url):
    try:
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=chrome_options)

        driver.get(url)
        time.sleep(5)  # Crude wait for JS to load; Selenium's WebDriverWait is more robust

        html = driver.page_source
        driver.quit()

        soup = BeautifulSoup(html, "html.parser")
        text = soup.get_text(separator=" ", strip=True)
        text = " ".join(text.split())
        if len(text) < 100:
            return None, "❌ Extracted text is too short to summarize."
        return text, None
    except Exception as e:
        return None, f"❌ URL fetch error: {e}"
# Main function to handle all inputs
def summarize_text(text_input, file_upload, url_input):
    text = ""

    if file_upload:
        try:
            # Depending on the Gradio version, gr.File returns a file path
            # string or a tempfile-like object with a .name attribute
            path = file_upload if isinstance(file_upload, str) else file_upload.name
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        except Exception as e:
            return f"❌ File read error: {e}"

    elif url_input:
        text, error_msg = fetch_url_text(url_input)
        if error_msg:
            return error_msg

    elif text_input:
        text = text_input

    else:
        return "⚠️ Please provide some input."

    try:
        summary = summarizer(text[:1024], max_length=150, min_length=30, do_sample=False)
        return summary[0]["summary_text"]
    except Exception as e:
        return f"❌ Summarization error: {e}"
demo = gr.Interface(
    fn=summarize_text,
    inputs=[
        gr.Textbox(label="✍️ Enter Text", lines=4, placeholder="Paste or type text here..."),
        gr.File(label="📄 Upload a .txt File", file_types=[".txt"]),
        gr.Textbox(label="🌐 Enter Webpage URL", placeholder="https://example.com/article")
    ],
    outputs="text",
    title="🧠 Smart Text Summarizer with JS Page Support",
    description="Summarize content from text, files, or JavaScript-rendered webpages using Hugging Face's BART model."
)

demo.launch(share=True)

πŸ” Code Explanation

  1. Selenium for Full Webpage Rendering

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
    
    • Configures a headless Chrome browser

    • Loads the page like a real browser would, executing JavaScript

  2. Dynamic Content Extraction

     driver.get(url)
     html = driver.page_source
     soup = BeautifulSoup(html, "html.parser")
    
    • Loads the full page source after JavaScript execution

    • BeautifulSoup extracts readable text from the fully rendered HTML

  3. Summarization Remains the Same

     summary = summarizer(text[:1024], max_length=150, min_length=30)
    
    • Uses the Hugging Face BART model for summarizing the text

  4. Gradio Interface

    • Allows the user to paste a URL, enter text manually, or upload a file

    • Results are shown instantly in the browser

📚 References

  1. BART Model for Summarization

  2. Transformers Library (Hugging Face)

  3. Selenium for Python

    • Official Documentation: https://www.selenium.dev/documentation/webdriver/

    • PyPI Package: https://pypi.org/project/selenium/

    • ChromeDriver Setup: https://chromedriver.chromium.org/

  4. BeautifulSoup (bs4)

    • Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

  5. Gradio – Build ML Apps Quickly

🙌 Wrap-Up

You’ve just built a smart summarizer that works with text, files, and even web pages, all without training a model! 💡 Thanks to BART and Gradio, turning long content into clear, concise summaries is now just a click away.

🚀 Try it out, explore its limits, and let AI do the reading for you! Build it. Run it. Summarize it!


Written by

Divya Vetriveeran

I am currently serving as an Assistant Professor at CHRIST (Deemed to be University), Bangalore. With a Ph.D. in Information and Communication Engineering from Anna University and ongoing post-doctoral research at the Singapore Institute of Technology, my expertise lies in Ethical AI, Edge Computing, and innovative teaching methodologies. I have published extensively in reputed international journals and conferences, hold multiple patents, and actively contribute as a reviewer for leading journals, including IEEE and Springer. A UGC-NET qualified educator with a computer science background, I am committed to fostering impactful research and technological innovation for societal good.