Enhance Your Summarization: Multi-Input App Built with Hugging Face API and Gradio

🧠 Introduction

Reading long documents, blogs, or articles can be time-consuming. What if you could get the gist of any content, from a textbook paragraph to a full web page, in just a few seconds?

Thanks to Hugging Face's pre-trained models, loaded through the transformers pipeline, combined with Gradio, summarizing large text inputs is easier than ever. In this article, we'll build a clean, simple web app where we can:

  • Type or paste raw text

  • Upload a .txt file

  • Enter a webpage URL

All with a single goal: generate a concise summary instantly.

📚 What Is Text Summarization?

Text summarization is the process of automatically generating a shorter version of a longer text while retaining its essential meaning.

There are two main types:

  • Extractive: Picks and rearranges key sentences.

  • Abstractive: Generates new, shorter phrasing, much as a human summarizer would.

This app uses abstractive summarization through the Hugging Face model: facebook/bart-large-cnn.
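To make the contrast concrete, here is a toy extractive summarizer, pure Python and no model, that scores sentences by word frequency and returns the top one. (This is only an illustration of the extractive idea; the app itself relies on BART's abstractive output.)

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Toy extractive summarizer: rank sentences by word-frequency score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the total corpus frequency of its words
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    return " ".join(scored[:n_sentences])

text = (
    "BART is a sequence-to-sequence model. "
    "It is trained by corrupting text and learning to reconstruct it. "
    "The model works well for summarization."
)
print(extractive_summary(text))
```

Because the output is copied verbatim from the input, it is always grammatical but never rephrased; abstractive models like BART remove that limitation.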

🤝 Why Hugging Face + Gradio?

  • Hugging Face gives access to powerful pre-trained models with no training required (this app loads one locally through the transformers pipeline)

  • Gradio lets you build interfaces with just a few lines of Python code.

  • You can deploy the entire app in Hugging Face Spaces and get a public URL to share.

Useful for:

  • Summarizing notes or articles

  • Extracting key points from papers

  • Integrating NLP into projects

🧠 What is BART?

BART (Bidirectional and Auto-Regressive Transformers) is a hybrid model introduced by Facebook AI. It merges two popular architectures:

  • BERT → Good at understanding text by reading it bidirectionally

  • GPT → Good at generating text by predicting the next word in a sequence

🧩 How Does BART Work?

BART learns by intentionally corrupting its input and training itself to reconstruct the original, much like solving a jumbled puzzle.
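The "damage and repair" idea can be illustrated with two of BART's actual noising transforms, sentence permutation and token masking, sketched here in plain Python. (The real corruption happens on token IDs during pre-training; this is only a toy illustration.)

```python
import random

def corrupt(text, mask_token="<mask>", seed=0):
    """Apply two BART-style noising transforms to a short text."""
    rng = random.Random(seed)
    # 1. Sentence permutation: shuffle the order of sentences
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    # 2. Token masking: replace one random word per sentence with <mask>
    noised = []
    for s in sentences:
        words = s.split()
        words[rng.randrange(len(words))] = mask_token
        noised.append(" ".join(words))
    return ". ".join(noised) + "."

original = "BART reads the whole input. Then it learns to rebuild it."
print(corrupt(original))
# During pre-training, the model sees the corrupted version and is
# optimized to reproduce the original text.
```

Summarization is, in a sense, the same skill applied deliberately: produce fluent text that preserves the source's meaning.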

πŸ“ Why BART?

Because BART is trained to reconstruct meaningful text from noisy input, it excels at summarization tasks. It can:

  • Extract the core idea of a paragraph

  • Rewrite it fluently

  • Keep the original meaning intact

💻 Summarize from Any Input

Our app lets users choose from three input types:

  • ✍️ Typed Text: Paste any paragraph

  • πŸ“ .txt File Upload: Summarize content inside uploaded .txt files

  • 🌍 Webpage URL: Enter any article/blog URL

    πŸ” Working Code

import gradio as gr
from bs4 import BeautifulSoup
import requests
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Function to extract text from a webpage
def fetch_url_text(url):
    try:
        headers_req = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=headers_req, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        text = soup.get_text(separator=" ", strip=True)
        text = " ".join(text.split())
        if len(text) < 100:
            return None, "❌ Extracted text from the webpage is too short to summarize."
        return text, None
    except Exception as e:
        return None, f"❌ URL error: {e}"

# Summarization function
def summarize_text(text_input, file_upload, url_input):
    text = ""

    if file_upload:
        try:
            # Depending on the Gradio version, gr.File returns a file path
            # string or a tempfile-like object with a .name attribute
            path = file_upload if isinstance(file_upload, str) else file_upload.name
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        except Exception as e:
            return f"❌ File read error: {e}"

    elif url_input:
        text, error_msg = fetch_url_text(url_input)
        if error_msg:
            return error_msg

    elif text_input:
        text = text_input

    else:
        return "⚠️ Please provide some input."

    try:
        # BART's input limit is ~1024 tokens; slicing 1024 characters is a rough safeguard
        summary = summarizer(text[:1024], max_length=150, min_length=30, do_sample=False)
        return summary[0]["summary_text"]
    except Exception as e:
        return f"❌ Summarization error: {e}"

# Gradio Interface
demo = gr.Interface(
    fn=summarize_text,
    inputs=[
        gr.Textbox(label="✍️ Enter Text", lines=4, placeholder="Paste or type text here..."),
        gr.File(label="📄 Upload a .txt File", file_types=[".txt"]),
        gr.Textbox(label="🌐 Enter Webpage URL", placeholder="https://example.com/article")
    ],
    outputs="text",
    title="🧠 Multi-Input Text Summarizer",
    description="Summarize content from text, uploaded files, or web URLs using the BART model."
)

demo.launch()

πŸ” Code Explanation

  1. Imports

    • gradio: For building the UI

    • BeautifulSoup & requests: For extracting text from webpages

    • pipeline from transformers: To load the summarization model

  2. Summarization Pipeline

     summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    

    Loads the pre-trained BART model optimized for summarizing long news-like articles.

  3. Webpage Text Extraction

     fetch_url_text(url)
    
    • Sends an HTTP request to the webpage

    • Uses BeautifulSoup to extract all visible text

    • Cleans up extra whitespace

    • Returns error if text is too short

  4. Summarization Function

     summarize_text(text_input, file_upload, url_input)
    
    • Determines the input type (text box, file, or URL)

    • Extracts text accordingly

    • Passes the text (max 1024 characters) to the summarizer

    • Returns the generated summary

  5. Gradio Interface

     gr.Interface(...)
    
    • Creates an app with three inputs and a single text output

    • Launches a web app where users can try the summarizer easily
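The webpage-extraction step can be tried in isolation on an inline HTML snippet. One refinement worth knowing: get_text() also returns the contents of <script> and <style> tags, so stripping them first gives the summarizer cleaner input. A small sketch, not the app's exact code:

```python
from bs4 import BeautifulSoup

html = ("<html><body><h1>Sample Article</h1>"
        "<p>First   paragraph.</p>"
        "<script>var x = 1;</script></body></html>")

soup = BeautifulSoup(html, "html.parser")
# Drop <script>/<style> contents, which get_text() would otherwise include
for tag in soup(["script", "style"]):
    tag.decompose()
# Same cleanup as fetch_url_text: extract text, then collapse whitespace runs
text = " ".join(soup.get_text(separator=" ", strip=True).split())
print(text)  # Sample Article First paragraph.
```

Adding the same decompose() loop to fetch_url_text would keep JavaScript source out of the summaries.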

📂 Supported Inputs

  • Text Box: up to 1024 characters summarized per input

  • File Upload: only .txt files supported, UTF-8 encoded

  • Web URL: must be a clean, HTML-readable webpage with enough content
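Since only the uploaded file's path reaches the function, reading it is plain Python file I/O. A slightly more defensive reader, sketched here rather than taken from the app, could fall back to Latin-1 when the file is not valid UTF-8:

```python
import os
import tempfile

def read_text_file(path):
    """Read a .txt file, preferring UTF-8 but falling back to Latin-1."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this read cannot fail
        with open(path, "r", encoding="latin-1") as f:
            return f.read()

# Quick check with a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Résumé of the article.")
    path = tmp.name
print(read_text_file(path))
os.remove(path)
```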

⚠️ Limitations

  • If no Torch backend is available, the pipeline won’t run (use Spaces with PyTorch or Colab)

  • URLs with dynamic content (like JavaScript-based pages) may fail

  • The summarizer is trained on English text and performs poorly on other languages
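Related to the 1024-character cap noted above: everything past that point is simply dropped. A common workaround, sketched here (chunk size and joining strategy are design choices, not part of the app), is to split the text on word boundaries and summarize chunk by chunk:

```python
def chunk_text(text, max_chars=1024):
    """Split text into chunks of at most max_chars, breaking on word boundaries."""
    words = text.split()
    chunks, current, length = [], [], 0
    for w in words:
        # Flush the current chunk before it would exceed the limit
        if length + len(w) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(w)
        length += len(w) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

long_text = "word " * 600  # roughly 3000 characters
chunks = chunk_text(long_text)
print(len(chunks), [len(c) for c in chunks])
# Each chunk could then be passed to summarizer(...) and the results joined,
# optionally summarizing the joined summaries once more for a final pass.
```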

📦 How to Deploy on Hugging Face Spaces

  1. Create a new Gradio Space

  2. Upload app.py with the above code

  3. Add a requirements.txt with:

     gradio
     transformers
     torch
     requests
     beautifulsoup4
    

    Set the hardware to GPU if available for faster results

  4. Click Commit and Deploy

Try out the sample app at:
https://huggingface.co/spaces/divivetri/text_summarization

πŸ”Sample Code for JavaScript-rendered webpages

Most websites today are built with JavaScript, meaning their content is rendered dynamically after the page loads. Standard Python tools like requests + BeautifulSoup can only access the raw HTML and often miss key content. To fix this, we use Selenium, a headless browser automation tool, to fully load JavaScript-powered pages and extract visible text, making our summarization app much more powerful and real-world ready.

If you want to try summarization on JS-rendered webpages, use the code below. It benefits from a GPU, so the free Spaces tier will not be of much help; run it in Colab and choose a GPU runtime instead.
TIP: Run the code as separate cells for easier execution

!pip install gradio transformers torch selenium beautifulsoup4
!apt-get update
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import requests
import gradio as gr
from transformers import pipeline
import time
# Load BART summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Function to extract text from JS-enabled webpages
def fetch_url_text(url):
    try:
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=chrome_options)

        driver.get(url)
        time.sleep(5)  # Crude wait for JS to load; Selenium's WebDriverWait is more robust

        html = driver.page_source
        driver.quit()

        soup = BeautifulSoup(html, "html.parser")
        text = soup.get_text(separator=" ", strip=True)
        text = " ".join(text.split())
        if len(text) < 100:
            return None, "❌ Extracted text is too short to summarize."
        return text, None
    except Exception as e:
        return None, f"❌ URL fetch error: {e}"
# Main function to handle all inputs
def summarize_text(text_input, file_upload, url_input):
    text = ""

    if file_upload:
        try:
            # Depending on the Gradio version, gr.File returns a file path
            # string or a tempfile-like object with a .name attribute
            path = file_upload if isinstance(file_upload, str) else file_upload.name
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
        except Exception as e:
            return f"❌ File read error: {e}"

    elif url_input:
        text, error_msg = fetch_url_text(url_input)
        if error_msg:
            return error_msg

    elif text_input:
        text = text_input

    else:
        return "⚠️ Please provide some input."

    try:
        summary = summarizer(text[:1024], max_length=150, min_length=30, do_sample=False)
        return summary[0]["summary_text"]
    except Exception as e:
        return f"❌ Summarization error: {e}"
demo = gr.Interface(
    fn=summarize_text,
    inputs=[
        gr.Textbox(label="✍️ Enter Text", lines=4, placeholder="Paste or type text here..."),
        gr.File(label="📄 Upload a .txt File", file_types=[".txt"]),
        gr.Textbox(label="🌐 Enter Webpage URL", placeholder="https://example.com/article")
    ],
    outputs="text",
    title="🧠 Smart Text Summarizer with JS Page Support",
    description="Summarize content from text, files, or JavaScript-rendered webpages using Hugging Face's BART model."
)

demo.launch(share=True)

πŸ” Code Explanation

  1. Selenium for Full Webpage Rendering

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
    
    • Configures a headless Chrome browser

    • Loads the page like a real browser would, executing JavaScript

  2. Dynamic Content Extraction

     driver.get(url)
     html = driver.page_source
     soup = BeautifulSoup(html, "html.parser")
    
    • Loads the full page source after JavaScript execution

    • BeautifulSoup extracts readable text from the fully rendered HTML

  3. Summarization Remains the Same

     summary = summarizer(text[:1024], max_length=150, min_length=30)
    
    • Uses the Hugging Face BART model for summarizing the text

  4. Gradio Interface

    • Allows the user to paste a URL, enter text manually, or upload a file

    • Results are shown instantly in the browser

📚 References

  1. BART Model for Summarization

  2. Transformers Library (Hugging Face)

  3. Selenium for Python

    • Official Documentation: https://www.selenium.dev/documentation/webdriver/

    • PyPI Package: https://pypi.org/project/selenium/

    • ChromeDriver Setup: https://chromedriver.chromium.org/

  4. BeautifulSoup (bs4)

    • Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

  5. Gradio – Build ML Apps Quickly

🙌 Wrap-Up

You’ve just built a smart summarizer that works with text, files, and even web pages, all without training a model! 💡 Thanks to BART and Gradio, turning long content into clear, concise summaries is now just a click away.

🚀 Try it out, explore its limits, and let AI do the reading for you! Build it. Run it. Summarize it!


Written by

Divya Vetriveeran

I am currently serving as an Assistant Professor at CHRIST (Deemed to be University), Bangalore. With a Ph.D. in Information and Communication Engineering from Anna University and ongoing post-doctoral research at the Singapore Institute of Technology, my expertise lies in Ethical AI, Edge Computing, and innovative teaching methodologies. I have published extensively in reputed international journals and conferences, hold multiple patents, and actively contribute as a reviewer for leading journals, including IEEE and Springer. A UGC-NET qualified educator with a computer science background, I am committed to fostering impactful research and technological innovation for societal good.