Extract Text from PowerPoint in Python: Quick and Detailed Guide
Do you ever feel frustrated navigating through a lengthy PowerPoint presentation just to find key information? Or maybe you need to repurpose the content for another project. The easiest solution is to extract the text from the slides. Since the text is often spread across multiple text boxes, manually copying it can be time-consuming. Fortunately, using code to complete this task is simple and fast. This article will show you how to extract text from a PowerPoint presentation using Python, saving you both time and effort.
Python Library to Extract Text from PowerPoint
This guide uses Spire.Presentation for Python, a library that lets you create, edit, and convert PowerPoint presentations without Microsoft Office installed. It can also read the text stored in slides, speaker notes, and comments.
You can install it from PyPI with pip:
pip install Spire.Presentation
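To confirm that the installation worked before running the examples, a quick check like the one below can help. It is a minimal sketch that relies only on the standard library. The helper name `is_installed` is hypothetical, and the module name `spire.presentation` is taken from the import statements used later in this guide.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example "spire") is missing
        return False


if __name__ == "__main__":
    if is_installed("spire.presentation"):
        print("Spire.Presentation is available.")
    else:
        print("Not found. Try: pip install Spire.Presentation")
```

The check only looks the module up; it does not load a presentation or verify a license.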
Extract Text from PowerPoint: Slides
Typically, a PowerPoint presentation contains text in three main areas: slides, speaker notes, and comments. In this section, we will first focus on how to extract text from PowerPoint slides. Since a presentation usually consists of multiple slides, each containing various shapes, especially text boxes, it’s essential to loop through all the slides and shapes to retrieve the text. Let's dive into the detailed coding steps.
Steps to get text from PowerPoint slides:
Create an object of the Presentation class, and use the Presentation.LoadFromFile() method to load a PowerPoint file.
Iterate through each slide and each shape on the slide, then check whether the shape is an instance of the IAutoShape class.
If it is, loop over the paragraphs in the IAutoShape.TextFrame.Paragraphs property and append each paragraph's text to the list.
Export the list to a text file and release the resource.
Here is a code example of extracting text from PowerPoint slides:
from spire.presentation import *
from spire.presentation.common import *
# Create an object of Presentation class
pres = Presentation()
# Load a PowerPoint presentation
pres.LoadFromFile("E:/Administrator/Python1/input/pre1.pptx")
text = []
# Loop through each slide
for slide in pres.Slides:
    # Loop through each shape on the slide
    for shape in slide.Shapes:
        # Check if the shape is an IAutoShape instance
        if isinstance(shape, IAutoShape):
            # Extract the text from each paragraph in the shape
            for paragraph in shape.TextFrame.Paragraphs:
                text.append(paragraph.Text)
# Write the text to a text file (the file is closed automatically)
with open("E:/Administrator/Python1/output/SlideText.txt", "w", encoding="utf-8") as f:
    for s in text:
        f.write(s + "\n")
# Dispose of presentation object
pres.Dispose()
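Slides often contain empty placeholders or paragraphs that hold only whitespace, so the extracted list can include blank entries. The sketch below shows one possible way to tidy the list before saving it. It uses only the standard library, and the helper names `clean_lines` and `save_lines`, as well as the sample data, are hypothetical stand-ins for the `text` list built above.

```python
from pathlib import Path


def clean_lines(lines):
    """Strip surrounding whitespace and drop entries that end up empty."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line]


def save_lines(lines, path):
    """Write one entry per line as UTF-8, creating the parent folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


# Sample data standing in for the extracted `text` list
sample = ["Quarterly Review", "", "  Revenue grew 12%  ", "\t"]
cleaned = clean_lines(sample)
print(cleaned)
```

Calling `save_lines(cleaned, "output/SlideText.txt")` would then write the cleaned entries; the relative path here is only an illustration.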
Get Text from PowerPoint: Speaker Notes
Speaker notes help the presenter deliver the content and key points clearly and fluently, so they often contain important details that never appear on the slides themselves. In this section, we will explore how to use Python to extract text from the speaker notes.
Steps to get text from speaker notes:
Create an instance of the Presentation class, and open a PowerPoint presentation from the disk using the Presentation.LoadFromFile() method.
Create a list to store information extracted from the presentation.
Loop through the slides in the presentation and get each slide's notes slide through the ISlide.NotesSlide property.
Get the speaker notes text using the NotesSlide.NotesTextFrame.Text property, and append it to the list.
Write the notes in the list to a text file and release the resource.
Below is the code example of exporting text from PowerPoint speaker notes:
from spire.presentation import *
from spire.presentation.common import *
# Create an object of Presentation class
pres = Presentation()
# Load a PowerPoint presentation from file
pres.LoadFromFile("E:/Administrator/Python1/input/pre1.pptx")
notes_list = []
# Iterate through each slide
for slide in pres.Slides:
    # Get the notes slide
    notesSlide = slide.NotesSlide
    # Guard in case a slide has no notes slide
    if notesSlide is None:
        continue
    # Get the notes
    notes = notesSlide.NotesTextFrame.Text
    notes_list.append(notes)
# Write the notes to a text file (the file is closed automatically)
with open("E:/Administrator/Python1/output/SpeakerNoteText.txt", "w", encoding="utf-8") as f:
    for note in notes_list:
        f.write(note + "\n")
# Release the resource
pres.Dispose()
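A plain list of notes does not show which slide each note belongs to. One possible refinement, sketched below, is to label every note with its slide number. This assumes you collect (slide number, note text) pairs yourself, for example by looping with `enumerate(pres.Slides, start=1)`. The function name `format_notes` and the sample pairs are hypothetical.

```python
def format_notes(numbered_notes):
    """Turn (slide_number, note_text) pairs into labeled lines, skipping blanks."""
    lines = []
    for number, note in numbered_notes:
        note = (note or "").strip()
        if note:
            lines.append(f"Slide {number}: {note}")
    return lines


# Sample pairs standing in for data gathered while looping over the slides
sample = [(1, "Welcome the audience."), (2, ""), (3, None), (4, " Close with Q&A. ")]
labeled = format_notes(sample)
for line in labeled:
    print(line)
```

Slides with empty or missing notes are left out, so the output lists only the slides that actually carry notes.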
How to Extract All Text from PowerPoint: Comments
When reviewing or collaborating on a PowerPoint presentation, comments are often used to provide feedback, suggest improvements, or clarify certain points. These comments can contain valuable information. This section illustrates how to extract text from comments in PowerPoint presentations.
Steps to get text from PowerPoint comments:
Instantiate a Presentation class and load a PowerPoint presentation from the file path with the Presentation.LoadFromFile() method.
Iterate through all slides in the presentation.
Retrieve all comments on the slide by accessing the Comments property.
Iterate over each comment object in the comments collection and read its text through the Comment.Text property.
Append the extracted comment text to the list.
Save the comment text to a text file and release the resources.
Here is an example of extracting text from comments in a PowerPoint presentation:
from spire.presentation import *
from spire.presentation.common import *
# Create an object of Presentation class
pres = Presentation()
# Load a PowerPoint presentation from file
pres.LoadFromFile("E:/Administrator/Python1/input/pre1.pptx")
comment_list = []
# Iterate through all slides
for slide in pres.Slides:
    # Get all comments from the slide
    comments = slide.Comments
    # Iterate through the comments
    for comment in comments:
        # Get the comment text
        commentText = comment.Text
        comment_list.append(commentText)
# Write the comments to a text file (the file is closed automatically)
with open("E:/Administrator/Python1/output/CommentText.txt", "w", encoding="utf-8") as f:
    for commentText in comment_list:
        f.write(commentText + "\n")
# Release resources
pres.Dispose()
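If you want the slide text, speaker notes, and comments in one file instead of three, the lists can be merged under simple headings. The following is a minimal sketch of that idea using plain Python. The function name `build_report`, the heading style, and the sample data are all hypothetical; in practice the three lists would come from the examples in this guide.

```python
def build_report(sections):
    """Join named groups of text into one string with a heading per group."""
    parts = []
    for title, lines in sections.items():
        parts.append(f"== {title} ==")
        parts.extend(lines if lines else ["(none)"])
        parts.append("")  # blank line between sections
    return "\n".join(parts).rstrip("\n") + "\n"


# Sample lists standing in for the three extraction results
sections = {
    "Slides": ["Quarterly Review", "Revenue grew 12%"],
    "Speaker Notes": ["Welcome the audience."],
    "Comments": [],
}
report = build_report(sections)
print(report, end="")
```

The resulting string can be written to disk with a single `write` call, and an empty list is marked as "(none)" so a missing section is visible rather than silently dropped.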
Conclusion
This guide has walked you through how to extract text from PowerPoint presentations, whether it comes from slides, speaker notes, or comments. With the step-by-step instructions and Python code examples above, you are now equipped to handle these tasks on your own. We hope this article helps simplify your workflow and saves you valuable time!
Written by Casie Liu