1. Introduction
1.1 Summary
How about exploring PDF files with PyPDF2 in Python? In this example, we have a script that performs several tasks: reading PDF files, extracting their text and metadata, extracting images, generating new standalone files, and merging files. I will explain each line of code as objectively as I can; I hope I succeed.
Note that the code was built up while this article was being written. For this reason, the same code snippet (the imports) appears in more than one image, each time with new lines added.
1.2 What is PyPDF2
PyPDF2 is a Python library for manipulating PDF files. It allows you to perform various operations with PDF files, such as merging them, extracting text and data, among other features. This library is quite popular among Python developers who need to work with PDF documents in their projects.
1.3 Which auxiliary document we will use
For this example, we will use the PDF file called “Relatório Focus” (Focus Report). This document is a weekly bulletin published by the Central Bank of Brazil that contains economic projections of various macroeconomic indicators. These projections are made by economists from financial institutions, consultancies and companies, and include variables such as inflation, the exchange rate, GDP (PIB, Produto Interno Bruto) and interest rates, among others. If you are interested in reading it, access this link.
Below are images from the file:
1.4 Documentation and repository
Documentation, script1, script2, script3 and my repository.
2. Analyzing code snippet focusing on PdfReader
2.1 Installation and import
#1: Installs the PyPDF2 library with pip. Activate your virtual environment first (source bin/activate).
#2: Path, from the standard pathlib module, is a class that represents file or directory paths independently of the operating system.
#3: PdfReader is the PyPDF2 class used to read PDF files.
2.2 Defining directories
#4: Here, the directories for the original files and the new files are defined.
#5: Sets the path for the input PDF file.
#6: Creates the output directory if it doesn't already exist.
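Steps #4 to #6 can be sketched with pathlib alone. NEW_FOLDER is the name used later in this article; ROOT_FOLDER, ORIGINAL_FOLDER, PDF_FILE, the folder names and the file name focus_report.pdf are assumptions made for illustration, not necessarily what the original script uses.

```python
from pathlib import Path

# NEW_FOLDER is the name used later in the article; every other name and
# path below is an assumption made for illustration.
ROOT_FOLDER = Path.cwd()
ORIGINAL_FOLDER = ROOT_FOLDER / "original_pdfs"  # where the source PDF lives
NEW_FOLDER = ROOT_FOLDER / "new_pdfs"            # where generated files go

# Path of the input PDF (hypothetical file name).
PDF_FILE = ORIGINAL_FOLDER / "focus_report.pdf"

# Create the output directory; exist_ok avoids an error if it already exists.
NEW_FOLDER.mkdir(parents=True, exist_ok=True)
```

Because the paths are built with the / operator on Path objects, the same code should work on Linux, macOS and Windows.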
2.3 Reading, extracting and printing
#7: Reads the PDF file and prints the number of pages.
#8: Output: 2.
#9: Extracts text from the first page and prints it.
#10: Output: see image:
#11: Installs the optional "image" extra (pip install PyPDF2[image]), which pulls in Pillow; PyPDF2 needs it to extract images.
#12: Displays information about the images present on the first page and prints details about the first image.
#13: Output: see image:
#14: Output: File(name=X5.png, data: 1.3 kB).
2.4 Saving the image to another folder
#15: Saves the first image in a new directory.
2.5 Complete code so far
3. Analyzing code snippet focusing on PdfWriter
3.1 Import
#16: Imports PdfWriter, the PyPDF2 class used to write PDF files.
3.2 Generating new single page PDF
#17: Adds the first page to the PdfWriter object.
#18: Opens a new file in the specified directory to write the first page extracted from the PDF.
#19: Writes the extracted page to the opened PDF file.
#20: See image:
3.3 Generating new PDF for each page
#21: Iterates over each page in the original PDF.
#22: Opens a new file in the specified directory to write the current page of the PDF.
#23: Adds the current page to the PdfWriter object.
#24: Writes the current page to the opened PDF file.
#25: See image:
3.4 Complete code so far
4. Analyzing code snippet focusing on PdfMerger
4.1 Import
#26: Imports PdfMerger, the PyPDF2 class used to merge PDF files.
4.2 Merging PDF files
#27: We create a list of paths to the individual PDF files we want to merge.
#28: We initialize a PdfMerger object, which will allow us to merge the PDF files.
#29: Starts a loop that iterates over each file in the list of files to be merged.
#30: Adds each PDF file to the PdfMerger object. By the end of this loop, the merger holds every PDF in the list, ready to be written as a single file.
#31: Writes the merged PDF file to the NEW_FOLDER directory with the name zmerged.pdf.
#32: Closes the PdfMerger object once the merge is complete.
4.3 Complete code
If you have any questions, contact me. ;-)