How to program PDF in Python?

Program PDF in Python: Python is one of the most widely used programming languages, in large part because it is easy to read and write. Its use has grown steadily since its first release in 1991. Python 1.0 had a module system modeled on Modula-3 and tools for interacting with the Amoeba operating system. Python 2.0, introduced in 2000, added a cycle-detecting garbage collector and Unicode support. Python 3.0, introduced in 2008, was redesigned to remove duplicate constructs and modules. Today, Python 3 is the version in general use.

PDFs are very effective for day-to-day use. PDF stands for Portable Document Format. The format was first designed by Adobe and is now an open standard maintained by the International Organization for Standardization (ISO). The main advantage of PDFs is that they can contain links and buttons, form fields, audio, and video. Python can extract content from PDFs, but it needs an external tool to do so: a third-party module built as a PDF toolkit.

Program a PDF in Python

In order to program a PDF in Python, we first need to install a third-party module, PyPDF2. PyPDF2 is a pure-Python library, but it is not part of the standard library, so it has to be installed separately. PyPDF2 can be used for:

  • Extracting document information (title, author, …)
  • Splitting documents page by page
  • Merging documents page by page
  • Cropping pages
  • Merging multiple pages into a single page
  • Encrypting and Decrypting PDF files, and much more.

Installing PyPDF2:
To install PyPDF2, run the following command from the command line.

pip install PyPDF2

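The module name is case sensitive when imported, so type PyPDF2 exactly as shown above. The examples in this article use the PdfFileReader and PdfFileWriter classes found in PyPDF2 releases before 3.0; version 3.0 removed these names in favor of PdfReader and PdfWriter, so install an earlier release (for example with pip install "PyPDF2<3.0") to run them unchanged. Now we will go through the different operations that PyPDF2 can perform. Before that, you can learn how to read a file in Python by following the link below.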

How to read a file in Python

Also, if you are not familiar with the different ways to print output in Python, follow the link below to learn more.

How to Print a program in Python

1) Extracting Text from a PDF

The PyPDF2 module helps you extract text from a PDF file with a short program. Check out the example below and its output.

Extracting PDF file program
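
As a rough idea of what such a program involves, here is a minimal sketch. It is not the exact listing pictured above: the function name extract_page_text is hypothetical, and a PyPDF2 release older than 3.0 is assumed, matching the PdfFileReader interface used in the rest of this article.

```python
def extract_page_text(pdf_path, page_index=0):
    """Return (number of pages, text of the page at page_index)."""
    # PyPDF2 is third-party; it is imported here so that it is only
    # required when the function is actually called.
    # Assumption: a PyPDF2 release older than 3.0.
    import PyPDF2

    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        page = reader.getPage(page_index)
        return reader.numPages, page.extractText()
```

Calling extract_page_text('example.pdf'), with example.pdf standing in for any PDF on disk, would return the page count together with the text of the first page, which can then be printed.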

The output of the program looks like this:

20 
PythonBasics
S.R.Doty
August27,2008
Contents
1 Preliminaries
4
1.1 WhatisPython?...................................
..4
1.2 Installationanddocumentation....................
.........4

2) Rotating PDF pages

The PyPDF2 module can rotate the pages of a PDF file with a few lines of code. Here is a small example that rotates every page of a PDF using Python.

# importing the required modules
import PyPDF2


def PDFrotate(origFileName, newFileName, rotation):
    # creating a pdf File object of original pdf
    pdfFileObj = open(origFileName, 'rb')

    # creating a pdf Reader object
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # creating a pdf writer object for new pdf
    pdfWriter = PyPDF2.PdfFileWriter()

    # rotating each page
    for page in range(pdfReader.numPages):

        # creating rotated page object
        pageObj = pdfReader.getPage(page)
        pageObj.rotateClockwise(rotation)

        # adding rotated page object to pdf writer
        pdfWriter.addPage(pageObj)

    # new pdf file object
    newFile = open(newFileName, 'wb')

    # writing rotated pages to new file
    pdfWriter.write(newFile)

    # closing the original pdf file object
    pdfFileObj.close()

    # closing the new pdf file object
    newFile.close()


def main():
    # original pdf file name
    origFileName = 'example.pdf'

    # new pdf file name
    newFileName = 'rotated_example.pdf'

    # rotation angle in degrees, clockwise
    rotation = 270

    # calling the PDFrotate function
    PDFrotate(origFileName, newFileName, rotation)


if __name__ == "__main__":
    # calling the main function
    main()

The output of the above program looks like this:

PDF Rotation using Python
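
PyPDF2's rotateClockwise expects the angle to be a multiple of 90 degrees. A small helper, sketched below under that assumption, can validate the value before any file is opened. The name normalize_rotation is hypothetical and not part of PyPDF2.

```python
def normalize_rotation(angle):
    """Map an angle in degrees to 0, 90, 180 or 270 (clockwise)."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Python's % always returns a non-negative result for a positive
    # modulus, so counter-clockwise (negative) angles are handled too.
    return angle % 360
```

With this helper, PDFrotate could be called with normalize_rotation(-90), which gives 270, the same clockwise rotation used in the example above.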

3) Merging PDF Files

Using the PyPDF2 module you can merge several PDF files into one. Here is a simple program that illustrates how to merge PDF files with Python's PyPDF2 module.

PDF Merge
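
As a rough idea of what a merge program involves, here is a minimal sketch built on PyPDF2's PdfFileMerger class. It is not the exact listing pictured above: the function name merge_pdfs is hypothetical, and a PyPDF2 release older than 3.0 is assumed, as in the rest of this article.

```python
from contextlib import ExitStack


def merge_pdfs(input_paths, output_path):
    """Concatenate the PDFs in input_paths, in order, into output_path."""
    # PyPDF2 is third-party; only needed when the function is called.
    # Assumption: a PyPDF2 release older than 3.0.
    import PyPDF2

    merger = PyPDF2.PdfFileMerger()
    # Keep every input file open until the merged PDF has been written,
    # since the merger may still need to read page data at write time.
    with ExitStack() as stack:
        for path in input_paths:
            pdf_file = stack.enter_context(open(path, 'rb'))
            merger.append(pdf_file)
        with open(output_path, 'wb') as out_file:
            merger.write(out_file)
    merger.close()
```

Calling merge_pdfs(['example.pdf', 'rotated_example.pdf'], 'combined_example.pdf') would combine the two files from the earlier examples into a single document.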

The output of the above program is a combined PDF, combined_example.pdf, obtained by merging example.pdf and rotated_example.pdf.

4) Splitting PDF File

To split a PDF file using Python you can again use the third-party PyPDF2 module. Here is a small example of how to split a PDF file at given page positions.

# importing the required modules
import PyPDF2


def PDFsplit(pdf, splits):
    # creating input pdf file object
    pdfFileObj = open(pdf, 'rb')

    # creating pdf reader object
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # starting index of first slice
    start = 0

    # ending index (exclusive) of first slice
    end = splits[0]

    for i in range(len(splits) + 1):
        # creating pdf writer object for (i+1)th split
        pdfWriter = PyPDF2.PdfFileWriter()

        # output pdf file name
        outputpdf = pdf.split('.pdf')[0] + str(i) + '.pdf'

        # adding pages to pdf writer object
        for page in range(start, end):
            pdfWriter.addPage(pdfReader.getPage(page))

        # writing split pdf pages to pdf file
        with open(outputpdf, "wb") as f:
            pdfWriter.write(f)

        # moving the start position to the beginning of the next split
        start = end
        try:
            # setting split end position for next split
            end = splits[i + 1]
        except IndexError:
            # setting split end position for last split
            end = pdfReader.numPages

    # closing the input pdf file object
    pdfFileObj.close()


def main():
    # pdf file to split
    pdf = 'example.pdf'

    # split page positions
    splits = [2, 4]

    # calling PDFsplit function to split pdf
    PDFsplit(pdf, splits)


if __name__ == "__main__":
    # calling the main function
    main()

The output of the above program is three new PDF files, example0.pdf, example1.pdf and example2.pdf: split 1 (pages 0 and 1), split 2 (pages 2 and 3), and split 3 (page 4 to the end).
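
The page ranges produced by the split positions can be worked out without touching a PDF. The sketch below mirrors the start/end bookkeeping in PDFsplit. The name split_ranges is a hypothetical helper, and a 6-page document is assumed for the example.

```python
def split_ranges(num_pages, splits):
    """Return (start, end) page ranges, end exclusive, for the split positions."""
    # Every range starts where the previous one ended; the first starts
    # at page 0 and the last ends at the final page of the document.
    boundaries = [0] + list(splits) + [num_pages]
    return list(zip(boundaries[:-1], boundaries[1:]))


# For an assumed 6-page document split at positions 2 and 4:
print(split_ranges(6, [2, 4]))  # [(0, 2), (2, 4), (4, 6)]
```

Each tuple corresponds to one output file, and range(start, end) over a tuple gives exactly the page indices that PDFsplit adds to that file.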

5) Adding Watermarks to PDF

Using the PyPDF2 module you can even add watermarks to your PDF files. Here is how you can add a watermark to every page of a PDF with a simple program. The watermark itself is the first page of a separate PDF file.

# importing the required modules
import PyPDF2


def add_watermark(wmFile, pageObj):
    # opening watermark pdf file
    wmFileObj = open(wmFile, 'rb')

    # creating pdf reader object of watermark pdf file
    pdfReader = PyPDF2.PdfFileReader(wmFileObj)

    # merging watermark pdf's first page with passed page object.
    pageObj.mergePage(pdfReader.getPage(0))

    # closing the watermark pdf file object
    wmFileObj.close()

    # returning watermarked page object
    return pageObj


def main():
    # watermark pdf file name
    mywatermark = 'watermark.pdf'

    # original pdf file name
    origFileName = 'example.pdf'

    # new pdf file name
    newFileName = 'watermarked_example.pdf'

    # creating pdf File object of original pdf
    pdfFileObj = open(origFileName, 'rb')

    # creating a pdf Reader object
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # creating a pdf writer object for new pdf
    pdfWriter = PyPDF2.PdfFileWriter()

    # adding watermark to each page
    for page in range(pdfReader.numPages):
        # creating watermarked page object
        wmpageObj = add_watermark(mywatermark, pdfReader.getPage(page))

        # adding watermarked page object to pdf writer
        pdfWriter.addPage(wmpageObj)

    # new pdf file object
    newFile = open(newFileName, 'wb')

    # writing watermarked pages to new file
    pdfWriter.write(newFile)

    # closing the original pdf file object
    pdfFileObj.close()

    # closing the new pdf file object
    newFile.close()


if __name__ == "__main__":
    # calling the main function
    main()

The output of the above program looks like this:

Watermark PDF

The picture on the left is the original page before the watermark was added. The picture on the right is the same page after the watermark was added with the PyPDF2 module.

Note: While PDF files are great for laying out text in a way that’s easy for people to print and read, they’re not straightforward for software to parse into plaintext. As such, PyPDF2 might make mistakes when extracting text from a PDF and may even be unable to open some PDFs at all. There isn’t much you can do about this, unfortunately. PyPDF2 may simply be unable to work with some of your particular PDF files.

FAQs

Many questions about PDFs and the Python language are frequently asked on the web. We have handpicked the top three FAQs to clear up common doubts.

How do I clean up a PDF File?

a) Open the document to be ‘cleaned’ in Adobe Acrobat.
b) Select Advanced > PDF Optimizer.
c) In the PDF Optimizer dialog, select Clean Up.
d) Ensure that the appropriate boxes for your Clean Up are checked.
e) Check that the selected Acrobat version in the ‘Make compatible with’ list is correctly set.
f) Click ‘OK’.

What is readline in Python?

In addition to iterating over a file with a for loop, Python provides three methods to read data from an input file: read, readline, and readlines. The readline method reads one line from the file and returns it as a string, including the newline character at the end. To learn how to read a file in Python, you can visit
How to read a file in python
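
A short illustration of readline, using an in-memory text stream in place of a real file. Here io.StringIO behaves like a file opened in text mode, and the sample text is made up for the example.

```python
import io

# Stand-in for an open text file with two lines
sample = io.StringIO("first line\nsecond line\n")

first = sample.readline()   # reads up to and including the newline
second = sample.readline()
at_end = sample.readline()  # empty string once the end is reached

print(repr(first))   # 'first line\n'
print(repr(second))  # 'second line\n'
print(repr(at_end))  # ''
```

The empty string returned at the end is how a loop built on readline can tell that there is nothing left to read.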

What is pickling and unpickling in Python?

Pickling in python refers to the process of serializing objects into binary streams, while unpickling is the inverse of that. It’s called that because of the pickle module in Python which implements the methods to do this.
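
A minimal round trip with the standard pickle module shows both directions. The dictionary below is a made-up sample object.

```python
import pickle

# Made-up sample object to serialize
settings = {"title": "Python Basics", "pages": 20}

data = pickle.dumps(settings)   # pickling: object -> bytes
restored = pickle.loads(data)   # unpickling: bytes -> object

print(type(data).__name__)   # bytes
print(restored == settings)  # True
```

The functions pickle.dump and pickle.load do the same thing with a file opened in binary mode instead of an in-memory bytes object.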

To conclude: In this article, we have learned how to program a PDF in Python using the PyPDF2 module. Share your views in the comment section below and do share the article. For more interesting and exciting content, visit Morphigo.com and explore its tech articles.
