Question:
Fix error while retrieving pages from a pdf using convert_from_path (pdf2image)

Problem:

I am retrieving pages from a PDF using convert_from_path (pdf2image). This is the error I am facing:


<ipython-input-45-4ebf020b9136> in <cell line: 1>()
      1 for pdf in list_of_pdfs:
----> 2   images = convert_from_path(pdf,first_page= 1,last_page=2)

2 frames
/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt, jpegopt, thread_count, userpw, ownerpw, use_cropbox, strict, transparent, single_file, output_file, poppler_path, grayscale, size, paths_only, use_pdftocairo, timeout, hide_annotations)
    266                 )
    267             else:
--> 268                 images += parse_buffer_func(data)
    269     finally:
    270         if auto_temp_dir:

/usr/local/lib/python3.10/dist-packages/pdf2image/parsers.py in parse_buffer_to_ppm(data)
     26         size_x, size_y = tuple(size.split(b" "))
     27         file_size = len(code) + len(size) + len(rgb) + 3 + int(size_x) * int(size_y) * 3
---> 28         images.append(Image.open(BytesIO(data[index : index + file_size])))
     29         index += file_size
     30 

/usr/local/lib/python3.10/dist-packages/PIL/Image.py in open(fp, mode, formats)
   3281                 raise
   3282         return None
-> 3283 
   3284     im = _open_core(fp, filename, prefix, formats)
   3285 

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7820221cd290>


Here is the code I am using:


import io
from io import BytesIO
from PIL import Image
from pdf2image import convert_from_path

pdf_list = ['path_to_pdf.pdf', 'path_to_pdf2.pdf']

for pdf in pdf_list:
    images = convert_from_path(pdf, first_page=1, last_page=2)

This code was working perfectly a few days ago, and I can't figure out what broke or why it now fails for me.


Solution: 

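You can narrow down the cause through exception handling, catching pdf2image's specific exceptions like this. Note that the UnidentifiedImageError in the traceback is raised by PIL rather than pdf2image, so it will be reported by the final generic handler.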


from pdf2image import convert_from_path
from pdf2image.exceptions import PDFInfoNotInstalledError
from pdf2image.exceptions import PDFPageCountError
from pdf2image.exceptions import PDFSyntaxError

pdf_list = ['path_to_pdf.pdf', 'path_to_pdf2.pdf']

for pdf in pdf_list:
    try:
        # Convert only the first two pages of each PDF
        images = convert_from_path(pdf, first_page=1, last_page=2)
    except PDFInfoNotInstalledError as e:
        # poppler's pdfinfo binary could not be found
        print(f"PDFInfoNotInstalledError: {e}")
    except PDFPageCountError as e:
        # pdfinfo could not determine the page count
        print(f"PDFPageCountError: {e}")
    except PDFSyntaxError as e:
        # the PDF itself is malformed
        print(f"PDFSyntaxError: {e}")
    except Exception as e:
        # anything else, including PIL's UnidentifiedImageError
        print(f"An error occurred: {e}")
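
If you want to see which files fail without stopping the loop, the same idea can be wrapped in a small helper. This is only a sketch: convert_pdfs and its convert parameter are hypothetical names made up for illustration, not part of pdf2image. The helper records the exception type and message per path so the failing PDF is easy to spot.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def convert_pdfs(
    pdf_paths: Iterable[str],
    convert: Optional[Callable[..., list]] = None,
    **kwargs,
) -> Tuple[Dict[str, list], Dict[str, str]]:
    """Convert each PDF and return (images_by_path, errors_by_path)."""
    if convert is None:
        # Imported lazily so the helper can also be run with a stand-in converter.
        from pdf2image import convert_from_path
        convert = convert_from_path

    images_by_path: Dict[str, list] = {}
    errors_by_path: Dict[str, str] = {}
    for path in pdf_paths:
        try:
            images_by_path[path] = convert(path, **kwargs)
        except Exception as exc:
            # Record the failure (type and message) and continue with the next file.
            errors_by_path[path] = f"{type(exc).__name__}: {exc}"
    return images_by_path, errors_by_path
```

With pdf2image installed, images, errors = convert_pdfs(pdf_list, first_page=1, last_page=2) would then give you one entry in errors for every PDF that could not be converted. Judging from the else branch shown in the traceback, the failure happens while parsing the in-memory buffer, so passing output_folder to convert_from_path may be worth trying as well; I have not verified that this resolves the error.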




Ritu Singh
