The first article of our PDF Optimization In-depth Series is available here.

The PDF format is interactive.

During the release cycle of a PDF document, different people will use tools such as forms, annotations, attachments, and more.
Archived PDF documents are generally not used in the same way as documents circulating in a collaborative context. The data generated by these interactive tools is no longer useful once the document is archived.

Even if the document is not archived yet, it can be worth optimizing it before distribution, for example to stay under the attachment size limit of some platforms or to speed up opening on phones and tablets.

Deleting content deemed unnecessary

The most obvious approach here is to remove the interactive content that is not required by the audience of the document.
At the same time, once the document is archived, some of the data in the file may no longer be useful at all. The best candidates for such a lossless optimization are:

  • File attachments – attached files definitely increase the file size. Removing those that are not relevant for viewing does not affect the document’s content.
  • Bookmarks, hyperlinks – these elements are convenient and allow easy navigation, but they are not essential to view the PDF file correctly.
  • Annotations, form fields, JavaScript actions – such elements can be deleted from the document once they are no longer in use.
  • Page thumbnails – thumbnail images stored inside the document enable faster navigation, but a viewer can still render thumbnails on the fly after they are removed.
  • Metadata – it can sometimes be very bulky, as it may include any type of data, such as photos, files, and more. Be more careful here: metadata may contain information needed to index the document.
  • Color profiles – this content is intended for the printing chain and is used by printers.

You can select which content you will remove depending on your needs.
These options are independent of each other.
With the GdPicturePDFReducer class, it is easy and straightforward.
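
To illustrate why these options can be applied independently, here is a minimal Python sketch. It is not the GdPicturePDFReducer API: the document is modeled as plain dictionaries, and the option names and the strip_content function are hypothetical. Only the dictionary keys (Outlines, Names, EmbeddedFiles, JavaScript, AcroForm, Metadata, OutputIntents, Annots, Thumb) follow the entry names used by the PDF specification.

```python
from copy import deepcopy

# Hypothetical option names mapped to the catalog entries they remove.
# Each tuple is a path of keys inside the (simplified) document catalog.
CATALOG_KEYS = {
    "attachments": [("Names", "EmbeddedFiles")],
    "bookmarks": [("Outlines",)],
    "form_fields": [("AcroForm",)],
    "javascript": [("Names", "JavaScript")],
    "metadata": [("Metadata",)],
    "color_profiles": [("OutputIntents",)],
}

# Hypothetical option names mapped to the per-page entries they remove.
# Hyperlinks are link annotations, so they live in Annots as well.
PAGE_KEYS = {
    "annotations": ["Annots"],
    "thumbnails": ["Thumb"],
}


def strip_content(doc, options):
    """Return a copy of doc without the content selected in options."""
    result = deepcopy(doc)
    catalog = result["catalog"]
    for option in options:
        for path in CATALOG_KEYS.get(option, []):
            parent = catalog
            for key in path[:-1]:
                parent = parent.get(key, {})
            parent.pop(path[-1], None)
        for key in PAGE_KEYS.get(option, []):
            for page in result["pages"]:
                page.pop(key, None)
    return result


# A toy document, not a real PDF file.
document = {
    "catalog": {
        "Outlines": "bookmark tree",
        "Metadata": "XMP stream",
        "Names": {"EmbeddedFiles": ["report.xlsx"], "JavaScript": ["validate()"]},
        "AcroForm": {"Fields": ["name", "date"]},
        "OutputIntents": ["ICC profile"],
    },
    "pages": [
        {"Contents": "page 1 drawing operators", "Annots": ["comment"], "Thumb": "image"},
    ],
}

# Remove only attachments and thumbnails; everything else is kept.
archived = strip_content(document, {"attachments", "thumbnails"})
```

Because each option touches its own entries, removing attachments and thumbnails leaves bookmarks, form fields, annotations, and the page content itself untouched.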

Deleting unused objects and unused content

The PDF format supports incremental updates.
It is a method for saving changes to a PDF document without completely rewriting it. The content of the document is updated gradually, without the need to regenerate existing data. The changes are appended to the end of the file, leaving the original content unaltered.
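
The effect of appending changes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the byte strings below are toy data, not valid PDF files (they have no cross-reference table), and the helper names are hypothetical. The sketch relies on the fact that each saved revision ends with an %%EOF marker, so counting markers approximates the number of revisions. Linearized files also contain more than one %%EOF, so this is a heuristic only.

```python
EOF_MARKER = b"%%EOF"


def count_revisions(pdf_bytes: bytes) -> int:
    """Approximate the number of saved revisions by counting %%EOF markers."""
    return pdf_bytes.count(EOF_MARKER)


def original_revision(pdf_bytes: bytes) -> bytes:
    """Return the bytes up to and including the first %%EOF marker."""
    end = pdf_bytes.find(EOF_MARKER)
    if end == -1:
        raise ValueError("no %%EOF marker found")
    return pdf_bytes[: end + len(EOF_MARKER)]


# Toy data: the first save, then an incremental save that redefines object 1.
original = b"%PDF-1.4\n1 0 obj\n(first version)\nendobj\ntrailer\n%%EOF\n"
update = b"1 0 obj\n(second version)\nendobj\ntrailer\n%%EOF\n"
saved_incrementally = original + update

revisions = count_revisions(saved_incrementally)
first = original_revision(saved_incrementally)

# The superseded object is still stored in the file after the update.
still_contains_old_object = b"(first version)" in saved_incrementally
```

In this toy file the old version of object 1 is still present after the update, which is why incrementally saved documents tend to grow and accumulate unused objects.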