Pajdeg  0.2.2
Add metadata diff example

This demonstrates what the modifications Pajdeg makes to a PDF end up looking like, byte-wise. You can check this yourself by running diff -a on the original and new PDF files after using Pajdeg.

diff -a ../testpdf.pdf out.pdf
9a10,15
> 21 0 obj
> <</Length 12 >>
> stream
> Hello World!
> endstream
> endobj

First off, we see a number of lines added to the new PDF that weren't in the old one. It's an object (ID 21, generation number 0) with a stream whose length is 12 bytes, and the stream itself consists of the 12 characters "Hello World!".
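
To make the byte count concrete, here is a minimal C sketch that writes the same kind of object into a buffer and derives /Length from the payload. It is not part of the Pajdeg API: format_stream_object is a hypothetical helper, and it assumes single line feeds as line endings and a plain text payload without NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Pajdeg function): writes a PDF stream object
 * whose /Length is derived from the payload, using "\n" line endings.
 * Returns the number of bytes written, or -1 if buf is too small. */
int format_stream_object(char *buf, size_t cap, int obj_id, int gen,
                         const char *payload)
{
    assert(buf != NULL && payload != NULL);
    int n = snprintf(buf, cap,
                     "%d %d obj\n<</Length %zu >>\nstream\n%s\nendstream\nendobj\n",
                     obj_id, gen, strlen(payload), payload);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Called with object ID 21, generation 0 and the payload "Hello World!", this produces the six added lines from the diff, 62 bytes in total under the line-ending assumption.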

133c139
< <</Fini 20 0 R /Type /Catalog /Pages 10 0 R /OpenAction [ 1 0 R /XYZ
null null 0 ] /Lang (ja-JP) >>
---
> <</Metadata 21 0 R /Fini 20 0 R /Type /Catalog /Pages 10 0 R
/OpenAction [ 1 0 R /XYZ null null 0 ] /Lang (ja-JP) >>
209c215

Next up we see a replaced PDF dictionary, the document catalog. If you look closer, you'll notice that the only change is that /Metadata 21 0 R, a reference to the new object, was added in the new PDF.

< 0 21
---
> 0 22
211c217
< 0000009299 00000 n
---
> 0000009361 00000 n
212a219,236
> 0000000258 00000 n
> 0000009505 00000 n
> 0000000278 00000 n
> 0000000452 00000 n
> 0000009649 00000 n
> 0000000472 00000 n
> 0000000646 00000 n
> 0000009793 00000 n
> 0000000666 00000 n
> 0000008573 00000 n
> 0000008595 00000 n
> 0000008796 00000 n
> 0000009097 00000 n
> 0000009273 00000 n
> 0000009306 00000 n
> 0000009905 00000 n
> 0000010038 00000 n
> 0000010225 00000 n
214,230d237
< 0000009443 00000 n
< 0000000216 00000 n
< 0000000390 00000 n
< 0000009587 00000 n
< 0000000410 00000 n
< 0000000584 00000 n
< 0000009731 00000 n
< 0000000604 00000 n
< 0000008511 00000 n
< 0000008533 00000 n
< 0000008734 00000 n
< 0000009035 00000 n
< 0000009211 00000 n
< 0000009244 00000 n
< 0000009843 00000 n
< 0000009959 00000 n
< 0000010146 00000 n
232c239

At the very top, 0 21 is replaced with 0 22. This is the XREF (cross reference) header, which changed because the PDF has an extra object (our metadata object). It is followed by a chunk of replaced lines. These are XREF entries, and almost all of them have been updated, because the byte position of almost every object in the PDF has changed. That's unfortunate but normal. When making small modifications, it can be alleviated to some extent by opting for PDParserCreateAppendedObject() over PDParserCreateNewObject(), but it isn't worth worrying about too much.
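
The offset changes can be checked by hand. Assuming single line feeds, the six added lines of object 21 occupy 62 bytes, which matches 0000009299 becoming 0000009361. The 17 bytes of "/Metadata 21 0 R " appear to push the entries after the catalog a further 17 bytes (0000009959 plus 79 is 0000010038). The C sketch below expresses that arithmetic; the function names are made up for illustration and are not Pajdeg functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: new byte offset of an object after `inserted` bytes
 * were added at position `insert_at`. Objects located before the insertion
 * point keep their offset. */
unsigned long shifted_offset(unsigned long old_offset,
                             unsigned long insert_at,
                             unsigned long inserted)
{
    return old_offset >= insert_at ? old_offset + inserted : old_offset;
}

/* Formats one in-use XREF entry as shown in the diff: a 10-digit byte
 * offset, a 5-digit generation number, and the letter 'n'.
 * `out` must hold at least 19 bytes (18 characters plus the terminator). */
void format_xref_entry(char *out, size_t cap, unsigned long offset, unsigned gen)
{
    assert(out != NULL && cap >= 19);
    snprintf(out, cap, "%010lu %05u n", offset, gen);
}
```

For any insertion point at or before byte 9299, shifted_offset(9299, insert_at, 62) gives 9361, which format_xref_entry renders as 0000009361 00000 n.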

< <</Size 21 /Root 18 0 R /Info 19 0 R /ID [
<967EA1728C3CC524000105A6C8F88744> <967EA1728C3CC524000105A6C8F88744>
] /DocChecksum /253428E6A53FF28ECA00A27FE9073CE9 >>
---
> <</Size 22 /Root 18 0 R /Info 19 0 R /ID [
<967EA1728C3CC524000105A6C8F88744> <967EA1728C3CC524000105A6C8F88744>
] /DocChecksum /253428E6A53FF28ECA00A27FE9073CE9 >>
234c241

Next up is the trailer dictionary, which has been updated as well: /Size has been set to 22, because we now have 22 objects in the PDF, counting the reserved object 0.
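
As a rough illustration of how /Size and the XREF header relate, the C sketch below reads both numbers from their textual form so they can be compared. The helper names are hypothetical and not Pajdeg calls, and the parsing is deliberately naive: it assumes a simple trailer string like the one in the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the integer following "/Size" in a trailer
 * dictionary string, or -1 if the key is missing or malformed.
 * A naive substring search, adequate only for simple trailers. */
long trailer_size(const char *trailer)
{
    assert(trailer != NULL);
    const char *key = strstr(trailer, "/Size");
    long size;
    if (key == NULL || sscanf(key + 5, "%ld", &size) != 1)
        return -1;
    return size;
}

/* Hypothetical helper: parses an XREF subsection header such as "0 22"
 * (first object number, entry count) and returns the count, or -1. */
long xref_count(const char *header)
{
    long first, count;
    assert(header != NULL);
    return sscanf(header, "%ld %ld", &first, &count) == 2 ? count : -1;
}
```

For the new file, both the XREF header 0 22 and the trailer's /Size 22 yield 22, so the two values agree.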

That's the whole diff. Chances are your results won't be quite as clean, because the original PDF here was itself produced by Pajdeg, which is often not the case.

Back to Adding metadata to a PDF