A PDF object.
enum | PDInstanceType {
PDInstanceTypeUnset = -2,
PDInstanceTypeUnknown = -1,
PDInstanceTypeNull = 0,
PDInstanceTypeNumber = 1,
PDInstanceTypeString = 2,
PDInstanceTypeArray = 3,
PDInstanceTypeDict = 4,
PDInstanceTypeRef = 5,
PDInstanceTypeObj = 6,
PDInstanceTypeParser = 7,
PDInstanceTypePipe = 8,
PDInstanceTypeScanner = 9,
PDInstanceTypeCStream = 10,
PDInstanceTypeOStream = 11,
PDInstanceTypeOperator = 12,
PDInstanceTypePage = 13,
PDInstanceTypeParserAtt = 14,
PDInstanceTypeTree = 15,
PDInstanceTypeState = 16,
PDInstanceTypeSFilter = 17,
PDInstanceTypeTask = 18,
PDInstanceType2Stream = 19,
PDInstanceTypeXTable = 20,
PDInstanceTypeCSOper = 21,
PDInstanceTypeFontDict = 22,
PDInstanceTypeFont = 23,
PDInstanceTypeCMap = 24,
PDInstanceTypePSExec = 25,
PDInstanceTypeDictStack = 26,
PDInstanceType__SIZE = 27
} |
|
enum | PDObjectType {
PDObjectTypeNull = 0,
PDObjectTypeUnknown = 1,
PDObjectTypeBoolean,
PDObjectTypeInteger,
PDObjectTypeReal,
PDObjectTypeName,
PDObjectTypeString,
PDObjectTypeArray,
PDObjectTypeDictionary,
PDObjectTypeStream,
PDObjectTypeReference,
PDObjectTypeSize
} |
|
enum | PDObjectClass {
PDObjectClassRegular = 1,
PDObjectClassCompressed = 2,
PDObjectClassTrailer = 3
} |
|
|
void | PDObjectSkipStream (PDObjectRef object) |
|
void | PDObjectSetStream (PDObjectRef object, char *str, PDInteger len, PDBool includeLength, PDBool allocated, PDBool encrypted) |
|
PDBool | PDObjectSetStreamFiltered (PDObjectRef object, char *str, PDInteger len, PDBool encrypted) |
|
void | PDObjectSetFlateDecodedFlag (PDObjectRef object, PDBool state) |
|
void | PDObjectSetPredictionStrategy (PDObjectRef object, PDPredictorType strategy, PDInteger columns) |
|
void | PDObjectSetStreamEncrypted (PDObjectRef object, PDBool encrypted) |
|
Objects in PDFs range from simple numbers indicating the length of some stream somewhere, to streams, images, and so on. In fact, the only things other than objects in a PDF, at the root level, are XREF (cross reference) tables, trailers, and the "startxref" marker.
Pajdeg objects are momentarily mutable.
- Warning
- Pajdeg object mutability expires. If you are attempting to modify a PDObjectRef instance and it's not reflected in the resulting PDF, you may be updating the object too late.
What this means is that objects can, with a few exceptions (listed below), be modified at the moment of creation, and the modifications will be reflected in the resulting PDF document. An object can also be kept around indefinitely (by retaining it), but it will at a certain point silently become immutable: changes made to the object instance will still update the instance itself, but the resulting PDF will not contain them.
Objects that are always immutable are:
- The PDParserRef's root object. To modify the root object, check its object ID and add a filter task.
- Any object fetched via PDParserLocateAndCreateDefinitionForObject() or PDParserLocateAndCreateObject(). Same deal; add a filter and mutator.
Pinpointing mutability expiration
A mutable object is mutable for as long as it has not been written to the output file. It is written as soon as the parser iterates to the next object via PDParserIterate(), which in turn happens as soon as all the tasks triggered for the object have finished executing.
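In other words, an object can be kept mutable forever by simply having a task that never returns, or, as an arguably more useful example, an asynchronous operation can be triggered by keeping the task waiting for some flag, e.g.
/* task body: start the work, then block until it signals completion */
{
    do_asynchronous_thing(object, &asyncDone);
    while (! asyncDone) sleep(1);
}
/* do_asynchronous_thing: kick off the asynchronous work */
{
}
/* completion handler of the asynchronous work: release the waiting task */
{
    *asyncDone = 1;
}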
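The expiration rule described above can be illustrated with a small stand-alone model. This is only a sketch: toy_object, toy_init, toy_set and toy_write are hypothetical names invented for the illustration and are not part of Pajdeg. The model assumes that "writing" an object simply copies its current content to the output once.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an object whose content is copied to the output
 * exactly once; none of these names are Pajdeg API. */
typedef struct {
    char value[32];   /* current content of the in-memory instance */
    char output[32];  /* what ended up in the "output file" */
    int  written;     /* nonzero once the object has been serialized */
} toy_object;

void toy_init(toy_object *ob, const char *value)
{
    snprintf(ob->value, sizeof ob->value, "%s", value);
    ob->output[0] = '\0';
    ob->written = 0;
}

/* Always updates the instance; the output only sees changes made
 * before toy_write() has run. */
void toy_set(toy_object *ob, const char *value)
{
    snprintf(ob->value, sizeof ob->value, "%s", value);
}

/* Models the parser moving on to the next object: the content is frozen. */
void toy_write(toy_object *ob)
{
    if (ob->written) return;
    memcpy(ob->output, ob->value, sizeof ob->output);
    ob->written = 1;
}
```

In this model a toy_set() call made after toy_write() still changes value but leaves output untouched, which mirrors the "silently immutable" behavior the warning describes.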
Object internal structure
Data Fields
Type | Field | Description |
---|---|---|
PDInteger | obid | object id |
PDInteger | genid | generation id |
PDObjectClass | obclass | object class (regular, compressed, or trailer) |
PDObjectType | type | data structure of def below |
pd_stack | def | the object content |
void * | inst | instance of def, or NULL if not yet instantiated |
PDBool | hasStream | if set, object has a stream |
PDInteger | streamLen | length of stream (if one exists) |
PDInteger | extractedLen | length of extracted stream; -1 until stream has been fetched via the parser |
char * | streamBuf | the stream, if fetched via parser, otherwise an undefined value |
PDBool | skipStream | if set, even if an object has a stream, the stream (including keywords) is skipped when written to output |
PDBool | skipObject | if set, entire object is discarded |
PDBool | deleteObject | if set, the object's XREF table slot is marked as free |
char * | ovrStream | stream override |
PDInteger | ovrStreamLen | length of ovrStream |
PDBool | ovrStreamAlloc | if set, ovrStream will be free()d by the object after use |
char * | ovrDef | definition override |
PDInteger | ovrDefLen | length of ovrDef |
PDBool | encryptedDoc | if set, the object is contained in an encrypted PDF; if false, PDObjectSetStreamEncrypted is NOP |
char * | refString | reference string, cached from calls to |
PDSynchronizer | synchronizer | synchronizer callback, called right before the object is serialized and written to the output stream |
const void * | syncInfo | user info object for synchronizer callback (usually a class instance, for wrappers) |
pd_crypto | crypto | crypto object, if available |
PDCryptoInstanceRef | cryptoInstance | crypto instance, if set up |
The PD instance type of a value.
The class of an object.
Enumerator |
---|
PDObjectClassRegular |
A regular object in a PDF.
|
PDObjectClassCompressed |
An object inside of an object stream.
|
PDObjectClassTrailer |
A trailer.
|
The type of object.
- Note
- This enum is matched with CGPDFObject's type enum (Core Graphics), with extensions
- Warning
- Not all types are currently used.
Enumerator |
---|
PDObjectTypeNull |
Null object, often used to indicate the type of a dictionary entry for a key that it doesn't contain.
|
PDObjectTypeUnknown |
The type of the object has not (yet) been determined.
|
PDObjectTypeBoolean |
A boolean.
|
PDObjectTypeInteger |
An integer.
|
PDObjectTypeReal |
A real (internally represented as a float).
|
PDObjectTypeName |
A name. Names in PDFs are things that begin with a slash, e.g. /Info.
|
PDObjectTypeString |
A string.
|
PDObjectTypeArray |
An array.
|
PDObjectTypeDictionary |
A dictionary. Most objects are considered dictionaries.
|
PDObjectTypeStream |
A stream.
|
PDObjectTypeReference |
A reference to another object.
|
PDObjectTypeSize |
A size (not in CGPDFObject type)
|
Create an object from a definitions stack (e.g. fetched via PDParserLocateAndCreateDefinitionForObject()).
- Warning
- The object is always immutable.
- Parameters
-
defs | The definitions for the object. |
- Returns
- An immutable object instance based on the defs.
Delete this object, thus excluding it from the output PDF file, and marking it as freed in the XREF table.
- Parameters
-
object | The object to remove. |
Attempt to determine the type of the object based on its definitions stack.
Generates an object definition up to and excluding the stream definition, from "<obid> <genid> obj" to right before "endobj" or "stream" depending on whether a stream exists or not.
The results are written into dstBuf, reallocating it if necessary (i.e. it must be a valid heap allocation in its own right, not a pointer into the middle of another allocation).
- Note
- This method ignores definition replacements via PDObjectReplaceWithString().
- Parameters
-
object | The object. |
dstBuf | Pointer to buffer into which definition should be written. Must be a proper allocation. |
capacity | The number of bytes allocated into *dstBuf already. |
- Returns
- Bytes written.
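The "proper allocation" requirement follows from how a grow-on-demand output buffer typically works. The sketch below is not Pajdeg's implementation: write_definition is a hypothetical helper, and its extra parameters (obid, genid, body) are assumptions made so the example is self-contained. It only shows the general pattern of measuring the text, calling realloc() when the capacity is too small, and returning the bytes written.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not a Pajdeg function: writes
 * "<obid> <genid> obj\n<body>\n" into *dstBuf, growing the buffer with
 * realloc() when capacity is too small. Returns the number of bytes
 * written (excluding the terminating NUL), or 0 on failure. */
size_t write_definition(char **dstBuf, size_t capacity,
                        long obid, long genid, const char *body)
{
    /* First pass: ask snprintf how many bytes the text needs. */
    int needed = snprintf(NULL, 0, "%ld %ld obj\n%s\n", obid, genid, body);
    if (needed < 0) return 0;

    if ((size_t)needed + 1 > capacity) {
        /* realloc() is only valid on a pointer obtained from
         * malloc/calloc/realloc, which is why a pointer into the middle
         * of another buffer cannot be used as dstBuf. */
        char *grown = realloc(*dstBuf, (size_t)needed + 1);
        if (grown == NULL) return 0;
        *dstBuf = grown;
        capacity = (size_t)needed + 1;
    }

    /* Second pass: actually write the text. */
    snprintf(*dstBuf, capacity, "%ld %ld obj\n%s\n", obid, genid, body);
    return (size_t)needed;
}
```

Because the buffer pointer may change, the caller passes the address of its pointer and must keep using the updated value afterwards.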
Get the array of the object, or NULL if the object does not have an array.
- Parameters
-
- Returns
- PDArray instance for the object
Get the instance type of the object.
- Parameters
-
- Returns
- Instance type value. PDInstanceTypeUnknown is returned if the instance type could not be determined.
Get the instance for the object's definition. The instance is a PDDictionary, PDArray, PDString, etc. depending on what the object's definition looks like.
- Parameters
-
- Returns
- Appropriate object type. Use PDResolve() to determine its type if unsure.
Get the dictionary of the object, or NULL if the object does not have a dictionary.
- Parameters
-
- Returns
- PDDictionary instance for the object
Determine the extracted length of the previously fetched object stream.
This can be compared to the size of a file.txt after decompressing a file.txt.gz.
- Warning
- Assertion thrown if the object stream has not been fetched before this call.
- Parameters
-
Get generation ID for an object.
- Parameters
-
Get object ID for object.
- Parameters
-
Get the obstream-flag for this object.
The obstream-flag is true for objects which are embedded inside of other objects, as a part of an object stream.
- Parameters
-
- Returns
- true if the object is in an object stream.
Determine the raw (unextracted) length of the object stream.
This can be compared to the size of a file.txt.gz.
- Parameters
-
const char* PDObjectGetReferenceString (PDObjectRef object)
Get reference string for this object.
- Parameters
-
Get type of an object.
- Parameters
-
- Returns
- The PDObjectType of the object.
- Note
- Types are restricted to PDObjectTypeUnknown, PDObjectTypeDictionary, and PDObjectTypeString in the current implementation.
Fetch the value of the given object, as an instantiation of the appropriate type, or as a char* if the object is represented as a pure string (this is the case for some constants, such as null).
- Parameters
-
- Returns
- The appropriate instance type. Use PDResolve() to determine its type.
Determine if the object has a stream or not.
- Parameters
-
Determine if the object's stream is text or binary data.
This is determined by looking at the first 10 bytes (or all of them, if length <= 10) and checking whether 80% or more of them are defined text characters. If so, true is returned. The very last byte must also be 0 (the string must be NUL-terminated).
- Warning
- Assertion thrown if the object stream has not been fetched before this call.
- Parameters
-
- Returns
- true if the object's stream is text-based, false otherwise
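The text/binary heuristic described above could look roughly like the following. This is a hypothetical re-creation, not Pajdeg's code: looks_like_text is an invented name, "defined text characters" is assumed to mean printable or whitespace characters, and size is assumed to count every byte in the buffer including the terminating 0.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical re-creation of the heuristic; not Pajdeg's actual code.
 * `size` is the total number of bytes in buf, including the final 0. */
int looks_like_text(const char *buf, size_t size)
{
    /* The very last byte must be 0. */
    if (size == 0 || buf[size - 1] != 0) return 0;

    size_t content = size - 1;                    /* bytes before the 0 */
    size_t sample = content < 10 ? content : 10;  /* first 10, or all   */
    size_t textual = 0;

    for (size_t i = 0; i < sample; i++) {
        unsigned char c = (unsigned char)buf[i];
        /* Assumption: "text character" = printable or whitespace. */
        if (isprint(c) || isspace(c)) textual++;
    }

    /* 80% threshold in integer arithmetic: textual/sample >= 8/10.
     * An empty stream passes trivially in this sketch. */
    return textual * 10 >= sample * 8;
}
```

Only a short prefix is sampled, so a stream that starts with readable text but continues with binary data would still be classified as text by this kind of check.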
Replaces the entire object's definition with the given string of the given length. This does not replace the stream, and the caller is responsible for ensuring that the /Length key is preserved. If the stream was turned off, the replacement may include a stream element, provided it abides by the PDF specification, which requires that
- the object is a dictionary, and has a /Length key with the exact length of the stream (excluding the keywords and newlines wrapping it),
- the keyword 'stream' is directly below the object dictionary on its own line, followed by the stream content, and
- the keyword 'endstream' follows right after the stream length (extraneous whitespace is allowed between the content's last byte and the 'endstream' keyword's beginning).
Also note that filters and encodings are often used, but not required.
- Parameters
-
object | The object. |
str | The replacement string. |
len | The length of the replacement string. |
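As an illustration of the layout rules above, the following sketch assembles a dictionary plus stream element in memory. build_stream_definition is a hypothetical helper, not a Pajdeg function, and whether the replacement string should also carry the "<obid> <genid> obj" wrapper is not stated here, so the sketch leaves it out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Pajdeg: returns a malloc()ed buffer with
 * "<< /Length N >>\nstream\n<data>\nendstream\n" and stores its length
 * (excluding the terminating NUL) in *outLen. Returns NULL on failure. */
char *build_stream_definition(const char *data, size_t dataLen, size_t *outLen)
{
    char head[64];
    /* /Length counts only the stream content, not keywords or newlines. */
    int headLen = snprintf(head, sizeof head,
                           "<< /Length %zu >>\nstream\n", dataLen);
    if (headLen < 0 || (size_t)headLen >= sizeof head) return NULL;

    static const char tail[] = "\nendstream\n";
    size_t tailLen = sizeof tail - 1;

    char *out = malloc((size_t)headLen + dataLen + tailLen + 1);
    if (out == NULL) return NULL;

    memcpy(out, head, (size_t)headLen);
    memcpy(out + headLen, data, dataLen);                /* raw bytes */
    memcpy(out + headLen + dataLen, tail, tailLen + 1);  /* copies the NUL */

    *outLen = (size_t)headLen + dataLen + tailLen;
    return out;
}
```

The explicit length is returned because stream content may contain 0 bytes, in which case strlen() on the result would be too short.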
Enable or disable compression (FlateDecode) filter flag for the object stream.
- Note
- Passing false to the state will remove the Filter and DecodeParms dictionary entries from the object.
- Parameters
-
object | The object. |
state | Boolean value of whether the stream is compressed or not. |
Define prediction strategy for the stream.
- Warning
- Pajdeg currently only supports PDPredictorNone and PDPredictorPNG_UP. Updating an existing stream (e.g. fixing its predictor values) is possible, but replacing the stream or requiring Pajdeg to predict the content in some other way will cause an assertion.
- Parameters
-
object | The object. |
strategy | The PDPredictorType value. |
columns | Columns value. |
- See also
- PDPredictorType
-
PDStreamFilterPrediction.h
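For readers unfamiliar with the PNG "Up" predictor, here is a minimal sketch of the encoding step as defined by the PNG filter algorithm (filter type 2). It is not taken from Pajdeg: png_up_encode is a hypothetical name, and the sketch assumes that columns equals the number of bytes per row, which holds for one 8-bit component per sample.

```c
#include <stddef.h>

/* Sketch of PNG "Up" row filtering as used by PDF predictors: every output
 * row is a filter-type byte (2) followed by each byte minus the byte
 * directly above it (modulo 256). The row above the first row is all zeros.
 * Assumption: `columns` is the number of bytes per row.
 * dst must have room for rows * (columns + 1) bytes. Returns bytes written. */
size_t png_up_encode(const unsigned char *src, size_t rows, size_t columns,
                     unsigned char *dst)
{
    size_t out = 0;
    for (size_t r = 0; r < rows; r++) {
        dst[out++] = 2;  /* PNG filter type 2 = Up */
        for (size_t c = 0; c < columns; c++) {
            unsigned char cur = src[r * columns + c];
            unsigned char up  = r > 0 ? src[(r - 1) * columns + c] : 0;
            dst[out++] = (unsigned char)(cur - up);  /* wraps modulo 256 */
        }
    }
    return out;
}
```

Rows that differ little from the row above them turn into runs of small values, which is what makes the subsequent Flate compression more effective.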
Replaces the stream with given data.
- Note
- The stream is inserted as is, with no filtering applied to it whatsoever. To insert a filtered stream, e.g. FlateDecoded, use PDObjectSetStreamFiltered() instead.
- Parameters
-
object | The object. |
str | The stream data. |
len | The length of the stream data. |
includeLength | Whether the object's /Length entry should be updated to reflect the new stream content length. |
allocated | Whether str should be free()d after the object is done using it. |
encrypted | If true, str is presumed to be already encrypted (e.g. copied from original PDF or pre-encrypted); if false, Pajdeg will encrypt the string before inserting it into the pipe. If the PDF is not encrypted, this argument has no effect. |
Sets the encrypted flag for the object's stream.
- Warning
- If the document is encrypted, and the stream is not, this must be set to false or the stream will not function properly.
- Parameters
-
object | The object. |
encrypted | Whether or not the stream is encrypted. |
Replaces the stream with the given data, filtered according to the object's /Filter and /DecodeParms settings.
- Note
- Pajdeg only supports a limited number of filters. If the object's filter settings are not supported, the operation is aborted.
-
If no filter is defined, PDObjectSetStream is called and true is returned.
- See also
- PDObjectSetStream
-
PDObjectSetFlateDecodedFlag
-
PDObjectSetPredictionStrategy
- Warning
- str is not freed.
- Parameters
-
object | The object. |
str | The stream data. |
len | The length of the stream data. |
encrypted | If true, str is presumed to be already encrypted (e.g. copied from original PDF or pre-encrypted); if false, Pajdeg will encrypt the string before inserting it into the pipe. If the PDF is not encrypted, this argument has no effect. |
- Returns
- Success value. If false is returned, the stream remains unset.
Set the synchronization callback, called right before the parser serializes the object and writes it to the output stream.
- Note
- Only one callback is supported.
- Parameters
-
object | The object. |
callback | The synchronization callback. |
syncInfo | The user info object to be passed as the final parameter to the callback. |
Explicitly set the type of an object.
- Parameters
-
object | The object whose type should be (re)defined. |
type | The (new) object type. |
void PDObjectSetValue (PDObjectRef object, void *value)
Set the value of the given object.
- Note
- If object is non-primitive (e.g. dictionary), this operation will at best leak memory and at worst crash.
- Parameters
-
object | The object. |
value | The new value of the primitive (string, integer, real, ...) object. |
Removes the stream from the object.
The stream will be skipped when written. This has no effect if the object had no stream to begin with.
- Parameters
-
Undelete this object.
If PDObjectDelete() was called previously, calling this method will cancel the deletion.
The object may end up deleted anyway, if the pipe has moved past the object definition since the delete call.