Pajdeg  0.2.2
PDObject

A PDF object.

Files

file  PDObject.h
 

Data Structures

struct  PDObject
 

Typedefs

typedef struct PDObject * PDObjectRef
 

Enumerations

enum  PDInstanceType {
  PDInstanceTypeUnset =-2, PDInstanceTypeUnknown =-1, PDInstanceTypeNull = 0, PDInstanceTypeNumber = 1,
  PDInstanceTypeString = 2, PDInstanceTypeArray = 3, PDInstanceTypeDict = 4, PDInstanceTypeRef = 5,
  PDInstanceTypeObj = 6, PDInstanceTypeParser = 7, PDInstanceTypePipe = 8, PDInstanceTypeScanner = 9,
  PDInstanceTypeCStream = 10, PDInstanceTypeOStream = 11, PDInstanceTypeOperator = 12, PDInstanceTypePage = 13,
  PDInstanceTypeParserAtt = 14, PDInstanceTypeTree = 15, PDInstanceTypeState = 16, PDInstanceTypeSFilter = 17,
  PDInstanceTypeTask = 18, PDInstanceType2Stream = 19, PDInstanceTypeXTable = 20, PDInstanceTypeCSOper = 21,
  PDInstanceTypeFontDict = 22, PDInstanceTypeFont = 23, PDInstanceTypeCMap = 24, PDInstanceTypePSExec = 25,
  PDInstanceTypeDictStack = 26, PDInstanceType__SIZE = 27
}
 
enum  PDObjectType {
  PDObjectTypeNull = 0, PDObjectTypeUnknown = 1, PDObjectTypeBoolean, PDObjectTypeInteger,
  PDObjectTypeReal, PDObjectTypeName, PDObjectTypeString, PDObjectTypeArray,
  PDObjectTypeDictionary, PDObjectTypeStream, PDObjectTypeReference, PDObjectTypeSize
}
 
enum  PDObjectClass { PDObjectClassRegular = 1, PDObjectClassCompressed = 2, PDObjectClassTrailer = 3 }
 

Creating and deleting

PDObjectRef PDObjectCreateFromDefinitionsStack (PDInteger obid, pd_stack defs)
 
void PDObjectSetSynchronizationCallback (PDObjectRef object, PDSynchronizer callback, const void *syncInfo)
 
void PDObjectDelete (PDObjectRef object)
 
void PDObjectUndelete (PDObjectRef object)
 

Examining

PDInteger PDObjectGetObID (PDObjectRef object)
 
PDInteger PDObjectGetGenID (PDObjectRef object)
 
PDBool PDObjectGetObStreamFlag (PDObjectRef object)
 
const char * PDObjectGetReferenceString (PDObjectRef object)
 
PDObjectType PDObjectGetType (PDObjectRef object)
 
PDObjectType PDObjectDetermineType (PDObjectRef object)
 
void PDObjectSetType (PDObjectRef object, PDObjectType type)
 
PDBool PDObjectHasStream (PDObjectRef object)
 
PDInteger PDObjectGetRawStreamLength (PDObjectRef object)
 
PDInteger PDObjectGetExtractedStreamLength (PDObjectRef object)
 
PDBool PDObjectHasTextStream (PDObjectRef object)
 
char * PDObjectGetStream (PDObjectRef object)
 
void * PDObjectGetValue (PDObjectRef object)
 
void PDObjectSetValue (PDObjectRef object, void *value)
 
PDDictionaryRef PDObjectGetDictionary (PDObjectRef object)
 
PDArrayRef PDObjectGetArray (PDObjectRef object)
 

Miscellaneous

void PDObjectReplaceWithString (PDObjectRef object, char *str, PDInteger len)
 

PDF stream support

void PDObjectSkipStream (PDObjectRef object)
 
void PDObjectSetStream (PDObjectRef object, char *str, PDInteger len, PDBool includeLength, PDBool allocated, PDBool encrypted)
 
PDBool PDObjectSetStreamFiltered (PDObjectRef object, char *str, PDInteger len, PDBool encrypted)
 
void PDObjectSetFlateDecodedFlag (PDObjectRef object, PDBool state)
 
void PDObjectSetPredictionStrategy (PDObjectRef object, PDPredictorType strategy, PDInteger columns)
 
void PDObjectSetStreamEncrypted (PDObjectRef object, PDBool encrypted)
 

Conversion

PDInteger PDObjectGenerateDefinition (PDObjectRef object, char **dstBuf, PDInteger capacity)
 
PDInteger PDObjectPrinter (void *inst, char **buf, PDInteger offs, PDInteger *cap)
 

Detailed Description

A PDF object.

Objects in PDFs range from simple numbers indicating the length of some stream somewhere, to streams, images, and so on. In fact, the only things other than objects in a PDF, at the root level, are XREF (cross reference) tables, trailers, and the "startxref" marker.

Pajdeg objects are momentarily mutable.

Warning
Pajdeg object mutability expires. If you are attempting to modify a PDObjectRef instance and it's not reflected in the resulting PDF, you may be updating the object too late.

What this means is that objects can, with a few exceptions (listed below), be modified at the moment of creation, and the modifications will be reflected in the resulting PDF document. An object can also be kept around indefinitely (by retaining it), but at a certain point it silently becomes immutable: changes made to the object instance still update the instance itself, but the resulting PDF will not contain them.

Objects that are always immutable are:

  1. The PDParserRef's root object. To modify the root object, check its object ID and add a filter task.
  2. Any object fetched via PDParserLocateAndCreateDefinitionForObject() or PDParserLocateAndCreateObject(). Same deal; add a filter and mutator.

Pinpointing mutability expiration

A mutable object stays mutable for as long as it has not been written to the output file. It is written as soon as the parser iterates to the next object via PDParserIterate(), which in turn happens as soon as all the tasks triggered for the object have finished executing.

In other words, an object can be kept mutable forever by simply having a task do

while (1) sleep(1);

or, an arguably more useful example, an asynchronous operation can be triggered by simply keeping the task waiting for some flag, e.g.

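#include <unistd.h>  /* sleep() */

/* Declared up front because asyncWait() calls it before its definition. */
void do_asynchronous_thing(PDObjectRef object, PDInteger *asyncDone);

PDTaskResult asyncWait(PDPipeRef pipe, PDTaskRef task, PDObjectRef object)
{
    PDInteger asyncDone = 0;
    do_asynchronous_thing(object, &asyncDone);
    /* Block the task, and with it the parser, until the flag is raised. */
    while (! asyncDone) sleep(1);
    return PDTaskDone;
}

void do_asynchronous_thing(PDObjectRef object, PDInteger *asyncDone)
{
    // start whatever asynchronous thing needs doing
}

/* Called when the asynchronous work completes; the object is still mutable here. */
void finish_asynchronous_thing(PDObjectRef object, PDInteger *asyncDone)
{
    PDDictionarySet(PDObjectGetDictionary(object), "Foo", "bar");
    *asyncDone = 1;
}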

Data Structure Documentation

struct PDObject

Object internal structure

Data Fields

PDInteger obid
 object id
 
PDInteger genid
 generation id
 
PDObjectClass obclass
 object class (regular, compressed, or trailer)
 
PDObjectType type
 data structure of def below
 
pd_stack def
 the object content
 
void * inst
 instance of def, or NULL if not yet instantiated
 
PDBool hasStream
 if set, object has a stream
 
PDInteger streamLen
 length of stream (if one exists)
 
PDInteger extractedLen
 length of extracted stream; -1 until stream has been fetched via the parser
 
char * streamBuf
 the stream, if fetched via parser, otherwise an undefined value
 
PDBool skipStream
 if set, even if an object has a stream, the stream (including keywords) is skipped when written to output
 
PDBool skipObject
 if set, entire object is discarded
 
PDBool deleteObject
 if set, the object's XREF table slot is marked as free
 
char * ovrStream
 stream override
 
PDInteger ovrStreamLen
 length of ovrStream
 
PDBool ovrStreamAlloc
 if set, ovrStream will be free()d by the object after use
 
char * ovrDef
 definition override
 
PDInteger ovrDefLen
 length of ovrDef
 
PDBool encryptedDoc
 if set, the object is contained in an encrypted PDF; if false, PDObjectSetStreamEncrypted is NOP
 
char * refString
 reference string, cached from calls to PDObjectGetReferenceString()
 
PDSynchronizer synchronizer
 synchronizer callback, called right before the object is serialized and written to the output stream
 
const void * syncInfo
 user info object for synchronizer callback (usually a class instance, for wrappers)
 
pd_crypto crypto
 crypto object, if available
 
PDCryptoInstanceRef cryptoInstance
 crypto instance, if set up
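
The interplay of the hasStream, skipStream, skipObject and deleteObject fields can be summarised in a small decision function. This is only a sketch based on the field descriptions above: `ObjectFlags`, `EmitPlan` and `plan_output` are hypothetical stand-ins, not Pajdeg types or functions, and the real writer may consult more state than this.

```c
#include <assert.h>

/* Stand-in for the four PDBool fields of struct PDObject (assumption:
   only these four fields influence what gets written). */
typedef struct {
    int hasStream;
    int skipStream;
    int skipObject;
    int deleteObject;
} ObjectFlags;

typedef enum {
    EmitNothing,              /* object is left out of the output */
    EmitDefinitionOnly,       /* definition without stream */
    EmitDefinitionAndStream   /* definition followed by its stream */
} EmitPlan;

/* Hypothetical helper: decide what would be written for an object. */
EmitPlan plan_output(const ObjectFlags *flags)
{
    assert(flags != 0);
    /* A skipped or deleted object is excluded from the output entirely. */
    if (flags->skipObject || flags->deleteObject) return EmitNothing;
    /* A stream is only written if one exists and it has not been skipped. */
    if (flags->hasStream && !flags->skipStream) return EmitDefinitionAndStream;
    return EmitDefinitionOnly;
}
```

Under this reading, skipStream only matters for objects that have a stream, which matches the note that PDObjectSkipStream() has no effect otherwise.
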
 

Typedef Documentation

typedef struct PDObject* PDObjectRef

A PDF object.

Enumeration Type Documentation

enum PDInstanceType

The PD instance type of a value.

Enumerator
PDInstanceTypeUnset 

The associated instance value has not been set yet.

PDInstanceTypeUnknown 

Undefined / non-allocated instance.

PDInstanceTypeNull 

NULL.

PDInstanceTypeNumber 

PDNumber.

PDInstanceTypeString 

PDString.

PDInstanceTypeArray 

PDArray.

PDInstanceTypeDict 

PDDictionary.

PDInstanceTypeRef 

PDReference.

PDInstanceTypeObj 

PDObject.

PDInstanceTypeCStream 

PDContentStream.

PDInstanceTypeOStream 

PDObjectStream.

PDInstanceTypeTree 

PDSplayTree.

PDInstanceTypeState 

PDState.

PDInstanceTypeSFilter 

PDStreamFilter.

PDInstanceTypeTask 

PDTask.

PDInstanceTypeXTable 

PDXTable.

PDInstanceTypeCSOper 

Content stream operator.

PDInstanceTypeFontDict 

PDFontDictionary.

PDInstanceTypeFont 

PDFont.

PDInstanceTypeCMap 

PDCMap.

PDInstanceTypePSExec 

PostScript executable code.

PDInstanceTypeDictStack 

PDDictionaryStack.

enum PDObjectClass

The class of an object.

Enumerator
PDObjectClassRegular 

A regular object in a PDF.

PDObjectClassCompressed 

An object inside of an object stream.

PDObjectClassTrailer 

A trailer.

enum PDObjectType

The type of object.

Note
This enum is matched with CGPDFObject's type enum (Core Graphics), with extensions.
Warning
Not all types are currently used.
Enumerator
PDObjectTypeNull 

Null object, often used to indicate the type of a dictionary entry for a key that it doesn't contain.

PDObjectTypeUnknown 

The type of the object has not (yet) been determined.

PDObjectTypeBoolean 

A boolean.

PDObjectTypeInteger 

An integer.

PDObjectTypeReal 

A real (internally represented as a float).

PDObjectTypeName 

A name. Names in PDFs are things that begin with a slash, e.g. /Info.

PDObjectTypeString 

A string.

PDObjectTypeArray 

An array.

PDObjectTypeDictionary 

A dictionary. Most objects are considered dictionaries.

PDObjectTypeStream 

A stream.

PDObjectTypeReference 

A reference to another object.

PDObjectTypeSize 

A size (not in CGPDFObject's type enum).
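
Only the first two PDObjectType enumerators carry explicit values, so the rest follow C's implicit numbering. The sketch below copies the enum from the listing in this document and maps each value to a printable name; `object_type_name` is a hypothetical helper for illustration, not a Pajdeg function.

```c
#include <stddef.h>
#include <string.h>

/* Copied from the PDObjectType listing in this document. */
typedef enum {
    PDObjectTypeNull = 0, PDObjectTypeUnknown = 1, PDObjectTypeBoolean, PDObjectTypeInteger,
    PDObjectTypeReal, PDObjectTypeName, PDObjectTypeString, PDObjectTypeArray,
    PDObjectTypeDictionary, PDObjectTypeStream, PDObjectTypeReference, PDObjectTypeSize
} PDObjectType;

/* Hypothetical helper: printable name for a type value, or "invalid"
   if the value is outside the enum's range. */
const char *object_type_name(PDObjectType type)
{
    static const char *const names[] = {
        "Null", "Unknown", "Boolean", "Integer", "Real", "Name",
        "String", "Array", "Dictionary", "Stream", "Reference", "Size"
    };
    if ((int)type < 0 || (size_t)type >= sizeof names / sizeof names[0])
        return "invalid";
    return names[type];
}
```

With this numbering PDObjectTypeBoolean is 2 and PDObjectTypeSize is 11.
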

Function Documentation

PDObjectRef PDObjectCreateFromDefinitionsStack ( PDInteger  obid,
pd_stack  defs 
)

Create an object from a definitions stack (e.g. fetched via PDParserLocateAndCreateDefinitionForObject()).

Warning
The object is always immutable.
Parameters
defs  The definitions for the object.
Returns
An immutable object instance based on the defs.
void PDObjectDelete ( PDObjectRef  object)

Delete this object, thus excluding it from the output PDF file, and marking it as freed in the XREF table.

Parameters
object  The object to remove.
PDObjectType PDObjectDetermineType ( PDObjectRef  object)

Attempt to determine the type of the object based on its definitions stack.

PDInteger PDObjectGenerateDefinition ( PDObjectRef  object,
char **  dstBuf,
PDInteger  capacity 
)

Generates an object definition up to and excluding the stream definition, from "<obid> <genid> obj" to right before "endobj" or "stream" depending on whether a stream exists or not.

The results are written into dstBuf, reallocating it if necessary (i.e. it must be a valid allocation and not a point inside a heap).

Note
This method ignores definition replacements via PDObjectReplaceWithString().
Parameters
object  The object.
dstBuf  Pointer to buffer into which definition should be written. Must be a proper allocation.
capacity  The number of bytes allocated into *dstBuf already.
Returns
Bytes written.
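
The requirement that dstBuf be a proper allocation follows from the reallocation behaviour. The sketch below shows a function with the same buffer contract; `write_growing` is a hypothetical stand-in, not Pajdeg's implementation, and it assumes growth happens through realloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same buffer contract: copies text into
   *dstBuf, growing the buffer with realloc() when capacity is too small.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if the reallocation fails. */
long write_growing(char **dstBuf, long capacity, const char *text)
{
    assert(dstBuf != NULL && *dstBuf != NULL);
    long needed = (long)strlen(text) + 1;   /* content plus NUL */
    if (needed > capacity) {
        char *grown = realloc(*dstBuf, (size_t)needed);
        if (grown == NULL) return -1;
        *dstBuf = grown;                    /* the caller's pointer may change */
    }
    memcpy(*dstBuf, text, (size_t)needed);
    return needed - 1;
}
```

Because the buffer may be handed to realloc(), a stack array or a pointer into the middle of another allocation is not safe to pass, and the caller must read the result through *dstBuf after the call rather than through a saved copy of the old pointer.
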
PDArrayRef PDObjectGetArray ( PDObjectRef  object)

Get the array of the object, or NULL if the object does not have an array.

Parameters
object  The object.
Returns
PDArray instance for the object
PDDictionaryRef PDObjectGetDictionary ( PDObjectRef  object)

Get the instance type of the object.

Parameters
object  The object.
Returns
Instance type value. PDInstanceTypeUnknown is returned if the instance type could not be determined.

Get the instance for the object's definition. The instance is a PDDictionary, PDArray, PDString, etc. depending on what the object's definition looks like.

Parameters
object  The object.
Returns
Appropriate object type. Use PDResolve() to determine its type if unsure.

Get the dictionary of the object, or NULL if the object does not have a dictionary.

Parameters
object  The object.
Returns
PDDictionary instance for the object.
PDInteger PDObjectGetExtractedStreamLength ( PDObjectRef  object)

Determine the extracted length of the previously fetched object stream.

This can be compared to the size of a file.txt after decompressing a file.txt.gz.

Warning
Assertion thrown if the object stream has not been fetched before this call.
Parameters
object  The object.
PDInteger PDObjectGetGenID ( PDObjectRef  object)

Get generation ID for an object.

Parameters
object  The object.
PDInteger PDObjectGetObID ( PDObjectRef  object)

Get object ID for object.

Parameters
object  The object.
PDBool PDObjectGetObStreamFlag ( PDObjectRef  object)

Get the obstream-flag for this object.

The obstream-flag is true for objects which are embedded inside of other objects, as a part of an object stream.

Parameters
object  The object.
Returns
true if the object is in an object stream.
PDInteger PDObjectGetRawStreamLength ( PDObjectRef  object)

Determine the raw (unextracted) length of the object stream.

This can be compared to the size of a file.txt.gz.

Parameters
object  The object.
const char* PDObjectGetReferenceString ( PDObjectRef  object)

Get reference string for this object.

Parameters
object  The object.
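
The format of the reference string is not spelled out here. Assuming it follows the PDF indirect-reference syntax ("<obid> <genid> R"), it could be produced roughly as follows; `format_reference` is a hypothetical helper, not a Pajdeg function.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: the reference string uses the PDF indirect-reference form
   "<obid> <genid> R". Writes it into dst and returns the length that was
   (or would have been) written, as snprintf() does. */
int format_reference(char *dst, size_t cap, long obid, long genid)
{
    return snprintf(dst, cap, "%ld %ld R", obid, genid);
}
```
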
char* PDObjectGetStream ( PDObjectRef  object)

Get the object's stream. Assertion thrown if the stream has not been fetched via PDParserFetchCurrentObjectStream() first.

Parameters
object  The object.
PDObjectType PDObjectGetType ( PDObjectRef  object)

Get type of an object.

Parameters
object  The object.
Returns
The PDObjectType of the object.
Note
Types are restricted to PDObjectTypeUnknown, PDObjectTypeDictionary, and PDObjectTypeString in the current implementation.
void* PDObjectGetValue ( PDObjectRef  object)

Fetch the value of the given object, as an instantiation of the appropriate type, or as a char* if the object is represented as a pure string (this is the case for some constants, such as null).

Parameters
object  The object.
Returns
The appropriate instance type. Use PDResolve() to determine its type.
PDBool PDObjectHasStream ( PDObjectRef  object)

Determine if the object has a stream or not.

Parameters
object  The object.
PDBool PDObjectHasTextStream ( PDObjectRef  object)

Determine if the object's stream is text or binary data.

This is determined by looking at the first 10 bytes (or all of them, if length <= 10) and checking whether 80% or more of them are defined text characters. True is returned only if this is the case and the very last byte is also 0 (the string must be NUL-terminated).

Warning
Assertion thrown if the object stream has not been fetched before this call.
Parameters
object  The object.
Returns
true if the object's stream is text-based, false otherwise
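
The text-detection heuristic described for PDObjectHasTextStream() can be sketched in plain C. This is not Pajdeg's code: `looks_like_text` is a hypothetical name, a "defined text character" is assumed to be anything isprint() or isspace() accepts, the terminator is assumed to sit at buf[len], and an empty stream is treated as non-text.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch of the heuristic. buf must hold len content bytes followed by
   one terminator byte (len + 1 bytes in total). Returns 1 for text,
   0 for binary. */
int looks_like_text(const char *buf, size_t len)
{
    if (len == 0) return 0;            /* assumption: empty is not text */
    if (buf[len] != '\0') return 0;    /* must be NUL-terminated */

    size_t sample = len < 10 ? len : 10;
    size_t text = 0;
    for (size_t i = 0; i < sample; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (isprint(c) || isspace(c)) text++;
    }
    /* 80% or more of the sampled bytes must be text characters. */
    return text * 10 >= sample * 8;
}
```

Sampling only the first 10 bytes keeps the check cheap, at the cost of misclassifying binary streams that happen to start with a printable header.
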
void PDObjectReplaceWithString ( PDObjectRef  object,
char *  str,
PDInteger  len 
)

Replaces the entire object's definition with the given string of the given length. The stream is not replaced, and the caller is responsible for ensuring that the /Length key is preserved. If the stream was turned off, the replacement may include a stream element, provided it abides by the PDF specification, which requires that

  1. the object is a dictionary, and has a /Length key with the exact length of the stream (excluding the keywords and newlines wrapping it),
  2. the keyword
    stream
    is directly below the object dictionary on its own line, followed by the stream content, and
  3. the content is followed by the keyword
    endstream
    right after the declared stream length (extraneous whitespace is allowed between the content's last byte and the 'endstream' keyword's beginning).

Also note that filters and encodings are often used, but not required.

Parameters
object  The object.
str  The replacement string.
len  The length of the replacement string.
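
The three stream requirements listed for PDObjectReplaceWithString() can be illustrated by assembling such a definition by hand. The sketch below is an assumption-laden example: `build_stream_definition` is a hypothetical helper, the dictionary holds nothing but /Length, and whether the replacement string should also carry the "obj"/"endobj" wrapper is not stated here, so it is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "<< /Length N >>", the stream keyword on
   its own line, the content, and the endstream keyword. The newline
   before endstream is not counted in /Length. Returns a malloc()ed,
   NUL-terminated buffer (caller frees) and stores its length in *outLen;
   returns NULL on allocation failure. */
char *build_stream_definition(const char *content, long contentLen, long *outLen)
{
    static const char tail[] = "\nendstream";
    char head[64];
    int headLen = snprintf(head, sizeof head,
                           "<< /Length %ld >>\nstream\n", contentLen);
    long tailLen = (long)(sizeof tail - 1);
    long total = headLen + contentLen + tailLen;

    char *out = malloc((size_t)total + 1);
    if (out == NULL) return NULL;

    memcpy(out, head, (size_t)headLen);
    /* memcpy rather than strcpy: stream content may contain zero bytes. */
    memcpy(out + headLen, content, (size_t)contentLen);
    memcpy(out + headLen + contentLen, tail, (size_t)tailLen + 1);
    *outLen = total;
    return out;
}
```

The resulting buffer and length are the kind of str/len pair the function takes; ownership of the buffer after the call is not documented here, so the sketch leaves freeing to the caller.
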
void PDObjectSetFlateDecodedFlag ( PDObjectRef  object,
PDBool  state 
)

Enable or disable compression (FlateDecode) filter flag for the object stream.

Note
Passing false to the state will remove the Filter and DecodeParms dictionary entries from the object.
Parameters
object  The object.
state  Boolean value of whether the stream is compressed or not.
void PDObjectSetPredictionStrategy ( PDObjectRef  object,
PDPredictorType  strategy,
PDInteger  columns 
)

Define prediction strategy for the stream.

Warning
Pajdeg currently only supports PDPredictorNone and PDPredictorPNG_UP. Updating an existing stream (e.g. fixing its predictor values) is possible, but replacing the stream or requiring Pajdeg to predict the content in some other way will cause an assertion.
Parameters
object  The object.
strategy  The PDPredictorType value.
columns  Columns value.
See also
PDPredictorType
PDStreamFilterPrediction.h
void PDObjectSetStream ( PDObjectRef  object,
char *  str,
PDInteger  len,
PDBool  includeLength,
PDBool  allocated,
PDBool  encrypted 
)

Replaces the stream with given data.

Note
The stream is inserted as is, with no filtering applied to it whatsoever. To insert a filtered stream, e.g. FlateDecoded, use PDObjectSetStreamFiltered() instead.
Parameters
object  The object.
str  The stream data.
len  The length of the stream data.
includeLength  Whether the object's /Length entry should be updated to reflect the new stream content length.
allocated  Whether str should be free()d after the object is done using it.
encrypted  If true, str is presumed to be already encrypted (e.g. copied from original PDF or pre-encrypted); if false, Pajdeg will encrypt the string before inserting it into the pipe. If the PDF is not encrypted, this argument has no effect.
void PDObjectSetStreamEncrypted ( PDObjectRef  object,
PDBool  encrypted 
)

Sets the encrypted flag for the object's stream.

Warning
If the document is encrypted, and the stream is not, this must be set to false or the stream will not function properly.
Parameters
object  The object.
encrypted  Whether or not the stream is encrypted.
PDBool PDObjectSetStreamFiltered ( PDObjectRef  object,
char *  str,
PDInteger  len,
PDBool  encrypted 
)

Replaces the stream with given data, filtered according to the object's /Filter and /DecodeParms settings.

Note
Pajdeg only supports a limited number of filters. If the object's filter settings are not supported, the operation is aborted.
If no filter is defined, PDObjectSetStream is called and true is returned.
See also
PDObjectSetStream
PDObjectSetFlateDecodedFlag
PDObjectSetPredictionStrategy
Warning
str is not freed.
Parameters
object  The object.
str  The stream data.
len  The length of the stream data.
encrypted  If true, str is presumed to be already encrypted (e.g. copied from original PDF or pre-encrypted); if false, Pajdeg will encrypt the string before inserting it into the pipe. If the PDF is not encrypted, this argument has no effect.
Returns
Success value. If false is returned, the stream remains unset.
void PDObjectSetSynchronizationCallback ( PDObjectRef  object,
PDSynchronizer  callback,
const void *  syncInfo 
)

Set the synchronization callback, called right before the parser serializes the object and writes it to the output stream.

Note
Only one callback is supported.
Parameters
object  The object.
callback  The synchronization callback.
syncInfo  The user info object to be passed as the final parameter to the callback.
void PDObjectSetType ( PDObjectRef  object,
PDObjectType  type 
)

Explicitly set the type of an object.

Parameters
object  The object whose type should be (re)defined.
type  The (new) object type.
void PDObjectSetValue ( PDObjectRef  object,
void *  value 
)

Set the value of the given object.

Note
If object is non-primitive (e.g. dictionary), this operation will at best leak memory and at worst crash.
Parameters
object  The object.
value  The new value of the primitive (string, integer, real, ...) object.
void PDObjectSkipStream ( PDObjectRef  object)

Removes the stream from the object.

The stream will be skipped when written. This has no effect if the object had no stream to begin with.

Parameters
object  The object.
void PDObjectUndelete ( PDObjectRef  object)

Undelete this object.

If PDObjectDelete() was called previously, calling this method will cancel the deletion.

The object may end up deleted anyway, if the pipe has moved past the object definition since the delete call.