Pajdeg  0.2.2
PDObjectStream

A PDF object stream, i.e. a stream of PDF objects inside a stream.

Modules

 PDContentStream
 A content stream, i.e. a stream containing Adobe commands, such as for writing text and similar.
 
 PDPage
 A PDF page.
 

Files

file  PDObjectStream.h
 

Data Structures

struct  PDObjectStreamElement
 
struct  PDObjectStream
 
struct  PDEnv
 
struct  pd_btree
 
struct  PDOperator
 
struct  PDParser
 
struct  PDPageReference
 
struct  PDCatalog
 
struct  PDScannerSymbol
 
struct  PDScanner
 
struct  pd_stack
 
struct  PDArray
 
struct  PDDictionaryNode
 
struct  PDDictionary
 
struct  PDDictionaryStack
 
struct  PDFontDictionary
 
struct  PDFont
 
struct  PDCMap
 
struct  PDCryptoInstance
 
struct  pd_crypto
 
struct  PDState
 
struct  PDStaticHash
 
struct  PDTask
 
struct  PDTwinStream
 
struct  PDPipe
 
struct  PDReference
 
struct  PDNumber
 
struct  PDString
 
struct  PDStringConv
 

Typedefs

typedef struct PDObjectStream * PDObjectStreamRef
 

Functions

PDObjectRef PDObjectStreamGetObjectByID (PDObjectStreamRef obstm, PDInteger obid)
 
PDObjectRef PDObjectStreamGetObjectAtIndex (PDObjectStreamRef obstm, PDInteger index)
 

Variables

PDInteger PDObject::obid
 object id
 
PDInteger PDObject::genid
 generation id
 
PDObjectClass PDObject::obclass
 object class (regular, compressed, or trailer)
 
PDObjectType PDObject::type
 data structure of def below
 
pd_stack PDObject::def
 the object content
 
void * PDObject::inst
 instance of def, or NULL if not yet instantiated
 
PDBool PDObject::hasStream
 if set, object has a stream
 
PDInteger PDObject::streamLen
 length of stream (if one exists)
 
PDInteger PDObject::extractedLen
 length of extracted stream; -1 until stream has been fetched via the parser
 
char * PDObject::streamBuf
 the stream, if fetched via parser, otherwise an undefined value
 
PDBool PDObject::skipStream
 if set, even if an object has a stream, the stream (including keywords) is skipped when written to output
 
PDBool PDObject::skipObject
 if set, entire object is discarded
 
PDBool PDObject::deleteObject
 if set, the object's XREF table slot is marked as free
 
char * PDObject::ovrStream
 stream override
 
PDInteger PDObject::ovrStreamLen
 length of ovrStream
 
PDBool PDObject::ovrStreamAlloc
 if set, ovrStream will be free()d by the object after use
 
char * PDObject::ovrDef
 definition override
 
PDInteger PDObject::ovrDefLen
 length of ovrDef
 
PDBool PDObject::encryptedDoc
 if set, the object is contained in an encrypted PDF; if false, PDObjectSetStreamEncrypted is NOP
 
char * PDObject::refString
 reference string, cached from calls to
 
PDSynchronizer PDObject::synchronizer
 synchronizer callback, called right before the object is serialized and written to the output stream
 
const void * PDObject::syncInfo
 user info object for synchronizer callback (usually a class instance, for wrappers)
 
pd_crypto PDObject::crypto
 crypto object, if available
 
PDCryptoInstanceRef PDObject::cryptoInstance
 crypto instance, if set up
 
PDInteger PDObjectStreamElement::obid
 object id of element
 
PDInteger PDObjectStreamElement::offset
 offset inside object stream
 
PDInteger PDObjectStreamElement::length
 length of the (stringified) definition; only valid during a commit
 
PDObjectType PDObjectStreamElement::type
 element object type
 
void * PDObjectStreamElement::def
 definition; NULL if a construct has been made for this element
 
PDObjectRef PDObjectStream::ob
 obstream object
 
PDInteger PDObjectStream::n
 number of objects
 
PDInteger PDObjectStream::first
 first object's offset
 
PDStreamFilterRef PDObjectStream::filter
 filter used to extract the initial raw content
 
PDObjectStreamElementRef PDObjectStream::elements
 n sized array of elements (non-pointered!)
 
PDSplayTreeRef PDObjectStream::constructs
 instances of objects (i.e. constructs)
 
PDSplayTreeRef PDContentStream::opertree
 operator tree
 
PDArrayRef PDContentStream::args
 pending operator arguments
 
pd_stack PDContentStream::opers
 current operators stack
 
pd_stack PDContentStream::deallocators
 deallocator (func, userInfo) pairs – called when content stream object is about to be destroyed
 
pd_stack PDContentStream::resetters
 resetter (func, userInfo) pairs – called at the end of every execution call; these adhere to the PDDeallocator signature
 
const char * PDContentStream::lastOperator
 the last operator that was encountered
 
PDObjectRef PDPage::ob
 the /Page object
 
PDParserRef PDPage::parser
 the parser associated with the owning PDF document
 
PDInteger PDPage::contentCount
 number of content objects in the page
 
PDArrayRef PDPage::contentRefs
 array of content references for the page
 
PDObjectRef * PDPage::contentObs
 array of content objects for the page, or NULL if unfetched
 
PDFontDictionaryRef PDPage::fontDict
 The font dictionary for the page; lazily constructed on first call to PDPageGetFont().
 
PDStateRef PDEnv::state
 The wrapped state.
 
pd_stack PDEnv::buildStack
 Build stack (for sub-components)
 
pd_stack PDEnv::varStack
 Variable stack (for incomplete components)
 
PDInteger pd_btree::key
 The (primitive) key.
 
void * pd_btree::value
 The value.
 
PDSplayTreeRef pd_btree::branch [2]
 The left and right branches of the tree.
 
PDOperatorType PDOperator::type
 The operator type.
 
union {
   PDStateRef   PDOperator::pushedState
 for "PushNewEnv", this is the environment being pushed
 
   char *   PDOperator::key
 the argument to the operator, for PopVariable, Push/StoveComplex, PullBuildVariable
 
   PDID   PDOperator::identifier
 identifier (constant string pointer pointer)
 
}; 
 
PDOperatorRef PDOperator::next
 the next operator, if any
 
PDTwinStreamRef PDParser::stream
 The I/O stream from the pipe.
 
PDScannerRef PDParser::scanner
 The main scanner.
 
PDParserState PDParser::state
 The parser state.
 
pd_stack PDParser::xstack
 A stack of partial xref tables based on offset; see [1] below.
 
PDXTableRef PDParser::mxt
 master xref table, used for output
 
PDXTableRef PDParser::cxt
 current input xref table
 
PDBool PDParser::done
 parser has passed the last object in the input PDF
 
PDSize PDParser::xrefnewiter
 iterator for locating unused id's for usage in master xref table
 
pd_stack PDParser::appends
 stack of objects that are meant to be appended at the end of the PDF
 
pd_stack PDParser::inserts
 stack of objects that are meant to be inserted as soon as the current object is dealt with
 
PDSplayTreeRef PDParser::aiTree
 bin-tree identifying the (in-memory) PDObjectRefs in appends and inserts by their object ID's
 
PDObjectRef PDParser::construct
 cannot be relied on to contain anything; is used to hold constructed objects until iteration (at which point they're released)
 
PDSize PDParser::streamLen
 stream length of the current object
 
PDSize PDParser::obid
 object ID of the current object
 
PDSize PDParser::genid
 generation number of the current object
 
PDSize PDParser::oboffset
 offset of the current object
 
PDReferenceRef PDParser::rootRef
 reference to the root object
 
PDReferenceRef PDParser::infoRef
 reference to the info object
 
PDReferenceRef PDParser::encryptRef
 reference to the encrypt object
 
PDObjectRef PDParser::trailer
 the trailer object
 
PDObjectRef PDParser::root
 the root object, if instantiated
 
PDObjectRef PDParser::info
 the info object, if instantiated
 
PDObjectRef PDParser::encrypt
 the encrypt object, if instantiated
 
PDCatalogRef PDParser::catalog
 the root catalog, if instantiated
 
pd_crypto PDParser::crypto
 crypto instance, if the document is encrypted
 
PDBool PDParser::success
 if true, the parser has so far succeeded at parsing the input file
 
PDSplayTreeRef PDParser::skipT
 whenever an object is ignored due to offset discrepancy, its ID is put on the skip tree; when the last object has been parsed, if the skip tree is non-empty, the parser aborts, as it means objects were lost
 
PDFontDictionaryRef PDParser::mfd
 Master font dictionary, containing all fonts processed so far.
 
PDBool PDPageReference::collection
 If set, this is a /Type /Pages object, which is a group of page and pages references.
 
union {
   struct {
      PDInteger   PDPageReference::count
 Number of entries.
 
      PDPageReference *   PDPageReference::kids
 Kids.
 
   } 
 
   struct {
      PDInteger   PDPageReference::obid
 The object ID.
 
      PDInteger   PDPageReference::genid
 The generation ID.
 
   } 
 
}; 
 
PDParserRef PDCatalog::parser
 The parser owning the catalog.
 
PDObjectRef PDCatalog::object
 The object representation of the catalog.
 
PDRect PDCatalog::mediaBox
 The media box of the catalog object.
 
PDPageReference PDCatalog::pages
 The root pages.
 
PDInteger PDCatalog::count
 Number of pages (in total)
 
PDInteger PDCatalog::capacity
 Size of kids array.
 
PDInteger * PDCatalog::kids
 Array of object IDs for all pages.
 
char * PDScannerSymbol::sstart
 symbol start
 
short PDScannerSymbol::shash
 symbol hash (not normalized)
 
PDInteger PDScannerSymbol::slen
 symbol length
 
PDScannerSymbolType PDScannerSymbol::stype
 symbol type
 
PDEnvRef PDScanner::env
 the current environment
 
PDScannerBufFunc PDScanner::bufFunc
 buffer function
 
void * PDScanner::bufFuncInfo
 buffer function info object
 
pd_stack PDScanner::contextStack
 context stack for buffer function/info
 
pd_stack PDScanner::envStack
 environment stack; e.g. root -> arb -> array -> arb -> ...
 
pd_stack PDScanner::resultStack
 results stack
 
pd_stack PDScanner::symbolStack
 symbols stack; used to "rewind" when misinterpretations occur (e.g. for "number_or_obref" when one or two numbers)
 
pd_stack PDScanner::garbageStack
 temporary allocations; only used in operator function when a symbol is regenerated from a malloc()'d string
 
PDStreamFilterRef PDScanner::filter
 filter, if any
 
char * PDScanner::buf
 buffer
 
PDInteger PDScanner::bresoffset
 previously popped result's offset relative to buf
 
PDInteger PDScanner::bsize
 buffer capacity
 
PDInteger PDScanner::boffset
 buffer offset (we are at position &buf[boffset])
 
PDInteger PDScanner::bmark
 buffer mark
 
PDScannerSymbolRef PDScanner::sym
 the latest symbol
 
PDScannerPopFunc PDScanner::popFunc
 the symbol pop function
 
PDBool PDScanner::fixedBuf
 if set, the buffer is fixed (i.e. buffering function should not be called)
 
PDBool PDScanner::failed
 if set, the scanner aborted due to a failure
 
PDBool PDScanner::outgrown
 if true, a scanner with fixedBuf set needed more data
 
PDBool PDScanner::strict
 if true, the scanner will complain loudly when erroring out, otherwise it will silently fail
 
pd_stack pd_stack::prev
 Previous object in stack.
 
char pd_stack::type
 Stack type.
 
void * pd_stack::info
 The stack content, based on its type.
 
PDInteger PDArray::count
 Number of elements.
 
PDInteger PDArray::capacity
 Capacity of array.
 
void ** PDArray::values
 Resolved values.
 
pd_stack * PDArray::vstacks
 Unresolved values in pd_stack form.
 
PDCryptoInstanceRef PDArray::ci
 Crypto instance, if array is encrypted.
 
char * PDDictionaryNode::key
 the key for this node
 
void * PDDictionaryNode::data
 the data
 
void * PDDictionaryNode::decrypted
 the data, in decrypted form
 
PDSize PDDictionaryNode::hash
 the hash code
 
PDInteger PDDictionary::count
 Number of entries.
 
PDInteger PDDictionary::bucketc
 Number of buckets.
 
PDInteger PDDictionary::bucketm
 Bucket mask.
 
PDArrayRef * PDDictionary::buckets
 Buckets containing content.
 
PDArrayRef PDDictionary::populated
 Array of buckets which were created (as opposed to remaining NULL due to index never being touched)
 
PDCryptoInstanceRef PDDictionary::ci
 Crypto instance, if dictionary is encrypted.
 
pd_stack PDDictionaryStack::dicts
 Stack of PDDictionaries.
 
PDParserRef PDFontDictionary::parser
 The owning parser.
 
PDDictionaryRef PDFontDictionary::fonts
 Dictionary mapping font names to their values.
 
PDDictionaryRef PDFontDictionary::encodings
 Encoding dictionary.
 
PDParserRef PDFont::parser
 Parser reference.
 
PDFontDictionaryRef PDFont::fontDict
 The owning font dictionary.
 
PDObjectRef PDFont::obj
 Font object reference.
 
PDCMapRef PDFont::toUnicode
 CMap, or NULL if not yet compiled or if non-existent.
 
PDStringEncoding PDFont::enc
 String encoding, or NULL if not a string encoding.
 
unsigned char * PDFont::encMap
 Encoding map, a 256 byte array mapping bytes to bytes.
 
PDDictionaryRef PDCMap::systemInfo
 The CIDSystemInfo dictionary, which has /Registry, /Ordering, and /Supplement.
 
PDStringRef PDCMap::name
 CMapName.
 
PDNumberRef PDCMap::type
 CMapType.
 
PDSize PDCMap::csrCap
 Codespace range capacity.
 
PDSize PDCMap::bfrCap
 BF range capacity.
 
PDSize PDCMap::bfcCap
 BF char capacity.
 
PDSize PDCMap::csrCount
 Number of codespace ranges.
 
PDSize PDCMap::bfrCount
 Number of BF ranges.
 
PDSize PDCMap::bfcCount
 Number of BF chars.
 
PDSize PDCMap::bfcLength
 Length (in bytes) of input characters in BF ranges.
 
PDCMapRange * PDCMap::csrs
 Array of codespace ranges.
 
PDCMapRangeMapping * PDCMap::bfrs
 BF ranges.
 
PDCMapCharMapping * PDCMap::bfcs
 BF chars.
 
pd_crypto PDCryptoInstance::crypto
 Crypto object.
 
PDInteger PDCryptoInstance::obid
 Associated object ID.
 
PDInteger PDCryptoInstance::genid
 Associated generation number.
 
PDStringRef pd_crypto::identifier
 PDF /ID found in the trailer dictionary.
 
PDStringRef pd_crypto::filter
 filter name
 
PDStringRef pd_crypto::subfilter
 sub-filter name
 
PDInteger pd_crypto::version
 algorithm version (V key in PDFs)
 
PDInteger pd_crypto::length
 length of the encryption key, in bits; must be a multiple of 8 in the range 40 - 128; default = 40
 
PDInteger pd_crypto::revision
 revision ("R") of algorithm: 2 if version < 2 and perms have no 3 or greater values, 3 if version is 2 or 3, or P has rev 3 stuff, 4 if version = 4
 
PDStringRef pd_crypto::owner
 owner string ("O"), 32-byte string based on owner and user passwords, used to compute encryption key and determining whether a valid owner password was entered
 
PDStringRef pd_crypto::user
 user string ("U"), 32-byte string based on user password, used in determining whether to prompt the user for a password and whether given password was a valid user or owner password
 
int32_t pd_crypto::privs
 privileges (see Table 3.20 in PDF spec v 1.7, p. 123-124)
 
PDBool pd_crypto::encryptMetadata
 whether metadata should be encrypted or not ("/EncryptMetadata true")
 
PDStringRef pd_crypto::enckey
 encryption key
 
PDInteger pd_crypto::cfLength
 crypt filter length, e.g. 16 for AESV2
 
pd_crypto_method pd_crypto::cfMethod
 crypt filter method
 
pd_auth_event pd_crypto::cfAuthEvent
 when authentication occurs; currently only supports '/DocOpen'
 
PDBool PDState::iterates
 if true, scanner will stop while in this state, after reading one entry
 
char * PDState::name
 name of the state
 
char ** PDState::symbol
 symbol strings
 
PDInteger PDState::symbols
 number of symbols in total
 
PDInteger * PDState::symindex
 symbol indices (for hash)
 
short PDState::symindices
 number of index slots in total (not = symbols, often bigger)
 
PDOperatorRef * PDState::symbolOp
 symbol operators
 
PDOperatorRef PDState::numberOp
 number operator
 
PDOperatorRef PDState::delimiterOp
 delimiter operator
 
PDOperatorRef PDState::fallbackOp
 fallback operator
 
PDInteger PDStaticHash::entries
 Number of entries in static hash.
 
PDInteger PDStaticHash::mask
 The mask.
 
PDInteger PDStaticHash::shift
 The shift.
 
PDBool PDStaticHash::leaveKeys
 if set, the keys are not deallocated on destruction; default = false (i.e. dealloc keys)
 
PDBool PDStaticHash::leaveValues
 if set, the values are not deallocated on destruction; default = false (i.e. dealloc values)
 
void ** PDStaticHash::keys
 Keys array.
 
void ** PDStaticHash::values
 Values array.
 
void ** PDStaticHash::table
 The static hash table.
 
PDBool PDTask::isActive
 Whether task is still active; sometimes tasks cannot be unloaded properly even though the task returned PDTaskUnload; these tasks have their active flag unset instead.
 
PDBool PDTask::isFilter
 Whether task is a filter or not. Internally, a task is only a filter if it is assigned to a specific object ID or IDs.
 
PDPropertyType PDTask::propertyType
 The filter property type.
 
PDInteger PDTask::value
 The filter value, if any.
 
PDTaskFunc PDTask::func
 The function callback, if the task is not a filter.
 
PDTaskRef PDTask::child
 The task's child task; child tasks are called in order.
 
PDDeallocator PDTask::deallocator
 The deallocator for the task.
 
void * PDTask::info
 The (user) info object.
 
PDTwinStreamMethod PDTwinStream::method
 The current method.
 
PDScannerRef PDTwinStream::scanner
 the master scanner
 
FILE * PDTwinStream::fi
 Reader.
 
FILE * PDTwinStream::fo
 writer
 
fpos_t PDTwinStream::offsi
 absolute offset in input for heap
 
fpos_t PDTwinStream::offso
 absolute offset in output for file pointer
 
char * PDTwinStream::heap
 heap in which buffer is located
 
PDSize PDTwinStream::size
 size of heap
 
PDSize PDTwinStream::holds
 bytes in heap
 
PDSize PDTwinStream::cursor
 position in heap (bytes 0..cursor have been written (unless discarded) to output)
 
char * PDTwinStream::sidebuf
 temporary buffer (e.g. for Fetch)
 
PDBool PDTwinStream::outgrown
 if true, a buffer with growth disallowed attempted to grow and failed
 
PDBool PDPipe::opened
 Whether pipe has been opened or not.
 
PDBool PDPipe::dynamicFiltering
 Whether dynamic filtering is necessary; if set, the static hash filtering of filters is skipped and filters are checked for all objects.
 
PDBool PDPipe::typedTasks
 Whether type tasks (excluding unfiltered tasks) are activated; activation results in a slight decrease in performance due to all dictionary objects needing to be resolved in order to check their Type dictionary key.
 
char * PDPipe::pi
 The path of the input file.
 
char * PDPipe::po
 The path of the output file.
 
FILE * PDPipe::fi
 Reader.
 
FILE * PDPipe::fo
 Writer.
 
PDInteger PDPipe::filterCount
 Number of filters in the pipe.
 
PDTwinStreamRef PDPipe::stream
 The pipe stream.
 
PDParserRef PDPipe::parser
 The parser.
 
PDSplayTreeRef PDPipe::filter
 The filters, in a tree with the object ID as key.
 
pd_stack PDPipe::typeTasks [_PDFTypeCount]
 Tasks which run depending on all objects of the given type; the 0'th element (type NULL) is triggered for all objects, and not just objects without a /Type dictionary key.
 
PDSplayTreeRef PDPipe::attachments
 PDParserAttachment entries.
 
PDInteger PDReference::obid
 The object ID.
 
PDInteger PDReference::genid
 The generation number.
 
PDObjectType PDNumber::type
 Type of the number; as a special case, PDObjectTypeReference is used for pointers.
 
union {
   PDInteger   i
 
   PDReal   r
 
   PDBool   b
 
   PDSize   s
 
   void *   p
 
}; 
 
PDStringType PDString::type
 Type of the string.
 
PDFontRef PDString::font
 The font associated with the string.
 
PDStringEncoding PDString::enc
 Encoding of the string.
 
PDSize PDString::length
 Length of the string.
 
PDBool PDString::wrapped
 Whether the string is wrapped.
 
char * PDString::data
 Buffer containing string data.
 
PDStringRef PDString::alt
 Alternative representation.
 
PDCryptoInstanceRef PDString::ci
 Crypto instance.
 
PDBool PDString::encrypted
 Flag indicating whether the string is encrypted or not; the value of this flag is UNDEFINED if ci == NULL.
 
char * PDStringConv::allocBuf
 The allocated buffer.
 
PDInteger PDStringConv::offs
 The current offset inside the buffer.
 
PDInteger PDStringConv::left
 The current bytes left (remaining) in the buffer.
 

Private structs

PDObjectStreamRef PDObjectStreamCreateWithObject (PDObjectRef object)
 
PDBool PDObjectStreamParseRawObjectStream (PDObjectStreamRef obstm, char *rawBuf)
 
void PDObjectStreamParseExtractedObjectStream (PDObjectStreamRef obstm, char *buf)
 
void PDObjectStreamCommit (PDObjectStreamRef obstm)
 

Environment

void PDEnvDestroy (PDEnvRef env)
 

Parser

enum  PDParserState { PDParserStateBase, PDParserStateObjectDefinition, PDParserStateObjectAppendix, PDParserStateObjectPostStream }
 
typedef struct PDPageReference PDPageReference
 
typedef struct PDXTable * PDXTableRef
 

Scanner

enum  PDScannerSymbolType {
  PDScannerSymbolTypeDefault = PDOperatorSymbolGlobRegular, PDScannerSymbolTypeWhitespace = PDOperatorSymbolGlobWhitespace, PDScannerSymbolTypeDelimiter = PDOperatorSymbolGlobDelimiter, PDScannerSymbolTypeNumeric = PDOperatorSymbolExtNumeric,
  PDScannerSymbolTypeEOB = PDOperatorSymbolExtEOB, PDScannerSymbolTypeFake = PDOperatorSymbolExtFake
}
 
typedef struct PDScannerSymbol * PDScannerSymbolRef
 

Stack

typedef struct PDDictionaryNode * PDDictionaryNodeRef
 
typedef struct PDCMapRange PDCMapRange
 Private structure for ranges (<hex> <hex>), where multi-byte ranges are rectangles, not sequences (see PDF specification)
 
typedef struct PDCMapRangeMapping PDCMapRangeMapping
 Private structure for range mappings (<hex> <hex> <hex>)
 
typedef struct PDCMapCharMapping PDCMapCharMapping
 Private structure for individual char mappings (<hex> <hex>)
 
#define PD_STACK_STRING   0
 Stack string type.
 
#define PD_STACK_ID   1
 Stack identifier type.
 
#define PD_STACK_STACK   2
 Stack stack type.
 
#define PD_STACK_PDOB   3
 Stack object (PDTypeRef managed) type.
 
#define PD_STACK_FREEABLE   4
 Stack freeable type.
 

Twin streams

void PDPipeCloseFileStream (FILE *stream)
 
FILE * PDPipeOpenInputStream (const char *path)
 
FILE * PDPipeOpenOutputStream (const char *path)
 

String

void PDStringAttachCryptoInstance (PDStringRef string, PDCryptoInstanceRef ci, PDBool encrypted)
 
void PDArrayAttachCryptoInstance (PDArrayRef array, PDCryptoInstanceRef ci, PDBool encrypted)
 
void PDDictionaryAttachCryptoInstance (PDDictionaryRef dictionary, PDCryptoInstanceRef ci, PDBool encrypted)
 

Conversion (PDF specification)

typedef struct PDStringConv * PDStringConvRef
 

Macros / convenience

void _PDBreak ()
 
void PDTwinStreamAsserts (PDTwinStreamRef ts)
 
 fmatox (long long, atoll) fmatox(long
 
#define PDDEF   const void*[]
 
#define PDDef(defs...)   (PDDEF){(void*)defs, NULL}
 
#define PDError(args...)
 
#define PD_WARNINGS
 
#define PDWarn(args...)
 
#define PDNotice(args...)
 
#define PDAssert(args...)
 
#define PDRequire(state, retval, msg...)
 
#define as(type, expr...)   ((type)(expr))
 
#define PDInstancePrinterRequire(b, r)
 
#define PDInstancePrinterInit(itype, b, r)
 
#define fmatox(x, ato)
 

Detailed Description

A PDF object stream, i.e. a stream of PDF objects inside a stream.

Normally, objects are located directly inside the PDF, but they can alternatively be kept inside so-called object streams (Chapter 3.4.6 of PDF specification v 1.7, p. 100).

When a filtering task is made for an object that is determined to be located inside of an object stream, supplementary tasks are automatically set up to generate the object stream instance of the container object and to present the given object as a regular, mutable instance to the requesting task. Upon completion (that is, when returning from the task callback), the object stream is "committed" as stream content to the actual containing object, which in turn is written to the output as normal.
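
To make the container layout concrete: per the PDF specification, the decoded data of an object stream begins with N pairs of integers (object number, byte offset), and each object's data starts at the /First value plus its offset. The following is a minimal sketch of reading that header. `ObStmEntry` and `obstm_parse_header` are hypothetical names used for illustration only; they are not part of the Pajdeg API, and this is not how Pajdeg itself parses the stream.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one (object number, offset) header pair. */
typedef struct {
    long obid;    /* object number of the element */
    long offset;  /* byte offset, relative to the /First value */
} ObStmEntry;

/*
 * Read up to `n` integer pairs from the start of the decoded stream data
 * in `buf`. Returns the number of complete pairs stored in `out`.
 */
int obstm_parse_header(const char *buf, int n, ObStmEntry *out)
{
    assert(buf != NULL && out != NULL);
    const char *p = buf;
    int parsed = 0;
    while (parsed < n) {
        char *end;
        long obid = strtol(p, &end, 10);    /* strtol skips leading whitespace */
        if (end == p) break;                /* no number found: header is short */
        p = end;
        long offset = strtol(p, &end, 10);
        if (end == p) break;                /* object number without an offset */
        p = end;
        out[parsed].obid = obid;
        out[parsed].offset = offset;
        parsed++;
    }
    return parsed;
}
```

Given the header `11 0 12 547 13 665` and N = 3, the sketch yields three entries; the data for object 12 would then be found at /First + 547 in the decoded stream.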


Data Structure Documentation

struct PDObjectStreamElement

Object stream element structure

This is a wrapper around an object inside of an object stream.

Data Fields

PDInteger obid
 object id of element
 
PDInteger offset
 offset inside object stream
 
PDInteger length
 length of the (stringified) definition; only valid during a commit
 
PDObjectType type
 element object type
 
void * def
 definition; NULL if a construct has been made for this element
 
struct PDObjectStream

Object stream internal structure

The object stream is an object in a PDF which itself contains a stream of objects in a specially formatted form.

Data Fields

PDObjectRef ob
 obstream object
 
PDInteger n
 number of objects
 
PDInteger first
 first object's offset
 
PDStreamFilterRef filter
 filter used to extract the initial raw content
 
PDObjectStreamElementRef elements
 n sized array of elements (non-pointered!)
 
PDSplayTreeRef constructs
 instances of objects (i.e. constructs)
 
struct PDEnv

PDState wrapping structure

Data Fields

PDStateRef state
 The wrapped state.
 
pd_stack buildStack
 Build stack (for sub-components)
 
pd_stack varStack
 Variable stack (for incomplete components)
 
struct pd_btree

Binary tree structure

Data Fields

PDInteger key
 The (primitive) key.
 
void * value
 The value.
 
PDSplayTreeRef branch [2]
 The left and right branches of the tree.
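
As an illustration of how a node with a primitive key and a two-element branch array can be searched, here is a minimal sketch. `Node` and `tree_find` are hypothetical names, and the convention that `branch[0]` holds smaller keys and `branch[1]` larger keys is an assumption, not something the documentation above states. The sketch is a plain binary search tree lookup and performs no splaying or rebalancing.

```c
#include <stddef.h>

/* Hypothetical mirror of the documented fields: key, value, branch[2]. */
typedef struct Node Node;
struct Node {
    long  key;
    void *value;
    Node *branch[2];  /* assumed: [0] = smaller keys, [1] = larger keys */
};

/* Return the value stored under `key`, or NULL if the key is absent. */
void *tree_find(const Node *root, long key)
{
    const Node *n = root;
    while (n != NULL) {
        if (key == n->key) return n->value;
        /* (key > n->key) evaluates to 0 or 1, which selects the branch */
        n = n->branch[key > n->key];
    }
    return NULL;
}
```

Storing both children in an array lets the comparison result index the branch directly, which avoids a separate left/right conditional.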
 
struct PDOperator

The PDOperator internal structure.

Data Fields

PDOperatorType type
 The operator type.
 
union {
   PDStateRef   PDOperator::pushedState
 for "PushNewEnv", this is the environment being pushed
 
   char *   PDOperator::key
 the argument to the operator, for PopVariable, Push/StoveComplex, PullBuildVariable
 
   PDID   PDOperator::identifier
 identifier (constant string pointer pointer)
 
}; 
 
PDOperatorRef next
 the next operator, if any
 
struct PDParser

The PDParser internal structure.

Data Fields

PDTwinStreamRef stream
 The I/O stream from the pipe.
 
PDScannerRef scanner
 The main scanner.
 
PDParserState state
 The parser state.
 
pd_stack xstack
 A stack of partial xref tables based on offset; see [1] below.
 
PDXTableRef mxt
 master xref table, used for output
 
PDXTableRef cxt
 current input xref table
 
PDBool done
 parser has passed the last object in the input PDF
 
PDSize xrefnewiter
 iterator for locating unused id's for usage in master xref table
 
pd_stack appends
 stack of objects that are meant to be appended at the end of the PDF
 
pd_stack inserts
 stack of objects that are meant to be inserted as soon as the current object is dealt with
 
PDSplayTreeRef aiTree
 bin-tree identifying the (in-memory) PDObjectRefs in appends and inserts by their object ID's
 
PDObjectRef construct
 cannot be relied on to contain anything; is used to hold constructed objects until iteration (at which point they're released)
 
PDSize streamLen
 stream length of the current object
 
PDSize obid
 object ID of the current object
 
PDSize genid
 generation number of the current object
 
PDSize oboffset
 offset of the current object
 
PDReferenceRef rootRef
 reference to the root object
 
PDReferenceRef infoRef
 reference to the info object
 
PDReferenceRef encryptRef
 reference to the encrypt object
 
PDObjectRef trailer
 the trailer object
 
PDObjectRef root
 the root object, if instantiated
 
PDObjectRef info
 the info object, if instantiated
 
PDObjectRef encrypt
 the encrypt object, if instantiated
 
PDCatalogRef catalog
 the root catalog, if instantiated
 
pd_crypto crypto
 crypto instance, if the document is encrypted
 
PDBool success
 if true, the parser has so far succeeded at parsing the input file
 
PDSplayTreeRef skipT
 whenever an object is ignored due to offset discrepancy, its ID is put on the skip tree; when the last object has been parsed, if the skip tree is non-empty, the parser aborts, as it means objects were lost
 
PDFontDictionaryRef mfd
 Master font dictionary, containing all fonts processed so far.
 
struct PDPageReference

Data Fields

PDBool collection
 If set, this is a /Type /Pages object, which is a group of page and pages references.
 
union {
   struct {
      PDInteger   PDPageReference::count
 Number of entries.
 
      PDPageReference *   PDPageReference::kids
 Kids.
 
   } 
 
   struct {
      PDInteger   PDPageReference::obid
 The object ID.
 
      PDInteger   PDPageReference::genid
 The generation ID.
 
   } 
 
}; 
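
The `collection` flag decides which half of the union is valid, which is the usual tagged-union pattern. The sketch below shows one way to walk such a tree and count leaf pages. `PageRef` and `page_ref_leaf_count` are hypothetical names; treating `count` as the number of direct entries in `kids` is an assumption, and the anonymous struct and union members require C11.

```c
#include <stddef.h>

/* Hypothetical mirror of the documented layout (C11 anonymous members). */
typedef struct PageRef PageRef;
struct PageRef {
    int collection;                             /* nonzero: /Pages group; zero: single page */
    union {
        struct { long count; PageRef *kids; };  /* valid when collection != 0 */
        struct { long obid;  long genid;    };  /* valid when collection == 0 */
    };
};

/* Count the single-page references reachable from `ref`. */
long page_ref_leaf_count(const PageRef *ref)
{
    if (!ref->collection) return 1;             /* a leaf is exactly one page */
    long total = 0;
    for (long i = 0; i < ref->count; i++)
        total += page_ref_leaf_count(&ref->kids[i]);
    return total;
}
```

Reading `obid` on a collection, or `kids` on a leaf, would reinterpret the other struct's bytes, so every access should check `collection` first.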
 
struct PDCatalog

The PDCatalog internal structure.

Data Fields

PDParserRef parser
 The parser owning the catalog.
 
PDObjectRef object
 The object representation of the catalog.
 
PDRect mediaBox
 The media box of the catalog object.
 
PDPageReference pages
 The root pages.
 
PDInteger count
 Number of pages (in total)
 
PDInteger capacity
 Size of kids array.
 
PDInteger * kids
 Array of object IDs for all pages.
 
struct PDScannerSymbol

A scanner symbol.

Data Fields

char * sstart
 symbol start
 
short shash
 symbol hash (not normalized)
 
PDInteger slen
 symbol length
 
PDScannerSymbolType stype
 symbol type
 
struct PDScanner

The internal scanner structure.

Data Fields

PDEnvRef env
 the current environment
 
PDScannerBufFunc bufFunc
 buffer function
 
void * bufFuncInfo
 buffer function info object
 
pd_stack contextStack
 context stack for buffer function/info
 
pd_stack envStack
 environment stack; e.g. root -> arb -> array -> arb -> ...
 
pd_stack resultStack
 results stack
 
pd_stack symbolStack
 symbols stack; used to "rewind" when misinterpretations occur (e.g. for "number_or_obref" when one or two numbers)
 
pd_stack garbageStack
 temporary allocations; only used in operator function when a symbol is regenerated from a malloc()'d string
 
PDStreamFilterRef filter
 filter, if any
 
char * buf
 buffer
 
PDInteger bresoffset
 previously popped result's offset relative to buf
 
PDInteger bsize
 buffer capacity
 
PDInteger boffset
 buffer offset (we are at position &buf[boffset])
 
PDInteger bmark
 buffer mark
 
PDScannerSymbolRef sym
 the latest symbol
 
PDScannerPopFunc popFunc
 the symbol pop function
 
PDBool fixedBuf
 if set, the buffer is fixed (i.e. buffering function should not be called)
 
PDBool failed
 if set, the scanner aborted due to a failure
 
PDBool outgrown
 if true, a scanner with fixedBuf set needed more data
 
PDBool strict
 if true, the scanner will complain loudly when erroring out, otherwise it will silently fail
 
struct pd_stack

The internal stack structure

Data Fields

pd_stack prev
 Previous object in stack.
 
char type
 Stack type.
 
void * info
 The stack content, based on its type.
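
The three fields describe a singly linked stack in which each entry points at the one beneath it and carries a type tag for its payload. A minimal sketch of push and pop on such a structure follows. `StackRef`, `stack_push` and `stack_pop` are hypothetical names and do not reproduce Pajdeg's own stack functions or their ownership rules.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the documented fields: prev, type, info. */
typedef struct Stack *StackRef;
struct Stack {
    StackRef prev;  /* previous (older) entry, or NULL at the bottom */
    char     type;  /* tag describing what `info` points at */
    void    *info;  /* payload; not owned by the stack in this sketch */
};

/* Push a new entry on top. Returns 0 on success, -1 if allocation fails. */
int stack_push(StackRef *top, char type, void *info)
{
    StackRef s = malloc(sizeof *s);
    if (s == NULL) return -1;
    s->prev = *top;
    s->type = type;
    s->info = info;
    *top = s;
    return 0;
}

/* Pop the top entry, returning its payload and (optionally) its type tag. */
void *stack_pop(StackRef *top, char *type)
{
    assert(top != NULL && *top != NULL);
    StackRef s = *top;
    void *info = s->info;
    if (type != NULL) *type = s->type;
    *top = s->prev;
    free(s);
    return info;
}
```

An empty stack is simply a NULL pointer, so callers pass the address of their stack variable and both functions update it in place.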
 
struct PDArray

The internal array structure.

Data Fields

PDInteger count
 Number of elements.
 
PDInteger capacity
 Capacity of array.
 
void ** values
 Resolved values.
 
pd_stack * vstacks
 Unresolved values in pd_stack form.
 
PDCryptoInstanceRef ci
 Crypto instance, if array is encrypted.
 
struct PDDictionaryNode

Data Fields

char * key
 the key for this node
 
void * data
 the data
 
void * decrypted
 the data, in decrypted form
 
PDSize hash
 the hash code
 
struct PDDictionary

The internal dictionary structure.

Data Fields

PDInteger count
 Number of entries.
 
PDInteger bucketc
 Number of buckets.
 
PDInteger bucketm
 Bucket mask.
 
PDArrayRef * buckets
 Buckets containing content.
 
PDArrayRef populated
 Array of buckets which were created (as opposed to remaining NULL due to index never being touched)
 
PDCryptoInstanceRef ci
 Crypto instance, if dictionary is encrypted.
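
The bucketc/bucketm pair suggests the common power-of-two bucketing scheme, in which a hash code is reduced to a bucket index with a bitwise AND. The sketch below shows that scheme in isolation. That bucketm equals bucketc - 1 is an assumption, and demo_hash (32-bit FNV-1a) and demo_bucket_index are hypothetical helpers; Pajdeg's actual hash function may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string hash (32-bit FNV-1a), used only for illustration. */
static uint32_t demo_hash(const char *key)
{
    uint32_t h = 2166136261u;
    while (*key) {
        h ^= (unsigned char)*key++;
        h *= 16777619u;
    }
    return h;
}

/* Reduce a hash to a bucket index. Assumes bucketc is a power of two,
   so that mask = bucketc - 1 has all of its low bits set. */
static size_t demo_bucket_index(uint32_t hash, size_t bucketc)
{
    size_t mask = bucketc - 1;
    return (size_t)hash & mask;
}
```

With a power-of-two bucket count the AND gives the same result as a modulo, without the division.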
 
struct PDDictionaryStack

The internal dictionary stack structure.

Data Fields

pd_stack dicts
 Stack of PDDictionaries.
 
struct PDFontDictionary

The internal font dictionary structure.

Data Fields

PDParserRef parser
 The owning parser.
 
PDDictionaryRef fonts
 Dictionary mapping font names to their values.
 
PDDictionaryRef encodings
 Encoding dictionary.
 
struct PDFont

The internal font object structure.

Data Fields

PDParserRef parser
 Parser reference.
 
PDFontDictionaryRef fontDict
 The owning font dictionary.
 
PDObjectRef obj
 Font object reference.
 
PDCMapRef toUnicode
 CMap, or NULL if not yet compiled or if non-existent.
 
PDStringEncoding enc
 String encoding, or NULL if not a string encoding.
 
unsigned char * encMap
 Encoding map, a 256 byte array mapping bytes to bytes.
 
struct PDCMap

The internal CID map object structure.

Data Fields

PDDictionaryRef systemInfo
 The CIDSystemInfo dictionary, which has /Registry, /Ordering, and /Supplement.
 
PDStringRef name
 CMapName.
 
PDNumberRef type
 CMapType.
 
PDSize csrCap
 Codespace range capacity.
 
PDSize bfrCap
 BF range capacity.
 
PDSize bfcCap
 BF char capacity.
 
PDSize csrCount
 Number of codespace ranges.
 
PDSize bfrCount
 Number of BF ranges.
 
PDSize bfcCount
 Number of BF chars.
 
PDSize bfcLength
 Length (in bytes) of input characters in BF ranges.
 
PDCMapRange * csrs
 Array of codespace ranges.
 
PDCMapRangeMapping * bfrs
 BF ranges.
 
PDCMapCharMapping * bfcs
 BF chars.
 
struct PDCryptoInstance

Crypto instance for arrays/dicts.

Data Fields

pd_crypto crypto
 Crypto object.
 
PDInteger obid
 Associated object ID.
 
PDInteger genid
 Associated generation number.
 
struct pd_crypto

The internal crypto structure.

Data Fields

PDStringRef identifier
 PDF /ID found in the trailer dictionary.
 
PDStringRef filter
 filter name
 
PDStringRef subfilter
 sub-filter name
 
PDInteger version
 algorithm version (V key in PDFs)
 
PDInteger length
 length of the encryption key, in bits; must be a multiple of 8 in the range 40 - 128; default = 40
 
PDInteger revision
 revision ("R") of algorithm: 2 if version < 2 and perms have no 3 or greater values, 3 if version is 2 or 3, or P has rev 3 stuff, 4 if version = 4
 
PDStringRef owner
 owner string ("O"), 32-byte string based on owner and user passwords, used to compute the encryption key and to determine whether a valid owner password was entered
 
PDStringRef user
 user string ("U"), 32-byte string based on user password, used in determining whether to prompt the user for a password and whether given password was a valid user or owner password
 
int32_t privs
 privileges (see Table 3.20 in PDF spec v 1.7, p. 123-124)
 
PDBool encryptMetadata
 whether metadata should be encrypted or not ("/EncryptMetadata true")
 
PDStringRef enckey
 encryption key
 
PDInteger cfLength
 crypt filter length, e.g. 16 for AESV2
 
pd_crypto_method cfMethod
 crypt filter method
 
pd_auth_event cfAuthEvent
 when authentication occurs; currently only supports '/DocOpen'
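
The constraint stated for the length field (a multiple of 8 in the range 40 to 128 bits, defaulting to 40) can be checked as in the sketch below. The helper names are hypothetical and are not Pajdeg functions.

```c
#include <stdbool.h>

/* Default key length in bits, per the description of the length field. */
enum { DEMO_DEFAULT_KEY_BITS = 40 };

/* Returns true if bits is a multiple of 8 in the range 40..128. */
static bool demo_key_length_is_valid(long bits)
{
    return bits >= 40 && bits <= 128 && bits % 8 == 0;
}

/* Converts a valid key length in bits to bytes; returns -1 for invalid lengths. */
static long demo_key_length_bytes(long bits)
{
    return demo_key_length_is_valid(bits) ? bits / 8 : -1;
}
```

A 128-bit key therefore occupies 16 bytes, which matches the cfLength example given for AESV2.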
 
struct PDState

The internal PDState structure

Data Fields

PDBool iterates
 if true, scanner will stop while in this state, after reading one entry
 
char * name
 name of the state
 
char ** symbol
 symbol strings
 
PDInteger symbols
 number of symbols in total
 
PDInteger * symindex
 symbol indices (for hash)
 
short symindices
 number of index slots in total (not = symbols, often bigger)
 
PDOperatorRef * symbolOp
 symbol operators
 
PDOperatorRef numberOp
 number operator
 
PDOperatorRef delimiterOp
 delimiter operator
 
PDOperatorRef fallbackOp
 fallback operator
 
struct PDStaticHash

The internal static hash structure

Data Fields

PDInteger entries
 Number of entries in static hash.
 
PDInteger mask
 The mask.
 
PDInteger shift
 The shift.
 
PDBool leaveKeys
 if set, the keys are not deallocated on destruction; default = false (i.e. dealloc keys)
 
PDBool leaveValues
 if set, the values are not deallocated on destruction; default = false (i.e. dealloc values)
 
void ** keys
 Keys array.
 
void ** values
 Values array.
 
void ** table
 The static hash table.
 
struct PDTask

The internal task structure

Data Fields

PDBool isActive
 Whether task is still active; sometimes tasks cannot be unloaded properly even though the task returned PDTaskUnload; these tasks have their active flag unset instead.
 
PDBool isFilter
 Whether task is a filter or not. Internally, a task is only a filter if it is assigned to a specific object ID or IDs.
 
PDPropertyType propertyType
 The filter property type.
 
PDInteger value
 The filter value, if any.
 
PDTaskFunc func
 The function callback, if the task is not a filter.
 
PDTaskRef child
 The task's child task; child tasks are called in order.
 
PDDeallocator deallocator
 The deallocator for the task.
 
void * info
 The (user) info object.
 
struct PDTwinStream

The internal twin stream structure

Data Fields

PDTwinStreamMethod method
 The current method.
 
PDScannerRef scanner
 the master scanner
 
FILE * fi
 Reader.
 
FILE * fo
 writer
 
fpos_t offsi
 absolute offset in input for heap
 
fpos_t offso
 absolute offset in output for file pointer
 
char * heap
 heap in which buffer is located
 
PDSize size
 size of heap
 
PDSize holds
 bytes in heap
 
PDSize cursor
 position in heap (bytes 0..cursor have been written (unless discarded) to output)
 
char * sidebuf
 temporary buffer (e.g. for Fetch)
 
PDBool outgrown
 if true, a buffer with growth disallowed attempted to grow and failed
 
struct PDPipe

Internal structure.

Data Fields

PDBool opened
 Whether pipe has been opened or not.
 
PDBool dynamicFiltering
 Whether dynamic filtering is necessary; if set, the static hash filtering of filters is skipped and filters are checked for all objects.
 
PDBool typedTasks
 Whether type tasks (excluding unfiltered tasks) are activated; activation results in a slight decrease in performance due to all dictionary objects needing to be resolved in order to check their Type dictionary key.
 
char * pi
 The path of the input file.
 
char * po
 The path of the output file.
 
FILE * fi
 Reader.
 
FILE * fo
 Writer.
 
PDInteger filterCount
 Number of filters in the pipe.
 
PDTwinStreamRef stream
 The pipe stream.
 
PDParserRef parser
 The parser.
 
PDSplayTreeRef filter
 The filters, in a tree with the object ID as key.
 
pd_stack typeTasks [_PDFTypeCount]
 Tasks which run depending on all objects of the given type; the 0'th element (type NULL) is triggered for all objects, and not just objects without a /Type dictionary key.
 
PDSplayTreeRef attachments
 PDParserAttachment entries.
 
struct PDReference

Internal reference structure

Data Fields

PDInteger obid
 The object ID.
 
PDInteger genid
 The generation number.
 
struct PDNumber

Internal number structure.

Data Fields

PDObjectType type
 Type of the number; as a special case, PDObjectTypeReference is used for pointers.
 
union {
   PDInteger   i
 
   PDReal   r
 
   PDBool   b
 
   PDSize   s
 
   void *   p
 
}; 
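
The type field plus anonymous union forms a tagged union: the tag decides which union member is valid. A minimal sketch of reading such a value follows. The demo_type tags are hypothetical stand-ins for PDObjectType values, and plain C types replace PDInteger, PDReal and PDBool.

```c
#include <stdio.h>

/* Hypothetical type tags; Pajdeg's PDObjectType values are not reproduced here. */
typedef enum { DEMO_INTEGER, DEMO_REAL, DEMO_BOOLEAN } demo_type;

/* A tag plus an anonymous union (C11), mirroring the layout listed above. */
typedef struct {
    demo_type type;
    union {
        long   i;
        double r;
        int    b;
    };
} demo_number;

/* Format the number into buf according to its tag; returns snprintf's result,
   or -1 for an unknown tag. */
static int demo_number_print(const demo_number *n, char *buf, size_t cap)
{
    switch (n->type) {
    case DEMO_INTEGER: return snprintf(buf, cap, "%ld", n->i);
    case DEMO_REAL:    return snprintf(buf, cap, "%g", n->r);
    case DEMO_BOOLEAN: return snprintf(buf, cap, "%s", n->b ? "true" : "false");
    }
    return -1;
}
```

Reading a member other than the one selected by the tag yields a meaningless value, so every access should go through a switch on type as shown.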
 
struct PDString

Internal string structure.

Data Fields

PDStringType type
 Type of the string.
 
PDFontRef font
 The font associated with the string.
 
PDStringEncoding enc
 Encoding of the string.
 
PDSize length
 Length of the string.
 
PDBool wrapped
 Whether the string is wrapped.
 
char * data
 Buffer containing string data.
 
PDStringRef alt
 Alternative representation.
 
PDCryptoInstanceRef ci
 Crypto instance.
 
PDBool encrypted
 Flag indicating whether the string is encrypted or not; the value of this flag is UNDEFINED if ci == NULL.
 
struct PDStringConv

Internal string conversion structure

Data Fields

char * allocBuf
 The allocated buffer.
 
PDInteger offs
 The current offset inside the buffer.
 
PDInteger left
 The current bytes left (remaining) in the buffer.
 

Macro Definition Documentation

#define as (   type,
  expr... 
)    ((type)(expr))

Macro for making casting of types a bit less of an eyesore.

as(PDInteger, stack->info)->prev is the same as ((PDInteger)(stack->info))->prev

Parameters
type  The cast-to type
expr  The expression that should be cast
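
A small sketch of the cast macro in use. The documented macro uses the GNU named-variadic form (expr...); the version below is written with standard __VA_ARGS__ and renamed demo_as so it does not clash with the real one. The demo_node type is hypothetical.

```c
/* Same idea as the macro above, written with standard __VA_ARGS__
   instead of the GNU named-variadic form (expr...). */
#define demo_as(type, ...) ((type)(__VA_ARGS__))

/* Hypothetical node type used only for this illustration. */
typedef struct demo_node {
    struct demo_node *prev;
    int value;
} demo_node;

/* Read the value of the node stored behind an untyped pointer. */
static int demo_value_of(void *info)
{
    return demo_as(demo_node *, info)->value;
}
```

The extra parentheses around both the cast and the expression are what make member access such as ->value bind to the cast result.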
#define fmatox (   x,
  ato 
)
Value:
static inline x fast_mutative_##ato(char *str, PDInteger len) \
{ \
char t = str[len]; \
str[len] = 0; \
x l = ato(str); \
str[len] = t; \
return l; \
}

Fast mutative atoXXX inline function generation macro.

Parameters
x  Function return type.
ato  Method
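
To make the generated function concrete, the sketch below instantiates the macro for atol. It is a reproduction under assumptions: the macro is renamed demo_fmatox, and long is used in place of PDInteger.

```c
#include <stdlib.h>

/* Reproduction of the macro above, with long assumed in place of PDInteger. */
#define demo_fmatox(x, ato) \
    static inline x fast_mutative_##ato(char *str, long len) \
    { \
        char t = str[len]; /* remember the byte after the number */ \
        str[len] = 0;      /* terminate temporarily */ \
        x l = ato(str); \
        str[len] = t;      /* restore the buffer */ \
        return l; \
    }

/* Generates: static inline long fast_mutative_atol(char *str, long len) */
demo_fmatox(long, atol)
```

Because the function writes to str[len] and restores it afterwards, the buffer must be writable and must extend at least one byte past the number; passing a string literal is not safe.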
#define PDAssert (   args...)
Value:
if (!(args)) { \
PDWarn("assertion failure : %s", #args); \
assert(args); \
}

Assert that expression is non-false.

If PD_WARNINGS is set, prints out the expression to stderr along with "assertion failure", then re-asserts expression using stdlib's assert()

If PD_WARNINGS is unset, simply re-asserts expression using stdlib's assert().

Parameters
args  Expression which must resolve to non-false (i.e. not 0, not nil, not NULL, etc).
#define PDDEF   const void*[]

Pajdeg definition list.

#define PDDef (   defs...)    (PDDEF){(void*)defs, NULL}

Wrapper for null terminated definitions.

#define PDError (   args...)
Value:
do {\
fprintf(stderr, "[pajdeg::error] %s:%d - ", __FILE__,__LINE__); \
fprintf(stderr, args); \
fprintf(stderr, "\n"); \
_PDBreak();\
} while (0)

Print an error to stderr, if user has turned on PD_WARNINGS.

Parameters
args  Formatted variable argument list.
#define PDInstancePrinterInit (   itype,
  b,
  r 
)
Value:
itype i = (itype) inst;\
PDInstancePrinterRequire(b, r)
#define PDInstancePrinterRequire (   b,
  r 
)
Value:
if (*cap - offs < b) { \
*cap += r; \
*buf = realloc(*buf, *cap); \
}
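
The buffer-growth check above relies on buf, cap and offs being in scope at the point of use. The sketch below reproduces it under the name DEMO_PRINTER_REQUIRE inside a hypothetical printer function, demo_print. The function signature is an assumption for illustration and is not Pajdeg's printer signature. As in the original, the result of realloc is not checked.

```c
#include <stdlib.h>
#include <string.h>

/* Reproduction of the macro above: if fewer than b bytes remain after offs,
   grow the buffer by r bytes. Relies on buf, cap and offs being in scope. */
#define DEMO_PRINTER_REQUIRE(b, r) \
    if (*cap - offs < (b)) { \
        *cap += (r); \
        *buf = realloc(*buf, *cap); \
    }

/* Hypothetical printer: appends text at offs and returns the new offset. */
static long demo_print(const char *text, char **buf, long offs, long *cap)
{
    long len = (long)strlen(text);
    /* Need len + 1 bytes (terminator included). The growth amount is at least
       that large, so a single growth step is always sufficient. */
    DEMO_PRINTER_REQUIRE(len + 1, len + 1 + 32)
    memcpy(*buf + offs, text, (size_t)len + 1);
    return offs + len;
}
```

Note that the macro grows the buffer only once, by r; callers should therefore pass an r that is at least as large as b.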
#define PDNotice (   args...)

Print an informational message (a "weak" warning) to stderr, if user has turned on PD_NOTICES.

Parameters
args  Formatted variable argument list.
#define PDRequire (   state,
  retval,
  msg... 
)
Value:
if (!(state)) { \
PDWarn("requirement failure : " msg); \
PDAssert(state); \
return retval; \
}

Require that the given state is true, and print out msg (format string), and return retval if it is not. In addition, if asserts are enabled, throw an assertion. The difference between this and PDAssert is that this code is guaranteed to abort the operation even if in a production environment, whereas PDAssert will be silently ignored for !DEBUG && !PD_ASSERTS.
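
The warn-and-return behaviour described above can be sketched as follows. DEMO_REQUIRE is a simplified stand-in, not the Pajdeg macro: it takes a plain message string rather than a format list, and it leaves out the PDAssert step so that the failure path can be exercised without aborting. The demo_get function is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for the macro above: warn and return retval when the
   condition fails. The assertion step of the original is omitted. */
#define DEMO_REQUIRE(state, retval, msg) \
    do { \
        if (!(state)) { \
            fprintf(stderr, "requirement failure : %s\n", msg); \
            return retval; \
        } \
    } while (0)

/* Hypothetical accessor: returns -1 instead of reading through an invalid argument. */
static long demo_get(const long *values, long count, long index)
{
    DEMO_REQUIRE(values != NULL, -1, "values must not be NULL");
    DEMO_REQUIRE(index >= 0 && index < count, -1, "index out of bounds");
    return values[index];
}
```

The early return is what distinguishes this pattern from a plain assertion: the invalid access is never reached, even in builds where assertions are compiled out.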

#define PDWarn (   args...)
Value:
do { \
fprintf(stderr, "[pajdeg::warning] %s:%d - ", __FILE__,__LINE__); \
fprintf(stderr, args); \
fprintf(stderr, "\n"); \
} while (0)

Print a warning to stderr, if user has turned on PD_WARNINGS.

Parameters
args  Formatted variable argument list.

Typedef Documentation

A PDF object stream.

The PDPageReference internal structure.

typedef struct PDXTable* PDXTableRef

Enumeration Type Documentation

The state of a PDParser instance.

Enumerator
PDParserStateBase 

parser is in between objects

PDParserStateObjectDefinition 

parser is right after 1 2 obj and right before whatever the object consists of

PDParserStateObjectAppendix 

parser is right after the object's content, and expects to see endobj or stream next

PDParserStateObjectPostStream 

parser is right after the endstream keyword, at the endobj keyword

Scanner symbol type.

Enumerator
PDScannerSymbolTypeDefault 

standard symbol

PDScannerSymbolTypeWhitespace 

PDF whitespace character.

PDScannerSymbolTypeDelimiter 

PDF delimiter character.

PDScannerSymbolTypeNumeric 

a numeric symbol

PDScannerSymbolTypeEOB 

end of buffer marker

PDScannerSymbolTypeFake 

fake symbol, which is when sstart is actually a real string, rather than a pointer into the stream buffer

Function Documentation

void PDEnvDestroy ( PDEnvRef  env)

Destroy an environment

Parameters
env  The environment
void PDObjectStreamCommit ( PDObjectStreamRef  obstm)

Commit an object stream to its associated object.

Changes made to an object inside an object stream are not written back automatically; committing the stream applies them to its associated object.

PDObjectStreamRef PDObjectStreamCreateWithObject ( PDObjectRef  object)

Create an object stream with the given object.

Parameters
object  Object whose stream is an object stream.
PDObjectRef PDObjectStreamGetObjectAtIndex ( PDObjectStreamRef  obstm,
PDInteger  index 
)

Get the object at the given index out of the object stream.

Assertion is thrown if index is out of bounds.

The index of an object stream object can be determined through the /N dictionary entry in the object itself.

Parameters
obstm  The object stream.
index  Object stream index.
Returns
The object.
PDObjectRef PDObjectStreamGetObjectByID ( PDObjectStreamRef  obstm,
PDInteger  obid 
)

Get the object with the given ID out of the object stream.

Object is mutable with regular object mutability conditions applied.

Parameters
obstm  The object stream.
obid  The id of the object to fetch.
Returns
The object, or NULL if not found.
void PDObjectStreamParseExtractedObjectStream ( PDObjectStreamRef  obstm,
char *  buf 
)

Parse the extracted object stream and set up the object stream structure.

This is identical to PDObjectStreamParseRawObjectStream except that this method presumes that decoding has been done, if necessary.

Parameters
obstm  The object stream.
buf  The buffer.
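
For orientation, the decoded content of a PDF object stream begins with N pairs of integers (object number, then byte offset relative to the stream dictionary's /First entry), followed by the object data. The sketch below parses that index from an already decoded, NUL-terminated buffer. It is not Pajdeg's implementation; demo_entry and demo_parse_index are hypothetical names.

```c
#include <stdlib.h>

/* One entry of an object stream's index: object ID and absolute offset
   of the object's data within the decoded buffer. */
typedef struct {
    long obid;
    long offset;
} demo_entry;

/* Parse the n "obid offset" pairs at the start of buf. first is the value of
   the /First entry; offsets in the pairs are relative to it.
   Returns the number of entries parsed (n on success). */
static long demo_parse_index(const char *buf, long n, long first, demo_entry *out)
{
    const char *p = buf;
    char *end;
    long i;
    for (i = 0; i < n; i++) {
        long obid = strtol(p, &end, 10);
        if (end == p) break;           /* no object number found */
        p = end;
        long rel = strtol(p, &end, 10);
        if (end == p) break;           /* no offset found */
        p = end;
        out[i].obid = obid;
        out[i].offset = first + rel;
    }
    return i;
}
```

For the buffer "11 0 12 6 (abc) 42" with N = 2 and /First = 10, object 11 starts at byte 10 and object 12 at byte 16.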
PDBool PDObjectStreamParseRawObjectStream ( PDObjectStreamRef  obstm,
char *  rawBuf 
)

Parse the raw object stream rawBuf and set up the object stream structure.

If the object has a defined filter, the object stream decodes the content before parsing it.

Parameters
obstm  The object stream.
rawBuf  The raw buffer.
Returns
True if object stream was parsed successfully, false if an error occurred.
void PDTwinStreamAsserts ( PDTwinStreamRef  ts)

Perform assertions related to the twin stream's internal state.

Parameters
ts  The twin stream.