Lib:Architecture/Base Layer/Stream Module

Library Module: Stream Module
Layer: Base
API Documentation: Reference Manual
Source Files: src/base/pdf-stm.h, src/base/pdf-stm.c

Overview

This module provides streams for reading and writing data in memory buffers and open files, adding the following functionality:

  • Filtering.
  • Buffering.

[Figure: Stream overview]

Several streams can be created to operate on the same open file. This gives convenient access to files made of several parts, each requiring different filters to read or write its contents.

[Figure: Several streams operating in one open file]

Filters (such as the PDF standard ones) are supported for both reading and writing, depending on the mode of operation of the stream (see below). Several filters may be used in a single stream; they are applied in order when writing or reading.

[Figure: Stream filtering support]

The streams maintain a buffer for both reading and writing. The client specifies the size of the buffer at creation time. The buffer is used, for example, to provide efficient character-based I/O.

Operation Modes

A stream can be opened in one of the following operation modes:

Read mode
The stream is opened for reading. Writing is forbidden. The initial position of the read-pointer is 0. If it is a file stream then the underlying open file should support reading.
Write mode
The stream is opened for writing. Reading is forbidden. The initial position of the write-pointer is 0. If it is a file stream then the underlying open file should support writing.

Reading data

A stream provides the following operations to read data from the memory buffer or open file:

  • Read a chunk of a specified number of consecutive octets and store it in a given buffer.
  • Read a single character (octet) in an efficient way and return its numeric code.
  • Peek a single character (octet) in an efficient way and return its numeric code.

The stream manages the end-of-data condition in the following way:

  • When reading a specified number of consecutive octets, the stream returns the number of octets actually read and stored in the specified buffer. If that number is less than the requested number of octets then the stream is in an end-of-data condition.
  • When reading or peeking a single character, the stream returns an integer wide enough to hold the special value PDF_EOF. The caller can then check for that value before casting the result to a character.
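
The two conventions above can be sketched with a toy in-memory stream. This is only an illustration under assumptions: toy_stm_t, toy_read, toy_peek_char, toy_read_char and TOY_EOF are hypothetical names invented for the example, not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EOF (-1) /* hypothetical stand-in for PDF_EOF */

/* Hypothetical read stream over a fixed block of memory. */
typedef struct
{
  const unsigned char *data; /* stream storage */
  size_t size;               /* number of octets in the storage */
  size_t pos;                /* read pointer */
} toy_stm_t;

/* Read up to BYTES octets into BUF and return how many were stored.
   A count smaller than BYTES signals the end-of-data condition. */
static size_t
toy_read (toy_stm_t *stm, unsigned char *buf, size_t bytes)
{
  size_t avail;

  assert (stm != NULL && buf != NULL);
  avail = stm->size - stm->pos;
  if (bytes > avail)
    bytes = avail;
  memcpy (buf, stm->data + stm->pos, bytes);
  stm->pos += bytes;
  return bytes;
}

/* Return the next octet without consuming it, or TOY_EOF. */
static int
toy_peek_char (const toy_stm_t *stm)
{
  assert (stm != NULL);
  return (stm->pos < stm->size) ? (int) stm->data[stm->pos] : TOY_EOF;
}

/* Return the next octet and advance the read pointer, or TOY_EOF. */
static int
toy_read_char (toy_stm_t *stm)
{
  int c = toy_peek_char (stm);

  if (c != TOY_EOF)
    stm->pos++;
  return c;
}
```

Returning an int rather than a character type is what leaves room for the out-of-band end-of-data value, in the same way as getc in the C library.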

Writing data

The stream provides the following operations to write data in a memory buffer or open file:

  • Write a chunk of a specified number of consecutive octets.

The stream manages the end-of-data condition by returning the number of octets actually written to the stream. If that number is less than the requested number of octets then the stream is in an end-of-data condition.

Positioning

The streams support the notion of read and write pointers. A pointer value is measured in positions, where a position is an offset in octets relative to the beginning of the stream storage.

[Figure: A (read/write) pointer of a stream]

Each stream has one pointer. When a stream is created the pointer is set to 0.

The operations that read a chunk of bytes, read a character and write a chunk of bytes modify the current value of the pointer.

In contrast, the operation that peeks a character does not change the value of the pointer.

Filtering support

The stream abstraction provides support for filters. A filter is a transformation F( ) that can be applied to the contents of an input buffer and returns an output buffer with the modified content.

Filters can be concatenated to form what is called a filter chain. The client program is then able to build the chain, specifying the number of filters, their types and their positions. The stream abstraction supports a predefined set of filter types.

A stream is created with an empty filter chain.
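
The idea of a filter chain can be sketched as an ordered list of function pointers, each feeding its output to the next. This is a simplified illustration, not the library's design: toy_filter_fn, toy_chain_t, toy_chain_push, toy_chain_apply and the two example filters are hypothetical, and the sketch processes a whole small buffer at once instead of working incrementally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FILTERS 4
#define TOY_SCRATCH 64

/* Hypothetical filter type: transform IN (IN_LEN octets) into OUT and
   return the number of octets produced.  OUT holds TOY_SCRATCH octets. */
typedef size_t (*toy_filter_fn) (const unsigned char *in, size_t in_len,
                                 unsigned char *out);

typedef struct
{
  toy_filter_fn filters[TOY_MAX_FILTERS];
  size_t count; /* 0 means an empty chain */
} toy_chain_t;

/* Example filter: convert ASCII letters to upper case. */
static size_t
toy_upper (const unsigned char *in, size_t in_len, unsigned char *out)
{
  size_t i;

  for (i = 0; i < in_len; i++)
    out[i] = (unsigned char) toupper (in[i]);
  return in_len;
}

/* Example filter: emit every octet twice (output larger than input). */
static size_t
toy_double (const unsigned char *in, size_t in_len, unsigned char *out)
{
  size_t i;

  assert (in_len * 2 <= TOY_SCRATCH);
  for (i = 0; i < in_len; i++)
    out[2 * i] = out[2 * i + 1] = in[i];
  return in_len * 2;
}

/* Append a filter at the end of the chain. */
static void
toy_chain_push (toy_chain_t *chain, toy_filter_fn fn)
{
  assert (chain->count < TOY_MAX_FILTERS);
  chain->filters[chain->count++] = fn;
}

/* Run IN through every filter in order; the result lands in OUT. */
static size_t
toy_chain_apply (const toy_chain_t *chain, const unsigned char *in,
                 size_t in_len, unsigned char *out)
{
  unsigned char a[TOY_SCRATCH], b[TOY_SCRATCH];
  unsigned char *src = a, *dst = b, *tmp;
  size_t len = in_len, i;

  assert (in_len <= TOY_SCRATCH);
  memcpy (src, in, in_len);
  for (i = 0; i < chain->count; i++)
    {
      len = chain->filters[i] (src, len, dst);
      tmp = src; src = dst; dst = tmp; /* output becomes the next input */
    }
  memcpy (out, src, len);
  return len;
}
```

With an empty chain the data passes through unchanged, which matches a stream created without filters.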

Supported filters

The Stream module provides support for all the PDF standard filters as defined in the PDF 1.7 Reference, Chapter 3, Section 3.

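Namely:

  • ASCIIHexDecode
  • ASCII85Decode
  • LZWDecode
  • FlateDecode
  • RunLengthDecode
  • CCITTFaxDecode
  • JBIG2Decode
  • DCTDecode
  • JPXDecode
  • Crypt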

Internals

Buffers

A pdf_stm_buffer_t is a simple data type implementing a buffer in memory.

[Figure: A memory buffer]

The following function is used to create a new memory buffer:

pdf_stm_buffer_new (SIZE)
Create a new memory buffer with the given size.

The following functions provide information about the current state of a buffer:

pdf_stm_buffer_full_p (BUFFER)
Return PDF_TRUE if the buffer is full. Return PDF_FALSE otherwise.
pdf_stm_buffer_eob_p (BUFFER)
Return PDF_TRUE if the buffer is in an "end of buffer" condition. Return PDF_FALSE otherwise.

The following function can be used to resize a buffer:

pdf_status_t pdf_stm_buffer_resize (BUFFER, NEW_SIZE)
Resize a given buffer.
If NEW_SIZE is larger than the current size of BUFFER then the buffer size will be increased, while both the reading and writing pointers remain unchanged.
If NEW_SIZE is smaller than the current size of BUFFER then the memory pointed to by BUFFER->data will be truncated, the BUFFER size will be adjusted to the new value, and the reading and writing pointers will be set to the new BUFFER size if they exceed it.

The following function can be used to rewind a buffer:

pdf_status_t pdf_stm_buffer_rewind (BUFFER)
Rewind the buffer (both the reading and writing pointers are set to 0).

The following properties can be used with buffer variables:

Read pointer (buffer->rp)
Write pointer (buffer->wp)
Size of the buffer (buffer->size)
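
The buffer operations described in this section can be sketched as follows. The names (toy_buffer_t and the toy_buffer_* functions) are hypothetical, and two points are assumptions made for the example rather than documented behaviour: "full" is taken to mean that the write pointer has reached the size, and "end of buffer" that every written octet has already been read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer modelled on the description above. */
typedef struct
{
  unsigned char *data;
  size_t size; /* capacity in octets */
  size_t rp;   /* read pointer */
  size_t wp;   /* write pointer */
} toy_buffer_t;

/* Create a buffer of SIZE octets, or return NULL if out of memory. */
static toy_buffer_t *
toy_buffer_new (size_t size)
{
  toy_buffer_t *buf;

  assert (size > 0);
  buf = malloc (sizeof *buf);
  if (buf == NULL)
    return NULL;
  buf->data = malloc (size);
  if (buf->data == NULL)
    {
      free (buf);
      return NULL;
    }
  buf->size = size;
  buf->rp = buf->wp = 0;
  return buf;
}

static void
toy_buffer_destroy (toy_buffer_t *buf)
{
  free (buf->data);
  free (buf);
}

/* Assumption: "full" means there is no room left to write. */
static int
toy_buffer_full_p (const toy_buffer_t *buf)
{
  return buf->wp == buf->size;
}

/* Assumption: "end of buffer" means nothing is left to read. */
static int
toy_buffer_eob_p (const toy_buffer_t *buf)
{
  return buf->rp == buf->wp;
}

/* Set both pointers back to 0. */
static void
toy_buffer_rewind (toy_buffer_t *buf)
{
  buf->rp = buf->wp = 0;
}

/* Resize the buffer.  Pointers beyond the new size are clamped to it.
   Returns 0 on success and -1 if the allocation fails. */
static int
toy_buffer_resize (toy_buffer_t *buf, size_t new_size)
{
  unsigned char *data;

  assert (new_size > 0);
  data = realloc (buf->data, new_size);
  if (data == NULL)
    return -1;
  buf->data = data;
  buf->size = new_size;
  if (buf->rp > new_size)
    buf->rp = new_size;
  if (buf->wp > new_size)
    buf->wp = new_size;
  return 0;
}
```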

Backends

A pdf_stm_be_t is an opaque data type implementing the backend used by the stream to read or write information.

[Figure: A stream backend]

There are two types of stream backends:

File backends
Backends wrapping filesystem files.
Memory backends
Backends wrapping memory buffers.

A backend supports the following operations:

pdf_stm_be_read (BE, BYTES)
Read the requested number of BYTES and put them in BUFFER. Return the number of bytes actually read.
pdf_stm_be_write (BE, BYTES)
Write the requested number of BYTES from BUFFER into the backend. Return the number of bytes actually written.
pdf_stm_be_seek (BE, POS)
Move the read/write pointer of the backend to POS.
pdf_stm_be_tell (BE)
Return the current position of the read/write pointer in the backend.
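
As an illustration of these four operations, here is a minimal sketch of a memory backend. The type toy_mem_be_t and the toy_be_* functions are hypothetical. The explicit buffer argument and the clamping of out-of-range seek positions are assumptions of the example, not documented behaviour of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memory backend: a fixed block of memory plus a position. */
typedef struct
{
  unsigned char *data;
  size_t size;
  size_t pos; /* shared read/write pointer */
} toy_mem_be_t;

/* Copy up to BYTES octets into BUFFER; return the number actually read. */
static size_t
toy_be_read (toy_mem_be_t *be, unsigned char *buffer, size_t bytes)
{
  size_t avail;

  assert (be != NULL && buffer != NULL);
  avail = be->size - be->pos;
  if (bytes > avail)
    bytes = avail;
  memcpy (buffer, be->data + be->pos, bytes);
  be->pos += bytes;
  return bytes;
}

/* Copy up to BYTES octets from BUFFER; return the number actually written. */
static size_t
toy_be_write (toy_mem_be_t *be, const unsigned char *buffer, size_t bytes)
{
  size_t room;

  assert (be != NULL && buffer != NULL);
  room = be->size - be->pos;
  if (bytes > room)
    bytes = room;
  memcpy (be->data + be->pos, buffer, bytes);
  be->pos += bytes;
  return bytes;
}

/* Move the pointer to POS (clamped to the storage size in this sketch)
   and return the resulting position. */
static size_t
toy_be_seek (toy_mem_be_t *be, size_t pos)
{
  assert (be != NULL);
  be->pos = (pos > be->size) ? be->size : pos;
  return be->pos;
}

/* Return the current position of the pointer. */
static size_t
toy_be_tell (const toy_mem_be_t *be)
{
  assert (be != NULL);
  return be->pos;
}
```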

Filters

A filter is a component that performs some transformation on the contents of an input memory buffer and fills an output memory buffer with the result of the transformation.

Note: some filters impose minimum cache size limits. The default cache size defined by the library is guaranteed to work with any filter.

[Figure: A stream filter]

If a filter needs more input in order to fill its output buffer, it may call another filter or a backend to get its input buffer refilled.

A filter abstraction is composed of:

An application function F()
This internal function implements the logic of the filter. Reading from an input buffer, it generates output. The specific transformation depends on the nature of the filter. Note that the output of an application function can be larger than the input (a decompression filter), smaller than the input (a compression filter) or the same size. Note also that in general it is not possible to know in advance the size of the output generated by an application function.
An input buffer
This buffer contains the input for the application function. Its purpose is to serve as a cache.
An output buffer
This is the buffer (provided by the client of the filter) where the filter writes its output.

A filter supports the following operations:

init
This operation initializes the filter providing its configuration parameters.
apply
This operation asks the filter to generate bytes by calling its internal application function until the output buffer is full. It returns a status code.
dealloc_state
This operation deallocates any private state and performs any other cleanup work the filter requires.

Filter implementations

The library supports a number of predefined filter implementations, one per filter type.

Filter implementations live in the files:

   src/base/pdf-stm-f-XXX.[ch]

where XXX is the name of the algorithm implemented by the filter. Note that usually we implement two filters in each pdf-stm-f-XXX.[ch]:

  • The compression filter.
  • The decompression filter.

Of course, in the case of filters with only one-way operation (such as the null filter) the implementation files only contain one filter implementation.

A filter implementation consists of the definition of several functions:

  • pdf_stm_filter_XXX_init
  • pdf_stm_filter_XXX_apply
  • pdf_stm_filter_XXX_dealloc_state

The prototype of a filter initialization function should look like:

   pdf_status_t pdf_stm_filter_XXX_init (FILTER_PARAMS, FILTER_STATE)

The arguments of that call are:

FILTER_PARAMS
A PDF hash variable containing the configuration parameters for the specific filter.
FILTER_STATE
A void pointer variable to be used by the filter to hold its private state.

The prototype of a filter application function should look like:

   pdf_status_t pdf_stm_filter_XXX_apply (FILTER_PARAMS, FILTER_STATE, INPUT_BUFFER, OUTPUT_BUFFER, FINISH_P)

The arguments of that call are:

FILTER_PARAMS
A PDF hash variable containing the configuration parameters for the specific filter. The filter may assume that it will always receive the same hash.
FILTER_STATE
A void pointer variable to be used by the filter to hold its private state. The filter may assume that it will always receive the same pointer.
INPUT_BUFFER
The input memory buffer (a pdf_stm_buffer_t variable).
OUTPUT_BUFFER
The output memory buffer (a pdf_stm_buffer_t variable).
FINISH_P
Some filters need to append some kind of trailer to the end of the filtered data (such as "~>" in the case of the ASCII85 filter). If this parameter is true then the apply operation should append the trailer to the filtered data. If the filter does not need to write trailer information then this argument can be ignored.

The return values of the apply function of a filter implementation are:

PDF_ENINPUT
The filter implementation has processed all the input buffer and is ready to receive more input when available, via a new call to the 'apply' function. It is assumed that the input buffer is empty after the apply function returns this value.
PDF_ENOUTPUT
The filter implementation needs more room in the output buffer, and is ready to fill it when it becomes available, via a new call to the 'apply' function. It is assumed that the output buffer is full after the apply function returns this value.
PDF_ERROR
Error in the data processed by the filter. If the filter implementation returns this value then the 'apply' function won't be called again without a previous call to 'init'.
PDF_EEOF
This value should be returned in the following situations:
  • There has been an EOF condition produced by some characteristic of the input data. Only filters that interpret EOD markers or work on fixed-size blocks of data will return PDF_EEOF for this reason.
  • The filter implementation has emitted data due to a finalization request (via the finish_p parameter). Only filters that support finalization data will return PDF_EEOF for this reason.
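
To make the return-value protocol concrete, here is a minimal sketch of an apply function for a toy ASCII hex encoder, together with a loop that drives it through a deliberately small output buffer. All names (toy_status_t, toy_buffer_t, toy_ahex_state_t, toy_ahex_enc_apply, toy_ahex_encode_all) are hypothetical, the error case is left out, and the signature is simplified relative to the prototype given earlier. The trailer written on finalization is ">", the ASCIIHexDecode end-of-data marker in the PDF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes mirroring PDF_ENINPUT, PDF_ENOUTPUT, PDF_EEOF. */
typedef enum { TOY_ENINPUT, TOY_ENOUTPUT, TOY_EEOF } toy_status_t;

typedef struct
{
  unsigned char *data;
  size_t size;
  size_t rp; /* read pointer */
  size_t wp; /* write pointer */
} toy_buffer_t;

/* Private filter state: a hex digit waiting for output room, or -1. */
typedef struct
{
  int pending;
} toy_ahex_state_t;

/* Encode each input octet as two hex digits.  Returns TOY_ENOUTPUT with a
   full output buffer, TOY_ENINPUT with an empty input buffer, or TOY_EEOF
   once the ">" trailer has been written after a finalization request. */
static toy_status_t
toy_ahex_enc_apply (toy_ahex_state_t *state, toy_buffer_t *in,
                    toy_buffer_t *out, int finish_p)
{
  static const char digits[] = "0123456789ABCDEF";

  assert (state != NULL && in != NULL && out != NULL);
  for (;;)
    {
      unsigned char c;

      if (state->pending >= 0)
        {
          if (out->wp == out->size)
            return TOY_ENOUTPUT;
          out->data[out->wp++] = (unsigned char) state->pending;
          state->pending = -1;
        }
      if (in->rp == in->wp)
        break; /* all input consumed */
      if (out->wp == out->size)
        return TOY_ENOUTPUT;
      c = in->data[in->rp++];
      out->data[out->wp++] = (unsigned char) digits[c >> 4];
      state->pending = digits[c & 0x0F];
    }
  if (!finish_p)
    return TOY_ENINPUT;
  if (out->wp == out->size)
    return TOY_ENOUTPUT; /* no room for the trailer yet */
  out->data[out->wp++] = '>';
  return TOY_EEOF;
}

/* Drive the filter to completion through a 3-octet output buffer, copying
   each batch into RESULT (which must hold 2 * LEN + 1 octets).  Returns the
   total number of octets produced. */
static size_t
toy_ahex_encode_all (const unsigned char *src, size_t len,
                     unsigned char *result)
{
  unsigned char scratch[3];
  toy_ahex_state_t state = { -1 };
  /* The filter only reads from IN, so dropping const here is harmless. */
  toy_buffer_t in = { (unsigned char *) src, len, 0, len };
  toy_buffer_t out = { scratch, sizeof scratch, 0, 0 };
  toy_status_t status;
  size_t total = 0;

  do
    {
      status = toy_ahex_enc_apply (&state, &in, &out, 1);
      memcpy (result + total, out.data, out.wp); /* drain the output */
      total += out.wp;
      out.wp = 0;
    }
  while (status == TOY_ENOUTPUT);
  return total;
}
```

Keeping the second hex digit in the private state lets the filter stop exactly when the output buffer is full, which is the condition the caller is told to expect when it sees the "needs output" status.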

The prototype of a filter deallocation function should look like:

   pdf_status_t pdf_stm_filter_XXX_dealloc_state (FILTER_STATE)

The arguments of that call are:

FILTER_STATE
A void pointer variable to the state allocated in the initialization function.


The reading process

A client can ask a read stream to provide a specific number of bytes and to store them in a specified buffer. This interface resembles the read operation found in C libraries:

   pdf_size_t pdf_stm_read (*BUFFER, BYTES);

The stream then tries to provide the requested number of bytes and returns the number of bytes actually read: if the returned number is less than the requested one then a backend-exhausted condition has occurred.

[Figure: A read stream]

A cache is used to get the information from the filter chain. A backend provides data to the filter chain.
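
The flow from backend to filter chain to cache to client can be sketched as follows. This is a simplified model, not the library's implementation: toy_read_stm_t, toy_refill and toy_stm_read are hypothetical names, the backend is a plain memory block, and a single upper-casing step stands in for the whole filter chain.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SIZE 4

/* Hypothetical read stream: backend, stand-in filter and cache in one. */
typedef struct
{
  const unsigned char *backend; /* backend storage */
  size_t backend_size;
  size_t backend_pos;
  unsigned char cache[TOY_CACHE_SIZE];
  size_t cache_rp; /* next cached octet to hand out */
  size_t cache_wp; /* number of valid octets in the cache */
} toy_read_stm_t;

/* Refill the cache: pull octets from the backend, pass each one through
   the stand-in filter and store the result.  Returns the number of octets
   now cached (0 when the backend is exhausted). */
static size_t
toy_refill (toy_read_stm_t *stm)
{
  stm->cache_rp = stm->cache_wp = 0;
  while (stm->cache_wp < TOY_CACHE_SIZE
         && stm->backend_pos < stm->backend_size)
    {
      unsigned char c = stm->backend[stm->backend_pos++];

      stm->cache[stm->cache_wp++] = (unsigned char) toupper (c);
    }
  return stm->cache_wp;
}

/* Copy up to BYTES filtered octets into BUF.  A short count means the
   backend was exhausted before the request could be satisfied. */
static size_t
toy_stm_read (toy_read_stm_t *stm, unsigned char *buf, size_t bytes)
{
  size_t done = 0;

  assert (stm != NULL && buf != NULL);
  while (done < bytes)
    {
      size_t n;

      if (stm->cache_rp == stm->cache_wp && toy_refill (stm) == 0)
        break; /* backend exhausted */
      n = stm->cache_wp - stm->cache_rp;
      if (n > bytes - done)
        n = bytes - done;
      memcpy (buf + done, stm->cache + stm->cache_rp, n);
      stm->cache_rp += n;
      done += n;
    }
  return done;
}
```

Octets left in the cache after one call are served first on the next call, so the backend is touched only when the cache runs dry.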

The writing process

A client can ask a write stream to consume a specific number of bytes from a specified buffer. This interface resembles the write operation found in C libraries:

   pdf_size_t pdf_stm_write (*BUFFER, BYTES);

The stream then tries to consume the requested number of bytes and returns the number of bytes actually written: if the returned number is less than the requested one then a backend-full condition has occurred.

[Figure: A write stream]

In a write stream the cache is used to store the data produced by the filter pipeline. The filter pipeline obtains the data from the user-provided buffer. Finally, the data in the cache is written into a backend.
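
A simplified model of this flow is sketched below. The names toy_write_stm_t, toy_flush and toy_stm_write are hypothetical, a single upper-casing step stands in for the filter pipeline, and the backend is a fixed memory block. Because the stand-in filter produces exactly one octet per input octet, the sketch can reserve backend room before accepting each octet. A real filter pipeline could not assume that.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SIZE 4

/* Hypothetical write stream: stand-in filter, cache and backend in one. */
typedef struct
{
  unsigned char *backend; /* backend storage */
  size_t backend_size;
  size_t backend_pos;
  unsigned char cache[TOY_CACHE_SIZE];
  size_t cache_wp; /* number of filtered octets waiting in the cache */
} toy_write_stm_t;

/* Write the cached octets into the backend and empty the cache. */
static void
toy_flush (toy_write_stm_t *stm)
{
  /* Room was reserved when the octets entered the cache. */
  assert (stm->cache_wp <= stm->backend_size - stm->backend_pos);
  memcpy (stm->backend + stm->backend_pos, stm->cache, stm->cache_wp);
  stm->backend_pos += stm->cache_wp;
  stm->cache_wp = 0;
}

/* Consume up to BYTES octets from BUF.  A short count means the backend
   has no room for more data. */
static size_t
toy_stm_write (toy_write_stm_t *stm, const unsigned char *buf, size_t bytes)
{
  size_t done = 0;

  assert (stm != NULL && buf != NULL);
  while (done < bytes)
    {
      if (stm->cache_wp == TOY_CACHE_SIZE)
        toy_flush (stm);
      if (stm->backend_pos + stm->cache_wp == stm->backend_size)
        break; /* backend full */
      stm->cache[stm->cache_wp++] = (unsigned char) toupper (buf[done++]);
    }
  return done;
}
```

In this sketch the caller has to call toy_flush once after the last write so that octets still held in the cache reach the backend.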
