Appendix A. How to Write an RTF Reader

An RTF reader must do three basic things:

Separate text from RTF controls.
Parse an RTF control.
Dispatch an RTF control.

Separating text from RTF controls is relatively simple, because all RTF controls begin with a backslash. Therefore, any incoming character that is not a backslash is text and will be handled as text. (Of course, what one does with that text may be relatively complicated.)

Parsing an RTF control is also relatively simple. An RTF control is either (a) a sequence of alphabetic characters followed by an optional numeric parameter, or (b) a single non-alphanumeric character.

Dispatching an RTF control, on the other hand, is relatively complicated. A recursive-descent parser tends to be overly strict because RTF is intentionally vague about the order of various properties relative to one another. However, whatever method you use to dispatch an RTF control, your reader should do the following:

Ignore control words you don't understand.
Many readers crash when they come across an unknown RTF control. Because Microsoft is continually adding new RTF controls, this limits an RTF reader to working with the RTF from one particular product (usually some version of Word for Windows).
Always understand \*.
One of the most important things an RTF reader can do is to understand the \* control. This control introduces a destination that is not part of the document. It tells the RTF reader that if the reader does not understand the next control word, then it should skip the entire enclosing group. If your reader follows this rule and the one above, your reader will be able to cope with any future change to RTF short of a complete rewrite.
Remember that binary data can occur when you're skipping RTF.
A simple way to skip a group in RTF is to keep a running count of the opening braces that the reader has encountered in the RTF stream. When the reader sees an opening brace, it increments the count; when the reader sees a closing brace, it decrements the count. When the count becomes negative, the end of the group has been found. Unfortunately, this doesn't work when the RTF file contains a \bin control; the reader must explicitly check each control word found to see if it's a \bin control, and, if a \bin control is found, skip that many bytes before resuming its scanning for braces.

A Sample RTF Reader Implementation

The Microsoft Word Processing Conversions group uses a table-driven approach to reading RTF. This approach allows the most flexibility in reading RTF, with the corresponding problem that it's difficult to detect incorrect RTF. An RTF reader that is based on this approach is presented below. This reader works exactly as described in the RTF specification and uses the principles of operation described in the RTF specification. This reader is designed to be simple to understand but is not intended to be very efficient. This RTF reader also implements the three design principles listed in the previous section.

The RTF reader consists of four files:

Rtfdecl.h, which contains the prototypes for all the functions in the RTF reader
Rtftype.h, which contains the types used in the RTF reader
Rtfreadr.c, which contains the main program, the main loop of the RTF reader, and the RTF control parser
Rtfactn.c, which contains the dispatch routines for the RTF reader

Rtfdecl.h and Rtfreadr.c

Rtfdecl.h is straightforward and requires little explanation.

rtfreadr.c is also reasonably straightforward; the function ecRtfParse separates text from RTF controls and handles text, and the function ecParseRtfKeyword parses an RTF control and also collects any parameter that follows the RTF control.

Rtftype.h

Rtftype.h begins by declaring a sample set of character, paragraph, section, and document properties. These structures are present to demonstrate how the dispatch routines can modify any particular property and are not actually used to format text.

For example, the following enumeration describes which destination text should be routed to:

typedef enum { rdsNorm, rdsSkip } RDS;

Because this is just a sample RTF reader, there are only two destinations; a more complicated reader would add an entry to this enumeration for each destination supported [for example, headers, footnotes, endnotes, comments (annotations), bookmarks, and pictures].

The following enumeration describes the internal state of the RTF parser:

typedef enum { risNorm, risBin, risHex } RIS;

This is entirely separate from the state of the dispatch routines and the destination state; other RTF readers may not necessarily have anything similar to this.

The following structure encapsulates the state that must be saved at a group start and restored at a group end:

typedef struct save
{
struct save *pNext;
CHP chp;
PAP pap;
SEP sep;
DOP dop;
RDS rds;
RIS ris;
} SAVE;

The following enumeration describes a set of classes for RTF controls:

typedef enum {kwdChar, kwdDest, kwdProp, kwdSpec} KWD;

Use kwdChar for controls that represent special characters (such as \-, \{, or \}).

Use kwdDest for controls that introduce RTF destinations.

Use kwdProp for controls that modify some sort of property.

Use kwdSpec for controls that need to run some specialized code.

The following enumeration defines the number of PROP structures (described below) that will be used. There will typically be an iprop for every field in the character, paragraph, section, and document properties.

typedef enum {ipropBold, ipropItalic, ipropUnderline, ipropLeftInd,
ipropRightInd, ipropFirstInd, ipropCols, ipropPgnX, ipropPgnY,
ipropXaPage, ipropYaPage, ipropXaLeft, ipropXaRight,
ipropYaTop, ipropYaBottom, ipropPgnStart, ipropSbk, 
ipropPgnFormat, ipropFacingp, ipropLandscape, ipropJust,
ipropPard, ipropPlain,
ipropMax} IPROP;

The following structure is a very compact way to describe how to locate the address of a particular value in one of the property structures:

typedef enum {actnSpec, actnByte, actnWord} ACTN;
typedef enum {propChp, propPap, propSep, propDop} PROPTYPE;

typedef struct propmod
{
ACTN actn;
PROPTYPE prop;
int offset;
} PROP;

The actn field describes the width of the value being described: if the value is a byte, then actn is actnByte; if the value is a word, then actn is actnWord; if the value is neither a byte nor a word, then you can use actnSpec to indicate that some C code needs to be run to set the value. The prop field indicates which property structure is being described; propChp indicates that the value is located within the CHP structure; propPap indicates that the value is located within the PAP structure, and so on. Finally, the offset field contains the offset of the value from the start of the structure. The offsetof() macro is usually used to initialize this field.

The following structure describes how to parse a particular RTF control:

typedef enum {ipfnBin, ipfnHex, ipfnSkipDest } IPFN;
typedef enum {idestPict, idestSkip } IDEST;

typedef struct symbol
{
char *szKeyword;
int dflt;
bool fPassDflt;
KWD kwd;
int idx;
} SYM;

szKeyword points to the RTF control being described; kwd describes the class of the particular RTF control (described above); dflt is the default value for this control, and fPassDflt should be nonzero if the value in dflt should be passed to the dispatch routine. (fPassDflt is only nonzero for control words that normally set a particular value. For example, the various section break controls typically have nonzero fPassDflt controls, but controls that take parameters should not.)

Idx is a generalized index; its use depends on the kwd being used for this control.

If kwd is kwdChar, then idx is the character that should be output.
If kwd is kwdDest, then idx is the idest for the new destination.
If kwd is kwdProp, then idx is the iprop for the appropriate property.
If kwd is kwdSpec, then idx is an ipfn for the appropriate function.

With this structure, it is very simple to dispatch an RTF control word. Once the reader isolates the RTF control word and its (possibly associated) value, the reader then searches an array of SYM structures to find the RTF control word. If the control word is not found, the reader ignores it, unless the previous control was \*, in which case the reader must scan past an entire group.

If the control word is found, the reader then uses the kwd value from the SYM structure to determine what to do. This is, in fact, exactly what the function ecTranslateKeyword in the file RTFACTN.C does.

Rtfactn.c

Rtfactn.c contains the tables describing the properties and control words, and the routines to evaluate properties (ecApplyPropChange) and to dispatch control words (ecTranslateKeyword).

The tables are the keys to understanding the RTF dispatch routines. The following are some sample entries from both tables, together with a brief explanation of each entry.

The Property Table. This table must have an entry for every iprop.

The property below says that the ipropBold property is a byte parameter bound to chp.fBold.
```
actnByte,   propChp,    offsetof(CHP, fBold),       // ipropBold
```
This property says that ipropRightInd is a word parameter bound to pap.xaRight.
```
actnWord,   propPap,    offsetof(PAP, xaRight),     // ipropRightInd
```
This property says that ipropCols is a word parameter bound to sep.cCols.
```
actnWord,   propSep,    offsetof(SEP, cCols),       // ipropCols
```
This property says that ipropPlain is a special parameter. Instead of directly evaluating it, ecApplyPropChange will run some custom C code to apply a property change.
```
actnSpec,   propChp,    0,                          // ipropPlain
```

The Control Word Table

This structure says that the control \b sets the ipropBold property. Because fPassDflt is False, the reader only uses the default value if the control does not have a parameter. If no parameter is provided, the reader uses a value of 1.
```
 1, fFalse, kwdProp, ipropBold, 
		
```
This entry says that the control \sbknone sets the ipropSbk property. Because fPassDflt is True, the reader always uses the default value of sbkNon, even if the control has a parameter.
```
 sbkNon, fTrue, kwdProp, ipropSbk, 
		
```
This entry says that the control \par is equivalent to a 0x0a (linefeed) character.
```
 0, fFalse, kwdChar, 0x0a, 
		
```
This entry says that the control \tab is equivalent to a 0x09 (tab) character.
```
 0, fFalse, kwdChar, 0x09, 
		
```
This entry says that the control \bin should run some C code. The particular piece of C code can be located by the ipfnBin parameter.
```
 0, fFalse, kwdSpec, ipfnBin, 
		
```
This entry says that the control \fonttbl should change to the destination idestSkip.
```
 0, fFalse, kwdDest, idestSkip, 
		
```

Notes on Implementing Other RTF Features

The table-driven approach to dispatching RTF controls used by the sample converter does not implement any syntax checking. For most controls, this is not a problem; a control simply modifies the appropriate property. However, some controls, such as those for tabs and borders, are dependent on other control words either before or after the current control word.

There are standard techniques for handling these features.

Tabs and Other Control Sequences Terminating in a Fixed Control

The best way to implement these types of control sequences is to have a global structure that represents the current state of the tab descriptor (or other entity). As the modifiers come in, they modify the various fields of the global structure. When the fixed control at the end of the sequence is dispatched, it adds the entire descriptor and reinitializes the global variable.

Borders and Other Control Sequences Beginning with a Fixed Control

The best way to implement these types of control sequences is to have a global pointer that is initialized when the fixed control is dispatched. The controls that modify the fixed control then modify fields pointed to by the control.

Style Sheets

Style sheets can be handled as destinations; however, styles have default values, just as every other control does. RTF readers should be sure to handle a missing style control as the default style value (that is, 0).

Property Changes

Some RTF readers use various bits of RTF syntax to mark property changes. In particular, they assume that property changes will occur only after a group start, which is not correct. Because there is a variety of ways to represent identical property changes in RTF, RTF readers should look at the changes in the properties and not at any particular way of representing a property change. In particular, properties can be changed explicitly with a control word or implicitly at the end of a group. For example, these three sequences of RTF have exactly the same semantics, and should be translated identically:

{\b bold \i Bold Italic \i0 Bold again}
{\b bold {\i Bold Italic }Bold again}
{\b bold \i Bold Italic \plain\b Bold again}

Fields

All versions of Microsoft Word for Windows and version 6.0 and later of Microsoft Word for the Macintosh have fields. If you're writing an RTF reader and expect to do anything with fields, keep the following notes in mind:

Field instructions may have arbitrary amounts of character formatting and arbitrarily nested groups. While the groups will be properly nested within the field instructions, you may be inside an arbitrary number of groups by the time you know which field you are working with. If you then expect to be able to skip to the end of the field instructions, you'll have to know how many groups have started so that you can skip to the end properly.
Some fields, the INCLUDE field in particular, can have section breaks in the field results. If this occurs, then the text after the end of the field does not have the same section properties as the text at the start of the field; the section properties must not be restored when the field results contain section breaks.

Tables

Tables are probably the trickiest part of RTF to read and write correctly. Because of the way Microsoft word processors implement tables, and the table-driven approach of many Microsoft RTF readers, it is very easy to write tables in RTF that will crash Microsoft word processors when you try to read the RTF. Here are some guidelines to reduce problems with tables in RTF:

Place the entire table definition before any paragraph properties, including \pard.
Make sure the number of cells in the RTF matches the number of cell definitions.
Some controls must be the same in all paragraphs in a row. In particular, all paragraphs in a row must have the same positioning controls, and all paragraphs in a row must have \intbl specified.
Do not use the \sbys control inside a table. \sbys is a holdover from Word for MS-DOS and early versions of Word for the Macintosh. Word for Windows and current versions of Word for the Macintosh translate \sbys as a table. Because Word for Windows and Word for the Macintosh do not support nested tables, these products will probably crash if you specify \sbys in a table.
Cell definitions starting before the left margin of the paper begins (that is, the parameter plus the left margin is negative) are always in error. Appendix A-1: Listings

Rtfdecl.h

// RTF parser declarations

int ecRtfParse(FILE *fp);
int ecPushRtfState(void);
int ecPopRtfState(void);
int ecParseRtfKeyword(FILE *fp);
int ecParseChar(int c);
int ecTranslateKeyword(char *szKeyword, int param, bool fParam);
int ecPrintChar(int ch);
int ecEndGroupAction(RDS rds);
int ecApplyPropChange(IPROP iprop, int val);
int ecChangeDest(IDEST idest);
int ecParseSpecialKeyword(IPFN ipfn);
int ecParseSpecialProperty(IPROP iprop, int val);
int ecParseHexByte(void);

// RTF variable declarations

extern int cGroup;
extern RDS rds;
extern RIS ris;

extern CHP chp;
extern PAP pap;
extern SEP sep;
extern DOP dop;

extern SAVE *psave;
extern long cbBin;
extern long lParam;
extern bool fSkipDestIfUnk;
extern FILE *fpIn;

// RTF parser error codes

#define ecOK 0                      // Everything's fine!
#define ecStackUnderflow    1       // Unmatched '}'
#define ecStackOverflow     2       // Too many '{' -- memory exhausted
#define ecUnmatchedBrace    3       // RTF ended during an open group.
#define ecInvalidHex        4       // invalid hex character found in data
#define ecBadTable          5       // RTF table (sym or prop) invalid
#define ecAssertion         6       // Assertion failure
#define ecEndOfFile         7       // End of file reached while reading RTF

Rtftype.h

typedef char bool;
#define fTrue 1
#define fFalse 0

typedef struct char_prop
{
    char fBold;
    char fUnderline;
    char fItalic;
} CHP;                  // CHaracter Properties

typedef enum {justL, justR, justC, justF } JUST;
typedef struct para_prop
{
    int xaLeft;                 // left indent in twips
    int xaRight;                // right indent in twips
    int xaFirst;                // first line indent in twips
    JUST just;                  // justification
} PAP;                  // PAragraph Properties

typedef enum {sbkNon, sbkCol, sbkEvn, sbkOdd, sbkPg} SBK;
typedef enum {pgDec, pgURom, pgLRom, pgULtr, pgLLtr} PGN;
typedef struct sect_prop
{
    int cCols;                  // number of columns
    SBK sbk;                    // section break type
    int xaPgn;                  // x position of page number in twips
    int yaPgn;                  // y position of page number in twips
    PGN pgnFormat;              // how the page number is formatted
} SEP;                  // SEction Properties

typedef struct doc_prop
{
    int xaPage;                 // page width in twips
    int yaPage;                 // page height in twips
    int xaLeft;                 // left margin in twips
    int yaTop;                  // top margin in twips
    int xaRight;                // right margin in twips
    int yaBottom;               // bottom margin in twips
    int pgnStart;               // starting page number in twips
    char fFacingp;              // facing pages enabled?
    char fLandscape;            // landscape or portrait??
} DOP;                  // DOcument Properties

typedef enum { rdsNorm, rdsSkip } RDS;              // Rtf Destination State
typedef enum { risNorm, risBin, risHex } RIS;       // Rtf Internal State

typedef struct save             // property save structure
{
    struct save *pNext;         // next save
    CHP chp;
    PAP pap;
    SEP sep;
    DOP dop;
    RDS rds;
    RIS ris;
} SAVE;

// What types of properties are there?
typedef enum {ipropBold, ipropItalic, ipropUnderline, ipropLeftInd,
              ipropRightInd, ipropFirstInd, ipropCols, ipropPgnX,
              ipropPgnY, ipropXaPage, ipropYaPage, ipropXaLeft,
              ipropXaRight, ipropYaTop, ipropYaBottom, ipropPgnStart,
              ipropSbk, ipropPgnFormat, ipropFacingp, ipropLandscape,
              ipropJust, ipropPard, ipropPlain, ipropSectd,
              ipropMax } IPROP;

typedef enum {actnSpec, actnByte, actnWord} ACTN;
typedef enum {propChp, propPap, propSep, propDop} PROPTYPE;

typedef struct propmod
{
    ACTN actn;              // size of value
    PROPTYPE prop;          // structure containing value
    int  offset;            // offset of value from base of structure
} PROP;

typedef enum {ipfnBin, ipfnHex, ipfnSkipDest } IPFN;
typedef enum {idestPict, idestSkip } IDEST;
typedef enum {kwdChar, kwdDest, kwdProp, kwdSpec} KWD;

typedef struct symbol
{
    char *szKeyword;        // RTF keyword
    int  dflt;              // default value to use
    bool fPassDflt;         // true to use default value from this table
    KWD  kwd;               // base action to take
    int  idx;               // index into property table if kwd == kwdProp
                            // index into destination table if kwd == kwdDest
                            // character to print if kwd == kwdChar
} SYM;

Rtfreadr.c

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "rtftype.h"
#include "rtfdecl.h"

int cGroup;
bool fSkipDestIfUnk;
long cbBin;
long lParam;

RDS rds;
RIS ris;

CHP chp;
PAP pap;
SEP sep;
DOP dop;

SAVE *psave;
FILE *fpIn;

//
// %%Function: main
//
// Main loop. Initialize and parse RTF.
//
main(int argc, char *argv[])
{
    FILE *fp;
    int ec;

    fp = fpIn = fopen("test.rtf", "r");
    if (!fp)
    {
        printf ("Can't open test file!\n");
        return 1;
    }
    if ((ec = ecRtfParse(fp)) != ecOK)
        printf("error %d parsing rtf\n", ec);
    else
        printf("Parsed RTF file OK\n");
    fclose(fp);
    return 0;
}

//
// %%Function: ecRtfParse
//
// Step 1:
// Isolate RTF keywords and send them to ecParseRtfKeyword;
// Push and pop state at the start and end of RTF groups;
// Send text to ecParseChar for further processing.
//

int
ecRtfParse(FILE *fp)
{
    int ch;
    int ec;
    int cNibble = 2;
    int b = 0;
    while ((ch = getc(fp)) != EOF)
    {
        if (cGroup < 0)
            return ecStackUnderflow;
        if (ris == risBin)                      // if we're parsing binary data, handle it directly
        {
            if ((ec = ecParseChar(ch)) != ecOK)
                return ec;
        }
        else
        {
            switch (ch)
            {
            case '{':
                if ((ec = ecPushRtfState()) != ecOK)
                    return ec;
                break;
            case '}':
                if ((ec = ecPopRtfState()) != ecOK)
                    return ec;
                break;
            case '\\':
                if ((ec = ecParseRtfKeyword(fp)) != ecOK)
                    return ec;
                break;
            case 0x0d:
            case 0x0a:          // cr and lf are noise characters...
                break;
            default:
                if (ris == risNorm)
                {
                    if ((ec = ecParseChar(ch)) != ecOK)
                        return ec;
                }
                else
                {               // parsing hex data
                    if (ris != risHex)
                        return ecAssertion;
                    b = b << 4;
                    if (isdigit(ch))
                        b += (char) ch - '0';
                    else
                    {
                        if (islower(ch))
                        {
                            if (ch < 'a' || ch > 'f')
                                return ecInvalidHex;
                            b += (char) ch - 'a';
                        }
                        else
                        {
                            if (ch < 'A' || ch > 'F')
                                return ecInvalidHex;
                            b += (char) ch - 'A';
                        }
                    }
                    cNibble--;
                    if (!cNibble)
                    {
                        if ((ec = ecParseChar(b)) != ecOK)
                            return ec;
                        cNibble = 2;
                        b = 0;
ris = risNorm;
                    }
                }                   // end else (ris != risNorm)
                break;
            }       // switch
        }           // else (ris != risBin)
    }               // while
    if (cGroup < 0)
        return ecStackUnderflow;
    if (cGroup > 0)
        return ecUnmatchedBrace;
    return ecOK;
}

//
// %%Function: ecPushRtfState
//
// Save relevant info on a linked list of SAVE structures.
//

int
ecPushRtfState(void)
{
    SAVE *psaveNew = malloc(sizeof(SAVE));
    if (!psaveNew)
        return ecStackOverflow;

    psaveNew -> pNext = psave;
    psaveNew -> chp = chp;
    psaveNew -> pap = pap;
    psaveNew -> sep = sep;
    psaveNew -> dop = dop;
    psaveNew -> rds = rds;
    psaveNew -> ris = ris;
    ris = risNorm;
    psave = psaveNew;
    cGroup++;
    return ecOK;
}

//
// %%Function: ecPopRtfState
//
// If we're ending a destination (that is, the destination is changing),
// call ecEndGroupAction.
// Always restore relevant info from the top of the SAVE list.
//

int
ecPopRtfState(void)
{
    SAVE *psaveOld;
    int ec;

    if (!psave)
        return ecStackUnderflow;

    if (rds != psave->rds)
    {
        if ((ec = ecEndGroupAction(rds)) != ecOK)
            return ec;
    }
    chp = psave->chp;
    pap = psave->pap;
    sep = psave->sep;
    dop = psave->dop;
    rds = psave->rds;
    ris = psave->ris;

    psaveOld = psave;
    psave = psave->pNext;
    cGroup--;
    free(psaveOld);
    return ecOK;
}

//
// %%Function: ecParseRtfKeyword
//
// Step 2:
// get a control word (and its associated value) and
// call ecTranslateKeyword to dispatch the control.
//

int
ecParseRtfKeyword(FILE *fp)
{
    int ch;
    char fParam = fFalse;
    char fNeg = fFalse;
    int param = 0;
    char *pch;
    char szKeyword[30];
    char szParameter[20];

    szKeyword[0] = '\0';
    szParameter[0] = '\0';
    if ((ch = getc(fp)) == EOF)
        return ecEndOfFile;
    if (!isalpha(ch))           // a control symbol; no delimiter.
    {
        szKeyword[0] = (char) ch;
        szKeyword[1] = '\0';
        return ecTranslateKeyword(szKeyword, 0, fParam);
    }
    for (pch = szKeyword; isalpha(ch); ch = getc(fp))
        *pch++ = (char) ch;
    *pch = '\0';
    if (ch == '-')
    {
        fNeg  = fTrue;
        if ((ch = getc(fp)) == EOF)
            return ecEndOfFile;
    }
    if (isdigit(ch))
    {
        fParam = fTrue;         // a digit after the control means we have a parameter
        for (pch = szParameter; isdigit(ch); ch = getc(fp))
            *pch++ = (char) ch;
        *pch = '\0';
        param = atoi(szParameter);
        if (fNeg)
            param = -param;
        lParam = atol(szParameter);
        if (fNeg)
            param = -param;
    }
    if (ch != ' ')
        ungetc(ch, fp);
    return ecTranslateKeyword(szKeyword, param, fParam);
}

//
// %%Function: ecParseChar
//
// Route the character to the appropriate destination stream.
//

int
ecParseChar(int ch)
{
    if (ris == risBin && --cbBin <= 0)
        ris = risNorm;
    switch (rds)
    {
    case rdsSkip:
        // Toss this character.
        return ecOK;
    case rdsNorm:
        // Output a character. Properties are valid at this point.
        return ecPrintChar(ch);
    default:
    // handle other destinations....
        return ecOK;
    }
}
//
// %%Function: ecPrintChar
//
// Send a character to the output file.
//

int
ecPrintChar(int ch)
{
    // unfortunately, we don't do a whole lot here as far as layout goes...
    putchar(ch);
    return ecOK;
}
RTFACTN.C
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <ctype.h>
#include "rtftype.h"
#include "rtfdecl.h"

// RTF parser tables

// Property descriptions
PROP rgprop [ipropMax] = {
    actnByte,   propChp,    offsetof(CHP, fBold),       // ipropBold
    actnByte,   propChp,    offsetof(CHP, fItalic),     // ipropItalic
    actnByte,   propChp,    offsetof(CHP, fUnderline),  // ipropUnderline
    actnWord,   propPap,    offsetof(PAP, xaLeft),      // ipropLeftInd
    actnWord,   propPap,    offsetof(PAP, xaRight),     // ipropRightInd
    actnWord,   propPap,    offsetof(PAP, xaFirst),     // ipropFirstInd
    actnWord,   propSep,    offsetof(SEP, cCols),       // ipropCols
    actnWord,   propSep,    offsetof(SEP, xaPgn),       // ipropPgnX
    actnWord,   propSep,    offsetof(SEP, yaPgn),       // ipropPgnY
    actnWord,   propDop,    offsetof(DOP, xaPage),      // ipropXaPage
    actnWord,   propDop,    offsetof(DOP, yaPage),      // ipropYaPage
    actnWord,   propDop,    offsetof(DOP, xaLeft),      // ipropXaLeft
    actnWord,   propDop,    offsetof(DOP, xaRight),     // ipropXaRight
    actnWord,   propDop,    offsetof(DOP, yaTop),       // ipropYaTop
    actnWord,   propDop,    offsetof(DOP, yaBottom),    // ipropYaBottom
    actnWord,   propDop,    offsetof(DOP, pgnStart),    // ipropPgnStart
    actnByte,   propSep,    offsetof(SEP, sbk),         // ipropSbk
    actnByte,   propSep,    offsetof(SEP, pgnFormat),   // ipropPgnFormat
    actnByte,   propDop,    offsetof(DOP, fFacingp),    // ipropFacingp
    actnByte,   propDop,    offsetof(DOP, fLandscape),  // ipropLandscape
    actnByte,   propPap,    offsetof(PAP, just),        // ipropJust
    actnSpec,   propPap,    0,                          // ipropPard
    actnSpec,   propChp,    0,                          // ipropPlain
    actnSpec,   propSep,    0,                          // ipropSectd
};

// Keyword descriptions
SYM rgsymRtf[] = {
//  keyword     dflt    fPassDflt   kwd         idx
    "b",        1,      fFalse,     kwdProp,    ipropBold,
    "ul",       1,      fFalse,     kwdProp,    ipropUnderline,
    "i",        1,      fFalse,     kwdProp,    ipropItalic,
    "li",       0,      fFalse,     dProp,      ipropPgnFormat,
    "pgnucltr", pgULtr, fTrue,      kwdProp,    ipropPgnFormat,
    "pgnlcltr", pgLLtr, fTrue,      kwdProp,    ipropPgnFormat,
    "qc",       justC,  fTrue,      kwdProp,    ipropJust,
    "ql",       justL,  fTrue,      kwdProp,    ipropJust,
    "qr",       justR,  fTrue,      kwdProp,    ipropJust,
    "qj",       justF,  fTrue,      kwdProp,    ipropJust,
    "paperw",   12240,  fFalse,     kwdProp,    ipropXaPage,
    "paperh",   15480,  fFalse,     kwdProp,    ipropYaPage,
    "margl",    1800,   fFalse,     kwdProp,    ipropXaLeft,
    "margr",    1800,   fFalse,     kwdProp,    ipropXaRight,
    "margt",    1440,   fFalse,     kwdProp,    ipropYaTop,
    "margb",    1440,   fFalse,     kwdProp,    ipropYaBottom,
    "pgnstart", 1,      fTrue,      kwdProp,    ipropPgnStart,
    "facingp",  1,      fTrue,      kwdProp,    ipropFacingp,
    "landscape",1,      fTrue,      kwdProp,    ipropLandscape,
    "par",      0,      fFalse,     kwdChar,    0x0a,
    "\0x0a",    0,      fFalse,     kwdChar,    0x0a,
    "\0x0d",    0,      fFalse,     kwdChar,    0x0a,
    "tab",      0,      fFalse,     kwdChar,    0x09,
    "ldblquote",0,      fFalse,     kwdChar,    '"',
    "rdblquote",0,      fFalse,     kwdChar,    '"',
    "bin",      0,      fFalse,     kwdSpec,    ipfnBin,
    "*",        0,      fFalse,     kwdSpec,    ipfnSkipDest,
    "'",        0,      fFalse,     kwdSpec,    ipfnHex,
    "author",   0,      fFalse,     kwdDest,    idestSkip,
    "buptim",   0,      fFalse,     kwdDest,    idestSkip,
    "colortbl", 0,      fFalse,     kwdDest,    idestSkip,
    "comment",  0,      fFalse,     kwdDest,    idestSkip,
    "creatim",  0,      fFalse,     kwdDest,    idestSkip,
    "doccomm",  0,      fFalse,     kwdDest,    idestSkip,
    "fonttbl",  0,      fFalse,     kwdDest,    idestSkip,
    "footer",   0,      fFalse,     kwdDest,    idestSkip,
    "footerf",  0,      fFalse,     kwdDest,    idestSkip,
    "footerl",  0,      fFalse,     kwdDest,    idestSkip,
    "footerr",  0,      fFalse,     kwdDest,    idestSkip,
    "footnote", 0,      fFalse,     kwdDest,    idestSkip,
    "ftncn",    0,      fFalse,     kwdDest,    idestSkip,
    "ftnsep",   0,      fFalse,     kwdDest,    idestSkip,
    "ftnsepc",  0,      fFalse,     kwdDest,    idestSkip,
    "header",   0,      fFalse,     kwdDest,    idestSkip,
    "headerf",  0,      fFalse,     kwdDest,    idestSkip,
    "headerl",  0,      fFalse,     kwdDest,    idestSkip,
    "headerr",  0,      fFalse,     kwdDest,    idestSkip,
    "info",     0,      fFalse,     kwdDest,    idestSkip,
    "keywords", 0,      fFalse,     kwdDest,    idestSkip,
    "operator", 0,      fFalse,     kwdDest,    idestSkip,
    "pict",     0,      fFalse,     kwdDest,    idestSkip,
    "printim",  0,      fFalse,     kwdDest,    idestSkip,
    "private1", 0,      fFalse,     kwdDest,    idestSkip,
    "revtim",   0,      fFalse,     kwdDest,    idestSkip,
    "rxe",      0,      fFalse,     kwdDest,    idestSkip,
    "stylesheet",   0,      fFalse,     kwdDest,    idestSkip,
    "subject",  0,      fFalse,     kwdDest,    idestSkip,
    "tc",       0,      fFalse,     kwdDest,    idestSkip,
    "title",    0,      fFalse,     kwdDest,    idestSkip,
    "txe",      0,      fFalse,     kwdDest,    idestSkip,
    "xe",       0,      fFalse,     kwdDest,    idestSkip,
    "{",        0,      fFalse,     kwdChar,    '{',
    "}",        0,      fFalse,     kwdChar,    '}',
    "\\",       0,      fFalse,     kwdChar,    '\\'
    };
int isymMax = sizeof(rgsymRtf) / sizeof(SYM);

//
// %%Function: ecApplyPropChange
//
// Set the property identified by _iprop_ to the value _val_.
//
//

int
ecApplyPropChange(IPROP iprop, int val)
{
    char *pb;

    if (rds == rdsSkip)                 // If we're skipping text,
        return ecOK;                    // don't do anything.

    switch (rgprop[iprop].prop)
    {
    case propDop:
        pb = (char *)&dop;
        break;
    case propSep:
        pb = (char *)&sep;
        break;
    case propPap:
        pb = (char *)&pap;
        break;
    case propChp:
        pb = (char *)&chp;
        break;
    default:
        if (rgprop[iprop].actn != actnSpec)
            return ecBadTable;
        break;
    }
    switch (rgprop[iprop].actn)
    {
    case actnByte:
        pb[rgprop[iprop].offset] = (unsigned char) val;
        break;
    case actnWord:
        (*(int *) (pb+rgprop[iprop].offset)) = val;
        break;
    case actnSpec:
        return ecParseSpecialProperty(iprop, val);
        break;
    default:
        return ecBadTable;
    }
    return ecOK;
}

//
// %%Function: ecParseSpecialProperty
//
// Set a property that requires code to evaluate.
//

int
ecParseSpecialProperty(IPROP iprop, int val)
{
    switch (iprop)
    {
    case ipropPard:
        memset(&pap, 0, sizeof(pap));
        return ecOK;
    case ipropPlain:
        memset(&chp, 0, sizeof(chp));
        return ecOK;
    case ipropSectd:
        memset(&sep, 0, sizeof(sep));
        return ecOK;
    default:
        return ecBadTable;
    }
    return ecBadTable;
}

//
// %%Function: ecTranslateKeyword.
//
// Step 3.
// Search rgsymRtf for szKeyword and evaluate it appropriately.
//
// Inputs:
// szKeyword:   The RTF control to evaluate.
// param:       The parameter of the RTF control.
// fParam:      fTrue if the control had a parameter; (that is, if param is valid)
//              fFalse if it did not.
//

int
ecTranslateKeyword(char *szKeyword, int param, bool fParam)
{
    int isym;

    // search for szKeyword in rgsymRtf

    for (isym = 0; isym < isymMax; isym++)
        if (strcmp(szKeyword, rgsymRtf[isym].szKeyword) == 0)
            break;
    if (isym == isymMax)            // control word not found
    {
        if (fSkipDestIfUnk)         // if this is a new destination
            rds = rdsSkip;          // skip the destination
                                    // else just discard it
        fSkipDestIfUnk = fFalse;
        return ecOK;
    }

    // found it!  use kwd and idx to determine what to do with it.

    fSkipDestIfUnk = fFalse;
    switch (rgsymRtf[isym].kwd)
    {
    case kwdProp:
        if (rgsymRtf[isym].fPassDflt || !fParam)
            param = rgsymRtf[isym].dflt;
        return ecApplyPropChange(rgsymRtf[isym].idx, param);
    case kwdChar:
        return ecParseChar(rgsymRtf[isym].idx);
    case kwdDest:
        return ecChangeDest(rgsymRtf[isym].idx);
    case kwdSpec:
        return ecParseSpecialKeyword(rgsymRtf[isym].idx);
    default:
        return ecBadTable;
    }
    return ecBadTable;
}

//
// %%Function: ecChangeDest
//
// Change to the destination specified by idest.
// There's usually more to do here than this...
//

int
ecChangeDest(IDEST idest)
{
    if (rds == rdsSkip)             // if we're skipping text,
        return ecOK;                // don't do anything

    switch (idest)
    {
    default:
        rds = rdsSkip;              // when in doubt, skip it...
        break;
    }
    return ecOK;
}

//
// %%Function: ecEndGroupAction
//
// The destination specified by rds is coming to a close.
// If there's any cleanup that needs to be done, do it now.
//

int
ecEndGroupAction(RDS rds)
{
    return ecOK;
}

//
// %%Function: ecParseSpecialKeyword
//
// Evaluate an RTF control that needs special processing.
//

int
ecParseSpecialKeyword(IPFN ipfn)
{
    if (rds == rdsSkip && ipfn != ipfnBin)  // if we're skipping, and it's not
        return ecOK;                                // the \bin keyword, ignore it.
    switch (ipfn)
    {
    case ipfnBin:
        ris = risBin;
        cbBin = lParam;
        break;
    case ipfnSkipDest:
        fSkipDestIfUnk = fTrue;
        break;
    case ipfnHex:
 ris = risHex;
 break;
    default:
        return ecBadTable;
    }
    return ecOK;
}

Makefile

rtfreadr.exe: rtfactn.obj rtfreadr.obj
    link rtfreadr.obj rtfactn.obj <nul

rtfactn.obj: rtfactn.c rtfdecl.h rtftype.h

rtfreadr.obj: rtfreadr.c rtfdecl.h rtftype.h

latex2rtf | RTF Index | Next Section

Appendix A. How to Write an RTF Reader

An RTF reader must do three basic things:

A Sample RTF Reader Implementation

Rtfdecl.h and Rtfreadr.c

Rtftype.h

Rtfactn.c

Notes on Implementing Other RTF Features

Tabs and Other Control Sequences Terminating in a Fixed Control

Borders and Other Control Sequences Beginning with a Fixed Control

Other Problem Areas in RTF

Style Sheets

Property Changes

Fields

Tables

Rtfdecl.h

Rtftype.h

Rtfreadr.c

Makefile