From e02s96@zorro.ruca.ua.ac.be Wed Apr 16 12:10:07 1997
Mail-from: From e02s96@zorro.ruca.ua.ac.be Wed Apr 16 12:10:07 1997
Date: Wed, 16 Apr 1997 12:43:44 +0200
From: Stefaan Ponnet <e02s96@zorro.ruca.ua.ac.be>
X-Mailer: Mozilla 3.0 (Win95; I)
MIME-Version: 1.0
To: Denis Howe <dbh@doc.ic.ac.uk>
Subject: Re: IFF file format, part 1
References: <1693.9704141705@wombat.doc.ic.ac.uk>
Content-Type: multipart/mixed; boundary="------------7130410F42AE"

This is a multi-part message in MIME format.

--------------7130410F42AE
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here you go.

	Stefaan.

------------------ IFF DOC #1 -------------------------

<IFF Format>    *.IFF
 
4 bytes         "FORM" (FORM chunk ID)
1 long          length of file that follows
4 bytes         "ILBM" (InterLeaved BitMap file ID)

4 bytes         "BMHD" (BitMap HeaDer chunk ID)
1 long          length of chunk [20]
20 bytes        1 word = image width in pixels
                1 word = image height in lines
                1 word = image x-offset [usually 0]
                1 word = image y-offset [usually 0]
                1 byte = # bitplanes
                1 byte = mask (0=no, 1=impl., 2=transparent, 3=lasso)
                1 byte = compressed [1] or uncompressed [0]
                1 byte = unused [0]
                1 word = transparent color (for mask=2)
                1 byte = x-aspect [5=640x200, 10=320x200/640x400, 20=320x400]
                1 byte = y-aspect [11]
                1 word = page width (usually the same as image width)
                1 word = page height (usually the same as image height)
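
As a rough illustration of the BMHD layout above, the 20 data bytes can be
unpacked in one call.  This is only a sketch: the field names and the
parse_bmhd() helper are made up here, treating the offsets and page size as
signed words is an assumption, and the byte order is taken to be big-endian
(most significant byte first).

```python
import struct
from collections import namedtuple

# Field names are illustrative; the order follows the BMHD table above.
BitMapHeader = namedtuple(
    "BitMapHeader",
    "width height x_offset y_offset planes masking compression unused "
    "transparent_color x_aspect y_aspect page_width page_height",
)

# ">" = big-endian, H = unsigned word, h = signed word, B = byte.
# Signedness of the offset and page fields is an assumption of this sketch.
BMHD_LAYOUT = struct.Struct(">HHhhBBBBHBBhh")


def parse_bmhd(data):
    """Decode the 20 data bytes of a BMHD chunk (ID and length excluded)."""
    if len(data) != BMHD_LAYOUT.size:
        raise ValueError("BMHD data must be exactly 20 bytes")
    return BitMapHeader(*BMHD_LAYOUT.unpack(data))


# A made-up 320x200, 5-bitplane, compressed image header.
sample = BMHD_LAYOUT.pack(320, 200, 0, 0, 5, 0, 1, 0, 0, 10, 11, 320, 200)
header = parse_bmhd(sample)
print(header.width, header.height, header.planes)
```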

4 bytes         "CMAP" (ColorMAP chunk ID)
1 long          length of chunk [3*n where n is the # colors]
3n bytes        3 bytes per RGB color.  Each color value is a byte
                and the actual color value is left-justified in the
                byte such that the most significant bit of the value
                is the MSB of the byte.  (ie. a color value of 15 ($0F)
                is stored as $F0)  The bytes are stored in R,G,B order.
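
The CMAP description above might be read along these lines.  The helper
names parse_cmap() and to_4bit() are invented for this sketch, and
to_4bit() assumes a 4-bit (Amiga-style) color value stored left-justified
in each byte.

```python
def parse_cmap(data):
    """Split CMAP chunk data into a list of (R, G, B) byte triples."""
    if len(data) % 3 != 0:
        raise ValueError("CMAP data length must be a multiple of 3")
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]


def to_4bit(component):
    """Recover a left-justified 4-bit color value, e.g. $F0 -> 15."""
    return component >> 4


# Two made-up palette entries: pure red and a green/blue mix.
palette = parse_cmap(bytes([0xF0, 0x00, 0x00, 0x00, 0xF0, 0x80]))
print(palette)
print([to_4bit(c) for c in palette[0]])
```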

4 bytes         "CRNG" (Color RaNGe chunk ID)
1 long          length of chunk [8]
8 bytes         1 word = reserved [0]
                1 word = animation speed (16384 = 60 steps per second)
                1 word = active [1] or inactive [0]
                1 byte = left/lower color animation limit
                1 byte = right/upper color animation limit

4 bytes         "CAMG" (Commodore AMiGa viewport mode chunk ID)
1 long          length of chunk [4]
1 long          viewport mode bits (bit 11 = HAM, bit 3 = interlaced)

4 bytes         "BODY" (BODY chunk ID)
1 long          length of chunk [# bytes of image data that follow]
? bytes         actual image data
 
NOTES: Some of these chunks may not be present in every IFF file, and may
not be in this order.  You should always look for the ID bytes to find a
certain chunk.  All chunk IDs are followed by a long value that tells the
size of the chunk.  This is the number of bytes that FOLLOW the 4 ID bytes
and size longword.  The exception to this is the FORM chunk.  The size
longword that follows the FORM ID is the size of the remainder of the file.
The FORM chunk must always be the first chunk in an IFF file.

The R,G,B ranges of AMIGA and ST are different (AMIGA 0...15, ST 0...7),
as is the maximum number of bitplanes (AMIGA: 5, ST: 4).

Format of body data
 
An expanded picture is simply a bitmap.  The packing method is PackBits
(see below), and is identical to MacPaint and DEGAS Elite compressed.
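
For orientation, here is a minimal PackBits decoder.  It follows the
commonly documented PackBits convention (a control byte, then either a
literal run or a repeated byte) and is not taken from this document; the
function name unpack_bits() is made up.

```python
def unpack_bits(src):
    """Expand PackBits-compressed bytes (common convention, see lead-in)."""
    out = bytearray()
    i = 0
    while i < len(src):
        n = src[i]
        i += 1
        if n < 128:
            # Literal run: copy the next n+1 bytes unchanged.
            out += src[i:i + n + 1]
            i += n + 1
        elif n > 128:
            # Replicate run: repeat the next byte 257-n times
            # (n read as a signed byte is negative; count = -n + 1).
            out += src[i:i + 1] * (257 - n)
            i += 1
        # n == 128 is a no-op and is skipped.
    return bytes(out)


packed = bytes([0xFE, 0xAA, 0x02, 0x80, 0x00, 0x2A])
print(unpack_bits(packed).hex())
```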
 
The (decompressed) body data appears in the following order:
 
        line 1 plane 0 ... line 1 plane 1 ... ... line 1 plane m
        [line 1 mask (if appropriate)]
        line 2 plane 0 ... line 2 plane 1 ... ... line 2 plane m
        [line 2 mask (if appropriate)]
        ...
        line x plane 0 ... line x plane 1 ... ... line x plane m
        [line x mask (if appropriate)]
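
The row ordering above can be turned into one color index per pixel
roughly as follows.  This sketch relies on three details from the ILBM
specification that are not spelled out here and should be treated as
assumptions: each plane row is padded to a multiple of 16 pixels, the
leftmost pixel is the most significant bit of each byte, and plane 0
holds the least significant bit of the color index.  The function names
are made up.

```python
def row_bytes(width):
    """Bytes per plane row, assuming rows are padded to 16-pixel words."""
    return ((width + 15) // 16) * 2


def planar_to_chunky(body, width, height, planes, has_mask=False):
    """Convert decompressed BODY data into rows of color indices."""
    rb = row_bytes(width)
    rows_per_line = planes + (1 if has_mask else 0)  # mask row is skipped
    pixels = []
    for y in range(height):
        line_start = y * rows_per_line * rb
        row = [0] * width
        for p in range(planes):
            plane = body[line_start + p * rb:line_start + (p + 1) * rb]
            for x in range(width):
                bit = (plane[x >> 3] >> (7 - (x & 7))) & 1
                row[x] |= bit << p
        pixels.append(row)
    return pixels


# One 8-pixel line with 2 planes (each plane row is 2 bytes after padding).
body = bytes([0b10100000, 0x00, 0b11000000, 0x00])
print(planar_to_chunky(body, width=8, height=1, planes=2))
```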
 
The FORM chunk identifies the type of data:
 
        "ILBM" = interleaved bit map
        "8SVX" = 8-bit sample voice
        "SMUS" = simple music score
        "FTXT" = formatted text (Amiga)

------------------ IFF DOC #2 -------------------------

About Interchange File Format

Electronic Arts is a company that deserves credit for helping make life
easier for both programmers and end users. By establishing Interchange
Format Files (ie, IFF) and releasing the documentation for such, as well
as C source code for reading and writing IFF type of files, Electronic
Arts has helped make it easier for programmers to develop "backward
compatible" and "extensible" file formats. IFF also helps developers
write programs that easily read data files created with each other's IFF
compliant software, even if there is no business relationship between
the developers. In a nutshell, IFF helps minimize problems such as new
versions of a particular program having trouble reading data files
produced by older versions, or needing a new file format every time a
new version needs to store additional information. It also encourages
standardized file formats that aren't tied to a particular product. All
of this is good for end users because it means that their valuable data
isn't locked into some proprietary standard that can't be used with a
wide variety of hardware and software. Above all else, end users don't
want their work to be held hostage by a single corporate entity over
whom the end user has no direct control, but that's exactly what happens
whenever an end user saves his data using a program that produces a
proprietary, unpublished file format. IFF helps to break this needlessly
proprietary stranglehold that developers have exerted upon end users'
works.

An IFF file is a set of data that is in a form that many unrelated
programs can read. An IFF file should not have anything in it that was
intended specifically for just one particular program. If a program must
save some "personal" (ie, proprietary) data in an IFF file, it must be
saved in a manner which allows another program to "skip over" this data.
There are several different types of IFF files. ILBM and GIFF files
store picture data. SMUS files store musical scores. WAVE and AIFF files
store sampled sounds. Each of these files must start with an ID which
indicates that it is indeed an IFF file, followed by an ID that
indicates which type of file. So what is an ID? An ID is four printable
ASCII characters (ie, 8-bit bytes). If you use a file viewer (capable of
displaying each byte as an ASCII character) to look at an IFF file, you
will notice that every so often you will see 4 "readable" characters in
a row. These 4 characters are an ID. Every IFF file must start with one
of the following 3 IDs. (I've enclosed each ID in single quotes).

'FORM'  'LIST'  'CAT '

If the first 4 chars (bytes) in a file are not one of these, then it is
not an IFF file. These IDs are referred to as group IDs in EA literature
because each is like a "master ID" after which there may follow more IDs
(ie, chunks) that are grouped under that master ID.

Note that the last character in the 'CAT ' ID is a blank space (ie,
ASCII 32).

After this group ID, there is an UNSIGNED LONG (ie, 32-bit binary value)
that indicates how many bytes are in the entire file. This count does
not include the 4 byte group ID, nor this ULONG. This ULONG is useful if
you wish to load the rest of the file into memory to examine it. After
this ULONG, there is an ID that indicates which type of IFF file this
is. As mentioned earlier, "ILBM", "WAVE", and "AIFF" are 3 types of IFF
files. There are many more, and programmers are always inventing new
types for lack of better things to do. Here is the beginning of a
typical ILBM file.

  'FORM' <- OK. This really is an IFF file because it has one
            of the 3 defined group IDs.
  13000  <- There are 13000 more bytes after this ULONG
  'ILBM' <- It is an ILBM (picture) file

All IFF files start with something similar to the above 12 byte
"header", except that instead of 'FORM', the group ID can be 'LIST' or
'CAT '. Of course, the ULONG size and file type ID may be different in
various files, but nevertheless, a 12 byte header always appears at the
beginning of an IFF file. For example, here's an AIFF header:

  'FORM' <- OK. This really is an IFF file because it has one
            of the 3 defined group IDs.
  4000   <- There are 4000 more bytes after this ULONG
  'AIFF' <- It is an AIFF (digital audio) file

What you find after the header depends on which type it is (ie, from
here on, an ILBM will be different than an AIFF).
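
A reader for the 12 byte header described above might look like the
following sketch.  The function name read_iff_header() is invented for
illustration, and the size is read most significant byte first.

```python
import struct

# The three group IDs that may open an IFF file.
GROUP_IDS = (b"FORM", b"LIST", b"CAT ")


def read_iff_header(data):
    """Return (group ID, byte count, type ID) from the first 12 bytes."""
    if len(data) < 12:
        raise ValueError("an IFF header needs at least 12 bytes")
    group_id, size, type_id = struct.unpack(">4sI4s", data[:12])
    if group_id not in GROUP_IDS:
        raise ValueError("not an IFF file: unknown group ID %r" % group_id)
    return group_id, size, type_id


# The ILBM example from above, built by hand.
example = b"FORM" + (13000).to_bytes(4, "big") + b"ILBM"
print(read_iff_header(example))
```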

One thing that all IFF files do have in common after the group ID, byte
count, and type ID, is that data is organized into chunks. OK, more
jargon. What's a chunk? A chunk consists of an ID, a ULONG that tells
how many bytes of data are in the chunk, and then all those data bytes.
For example, here is a CMAP chunk (which would be found in an ILBM
file).

  'CMAP'      <- This is the 4 byte chunk ID
  6           <- This tells how many data bytes are in the chunk (chunkSize).
  0,0,0,1,1,4 <- Here are the 6 data bytes

Notice that the chunk size doesn't include the 4 byte ID or the ULONG
for the chunk Size.

So, all IFF files are made up of several chunks (ie, groups of data).
Each group of data starts with a convenient ID (so that a program can
ascertain what kind of data is in the chunk) and a ULONG size (so that a
program can ascertain how many bytes of data are in the chunk). There
are a few other details to note. A chunk cannot have an odd number of
data bytes (such as 3). If necessary, an extra zero byte must be written
to make an even number of data bytes. The chunk Size doesn't include
this extra byte. So for example, if you want to write 3 bytes in a CMAP
chunk, it would look like this:

  'CMAP'
  3          <- Note that chunk Size is 3
  0,1,33,0   <- Note that there is an extra zero byte

The reason for this extra "pad byte" for odd-sized chunks has to do with
Motorola's 68000 CPU requiring that LONGs be aligned to even memory
addresses. IFF files were first used on 68000 based computers, and
padding out odd-sized chunks made it easier to load and parse an IFF
file on such a computer (ie, if you load the entire file into a single
block of RAM starting upon an even address, all of the chunk IDs and
Sizes will conveniently fall upon even memory addresses).
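
The chunk layout and pad byte rule can be sketched as a writer and a
reader.  The names make_chunk() and iter_chunks() are made up for this
example; sizes are written most significant byte first.

```python
import struct


def make_chunk(chunk_id, data):
    """Build one chunk: 4 byte ID, ULONG size, data, optional pad byte."""
    if len(chunk_id) != 4:
        raise ValueError("a chunk ID is exactly 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""  # pad byte is not counted in size
    return chunk_id + struct.pack(">I", len(data)) + data + pad


def iter_chunks(buf):
    """Yield (ID, data) for each chunk in buf, skipping any pad bytes."""
    pos = 0
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from(">4sI", buf, pos)
        start = pos + 8
        yield chunk_id, buf[start:start + size]
        pos = start + size + (size & 1)  # step over the pad byte if size is odd


# The 3 byte CMAP example from above, followed by a made-up NAME chunk.
stream = make_chunk(b"CMAP", bytes([0, 1, 33])) + make_chunk(b"NAME", b"ab")
print(list(iter_chunks(stream)))
```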

In the preceding example, the group ID was 'FORM'. There are 2 other
group IDs as well. A 'CAT ' is a collection of many different FORMs all
stuck together consecutively in 1 IFF file. For example, if you had an
animation with 6 sound effects, you might save the animation frames in
an ANIM FORM, and you might save the sound effects in several AIFF FORMs
(one per sound effect). You could save the animation and sound in 7
separate files. The ANIM file would start this way:

  FORM
  120000  <- Whatever the size happens to be (this is expressed in 32 bits)
  ANIM

Each AIFF file would start this way:

  FORM
  8000    <- whatever size
  AIFF

If the user wanted to copy the data to another disk, he would have to
copy 7 files. On the other hand, you could save all the data in one CAT
file.

  CAT
  4+120008+8008+2028+...  <- The total size of the ANIM and the 6 AIFF files
  '    '                  <- Type of CAT. 4 spaces for the type ID means "a
                             grab bag" of IFF FORMs are going to be inside
                             of this CAT. If it just so happened that all of
                             the enclosed FORMs were 1 type, such as ILBM,
                             then this type ID would be 'ILBM'.
  FORM
  120000
  ANIM
  ...all the chunks in the ANIM file placed here (note: ANIMs have embedded
     ILBM FORMs. The guy who devised the ANIM type of IFF file broke the
     rules by mistake, and nobody caught his error until it was too late).

  FORM
  8000
  AIFF
  ...all the chunks in the first sound effect here

  FORM
  2020
  AIFF
  ...all the chunks in the second sound effect here

  ...etc. for the other 4 sound effects

To further complicate matters, there are LISTs. LISTs are a lot like
CATs except that there is an optional, additional group ID associated
with LISTs. That ID is a PROP. LISTs can have embedded PROPs just like
an ILBM can have an embedded CMAP chunk. A PROP header looks very much
like a FORM header in that you must follow it with a type ID. For
example, here is an ILBM PROP with a CMAP in it.

  PROP       <- Here's a PROP
  4+14       <- Here's how many bytes follow in the PROP
  ILBM       <- It's an ILBM PROP
  CMAP       <- Here's a CMAP chunk inside of this ILBM PROP
  6            <- There are 6 bytes following in this CMAP chunk
  0,0,0,1,1,4

LISTs are meant to encompass similar FORMs (i.e. several AIFF files
stuck together). Often, when you have similar FORMs stuck together, some
of the chunks in the individual FORMs are the same. For example, assume
that we have 2 AIFF sound effects. AIFF FORMs can have a NAME chunk
which contains the ASCII string that is the name of the sound effect.
Also assume that both sounds are called "car crash". With a CAT, we end
up having two identical NAME chunks, one in each AIFF FORM, like so:

  CAT           <- We put the 2 files into 1 CAT
  4+1008+508
  AIFF          <- It's a CAT of several AIFF FORMs

  FORM          <- here's the start of the first sound effect file
  1000
  AIFF

  ...some chunks

  NAME          <- here's the name chunk for the 1st sound effect
  9
  'car crash',0

  ...more chunks

  FORM          <- here's the start of the second sound effect file
  500
  AIFF

  ...some chunks

  NAME          <- here's the name chunk for the 2nd sound effect.
  9                Look familiar?
  'car crash',0

  ...more chunks

With a LIST, we can have PROPs. A PROP is a group ID that allows us to
place chunks that pertain to all the FORMs in the LIST. So, we can rip
out the NAME chunks inside both AIFF FORMs and replace them with one
NAME chunk inside of a PROP.

  LIST         <- Notice that we use a LIST instead of a CAT
  4+30+990+490+...
  AIFF

  PROP         <- Here's where we put chunks intended for ALL the
  22              subsequent FORMS; inside a PROP.
  AIFF         <- type of PROP
  NAME         <- here's the name chunk inside of the PROP
  9
  'car crash',0

  FORM         <- here's the start of the first sound effect file
  982          <- size is 18 bytes less because no NAME chunk here
  AIFF

  ...some chunks, but no NAME chunk

  FORM         <- here's the start of the second sound effect file
  482
  AIFF

  ...some chunks, but no NAME for this guy either

Notice that the PROP group ID is followed by a type ID (in this case
AIFF). This means that the PROP only affects any AIFF FORMs. If you were
to sneak in an SMUS FORM at the end, the NAME chunk would not apply to
it. Also, if you included a NAME chunk in one of the AIFF FORMs, it
would override the PROP. For example, assume that you have a LIST
containing 10 AIFF FORMs. All but 1 of them is named "Snare Hit". You
can store a NAME chunk in a PROP AIFF for "Snare Hit". Then, in the one
AIFF FORM whose name is not "Snare Hit", you can include a NAME chunk to
override the NAME chunk in the PROP.
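
The override rule can be sketched on already-parsed data.  Everything
here is illustrative: effective_chunks() is a made-up helper, the PROPs
of a LIST are assumed to have been collected into a dictionary keyed by
type ID, and "Rim Shot" and the chunk contents are invented sample data.

```python
def effective_chunks(list_props, form_type, form_chunks):
    """Return the chunks a FORM sees once PROP defaults are applied."""
    # Start from the PROP of the same type, if the LIST has one...
    merged = dict(list_props.get(form_type, {}))
    # ...then let the FORM's own chunks override the shared ones.
    merged.update(form_chunks)
    return merged


# One PROP AIFF holding a shared NAME chunk.
props = {b"AIFF": {b"NAME": b"Snare Hit"}}

plain = effective_chunks(props, b"AIFF", {b"SSND": b"..."})
named = effective_chunks(props, b"AIFF", {b"NAME": b"Rim Shot"})
other = effective_chunks(props, b"SMUS", {b"SHDR": b"..."})
print(plain[b"NAME"], named[b"NAME"], b"NAME" in other)
```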

It should be noted that you can take several LISTs and squash them
together inside of a CAT or another LIST. Personally, I have never seen
a data file with this level of nesting, and doubt that it would be of
much use.

In the above examples, pseudocode was used to represent the headers.
Let's look at how a hex file viewer might display the actual contents of
an IFF file (in hex bytes). First, an IFF header for a FORM AIFF, in
pseudocode.

  FORM
  4096
  AIFF

Now here's a view of the actual data file.

  46 4F 52 4D      <- FORM
  00 00 10 00      <- hex 00001000, or 4096 decimal
  41 49 46 46      <- AIFF

Note that the ULONG byte count is stored in Big Endian order (ie, the
Most Significant Byte is first, and the Least Significant Byte is last).
This is how the Motorola 680x0 stores long values in memory (ie, the
opposite order of Intel 80x86). IFF files use Big Endian order for all
16-bit (ie, SHORT) and 32-bit (ie, LONG) values.

Microsoft decided that IFF was a good idea, but since Windows is
traditionally tethered to Intel CPUs, a version of IFF was needed which
stored LONG or SHORT values in Little Endian order. So, MS decided to
create some new group IDs. MS took the FORM ID and created a Little
Endian version of it known as RIFF. For example, the WAVE file format
has a RIFF group ID. All of the SHORT and LONG values in the file are
stored in Little Endian order. Let's take a look at an example header
for a WAVE file. Assume that there are 258 bytes of data after the byte
count.

  52 49 46 46        <- RIFF
  02 01 00 00        <- hex 00000102, or 258 decimal
  57 41 56 45        <- WAVE

Note that the ULONG byte count is stored in Little Endian order (ie, the
Least Significant Byte is first, and the Most Significant Byte is last).
Good old backwards-thinking Intel.
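
The two byte orders can be checked directly against the hex dumps above.
The helper names in this sketch are made up.

```python
import struct


def read_u32_big_endian(raw):
    """Decode a ULONG as stored in a FORM (most significant byte first)."""
    return struct.unpack(">I", raw)[0]


def read_u32_little_endian(raw):
    """Decode a ULONG as stored in a RIFF (least significant byte first)."""
    return struct.unpack("<I", raw)[0]


form_size = bytes.fromhex("00001000")  # from the FORM AIFF dump
riff_size = bytes.fromhex("02010000")  # from the RIFF WAVE dump

print(read_u32_big_endian(form_size))     # 4096
print(read_u32_little_endian(riff_size))  # 258
```

Reading the RIFF count with the big-endian helper would give a wildly
wrong value, which is why a reader has to pick the byte order from the
group ID before decoding anything else.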

Now, there's some real justification for creating a RIFF group ID, if
you're working with an Intel CPU. But Microsoft couldn't stop there.
True to their "not made here, so if we're going to accept it, we have to
inflict our brutish, unneeded brand upon it" mentality, Microsoft
created another group ID called RIFX. What's an RIFX file? It's simply a
FORM with RIFX replacing the FORM ID. So, if you want to turn a FORM
AIFF into a RIFX AIFF, you just change the first 4 bytes to RIFX.
Needless to say, nobody has ever used the RIFX group ID, and it will
undoubtedly suffer a justifiably ignoble disappearance.

Just like everyone else, programmers make mistakes. As mentioned before,
the Amiga's ANIM file format was a mistake. It puts FORM headers inside
of a FORM group ID. That's not supposed to happen. You can put FORM
headers inside of a CAT or LIST, but not another FORM. A mistake was
also made with the MIDI file format. The programmer who devised it
didn't put a proper IFF header on the file. It should be:

  FORM   <- group ID. Indicates an IFF file that contains one type of data
  3000   <- whatever size the file happens to be
  MIDI   <- type of data. What follows will be chunks as defined by the
            MIDI type of IFF file.

But the programmer omitted the FORM group ID, and simply put the MThd
chunk first. So, a MIDI file starts as so:

  MThd   <- Chunk ID
  6      <- size of MThd chunk

Another deviation from the standard occurs with padding out odd-sized
chunks with an extra byte. Some programmers didn't bother doing this
when devising new IFF type files, and occasionally, one will come across
some specification for a new IFF type that allows odd-sized chunks.

Unfortunately, these programmers released their work based upon these
aberrations before getting that work reviewed by other programmers who
might have offered good reasons why the aberrations should be corrected.
It makes it that much harder for software to read and write files if it
has to deal with aberrations of the IFF standard. There's no reason for
that, particularly when a strict adherence to the standard sacrifices
almost nothing in the way of quality and efficiency over an aberration.
But try to tell that to a paranoid programmer who thinks that if he
shows anyone what he's doing before his product is shrink-wrapped,
someone will steal his soul... well, IFF does give the computer industry
a means for resolving needless hassles with data file formats, and it
has worked very successfully in a number of instances, although people
don't always use the standard wisely, or don't quite grasp EA's
altruistic notion that there is no good reason why a file format should
ever be proprietary or unpublished. (I urge consumers to avoid products
where that is the case).

------------------ IFF DOC #3 -------------------------

--------------7130410F42AE
Content-Type: text/plain; charset=us-ascii; name="iffdoc2.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="iffdoc2.txt"



"EA IFF 85" Standard for Interchange Format Files

Document Date:		January 14, 1985
From:			Jerry Morrison, Electronic Arts
Status of Standard:	Released and in use

1. Introduction

Standards are Good for Software Developers

As home computer hardware evolves to better and better media machines, 
the demand increases for higher quality, more detailed data. Data 
development gets more expensive, requires more expertise and better 
tools, and has to be shared across projects. Think about several ports 
of a product on one CD-ROM with 500M Bytes of common data!

Development tools need standard interchange file formats. Imagine 
scanning in images of "player" shapes, moving them to a paint program 
for editing, then incorporating them into a game. Or writing a theme 
song with a Macintosh score editor and incorporating it into an Amiga 
game. The data must at times be transformed, clipped, filled out, 
and moved across machine kinds. Media projects will depend on data 
transfer from graphic, music, sound effect, animation, and script 
tools.

Standards are Good for Software Users

Customers should be able to move their own data between independently 
developed software products. And they should be able to buy data libraries 
usable across many such products. The types of data objects to exchange 
are open-ended and include plain and formatted text, raster and structured 
graphics, fonts, music, sound effects, musical instrument descriptions, 
and animation.

The problem with expedient file formats (typically memory dumps) is 
that they're too provincial. By designing data for one particular 
use (e.g. a screen snapshot), they preclude future expansion (would 
you like a full page picture? a multi-page document?). In neglecting 
the possibility that other programs might read their data, they fail 
to save contextual information (how many bit planes? what resolution?). 
Ignoring that other programs might create such files, they're intolerant 
of extra data (texture palette for a picture editor), missing data 
(no color map), or minor variations (smaller image). In practice, 
a filed representation should rarely mirror an in-memory representation. 
The former should be designed for longevity; the latter to optimize 
the manipulations of a particular program. The same filed data will 
be read into different memory formats by different programs.

The IFF philosophy: "A little behind-the-scenes conversion when programs 
read and write files is far better than NxM explicit conversion utilities 
for highly specialized formats."

So we need some standardization for data interchange among development 
tools and products. The more developers that adopt a standard, the 
better for all of us and our customers.

Here is "EA IFF 1985"

Here is our offering: Electronic Arts' IFF standard for Interchange 
File Format. The full name is "EA IFF 1985". Alternatives and justifications 
are included for certain choices. Public domain subroutine packages 
and utility programs are available to make it easy to write and use 
IFF-compatible programs.

Part 1 introduces the standard. Part 2 presents its requirements and 
background. Parts 3, 4, and 5 define the primitive data types, FORMs, 
and LISTs, respectively, and how to define new high level types. Part 
6 specifies the top level file structure. Appendix A is included for 
quick reference and Appendix B names the committee responsible for 
this standard.

References

American National Standard Additional Control Codes for Use with ASCII, 
ANSI standard 3.64-1979 for an 8-bit character set. See also ISO standard 
2022 and ISO/DIS standard 6429.2.

Amiga[tm] is a trademark of Commodore-Amiga, Inc.

C, A Reference Manual, Samuel P. Harbison and Guy L. Steele Jr., Tartan 
Laboratories. Prentice-Hall, Englewood Cliffs, NJ, 1984.

Compiler Construction, An Advanced Course, edited by F. L. Bauer and 
J. Eickel (Springer-Verlag, 1976). This book is one of many sources 
for information on recursive descent parsing.

DIF Technical Specification (c)1981 by Software Arts, Inc. DIF[tm] is 
the format for spreadsheet data interchange developed by Software 
Arts, Inc.
DIF[tm] is a trademark of Software Arts, Inc.

Electronic Arts[tm] is a trademark of Electronic Arts.

"FTXT" IFF Formatted Text, from Electronic Arts. IFF supplement document 
for a text format.

Inside Macintosh (c) 1982, 1983, 1984, 1985 Apple Computer, Inc., a 
programmer's reference manual.
Apple(R) is a trademark of Apple Computer, Inc.
Macintosh[tm] is a trademark licensed to Apple Computer, Inc.

"ILBM" IFF Interleaved Bitmap, from Electronic Arts. IFF supplement 
document for a raster image format.

M68000 16/32-Bit Microprocessor Programmer's Reference Manual(c) 1984, 
1982, 1980, 1979 by Motorola, Inc.

PostScript Language Manual (c) 1984 Adobe Systems Incorporated.
PostScript[tm] is a trademark of Adobe Systems, Inc.
Times and Helvetica(R) are trademarks of Allied Corporation.

InterScript: A Proposal for a Standard for the Interchange of Editable 
Documents (c)1984 Xerox Corporation.
Introduction to InterScript (c) 1985 Xerox Corporation.



2. Background for Designers

Part 2 is about the background, requirements, and goals for the standard. 
It's geared for people who want to design new types of IFF objects. 
People just interested in using the standard may wish to skip this 
part.

What Do We Need?

A standard should be long on prescription and short on overhead. It 
should give lots of rules for designing programs and data files for 
synergy. But neither the programs nor the files should cost too much 
more than the expedient variety. While we're looking to a future with 
CD-ROMs and perpendicular recording, the standard must work well on 
floppy disks.

For program portability, simplicity, and efficiency, formats should 
be designed with more than one implementation style in mind. (In practice, 
pure stream I/O is adequate although random access makes it easier 
to write files.) It ought to be possible to read one of many objects 
in a file without scanning all the preceding data. Some programs need 
to read and play out their data in real time, so we need good compromises 
between generality and efficiency.

As much as we need standards, they can't hold up product schedules. 
So we also need a kind of decentralized extensibility where any software 
developer can define and refine new object types without some "standards 
authority" in the loop. Developers must be able to extend existing 
formats in a forward- and backward-compatible way. A central repository 
for design information and example programs can help us take full 
advantage of the standard.

For convenience, data formats should heed the restrictions of various 
processors and environments. E.g. word-alignment greatly helps 68000 
access at insignificant cost to 8088 programs.

Other goals include the ability to share common elements over a list 
of objects and the ability to construct composite objects containing 
other data objects with structural information like directories.

And finally, "Simple things should be simple and complex things should 
be possible." (Alan Kay)

Think Ahead

Let's think ahead and build programs that read and write files for 
each other and for programs yet to be designed. Build data formats 
to last for future computers so long as the overhead is acceptable. 
This extends the usefulness and life of today's programs and data.

To maximize interconnectivity, the standard file structure and the 
specific object formats must all be general and extensible. Think 
ahead when designing an object. It should serve many purposes and 
allow many programs to store and read back all the information they 
need; even squeeze in custom data. Then a programmer can store the 
available data and is encouraged to include fixed contextual details. 
Recipient programs can read the needed parts, skip unrecognized stuff, 
default missing data, and use the stored context to help transform 
the data as needed.

Scope

IFF addresses these needs by defining a standard file structure, some 
initial data object types, ways to define new types, and rules for 
accessing these files. We can accomplish a great deal by writing programs 
according to this standard, but don't expect direct compatibility 
with existing software. We'll need conversion programs to bridge the 
gap from the old world.

IFF is geared for computers that readily process information in 8-bit 
bytes. It assumes a "physical layer" of data storage and transmission 
that reliably maintains "files" as strings of 8-bit bytes. The standard 
treats a "file" as a container of data bytes and is independent of 
how to find a file and whether it has a byte count.

This standard does not by itself implement a clipboard for cutting 
and pasting data between programs. A clipboard needs software to mediate 
access, to maintain a "contents version number" so programs can detect 
updates, and to manage the data in "virtual memory".

Data Abstraction

The basic problem is how to represent information in a way that's 
program-independent, compiler-independent, machine-independent, and 
device-independent.

The computer science approach is "data abstraction", also known as 
"objects", "actors", and "abstract data types". A data abstraction 
has a "concrete representation" (its storage format), an "abstract 
representation" (its capabilities and uses), and access procedures 
that isolate all the calling software from the concrete representation. 
Only the access procedures touch the data storage. Hiding mutable 
details behind an interface is called "information hiding". What data 
abstraction does is abstract from details of implementing the object, 
namely the selected storage representation and algorithms for manipulating 
it.

The power of this approach is modularity. By adjusting the access 
procedures we can extend and restructure the data without impacting 
the interface or its callers. Conversely, we can extend and restructure 
the interface and callers without making existing data obsolete. It's 
great for interchange!

But we seem to need the opposite: fixed file formats for all programs 
to access. Actually, we could file data abstractions ("filed objects") 
by storing the data and access procedures together. We'd have to encode 
the access procedures in a standard machine-independent programming 
language a la PostScript. Even still, the interface can't evolve freely 
since we can't update all copies of the access procedures. So we'll 
have to design our abstract representations for limited evolution 
and occasional revolution (conversion).

In any case, today's microcomputers can't practically store data abstractions. 
They can do the next best thing: store arbitrary types of data in 
"data chunks", each with a type identifier and a length count. The 
type identifier is a reference by name to the access procedures (any 
local implementation). The length count enables storage-level object 
operations like "copy" and "skip to next" independent of object type.

Chunk writing is straightforward. Chunk reading requires a trivial 
parser to scan each chunk and dispatch to the proper access/conversion 
procedure. Reading chunks nested inside other chunks requires recursion, 
but no lookahead or backup.
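
The "trivial parser" idea can be sketched as a short recursive routine.
This is only an illustration: the parse() function and its tuple-based
result are invented here, and a real reader would dispatch each data
chunk to a type-specific access procedure instead of keeping raw bytes.

```python
import struct

# IDs whose contents are a type ID followed by nested chunks.
GROUP_IDS = {b"FORM", b"LIST", b"CAT ", b"PROP"}


def parse(buf, pos=0, end=None):
    """Parse chunks in buf[pos:end] into a nested list of tuples."""
    if end is None:
        end = len(buf)
    nodes = []
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from(">4sI", buf, pos)
        body = pos + 8
        if chunk_id in GROUP_IDS:
            # Group chunk: read its type ID, then recurse into the rest.
            type_id = buf[body:body + 4]
            children = parse(buf, body + 4, body + size)
            nodes.append((chunk_id, type_id, children))
        else:
            # Data chunk: keep the raw bytes for a type-specific handler.
            nodes.append((chunk_id, buf[body:body + size]))
        pos = body + size + (size & 1)  # skip to the next chunk, past any pad
    return nodes


# A made-up FORM ILBM holding a single 3 byte CMAP chunk plus pad byte.
cmap = b"CMAP" + struct.pack(">I", 3) + b"\x00\x01\x21\x00"
form = b"FORM" + struct.pack(">I", 4 + len(cmap)) + b"ILBM" + cmap
print(parse(form))
```

Each chunk is read once from front to back; the length count is what
lets the loop step to the next chunk without understanding the current
one.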

That's the main idea of IFF. There are, of course, a few other details...

Previous Work

Where our needs are similar, we borrow from existing standards.

Our basic need to move data between independently developed programs 
is similar to that addressed by the Apple Macintosh desk scrap or 
"clipboard" [Inside Macintosh chapter "Scrap Manager"]. The Scrap 
Manager works closely with the Resource Manager, a handy filer and 
swapper for data objects (text strings, dialog window templates, pictures, 
fonts...) including types yet to be designed [Inside Macintosh chapter 
"Resource Manager"]. The Resource Manager is akin to Smalltalk's 
object swapper.

We will probably write a Macintosh desk accessory that converts IFF 
files to and from the Macintosh clipboard for quick and easy interchange 
with programs like MacPaint and Resource Mover.

Macintosh uses a simple and elegant scheme of 4-character "identifiers" 
to identify resource types, clipboard format types, file types, and 
file creator programs. Alternatives are unique ID numbers assigned 
by a central authority or by hierarchical authorities, unique ID numbers 
generated by algorithm, other fixed length character strings, and 
variable length strings. Character string identifiers double as readable 
signposts in data files and programs. The choice of 4 characters is 
a good tradeoff between storage space, fetch/compare/store time, and 
name space size. We'll honor Apple's designers by adopting this scheme.

"PICT" is a good example of a standard structured graphics format 
(including raster images) and its many uses [Inside Macintosh chapter 
"QuickDraw"]. Macintosh provides QuickDraw routines in ROM to create, 
manipulate, and display PICTs. Any application can create a PICT by 
simply asking QuickDraw to record a sequence of drawing commands. 
Since it's just as easy to ask QuickDraw to render a PICT to a screen 
or a printer, it's very effective to pass them between programs, say 
from an illustrator to a word processor. An important feature is the 
ability to store "comments" in a PICT which QuickDraw will ignore. 
Actually, it passes them to your optional custom "comment handler".

PostScript, Adobe's print file standard, is a more general way to 
represent any print image (which is a specification for putting marks 
on paper) [PostScript Language Manual]. In fact, PostScript is a full-fledged 
programming language. To interpret a PostScript program is to render 
a document on a raster output device. The language is defined in layers: 
a lexical layer of identifiers, constants, and operators; a layer 
of reverse polish semantics including scope rules and a way to define 
new subroutines; and a printing-specific layer of built-in identifiers 
and operators for rendering graphic images. It is clearly a powerful 
(Turing equivalent) image definition language. PICT and a subset of 
PostScript are candidates for structured graphics standards.

A PostScript document can be printed on any raster output device (including 
a display) but cannot generally be edited. That's because the original 
flexibility and constraints have been discarded. Besides, a PostScript 
program may use arbitrary computation to supply parameters like placement 
and size to each operator. A QuickDraw PICT, in comparison, is a more 
restricted format of graphic primitives parameterized by constants. 
So a PICT can be edited at the level of the primitives, e.g. move 
or thicken a line. It cannot be edited at the higher level of, say, 
the bar chart data which generated the picture.

PostScript has another limitation: Not all kinds of data amount to 
marks on paper. A musical instrument description is one example. PostScript 
is just not geared for such uses.

"DIF" is another example of data being stored in a general format 
usable by future programs [DIF Technical Specification]. DIF is a 
format for spreadsheet data interchange. DIF and PostScript are both 
expressed in plain ASCII text files. This is very handy for printing, 
debugging, experimenting, and transmitting across modems. It can have 
substantial cost in compaction and read/write work, depending on use. 
We won't store IFF files this way but we could define an ASCII alternate 
representation with a converter program.

InterScript is Xerox' standard for interchange of editable documents 
[Introduction to InterScript]. It approaches a harder problem: How 
to represent editable word processor documents that may contain formatted 
text, pictures, cross-references like figure numbers, and even highly 
specialized objects like mathematical equations? InterScript aims 
to define one standard representation for each kind of information. 
Each InterScript-compatible editor is supposed to preserve the objects 
it doesn't understand and even maintain nested cross-references. So 
a simple word processor would let you edit the text of a fancy document 
without discarding the equations or disrupting the equation numbers.

Our task is similarly to store high level information and preserve 
as much content as practical while moving it between programs. But 
we need to span a larger universe of data types and cannot expect 
to centrally define them all. Fortunately, we don't need to make programs 
preserve information that they don't understand. And for better or 
worse, we don't have to tackle general-purpose cross-references yet.



3. Primitive Data Types

Atomic components such as integers and characters that are interpretable 
directly by the CPU are specified in one format for all processors. 
We chose a format that's most convenient for the Motorola MC68000 
processor [M68000 16/32-Bit Microprocessor Programmer's Reference 
Manual].

N.B.: Part 3 dictates the format for "primitive" data types where and 
only where used in the overall file structure and standard kinds of 
chunks (Cf. Chunks). The number of such occurrences will be small 
enough that the costs of conversion, storage, and management of processor-
specific files would far exceed the costs of conversion during I/O by "foreign" 
programs. A particular data chunk may be specified with a different 
format for its internal primitive types or with processor- or 
environment-specific variants if necessary to optimize local usage. Since that hurts 
data interchange, it's not recommended. (Cf. Designing New Data Sections, 
in Part 4.)

Alignment

All data objects larger than a byte are aligned on even byte addresses 
relative to the start of the file. This may require padding. Pad bytes 
are to be written as zeros, but don't count on that when reading.

This means that every odd-length "chunk" (see below) must be padded 
so that the next one will fall on an even boundary. Also, designers 
of structures to be stored in chunks should include pad fields where 
needed to align every field larger than a byte. Zeros should be stored 
in all the pad bytes.

Justification: Even-alignment causes a little extra work for files 
that are used only on certain processors but allows 68000 programs 
to construct and scan the data in memory and do block I/O. You just 
add an occasional pad field to data structures that you're going to 
block read/write or else stream read/write an extra byte. And the 
same source code works on all processors. Unspecified alignment, on 
the other hand, would force 68000 programs to (dis)assemble word and 
long-word data one byte at a time. Pretty cumbersome in a high level 
language. And if you don't conditionally compile that out for other 
processors, you won't gain anything.

Numbers

Numeric types supported are two's complement binary integers in the 
format used by the MC68000 processor (high byte first, high word first), the 
reverse of 8088 and 6502 format. They could potentially include signed 
and unsigned 8, 16, and 32 bit integers but the standard only uses 
the following:

UBYTE	 8 bits unsigned
WORD	16 bits signed
UWORD	16 bits unsigned
LONG	32 bits signed

The actual type definitions depend on the CPU and the compiler. In 
this document, we'll express data type definitions in the C programming 
language. [See C, A Reference Manual.] In 68000 Lattice C:

typedef unsigned char	UBYTE;	/*  8 bits unsigned	*/
typedef short	WORD;	/* 16 bits signed	*/
typedef unsigned short	UWORD;	/* 16 bits unsigned	*/
typedef long	LONG;	/* 32 bits signed	*/

Characters

The following character set is assumed wherever characters are used, 
e.g. in text strings, IDs, and TEXT chunks (see below).

Characters are encoded in 8-bit ASCII. Characters in the range NUL 
(hex 0) through DEL (hex 7F) are well defined by the 7-bit ASCII standard. 
IFF uses the graphic group " " (SP, hex 20) through "~" (hex 7E).

Most of the control character group hex 01 through hex 1F have no 
standard meaning in IFF. The control character LF (hex 0A) is defined 
as a "newline" character. It denotes an intentional line break, that 
is, a paragraph or line terminator. (There is no way to store an automatic 
line break. That is strictly a function of the margins in the environment 
where the text is placed.) The control character ESC (hex 1B) is a reserved 
escape character under the rules of ANSI standard 3.64-1979 American 
National Standard Additional Control Codes for Use with ASCII, ISO 
standard 2022, and ISO/DIS standard 6429.2.

Characters in the range hex 7F through hex FF are not globally defined 
in IFF. They are best left reserved for future standardization. But 
note that the FORM type FTXT (formatted text) defines the meaning 
of these characters within FTXT forms. In particular, character values 
hex 7F through hex 9F are control codes while characters hex A0 through 
hex FF are extended graphic characters, as per the ISO and 
ANSI standards cited above. [See the supplementary document "FTXT" 
IFF Formatted Text.]

Dates

A "creation date" is defined as the date and time a stream of data 
bytes was created. (Some systems call this a "last modified date".) 
Editing some data changes its creation date. Moving the data between 
volumes or machines does not.

The IFF standard date format will be one of those used in MS-DOS, 
Macintosh, or Amiga DOS (probably a 32-bit unsigned number of seconds 
since a reference point). Issue: Investigate these three.

Type IDs

A "type ID", "property name", "FORM type", or any other IFF identifier 
is a 32-bit value: the concatenation of four ASCII characters in the 
range " " (SP, hex 20) through "~" (hex 7E). Spaces (hex 20) should 
not precede printing characters; trailing spaces are ok. Control characters 
are forbidden.

typedef CHAR ID[4];

IDs are compared using a simple 32-bit case-dependent equality test.

Data section type IDs (aka FORM types) are restricted IDs. (Cf. Data 
Sections.) Since they may be stored in filename extensions (Cf. Single 
Purpose Files) lower case letters and punctuation marks are forbidden. 
Trailing spaces are ok.

Carefully choose those four characters when you pick a new ID. Make 
them mnemonic so programmers can look at an interchange format file 
and figure out what kind of data it contains. The name space makes 
it possible for developers scattered around the globe to generate 
ID values with minimal collisions so long as they choose specific 
names like "MUS4" instead of general ones like "TYPE" and "FILE". 
EA will "register" new FORM type IDs and format descriptions as they're 
devised, but collisions will be improbable so there will be no pressure 
on this "clearinghouse" process. Appendix A has a list of currently 
defined IDs.

Sometimes it's necessary to make data format changes that aren't backward 
compatible. Since IDs are used to denote data formats in IFF, new 
IDs are chosen to denote revised formats. Since programs won't read 
chunks whose IDs they don't recognize (see Chunks, below), the new 
IDs keep old programs from stumbling over new data. The conventional 
way to choose a "revision" ID is to increment the last character if 
it's a digit or else change the last character to a digit. E.g. first 
and second revisions of the ID "XY" would be "XY1" and "XY2". Revisions 
of "CMAP" would be "CMA1" and "CMA2".
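
The revision convention can be sketched as a small routine. The 
function name iff_next_revision is hypothetical, and two points are 
our own reading rather than anything the text states: for an ID padded 
with trailing spaces the digit goes right after the last printing 
character (matching the "XY" example), and a trailing "9" is reported 
as an error because the text gives no rule for it.

```c
#include <assert.h>
#include <string.h>

/* Writes the next revision of the 4-byte ID `in` to `out`.
   Returns 0 on success, -1 if the convention gives no answer.
   `in` and `out` must not be the same array. */
int iff_next_revision(const char in[4], char out[4])
{
    int last = 3;
    assert(in != NULL && out != NULL);
    memcpy(out, in, 4);
    while (last >= 0 && out[last] == ' ')
        last--;                      /* index of last printing character */
    if (last < 0) return -1;         /* all blank: the filler ID */
    if (out[last] >= '0' && out[last] <= '8') {
        out[last]++;                 /* "CMA1" -> "CMA2" */
        return 0;
    }
    if (out[last] == '9') return -1; /* no rule given past 9 */
    if (last < 3)
        out[last + 1] = '1';         /* "XY  " -> "XY1 " */
    else
        out[3] = '1';                /* "CMAP" -> "CMA1" */
    return 0;
}
```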

Chunks

Chunks are the building blocks in the IFF structure. The form expressed 
as a C typedef is:

typedef struct {
	ID	ckID;
	LONG	ckSize;	/* sizeof(ckData) */
	UBYTE	ckData[/* ckSize */];
	} Chunk;

We can diagram an example chunk, a "CMAP" chunk containing 12 data 
bytes, like this:
			----------------
		ckID:	|    'CMAP'    |
		ckSize: |      12      |
		ckData: | 0, 0, 0, 32  |   -------- 
			| 0, 0, 64, 0  |    12 bytes
			| 0, 0, 64, 0  |   ---------
			----------------

The fixed header part means "Here's a type ckID chunk with ckSize 
bytes of data."

The ckID identifies the format and purpose of the chunk. As a rule, 
a program must recognize ckID to interpret ckData. It should skip 
over all unrecognized chunks. The ckID also serves as a format version 
number as long as we pick new IDs to identify new formats of ckData 
(see above).

The following ckIDs are universally reserved to identify chunks with 
particular IFF meanings: "LIST", "FORM", "PROP", "CAT ", and "    ". 
The special ID "    " (4 spaces) is a ckID for "filler" chunks, 
that is, chunks that fill space but have no meaningful contents. The 
IDs "LIS1" through "LIS9", "FOR1" through "FOR9", and "CAT1" through 
"CAT9" are reserved for future "version number" variations. All IFF-compatible 
software must account for these 32 chunk IDs. Appendix A has a list 
of predefined IDs.

The ckSize is a logical block size: how many data bytes are in ckData. 
If ckData is an odd number of bytes long, a 0 pad byte follows which 
is not included in ckSize. (Cf. Alignment.) A chunk's total physical 
size is ckSize rounded up to an even number plus the size of the header. 
So the smallest chunk is 8 bytes long with ckSize = 0. For the sake 
of following chunks, programs must respect every chunk's ckSize as 
a virtual end-of-file for reading its ckData even if that data is 
malformed, e.g. if nested contents are truncated.
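
The size arithmetic above, and the "skip to next" operation it 
enables, might look like this in C. It is a sketch only: the names 
are hypothetical, the chunks are assumed to be already in memory, and 
a missing pad byte after the very last chunk is tolerated as a 
leniency of this sketch, not something the standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFF_HEADER_SIZE 8u  /* 4-byte ckID plus 4-byte ckSize */

uint32_t iff_read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Total bytes a chunk occupies: header, data, and pad byte if odd. */
uint32_t iff_physical_size(uint32_t ck_size)
{
    return IFF_HEADER_SIZE + ck_size + (ck_size & 1u);
}

/* Steps over the sibling chunks in buf[0..len) without looking at
   their types. Returns the number of chunks, or -1 if a chunk is
   truncated or stray bytes follow the last one. */
int iff_count_chunks(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;
    assert(buf != NULL || len == 0);
    while (len - pos >= IFF_HEADER_SIZE) {
        uint32_t ck_size = iff_read_be32(buf + pos + 4);
        size_t avail = len - pos - IFF_HEADER_SIZE;
        if (ck_size > 0x7FFFFFFFu || ck_size > avail) return -1;
        count++;
        pos += IFF_HEADER_SIZE + ck_size;
        if ((ck_size & 1u) && pos < len) pos++;  /* skip the pad byte */
    }
    return pos == len ? count : -1;
}
```

The 12-byte "CMAP" chunk diagrammed earlier has a physical size of 
20 bytes; a chunk with ckSize = 3 occupies 12.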

We can describe the syntax of a chunk as a regular expression with 
"#" representing the ckSize, i.e. the length of the following {braced} 
bytes. The "[0]" represents a sometimes needed pad byte. (The regular 
expressions in this document are collected in Appendix A along with 
an explanation of notation.)

Chunk	::= ID #{ UBYTE* } [0]

One chunk output technique is to stream write a chunk header, stream 
write the chunk contents, then random access back to the header to 
fill in the size. Another technique is to make a preliminary pass 
over the data to compute the size, then write it out all at once.
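
Here is a minimal sketch of the first technique using the standard C 
stream functions. The function names are hypothetical, and the sketch 
assumes a seekable binary stream and a chunk small enough for its 
size to fit in a long.

```c
#include <assert.h>
#include <stdio.h>

/* Write a 32-bit value high byte first. Returns 0 on success. */
int iff_write_be32(FILE *f, unsigned long v)
{
    unsigned char b[4];
    b[0] = (unsigned char)((v >> 24) & 0xFF);
    b[1] = (unsigned char)((v >> 16) & 0xFF);
    b[2] = (unsigned char)((v >> 8) & 0xFF);
    b[3] = (unsigned char)(v & 0xFF);
    return fwrite(b, 1, 4, f) == 4 ? 0 : -1;
}

/* Write ckID and a placeholder ckSize. Returns the file position of
   the size field for later back-patching, or -1 on error. */
long iff_begin_chunk(FILE *f, const char id[4])
{
    long size_pos;
    assert(f != NULL && id != NULL);
    if (fwrite(id, 1, 4, f) != 4) return -1;
    size_pos = ftell(f);
    if (size_pos < 0 || iff_write_be32(f, 0) != 0) return -1;
    return size_pos;
}

/* Seek back to fill in ckSize, then return to the end of the data
   and add a zero pad byte if the data length is odd. */
int iff_end_chunk(FILE *f, long size_pos)
{
    long end, ck_size;
    assert(f != NULL);
    end = ftell(f);
    if (end < 0) return -1;
    ck_size = end - (size_pos + 4);
    if (ck_size < 0) return -1;
    if (fseek(f, size_pos, SEEK_SET) != 0) return -1;
    if (iff_write_be32(f, (unsigned long)ck_size) != 0) return -1;
    if (fseek(f, end, SEEK_SET) != 0) return -1;
    if ((ck_size & 1) && fputc(0, f) == EOF) return -1;
    return 0;
}
```

The caller writes the chunk contents with ordinary stream writes 
between the two calls. Nested chunks work the same way, since each 
call pair only remembers its own size position.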

Strings, String Chunks, and String Properties

In a string of ASCII text, LF denotes a forced line break (paragraph 
or line terminator). Other control characters are not used. (Cf. Characters.)

The ckID for a chunk that contains a string of plain, unformatted 
text is "TEXT". As a practical matter, a text string should probably 
not be longer than 32767 bytes. The standard allows up to 2^31 - 1 
bytes.

When used as a data property (see below), a text string chunk may 
be 0 to 255 characters long. Such a string is readily converted to 
a C string or a Pascal STRING[255]. The ckID of a property must be 
the property name, not "TEXT".

When used as a part of a chunk or data property, restricted C string 
format is normally used. That means 0 to 255 characters followed by 
a NUL byte (ASCII value 0).

Data Properties

Data properties specify attributes for following (non-property) chunks. 
A data property essentially says "identifier = value", for example 
"XY = (10, 200)", telling something about following chunks. Properties 
may only appear inside data sections ("FORM" chunks, cf. Data Sections) 
and property sections ("PROP" chunks, cf. Group PROP).

The form of a data property is a special case of Chunk. The ckID is 
a property name as well as a property type. The ckSize should be small 
since data properties are intended to be accumulated in RAM when reading 
a file. (256 bytes is a reasonable upper bound.) Syntactically:

Property	::= Chunk

When designing a data object, use properties to describe context information 
like the size of an image, even if they don't vary in your program. 
Other programs will need this information.

Think of property settings as assignments to variables in a programming 
language. Multiple assignments are redundant and local assignments 
temporarily override global assignments. The order of assignments 
doesn't matter as long as they precede the affected chunks. (Cf. LISTs, 
CATs, and Shared Properties.)

Each object type (FORM type) is a local name space for property IDs. 
Think of a "CMAP" property in a "FORM ILBM" as the qualified ID "ILBM.CMAP". 
Property IDs specified when an object type is designed (and therefore 
known to all clients) are called "standard" while specialized ones 
added later are "nonstandard".

Links

Issue: A standard mechanism for "links" or "cross references" is very 
desirable for things like combining images and sounds into animations. 
Perhaps we'll define "link" chunks within FORMs that refer to other 
FORMs or to specific chunks within the same and other FORMs. This 
needs further work. EA IFF 1985 has no standard link mechanism.

For now, it may suffice to read a list of, say, musical instruments, 
and then just refer to them within a musical score by index number.

File References

Issue: We may need a standard form for references to other files. 
A "file ref" could name a directory and a file in the same type of 
operating system as the ref's originator. Following the reference 
would expect the file to be on some mounted volume. In a network environment, 
a file ref could name a server, too.

Issue: How can we express operating-system independent file refs?

Issue: What about a means to reference a portion of another file? 
Would this be a "file ref" plus a reference to a "link" within the 
target file?



4. Data Sections

The first thing we need of a file is to check: Does it contain IFF 
data and, if so, does it contain the kind of data we're looking for? 
So we come to the notion of a "data section".

A "data section" or IFF "FORM" is one self-contained "data object" 
that might be stored in a file by itself. It is one high level data 
object such as a picture or a sound effect. The IFF structure "FORM" 
makes it self-identifying. It could be a composite object like a 
musical score with nested musical instrument descriptions.

Group FORM

A data section is a chunk with ckID "FORM" and this arrangement:

FORM	::= "FORM" #{ FormType (LocalChunk | FORM | LIST | CAT)* }
FormType	::= ID
LocalChunk	::= Property | Chunk

The ID "FORM" is a syntactic keyword like "struct" in C. Think of 
a "struct ILBM" containing a field "CMAP". If you see "FORM" you'll 
know to expect a FORM type ID (the structure name, "ILBM" in this 
example) and a particular contents arrangement or "syntax" (local 
chunks, FORMs, LISTs, and CATs). (LISTs and CATs are discussed in 
part 5, below.) A "FORM ILBM", in particular, might contain a local 
chunk "CMAP", an "ILBM.CMAP" (to use a qualified name).

So the chunk ID "FORM" indicates a data section. It implies that the 
chunk contains an ID and some number of nested chunks. In reading 
a FORM, like any other chunk, programs must respect its ckSize as 
a virtual end-of-file for reading its contents, even if they're truncated.
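
To illustrate, a reader that looks for one local chunk inside a FORM 
could be sketched like this. The name iff_form_find is hypothetical, 
the whole FORM is assumed to be in memory, and nested FORMs, LISTs, 
and CATs are simply stepped over rather than searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t iff_read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Looks for the first local chunk `ck_id` in a FORM of type
   `form_type` held in buf[0..len). Returns a pointer to its ckData
   and stores its ckSize in *out_size, or returns NULL. */
const unsigned char *iff_form_find(const unsigned char *buf, size_t len,
                                   const char form_type[4],
                                   const char ck_id[4], uint32_t *out_size)
{
    uint32_t form_size;
    size_t end, pos;
    assert(buf != NULL && out_size != NULL);
    if (len < 12 || memcmp(buf, "FORM", 4) != 0) return NULL;
    form_size = iff_read_be32(buf + 4);
    if (form_size < 4 || form_size > len - 8) return NULL;
    if (memcmp(buf + 8, form_type, 4) != 0) return NULL;

    end = 8 + (size_t)form_size;  /* the FORM's virtual end-of-file */
    pos = 12;                     /* first chunk after the FORM type */
    while (pos + 8 <= end) {
        uint32_t ck_size = iff_read_be32(buf + pos + 4);
        if (ck_size > end - pos - 8) return NULL;  /* truncated contents */
        if (memcmp(buf + pos, ck_id, 4) == 0) {
            *out_size = ck_size;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)ck_size + (ck_size & 1u);  /* skip data and pad */
    }
    return NULL;
}
```

Note that the loop never reads past `end`, even when the buffer 
holds more bytes after the FORM.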

The FormType (or FORM type) is a restricted ID that may not contain 
lower case letters or punctuation characters. (Cf. Type IDs. Cf. Single 
Purpose Files.)

The type-specific information in a FORM is composed of its "local 
chunks": data properties and other chunks. Each FORM type is a local 
name space for local chunk IDs. So "CMAP" local chunks in other FORM 
types may be unrelated to "ILBM.CMAP". More than that, each FORM type 
defines semantic scope. If you know what a FORM ILBM is, you'll know 
what an ILBM.CMAP is.

Local chunks defined when the FORM type is designed (and therefore 
known to all clients of this type) are called "standard" while specialized 
ones added later are "nonstandard".

Among the local chunks, property chunks give settings for various 
details like text font while the other chunks supply the essential 
information. This distinction is not clear cut. A property setting 
cancelled by a later setting of the same property has effect only 
on data chunks in between. E.g. in the sequence:

prop1 = x  (propN = value)*  prop1 = y

where the propNs are not prop1, the setting prop1 = x has no effect.

The following universal chunk IDs are reserved inside any FORM: "LIST", 
"FORM", "PROP", "CAT ", "    ", "LIS1" through "LIS9", "FOR1" through 
"FOR9", and "CAT1" through "CAT9". (Cf. Chunks. Cf. Group LIST. Cf. 
Group PROP.) For clarity, these universal chunk names may not be FORM 
type IDs, either.

Part 5, below, talks about grouping FORMs into LISTs and CATs. They 
let you group a bunch of FORMs but don't impose any particular meaning 
or constraints on the grouping. Read on.

Composite FORMs

A FORM chunk inside a FORM is a full-fledged data section. This means 
you can build a composite object like a multi-frame animation sequence 
from available picture FORMs and sound effect FORMs. You can insert 
additional chunks with information like frame rate and frame count.

Using composite FORMs, you leverage on existing programs that create 
and edit the component FORMs. Those editors may even look into your 
composite object to copy out its type of component, although it'll 
be the rare program that's fancy enough to do that. Such editors are 
not allowed to replace their component objects within your composite 
object. That's because the IFF standard lets you specify consistency 
requirements for the composite FORM such as maintaining a count or 
a directory of the components. Only programs that are written to uphold 
the rules of your FORM type should create or modify such FORMs.

Therefore, in designing a program that creates composite objects, 
you are strongly requested to provide a facility for your users to 
import and export the nested FORMs. Import and export could move the 
data through a clipboard or a file.

Here are several existing FORM types and rules for defining new ones.

FTXT

An FTXT data section contains text with character formatting information 
like fonts and faces. It has no paragraph or document formatting information 
like margins and page headers. FORM FTXT is well matched to the text 
representation in Amiga's Intuition environment. See the supplemental 
document "FTXT" IFF Formatted Text.

ILBM

"ILBM" is an InterLeaved BitMap image with color map; a machine-independent 
format for raster images. FORM ILBM is the standard image file format 
for the Commodore-Amiga computer and is useful in other environments, 
too. See the supplemental document "ILBM" IFF Interleaved Bitmap.

PICS

The data chunk inside a "PICS" data section has ID "PICT" and holds 
a QuickDraw picture. Issue: Allow more than one PICT in a PICS? See 
Inside Macintosh chapter "QuickDraw" for details on PICTs and how 
to create and display them on the Macintosh computer.

The only standard property for PICS is "XY", an optional property 
that indicates the position of the PICT relative to "the big picture". 
The contents of an XY is a QuickDraw Point.

Note: PICT may be limited to Macintosh use, in which case there'll 
be another format for structured graphics in other environments.

Other Macintosh Resource Types

Some other Macintosh resource types could be adopted for use within 
IFF files; perhaps MWRT, ICN, ICN#, and STR#.

Issue: Consider the candidates and reserve some more IDs.

Designing New Data Sections

Supplemental documents will define additional object types. A supplement 
needs to specify the object's purpose, its FORM type ID, the IDs and 
formats of standard local chunks, and rules for generating and interpreting 
the data. It's a good idea to supply typedefs and an example source 
program that accesses the new object. See "ILBM" IFF Interleaved Bitmap 
for a good example.

Anyone can pick a new FORM type ID but should reserve it with Electronic 
Arts at their earliest convenience. [Issue: EA contact person? Hand 
this off to another organization?] While decentralized format definitions 
and extensions are possible in IFF, our preference is to get design 
consensus by committee, implement a program to read and write it, 
perhaps tune the format, and then publish the format with example 
code. Some organization should remain in charge of answering questions 
and coordinating extensions to the format.

If it becomes necessary to revise the design of some data section, 
its FORM type ID will serve as a version number (Cf. Type IDs). E.g. 
a revised "VDEO" data section could be called "VDE1". But try to get 
by with compatible revisions within the existing FORM type.

In a new FORM type, the rules for primitive data types and word-alignment 
(Cf. Primitive Data Types) may be overridden for the contents of its 
local chunks (but not for the chunk structure itself) if your documentation 
spells out the deviations. If machine-specific type variants are needed, 
e.g. to store vast numbers of integers in reverse bit order, then 
outline the conversion algorithm and indicate the variant inside each 
file, perhaps via different FORM types. Needless to say, variations 
should be minimized.

In designing a FORM type, encapsulate all the data that other programs 
will need to interpret your files. E.g. a raster graphics image should 
specify the image size even if your program always uses 320 x 200 
pixels x 3 bitplanes. Receiving programs are then empowered to append 
or clip the image rectangle, to add or drop bitplanes, etc. This enables 
a lot more compatibility.

Separate the central data (like musical notes) from more specialized 
information (like note beams) so simpler programs can extract the 
central parts during read-in. Leave room for expansion so other programs 
can squeeze in new kinds of information (like lyrics). And remember 
to keep the property chunks manageably short, let's say at most 256 bytes.

When designing a data object, try to strike a good tradeoff between 
a super-general format and a highly-specialized one. Fit the details 
to at least one particular need; for example, a raster image might 
as well store pixels in the current machine's scan order. But add 
the kind of generality that makes it usable with foreseeable hardware 
and software. E.g. use a whole byte for each red, green, and blue 
color value even if this year's computer has only 4-bit video DACs. 
Think ahead and help other programs so long as the overhead is acceptable. 
E.g. run compress a raster by scan line rather than as a unit so future 
programs can swap images by scan line to and from secondary storage.

Try to design a general purpose "least common multiple" format that 
encompasses the needs of many programs without getting too complicated. 
Let's coalesce our uses around a few such formats widely separated 
in the vast design space. Two factors make this flexibility and simplicity 
practical. First, file storage space is getting very plentiful, so 
compaction is not a priority. Second, nearly any locally-performed 
data conversion work during file reading and writing will be cheap 
compared to the I/O time.

It must be ok to copy a LIST or FORM or CAT intact, e.g. to incorporate 
it into a composite FORM. So any kind of internal references within 
a FORM must be relative references. They could be relative to the 
start of the containing FORM, relative from the referencing chunk, 
or a sequence number into a collection.

With composite FORMs, you leverage on existing programs that create 
and edit the components. If you write a program that creates composite 
objects, please provide a facility for your users to import and export 
the nested FORMs. The import and export functions may move data through 
a separate file or a clipboard.

Finally, don't forget to specify all implied rules in detail.



5. LISTs, CATs, and Shared Properties

Data often needs to be grouped together like a list of icons. Sometimes 
a trick like arranging little images into a big raster works, but 
generally they'll need to be structured as a first class group. The 
objects "LIST" and "CAT" are IFF-universal mechanisms for this purpose.

Property settings sometimes need to be shared over a list of similar 
objects. E.g. a list of icons may share one color map. LIST provides 
a means called "PROP" to do this. One purpose of a LIST is to define 
the scope of a PROP. A "CAT", on the other hand, is simply a concatenation 
of objects.

Simpler programs may skip LISTs and PROPs altogether and just handle 
FORMs and CATs. All "fully-conforming" IFF programs also know about 
"CAT ", "LIST", and "PROP". Any program that reads a FORM inside a 
LIST must process shared PROPs to correctly interpret that FORM.

Group CAT

A CAT is just an untyped group of data objects.

Structurally, a CAT is a chunk with chunk ID "CAT " containing a "contents 
type" ID followed by the nested objects. The ckSize of each contained 
chunk is essentially a relative pointer to the next one.

CAT	::= "CAT " #{ ContentsType (FORM | LIST | CAT)* }
ContentsType	::= ID	-- a hint or an "abstract data type" ID

In reading a CAT, like any other chunk, programs must respect its 
ckSize as a virtual end-of-file for reading the nested objects even 
if they're malformed or truncated.

The "contents type" following the CAT's ckSize indicates what kind 
of FORMs are inside. So a CAT of ILBMs would store "ILBM" there. It's 
just a hint. It may be used to store an "abstract data type". A CAT 
could just have blank contents ID ("    ") if it contains more than 
one kind of FORM.

CAT defines only the format of the group. The group's meaning is open 
to interpretation. This is like a list in LISP: the structure of cells 
is predefined but the meaning of the contents as, say, an association 
list depends on use. If you need a group with an enforced meaning 
(an "abstract data type" or Smalltalk "subclass"), some consistency 
constraints, or additional data chunks, use a composite FORM instead 
(Cf. Composite FORMs).

Since a CAT just means a concatenation of objects, CATs are rarely 
nested. Programs should really merge CATs rather than nest them.

Group LIST

A LIST defines a group very much like CAT but it also gives a scope 
for PROPs (see below). And unlike CATs, LISTs should not be merged 
without understanding their contents.

Structurally, a LIST is a chunk with ckID "LIST" containing a "contents 
type" ID, optional shared properties, and the nested contents (FORMs, 
LISTs, and CATs), in that order. The ckSize of each contained chunk 
is a relative pointer to the next one. A LIST is not an arbitrary 
linked list; the cells are simply concatenated.

LIST	::= "LIST" #{ ContentsType PROP* (FORM | LIST | CAT)* }
ContentsType	::= ID

Group PROP

PROP chunks may appear in LISTs (not in FORMs or CATs). They supply 
shared properties for the FORMs in that LIST. This ability to elevate 
some property settings to shared status for a list of forms is useful 
for both indirection and compaction. E.g. a list of images with the 
same size and colors can share one "size" property and one "color 
map" property. Individual FORMs can override the shared settings.

The contents of a PROP is like a FORM with no data chunks:

PROP	::= "PROP" #{ FormType Property* }

It means, "Here are the shared properties for FORM type <FormType>."

A LIST may have at most one PROP of a FORM type, and all the PROPs 
must appear before any of the FORMs or nested LISTs and CATs. You 
can have subsequences of FORMs sharing properties by making each subsequence 
a LIST.
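
These ordering rules are easy to check mechanically. The following 
sketch assumes the LIST is already in memory; the function name and 
the limit of 16 PROPs are hypothetical choices made for the example, 
and the contents of the nested chunks are not examined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IFF_MAX_PROPS 16  /* arbitrary limit for this sketch */

uint32_t iff_read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 1 if buf[0..len) starts with a LIST whose direct contents
   are PROPs (at most one per FORM type) followed only by FORMs,
   LISTs, and CATs; otherwise returns 0. */
int iff_list_is_well_formed(const unsigned char *buf, size_t len)
{
    const unsigned char *prop_type[IFF_MAX_PROPS];
    int n_props = 0, seen_group = 0, i;
    uint32_t list_size;
    size_t end, pos;

    assert(buf != NULL);
    if (len < 12 || memcmp(buf, "LIST", 4) != 0) return 0;
    list_size = iff_read_be32(buf + 4);
    if (list_size < 4 || list_size > len - 8) return 0;
    end = 8 + (size_t)list_size;
    pos = 12;  /* skip "LIST", ckSize, and ContentsType */
    while (pos + 8 <= end) {
        const unsigned char *ck = buf + pos;
        uint32_t ck_size = iff_read_be32(ck + 4);
        if (ck_size > end - pos - 8) return 0;
        if (memcmp(ck, "PROP", 4) == 0) {
            if (seen_group || ck_size < 4 || n_props == IFF_MAX_PROPS) return 0;
            for (i = 0; i < n_props; i++)
                if (memcmp(prop_type[i], ck + 8, 4) == 0) return 0;
            prop_type[n_props++] = ck + 8;  /* remember its FORM type */
        } else if (memcmp(ck, "FORM", 4) == 0 || memcmp(ck, "LIST", 4) == 0 ||
                   memcmp(ck, "CAT ", 4) == 0) {
            seen_group = 1;
        } else {
            return 0;  /* no other chunk may appear directly in a LIST */
        }
        pos += 8 + (size_t)ck_size + (ck_size & 1u);
    }
    return pos >= end;
}
```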

Scoping: Think of property settings as variable bindings in nested 
blocks of a programming language. Where in C you could write:

TEXT_FONT text_font = Courier;	/* program's global default	*/

File() {
	TEXT_FONT text_font = TimesRoman;	/* shared setting	*/

		{
		TEXT_FONT text_font = Helvetica;  /* local setting	*/
		Print("Hello ");	/* uses font Helvetica	*/
		}

		{
		Print("there.");	/* uses font TimesRoman	*/
		}
	}

An IFF file could contain:

LIST {
	PROP TEXT {
		FONT {TimesRoman}	/* shared setting	*/
		}

	FORM TEXT {
		FONT {Helvetica}	/* local setting	*/
		CHRS {Hello }		/* uses font Helvetica	*/
		}

	FORM TEXT {
		CHRS {there.}	/* uses font TimesRoman	*/
		}
	}

The shared property assignments selectively override the reader's 
global defaults, but only for FORMs within the group. A FORM's own 
property assignments selectively override the global and group-supplied 
values. So when reading an IFF file, keep property settings on a stack. 
They're designed to be small enough to hold in main memory.

Shared properties are semantically equivalent to copying those properties 
into each of the nested FORMs right after their FORM type IDs.
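
The override order described above can be stated as a small lookup.
This is only a sketch: the function name is hypothetical, settings
are represented as plain strings, and NULL is assumed to mean "not
set at this level".

```c
#include <assert.h>
#include <stddef.h>

/* Pick the effective setting: a FORM's own property wins over the
 * LIST's shared PROP setting, which wins over the reader's default. */
const char *effective_font(const char *global_default,
                           const char *shared,   /* from PROP TEXT, or NULL */
                           const char *local)    /* from FORM TEXT, or NULL */
{
    assert(global_default != NULL);
    if (local != NULL)
        return local;
    if (shared != NULL)
        return shared;
    return global_default;
}
```

With the example file above, the first FORM TEXT resolves to
Helvetica and the second to TimesRoman; a FORM TEXT outside any LIST
would fall back to the reader's default, Courier.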

Properties for LIST

Optional "properties for LIST" store the origin of the list's contents 
in a PROP chunk for the fake FORM type "LIST". They are the properties 
originating program "OPGM", processor family "OCPU", computer type 
"OCMP", computer serial number or network address "OSN ", and user 
name "UNAM". In our imperfect world, these could be called upon to 
distinguish between unintended variations of a data format or to work 
around bugs in particular originating/receiving program pairs. Issue: 
Specify the format of these properties.

A creation date could also be stored in a property but let's ask that 
file creating, editing, and transporting programs maintain the correct 
date in the local file system. Programs that move files between machine 
types are expected to copy across the creation dates.



6. Standard File Structure

File Structure Overview

An IFF file is just a single chunk of type FORM, LIST, or CAT. Therefore 
an IFF file can be recognized by its first 4 bytes: "FORM", "LIST", 
or "CAT ". Any file contents after the chunk's end are to be ignored.
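
A minimal sketch of that recognition test, assuming the caller has
already read the start of the file into memory. The function name is
hypothetical and is not taken from the IFF reader package.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer begins with one of the three group IDs
 * ("FORM", "LIST", or "CAT "), 0 otherwise. */
int looks_like_iff(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return 0;
    return memcmp(buf, "FORM", 4) == 0
        || memcmp(buf, "LIST", 4) == 0
        || memcmp(buf, "CAT ", 4) == 0;
}
```

Note that "PROP" is deliberately not accepted: a PROP may only appear
inside a LIST, never as the outermost chunk of a file.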

Since an IFF file can be a group of objects, programs that read/write 
single objects can communicate to an extent with programs that read/write 
groups. You're encouraged to write programs that handle all the objects 
in a LIST or CAT. A graphics editor, for example, could process a 
list of pictures as a multiple page document, one page at a time.

Programs should enforce IFF's syntactic rules when reading and writing 
files. This ensures robust data transfer. The public domain IFF reader/writer 
subroutine package does this for you. A utility program "IFFCheck" 
is available that scans an IFF file and checks it for conformance 
to IFF's syntactic rules. IFFCheck also prints an outline of the chunks 
in the file, showing the ckID and ckSize of each. This is quite handy 
when building IFF programs. Example programs are also available to 
show details of reading and writing IFF files.

A merge program "IFFJoin" will be available that logically appends 
IFF files into a single CAT group. It "unwraps" each input file that 
is a CAT so that the combined file isn't nested CATs.

If we need to revise the IFF standard, the three anchoring IDs will 
be used as "version numbers". That's why IDs "FOR1" through "FOR9", 
"LIS1" through "LIS9", and "CAT1" through "CAT9" are reserved.

IFF formats are designed for reasonable performance with floppy disks. 
We achieve considerable simplicity in the formats and programs by 
relying on the host file system rather than defining universal grouping 
structures like directories for LIST contents. On huge storage systems, 
IFF files could be leaf nodes in a file structure like a B-tree. Let's 
hope the host file system implements that for us!

There are two kinds of IFF files: single purpose files and scrap files. 
They differ in the interpretation of multiple data objects and in 
the file's external type.

Single Purpose Files

A single purpose IFF file is for normal "document" and "archive" storage. 
This is in contrast with "scrap files" (see below) and temporary backing 
storage (non-interchange files).

The external file type (or filename extension, depending on the host 
file system) indicates the file's contents. It's generally the FORM 
type of the data contained, hence the restrictions on FORM type IDs.

Programmers and users may pick an "intended use" type as the filename 
extension to make it easy to filter for the relevant files in a filename 
requestor. This is actually a "subclass" or "subtype" that conveniently 
separates files of the same FORM type that have different uses. A program 
cannot demand conformity to its expected subtypes without overly restricting 
data interchange, since it cannot know about the subtypes to be used 
by future programs that users will want to exchange data with.

Issue: How to generate 3-letter MS-DOS extensions from 4-letter FORM 
type IDs?

Most single purpose files will be a single FORM (perhaps a composite 
FORM like a musical score containing nested FORMs like musical instrument 
descriptions). If it's a LIST or a CAT, programs should skip over 
unrecognized objects to read the recognized ones or the first recognized 
one. Then a program that can read a single purpose file can read something 
out of a "scrap file", too.

Scrap Files

A "scrap file" is for maximum interconnectivity in getting data between 
programs; the core of a clipboard function. Scrap files may have type 
"IFF " or filename extension ".IFF".

A scrap file is typically a CAT containing alternate representations 
of the same basic information. Include as many alternatives as you 
can readily generate. This redundancy improves interconnectivity in 
situations where we can't make all programs read and write super-general 
formats. [Inside Macintosh chapter "Scrap Manager".] E.g. a graphically-
annotated musical score might be supplemented by a stripped down 4-voice 
melody and by a text (the lyrics).

The originating program should write the alternate representations 
in order of "preference": most preferred (most comprehensive) type 
to least preferred (least comprehensive) type. A receiving program 
should either use the first appearing type that it understands or 
search for its own "preferred" type.

A scrap file should have at most one alternative of any type. (A LIST 
of same type objects is ok as one of the alternatives.) But don't 
count on this when reading; ignore extra sections of a type. Then 
a program that reads scrap files can read something out of single 
purpose files.
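
The "first appearing type that it understands" rule might be sketched
as follows. The function name is hypothetical, and the FORM types in
the usage note are only one plausible mapping of the musical score
example onto IDs; the text above does not name them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first alternative in the scrap file whose
 * FORM type the reader understands, or -1 if there is none.
 * in_file lists the FORM types in file order (most preferred first). */
int first_understood(const char *const *in_file, size_t n_file,
                     const char *const *understood, size_t n_understood)
{
    size_t i, j;

    assert(n_file == 0 || in_file != NULL);
    assert(n_understood == 0 || understood != NULL);
    for (i = 0; i < n_file; i++)
        for (j = 0; j < n_understood; j++)
            if (strncmp(in_file[i], understood[j], 4) == 0)
                return (int)i;
    return -1;
}
```

For a scrap file holding, say, GSCR, SMUS, and FTXT in that order, a
reader that understands FTXT and SMUS would take the SMUS alternative,
because it appears earlier in the file. Taking the first match also
ignores any extra sections of the same type, as advised above.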

Rules for Reader Programs

Here are some notes on building programs that read IFF files. If you 
use the standard IFF reader module "IFFR.C", many of these rules and 
details will be automatically handled. (See "Support Software" in 
Appendix A.) We recommend that you start from the example program 
"ShowILBM.C". You should also read up on recursive descent parsers. 
[See, for example, Compiler Construction, An Advanced Course.]

%	The standard is very flexible so many programs can exchange 
data. This implies a program has to scan the file and react to what's 
actually there in whatever order it appears. An IFF reader program 
is a parser.

%	For interchange to really work, programs must be willing to 
do some conversion during read-in. If the data isn't exactly what 
you expect, say, the raster is smaller than those created by your 
program, then adjust it. Similarly, your program could crop a large 
picture, add or drop bitplanes, and create/discard a mask plane. The 
program should give up gracefully on data that it can't convert.

%	If it doesn't start with "FORM", "LIST", or "CAT ", it's not 
an IFF-85 file.

%	For any chunk you encounter, you must recognize its type ID 
to understand its contents.

%	For any FORM chunk you encounter, you must recognize its FORM 
type ID to understand the contained "local chunks". Even if you don't 
recognize the FORM type, you can still scan it for nested FORMs, LISTs, 
and CATs of interest.

%	Don't forget to skip the pad byte after every odd-length chunk.

%	Chunk types LIST, FORM, PROP, and CAT are generic groups. They 
always contain a subtype ID followed by chunks.

%	Readers ought to handle a CAT of FORMs in a file. You may treat 
the FORMs like document pages to sequence through or just use the 
first FORM.

%	Simpler IFF readers completely skip LISTs. "Fully IFF-conforming" 
readers are those that handle LISTs, even if just to read the first 
FORM from a file. If you do look into a LIST, you must process shared 
properties (in PROP chunks) properly. The idea is to get the correct 
data or none at all.

%	The nicest readers are willing to look into unrecognized FORMs 
for nested FORM types that they do recognize. For example, a musical 
score may contain nested instrument descriptions and an animation 
file may contain still pictures.
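
Several of the rules above come down to stepping correctly from one
chunk to the next. Here is a minimal sketch, assuming the group's
contents are already in a memory buffer and that ckSize is stored
most significant byte first (68000 order). The function names are
hypothetical; the standard module IFFR.C may do this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Read a 32-bit value stored most significant byte first. */
static unsigned long get_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Offset of the chunk that follows the one at `off`: 8 header bytes,
 * ckSize data bytes, plus one pad byte when ckSize is odd. */
unsigned long next_chunk_offset(const unsigned char *buf, unsigned long off)
{
    unsigned long ckSize;

    assert(buf != NULL);
    ckSize = get_be32(buf + off + 4);
    return off + 8 + ckSize + (ckSize & 1);
}

/* Count the complete chunks in buf[0..len), stopping at the first
 * chunk that would run past the end of the buffer. */
int count_chunks(const unsigned char *buf, unsigned long len)
{
    unsigned long off = 0;
    int n = 0;

    while (off + 8 <= len) {
        unsigned long next = next_chunk_offset(buf, off);
        if (next > len || next <= off)
            break;                  /* truncated or nonsensical ckSize */
        n++;
        off = next;
    }
    return n;
}
```

A real reader would dispatch on the ckID at each step instead of
merely counting, and would recurse into FORM, LIST, and CAT chunks.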

Note to programmers: Processing PROP chunks is not simple! You'll 
need some background in interpreters with stack frames. If this is 
foreign to you, build programs that read/write only one FORM per file. 
For the more intrepid programmers, the next paragraph summarizes how 
to process LISTs and PROPs. See the general IFF reader module "IFFR.C" 
and the example program "ShowILBM.C" for details.

Allocate a stack frame for every LIST and FORM you encounter and initialize 
it by copying the stack frame of the parent LIST or FORM. At the top 
level, you'll need a stack frame initialized to your program's global 
defaults. While reading each LIST or FORM, store all encountered properties 
into the current stack frame. In the example ShowILBM, each stack 
frame has a place for a bitmap header property ILBM.BMHD and a color 
map property ILBM.CMAP. When you finally get to the ILBM's BODY chunk, 
use the property settings accumulated in the current stack frame.
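
A minimal sketch of that stack-frame scheme follows. Everything here
is an assumption made for illustration: the Frame layout, the fixed
nesting limit, and the function names are not taken from IFFR.C or
ShowILBM.C, and property data is represented by a bare pointer. The
sketch also glosses over the rule that a PROP applies only to FORMs
of its own FORM type.

```c
#include <assert.h>
#include <stddef.h>

/* One frame per LIST or FORM being read. NULL means "not set". */
typedef struct {
    const char *bmhd;   /* stands in for ILBM.BMHD data */
    const char *cmap;   /* stands in for ILBM.CMAP data */
} Frame;

#define MAX_DEPTH 16
static Frame stack[MAX_DEPTH];
static int depth = 0;               /* stack[0] holds the global defaults */

/* Call once with the program's global defaults. */
void init_frames(Frame defaults)
{
    stack[0] = defaults;
    depth = 0;
}

/* On entering a LIST or FORM: push a copy of the parent's frame.
 * Returns 0 if the nesting is deeper than this sketch supports. */
int enter_group(void)
{
    if (depth + 1 >= MAX_DEPTH)
        return 0;
    stack[depth + 1] = stack[depth];
    depth++;
    return 1;
}

/* On leaving the group: its settings go out of scope. */
void leave_group(void)
{
    assert(depth > 0);
    depth--;
}

/* Properties read from the file are stored into this frame. */
Frame *current_frame(void)
{
    return &stack[depth];
}
```

Because each new frame starts as a copy of its parent, a FORM that
sets no CMAP of its own simply sees the shared or default one, and
whatever it does set disappears again when the FORM ends.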

An alternate implementation would just remember PROPs encountered, 
forgetting each on reaching the end of its scope (the end of the containing 
LIST). When a FORM XXXX is encountered, scan the chunks in all remembered 
PROPs XXXX, in order, as if they appeared before the chunks actually 
in the FORM XXXX. This gets trickier if you read FORMs inside of FORMs.

Rules for Writer Programs

Here are some notes on building programs that write IFF files, which 
is much easier than reading them. If you use the standard IFF writer 
module "IFFW.C" (see "Support Software" in Appendix A), many of these 
rules and details will automatically be enforced. See the example 
program "Raw2ILBM.C".

%	An IFF file is a single FORM, LIST, or CAT chunk.

%	Any IFF-85 file must start with the 4 characters "FORM", "LIST", 
or "CAT ", followed by a LONG ckSize. There should be no data after 
the chunk end.

%	Chunk types LIST, FORM, PROP, and CAT are generic. They always 
contain a subtype ID followed by chunks. These four IDs are universally 
reserved, as are "LIS1" through "LIS9", "FOR1" through "FOR9", "CAT1" 
through "CAT9", and "    ".

%	Don't forget to write a 0 pad byte after each odd-length chunk.

%	Four techniques for writing an IFF group: (1) build the data 
in a file mapped into virtual memory, (2) build the data in memory 
blocks and use block I/O, (3) stream write the data piecemeal and 
(don't forget!) random access back to set the group length count, 
and (4) make a preliminary pass to compute the length count then stream 
write the data.

%	Do not try to edit a file that you don't know how to create. 
Programs may look into a file and copy out nested FORMs of types that 
they recognize, but don't edit and replace the nested FORMs and don't 
add or remove them. That could make the containing structure inconsistent. 
You may write a new file containing items you copied (or copied and 
modified) from another IFF file, but don't copy structural parts you 
don't understand.

%	You must adhere to the syntax descriptions in Appendix A. E.g. 
PROPs may only appear inside LISTs.
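
Technique (3) above, together with the pad byte rule, might look
roughly like this with the standard C library. This is a sketch under
stated assumptions, not the code of IFFW.C: the function names are
hypothetical, the file must be seekable, and ckSize is written most
significant byte first (68000 order).

```c
#include <assert.h>
#include <stdio.h>

/* Write a 32-bit value most significant byte first. */
static int put_be32(FILE *f, unsigned long v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, 4, f) == 4;
}

/* Start a chunk: write the ID and a placeholder ckSize.
 * Returns the file position of the ckSize field, or -1 on error. */
long begin_chunk(FILE *f, const char id[4])
{
    long pos;

    assert(f != NULL);
    if (fwrite(id, 1, 4, f) != 4)
        return -1;
    pos = ftell(f);
    if (pos < 0 || !put_be32(f, 0))
        return -1;
    return pos;
}

/* Finish a chunk: add the pad byte if the data length is odd, then
 * seek back and fill in ckSize (which does not count the pad byte). */
int end_chunk(FILE *f, long size_pos)
{
    long end = ftell(f);
    unsigned long size;

    if (end < 0)
        return 0;
    size = (unsigned long)(end - (size_pos + 4));
    if ((size & 1) && fputc(0, f) == EOF)
        return 0;
    if (fseek(f, size_pos, SEEK_SET) != 0)
        return 0;
    if (!put_be32(f, size))
        return 0;
    return fseek(f, 0, SEEK_END) == 0;
}
```

To write a FORM, call begin_chunk for "FORM", write the 4-byte FORM
type, bracket each local chunk with its own begin_chunk and end_chunk
pair, and finally call end_chunk for the FORM itself. The FORM's
ckSize then counts the pad bytes of the chunks it contains.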




Appendix A. Reference

Type Definitions

The following C typedefs describe standard IFF structures. Declarations 
to use in practice will vary with the CPU and compiler. For example, 
68000 Lattice C produces efficient comparison code if we define ID 
as a "LONG". A macro "MakeID" builds these IDs at compile time.

/* Standard IFF types, expressed in 68000 Lattice C.	*/

typedef unsigned char UBYTE;	/*  8 bits unsigned	*/
typedef short WORD;	/* 16 bits signed	*/
typedef unsigned short UWORD;	/* 16 bits unsigned	*/
typedef long LONG;	/* 32 bits signed	*/

typedef char ID[4];	/* 4 chars in ' ' through '~'	*/

typedef struct {
	ID	ckID;
	LONG	ckSize;	/* sizeof(ckData)	*/
	UBYTE	ckData[/* ckSize */];
	} Chunk;

/* ID typedef and builder for 68000 Lattice C. */
typedef LONG ID; 	/* 4 chars in ' ' through '~'	*/
#define MakeID(a,b,c,d) ( (a)<<24 | (b)<<16 | (c)<<8 | (d) )

/* Globally reserved IDs. */
#define ID_FORM   MakeID('F','O','R','M')
#define ID_LIST   MakeID('L','I','S','T')
#define ID_PROP   MakeID('P','R','O','P')
#define ID_CAT    MakeID('C','A','T',' ')
#define ID_FILLER MakeID(' ',' ',' ',' ')
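
On other compilers and CPUs the same idea can be written with
explicit unsigned casts, and an ID can be assembled from the file's
bytes so that comparisons do not depend on the host's byte order.
This is a sketch only; MAKE_ID, ID32, and id_from_bytes are
hypothetical names, not part of IFF.H.

```c
#include <assert.h>

typedef unsigned long ID32;     /* at least 32 bits in standard C */

#define MAKE_ID(a,b,c,d) \
    ( ((ID32)(a) << 24) | ((ID32)(b) << 16) | \
      ((ID32)(c) << 8)  |  (ID32)(d) )

/* Assemble an ID from the 4 bytes as they appear in the file. The
 * first byte becomes the most significant, matching MAKE_ID. */
ID32 id_from_bytes(const unsigned char *p)
{
    assert(p != 0);
    return MAKE_ID(p[0], p[1], p[2], p[3]);
}
```

A reader can then compare id_from_bytes(header) against constants
such as MAKE_ID('F','O','R','M') with a single integer comparison.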

Syntax Definitions

Here's a collection of the syntax definitions in this document.

Chunk	::= ID #{ UBYTE* } [0]

Property	::= Chunk

FORM	::= "FORM" #{ FormType (LocalChunk | FORM | LIST | CAT)* }
FormType	::= ID
LocalChunk	::= Property | Chunk

CAT	::= "CAT " #{ ContentsType (FORM | LIST | CAT)* }
ContentsType	::= ID	-- a hint or an "abstract data type" ID

LIST	::= "LIST" #{ ContentsType PROP* (FORM | LIST | CAT)* }
PROP	::= "PROP" #{ FormType Property* }

In this extended regular expression notation, the token "#" represents 
a ckSize LONG count of the following {braced} data bytes. Literal 
items are shown in "quotes", [square bracketed items] are optional, 
and "*" means 0 or more instances. A sometimes-needed pad byte is 
shown as "[0]".

Defined Chunk IDs

This is a table of currently defined chunk IDs. We may also borrow 
some Macintosh IDs and data formats.

Group chunk IDs
	FORM, LIST, PROP, CAT.
Future revision group chunk IDs
	FOR1 through FOR9, LIS1 through LIS9, CAT1 through CAT9.
FORM type IDs
	(The above group chunk IDs may not be used for FORM type IDs.)
	(Lower case letters and punctuation marks are forbidden in FORM 
type IDs.)
	8SVX 8-bit sampled sound voice, ANBM animated bitmap, FNTR raster 
font, FNTV vector font, FTXT formatted text, GSCR general-use musical 
score, ILBM interleaved raster bitmap image, PDEF Deluxe Print page 
definition, PICS Macintosh picture, PLBM (obsolete), USCR Uhuru Sound 
Software musical score, UVOX Uhuru Sound Software Macintosh voice, 
SMUS simple musical score, VDEO Deluxe Video Construction Set video.
Data chunk IDs
	"    ", TEXT, PICT.
PROP LIST property IDs
	OPGM, OCPU, OCMP, OSN, UNAM.



Support Software

These public domain C source programs are available for use in building 
IFF-compatible programs:

IFF.H, IFFR.C, IFFW.C	

		IFF reader and writer package. 
		These modules handle many of the details of reliably 
		reading and writing IFF files.

IFFCheck.C	This handy utility program scans an IFF file, checks 
		that the contents are well formed, and prints an outline 
		of the chunks.

PACKER.H, Packer.C, UnPacker.C	

		Run encoder and decoder used for ILBM files.

ILBM.H, ILBMR.C, ILBMW.C	

		Reader and writer support routines for raster image 
		FORM ILBM. ILBMR calls IFFR and UnPacker. ILBMW calls 
		IFFW and Packer.

ShowILBM.C	
		Example caller of IFFR and ILBMR modules. This 
		Commodore-Amiga program reads and displays a FORM ILBM.
Raw2ILBM.C	
		Example ILBM writer program. As a demonstration, it 
		reads a raw raster image file and writes the image 
		as a FORM ILBM file.
ILBM2Raw.C	
		Example ILBM reader program.  Reads a FORM ILBM file
		and writes it into a raw raster image.

REMALLOC.H, Remalloc.c

		Memory allocation routines used in these examples.

INTUALL.H	generic "include almost everything" include-file
		with the sequence of includes correctly specified.

READPICT.H, ReadPict.c	

		given an ILBM file, read it into a bitmap and 
		a color map

PUTPICT.H, PutPict.c 	

		given a bitmap and a color map, save it as
		an ILBM file.

GIO.H, Gio.c	generic I/O speedup package.  Attempts to speed
		disk I/O by buffering writes and reads.

giocall.c	sample call to gio.

ilbmdump.c	reads in ILBM file, prints out ascii representation
		for including in C files.

bmprintc.c	prints out a C-language representation of data for
		a bitmap.



Example Diagrams

Here's a box diagram for an example IFF file, a raster image FORM 
ILBM. This FORM contains a bitmap header property chunk BMHD, a color 
map property chunk CMAP, and a raster data chunk BODY. This particular 
raster is 320 x 200 pixels x 3 bit planes uncompressed. The "0" after 
the CMAP chunk represents a zero pad byte; included since the CMAP 
chunk has an odd length. The text to the right of the diagram shows 
the outline that would be printed by the IFFCheck utility program 
for this particular file.

	+-----------------------------------+
	|'FORM'		24070		    |	FORM 24070 ILBM
	+-----------------------------------+
	|'ILBM'				    |
	+-----------------------------------+
	| +-------------------------------+ |
	| | 'BMHD'	20		  | |	.BMHD  20
	| | 320, 200, 0, 0, 3, 0, 0, ...  | |
	| +-------------------------------+ |
	| | 'CMAP'	21	          | |	.CMAP  21
	| | 0, 0, 0; 32, 0, 0; 64,0,0; .. | |
	| +-------------------------------+ |
	| 0				    |
	+-----------------------------------+
	|'BODY'		24000		    |	.BODY 24000
	|0, 0, 0, ...			    |
	+-----------------------------------+

This second diagram shows a LIST of two FORMs ILBM sharing a common 
BMHD property and a common CMAP property. Again, the text on the right 
is an outline a la IFFCheck.


     +-----------------------------------------+
     |'LIST'		48114	       	       |  LIST  48114  AAAA
     +-----------------------------------------+
     |'AAAA'				       |
     |	+-----------------------------------+  |
     |  |'PROP'		62		    |  |  .PROP  62  ILBM
     |  +-----------------------------------+  |
     |	|'ILBM'				    |  |
     |	+-----------------------------------+  |
     |	| +-------------------------------+ |  |
     |	| | 'BMHD'	20		  | |  |  ..BMHD  20
     |	| | 320, 200, 0, 0, 3, 0, 0, ...  | |  |
     |	| +-------------------------------+ |  |
     |	| | 'CMAP'	21	          | |  |  ..CMAP  21
     |	| | 0, 0, 0; 32, 0, 0; 64,0,0; .. | |  |
     |	| +-------------------------------+ |  |
     |	| 0				    |  |
     |	+-----------------------------------+  |
     |	+-----------------------------------+  |
     |	|'FORM'		24012		    |  |  .FORM  24012  ILBM
     |	+-----------------------------------+  |
     |	|'ILBM'				    |  |  
     |	+-----------------------------------+  |
     |	|  +-----------------------------+  |  |
     |	|  |'BODY'		24000    |  |  |  ..BODY  24000
     |	|  |0, 0, 0, ...		 |  |  |
     |	|  +-----------------------------+  |  |
     |	+-----------------------------------+  |
     |	+-----------------------------------+  |
     |	|'FORM'		24012		    |  |  .FORM  24012  ILBM
     |	+-----------------------------------+  |
     |	|'ILBM'				    |  |
     |	+-----------------------------------+  |
     |	|  +-----------------------------+  |  |
     |	|  |'BODY'		24000    |  |  |  ..BODY  24000
     |	|  |0, 0, 0, ...		 |  |  |
     |	|  +-----------------------------+  |  |
     |	+-----------------------------------+  |
     +-----------------------------------------+
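
The ckSize values in the two diagrams above can be checked with a
little arithmetic. The sketch below uses hypothetical helper names
and assumes only the rules already stated: each chunk occupies 8
header bytes plus its data plus a pad byte if the data length is odd,
and a group's ckSize is its 4-byte type ID plus everything it
contains.

```c
#include <assert.h>

/* Bytes a chunk occupies inside its container: 8-byte header, ckSize
 * data bytes, and a pad byte when ckSize is odd. */
unsigned long chunk_span(unsigned long ckSize)
{
    return 8 + ckSize + (ckSize & 1);
}

/* ckSize of a group (FORM, LIST, PROP, or CAT): the 4-byte type ID
 * plus the spans of the n chunks it contains. */
unsigned long group_size(const unsigned long *child_ckSizes, int n)
{
    unsigned long total = 4;
    int i;

    assert(child_ckSizes != 0 || n == 0);
    for (i = 0; i < n; i++)
        total += chunk_span(child_ckSizes[i]);
    return total;
}
```

For the first diagram, the children BMHD (20), CMAP (21), and BODY
(24000) give 4 + 28 + 30 + 24008 = 24070. For the second, the PROP is
4 + 28 + 30 = 62, each FORM is 4 + 24008 = 24012, and the LIST is
4 + 70 + 24020 + 24020 = 48114.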



Appendix B. Standards Committee

The following people contributed to the design of this IFF standard:

Bob "Kodiak" Burns, Commodore-Amiga
R. J. Mical, Commodore-Amiga
Jerry Morrison, Electronic Arts
Greg Riker, Electronic Arts
Steve Shaw, Electronic Arts
Barry Walsh, Commodore-Amiga

--------------7130410F42AE--