Tag Types background information
This document describes some of the background to the tag types format,
why I think it is useful, and the reasons for the design choices.
Why a tag types format ?
By tagged data I mean ascii text in a form like:
Name1: value1
Name2: value2
...
There are many examples of tagged data already in use, but each one has
been invented to deal with a particular specific case, and often, while
the information is presented in a tagged style format, it is not
sufficiently regular to process by a program.
Examples of existing tagged data are:
- Bibliographic records (RFC1807)
- The LDAP Interchange format
- Linux LSM files
- Quicken Interchange Files
Design Goals
Interchange format
The intention is specifically for an interchange format, rather than one which
will provide an efficient implementation. It should be easy for an application
to read in a tagged file and turn it into its own internal representation,
and also for it to output its internal representation as a tagged file.
Semantic independant processing
By this I mean that it should be possible to perform useful operations on
tagged data without knowing what the particular tagged data is about.
A program should be able to extract information from a tagged file about
recipies as well as one about books at the level of 'find me all the records
which have this value in this field'
The benefit from this is that common tools can be written, allowing programs
which deal with the semantic content to concentrate on doing that well.
Easily understood format
Many recipients of tagged data will not have (initially at least) a program
which can deal with it. It should be understandable without needing a program
to interpret the information.
A side effect of this is that it should not be nescessary to send the same
information twice, i.e. Here is this information in ascii text for a person
to read, and now here it is again for automated processing.
Ability to encapsulate general text data unchanged
Many tagged formats have problems once they move beyond the plain
Tag: Value
style.
The end result of this tends to be that descriptions, or other text that
the creator wishes to preserve will be moved outside the tag format.
I wished to avoid rules like
- Continuation lines are indicated by some special character on the end of
a line
- Values continue until the next thing which looks like a tag.
Preserving general text within the format permits arbitary data which may
come from some more powerful format to be saved inside a tag record, and
extracted again by the same application.
Not trying to solve everything
Once you start to design more than one tag type for different applications
you realise that there are overlaps between different tag types. It is
very tempting to try to build some form of object oriented system which
could be expanded to handle every possible piece of data. The tag types
format is deliberately targetted at handling the easy cases in a
consistent manner. It is possible that integration between types may be
provided by some kind of standardisation within the tag descriptor files
at a later stage.
Other influences on the design
Why not use SGML
SGML (the parent standard to HTML) has had a profound influence on the
Internet, but it is more aimed at structuring documents which are targetted
at a human reader. It would be possible to do something similar to
the tag type description types with SGML
DTDs. There are 2 reasons for not going this route. One is that the tag
types format is lightweight, and should be capable of implementation within
personal organisers, pagers etc. The second is the experimental evidence that
people seem to be drawn to a tag type format whenever they invent something
like this.
Why not use ASN.1 / SNMP
The system used within SNMP which is implicitly capable of giving a unique
object identifier to everything in the world is very powerful, but the
encoding is too complex to use in mail messages. There may be a place for
using some of the ideas.