Original version: Mon May 30 08:24:59 2005
Last updates: Wed Jun 1 16:24:06 2005

mm (mail manager) client

This directory contains releases of the C version of the TOPS-20 mm (mail manager) client and the supporting command parser library, ccmd. They were translated from the PDP-10 assembly code version, developed primarily by Mark Crispin at Stanford University, to C at Columbia University in New York City in 1987--1988 by Fuat Baran, Howie Kaye, Andrew Lowry, Chris Maio, and Melissa Metz. Further maintenance and porting work has been done at the University of Utah by Nelson H. F. Beebe.

More on the development history is available at the Columbia mm Web site, and earlier versions are available at the Columbia FTP site.

mm is a fantastic mail client that many of us have used since its first release in 1978, both on TOPS-20 and subsequent Unix systems. Particularly with the huge surge in e-mail volume at the end of the 1990s, for many of us, no other mail client would have been practical, because none offers the power and convenience of mm for speedy processing of e-mail.

The version of mm available here is newer than the Columbia code, and after running for 15+ years on various Sun SunOS and Solaris releases at the University of Utah, has now been updated to build and run on these current operating systems and CPU architectures:

The code is expected to build and run without problems on other CPU architectures and O/S versions for the BSD family, on other distributions and O/S versions for GNU/Linux, and very likely on newer releases of the commercial Unix systems listed above. Together, the listed systems account for almost all desktop Unix computing worldwide. The one unfortunate omission is IBM AIX on PowerPC and System/390: I no longer have access to either of those platforms.


Limitations of the ccmd command parser

Although the PDP-10 had byte instructions to handle any byte size from 1 to 36, during the long life of that system (1967--1990, and the machine still runs in 2005 with at least two excellent software implementations of the PDP-10 hardware), text files were conventionally stored as 7-bit bytes in the standard ASCII character set. Regrettably, when the C translation of the TOPS-20 command parser was done at Columbia, an important opportunity to expand support to 8-bit bytes was lost by a programming decision to use the high-order bit of 8-bit character data as a flag bit in internal data structures. At Utah, I spent several weeks of work in 1998 in an attempt to remove this unfortunate limitation, but ultimately, I gave up in defeat. What is badly needed is a complete ground-up redesign of the ccmd parser package, for support not only of 8-bit bytes, but Unicode in UTF-8 encoding, and possibly also UTF-16 and UTF-32.
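To make the limitation concrete, here is a minimal sketch of what happens when the high-order bit of a byte is reserved as a flag. The names CHAR_FLAG, CHAR_MASK, pack_char, unpack_char, is_flagged, and round_trips are hypothetical and are not taken from the ccmd sources, whose actual data structures may well differ; the sketch only illustrates the general technique and why it leaves room for 7-bit characters only.

```c
#include <assert.h>

/* Hypothetical names: none of these come from the ccmd sources. */
#define CHAR_FLAG 0x80u  /* high-order bit reused as an internal marker */
#define CHAR_MASK 0x7Fu  /* the seven bits left over for the character  */

/* Store a character together with a one-bit flag in a single byte. */
unsigned char pack_char(unsigned char c, int flagged)
{
    return (unsigned char)((c & CHAR_MASK) | (flagged ? CHAR_FLAG : 0u));
}

/* Recover the character; any bit above 0x7F has already been discarded. */
unsigned char unpack_char(unsigned char packed)
{
    return (unsigned char)(packed & CHAR_MASK);
}

/* Report whether the flag bit is set on a packed byte. */
int is_flagged(unsigned char packed)
{
    return (packed & CHAR_FLAG) != 0;
}

/* Report whether a character survives a pack/unpack round trip. */
int round_trips(unsigned char c)
{
    return unpack_char(pack_char(c, 0)) == c;
}

/* 7-bit ASCII is preserved, but 0xE9 (e-acute in ISO 8859-1)
   comes back as 0x69, which is the letter 'i'. */
void demo_high_bit_loss(void)
{
    assert(unpack_char(pack_char('A', 1)) == 'A');
    assert(is_flagged(pack_char('A', 1)));
    assert(unpack_char(pack_char(0xE9, 0)) == 0x69);
}
```

Every byte of a multi-byte UTF-8 sequence has its high-order bit set, so a scheme of this kind cannot carry UTF-8 text either; the flag would have to move into separate storage.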

I do not have time to undertake this project myself. It is not a small task: the current version has just over 21K lines of C code, curiously, just a bit bigger than each of Donald Knuth's TeX typesetting system and METAFONT font design system. Nevertheless, because of the great end-user convenience of TOPS-20-style command parsing with completion, prompting, and in-line help, it would be very useful for someone to do this redesign. Both mm and kermit use this parser, and a well-done, properly documented, and highly portable redesign could make it widely available to other software on GNU, Unix (including Mac OS X), and Microsoft Windows systems, and encourage its use as the de facto standard way for interactive programs to communicate with users. [Many of us believe that GUIs are not the universal solution to the human/computer interface, even though they are useful for some tasks.] In other words, you just might get famous if you can succeed at this task.

Further comments on the code

When the translation to C was carried out by the Columbia teams, at academic sites at least, Unix primarily meant BSD 4.x and its derivative, Sun's SunOS. Other commercial versions of Unix existed, such as Hewlett-Packard's HP-UX, but it was only later that such systems became more widely known.

The first IEEE Portable Operating Systems Interface specification appeared in 1986, but POSIX, as it is now called, did not become widely supported by vendors until the 1990s. For example, Sun's man 5 standards manual page reports this list:

    POSIX Standard   Description                        Release
    POSIX.1-1988     system interfaces and headers      SunOS 4.1
    POSIX.1-1990     POSIX.1-1988 update                Solaris 2.0
    POSIX.1b-1993    realtime extensions                Solaris 2.4
    POSIX.1c-1996    threads extensions                 Solaris 2.6
    POSIX.2-1992     shell and utilities                Solaris 2.5
    POSIX.2a-1992    interactive shell and utilities    Solaris 2.5
    POSIX.1-2001     POSIX.1-1990, POSIX.1b-1993,       Solaris 10
                     POSIX.1c-1996, POSIX.2-1992, and
                     POSIX.2a-1992 updates

Vendor compiler support of language and operating-system interface standards usually lags by five to ten years. There have since been POSIX updates as IEEE Std 1003.1-2001 and IEEE Std 1003.1-2004, but neither is yet supported.

The traditional way of dealing with differences in the operating-system interface in C programs was to use preprocessor conditionals, like this:

    #if defined(SYSTEM_X)
    ...code for system x...
    #elif defined(SYSTEM_Y)
    ...code for system y...
    #elif defined(SYSTEM_Z)
    ...code for system z...
    #else
    ...code for everywhere else...
    #endif

Unfortunately, that is messy, requires a lot of maintenance as operating systems evolve and the code is ported to new platforms, and is likely to be invalidated when the O/S changes radically, as has happened with most major O/S releases, whether commercial or free.

A better approach that is now widely used in the GNU Project and many other software packages, including the thousands of free and open-source distributions hosted by SourceForge.net, is to conditionalize code based on the needed feature, rather than on the platform, to write C code to be POSIX-conformant by default, and to restrict the code to that large subset of C that is also valid C++. These practices significantly expand the range of compilers that can be used to test the code, and the platforms that the code will run on: almost every commercial O/S, even the non-Unix ones, offers some degree of POSIX conformance.

The GNU autoconf system makes it relatively easy to describe the required features in a configure.in file and a companion Makefile.in file, and then have autoconf generate a highly portable configure shell script for feature tests, and have autoheader create a config.hin file that provides a template of all of the features. The code still contains conditionals, which now look like this:

    #if defined(HAVE_FEATURE_X)
    ...feature x code...
    #elif defined(HAVE_FEATURE_Y)
    ...feature y code...
    #elif defined(HAVE_FEATURE_Z)
    ...feature z code...
    #else
    ...code for everywhere else...
    #endif

However, because of the POSIX-conformant coding, many fewer are needed.
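As a concrete instance of a feature-based conditional, the following sketch selects a file-locking call according to which HAVE_* symbol is defined. The three symbol names are the ones used in the mm system header files; the function lock_mail_file and the order of preference among the three mechanisms are assumptions made for illustration, and are not taken from the mm sources. When none of the symbols is defined, the function compiles to a no-op that reports success.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#if defined(HAVE_FLOCK)
#include <sys/file.h>
#endif

/* Hypothetical helper: try to take an exclusive, non-blocking lock on an
   open mail file.  Returns 0 on success (or when locking is compiled
   out), and -1 on failure. */
int lock_mail_file(int fd)
{
    assert(fd >= 0);
#if defined(HAVE_F_SETLK)
    {
        struct flock fl;

        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;     /* exclusive (write) lock      */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;            /* zero means "to end of file" */
        return (fcntl(fd, F_SETLK, &fl) == -1) ? -1 : 0;
    }
#elif defined(HAVE_FLOCK)
    return (flock(fd, LOCK_EX | LOCK_NB) == -1) ? -1 : 0;
#elif defined(HAVE_LOCKF)
    return (lockf(fd, F_TLOCK, 0) == -1) ? -1 : 0;
#else
    return 0;                    /* no locking configured       */
#endif
}
```

The caller never mentions an operating system by name: a port to a new platform only has to decide which HAVE_* symbols to define, either by hand or through a configure script.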

The human installer then runs the configure script, and its output is customized Makefile and config.h files that can be used to build, validate, and install the package almost anywhere by a single standard easily-memorized recipe:

    ./configure && make all check install

It would be very helpful if someone would take on the task of updating ccmd and mm for autoconfiguration. This is not a small task: at version 0.94, there are 1045 conditionals in the ccmd package, and 1697 in the mm package. Nevertheless, the job is still much smaller than the suggested redesign of ccmd.

If you are interested in tackling either of these projects, please inform Nelson H. F. Beebe of your intent. He volunteers to coordinate these activities to avoid duplication of effort.

System dependencies

The mm code is careful to extract site-dependent settings into easily customizable files matching the filename pattern s-*.h. Although these files are named according to the base operating system and sometimes the O/S version, such as s-freebsd.h and s-osf1-5.h, it is not necessarily true that the settings there will hold at another site.

Each of these system header files contains the settings of around 40 to 50 parameters. Many of them are common to all instances of that platform, but a few are not. Notable among them are the SENDMAIL, SPOOL_DIRECTORY, HAVE_F_SETLK, HAVE_FLOCK, and HAVE_LOCKF parameters.

The SENDMAIL setting is the absolute path to the local Message Transfer Agent (MTA), often sendmail or postfix. The distribution setting of /usr/sbin/sendmail is the one currently recommended by the sendmail developers, but the historical path has varied.

The SPOOL_DIRECTORY setting is the absolute path to the local directory where incoming mail arrives. Most current systems use /var/mail, but again, historical practice varies.
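The two path settings might be used along the following lines. This is only a sketch: the macro names and the two default values are the ones given above, but the exact form of the definitions in the s-*.h files is not shown here, and the helper spool_file_for is hypothetical, introduced only to show how a per-user spool file name could be derived from SPOOL_DIRECTORY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative site settings, using the default values named in the text. */
#define SENDMAIL        "/usr/sbin/sendmail"
#define SPOOL_DIRECTORY "/var/mail"

/* Hypothetical helper: write "SPOOL_DIRECTORY/username" into buf.
   Returns the length of the path on success, or -1 if it does not fit. */
int spool_file_for(const char *username, char *buf, size_t buflen)
{
    int n;

    assert(username != NULL && buf != NULL);
    n = snprintf(buf, buflen, "%s/%s", SPOOL_DIRECTORY, username);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

A site whose incoming mail arrives somewhere else would change only the SPOOL_DIRECTORY definition; nothing that calls the helper needs to be edited.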

The HAVE_F_SETLK, HAVE_FLOCK, and HAVE_LOCKF settings are the thorniest to deal with. A Mail User Agent (MUA) program like mm may move incoming mail from the SPOOL_DIRECTORY to a user-specified file, or it may use the incoming mail file as the mailbox. In either case, one has the problem that both the MTA and the MUA may simultaneously access the incoming mail file. Similarly, if two instances of the MUA are active (such as from home and office connections to a central computing facility), simultaneous access is possible. In either case, such access can lead to loss of incoming mail, or mailbox corruption.

The traditional approach to this problem is to use locks. These could be Unix filesystem range locks, or file locks, or the lock could be indicated by the presence of a suitably named file, such as username.lock. The problem with the first two is that they do not work reliably across a multivendor multiplatform NFS environment. The problem with the third is that all MTA and MUA programs must agree on a lock-naming convention. In all three, if the process that created the lock fails to remove the lock when it exits, a stale lock is left that can prevent subsequent operation. In the event of a kernel-based lock, the only way to remove it may be to copy the mail file to a temporary file, delete the original, and then rename the copy.
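The third approach, a lock indicated by the presence of a file, can be sketched as follows. The function names are hypothetical and the code is not taken from mm or from any MTA; it assumes a filesystem on which open() with O_CREAT and O_EXCL is atomic, and it assumes that every cooperating program has agreed on the same lock file name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: try to create the lock file atomically.
   Returns 1 if we now hold the lock, 0 if another process already
   holds it, and -1 on any other error. */
int acquire_lock_file(const char *lock_path)
{
    char pid_text[32];
    int fd, n;

    assert(lock_path != NULL);
    fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;

    /* Record our process ID so that a stale lock can be traced to its owner. */
    n = snprintf(pid_text, sizeof pid_text, "%ld\n", (long)getpid());
    if (n < 0 || write(fd, pid_text, (size_t)n) != (ssize_t)n) {
        close(fd);
        unlink(lock_path);
        return -1;
    }
    close(fd);
    return 1;
}

/* Remove the lock file.  If the owning process dies before calling
   this, the file remains behind as a stale lock. */
int release_lock_file(const char *lock_path)
{
    return unlink(lock_path);
}
```

Nothing in this scheme removes the file when the owner crashes, which is exactly the stale-lock problem described above: the next caller sees a return value of 0 indefinitely until someone deletes the file by hand.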

The sad result is that none of these approaches is reliable. When stale locks are left, few users know how to resolve the problem themselves, and most need intervention by systems staff before they can read mail again. With 15,000 users, this quickly becomes a large problem. At Utah, we have therefore run mm for about 18 years with all locking disabled. We caution our users to avoid running more than one MUA, and on those few occasions when they do and get mailbox corruption, we repair the problem manually. Dealing with stale locks would, in our view, be a noticeably larger headache.