``MakeMan'' in Perl

Project Outline

Before we can start with anything else, we of course need a project outline: an idea of how we want to progress. The above is not a bad start, but the project requires refinement. What exactly will ``MakeMan'' do, and what should be the goal of the first release?

Obviously, the task at hand will involve a lot of text manipulation, so it seems reasonable to implement a first solution in Perl.

The first release shall be a program that takes an XML file as input and converts it into a valid man page, i.e. generates roff source. Like all good UNIX programs, the Perl implementation of ``MakeMan'' will operate on stdin and stdout if no filenames are specified through command-line options. Aside from some standard options such as -h (help) and -v (version), this should suffice for a simple application such as this.
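
The output half of that convention can be sketched as follows. This is only an illustration: the helper name open_output is hypothetical and not taken from the makeman.pl sources. The same pattern applies to the input side with STDIN.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a handle to write to.
# Falls back to STDOUT when no output file name was given.
sub open_output
{
	my ($outfile) = @_;
	return \*STDOUT unless defined $outfile;

	open(my $fh, '>', $outfile) or die "Cannot write $outfile: $!\n";
	return $fh;
}
```

A caller would then simply print to whatever handle comes back, without caring whether it is a file or the terminal.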

As mentioned above, SGML can be validated through the use of nsgmls, so we will make this tool a requirement for our program and expect input from stdin to be the output of nsgmls. Should the user specify an input file, we can simply call nsgmls ourselves, so that these files may be .sgml files.
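
A minimal sketch of this input handling, assuming a hypothetical helper open_parser_output (not the name used in makeman.pl). The second argument exists only so that the command can be swapped out, for example when nsgmls is not installed; by default the sketch runs plain "nsgmls FILE".

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a handle from which nsgmls output can be read.
# Without a file name we trust that STDIN already carries that output.
sub open_parser_output
{
	my ($infile, $parser) = @_;
	$parser = 'nsgmls' unless defined $parser;	# command name is overridable

	return \*STDIN unless defined $infile;

	# List form of a pipe open: no shell is involved,
	# so unusual file names need no quoting.
	open(my $fh, '-|', $parser, $infile)
		or die "Cannot run $parser on $infile: $!\n";
	return $fh;
}
```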

Let us now write down the specifications for the first release of the Perl implementation of ``MakeMan''. While we're at it, why not use the SGML template from above to:

  1. show a practical example
  2. show what the input and the output of ``makeman.pl'' should look like
  3. define what ``makeman.pl'' actually does

The listing below shows the SGML file describing the man page for ``makeman.pl'', which we will ultimately include in the first release.

``MakeMan'' man page in SGML



<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN">

<!--
	``MakeMan'' SGML Example Representation of a man page
	For more Information, please see http://mama.sourceforge.net

	$Author: Jan Schaumann <jschauma@netmeister.org> $
	$Id: makeman.pl.sgml, v 0.2 2001/08/24 15:11:02 jschauma Exp $
-->

<refentry id="makeman.pl">
	<refmeta>
		<refentrytitle>makeman.pl</refentrytitle>
		<manvolnum>1</manvolnum>
		<refmiscinfo class="date">August 24th, 2001</refmiscinfo>
		<refmiscinfo class="source">MakeMan</refmiscinfo>
		<refmiscinfo class="title">Writing Man Pages</refmiscinfo>
	</refmeta>

	<refnamediv>
		<refname>makeman.pl</refname>
		<refpurpose>parse an SGML man page into valid roff source</refpurpose>
	</refnamediv>

	<refsynopsisdiv>
		<cmdsynopsis>
			<command>makeman.pl</command>
				<arg choice="opt">-h</arg>
				<arg choice="opt">
					-i <arg choice="req">FILE</arg>
				</arg>
				<arg choice="opt">
					-o <arg choice="req">FILE</arg>
				</arg>
				<arg choice="opt">-v</arg>
		</cmdsynopsis>
	</refsynopsisdiv>

	<refsect1>
		<title>DESCRIPTION</title>
		<para>
			<command>makeman.pl</command> is part of
			<emphasis>MakeMan</emphasis>, a project to provide several 
			frontends, GUI and non-GUI, to an XML interface to write
			man pages.
        </para>
        <para>
            <command>makeman.pl</command>, written in Perl, parses an
			XML input file and generates valid roff source that can be read
            by <emphasis>man</emphasis>.
		</para>
	</refsect1>
	
	<refsect1>
		<title>USAGE</title>
		<para>
			Per default, <command>makeman.pl</command> reads the output of
			<command>nsgmls</command> from <emphasis>stdin</emphasis> and
			write roff sources to <emphasis>stdout</emphasis>. Alternatively,
			the input- and output-files can be specified as command-line
			options, in which case the input file may be the SGML source.
		</para>
	</refsect1>
	
	<refsect1>
		<title>OPTIONS</title>
		<para>
			A summary of the options supported by
			<command>makeman.pl</command> is included below.  
		</para>
		<variablelist>
			<varlistentry>
				<term>-h</term>
				<listitem>
					<para>Show summary of options and exit.</para>
				</listitem>
			</varlistentry>
			<varlistentry>
				<term>-i <emphasis>FILE</emphasis></term>
				<listitem>
					<para>Read SGML from <emphasis>FILE</emphasis>.</para>
				</listitem>
			</varlistentry>
			<varlistentry>
				<term>-o <emphasis>FILE</emphasis></term>
				<listitem>
					<para>Write output to <emphasis>FILE</emphasis>.</para>
				</listitem>
			</varlistentry>
			<varlistentry>
				<term>-v</term>
				<listitem>
					<para>Show version information and exit.</para>
				</listitem>
			</varlistentry>
		</variablelist>
	</refsect1>

	<refsect1>
		<title>EXAMPLES</title>
		<variablelist>
			<varlistentry>
				<term>In a pipe:</term>
				<listitem>
					<para>
						<command>nsgmls</command>
						<emphasis>infile.sgml</emphasis> | 
						<command>makeman.pl</command>
					</para>
				</listitem>
			</varlistentry>
			<varlistentry>
				<term>Alone:</term>
				<listitem>
					<para>
						<command>makeman.pl -i</command>
						<emphasis>infile.sgml</emphasis>
					</para>
				</listitem>
			</varlistentry>
		</variablelist>
	</refsect1>
	
	<refsect1>
		<title>REQUIRES</title>
		<para>
			Perl, SGMLSp, expat, nsgmls (part of SP)
		</para>
	</refsect1>

	<refsect1>
		<title>VERSION</title>
		<para>
			0.1
		</para>
	</refsect1>

	<refsect1>
		<title>BUGS</title>
		<para>
			None so far, not yet written.
		</para>
	</refsect1>

	<refsect1>
		<title>SEE ALSO</title>
		<para>
			<command>man(7)</command>
		</para>
		<para>
			<ulink url="http://mama.sourceforge.net">http://mama.sourceforge.net</ulink>
		</para>
	</refsect1>

	<refsect1>
		<title>AUTHOR</title>
		<para>
			Jan Schaumann <email>jschauma@netmeister.org</email>
		</para>
	</refsect1>
</refentry>

Let us now make sure that what we wrote down there is in fact a valid representation of a man page in SGML:


www:~/xml> nsgmls -s makeman.pl.sgml

Ok, nsgmls reports no errors, so everything looks peachy. Once we have run this input through our program ``makeman.pl'', we would like the output to look like Listing makeman.pl.1:

``MakeMan'' man page



.\"
.\" This page was created on 2001-08-24 15:22:34 by makeman.pl
.\" ``makeman.pl'' is part of the ``MakeMan'' project.
.\" For more information, please see http://mama.sourceforge.net
.\"
.TH makeman.pl 1 "August 24th, 2001" "MakeMan" "Writing Man Pages" 

.SH NAME
makeman.pl \- parse an SGML man page into valid roff source

.SH SYNOPSIS
\fBmakeman.pl\fR
[ \fI\-h\fR ] [ \fI
\-i \fR\fIFILE\fR ] [ \fI
\-o \fR\fIFILE\fR ] [ \fI\-v\fR ] 

.SH "DESCRIPTION"

.PP
\fBmakeman.pl\fR is part of
\fIMakeMan\fR, a project to provide several 
frontends, GUI and non\-GUI, to an XML interface to write
man pages.


.PP
\fBmakeman.pl\fR, written in Perl, parses an
XML input file and generates valid roff source that can be read
by \fIman\fR.

.SH "USAGE"

.PP
Per default, \fBmakeman.pl\fR reads the output of
\fBnsgmls\fR from \fIstdin\fR and
write roff sources to \fIstdout\fR. Alternatively,
the input\- and output\-files can be specified as command\-line
options, in which case the input file may be the SGML source.

.SH "OPTIONS"

.PP
A summary of the options supported by
\fBmakeman.pl\fR is included below.
.\" Begin List
.TP
\fB\-h\fR 
Show summary of options and exit.
.TP
\fB\-i \fR \fIFILE\fR
Read SGML from \fIFILE\fR.
.TP
\fB\-o \fR \fIFILE\fR
Write output to \fIFILE\fR.
.TP
\fB\-v\fR 
Show version information and exit.
.\" End List

.SH "EXAMPLES"
.\" Begin List
.TP
In a pipe:

\fBnsgmls\fR
\fIinfile.sgml\fR | 
\fBmakeman.pl\fR

.TP
Alone:

\fBmakeman.pl \-i\fR
\fIinfile.sgml\fR

.\" End List

.SH "REQUIRES"

.PP
Perl, SGMLSp, expat, nsgmls (part of SP)

.SH "VERSION"

.PP
0.1

.SH "BUGS"

.PP
None so far, not yet written.

.SH "SEE ALSO"

.PP
\fBman(7)\fR

.PP
http://mama.sourceforge.net (Link to \fIhttp://mama.sourceforge.net\fR)


.SH "AUTHOR"

.PP
Jan Schaumann <jschauma@netmeister.org>
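
One detail of the expected output above is that every hyphen from the SGML source appears as \-. A minimal sketch of such an escaping step follows. The helper name roff_escape is hypothetical, and the handling of literal backslashes is an assumption that the listing does not show; the real makeman.pl may do this differently.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: make a piece of text safe for roff output.
sub roff_escape
{
	my ($text) = @_;
	# Assumption (not shown in the listing): a literal backslash becomes \e.
	# This must happen first, since the next step adds backslashes itself.
	$text =~ s/\\/\\e/g;
	# As in the listing: every hyphen becomes \-
	$text =~ s/-/\\-/g;
	return $text;
}
```

With this, roff_escape("non-GUI") yields the string non\-GUI, as seen in the DESCRIPTION section of the expected output.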

Coding Style Revisited: Perl Issues

When writing Perl, please make it a habit to start each and every program with the following two lines:


#!/usr/bin/perl -w
use strict;

In addition, I would like to shamelessly copy, uhm, ``cite'' an excerpt from http://www.perl.com/pub/a/2000/01/CodingStandards.html:

  1. The verbosity of all names should be proportional to the scope of their use

  2. The plurality of a variable name should reflect the plurality of the data it contains. In Perl, $name is a single name, while @names is an array of names

  3. In general, follow the language's conventions in variable naming and other things. If the language uses variable_names_like_this, you should too. If it uses ThisKindOfName, follow that.

  4. Failing that, use UPPER_CASE for globals, StudlyCaps for classes, and lower_case for most other things. Note the distinction between words by using either underscores or StudlyCaps.

  5. Function or subroutine names should be verbs or verb clauses. It is unnecessary to start a function name with do_.

  6. Filenames should contain underscores between words, except where they are executables in $PATH. Filenames should be all lower case, except for class files which may be in StudlyCaps if the language's common usage dictates it.

Furthermore, give perldoc perlstyle a careful read.
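
To illustrate a few of these points, here is a small made-up example. None of the names come from makeman.pl; they are chosen only to show the conventions.

```perl
#!/usr/bin/perl -w
use strict;

our $VERSION = '0.1';		# UPPER_CASE for a global

# A verb clause as subroutine name, without a do_ prefix.
sub count_sections
{
	my (@titles) = @_;		# plural name: holds several titles
	my $count = 0;			# singular name: holds one number
	foreach my $title (@titles)
	{
		# Only non-empty titles are counted.
		$count++ if length $title;
	}
	return $count;
}
```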

makeman.pl

Ok, now that we got all these formalities out of the way, let us write some code already! The first thing I usually start out with - well, the first thing after outlining the project and its specifications - is a skeleton that parses the command-line arguments and initializes a few global variables (if any). In this case, the simple function init, shown in the listing below, performs this task.

Parsing command-line options



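use Getopt::Std;	# provides getopts()

# Parse the command-line options. $NAME, $VERSION, $INSTREAM and
# $OUTSTREAM are assumed to be globals declared elsewhere in the program.
sub init
{
	my %Options;
	my $ok = getopts('hi:o:v', \%Options);
	if (!$ok)
	{
		# getopts failed: look for an option that lacks its argument
		my $i;
		my @values = keys(%Options);
		foreach $i (@values)
		{
			if (!$Options{$i})
			{
				print "Option '$i' requires an argument.\n";
				exit(1);
			}
		}
	}

	# -h and -v print their information and exit immediately
	if ($Options{'h'})
	{
		usage();
		exit 0;
	}
	if ($Options{'v'})
	{
		print "$NAME Version $VERSION\n";
		exit 0;
	}
	# -i and -o override the default input and output
	if ($Options{'i'})
	{
		$INSTREAM = $Options{'i'};
	}
	if ($Options{'o'})
	{
		$OUTSTREAM = $Options{'o'};
	}
}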
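
The function usage called by init is not shown in this document. A minimal sketch of what it might look like follows, with the option descriptions taken from the man page above. The helper usage_text and the exact layout of the message are assumptions.

```perl
#!/usr/bin/perl -w
use strict;

our $NAME = 'makeman.pl';	# assumed to match the global used by init

# Build the help text from the options documented in the man page.
sub usage_text
{
	return <<"EOT";
Usage: $NAME [-h] [-i FILE] [-o FILE] [-v]
	-h       Show summary of options and exit.
	-i FILE  Read SGML from FILE.
	-o FILE  Write output to FILE.
	-v       Show version information and exit.
EOT
}

# Print the help text; init calls this when -h is given.
sub usage
{
	print usage_text();
}
```

Keeping the text in its own function makes it easy to reuse, for instance when an invalid option should print the same summary to stderr.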


We know that we will deal with XML, so let's take a short trip over to CPAN[26] and investigate whether some helpful modules might already be available. XML::Parser sounds promising: it ``is an interface to James Clark's XML parser, expat''. After installing expat, the usual routine installs the module:


tar zxvf XML-Parser-2.29.tar.gz
cd XML-Parser-2.29
perl Makefile.PL
make
make test
su -c "make install"

Further research via http://www.google.com reveals another interesting URL: http://www.perlxml.com/faq/perl-xml-faq.html, which suggests the SGMLSpm module, a ``class library for parsing the output from James Clark's SGMLS and PSGMLS parsers.'' This is pretty much exactly what we need, so this gets installed right away as well.

After reading through perldoc XML::Parser and the documentation for SGMLSpm we realize that the latter will be fully sufficient for our project.

As mentioned above, if no .sgml file was specified on the command-line, we expect the input to be the output of nsgmls; otherwise, we call nsgmls ourselves on the given input file. The output of nsgmls is used to create a new object of type ``SGMLS'', which then can be analyzed by simply walking down the tree of elements. Depending on what kind of element we are dealing with, we call the appropriate subroutines. The function doing this work can be seen in Listing doParse:

Function ``doParse''



.
.
.
		while ($event = $parse->next_event)
		{
				my $foo;
				if ($event->type eq 'start_element')
				{
						if ($event->data->name eq 'REFENTRY')
						{
								$start = 1;
								printHeaderComments();
						}

						if (!$start)
						{
								return 0;
						}

						if ($event->data->name eq 'REFMETA')
						{
								$event = parseMeta($parse, $event);
						}
						elsif ($event->data->name eq 'REFNAMEDIV')
						{
								$event = parseNameDiv($parse, $event);
						}
						elsif ($event->data->name eq 'REFSYNOPSISDIV')
						{
								$event = parseSynopsis($parse, $event);
						}
						elsif ($event->data->name eq 'REFSECT1')
						{
								print WRITE "\n.SH ";
								$event = parseSection($parse, $parse->next_event);
						}
				}

				if ($event->type eq 'conforming') 
				{
						$valid = 1;
				}
		}

.
.
.
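
The events that doParse walks through correspond to the lines that nsgmls prints: a line starting with ( opens an element, one starting with ) closes it, one starting with - carries character data, and a final C states that the document was valid. The following rough sketch classifies such lines by hand. It does not use SGMLSpm, the helper esis_event is hypothetical, and it covers only these four kinds of lines.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: turn one line of nsgmls output into a
# (type, value) pair, using the event type names seen in doParse.
sub esis_event
{
	my ($line) = @_;
	chomp $line;
	return ('start_element', $1) if $line =~ /^\((.+)$/;
	return ('end_element', $1)   if $line =~ /^\)(.+)$/;
	return ('cdata', $1)         if $line =~ /^-(.*)$/;
	return ('conforming', '')    if $line eq 'C';
	return ('other', $line);	# attributes, entities etc. are not handled
}
```

For the title in our example, nsgmls prints the three lines (REFENTRYTITLE, -makeman.pl and )REFENTRYTITLE, which map to a start_element, a cdata and an end_element event.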

Whenever we encounter a new section, be it a main section (``Refsect1'') or a subsection (``Refsect2'', ``Refsect3'' etc.), we call the function ``parseSection'', which prints out the relevant information according to the tags encountered. This function needs to check a lot of conditions in a tedious way - if this reminds you of a compiler class you took, don't be surprised. Excerpts of the function are shown in Listing parseSection:

Function ``parseSection''



.
.
.
				if ($type eq 'start_element')
				{
						if ($data->name eq 'TITLE')
						{
								printf WRITE "\"";
						}
						elsif ($data->name eq 'PARA')
						{
								print WRITE "\n.PP";
						}
						elsif ($data->name eq 'COMMAND')
						{
								printf WRITE "\\fB";
						}
.
.
.
						elsif ($data->name eq 'REFSECT2')
						{
								print WRITE "\n.SS ";
								$event=parseSection($parse,$parse->next_event);
						}
				}
.
.
.
				elsif ($type eq 'end_element')
				{
						if ( ($data->name eq 'REFSECT1') ||
										($data->name eq 'REFSECT2') ||
										($data->name eq 'REFSECT3') )
						{
								return $event;
						}
						elsif ($data->name eq 'TITLE')
						{
								print WRITE "\"\n";
						}
.
.
.
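
The long if/elsif chain could also be written as a lookup table. The following is an alternative sketch, not the way makeman.pl is written: the table %MARKUP and the helper markup_for are hypothetical, and the strings are taken from the excerpts above and from the expected roff output.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical table: what to print when an element starts or ends.
my %MARKUP = (
	TITLE    => { start => '"',     end => "\"\n" },
	PARA     => { start => "\n.PP", end => ''     },
	COMMAND  => { start => '\\fB',  end => '\\fR' },
	EMPHASIS => { start => '\\fI',  end => '\\fR' },
);

# Return the roff markup for an event type and element name,
# or the empty string if the table has no entry for it.
sub markup_for
{
	my ($type, $name) = @_;
	my %key_for = (start_element => 'start', end_element => 'end');
	my $key = $key_for{$type};
	return '' unless defined $key && exists $MARKUP{$name};
	return $MARKUP{$name}{$key};
}
```

Elements that need more than a fixed string, such as the nested sections that call parseSection recursively, would still require their own code.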

After testing our first implementation of ``MakeMan'' in Perl on a few SGML files, we convert the file makeman.pl.sgml (Listing makeman.pl.sgml) by issuing the following command:

./makeman.pl -i ../../xml/makeman.pl.sgml \
      -o ../doc/makeman.pl.1

Voilà - not so bad, I'd say. Eager to release this first version of ``makeman.pl'', we can now start to create a package, write some accompanying documentation and then announce the software on the various websites so as to get some people to use it and find the bugs we overlooked. While some people believe that one should not release a program before version 1.0, I'm convinced it will help us keep our enthusiasm if we get feedback - no matter what kind - from other people. Finding and fixing bugs can only be done if the software is used, and very often it takes a fresh pair of eyes to realize the obvious that we might have overlooked. ``Release early, release often!''

Packaging

In order to release the software, we will need to prepare a proper package; that is, we need to write some documentation that will accompany the product, installation instructions, copyright notices etc. We already have the man page available, so we can now write the rest of the documentation.

I usually create a few HTML pages that cover the installation of the software and place the same information into a plain ASCII file as well, usually named ``README'' or ``INSTALL''. To get all these files installed properly, I usually provide a ``Makefile'' with a single target, ``install''. This is certainly not necessary for a small package such as this, but might prove convenient in the future.

With all this information, we can now create a directory structure as follows:

You will notice that we do provide all the files - AUTHORS, README etc - that one is used to see when downloading an Open Source package. Obeying this practice even for small packages makes it easier to maintain: for future releases, we will just need to modify the appropriate files.

Rolling a tarball is the first thing we do - after all, we want to distribute the package in its most platform-independent form. One gzip'ped and one bzip2'd tarball coming right up! Once we have created these, we can proceed to build Debian and RPM packages. While both formats have their own quirks when building them, there is excellent documentation available on the web. To build a .deb package, I usually follow the instructions given in [27]; for .rpms, the ``RPM-HOWTO'' ([28]) comes in handy.

Releasing the package

Now that the packages have been built, we can announce the availability of our software on the web. First of all, we want to upload the files to our account at SourceForge. Just following the familiar procedure (uploading the files to upload.sourceforge.net and ``Quick-releasing'' from the Admin part of the website) we can release the packages easily.

Next we wish to announce the package on other free software sites such as Freshmeat (http://www.freshmeat.net). The procedure is pretty much the same for all the sites: one always has to specify the download location and a short descriptive blurb as well as a contact address.

Usually, I announce new software on the following sites:

Depending on the site, it may take between a few hours and a few days for the packages to be announced.

Continued work on ``makeman.pl''

After releasing the first package to the world, I received feedback from other developers and users and, fortunately, some bug reports. The continuing cycle of bug fixes and new releases has started. Documenting every single change to ``makeman.pl'' would be too tedious and certainly beyond the scope of this document. However, as all changes are recorded in the CHANGES file, the user will be able to easily follow the development of the software.

At the time of this writing, ``makeman.pl'' seems to work reasonably well. Quite a few man pages (particularly for my other project, ``The Missing Man Pages Project'') have been converted from SGML to roff source, and other people have shown some interest in the project.

My original goal of providing a better SGML-to-roff converter for man pages than docbook-to-man seems to have been reached, if only partially. Needless to say, development on this tool will continue as it is being used more and more.

New releases will be announced on websites supporting Free Software and on http://mama.sourceforge.net, of course. However, with the initial tool having reached a functional stage in the development cycle, we can now focus on providing more user-friendly front-ends that will eventually use the SGML format as their basis.

If you have any comments whatsoever related to ``makeman.pl'' (or the entire project, of course), please don't hesitate to contact me at jschauma@netmeister.org. I will be more than happy to hear about bugs, so as to improve the software. Patches and other helpful suggestions will cause immense joy, as well.


Jan Schaumann 2001-08-24