Obviously, the task at hand will involve a lot of text manipulation, so it seems reasonable to implement a first solution in Perl.
The first release shall be a program that takes an XML file as input and converts it into a valid man page, i.e., generates roff source. Like all good UNIX programs, the Perl implementation of ``MakeMan'' will operate on stdin and stdout if no filenames are specified through command-line options. Aside from some standard options such as -help and -version, this should suffice for a simple application such as this.
As mentioned above, SGML can be validated through the use of nsgmls, so we will make this tool a requirement for our program, as well as expect input from stdin to be the output of nsgmls. Should the user specify an input file, we can simply call nsgmls ourselves, so that these files may be plain .sgml files.
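The input rule just described can be sketched in a few lines of Perl. This is only a minimal sketch under my own assumptions, not the code shipped with makeman.pl: the helper names input_source and open_input are hypothetical, and the only external command assumed is a plain call to nsgmls with the file name as its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: decide where the nsgmls output comes from.
# Returns an empty list to mean "read stdin", or the command to run.
sub input_source {
    my ($file) = @_;
    return () unless defined $file && length $file;
    return ('nsgmls', $file);
}

# Hypothetical helper: open the chosen source as a filehandle.
sub open_input {
    my ($file) = @_;
    my @cmd = input_source($file);
    return \*STDIN unless @cmd;
    # The list form of a pipe open avoids passing the file name through a shell.
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!\n";
    return $fh;
}
```

Keeping the decision (input_source) apart from the actual open makes the rule easy to check without having nsgmls installed.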
Let us now write down the specifications for the first release of the Perl implementation of ``MakeMan''. While we're at it, why not use the SGML template from above to do so? The following listing shows the SGML file describing the man page for ``makeman.pl'', which we will ultimately include in the first release.
    <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
    <!--
      ``MakeMan'' SGML Example
      Representation of a man page
      For more Information, please see http://mama.sourceforge.net

      $Author: Jan Schaumann <jschauma@netmeister.org> $
      $Id: makeman.pl.sgml, v 0.2 2001/08/24 15:11:02 jschauma Exp $
    -->
    <refentry id="makeman.pl">
      <refmeta>
        <refentrytitle>makeman.pl</refentrytitle>
        <manvolnum>1</manvolnum>
        <refmiscinfo class="date">August 24th, 2001</refmiscinfo>
        <refmiscinfo class="source">MakeMan</refmiscinfo>
        <refmiscinfo class="title">Writing Man Pages</refmiscinfo>
      </refmeta>
      <refnamediv>
        <refname>makeman.pl</refname>
        <refpurpose>parse an SGML man page into valid roff source</refpurpose>
      </refnamediv>
      <refsynopsisdiv>
        <cmdsynopsis>
          <command>makeman.pl</command>
          <arg choice="opt">-h</arg>
          <arg choice="opt"> -i <arg choice="req">FILE</arg> </arg>
          <arg choice="opt"> -o <arg choice="req">FILE</arg> </arg>
          <arg choice="opt">-v</arg>
        </cmdsynopsis>
      </refsynopsisdiv>
      <refsect1>
        <title>DESCRIPTION</title>
        <para>
          <command>makeman.pl</command> is part of
          <emphasis>MakeMan</emphasis>, a project to provide several
          frontends, GUI and non-GUI, to an XML interface to write man pages.
        </para>
        <para>
          <command>makeman.pl</command>, written in Perl, parses an XML
          input file and generates valid roff source that can be read by
          <emphasis>man</emphasis>.
        </para>
      </refsect1>
      <refsect1>
        <title>USAGE</title>
        <para>
          Per default, <command>makeman.pl</command> reads the output of
          <command>nsgmls</command> from <emphasis>stdin</emphasis> and
          write roff sources to <emphasis>stdout</emphasis>. Alternatively,
          the input- and output-files can be specified as command-line
          options, in which case the input file may be the SGML source.
        </para>
      </refsect1>
      <refsect1>
        <title>OPTIONS</title>
        <para>
          A summary of the options supported by
          <command>makeman.pl</command> is included below.
        </para>
        <variablelist>
          <varlistentry>
            <term>-h</term>
            <listitem>
              <para>Show summary of options and exit.</para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>-i <emphasis>FILE</emphasis></term>
            <listitem>
              <para>Read SGML from <emphasis>FILE</emphasis>.</para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>-o <emphasis>FILE</emphasis></term>
            <listitem>
              <para>Write output to <emphasis>FILE</emphasis>.</para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>-v</term>
            <listitem>
              <para>Show version information and exit.</para>
            </listitem>
          </varlistentry>
        </variablelist>
      </refsect1>
      <refsect1>
        <title>EXAMPLES</title>
        <variablelist>
          <varlistentry>
            <term>In a pipe:</term>
            <listitem>
              <para>
                <command>nsgmls</command> <emphasis>infile.sgml</emphasis> |
                <command>makeman.pl</command>
              </para>
            </listitem>
          </varlistentry>
          <varlistentry>
            <term>Alone:</term>
            <listitem>
              <para>
                <command>makeman.pl -i</command> <emphasis>infile.sgml</emphasis>
              </para>
            </listitem>
          </varlistentry>
        </variablelist>
      </refsect1>
      <refsect1>
        <title>REQUIRES</title>
        <para>
          Perl, SGMLSp, expat, nsgmls (part of SP)
        </para>
      </refsect1>
      <refsect1>
        <title>VERSION</title>
        <para>
          0.1
        </para>
      </refsect1>
      <refsect1>
        <title>BUGS</title>
        <para>
          None so far, not yet written.
        </para>
      </refsect1>
      <refsect1>
        <title>SEE ALSO</title>
        <para>
          <command>man(7)</command>
        </para>
        <para>
          <ulink url="http://mama.sourceforge.net">http://mama.sourceforge.net</ulink>
        </para>
      </refsect1>
      <refsect1>
        <title>AUTHOR</title>
        <para>
          Jan Schaumann <email>jschauma@netmeister.org</email>
        </para>
      </refsect1>
    </refentry>
Let us now make sure that what we wrote down there is in fact a valid representation of a man page in SGML:
www:~/xml> nsgmls -s makeman.pl.sgml
Ok, everything looks peachy. Once we have run this input through our program ``makeman.pl'', we would like the output to look as in Listing makeman.pl.1:
``MakeMan'' man page
    .\"
    .\" This page was created on 2001-08-24 15:22:34 by makeman.pl
    .\" ``makeman.pl'' is part of the ``MakeMan'' project.
    .\" For more information, please see http://mama.sourceforge.net
    .\"
    .TH makeman.pl 1 "August 24th, 2001" "MakeMan" "Writing Man Pages"
    .SH NAME
    makeman.pl \- parse an SGML man page into valid roff source
    .SH SYNOPSIS
    \fBmakeman.pl\fR [ \fI\-h\fR ] [ \fI \-i \fR\fIFILE\fR ] [ \fI \-o \fR\fIFILE\fR ] [ \fI\-v\fR ]
    .SH "DESCRIPTION"
    .PP
    \fBmakeman.pl\fR is part of \fIMakeMan\fR, a project to provide several frontends, GUI and non\-GUI, to an XML interface to write man pages.
    .PP
    \fBmakeman.pl\fR, written in Perl, parses an XML input file and generates valid roff source that can be read by \fIman\fR.
    .SH "USAGE"
    .PP
    Per default, \fBmakeman.pl\fR reads the output of \fBnsgmls\fR from \fIstdin\fR and write roff sources to \fIstdout\fR. Alternatively, the input\- and output\-files can be specified as command\-line options, in which case the input file may be the SGML source.
    .SH "OPTIONS"
    .PP
    A summary of the options supported by \fBmakeman.pl\fR is included below.
    .\" Begin List
    .TP
    \fB\-h\fR
    Show summary of options and exit.
    .TP
    \fB\-i \fR \fIFILE\fR
    Read SGML from \fIFILE\fR.
    .TP
    \fB\-o \fR \fIFILE\fR
    Write output to \fIFILE\fR.
    .TP
    \fB\-v\fR
    Show version information and exit.
    .\" End List
    .SH "EXAMPLES"
    .\" Begin List
    .TP
    In a pipe:
    \fBnsgmls\fR \fIinfile.sgml\fR | \fBmakeman.pl\fR
    .TP
    Alone:
    \fBmakeman.pl \-i\fR \fIinfile.sgml\fR
    .\" End List
    .SH "REQUIRES"
    .PP
    Perl, SGMLSp, expat, nsgmls (part of SP)
    .SH "VERSION"
    .PP
    0.1
    .SH "BUGS"
    .PP
    None so far, not yet written.
    .SH "SEE ALSO"
    .PP
    \fBman(7)\fR
    .PP
    http://mama.sourceforge.net (Link to \fIhttp://mama.sourceforge.net\fR)
    .SH "AUTHOR"
    .PP
    Jan Schaumann <jschauma@netmeister.org>
    #!/usr/bin/perl -w
    use strict;
In addition, I would like to shamelessly copy, uhm, ``cite'' an excerpt from http://www.perl.com/pub/a/2000/01/CodingStandards.html:
Furthermore, give perldoc perlstyle a careful read.
Ok, now that we got all these formalities out of the way, let us write some code already! The first thing I usually start out with - well, the first thing after outlining the project and its specifications - is a skeleton that parses the command-line arguments and initializes a few global variables (if any). In this case, the simple function init, shown in the following listing, performs this task.
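    use Getopt::Std;

    # Parse the command line and set the global input and output names.
    # $NAME, $VERSION, $INSTREAM and $OUTSTREAM are globals assumed to be
    # declared elsewhere in the script.
    sub init {
        my %Options;
        my $ok = getopts('hi:o:v', \%Options);

        if (!$ok) {
            # getopts failed: report an option that is missing its argument
            my $i;
            my @values = keys(%Options);
            foreach $i (@values) {
                if (!$Options{$i}) {
                    print "Option '$i' requires an argument.\n";
                    exit(1);
                }
            }
        }
        if ($Options{'h'}) {
            usage();
            exit 0;
        }
        if ($Options{'v'}) {
            print "$NAME Version $VERSION\n";
            exit 0;
        }
        if ($Options{'i'}) {
            $INSTREAM = $Options{'i'};
        }
        if ($Options{'o'}) {
            $OUTSTREAM = $Options{'o'};
        }
    }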
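To see what the option string 'hi:o:v' actually does, it helps to try it in isolation. The following is a minimal sketch, assuming only the standard Getopt::Std module; the wrapper parse_args and the file names in.sgml and out.1 are made up for illustration and are not part of makeman.pl.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper around the same option string that init uses.
# It takes the argument list explicitly, so it can be tried without
# a real command line.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;              # getopts reads its input from @ARGV
    my %options;
    my $ok = getopts('hi:o:v', \%options);
    return ($ok, \%options, [@ARGV]); # status, options, leftover arguments
}

my ($ok, $opt, $rest) = parse_args('-i', 'in.sgml', '-o', 'out.1', '-v');
print "input:  $opt->{i}\n";
print "output: $opt->{o}\n";
print "version requested\n" if $opt->{v};
```

A colon after a letter marks an option that takes an argument, so -i and -o carry file names while -h and -v are simple switches.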
We know that we will deal with XML, so let's take a short trip over to CPAN[26] and investigate whether some helpful modules might already be available. XML::Parser sounds promising: it ``is an interface to James Clark's XML parser, expat''. After installing expat, the usual routine installs the module:
    tar zxvf XML-Parser-2.29.tar.gz
    cd XML-Parser-2.29
    perl Makefile.PL
    make
    make test
    su -c "make install"
Further research via http://www.google.com reveals another interesting URL: http://www.perlxml.com/faq/perl-xml-faq.html, which suggests the SGMLSpm module, a ``class library for parsing the output from James Clark's SGMLS and PSGMLS parsers.'' This is pretty much exactly what we need, so this gets installed right away as well.
After reading through perldoc XML::Parser and the documentation for SGMLSpm, we realize that the latter will be fully sufficient for our project.
As mentioned above, if no .sgml file was specified on the command line, we expect the input to be the output of nsgmls; otherwise, we call nsgmls ourselves on the given input file. The output of nsgmls is used to create a new object of type ``SGMLS'', which can then be analyzed by simply walking down the tree of elements. Depending on what kind of element we are dealing with, we call the appropriate subroutine. The function doing this work can be seen in Listing doParse:
Function ``doParse''
    . . .
    while ($event = $parse->next_event) {
        my $foo;
        if ($event->type eq 'start_element') {
            if ($event->data->name eq 'REFENTRY') {
                $start = 1;
                printHeaderComments();
            }
            if (!$start) {
                return 0;
            }
            if ($event->data->name eq 'REFMETA') {
                $event = parseMeta($parse, $event);
            } elsif ($event->data->name eq 'REFNAMEDIV') {
                $event = parseNameDiv($parse, $event);
            } elsif ($event->data->name eq 'REFSYNOPSISDIV') {
                $event = parseSynopsis($parse, $event);
            } elsif ($event->data->name eq 'REFSECT1') {
                print WRITE "\n.SH ";
                $event = parseSection($parse, $parse->next_event);
            }
        }
        if ($event->type eq 'conforming') {
            $valid = 1;
        }
    }
    . . .
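It may help to see where these events come from. nsgmls prints one event per line, and the first character of each line names the kind of event: ( for a start tag, ) for an end tag, - for character data, and a lone C when the document was conforming. The following is a minimal sketch that classifies such lines by hand, without the SGMLS module; the helper classify_line, the label 'cdata' for data lines and the sample lines are my own assumptions for illustration, and real nsgmls output contains more line types (attributes, for instance) than are handled here.

```perl
use strict;
use warnings;

# Map the first character of an nsgmls output line to an event type.
# Only the line types needed for this illustration are covered.
my %TYPE_FOR = (
    '(' => 'start_element',
    ')' => 'end_element',
    '-' => 'cdata',
    'C' => 'conforming',
);

# Hypothetical helper: turn one output line into a (type, data) pair.
sub classify_line {
    my ($line) = @_;
    chomp $line;
    return ('other', '') if $line eq '';
    my $code = substr($line, 0, 1);
    my $data = substr($line, 1);
    return ($TYPE_FOR{$code} || 'other', $data);
}

# Simplified sample lines in the style of nsgmls output (made up).
my @sample = ('(REFENTRY', '(REFNAME', '-makeman.pl', ')REFNAME', ')REFENTRY', 'C');
for my $line (@sample) {
    my ($type, $data) = classify_line($line);
    printf "%-14s %s\n", $type, $data;
}
```

The SGMLS module does this bookkeeping for us and hands back objects instead of strings, which is why the loop above only needs to compare event types and element names.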
Whenever we encounter a new section, be it a main section (``Refsect1'') or a subsection (``Refsect2'', ``Refsect3'' etc.), we call the function ``parseSection'', which prints out the relevant information according to the tags encountered. This function needs to check a lot of conditions in a tedious way - if this reminds you of a compiler class you took, don't be surprised. Excerpts of the function are shown in Listing parseSection:
Function ``parseSection''
    . . .
    if ($type eq 'start_element') {
        if ($data->name eq 'TITLE') {
            printf WRITE "\"";
        } elsif ($data->name eq 'PARA') {
            print WRITE "\n.PP";
        } elsif ($data->name eq 'COMMAND') {
            printf WRITE "\\fB";
        }
        . . .
        elsif ($data->name eq 'REFSECT2') {
            print WRITE "\n.SS ";
            $event=parseSection($parse,$parse->next_event);
        }
    }
    . . .
    elsif ($type eq 'end_element') {
        if ( ($data->name eq 'REFSECT1') ||
             ($data->name eq 'REFSECT2') ||
             ($data->name eq 'REFSECT3') ) {
            return $event;
        } elsif ($data->name eq 'TITLE') {
            print WRITE "\"\n";
        }
        . . .
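The long chain of comparisons can also be pictured as a lookup table from element names to roff markup. The following minimal sketch is not how makeman.pl is written; it merely restates the same mapping in table form. The TITLE, PARA and COMMAND start strings are taken from the excerpt above, the EMPHASIS entry and the closing \fR strings are read off the expected makeman.pl.1 output, and the helper names start_markup, end_markup and escape_text are hypothetical.

```perl
use strict;
use warnings;

# Start and end roff strings per element name.
my %MARKUP = (
    TITLE    => [ '"',      "\"\n" ],
    PARA     => [ "\n.PP",  ''     ],
    COMMAND  => [ '\\fB',   '\\fR' ],
    EMPHASIS => [ '\\fI',   '\\fR' ],
);

# Hypothetical helpers: look up the markup, defaulting to nothing.
sub start_markup {
    my ($name) = @_;
    return exists $MARKUP{$name} ? $MARKUP{$name}[0] : '';
}

sub end_markup {
    my ($name) = @_;
    return exists $MARKUP{$name} ? $MARKUP{$name}[1] : '';
}

# Escape hyphens as \- the way the expected output does.
sub escape_text {
    my ($text) = @_;
    $text =~ s/-/\\-/g;
    return $text;
}

print start_markup('COMMAND'), escape_text('makeman.pl -i'),
      end_markup('COMMAND'), "\n";
```

A table like this keeps purely decorative elements out of the control flow; elements that open a new section, such as REFSECT2, still need their own branch because they recurse.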
After testing our first implementation of ``MakeMan'' in Perl on a few SGML files, we convert the file makeman.pl.sgml (Listing makeman.pl.sgml) by issuing the following command:
    ./makeman.pl -i ../../xml/makeman.pl.sgml \
                 -o ../doc/makeman.pl.1
Voilà - not so bad, I'd say. Eager to release this first version of ``makeman.pl'', we can now start to create a package, write some accompanying documentation and then announce the software on the various websites so as to get some people to use it and find the bugs we overlooked. While some people believe that one should not release a program before version 1.0, I'm convinced it will help us keep our enthusiasm if we get feedback - no matter what kind - from other people. Finding and fixing bugs can only be done if the software is used, and very often it takes a fresh pair of eyes to spot the obvious things we have missed. ``Release early, release often!''
I usually create a few HTML pages that cover the installation of the software and place that information into a plain ASCII file as well, usually named ``README'' or ``INSTALL''. In order to get all these files installed properly, I usually provide a ``Makefile'' with a single target ``install''. This is certainly not necessary for a small package such as this, but might prove convenient in the future.
With all this information, we can now create a directory structure as follows:
You will notice that we do provide all the files - AUTHORS, README etc. - that one is used to seeing when downloading an Open Source package. Obeying this practice even for small packages makes them easier to maintain: for future releases, we will just need to modify the appropriate files.
Rolling a tarball is the first thing we do - after all, we want to distribute the package in its most platform-independent form. One gzip'ped and one bzip2'd tarball coming right up! Once we have created these, we can proceed to build Debian packages and RPM packages. While both these formats have their own quirks when building them, there is excellent documentation available on the web. To build a .deb package, I usually follow the instructions given in [27]; for .rpms, the ``RPM-HOWTO'' ([28]) comes in handy.
Now that the packages have been built, we can announce the availability of our software on the web. First of all, we want to upload the files to our account at SourceForge. Just following the familiar procedure (uploading the files to upload.sourceforge.net and ``Quick-releasing'' from the Admin part of the website) we can release the packages easily.
Next we wish to announce the package on other free software sites such as Freshmeat (http://www.freshmeat.net). The procedure is pretty much the same for all the sites: one always has to specify the download location and a short descriptive blurb as well as a contact address.
Usually, I announce new software on the following sites:
Depending on the site, it may take between a few hours and a few days for the packages to be announced.
After releasing the first package to the world, I received some feedback from other developers, some feedback from users and -- fortunately -- some bug reports. The continuing cycle of bug fixes and new releases has started. Documenting every single change to ``makeman.pl'' would be too tedious and is certainly beyond the scope of this document. However, as all changes are documented in the CHANGES file, the user will be able to easily follow the development of the software.
At the time of this writing, ``makeman.pl'' seems to work reasonably well. Quite a few man pages (particularly for my other project, ``The Missing Man Pages Project'') have been converted from SGML to roff sources, and other people have shown some interest in the project.
My original goal of providing a better SGML-to-roff converter for man pages than docbook-to-man seems to have been reached, if only partially. Needless to say, development on this tool will continue as it is being used more and more.
New releases will be announced on websites supporting Free Software and on http://mama.sourceforge.net, of course. However, with the initial tool having reached a functional stage in the development cycle, we can now focus on providing more user-friendly front-ends that will eventually use the SGML format as their basis.
If you have any comments whatsoever related to ``makeman.pl'' (or the entire project, of course), please don't hesitate to contact me at jschauma@netmeister.org. I will be more than happy to hear about bugs, so as to improve the software. Patches and other helpful suggestions will cause immense joy, as well.