How to Semi-Automate Typesetting to Save Time and Money

Typesetting is the process of arranging text on a page so that it looks good. It originates from the printing world, in which typographers had to arrange physical type (letters and symbols) to print it on paper. Today, this work is done digitally with dedicated software such as Adobe InDesign, LaTeX, or QuarkXPress. All these tools are designed to make it easier to determine the right margin sizes, the most suitable typeface and font size, or the appropriate paragraph styles. They are used to produce brochures, magazines, books, and official reports that are generally shared electronically in PDF format.

Some would argue that in the era of the Internet, where HTML is king, an online publisher like Lexum should campaign for abandoning this approach anchored in the history of printing. And in fact, we did for many years. However, we came to realize that many legal institutions are keen on preserving the existing look and feel of official printed publications even when transforming them into digital products. This may result from adherence to tradition, from a desire to maintain legibility in the electronic environment, or from a wish to lend authenticity to the new format. Most probably, it is explained by a mix of all these reasons. In any case, the need for professionally formatted and assembled publications will remain for the foreseeable future, alongside the need for searchable databases of their components.

In this context, our approach has been twofold. First, a few years ago we announced changes in the way we publish PDF files to meet the most sophisticated digital publishing requirements. Second, we have started offering typesetting services to clients producing official publications. But instead of manually enforcing their publishing policies one page at a time, Lexum’s typesetters have designed a way to use heuristic rules to automate a large part of the typesetting work. Automatically inserting formatting tags in the body of the original files allows us to standardize a large portion of the styling requirements before even opening the files in the typesetting software. This approach dramatically reduces the number of manual interventions required in every file, which in turn translates into savings for clients.

For instance, the following illustration shows the tags inserted in the body of the MS Word file of the French version of a Supreme Court of Canada decision to be included in a two-column volume of the Canada Supreme Court Reports:

This tagging is facilitated by Word macros. This one inserts the <pstyle:Corps FR> tag shown above where appropriate:
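The same idea can be sketched outside Word. The following Python snippet is only an illustration and not the macro referred to above. It assumes that paragraphs are available as plain strings and that "where appropriate" means any non-empty paragraph that does not already carry a paragraph-style tag. The function name and this rule are hypothetical, as is the "Titre FR" style name used in the usage line.

```python
import re

# Matches a paragraph that already starts with a paragraph-style tag.
ALREADY_TAGGED = re.compile(r"^<pstyle:[^>]*>")


def tag_body_paragraphs(paragraphs, style="Corps FR"):
    """Prefix untagged, non-empty paragraphs with a <pstyle:...> tag.

    Assumption for this sketch: every non-empty paragraph that is not
    already tagged is treated as body text.
    """
    tagged = []
    for text in paragraphs:
        if text.strip() and not ALREADY_TAGGED.match(text):
            tagged.append(f"<pstyle:{style}>{text}")
        else:
            # Empty or already-tagged paragraphs are left untouched.
            tagged.append(text)
    return tagged


# Hypothetical input: one body paragraph, one empty line, one tagged heading.
sample = ["Le pourvoi est rejeté.", "", "<pstyle:Titre FR>Motifs"]
result = tag_body_paragraphs(sample)
```

Leaving already-tagged paragraphs alone means the function can be run repeatedly on the same file without stacking tags, which matters when rules are refined over several passes.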

The same approach is used to ensure that text attributes (such as italic, bold, or underline) are not lost during the conversion to the typesetting solution. In addition to styles, many formatting elements are inserted in the same fashion. For instance, regular spaces can be systematically replaced by non-breaking spaces to avoid inappropriate breaks at the end of lines. Moreover, this approach makes it possible to automatically identify and fix common formatting mistakes, such as double spacing after the end of a sentence, or the use of single quotes instead of double quotes.
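Rules of this kind can be expressed as a handful of regular-expression substitutions. The Python sketch below is an assumption-laden illustration rather than our production rules: the function name is hypothetical, the non-breaking space rule only covers the abbreviations "para." and "art." followed by a number, and the quote rule is deliberately simple (it leaves apostrophes inside or at the end of words alone).

```python
import re

NBSP = "\u00a0"  # non-breaking space


def clean_text(text):
    """Apply a few illustrative clean-up rules to one paragraph of text."""
    # Collapse runs of two or more ordinary spaces, such as double
    # spacing after the end of a sentence.
    text = re.sub(r" {2,}", " ", text)
    # Turn single quotes around a phrase into double quotes. A quote that
    # touches a word character on its outer side is treated as an
    # apostrophe and left unchanged.
    text = re.sub(r"(?<!\w)'([^'\n]+)'(?!\w)", r'"\1"', text)
    # Keep an abbreviation and its number on the same line (assumed rule).
    text = re.sub(r"\b(para\.|art\.) (?=\d)", r"\1" + NBSP, text)
    return text


cleaned = clean_text("See para. 12.  The 'reasonable person' test doesn't apply.")
```

The order of the rules matters: double spaces are collapsed first so that the non-breaking space rule only ever sees a single ordinary space to replace.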

In the end, instead of having to manually process every single page, only exceptions to each project’s publishing policy require intervention from our staff. This approach allows our typesetters to focus on quality control rather than on production tasks. Efficiency even improves over time, as more and more rules are designed and implemented to catch recurring exceptions. Needless to say, it also reduces delays in projects involving short publishing cycles.

Of course, semi-automating typesetting tasks with such tagging requires an additional upfront effort to design and adjust the required macros and scripts. For smaller one-time projects, this extra work may not be worth it. However, for projects involving periodicals or any kind of recurring work, the savings generated over time quickly outweigh the initial burden.

Examples & illustrations have been provided by Philippe Lanthier.