Making old cases new: the digitization of case law

Information that is born digital today tends to be accessible online, and is thus better preserved than it was 20 years ago, before the Web made online access much easier. Before the Web revolutionized information technology, content prepared electronically with word processors was mostly preserved on paper rather than digitally: the native files were overwritten with newer content to save storage space, which at the time, in many office environments, really meant saving… floppy disks. As surprising as it may seem today, this was normal information management back then, and courts and tribunals were no different from other institutions. Most decisions issued before the mid-1990s were preserved only in printed form, even if they were initially prepared electronically.

Not so long ago, making these cases available online was quite costly. Today, by using the latest technologies and improved processes, making old cases new has become much more affordable.

Digitization of printed texts is often referred to as “optical character recognition” (OCR), but this label obscures the fact that digitization involves more than transferring data from paper to electronic media. Digitizing case law also entails extracting and reorganizing information so that computerized systems can exploit it, allowing users to search for and retrieve decisions efficiently.

Cases preserved on paper generally fall into one of two categories: reported or unreported. The former are available in law reports; the latter, in court records. Two sources of documents, two very different sets of issues.

Digitizing decisions directly from the court record is, not surprisingly, a most challenging endeavor. Even where the court’s file system is organized so that reasons for decision can be sorted easily, the decisions’ identifying data might be difficult to extract. Even an element as important as the date of decision could be missing from the paper version kept in the file, as it was often stamped only on the printed copies distributed to the parties. In appellate courts, where reasons are issued by a bench of more than one judge, the “opinions” of the different judges might be scattered throughout the paper record and mistaken for standalone decisions when they should be read together. These opinions often do not bear the same date, which adds to the confusion. Finally, because these documents were not intended for online publication when they were issued, they may be subject to legal publication restrictions or to a court policy that limits the online publication of personal information.

The court’s administration has a role to play in alleviating these issues with decisions digitized from the court record. In a digitization project carried out for the Alberta Courts a few years ago, court officers sorted the decisions in their records and added a new cover page to each decision’s images before providing them to Lexum. Each cover page presented the decision’s identifying data, such as the name of the court, the date, and the names of the parties and judges. Court officers familiar with their records are in the best position to find any missing information, and the cover pages they prepared were crucial to completing the project on time and on budget.

Digitizing decisions from case law reports presents other challenges. The first that comes to mind is copyright. Lexum recently processed 42 years’ worth of decisions reported in the Revue légale for the Centre d’accès à l’information juridique (CAIJ), for which the publisher Wilson & Lafleur granted permission to reproduce headnotes. This is not the case for most projects, however. Where the reproduction rights for reported cases cannot be cleared, one has to devise ways to retain only the reasons when processing the documents, leaving out the headnote’s proprietary information (summary, keywords, etc.).

This leads to a second significant challenge. Even for older reported cases in the public domain, each reporter’s editorial practices must be analyzed carefully in order to properly extract the decisions’ identifying information (metadata such as date, docket and citation) contained in the headnotes. The further back in time you go, the more reporting practices diverge, to the point where even basic elements such as case names and decision dates differ from one reporter to another. This is why digitizing decisions from a wide variety of printed reports published as far back as the 1870s cannot be reduced to OCR. Information has to be extracted efficiently, reorganized and sometimes corrected (e.g., an 1880 decision reported in the SCR was “officially” dated June 31…). Using word-processing features such as styles to tag metadata, for instance, is key to streamlining the extraction of decisions’ metadata, minimizing manual intervention and reducing processing costs.
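To illustrate how style tagging can streamline extraction, here is a minimal Python sketch. It assumes each paragraph of a processed report has already been exported from the word processor as a (style name, text) pair. The style names (CaseName, DecisionDate, HeadnoteSummary, Keywords, etc.) and the sample case are hypothetical, not Lexum’s actual conventions. Paragraphs tagged with metadata styles become fields, proprietary headnote content is dropped, and everything else is kept as the reasons. Dates are also checked against the calendar so that an impossible date such as June 31 is flagged for manual correction.

```python
from datetime import date

# Hypothetical style names applied to headnote paragraphs during processing.
METADATA_STYLES = {
    "CaseName": "case_name",
    "DecisionDate": "date",
    "Docket": "docket",
    "Citation": "citation",
}
# Hypothetical styles marking proprietary headnote content to leave out.
DISCARD_STYLES = {"HeadnoteSummary", "Keywords"}


def extract_decision(paragraphs):
    """Split (style, text) pairs into a metadata dict and a list of reason paragraphs."""
    metadata = {}
    reasons = []
    for style, text in paragraphs:
        if style in DISCARD_STYLES:
            continue  # proprietary headnote content is not retained
        field = METADATA_STYLES.get(style)
        if field is not None:
            metadata[field] = text.strip()
        else:
            reasons.append(text)  # untagged paragraphs are treated as the reasons
    return metadata, reasons


def parse_decision_date(iso_text):
    """Return a date for a valid 'YYYY-MM-DD' string, or None if it is not a real calendar date."""
    try:
        return date.fromisoformat(iso_text)
    except ValueError:
        return None


# Usage with a hypothetical report; the June 31 date mimics the SCR example.
paragraphs = [
    ("CaseName", "Doe v. Roe"),
    ("DecisionDate", "1880-06-31"),
    ("HeadnoteSummary", "Publisher's summary of the case."),
    ("Keywords", "Contracts; Sale"),
    ("Normal", "The judgment of the Court was delivered by the Chief Justice."),
]
metadata, reasons = extract_decision(paragraphs)
if parse_decision_date(metadata["date"]) is None:
    print("Flag for manual review:", metadata["date"])
```

In practice, the pairs could come from a library that reads word-processing files, such as python-docx, whose paragraphs expose a style name and text. That wiring is left out here to keep the sketch self-contained.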

Knowing reporters’ past practices and finding the best techniques to manage their diversity come with experience. Over the past decade, Lexum has digitized more than 50,000 decision documents in both official languages from all major Canadian case law reporters, some of which date back to the 19th century. This was made possible with support from courts and law foundations in Alberta, Saskatchewan, Ontario, and Newfoundland and Labrador, and more recently from the Supreme Court of Canada.

The cost of digitization has dropped by a factor of 10, thanks to technology and improved processes. Courts and tribunals can now afford to make their old cases new and available online, and Lexum is better prepared than ever to serve them well.