Daniel Poulin
Centre de recherche en droit public
Faculté de droit, Université de Montréal
poulind@droit.umontreal.ca
Today, however, these difficulties, while not eliminated, appear in a more favourable context. The recent development of technologies linked to the Internet and the World Wide Web (WWW) has led to a radical reduction in the production and distribution costs of highly interactive teaching tools. Another advantage is that systems developed using these technologies are more open, since their elements can more easily be made available for re-use by others. The resulting change in the way costs are calculated is likely to increase student access considerably. It is in this environment, traditionally hostile but today more favourable, that we developed our proposed Système d'enseignement des libertés fondamentales: SELF (System of Teaching Fundamental Freedoms).
In the following sections we will describe this information system, which is organized around the Canadian Charter of Rights and Freedoms. First we will present its general structure and functioning, and then we will deal with the various technical issues raised by its development. We will discuss in particular our choice of SGML (Standard Generalized Markup Language), its use, and the solutions chosen for the hypertext additions to the corpus, for information retrieval, and for the publication of SELF on the Internet.
Concretely, the base of documents published is structured around the collection of Supreme Court Charter decisions: almost 500 decisions totalling approximately 25 000 pages in each language. The corpus will also include the Canadian Charter of Rights and Freedoms and the Canadian Bill of Rights. Future plans are to complete this set with various doctrinal works as soon as the right to publish these additions is obtained. Other related corpora will also be added to this basic set. This will be the case, for example, of a set of documents including the Charte québécoise des droits et libertés de la personne and the complete collection of the decisions of the Tribunal québécois des droits de la personne.
The proposed system provides a rich teaching environment. From the point of view of the professor, SELF is first of all an educational system that aids in the teaching of concepts related to fundamental rights. The system contains both documentary resources and exercises that allow the acquisition of knowledge to be confirmed. The use of the Internet also makes available various communication mechanisms suited to promoting interactivity with students, such as electronic mail and mailing lists. Secondly, SELF will appear to professors as a store of materials ready to be used for various educational purposes. Indeed, the openness of the system fosters its use in the preparation of very different kinds of courses. For example, a professor of criminal law will be able to mine SELF for documents useful for the part of his or her course dealing with fundamental rights. In fact, with a minimum of work, anyone with access to the Internet will be able to use SELF to produce a personalized electronic anthology adapted to his or her teaching. Thirdly, for professors and for all others concerned, it is a valuable documentary tool providing access to primary sources of law and to various doctrinal works.
From the student's point of view, SELF appears as a WWW resource specifically adapted to one area of learning. From a personal computer or a laboratory station, the student will be able to do research in SELF or browse through it, either by following the electronic syllabus proposed by his or her professor, or by using it like any other Internet resource. By creating his or her own electronic documents, the student will be led to develop a personal documentary system on the subject studied. In fact, this teaching tool allows individual uses. Students will be free to annotate electronic documents, to underline important passages and, more generally, to structure the information in their own way.
The Choice of SGML. Several concerns contributed to our choice of SGML for the storage of documents. We were looking primarily for an open solution compatible with Internet standards. Openness and Internet compatibility seemed to us an unavoidable condition for the cooperative establishment, as we conceive it, of common information resources for the teaching of law in Canada. We also hoped that the format chosen would be stable. Our experience as publishers on the Internet had convinced us of the advantages associated with adopting durable formats. Finally, we were looking for a means of storage likely to facilitate retrieval. In particular, we hoped to offer users the possibility of making requests by field. For that, the structure of the documents had to be defined. They had to be tagged.
Three types of solutions were considered: certain proprietary formats (Folio, Acrobat and RTF), HTML, and SGML. Folio, Acrobat and RTF were rapidly set aside in order to ensure the accessibility of SELF and because of the costs and inconveniences that selecting them would have entailed for us and for our users. It also seemed to us that while HTML is extremely useful for publishing on the Internet, it does not possess the richness and stability necessary for our base of documents. SGML (ISO 8879), a stable, open language for tagging, appeared to us to be best suited to allow us to add to and preserve documents.
Setting SGML to Work. The deployment of SGML involves a number of steps. Documents must first be modelled, then converted, and finally, of course, published. The process thus begins with modelling, or the development of a document model, called a DTD (Document Type Definition). A DTD takes the form of a grammar whose elements correspond to the structural elements of the documents. It is a sort of abstract description of the documents. During the tagging of the documents in SGML, each element of the DTD is used as a tag to identify an instance of the structural element to which it corresponds. In the proposed SELF, we use two main DTDs: the RCS DTD for Supreme Court decisions, and the HTML 2.0 DTD (the HTML standard is simply a specific DTD) for documents to be published on the Internet. Other DTDs have been developed, or will be developed, for the various other types of documents which can be found in SELF.
Once the DTD is developed, the documents must be tagged. Using SGML with documents that were prepared in the traditional way requires an up-conversion, in which the tags describing the structure of the documents are added to the text. In this respect, the production of SELF requires a considerable amount of conversion work, since approximately 50 000 pages must be tagged in SGML. Marking up the documents by hand was not an option at this scale, so we developed programs able to recognize the structure of the decisions from lexical and typographical cues. What was initially expected to be a single step became instead a process of successive refinements. As we worked on the corpus, unforeseen structures were revealed in the texts being analyzed, forcing us to update the DTD and, in consequence, the conversion programs.
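The kind of up-conversion described here can be illustrated with a minimal Python sketch. It is not the project's conversion program: the cues (a line entirely in capitals taken as a heading, a line beginning with "Present:" taken as the bench) and the element names DECISION, HEADING, BENCH and PARA are hypothetical and do not come from the RCS DTD.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical cues and element names, chosen only for illustration.
RULES = [
    (re.compile(r"^Present:\s*(.+)$"), "BENCH"),          # lexical cue
    (re.compile(r"^([A-Z][A-Z .,'\-]+)$"), "HEADING"),    # typographical cue: all capitals
]

def up_convert(text: str) -> str:
    """Wrap each non-empty line of plain text in a structural tag."""
    out = ["<DECISION>"]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for pattern, tag in RULES:
            m = pattern.match(line)
            if m:
                out.append(f"<{tag}>{escape(m.group(1))}</{tag}>")
                break
        else:
            # No cue recognized: treat the line as an ordinary paragraph.
            out.append(f"<PARA>{escape(line)}</PARA>")
    out.append("</DECISION>")
    return "\n".join(out)
```

Keeping the cues in a table separate from the conversion loop reflects the successive refinements mentioned above: when an unforeseen structure turns up, a rule is added or adjusted without rewriting the loop.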
In addition to the conversion into SGML, there is the conversion of SGML into HTML, which is necessary for the publication of documents and especially for consulting them with current browsers. This cross-conversion was also automated. With a base of documents tagged in SGML, which could be converted on request into HTML, all that remained was to add the hypertext links to the set.
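A conversion of this kind can be sketched in Python as a simple renaming of elements. The source element names and the mapping below are hypothetical, and the sketch handles only tags without attributes; a real converter would work from the DTD with an SGML parser.

```python
import re

# Hypothetical mapping from source elements to HTML 2.0 elements.
TAG_MAP = {"DECISION": "BODY", "HEADING": "H2", "BENCH": "P", "PARA": "P"}

# Matches attribute-free start and end tags such as <PARA> or </PARA>.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def down_convert(sgml: str) -> str:
    """Rename mapped tags; drop unmapped tags but keep their text content."""
    def swap(match: re.Match) -> str:
        slash, name = match.groups()
        html_name = TAG_MAP.get(name.upper())
        return f"<{slash}{html_name}>" if html_name else ""
    return TAG_RE.sub(swap, sgml)
```

Because several source elements map to the same HTML element, the conversion loses structure. This is one reason for storing the documents in the richer format and producing HTML only on request.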
The Hypertext Additions and External Management of the Links. On the hypertext level, two problems had to be solved. First, the potential hypertext links, namely the legal references contained in the documents, had to be identified and transformed into functional hypertext links. Once again, the size of the corpus and the number of links to be entered required the development of automated mechanisms of analysis. In the identification and analysis of references, we were able to count on the rigor of the forms of citation used in law, for example, "R. v. Oakes, [1986] 1 S.C.R. 103." The set of intra-corpus references, in other words those going from one decision to another or from a decision to the Charter, was easily identified. Other links can be added as the space for legal documents on the Internet expands.
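The regularity of legal citations makes them a natural target for pattern matching. As a minimal sketch in Python, assuming only the Supreme Court Reports form quoted above (year in brackets, volume, "S.C.R.", first page), a regular expression can extract the components of each reference. The function name is hypothetical, and other citation forms would need their own patterns.

```python
import re

# Matches citations of the form "[1986] 1 S.C.R. 103".
SCR_CITATION = re.compile(r"\[(\d{4})\]\s+(\d+)\s+S\.C\.R\.\s+(\d+)")

def find_citations(text: str) -> list[tuple[int, int, int]]:
    """Return (year, volume, page) for each S.C.R. citation found in text."""
    return [(int(y), int(v), int(p)) for y, v, p in SCR_CITATION.findall(text)]
```

The resulting (year, volume, page) triple identifies the target decision independently of where it is stored, which is what allows the reference to be turned into a link later.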
The second difficulty linked to the hypertext dimension of the corpus has to do with the management of the links themselves. In the WWW universe, the usual solution is to write the hypertext link into the document itself. For this the attribute href of the anchor tag <A> is used:
<A HREF="URL">The text designating the link</A>.
To denote a link to page 105 of the Oakes decision, located on the University of Montréal server, one would write:
<A HREF="http://www.droit.udem.ca/CSC/arrets/86/vol1/oakes.fr.html#p105">R. v. Oakes, at 105</A>
This solution did not seem satisfactory. Moving any document, however slightly, would require updating the entire base in order to adjust the addresses. Worse, from the point of view of the development of a teaching system, we would have to document the addresses of all the documents in order to allow our users, professors and students, to refer to them using hypertext links. Finally, we want to encourage the use of our resources by other faculties but, if our documents are used directly as the destination for hypertext links, the slightest movement on our server would risk having repercussions everywhere the corpus is in use.
We thus turned towards a solution related to the PURL norm (Persistent Uniform Resource Locator) presently offered by the OCLC (Online Computer Library Center, Inc.). We will thus manage the destination addresses through hypertext links exterior to the documents. In this context, during the preparatory treatment of documents, references are replaced by hypertext links designating an address stored in a database. If there is an address change, only the corresponding record in our database will have to be changed. Then true URLs can easily be entered when the documents are converted from SGML to HTML.
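The principle can be sketched in Python under several assumptions: a dictionary stands in for the database, the REF element with an ID attribute and the key scheme are hypothetical, and the address shown is a placeholder rather than a real SELF address.

```python
import re

# Stand-in for the database of destination addresses (placeholder URL).
LINK_TABLE = {
    "scr-1986-1-103": "http://example.org/decisions/oakes.html",
}

# Hypothetical markup for a reference stored by identifier, not by address.
REF_RE = re.compile(r'<REF ID="([^"]+)">(.*?)</REF>')

def resolve_links(sgml: str, table: dict[str, str]) -> str:
    """Replace each identifier-based reference with an HTML anchor."""
    def to_anchor(match: re.Match) -> str:
        key, label = match.groups()
        url = table.get(key)
        # Leave the reference as plain text when no address is recorded.
        return f'<A HREF="{url}">{label}</A>' if url else label
    return REF_RE.sub(to_anchor, sgml)
```

When a document moves, only its entry in the table changes; the stored documents are untouched, and the next conversion to HTML picks up the new address.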
Information Retrieval. SELF must have extremely efficient retrieval mechanisms. Since all the decisions deal in principle with the same issue, the simplest retrieval mechanisms are relatively ineffective here. This is why we resorted to a full text retrieval engine supporting fields. By using certain SGML tags to define the fields, it became possible to request, for example, documents in which a given word is found in a specific part and certain other words elsewhere.
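Field-based requests of this kind can be illustrated with a small Python sketch. It does not reproduce the retrieval engine used in SELF: the field names are hypothetical, each document is assumed to have been split into fields beforehand using its tags, and matching is reduced to whole-word lookup on whitespace-separated tokens.

```python
def matches(doc: dict[str, str], query: dict[str, list[str]]) -> bool:
    """True when every queried field contains all of its required words."""
    for field, words in query.items():
        tokens = set(doc.get(field, "").lower().split())
        if not all(word.lower() in tokens for word in words):
            return False
    return True

def search(docs: dict[str, dict[str, str]], query: dict[str, list[str]]) -> list[str]:
    """Return the identifiers of the documents satisfying the fielded query."""
    return [doc_id for doc_id, doc in docs.items() if matches(doc, query)]
```

For example, a query such as `{"headnote": ["innocence"], "dissent": ["limits"]}` selects only documents in which "innocence" occurs in the headnote and "limits" in the dissent, and ignores documents where the same words appear in other parts.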
The large size of many documents in the corpus, some of which ran to several hundreds of pages, constituted a second challenge at the level of retrieval. In this case, it is not sufficient to find a document: what is pertinent with respect to the request formulated must also be identified within that document. This problem was resolved through the development of a user interface permitting repeated use of the retrieval mechanism. This will be discussed in greater detail in the next section.
The Interface and Publishing on the Internet. In SELF, the set of documents can be approached in a number of ways. The user can browse through a hypertext syllabus prepared by his or her professor. He or she can also navigate through our resources like anyone else on the Internet, or, finally, he or she can resort to various retrieval mechanisms.
In order to permit the first two types of use, our work was limited to the development of the appropriate HTML pages. Thus to allow navigation using an electronic syllabus, all that is required is that the professor be helped to develop a set page giving access to the contents designated for each of the professor's meetings with his or her students. For each meeting, or for specific meetings, a specific page can be prepared in which the students could find, as well as the presentation of the notions introduced in the course, hypertext links to documents providing the background information.
Access by means of retrieval mechanisms is a little more complex to organize. In order to consult SELF, the user fills out a form indicating the key words which should appear in each of the parts of the document to be found. The system then provides a list in which documents seeming to answer the request are classified according to their relevance. When the user wants to consult one of these documents, the system repeats the request, but this time it applies it to that document alone. What the user receives as a result of this request is an index-result--produced using SGML tags--providing access to the structure of the document and to a list of the most relevant pages given his or her request. If the user chooses to consult one of the elements of the document retrieved, such as the reasons of one of the dissenting judges in a decision or one of the pages suggested, he or she calls up the element of the document, to which hypertext links to other documents in the corpus may be added. The user can then continue his or her research, either by consulting other parts of the document retrieved, by moving to one of the other documents presented in the response to the initial request, by making yet other requests, or by following the hypertext links which were presented.
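The two passes described above, first over the corpus and then over the selected document, can be sketched in Python. The relevance measure used here, a plain count of term occurrences, is an assumption made for the illustration; the text does not specify how the engine computes relevance. The function names and the representation of a document as a list of page texts are likewise hypothetical.

```python
from collections import Counter

def score(text: str, terms: list[str]) -> int:
    """Assumed relevance measure: total occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[term.lower()] for term in terms)

def rank_documents(corpus: dict[str, list[str]], terms: list[str]) -> list[str]:
    """First pass: document identifiers in descending order of relevance."""
    scored = [(score(" ".join(pages), terms), doc_id) for doc_id, pages in corpus.items()]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [doc_id for s, doc_id in ordered if s > 0]

def rank_pages(pages: list[str], terms: list[str]) -> list[int]:
    """Second pass: same query on one document; 1-based page numbers by relevance."""
    scored = [(score(page, terms), number) for number, page in enumerate(pages, start=1)]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [number for s, number in ordered if s > 0]
```

Repeating the request on a single document is what turns a hit on a decision of several hundred pages into a short list of pages worth reading.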
The production of the various technical aspects of the project was due to the assistance of many collaborators: Stéphane Denis, for the conversion procedures; Yanik Grignon, for the retrieval mechanisms; Chantal Lefebvre, for the conception and production of the interfaces; and Guy Huard, for the SGML modelling of the Supreme Court decisions. This part of the project received financial support from the Fonds de l'autoroute de l'information du Québec (#94-035) and from the Fonds FCAR (#96-ER-1557).
This project would not have been possible without the support of the Supreme Court of Canada.