The internet and legal information: projects and prospects

TOM R. BRUCE

CORNELL LAW SCHOOL

Man draws dogs, bites Net: a coming of age

Page 61 of the July 5, 1993 New Yorker magazine held a cartoon depicting two dogs at the keyboard of a personal computer. One animal is saying to the other, "On the Internet, nobody knows you're a dog".

For the Internet community, the New Yorker cartoon marked a passage into adolescence. The Internet had risen far enough into the popular consciousness to be laughed at by sophisticates, a mark of wider public awareness. Many now divide the course of Internet history (informality is an Internet hallmark) into "before the dogs" and "after the dogs". The mutts were just the beginning. Books purporting to be guides to the Internet appeared on the shelves of mass-market booksellers; by New Year's Day they were out of date. All in all, literally thousands of Internet-related stories appeared in the non-technical press in 1993 and early 1994. Many of them made liberal use of the word "cyberspace". Most of them prominently featured the phrases "information superhighway" and "interactive television". Virtually all of them were variations on a theme: five hundred channels of interactive mud-wrestling will be coming to your home soon, courtesy of an unholy alliance of the telephone, cable television, and computer industries. The reaction from most people who had actually been working with Internet-based technologies was one of cautious approval; they liked the attention, but they didn't know whether a future in which all the bandwidth is consumed by Arnold Schwarzenegger movies was exactly what they'd been working toward. Like an adolescent with growing pains, the Internet community is not at all sure whether it wants to be an adult on someone else's terms, or whether it might be wiser to remain a child long enough for the whole thing to blow over. Like other adolescents, it is already well out of childhood, with no way back.

Legal Information and the Net

The high-bandwidth media hype of the past several months neglects the utility of existing Internet technology. The last few years have seen substantial experiment and growth in the legal-information sphere. Four years ago (with the possible exception of a few acceptable-use policies and law-of-computing documents scattered across FTP sites) there was no legal information on the Net. Three years ago, there was one law-specific Gopher server, the one operated by the Legal Information Institute at Cornell. Two years ago, there was one law-specific WorldWideWeb server, again at Cornell. At that time government agencies were mounting almost no information, and only one commercial publisher was using Gopher as a means of vending legal information.

The situation today is quite different. Many Federal agencies are now making information available through the Net; perhaps the largest single example is the EDGAR database of securities filings. A dozen or more law schools are mounting information in one format or another; many others have plans to do so. Uniform stylistic standards are the subject of recurrent discussion in the academic legal-technology community, and it seems likely that a "legal hypertext style book" will emerge sooner rather than later. Practitioners have been slower to act; relatively few firms have direct Internet connectivity, and most are worried about security issues which were solved to the satisfaction of the defence industry years ago. Nonetheless, the first WorldWideWeb server operated by a law firm came on-line for test purposes a few hours before this sentence was written, bearing information designed to showcase the intellectual inventory of the firm. There will be many, many more.

These youthful accomplishments invite speculative comparisons with established and presumably more mature commercial legal data services. The comparisons are fuelled sometimes by enthusiasm (the "Wow! we could start our own LEXIS!" response) and sometimes by skepticism ("Why should we do this at all when we have WESTLAW?"). Those two giants themselves use the Internet as a pipe through which to ship their data inventory. But looking at Internet infrastructure merely as a better or worse way of delivering the same old services to the same old markets is a profound mistake. It is equally wrongheaded to assume that the methods historically used by the computerised legal information industry to exchange money for information are the only ones available to us. To do so ignores the fact that the Internet is a very different medium, and it ignores a long history of software and hardware development in which quantitative improvements in technology, accompanied by reductions in cost, have led to enormous qualitative change.

We are witnessing a revolution of scale and of scope which in many respects will parallel the personal computer's overthrow of the mainframe, and which will occur for many of the same reasons. Perhaps that sweeping statement is simply the legal technologists' transliteration of the media hype mentioned a moment ago; I suppose you could call it the "500 channels of interactive legal information" story. If it is a fantasy, it is one firmly rooted in the reality of present-day production systems, and a look at those roots is in order.

What is the WorldWideWeb?

The balance of this essay will use WorldWideWeb technology as its point of departure. The Web is actually composed of three parts: a client-server protocol (HTTP) by which software running on a workstation can request data from a remote computer over the Internet; a markup language (HTML) which permits typographic information, hypertext links, images, digitised audio clips, animations, and interactive forms to be embedded in ordinary ASCII documents; and a Uniform Resource Locator scheme (URL standard) which provides a common syntax for describing the means of access to and location of virtually all data resources on the Internet. The URL notation is the basis not only of a syntactic scheme for embedding hypermedia links into Web documents, but of a kind of "protocol transparency", in which the user of Web client software need not be concerned with the actual mechanics used to deliver data to her machine.
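To make "protocol transparency" a little more concrete, here is a minimal Python sketch, not drawn from any actual Web client, of how a client might read the access scheme off the front of a URL and hand the request to a scheme-specific routine. The host names, paths, and handler functions are hypothetical, and the handlers only describe what a real client would do rather than opening connections.

```python
from urllib.parse import urlparse

# Hypothetical handlers: a real client would open a network connection here.
def via_http(parts):
    return f"HTTP GET {parts.path or '/'} from {parts.hostname}"

def via_gopher(parts):
    return f"Gopher request for {parts.path or '/'} at {parts.hostname}"

def via_ftp(parts):
    return f"anonymous FTP retrieval of {parts.path} from {parts.hostname}"

# The scheme at the front of the URL selects the access method.
HANDLERS = {"http": via_http, "gopher": via_gopher, "ftp": via_ftp}

def plan_retrieval(url):
    """Choose an access method from the URL's scheme; the user need not know which."""
    parts = urlparse(url)
    handler = HANDLERS.get(parts.scheme)
    if handler is None:
        raise ValueError(f"no gateway for scheme {parts.scheme!r}")
    return handler(parts)

for link in ("http://www.example.edu/topics/copyright.html",
             "gopher://gopher.example.edu/1/supct",
             "ftp://ftp.example.edu/pub/opinions/92-1168.ZS.filt"):
    print(plan_retrieval(link))
```

In this sketch, supporting WAIS or another access system would mean adding one entry to HANDLERS; the single link syntax stays the same, which is the point of the URL scheme.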

Complete descriptions of the Web standards are outside the scope of this paper, as is detailed explanation of hypertext and what it does. I will instead concentrate on a few critical aspects and implications of the technology, with examples taken largely from our ongoing work at the Legal Information Institute. It seems only logical to begin with the hardware and software needed to support the effort.

Hardware and software platforms

Web server software exists for all hardware platforms commonly in use today, including all varieties of personal computers and UNIX workstations, as well as VAX minicomputers and IBM mainframes. Typically, the systems used for high-traffic servers are workstations running some variant of UNIX. This may be more a matter of affection than of performance; some experimenters report that PC clones of modest capabilities (486/33s) running commercial flavors of UNIX for the IBM PC have outperformed dedicated RS/6000 workstations costing ten times as much. What unifies the machines commonly used as Web servers is that they are inexpensive: well under $10,000 (US) for a very capable machine with two gigabytes of disk storage, and quite possibly half that.

Connectivity costs for a Web server vary greatly depending on the type of institution running it. At most university-based sites, the cost of Internet connection for a particular machine is subsumed in some overall budget and is inexpensive on a per-machine basis; ours costs in the vicinity of $36 (US) per month. A dedicated 14.4K phone link serviced by a commercial Internet access provider, such as the one Venable, Baetjer, Howard, & Civiletti uses to provide access to its Web server, might cost $2000 annually. Finally, all the software used at the server is freely distributed and costs nothing. To put these numbers in perspective, the LII's combined hardware, software, and connectivity cost for starting its Web, Gopher, and listserv operations, plus one year's operating cost, was less than half the amount saved each year when the school's two student-edited law journals moved to electronic proof generation.
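Annualizing the two connectivity figures quoted above makes the gap concrete (one-time installation charges, which are not given here, are left out):

```latex
\begin{aligned}
\text{university connection:}\quad & 12 \times \$36 = \$432 \text{ per year},\\
\text{dedicated 14.4K link:}\quad & \approx \$2000 \text{ per year},\\
\text{ratio:}\quad & \$2000 / \$432 \approx 4.6 .
\end{aligned}
```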

Distributed hypertext

These are not, of course, the only costs involved. Hardware and software expenses are in fact a rather small part of the overall cost of electronically publishing a collection of texts, though they have a disproportionate chilling effect when an organization contemplates Internet distribution for the first time. The lifetime cost of such a system lies almost entirely in mounting the collection and, more significantly, in maintaining it. If an organization wishes to provide value-added legal information, then someone or something is going to have to add the value, and that process will carry costs in both the near and the long term. Distributed hypertext permits organizations in different sectors to share those costs in new and interesting ways.

Earlier hypertext systems allowed the user to navigate among documents maintained on standalone computers and on local area networks; they were, in effect, confined to environments where the workstation could be fooled into believing that the disk used to store hypertext documents was physically attached to it. This was and is a serviceable technology, and its utility in the organization and retrieval of legal text has been widely recognized and acted upon. Web technology goes beyond these earlier systems in its ability to provide hypertext links across machines on the Internet, for the most part without regard to specific delivery protocols. It is easy for a Web client to access information made available through the Web itself, through Gopher servers, WAIS databases, anonymous FTP, and essentially any other access system for which gateway software can be written (examples include the HyTelnet, Hyper-G, and TechInfo formats). In cultural and administrative terms, this means that hypertext links can tie together bodies of text which are related in substance but unrelated in sponsorship, text which is mounted and maintained by different organizations, without regard for geographical or institutional proximity. The editorial and maintenance costs of a collection can be spread across many cooperating "sub-providers", each one mounting and maintaining a portion, perhaps one in which the provider has substantial expertise or interest.

The collection of US Supreme Court opinions offered by the Legal Information Institute illustrates this proposition well. The actual opinions are mounted for anonymous FTP retrieval at Case Western Reserve University under Project Hermes, an effort begun by the Court in 1990 to make decisions available electronically to public and private-sector entities. Case Western decided to use anonymous FTP -- the most widely available technology at the time -- as the means of distribution, dividing each text from the Court into syllabus, opinion, dissents, concurrences, and so forth, and assigning each portion a unique file name based on the docket number of the case. The user of CWRU's anonymous FTP site is confronted with a directory structure full of files with names like "92-1168.ZS.filt", which is in fact the syllabus of the decision in the Harris v. Forklift Systems sexual-harassment case. CWRU is, of course, adding considerable value by filtering out word-processing codes, dividing the opinions into reasonably sized chunks, and providing a publicly accessible distribution point. For the average individual trying to obtain an opinion, however, the combination of cryptic filenames which won't pass unaltered onto a DOS-based file system and the poor navigational capabilities of most FTP clients adds up to an interface best described as user-hostile.
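The Python sketch below shows why a name like "92-1168.ZS.filt" resists both casual readers and DOS. It splits the name into docket number, portion code, and suffix, and tests it against the DOS "8.3" rule (a base name of at most eight characters, at most one dot, and an extension of at most three characters). Only the "ZS" = syllabus correspondence comes from the text above; the function names are hypothetical, and the check covers length and dot count only, not DOS's full list of forbidden characters.

```python
import re

# Only "ZS" (syllabus) is attested above; other portion codes are not assumed here.
PORTION_CODES = {"ZS": "syllabus"}

HERMES_NAME = re.compile(r"^(?P<docket>\d{2}-\d+)\.(?P<code>[A-Z]+)\.(?P<suffix>\w+)$")

def parse_hermes_name(filename):
    """Split a Hermes-style file name into docket number, portion, and suffix."""
    m = HERMES_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename}")
    portion = PORTION_CODES.get(m["code"], f"unknown portion {m['code']}")
    return m["docket"], portion, m["suffix"]

def fits_dos_8_3(filename):
    """True if the name has at most one dot, a base of <= 8 chars, an extension of <= 3."""
    base, _, ext = filename.partition(".")
    if "." in ext:  # a second dot is not allowed
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3

print(parse_hermes_name("92-1168.ZS.filt"))  # ('92-1168', 'syllabus', 'filt')
print(fits_dos_8_3("92-1168.ZS.filt"))       # False
print(fits_dos_8_3("921168ZS.TXT"))          # True
```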

The LII's contribution has been the construction of hypertext pages which link case names and related information to the FTP files at Case Western. This provides end users with a hypertextual, annotated table of contents from which they may select cases by actual case name and portion (e.g. syllabus, opinion, concurrence, and so forth) without needing to know or to look up the docket number. These hypertext pages are stored on Cornell's server, and the opinions remain at Case Western. CWRU bears the cost of mounting the actual opinions, and Cornell bears the cost of organizing and maintaining usable access points for the average user. We have gone on to make the collection searchable in a variety of ways, including organization by keyworded topic and full-text search. Again, only the indices and the searching software are at Cornell, with the "base text" at CWRU.
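A minimal Python sketch of this division of labor follows: the table of contents is generated on one server, while every link points at a file held on another. The FTP host and directory are placeholders, not CWRU's actual layout, and the single index record reuses the Harris v. Forklift Systems syllabus file named above.

```python
from html import escape

# Placeholder location of the base text; the real host and path may differ.
BASE_TEXT_URL = "ftp://ftp.example.edu/hermes/"

# Index records kept locally: case name, portion, and remote file name.
CASES = [
    ("Harris v. Forklift Systems", "syllabus", "92-1168.ZS.filt"),
]

def table_of_contents(cases):
    """Build an HTML list whose entries link case names to files on the remote server."""
    items = []
    for name, portion, filename in cases:
        href = escape(BASE_TEXT_URL + filename, quote=True)
        label = escape(f"{name}: {portion}")
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(table_of_contents(CASES))
```

Only the small index (CASES and the generated page) lives on the indexing server; if the remote site adds a file, the index gains one record and nothing is copied.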

This process, which we have glibly dubbed "add-on scholarship", is much more than a workaday exercise in the librarian's or indexer's art. The same techniques can be used to unify scattered bodies of related material. We are in the early stages of a grant-funded project in which the Rules of Professional Conduct as adopted in each of the 50 states would be mounted by institutions in each of the states. Participating organizations -- which might be law schools, bar associations, or law firms with a pro bono bent -- need only mount those sections which differ from the two major variants on the Model Rules of Professional Conduct. The remainder of the corpus for each state would be drawn from a central pool, accessed section by section via hypertext links. Thus, the table of contents for any given state would contain a mixture of pointers to generic and localized sections, minimizing the work needed from any one of the participating institutions and, correspondingly, its expense.
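The mixing of generic and localized sections can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's design: the rule numbers, URLs, and the single locally varied rule are all hypothetical. Each state's table of contents prefers a locally mounted section and falls back to the central pool otherwise.

```python
# Hypothetical central pool: rule number -> URL of the generic text.
CENTRAL_POOL = {
    "1.1": "http://central.example.edu/rules/1.1.html",
    "1.5": "http://central.example.edu/rules/1.5.html",
    "1.6": "http://central.example.edu/rules/1.6.html",
}

def rule_key(number):
    """Sort '1.10' after '1.9' by comparing the numeric parts."""
    return tuple(int(part) for part in number.split("."))

def state_toc(central, local):
    """List every rule with the URL to use: the state's own text if it mounts one."""
    toc = []
    for number in sorted(set(central) | set(local), key=rule_key):
        if number in local:
            toc.append((number, local[number], "local"))
        else:
            toc.append((number, central[number], "central"))
    return toc

# A hypothetical state whose only departure from the pool is Rule 1.6.
local_sections = {"1.6": "http://law.example-state.edu/rules/1.6.html"}
for entry in state_toc(CENTRAL_POOL, local_sections):
    print(entry)
```

The participating institution maintains only the local_sections mapping; everything else in its table of contents is a pointer into the shared pool.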

Law teaching and add-on scholarship

Two years ago, we undertook a collaboration with the Fletcher School of Law and Diplomacy at Tufts University. Fletcher personnel provided us with digital versions of a large collection of multilateral treaties; we provided a distribution point with full-text search capability on our Gopher server. One of the many treaties so treated was the Berne Convention for the Protection of Literary and Artistic Works, which now does double duty as an item in the Fletcher multilaterals collection and as one piece of a much more extensive collection of intellectual property materials put together by the LII. The collection includes all relevant US copyright, patent, and trademark statutes and regulations. It is, in fact, the Internet extension of a project begun a few years ago using Folio Views on a local area network. My colleague Peter Martin has used that environment to do another kind of "add-on" hypertext linking: he mounts his outline for each class presentation in his intellectual property course in hypertext form, with links to the relevant portions of the core statutory material. It is not difficult to imagine that a law teacher or group of law teachers working in the same substantive area might do the same across the Net, creating a much more seamless electronic equivalent of the Xeroxed anthologies-cum-textbooks which have become popular in recent years. (Indeed, in the week which elapsed between the first drafting of this paragraph and the transmission of my text to the conference organizers, someone did so. A teacher of contract law at Ohio Northern University built links between a Web version of the review outline he prepares for his students and the relevant sections of the Uniform Commercial Code as mounted at Cornell.) Each teacher could retain her unique perspective while offering those of others. No individual need bear the full brunt of mounting and organizing large chunks of core material which will be used by all.
The addition of personalized annotation capability, already an experimental reality in the Web research community, would allow such resources to be highly individualized and extended by students and teachers.
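Outline-to-statute links of this kind can also be generated mechanically when an outline cites sections in a regular form. The Python sketch below turns citations such as "UCC 2-207" into hypertext links. The citation pattern and the base URL are assumptions made for illustration, not the form actually used for the UCC as mounted at Cornell.

```python
import re
from html import escape

# Hypothetical URL layout for a mounted copy of the UCC: one page per section.
UCC_BASE = "http://www.example.edu/ucc/"

# Matches citations written as "UCC 2-207" or "UCC § 2-207" (an assumed convention).
UCC_CITE = re.compile(r"UCC (?:§ )?(\d+)-(\d+)")

def link_ucc_citations(text):
    """Escape the outline text, then wrap each UCC citation in a link to its section."""
    def to_link(m):
        article, section = m.group(1), m.group(2)
        href = f"{UCC_BASE}{article}/{article}-{section}.html"
        return f'<a href="{href}">{m.group(0)}</a>'
    return UCC_CITE.sub(to_link, escape(text, quote=False))

print(link_ucc_citations("Battle of the forms: see UCC 2-207 and UCC § 2-209."))
```

Escaping happens before linking so that the inserted anchor tags are not themselves escaped.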

Other channels

Though I earlier offered to confine myself to Web technology, the availability of other communications technologies and channels across the Internet deserves at least a nod here, for two reasons. First, the array of technologies available permits electronic publishers to offer additional services built on their investment in mounting electronic text. Second, the availability of electronic mail in particular means that every text placed on the Net can exist at the center of many channels of communication, some of them especially useful to an author/publisher.

The first idea is illustrated by LIIBULLETIN, a service the LII began offering shortly after we started automatically constructing pointers to new Supreme Court opinions as the Court delivered them to CWRU. The same software which discovered new opinions to be indexed and added to our hypertext pages could, with only minor modifications, send an e-mail notice to anyone who wished to learn of a new opinion from the Court; the message also contains the syllabus of the opinion and instructions for retrieving its full text by mail. This offers a service to those who have only mail connectivity to the Net, and it adds value through its timeliness and its narrowcast character: the alert is transmitted directly to a mailbox which the user presumably scrutinizes frequently, rather than through a broadcast or bulletin-board apparatus.
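The step from indexing to notification might be sketched in Python as follows: compare the current listing of the distribution directory with the names already processed, and draft one alert per new syllabus file. This is not the LII's software. The addresses, the mail-retrieval command, and the docket number 93-0001 are hypothetical, and the sketch only builds messages; actually sending them (for instance with smtplib) is left out.

```python
from email.message import EmailMessage

def new_syllabi(listing, already_seen):
    """Syllabus files (".ZS." in the name) present now but not processed before."""
    return sorted(name for name in set(listing) - set(already_seen)
                  if ".ZS." in name)

def draft_bulletin(filename, syllabus_text, subscribers):
    """Build (but do not send) an alert with the syllabus and retrieval instructions."""
    docket = filename.split(".")[0]
    msg = EmailMessage()
    msg["From"] = "bulletin@example.edu"  # hypothetical sender
    msg["To"] = ", ".join(subscribers)
    msg["Subject"] = f"New Supreme Court opinion: {docket}"
    msg.set_content(
        f"{syllabus_text}\n\n"
        f"To receive the full text by mail, send the line 'get {docket}' "
        "to retrieve@example.edu (hypothetical address and command).\n"
    )
    return msg

# 93-0001 is a made-up docket number standing in for a newly arrived opinion.
listing = ["92-1168.ZS.filt", "93-0001.ZS.filt", "README"]
seen = ["92-1168.ZS.filt"]
for name in new_syllabi(listing, seen):
    print(draft_bulletin(name, "Syllabus text here.", ["reader@example.org"])["Subject"])
```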

The second idea takes its power from the fact that a publicly mounted text on the Net is visible to large numbers of other experts who presumably have access to mail and can address responses more or less directly to the author. On several occasions -- though I should perhaps blush to mention it -- errata or omissions in an online text have been brought to our attention very quickly by others on the Net. There is an interactive power working in the author's favor here; one need not be trapped for a year or more in an error which has been typeset and distributed to who knows where. Our experience is that the most perceptive and expert readers are the most likely to provide feedback. One anticipates a certain amount of annoyance from non-expert readers, or readers with axes to grind, but in our case such communication has been very rare by contrast with the much greater volume of genuinely useful feedback received from our peers.

Of course, conversations need not be limited to errata and other problems. Substantive discussion of online works can take place, be captured, and then perhaps incorporated into the work. The capture and subsequent public offering of electronic conversation about electronic texts offers the possibility of symposia and colloquia which take place in virtual space. Because of their specific relationship to an electronically published work, such symposia can remain both relatively focussed and timely.

Who will participate?

Academic institutions are not the only inhabitants of the Net. We are continually surprised at the diversity of audience for the legal information we mount at Cornell. The LII version of the Supreme Court opinions is accessed by everyone from teachers in political-science departments to high-school students and researchers in government laboratories. While at first we imagined that our efforts would primarily be interesting to the legal academic community and a few venturesome practitioners, we find that we have a much larger audience, one that for the most part has not been served well by traditional on-line services. This audience first declared itself in a note to us from a corporate manager in Hewlett-Packard's London office, expressing his gratitude for our efforts in mounting American intellectual-property material. He needed and could not otherwise obtain daily access to copyright regulations. Many others have stepped forward via e-mail, including ham radio operators interested in FCC regulations, high-school civics teachers, and foreign legal scholars. The log files which record activity on our servers show that US law schools and legal practitioners are in fact the minority among those accessing our material.
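A first, coarse look at who is reading can be taken from server logs by tallying client host names by top-level domain. The Python sketch below assumes logs in the NCSA common log format, where the client host is the first field; the sample lines are invented for illustration. A domain tally like this only separates .edu, .com, .gov and so on; distinguishing law schools and law firms from other readers, as described above, would require matching individual host names.

```python
from collections import Counter

# Invented sample lines in the NCSA common log format (client host is the first field).
LOG = """\
mail.example.com - - [10/Mar/1994:09:15:02 -0500] "GET /supct/ HTTP/1.0" 200 2311
lab.example.gov - - [10/Mar/1994:09:16:40 -0500] "GET /copyright/ HTTP/1.0" 200 5120
lib.example.edu - - [10/Mar/1994:09:20:11 -0500] "GET /supct/ HTTP/1.0" 200 2311
192.0.2.7 - - [10/Mar/1994:09:21:03 -0500] "GET /ucc/ HTTP/1.0" 200 880
"""

def domain_of(host):
    """Classify a client host by top-level domain; bare IP numbers stay unresolved."""
    last = host.rsplit(".", 1)[-1].lower()
    return "unresolved" if last.isdigit() else last

def tally_by_domain(log_text):
    """Count requests per top-level domain of the requesting host."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.strip():
            counts[domain_of(line.split()[0])] += 1
    return counts

print(tally_by_domain(LOG))
```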

The Net is a communications space common to the public, private, governmental, and academic sectors. Many synergistic efforts can result from this kind of cross-sector proximity. The LII works closely with a number of corporate sponsors who have an interest in legal and quasi-legal information, and examples of such projects exist at many places other than Cornell. The EDGAR securities-filing database was recently mounted on the Internet; it is a large and important example of government information from one agency (the Securities and Exchange Commission) being mounted with funding from another agency (the National Science Foundation) by a collaboration between a private sector entity (Internet Multicast, Inc.) and an academic institution (the business school at New York University). Our own more modest example is the mounting of the NASDAQ Financial Executive Journal, an outreach publication of the NASDAQ Stock Market.

The NASDAQ project is an interesting example of what one might call "corporate presence information", information mounted by a company not for sale or as advertising but as a means of offering access to corporate intellectual property as a new species of customer service. The value of Internet publication to NASDAQ is presence in front of its issuing companies, many of which are high-tech corporations that already have considerable Internet presence themselves. Just as a law firm might open an office in a city where a major client is beginning business operations, NASDAQ has opened a kiosk in cyberspace as a way of showing solidarity with and service to its customers. As it happened, the service was a timely one; the first issue, which was largely concerned with strategies for avoiding shareholder suits when stock prices tumble, was put on the Net roughly a week before the sharp drop in the price of Apple Computer stock last summer. We logged a large number of accesses from individuals at 'apple.com' in the weeks that followed.

The timeliness of NASDAQ's first informational offering was coincidental, but it is not difficult to imagine that corporations -- and even law firms -- might find ways to make equally topical information available by design. Many already do exactly that with newsletters describing the effects of regulatory changes, pending legislation, and so on. Electronic information can, of course, be distributed much more quickly than paper, and can thus be even more timely. We routinely receive the text of the NASDAQ publication at the same time as the printer of the paper edition does, and generally have it on-line ten days prior to the mailing of the paper version.

Sponsorship, for-pay services, and altruism

Corporate presence information falls at the midpoint of a spectrum which has the traditional grant-funded altruism of the Internet community at one end and for-pay services such as LEXIS and WESTLAW at the other. That continuum is a spacious one, with room for many models. The notion of private sponsorship of public information, analogous to corporate sponsorship of public television in the US, falls nearer altruism. I believe this is a viable structure, though one which has not been used to any great degree. One can easily envision a database of occupational safety and health regulations sponsored by a safety-equipment company, or an on-line version of the Physicians' Desk Reference sponsored by a pharmaceutical concern.

The other end of the spectrum is not yet well-populated either, primarily because there are as yet no commonly-accepted and reliable authentication mechanisms (passwording systems) which would permit "pay-by-the-drink" services to exist. Counterpoint Publishing, among others, has been experimenting with a subscription system which does seem to work well, although one imagines that the administrative overhead is high. Of course, it is actually possible for a company to make money by giving information away; West Publishing's recent offering of a national directory of lawyers via Gopher server can be seen as this kind of "loss leader", a useful and coherent data resource made freely available, with the implicit promise of much more to those who pay.

In any case, workable authentication schemes are no more than a year away; they are understandably the target of a great deal of interest in both the academic and commercial research and development communities. One should bear in mind that authentication is only the tip of an economic iceberg. Customer-service businesses of various kinds are needed. Imagine, for example, that you routinely access four hundred different information servers each month as you move around the Net; in a hypertext environment you might do this quite easily and naturally. How many invoices do you get at the end of the month? Four hundred? Or would you prefer to deal with a business organization which will consolidate those individual charges into a single itemized invoice, much as charge-card providers do? Some commercial on-line services (Delphi was perhaps the first) are clearly planning to position themselves in this way, but it is not clear that any one of them can do so effectively as a single member of a large and growing pool of access providers.
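The consolidation service imagined here is, at its core, simple bookkeeping, as this Python sketch suggests: the user receives one itemized statement, and the intermediary learns how much it owes each server. The servers, items, and prices are hypothetical, and a real service would of course also need the authentication and settlement machinery discussed above.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical month of usage: (server, item retrieved, charge in US dollars).
USAGE = [
    ("statutes.example.com", "copyright regulations", Decimal("0.25")),
    ("cases.example.com", "opinion 92-1168", Decimal("1.00")),
    ("statutes.example.com", "trademark regulations", Decimal("0.25")),
]

def consolidate(records):
    """Return one itemized statement for the user and the amount owed to each server."""
    owed = defaultdict(Decimal)  # Decimal() starts each server at zero
    lines = []
    for server, item, charge in sorted(records):
        lines.append(f"{server:<24}{item:<26}${charge}")
        owed[server] += charge
    total = sum(owed.values(), Decimal("0"))
    lines.append(f"{'TOTAL':<50}${total}")
    return "\n".join(lines), dict(owed)

statement, payouts = consolidate(USAGE)
print(statement)
print(payouts)
```

Decimal is used rather than floating point so that small per-item charges add up to exact cent amounts.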

There are those in the academic community for whom any hint of commercialization is anathema, but their days are numbered. Altruism in the mounting of information at no apparent cost has served to jump-start use of the Net, but may not serve the public well in the long run unless the altruists raise their sights. Volunteers have sometimes done a terrible job of meeting information-quality standards we take for granted in print and in the commercial on-line services. In part this is because volunteers have been more interested in the delivery systems than in the content, and in part it is because they cannot afford to maintain large collections individually. This is not to say that academic organizations should give up electronic publication in favor of existing commercial operations. Far from it. It is simply that organizations in academia need to find new ways to be compensated for their work; there is much about their labor that has real value in an information economy. Cost sharing and other cross-sector efforts will require an unprecedented degree of administrative cooperation and creativity in constructing relationships between academic, governmental, and private-sector players who do not at this point understand one another very well. The technology makes such cooperation possible but not inevitable; the next major strides to be made in electronic publishing on the Internet need to be administrative rather than technological innovations.

Content

The necessity which will mother this kind of cooperative invention may well arise from a new perception by the market that content is the only valid way to differentiate electronic products. Until now, it has been very much in the interest of electronic publishers to seize on comprehensiveness and user interface as the primary points of competition between products which offer the same substance. Consider, for example, yet another CD-ROM containing the US Internal Revenue Code and related caselaw. It is a great deal less expensive to pay two software engineers to develop a flashier user interface, or maybe even a search engine, than it is to maintain an expert editorial staff to add editorial value to an ever-expanding collection of documents. For the most part, end users have taken the bait, some of which (particularly enhanced searching interfaces) can admittedly be quite tasty. This strategy will become less viable as the market matures, and I believe the long term will favor those who add real editorial value. The electronic equivalents of small academic presses have considerable access to authorial and editorial expertise, and are thus well-positioned to add precisely this kind of value in narrow niches of their own choosing. The same is true of law firms with particular expertise. While authors have traditionally undervalued the efforts of publishers in adapting texts for market and distributing them to those markets, so too have publishers undervalued the role of authorial content in selling books. It is not for nothing that the book is called Prosser on Torts rather than simply Torts. There is no question that new, distributed, low-cost electronic publishing apparatus, with an audience far larger than that which is possible for any printed book, will permanently tip the balance of traditional author-publisher relationships.

A final word: collaboration

If the picture I have painted here is a blurry one, it is because traditional roles occupied by authors, publishers, and technologists are coming together in different combinations in different places, and those places are themselves mergers and cooperative efforts between the academic, private, and governmental spheres. The Legal Information Institute is the result of an intensive collaboration between a law teacher and a technologist, recently expanded to include a specialist in UNIX server systems and a law librarian. There is a sense of balance and interaction between substance and delivery systems in the minuscule staff roster of our Institute, and in the work it undertakes as both a receiver of academic grants and a contractor and consultant for various commercial entities. The LII offers substance of its own. It also offers, organizes, and provides contextual information about data resources which are only one mouse click away for the end user and very, very far apart by measures of physical distance or institutional culture. That information is offered by over one hundred academic, private, and governmental organizations, and those institutions communicate with and refer information-seekers to one another to an unprecedented degree. It is perhaps not surprising that the Internet -- which is, after all, a very large communication system -- should give rise to a large amount of communication. What is unprecedented is the degree to which it demands communication and collaboration across professional, institutional, economic, and cultural lines if we are to fully realize its potential.


S.D. 08/03/94