The computerization of probate registers
By: Gunnar Thorvaldsen
This paper deals with alternative methods for making probate registers machine-readable, that is to say the reports about the redistribution of property from deceased persons to their relatives. I shall start by putting the treatment of the probate type of source material into the context of other historical name data.
In 1994 Norwegian historians and information scientists formed a standardization body with the purpose of establishing standards for the computerization of nominative, historical source material. The intention is that if all projects work along the same guidelines, it will be easier to exchange the data files and merge them for research on several localities and periods. So far we have published standardization rules for the censuses 1865 to 1910, called Histform - Norwegian standard for data entry and exchange of nominative census data 1865 to 1910. (Tromsø 1994, only published in Norwegian.) A book along the same guidelines for the ministerial records is planned.
Censuses, ministerial records, emigration registers and so on are all examples of what we can call highly structured historical sources. Even if they may not be totally uniform and immediately ready for computerization, the information in these sources is in most cases put into a fixed structure with predesigned forms, columns, labels etc. that we can adjust to a computer format. Other types of historical records do not lend themselves so easily to being brought into the rigid structure of, for instance, the relational data model. If we take judicial material or newspaper articles as extreme examples, only the simplest classification of this kind of source can be fitted into the above-mentioned structure without extended and potentially harmful editing. The texts themselves can, however, be successfully treated with software for free text search and retrieval.
Probate registers in this context stand out as an in-between case. It is not usual (at least not in Scandinavia) to find predesigned forms for the persons, property and movables listed. The report from the distribution of an estate may at first glance look like the notes the same clerk filed after a meeting in court. On closer inspection, however, crucial differences between the two types of sources can be seen. Different entities can be distinguished: the deceased persons, their relatives and the movables. For each of these, certain characteristics are listed in a rather uniform way. We find each person's name and relationship with the deceased, and for each article its value is assessed. If we are to utilize this information in research, a free text retrieval system may not be a very effective tool. We want not only to be referred to certain kinds of movables, but also to compute their value relative to other articles, sum their values etc.
With such research questions in mind, it seems proper to consider putting the contents of the probate registers into a fixed structure after all. Because there is so little structural information included in the original source, however, it becomes more important than ever to define a standard structure that should be followed by any project involved in the data entry of probate registers. Since the optimal design of this standard is far from obvious, we would welcome any comments based on the data entry of probate registers in other countries or on experiences with other types of source material that can be said to have a "median level of fixed structure".
In Norway the official probate registers date back to the seventeenth century, at least for parts of the country. They are a mixed lot: in the beginning the list of belongings, the negotiations in court and the distribution among the inheritors were registered in the same protocol. This system, however, conflicted with the chronology of the cases, so from around the year 1800 it was usual to keep three different series of probate registers for the different aspects of the process of inheritance. Even though the lists of movables and their distribution may seem to be very detailed and reliable, a number of source-critical questions should be raised against them, which space does not allow me to go into here. Suffice it to say that the records seldom give a complete picture of the assets of the deceased: certain items tend to be missing, their value may be systematically underestimated, and the selection of cases filed tends to be biased with respect to social standing, age, and maybe geography.
Even so, the probate registers are used extensively, not only by genealogists, but also by researchers in economic, social and cultural history. They can tell us how people dressed and what they read; the development of agricultural tools can be traced over time, and we can study the relationship between wealth and social status (all the time taking the vital source-critical questions into consideration). The cumbersome and shifting formats of the registers will, however, make it time-consuming to use more than a few cases for a specific project. Genealogists, researchers and archivists alike will therefore benefit greatly from computerized access to this information. But considering the bulk of the source material in question, it is not obvious how this can be done in a cost-effective way.
Below I shall discuss three different approaches that may be used to enter the probate registers into the computer. First there is the shortcut approach: only references to where each probate case can be found in the protocols are entered. The second approach is to key in all the information, or most of the information, from each case, using a standard data base system with fields and forms. Since this is labour-intensive and involves resolving many cases of ambiguity and difficult handwriting, the third approach, scanning, will also be considered. Such electronic copies or images of the sources must be accompanied by references analogous to those of the first approach, preferably with summary information about special items inherited.
Indexes to the probate registers
Making references to this source material was not introduced as a result of computerization. Archivists started to produce card catalogues with alphabetical references to farm names 80 years ago. Even before that, the more diligent among the magistrates included alphabetical lists of reference in the protocols. Thus, one of our District Archives holds a complete manual index on cards to its probate registers. Another District Archive has computerized its indexes in a data base, claiming this gives great benefits in terms of speedy and flexible access to the records. While the card catalogue has farm name as the sole key word, the data base also includes indexes to the names of the deceased persons. In recent years, therefore, all indexing work in the other district archives has been carried out by computer. The Norwegian Historical Data Centre carries out work in this area, helping two of the district archives to complete their indexes.
As can be seen below, the number of data fields used when indexing the probate registers is kept to a minimum. The reason for this is the great number of probate cases that can still only be found by tedious searching in the protocols. Also, a future goal is to include information from the card catalogues in a national data base of references to the probate registers. With limited resources this goal can only be achieved by skipping all information that is not highly relevant when tracing a probate case. Subsidiary information often found on the catalogue cards, such as the summary value of the properties and references to special items like silver or a list of the inheritors, will not be included in the computerized version.
Figure 1: Data entry form for the indexing data base
As can be seen from the screen dump in figure 1, the form includes only information about the positions and names of the deceased person (devisor) and his or her partner. Sometimes it may be difficult to distinguish between the two, and it is therefore important that information about both is mentioned. In addition to the name of the farm there is a field for other place names that may be necessary to identify it in a unique way. The date and protocol numbers are used to look up the probate case in the original sources.
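The fields of the indexing form can be sketched as a small record type with a lookup function. This is only an illustration in Python, not the data base system actually used: the field names, the `find_cases` helper and the two example entries are all invented here for the purpose of the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One reference to a probate case (all field names are hypothetical)."""
    deceased_name: str
    deceased_position: Optional[str]
    partner_name: Optional[str]
    partner_position: Optional[str]
    farm_name: str
    other_place_names: Optional[str]  # extra names needed to identify the farm
    case_date: date
    protocol_number: str


def find_cases(entries: List[IndexEntry],
               farm: Optional[str] = None,
               person: Optional[str] = None) -> List[IndexEntry]:
    """Return entries matching a farm name and/or part of a person's name.

    The person's name is compared with both the deceased and the partner,
    because the source does not always make clear which is which.
    Matching ignores upper and lower case.
    """
    results = []
    for entry in entries:
        if farm is not None and entry.farm_name.lower() != farm.lower():
            continue
        if person is not None:
            names = [entry.deceased_name, entry.partner_name or ""]
            if not any(person.lower() in name.lower() for name in names):
                continue
        results.append(entry)
    return results


# Invented example entries, not taken from any real protocol.
EXAMPLE_ENTRIES = [
    IndexEntry("Ola Hansen", "farmer", "Kari Olsdatter", None,
               "Nordgard", None, date(1812, 3, 14), "7"),
    IndexEntry("Anne Pedersdatter", None, None, None,
               "Sørgard", "Vik", date(1830, 10, 2), "12"),
]
```

A call such as `find_cases(EXAMPLE_ENTRIES, person="olsdatter")` returns the first entry even though the name belongs to the partner, which is the reason for recording both persons.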
This simple data base format has proved to be an efficient tool for archivists, genealogists and researchers. The latter user group has, however, expressed a need for more details about the inherited property and is seconded by the genealogists as far as information about the inheritors is concerned. How can these needs be met when we devise a standard for the computerization of the probate registers?
"Complete" registration of each probate case
The word "complete" should not be taken literally, for two reasons. First, the computerization of all the contents of the probate registers is a labour-intensive task that far exceeds the resources at hand. This work can therefore only be undertaken when the computerized version is needed by a well-founded research project. (A researcher should apply for computerization of his source material along the same lines as he applies for funds and other resources.) "Complete" data entry of probate registers will then only be considered for very limited geographical areas and chronological periods.
Second, we must take into consideration that the registers contain much repeated information about the persons and their inheritance. People and belongings will be listed when registered and during the distribution of property, maybe also in lengthy disputes in court. As a rule a researcher will be content with data base tables from the first registration of the deceased person's property and his or her beneficiaries. Whenever this list has disappeared from the records, we must be content with the list taken down during the distribution of the belongings. This list may, however, be incomplete and should ideally be supplemented with information from any auction or sale that took place in the meantime. A project that is interested in the distribution of property (who got what?) should of course use the latter kind of records.
A "complete" data base from the probate registers will then contain three separate tables. One table is for information about the deceased person, which will be similar to the one described above. The second table holds information about the beneficiaries: their names, status, relationship to the deceased etc. The third table will contain details about the number, kind and value of the belongings listed in the probate case. Here we also enter all capital assets and debt items listed. One-to-many relationships will be defined between the tables in the form of references from tables two and three to table one. References between tables two and three (who got what?) will, however, as a rule not be implemented.
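The three tables and their one-to-many relationships can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the standard itself: all table and column names, and the `entry_type` categories, are invented for this sketch.

```python
import sqlite3

# Hypothetical schema: one row per probate case in "deceased"; tables two and
# three refer back to it through case_id. There is deliberately no link
# between beneficiary and belonging (who got what?).
SCHEMA = """
CREATE TABLE deceased (
    case_id         INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    farm_name       TEXT,
    case_date       TEXT,
    protocol_number TEXT
);
CREATE TABLE beneficiary (
    beneficiary_id  INTEGER PRIMARY KEY,
    case_id         INTEGER NOT NULL REFERENCES deceased(case_id),
    name            TEXT NOT NULL,
    status          TEXT,
    relationship    TEXT
);
CREATE TABLE belonging (
    belonging_id     INTEGER PRIMARY KEY,
    case_id          INTEGER NOT NULL REFERENCES deceased(case_id),
    kind             TEXT NOT NULL,
    quantity         INTEGER NOT NULL DEFAULT 1,
    value_as_written TEXT,
    entry_type       TEXT NOT NULL DEFAULT 'movable'
                     CHECK (entry_type IN ('movable', 'asset', 'debt'))
);
"""


def create_database(path=":memory:"):
    """Create an empty data base with the three tables and return the connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces the references when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def count_belongings(conn, case_id):
    """Return the number of belonging rows registered for one probate case."""
    row = conn.execute(
        "SELECT COUNT(*) FROM belonging WHERE case_id = ?", (case_id,)
    ).fetchone()
    return row[0]
```

With the references enforced, a belonging or beneficiary cannot be entered for a case that does not exist in the first table, which guards against one common kind of data entry error.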
Figure 2: Data entry form for "complete" registration: the belongings.
Apart from the general problem of difficult handwriting, the most problematic field is the entry of the value assessment. Here, varying denominations were used over time, and it will be very time-consuming and difficult for the data-entry personnel to recalculate these values into any kind of standard estimate. The researcher using the computerized version cannot expect to utilize these values without a careful scrutiny of the source material, and will probably have to encode this information before using it in statistical analysis.
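One way to keep data entry simple is to record each value as written and leave the recalculation to the researcher. The Python sketch below shows such a later conversion into one common unit. The function name is invented, and the rates in the table are assumptions for illustration only; they must be replaced by rates verified for the period and district in question.

```python
# ASSUMED rates, for illustration only: number of skilling per unit.
# These must be checked against the monetary system valid for the source.
ASSUMED_SKILLING_PER_UNIT = {
    "riksdaler": 96,
    "ort": 24,
    "skilling": 1,
}


def to_skilling(amounts, rates=ASSUMED_SKILLING_PER_UNIT):
    """Convert a value given in several units into a single number of skilling.

    `amounts` maps a unit name to the number written in the source,
    for example {"riksdaler": 2, "ort": 1, "skilling": 8}.
    An unknown unit raises KeyError rather than being silently ignored.
    """
    total = 0
    for unit, count in amounts.items():
        if unit not in rates:
            raise KeyError(f"no rate defined for unit {unit!r}")
        total += count * rates[unit]
    return total
```

Because the rates are passed in as a table, a project can keep one table per period and apply the appropriate one according to the date of each probate case.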
Scanning the probate registers
With the methods described above, a basis is laid for computerized analysis of the probate registers. But as mentioned earlier, it will be necessary to leave out all information other than the most central. If the court proceedings and other details, for instance concerning the distribution of the assets, are to be computerized, the only realistic method may be to enter the source material into the computer in the form of images rather than in the form of ASCII files. This scanning technique can be considered an alternative to microfilming, but will give more direct access to the images and possibilities for image enhancement. The basic equipment needed is no longer very expensive, but given the slow speed of cheap scanners, one should consider the alternatives of either buying more efficient equipment or letting a commercial firm do the scanning job with advanced machinery that produces images on 16 mm film in addition to an electronic version on CD-ROMs. The films might come in handy, since many potential users do not have computers with discs and screens powerful enough to store and show high quality electronic images.
In order to retrieve the electronic copies, the material must be indexed. This can be done along the same lines as the shortcut approach mentioned earlier. We need only add a field for page number, since each scanned page will be stored separately. When we access the sources through keys such as the name of the farm or of the deceased person, we must be able to inspect not only the first page of the probate case, but also to turn to the other pages. This is no problem in strict technical terms and can be done with standard data base programs. It may, however, not be very convenient for some purposes, like comparing information on different pages. Two images cannot, as a rule, be shown on the screen simultaneously. If much of the material then has to be output on a laser printer, one might as well have photocopied the whole material in the first place.
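The page-number field and the turning of pages can be sketched as follows in Python. The class, its method names and the form of the case key are hypothetical; a standard data base program would do the same with an extra column in the index table.

```python
from collections import defaultdict


class ScannedCaseIndex:
    """Maps a probate case to its separately stored page images (hypothetical sketch)."""

    def __init__(self):
        # case key -> {page number: image file name}
        self._pages = defaultdict(dict)

    def add_page(self, case_key, page_number, image_file):
        """Register one scanned page for a case."""
        self._pages[case_key][page_number] = image_file

    def pages(self, case_key):
        """Return the image files of a case in page order (empty if unknown)."""
        return [name for _, name in sorted(self._pages.get(case_key, {}).items())]

    def turn_page(self, case_key, current_page, step=1):
        """Return the page number `step` pages away, or None at either end."""
        numbers = sorted(self._pages.get(case_key, {}))
        if current_page not in numbers:
            raise KeyError(f"page {current_page!r} is not indexed for this case")
        position = numbers.index(current_page) + step
        if 0 <= position < len(numbers):
            return numbers[position]
        return None
```

The case key could, for instance, be a pair of protocol number and date taken from the index described earlier, so that a search on farm name or person leads first to the case and then to its pages.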
Perhaps the chief drawback of the method is that the information in image form cannot be searched with computer methods, whether the images are stored on film or electronically. It is equally impossible to perform statistical analysis. Of course smaller or larger excerpts from the images can be turned into ASCII format by keying them into additional data base tables, as in the second approach described above. (OCR is of course not applicable to handwritten material.) Like Winnie the Pooh we then say "give me both, please!", an answer very few project leaders can afford to give.
In conclusion the scanning technique should be considered by researchers who need to enhance the readability of their source material or who want to cooperate by sending it via data networks. If the aim is simply to produce copies, other techniques are more effective and much cheaper. Those who want to perform detailed analysis of the probate cases will benefit from computerized ASCII versions that can be sorted, encoded and analysed statistically on the basis of questions arising from the material culture depicted in the registers. For the majority of users, the greatest benefit will come from simple indexes that allow us to find particular probate cases concerning the localities or persons that we are studying.
         ____________________
        |                    |
        |     Depositor      |
        |____________________|
             |          |
             |          |
             v          v
 _________________   _________________
|                 | |                 |
|   Belongings    | |   Inheritors    |
|_________________| |_________________|

Figure 3: Entities in the "complete" computer version of the probate registers.
Registreringssentral for historiske data
Universitetet i Tromsø, N-9037 Tromsø
Updated: 10 November 2004