You are viewing the Book Window, which is made up of three smaller windows called frames. The Banner Frame, which is the frame at the top of the window, contains the buttons which will take you out of the Book Window. The Main Frame, which is the one you are reading now, is where you will view Allen's book. The frame on the left is the Contents Frame, and it is used to jump directly to a specific section of the book.
The frames can be resized by dragging their borders. You should minimize the Banner Frame by dragging its lower border upwards and leave it there until you are ready to leave the Book Window. You should also minimize the Contents Frame by dragging its right border as far left as it will go. This will maximize the Main Frame while you are looking at the book.
Due to downloading time considerations, Allen's book and the ancillary information have been broken up into 45 sections. Each section should download in a reasonable amount of time. Use the buttons at the top of each page in the Main Frame to move to the Next or Previous section. To cut down on the clutter on each page, I did not provide any buttons to take you to the top of the page. Instead, you can get to the top of each page in one step by pressing Control + Home.
If you need to jump around in the text, you can open the Contents Frame at the left and click on the page you want to go to. Clicking a page in the Contents Frame will cause the chosen page to replace the one in the Main Frame.
I started scanning in the text and processing it in 1994. I used an HP ScanJet IIcx (300 dpi) to scan each page of the book as a PCX file. The PCX files were then processed with an early version of OmniPage (by Caere). The results were not too good - there was a lot of editing to be done. It would almost have been quicker to type the text in by hand. I tried WordScan next, which significantly lowered the error rate. Caere subsequently bought out WordScan, and the combined technology has been quite good since. Although I long ago finished scanning and OCR'ing the text in, I have used the OCR software from time to time to scan in other information of interest. I have had excellent results recently in converting newsprint, which is about as bad as you can get with OCR.
Each of the 333 pages in the original book had to be proofread and edited. I developed a number of simple macros to assist me with this. The macros performed such routine tasks as converting abbreviations such as "b." to "born," "Rev." to "Reverend," etc.
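For readers curious about what such a macro might look like today, here is a minimal sketch in Python. It is not the macro set actually used for the book; the function name `expand_abbreviations` and the table layout are assumptions made for illustration, and only the two abbreviations quoted above come from the text.

```python
import re

# Illustrative table: only "b." and "Rev." are taken from the text above.
ABBREVIATIONS = {
    "b.": "born",
    "Rev.": "Reverend",
}

def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each abbreviation with its long form.

    An abbreviation is matched only when it is not preceded by a letter or
    digit and is followed by whitespace or the end of the text, so the "b."
    ending a word such as "Bob." is left alone. Matching is case-sensitive,
    so an initial such as "B." is also left alone.
    """
    # Longest keys first so a short abbreviation never shadows a longer one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?=\s|$)"
    )
    return pattern.sub(lambda match: table[match.group(1)], text)

print(expand_abbreviations("Rev. John Smith, b. 1802"))
```

A table-driven approach like this keeps the list of abbreviations separate from the matching logic, so new entries can be added without touching the code.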
I corrected a lot of routine typos and numbering errors in the original. However, I am sure I made my share of blunders too. Please let me know about any errors you find, and I will try to correct them. You will find a button on the Main Genealogy page labeled "Feedback."
If you are looking for a specific person, word, or phrase, you can go to the Search page via the Banner Frame. This will provide you with a list of hyperlinks to every document containing the word(s) you were looking for. However, it only tells you that what you are looking for is in that document. To find a target within any given document, you can use Edit/Find on the main menu or else press Control + F. Either one will bring up a dialog box which will take you to the item you were looking for.
There are 116 graphics in the book. Except for the family crest and a few line drawings, the pictures in the book are halftones. These present a particular problem for processing and editing. All of the literature I read in preparation for this task just ignored halftones. It assumed we were scanning line originals or original photographs. In this case, that was not possible - Allen's text with its halftones was all I had available.
There was one exception to this - I have the original of my great grandfather's picture (entry number 642). It has turned to a sepia color since it was taken in 1887, or perhaps it was sepia toned to begin with. In any event, it provided me with the cleanest, clearest picture in the book. I doubt that many of the pictures in the book still survive, but if they do, I would appreciate getting access to them. I would like to replace all graphics converted from halftones with new graphics scanned in from the originals. You can either have them scanned and send me the file as an attachment to an e-mail, or else mail them to me suitably packaged. I will scan them in and return them to you immediately - they will be well taken care of, I promise.
Tech stuff. Skip the next four paragraphs if you don't want a minimal explanation of why halftones are a problem. As near as I could estimate (with a ruler and magnifying glass), there are about 120 dots per inch (dpi) in the book's halftones. Thus a halftone picture contains about 120 × 120 = 14,400 dots in a square inch. I read somewhere that a continuous-tone photograph is the equivalent of approximately 20,000 dpi, or 20,000 × 20,000 = 400,000,000 dots in a square inch. So each dot in a halftone represents the average of 400,000,000 ÷ 14,400 ≈ 27,778 "dots" in a continuous-tone original photograph. In other words, you have lost about 27,777/27,778 of your structural information in going from the resolution of a photograph to that of a halftone. This information cannot be recovered.
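The arithmetic in this paragraph can be checked with a few lines of Python. The two input figures (120 dpi for the halftone, 20,000 dpi for a photograph) are the estimates quoted above, not measured constants, and the variable names are chosen here only for illustration.

```python
# Estimates taken from the paragraph above.
halftone_dpi = 120
photo_dpi = 20_000

# Dots in one square inch at each resolution.
halftone_dots = halftone_dpi ** 2
photo_dots = photo_dpi ** 2

# How many photographic "dots" each halftone dot has to stand for.
dots_per_halftone_dot = photo_dots / halftone_dots

# Fraction of the structural information that does not survive.
fraction_lost = 1 - halftone_dots / photo_dots

print(halftone_dots, photo_dots)
print(round(dots_per_halftone_dot))
print(f"{fraction_lost:.6f}")
```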
Even worse, since printers can only print black, and not gray, the gray color is approximated by printing a black circle in a white square. Usually a gray scale where white is 255 and black is 0 is used on a scanner, since the human eye cannot differentiate grays scaled much finer than that. A white area is represented by a pure white square. A light gray area is represented by a black dot which partially fills the white square. When the diameter of the circle is about 80% of the side of the square, the circle covers half of the square's area and the gray scale is at its half-way point. Above this, the square becomes black, and darker grays are represented by ever decreasing white circles in a black square. Totally black areas are represented by solid black squares.
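The relationship between a gray level and the size of the printed dot can be sketched as follows. This is a simplified geometric model, not a description of any real printing process: it assumes the gray level is exactly the fraction of the square left white, that the dot is a perfect circle, and that the switch from black dots to white dots happens at the half-way point. The function name `halftone_dot` is made up for this example.

```python
import math

def halftone_dot(gray, white=255):
    """Return (dot_color, diameter) for one halftone cell.

    gray runs from 0 (black) to `white` (pure white). The diameter is
    expressed as a fraction of the side of the square cell.
    """
    black_fraction = 1 - gray / white
    if black_fraction <= 0.5:
        # Light tones: a black circle on a white square.
        return "black", math.sqrt(4 * black_fraction / math.pi)
    # Dark tones: a white circle on a black square.
    return "white", math.sqrt(4 * (1 - black_fraction) / math.pi)

for level in (255, 191.25, 127.5, 63.75, 0):
    color, diameter = halftone_dot(level)
    print(level, color, round(diameter, 3))
```

The diameter follows from setting the area of the circle, π × (d/2)², equal to the fraction of the cell that must be covered.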
So now we look at this field of evenly spaced dots with a scanner. If the resolution of the scanner matches the dot spacing exactly, we will at least recover a gray scale which matches the dot scale. However, if the scanner resolution is different, even on a field of constant gray tone, we will see a larger scale pattern of lights and darks, which will appear as a regularly spaced pattern of blotches overlaying the picture. This is called a Moiré pattern. To make matters even worse, if the scanner is not exactly lined up with the dots in the picture, the Moiré pattern will be tilted as well.
At the 300 dpi resolution of my scanner, I am guaranteed a violent Moiré pattern. Fortunately, the accompanying software allows some adjustment of the apparent resolution (probably by a built-in averaging function). I found by experimentation that setting the resolution to about 72 dpi in software minimized the Moiré patterns, but did not eliminate them completely. I don't know why this worked - it just did. This was of course accompanied by some loss of sharpness. A higher resolution would have only made the dot pattern more intrusive.
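The effect of scanner resolution on a field of constant gray can be illustrated with a one-dimensional toy model. It assumes a 120 dpi halftone at 50% gray in which every dot cell starts with a black run, and a scanner whose pixels each average the ink over their own width. Those are simplifying assumptions, and the model makes no claim about what the scanner software actually did; the function names are invented for this sketch.

```python
from fractions import Fraction

def black_length(x, period, coverage):
    """Total black length in [0, x) when every period begins with a
    black run of length coverage * period."""
    full_periods = x // period
    remainder = x - full_periods * period
    run = coverage * period
    return full_periods * run + min(remainder, run)

def scan_row(halftone_dpi, scan_dpi, coverage=Fraction(1, 2), pixels=15):
    """Fraction of each scanner pixel that is black."""
    period = Fraction(1, halftone_dpi)
    width = Fraction(1, scan_dpi)
    row = []
    for i in range(pixels):
        left, right = i * width, (i + 1) * width
        black = (black_length(right, period, coverage)
                 - black_length(left, period, coverage))
        row.append(black / width)
    return row

def spread(row):
    """Difference between the darkest and lightest pixel in the row."""
    return max(row) - min(row)

for dpi in (120, 300, 72):
    row = scan_row(120, dpi)
    print(dpi, [float(v) for v in row[:6]], float(spread(row)))
```

In this model a scan at exactly 120 dpi returns the same value for every pixel, a scan at 300 dpi swings between pure white and pure black, and a scan at 72 dpi still shows a repeating variation, but a much smaller one. Exact fractions are used instead of floating point so that the repeating pattern is not blurred by rounding.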
This gave me what I judged to be the best possible scan. From here on out, I used Adobe Photoshop 5.0 to manually edit the pictures to obtain the best possible images of the subjects. The as-scanned pictures were in PCX format, which was converted to PSD format when imported into Photoshop.
I am not a graphic artist, and I make no special claims for anything I have done to the pictures. I did what I felt was a minimum job in making them look better. Each picture was treated as seemed best for it. I think they all look better than the as-scanned versions. However, there were some common procedures which were used for most of them, and I describe them in the following sections. Adobe Photoshop was used for all of the graphics except for the family crest, which was first re-created with Adobe Illustrator and then read into Photoshop.
1. If necessary, the picture was rotated ±90° or 180° to bring it into a normal viewing position. In some cases, this was because of the way in which it was scanned; others were in landscape orientation to begin with. In the book, many pictures are in landscape orientation and are viewed by turning the book 90° clockwise. Since this is not practical when they are viewed on a monitor, these pictures were rotated for the HTML version.
2. The pictures were cropped to remove excess white space and then converted to RGB mode, which was necessary for some of the processing steps to follow.
3. The images were masked and edited on different layers. The masks were literally created pixel by pixel, a time-consuming process. Although each picture was edited on its own merits, the most common separation was into background, clothing, and face. This allowed each major area to be adjusted for levels, contrast, brightness, and grayscale distribution separately.
4. In many cases, the backgrounds were blotchy, and they were wholly replaced by created backgrounds which matched the gradations of the original as closely as possible. In a few cases, shadows were created and placed in back of the subjects. In other cases, the blotchy backgrounds were improved by use of the Gaussian blurring filter.
5. Faces were the most difficult. Almost all of the tools were used at one time or another. The most frequently used was the blur tool, modifying the brush and degree of blurring as needed.
6. Clothing, since it was uniformly dark, was smoothed on the wide areas, sharpened at the edges, and adjusted for brightness and contrast. Spreading out the grayscale, which was usually bunched up at the dark end, sometimes brought out clothing detail which was not even discernable in the original halftones. Collars and white shirts were often lightened.
7. Two-pixel-wide black borders were added to each picture, including the ovals, to match the book. Anti-aliasing was necessary to make the ovals appear smooth. To completely eliminate aliasing in the rectangular graphics, the background color was set to black and then the canvas size was increased by 2 pixels on each side.
8. The final pictures were flattened, converted to TIFF format, and also optimized for the Web. JPEG medium resolution appeared to give the best balance between clarity and speed of download (i.e., small file size). They were then compressed into ZIP files of a size which would allow them to be stored on diskettes.
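The idea behind "spreading out the grayscale" in step 6 can be sketched as a simple linear stretch. This is only an illustration of the principle, not what Photoshop does internally, and the function name `stretch_levels` and the sample values are made up for the example. The pixel values are assumed to be integers from 0 (black) to 255 (white).

```python
def stretch_levels(pixels, out_low=0, out_high=255):
    """Linearly remap gray values so the darkest input becomes out_low
    and the lightest input becomes out_high."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        # A perfectly flat area has nothing to spread out.
        return list(pixels)
    span = out_high - out_low
    return [round(out_low + (p - lo) * span / (hi - lo)) for p in pixels]

# Hypothetical dark clothing: every value is bunched up near black.
clothing = [10, 20, 30, 60]
print(stretch_levels(clothing))
```

After the stretch, values that differed by only a few gray levels are much further apart, which is why detail that is hard to see in a dark area can become visible.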
Technology has moved ahead in the five years since this work was completed. Improvements in scanners and OCR software would have speeded the text preparation significantly, but it does not appear to be worthwhile to re-scan and re-OCR the text at this time. I believe that there are very few errors in the text, and if there are, in those five years no one has pointed any of them out to me.
The graphics are another story. It is probably possible to improve them because of advances in technology. In particular, TWAIN technology, which processes the scanner signal before it is written to the hard drive, incorporates a new algorithm called de-screening that is supposed to be able to reduce the Moiré pattern. And Photoshop has moved ahead by four upgrades to version CS2, and it has a lot of new features that would significantly speed up and improve the editing. At some point - it won't be in the immediate future - I intend to experiment with re-scanning and work out a better method for editing the graphics. The ultimate improvement would be if I could get the originals, but to date no one has offered to lend me any. I wonder if any of the originals still exist.
I have corresponded with Allen's grandson, who tells me that the original copper plates used to print the graphics in his book were found in the attic of one of his relatives when she died. No one saw any value in them, and they were thrown out! Today even the copper in them alone would have been worth a small fortune.
Although many of us now have high speed DSL connections to the Internet, there are still a significant number of dial-up modems out there. For this reason I have not changed the size of the graphics collections or the pages in the book.
This page was last updated on February 26, 2007.
Copyright © 1998 - 2002, 2007 by James P. Rosenkrans, IV. All rights reserved.