As the project winds down to a close (or up to the release!), we will be sharing and documenting decisions made over the course of the project. The goal here is to share decisions that will be helpful to understanding our workflow, and we hope they will be useful to those undertaking similar digitization and description projects.
Project overview in regards to digitization
Due to the scale of the project and the types of materials to be digitized, we decided from the outset to use an outside vendor for all digitization (15 serial titles, 382 reels microfilm, 40 boxes photographs). The grant narrative, digitization RFP, and materials selection were completed prior to my (Mandy) arrival at Smith as Metadata Archivist. I’ve therefore tried to include these decisions in my documentation to the best of my ability and to the extent to which they were documented and communicated to me. Below I will outline digitization decisions based on format type, as each type presented its own challenges and exceptions.
Scope of Digitization
- 15 serials | approximately 48,092 files | 600 ppi TIFFs
- 40 boxes photographs | approximately 22,960 files (back and front) | 600 ppi JPEGs
- 382 reels of microfilm | approximately 1,059,167 files | 600 ppi TIFFs
The vendor used an automated process to capture microfilm images. If a frame was too light or didn’t have clear edges, the process may not have captured that image. In addition, the vendor assessed the overall condition of each reel and chose a single scanning setting appropriate for the reel as a whole, not for each individual frame.
The original reels vary greatly from frame to frame in contrast and clarity, and, as a result, some frames may not have been captured to their optimum clarity or settings. For these reasons, it became clear very early on that the digitization process for the microfilm would not capture 100% of the frames on film. As checking each reel against the digitized version would be time consuming and labor intensive, we decided to proceed with what we had and assume that the frames not captured were minor in amount. This decision was based on performing manual checks on a sample of digitized reels.
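The sample-based check described above can be sketched in a few lines of Python. This is only an illustration, not the procedure we actually followed: the reel identifiers, the sample size of 20, and the helper names `pick_reels_for_qc` and `capture_rate` are all hypothetical.

```python
import random


def pick_reels_for_qc(reel_ids, sample_size, seed=0):
    """Return a reproducible random sample of reel IDs for manual checking."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return sorted(rng.sample(list(reel_ids), sample_size))


def capture_rate(frames_on_film, frames_digitized):
    """Share of frames on the original reel that appear in the digitized set."""
    if frames_on_film <= 0:
        raise ValueError("frames_on_film must be positive")
    return frames_digitized / frames_on_film


# Hypothetical identifiers for the 382 reels; the real naming scheme may differ.
reel_ids = [f"reel_{n:03d}" for n in range(1, 383)]

# Assumed sample size of 20 reels, chosen only for illustration.
sample = pick_reels_for_qc(reel_ids, sample_size=20)
```

For each sampled reel, a reviewer would count frames on the film and in the delivered files by hand and pass both counts to `capture_rate`. A consistently high rate across the sample would support the decision to proceed without checking every reel.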
Throughout the descriptive metadata process, if it seemed like a significant portion of a reel was missing, the digitized version was checked against the original and evaluated for rescanning. Some rescanning (manual and automated) was done by the vendor, if the gap was identified during the active scanning project. Some rescans were done manually in house by the project’s student assistant. In-house rescans were limited in scope and occurred after vendor digitization had wrapped up.
Another challenge that came up early on in the digitization process was due to the quality of the microfilm itself. There are three sets of YWCA microfilm held by Special Collections and Smith Libraries: set 1 is an access copy; set 2 is a preservation copy; and set 3 is an ILL copy. Set 2 was originally sent to the vendor for digitization. After the initial files were returned, it became clear that the set was in poor condition, leading to poor digitization results. As I was doing QC checks against set 1, it seemed as though set 1 was in better condition for scanning. As a result, the sets were switched out after initial digitization. Files already delivered from set 2 were retained, and no reels were re-digitized after the switch.
Serials were scanned in black and white or grayscale, according to vendor estimate. Some materials, though a minority, do contain color but were still scanned grayscale or black and white. When available, duplicate copies of the serials were used for digitization. Duplicate bound volumes were disbound for ease of digitization.
Photographs were digitized according to guidelines set by an archivist previously on the project team. The vendor was instructed to digitize the front and back (if there was any content on the back) of photographs and postcards only, and not to digitize any other items in the folders. Items not within the scope of digitization included photocopies, clippings, envelopes, negatives, and pamphlets. These instructions were communicated to the vendor in a document providing examples and explanations. These guidelines proved difficult for the vendor to enforce and discern, especially regarding what constitutes a photograph and what is considered supporting materials. As a result, the digitized photograph materials include photocopies, clippings, and other supporting materials, though not 100% of those materials. It seems to have depended on the scanning technician’s interpretation of the documentation. These digitized items have been retained.
Furthermore, it became clear through the description process that some of those materials that were not slated for digitization are, in fact, integral to the context of the photograph. Examples of these are an index to numbered photograph series, a photograph caption that became detached, or a letter describing a photograph. In these cases, we decided to digitize those items in house, if they were not already digitized by the vendor. No negatives were digitized as part of this project.
In addition to specifying what physical items should be scanned, documentation was shared with the vendor specifying digitization requirements. The grant requested 600 ppi TIFF files for each photograph, front and back, with the back scanned only when there are markings on it. Grayscale was requested for all photographs, unless the original photograph is a color photograph. Grayscale was used for the backs of all photographs when digitized. After initial digitized photographs were returned, it was discovered that the vendor did not have the capability to produce TIFFs as desired. Because the language in the vendor estimate was general, and after several meetings between the vendor and project team members, we decided to accept JPEG. As mentioned, any additional digitization of contextual materials or missed items was performed in house. When we rescanned in house, we followed our own guidelines for digitization.
Photographs also presented a challenge in relation to the proposed scope of digitization outlined in the grant proposal. The original grant proposed digitizing 48 linear feet of boxes with an estimated output of 20,000 TIFF files. The amount was based on a sampling of several boxes of folders. As digitization progressed, we reached the 20,000 files after 40 boxes, or about 14 linear feet. We decided to stop there for several reasons. First, the cost estimate was based on the 20,000 TIFF files, so we were concerned about exceeding our budget. Second, the vendor could not produce TIFF images to our satisfaction and standards, so we decided to cease digitization with them once the threshold was met.
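To make the gap between the estimate and the actual output concrete, here is a rough back-of-the-envelope calculation in Python using only the figures quoted above. The straight-line projection is an assumption for illustration: it supposes the remaining boxes would have been as dense as the first 40, which we did not verify.

```python
# Figures taken from the grant proposal and the digitization outcome above.
GRANT_LINEAR_FEET = 48          # linear feet the grant proposed to digitize
GRANT_ESTIMATED_FILES = 20_000  # files the grant expected from those 48 feet
ACTUAL_LINEAR_FEET = 14         # approximate footage covered when 20,000 was reached

# Files per linear foot implied by the grant estimate.
estimated_density = GRANT_ESTIMATED_FILES / GRANT_LINEAR_FEET

# Files per linear foot actually observed during digitization.
actual_density = GRANT_ESTIMATED_FILES / ACTUAL_LINEAR_FEET

# Assumption: the remaining boxes are as dense as the ones already scanned.
projected_files = actual_density * GRANT_LINEAR_FEET

print(f"Estimated files per linear foot: {estimated_density:.0f}")
print(f"Observed files per linear foot:  {actual_density:.0f}")
print(f"Projected files for 48 feet:     {projected_files:.0f}")
```

Under that assumption, the folders held roughly three and a half times as many photographs per linear foot as the sampling suggested, so finishing all 48 linear feet would have produced far more files than the budget covered.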
Overall, the digitization process accomplished what the original writers of the grant intended, in addition to establishing workflows and guidelines for future projects. Drawing on our experiences, successes, and pain points, we expect future digitization projects to benefit from more specific digitization guidelines, suggested workflows for vendor-supported digitization, a more rigorous RFP process incorporating up-to-date standards, and a clearer path to communication with outside vendors.