Digitizing the American Periodicals Series (APS) Microform Collections
Database Development
American Periodicals Series (APS) Online is part of ProQuest
Information and Learning's "Digital Vault" initiative: the
digitization of the world's largest commercially available microfilm
collection. Stored in three climate-controlled underground vaults,
the ProQuest microfilm collection includes over 5 billion page images
from over 20,000 periodicals, 7,000 newspapers, 400 research collections,
and 1,000,000 dissertations.
Why Digitize APS?
Surveys of academics and librarians showed great enthusiasm for digitizing
the APS microform collections as one of the first Digital Vault products.
Respondents expected to see full-page images (a replication of the microfilm)
with searchable ASCII text and publication searches by date and issue.
The Digitization Process
Preparations for digitizing the 7,000,000-page APS collections included
developing strict quality control and editorial guidelines. After successful
test runs, the process of digitizing APS in sequence (see the description
of the microform collection) was underway. That process for each periodical
involved:
- Pulling the master negative from the vault.
- Preparing and cleaning the film, and building 100-foot rolls with
two page images per frame of film.
- Scanning the film at 300+ dpi resolution.
- Cleaning and enhancing images with filters.
- Performing quality checks on each of the procedures.
APS Online was released during the summer of 2000 as part of
a 48-month production schedule. APS I and a portion of APS II were completed
and made available in 2001. A customized APS interface was developed
and made available in 2001, and in 2002 new digitization specifications
were introduced. Both the interface and new digitization specifications
were informed by feedback from users, customers, and the APS Advisory
Board.
Digitization of APS I (1740-1800)
Because of the variety of archaic journals, typefaces, and images, the OCR
rate for APS I (1740-1800) material fluctuated greatly. Magazine-like content
was OCR'd primarily in the 70%-90% range. Newspaper-like content, with its
larger image size and multi-column format, posed a greater challenge to OCR,
with rates generally not reaching those of magazine-like content. (This
content is being evaluated for upgrading through the methods used for APS II
and APS III, which are described below.)
Feedback from customers and users was very positive about ProQuest
having made a vast repository of primary-source historical content accessible
online. Responding to suggestions for more focused
search capabilities, additional information on the more than 1,100 periodicals,
and better image quality, ProQuest initiated new manufacturing procedures
for APS II and III and introduced a customized interface in mid-2001.
Digitization of APS II and APS III
After pages are scanned and captured electronically, each page of each journal
is zoned: the various content elements (including cover, table of contents,
article title and subtitle, article, images, and captions) are electronically
outlined and separated from the page into its own file. Each outlined area
is deskewed, despeckled, and tagged according to
editorial guidelines. The outlined areas are then threaded, meaning they are returned
to their original page location.
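The zoning and threading steps can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not ProQuest's actual file format or software: the `Zone` fields, the tag names, and the `thread` function are all hypothetical, and ordering the zones top-to-bottom within a page is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """One outlined content element cut from a scanned page (hypothetical layout)."""
    page: int       # page the zone was cut from
    tag: str        # editorial tag, e.g. "article_title" or "caption" (assumed names)
    x: int          # left edge of the outlined area on the page, in pixels
    y: int          # top edge of the outlined area on the page, in pixels
    width: int
    height: int
    text: str = ""  # OCR output for this zone alone


def thread(zones):
    """Return zones grouped by page, each group ordered by position on the page.

    Because every zone keeps its original coordinates, it can be placed back
    where it came from; the sort order (top-to-bottom, then left-to-right)
    is an assumption of this sketch.
    """
    pages = {}
    for zone in zones:
        pages.setdefault(zone.page, []).append(zone)
    for page_zones in pages.values():
        page_zones.sort(key=lambda z: (z.y, z.x))
    return pages


# Made-up zones, deliberately listed out of page order.
zones = [
    Zone(page=1, tag="article", x=100, y=400, width=900, height=1500, text="Body text"),
    Zone(page=1, tag="article_title", x=100, y=200, width=900, height=120, text="A Title"),
    Zone(page=2, tag="caption", x=150, y=1800, width=600, height=80, text="A caption"),
]
threaded = thread(zones)
```

Keeping the page number and coordinates on each zone is what makes both displays possible in this sketch: a single zone can be shown on its own, or all zones for a page can be reassembled in place.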
Through this procedure, the OCR rate improved greatly, which is especially
critical for "busy pages" (pages with multiple articles or
images). Article titles, image captions, and article abstracts (generally
the first paragraph of each article) are captured at a minimum 99.95%
OCR rate, and rates in the mid-90% range are routine for other text, except in cases
where the original source material had flaws. These OCR rates improve
viewing and search capabilities. The zoning process also allows APS users
to access either full-page or individual article displays, and to focus
searches on a variety of article types.
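To give a feel for what these percentages mean, the sketch below computes a character-level accuracy figure. The page does not define how its "OCR rate" is measured, so this is an assumption: here the rate is taken as 1 minus the edit distance between a reference transcription and the OCR output, divided by the reference length. The function names `edit_distance` and `ocr_rate` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ocr_rate(reference, ocr_output):
    """Character-level accuracy in [0, 1] (an assumed definition of 'OCR rate')."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# One wrong character out of ten: "l" misread as "1".
rate = ocr_rate("periodical", "periodica1")
```

Under this definition, a 99.95% rate corresponds to at most one character error in every 2,000 characters, while a rate in the mid-90% range allows roughly one error in every 20.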
Content produced under the new manufacturing process first began to appear
in APS Online during summer 2002. Twenty top journals from the second half of the nineteenth
century were selected as the first to be manufactured under the new process.
Manufacturing of the original APS sequence (continuing APS II) will resume
after the twenty journals are completed, and the previously manufactured
content already available in APS Online will be evaluated for upgrading.
Progress reports on content loading, profiles of new features, and general APS product
information are available in APS News.