Penn State University Libraries

Digital Toolkit

Our Digital Toolkit currently includes CONTENTdm, Olive Software and PrimeOCR.

CONTENTdm

CONTENTdm corporate site

About CONTENTdm
CONTENTdm is a server-based product used for the creation and presentation of digital collections. Images and associated Dublin Core metadata are loaded into CONTENTdm either as data loads from spreadsheet data and image files or through the CONTENTdm client, Acquisition Station. The Libraries added the OCR module and the JPEG2000 extension to enhance text searching and the viewing of large images.
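As a rough illustration of how a spreadsheet row can be paired with Dublin Core metadata, the sketch below reads one tab-delimited row and emits Dublin Core elements using only the Python standard library. It does not reproduce CONTENTdm's own import format: the column names, the `filename` column, the `record` wrapper element, and the sample values are all made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def row_to_dublin_core(row):
    """Turn one spreadsheet row into a <record> of Dublin Core elements.

    Assumes every column except the hypothetical 'filename' column is
    named after a Dublin Core element (title, creator, date, ...).
    """
    record = ET.Element("record")
    for field, value in row.items():
        if field == "filename" or not value:
            continue  # the image file name is not descriptive metadata
        ET.SubElement(record, f"{{{DC_NS}}}{field}").text = value
    return record


# Invented sample data standing in for an exported spreadsheet.
sample = (
    "title\tcreator\tdate\tfilename\n"
    "Example lantern slide\tUnknown\t1900\tslide001.tif\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
record = row_to_dublin_core(rows[0])
print(ET.tostring(record, encoding="unicode"))
```

Keeping the image file name in the same row as the metadata is one simple way to tie each image to its description during a bulk load.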

When items are in a CONTENTdm collection, search options (Boolean operators, predefined search terms, keyword searching, etc.) are configured across one or many collections. The patron display presents a thumbnail of each digital object with its title. Clicking the thumbnail opens the digital object and its descriptive metadata. Patrons can save CONTENTdm favorites as HTML files for reuse in presentations.

The Libraries have customized the Web presence of CONTENTdm to group related collections and to manage image access rights. Digital Libraries Technologies (DLT) has enabled OAI harvesting of public collections served through CONTENTdm.
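For readers unfamiliar with OAI harvesting, the sketch below shows how a harvester might form an OAI-PMH `ListRecords` request for unqualified Dublin Core records. The `ListRecords` verb, the `oai_dc` metadata prefix, and the optional `set` argument come from the OAI-PMH protocol; the base URL and the set name are placeholders, not the Libraries' actual endpoint or collections.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    base_url and set_spec are supplied by the caller; the values used
    below are hypothetical.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and set name, for illustration only.
url = build_list_records_url(
    "https://example.org/oai/oai.php", set_spec="example_collection"
)
print(url)
```

A harvester would fetch this URL and, when the response carries a `resumptionToken`, repeat the request with that token until the full record list has been retrieved.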

The interface for a CONTENTdm collection is kept as generic as possible, unless resources are committed during the project planning stage to creating a customized interface. Search options are easily customized, making it possible to perform selected searches within CONTENTdm or from other websites.

Primary Contacts: Linda Klimczyk, Karen Schwentner

Local Project examples:

Pennsylvania History Pictorial Collections, which includes the Mira Dock Forestry Lantern Slide Collection, The O'Connor/Yeager Collection: Pennsylvania Prints from the Palmer Museum of Art, and others.

 

Olive Software

Olive corporate site

About Olive: Olive is a content management tool that uses an application called the ActivePaper™ XML Publisher to automatically transform unstructured PDF files into a rich XML structure and then store them in a web-ready, open source, XML flat-file repository. This XML-based process transforms and integrates digital, microfilm, and paper-based content and displays it in a true-to-print quality presentation, bringing together print, archive, and online media.

Current Projects: Historical Digital Collegian, Pennsylvania Civil War Newspapers, The Behrend Beacon, Lancaster Farming, Hazleton Highacres Collegian

Primary Contacts: Karen Schwentner, Sue Kellerman

PrimeOCR

PrimeOCR corporate site

About PrimeOCR: PrimeOCR is an Optical Character Recognition (OCR) tool that converts digitized images of text into ASCII text. It is designed to run jobs that are submitted to it, whereas more manual OCR tools are usually applied to a page or a small work at a time using a single OCR engine. Characters that the OCR engine does not recognize must be corrected or verified manually.
PrimeOCR combines several OCR engines that work together when needed. This combination of engines reduces the number of characters that are unrecognizable and thus reduces the amount of manual correction. Penn State has three engines available through PrimeOCR.
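The idea of combining engines can be illustrated with a simple character-by-character majority vote. This is a generic sketch, not PrimeOCR's actual method, which this page does not describe. It assumes the engines' outputs are already aligned and of equal length, and it uses a hypothetical `~` marker for characters that still need manual review.

```python
from collections import Counter


def vote(engine_outputs, unresolved="~"):
    """Combine aligned OCR outputs by majority vote, one character at a time.

    A character is accepted only when more than half of the engines agree;
    otherwise the position is flagged for manual correction.
    """
    if len({len(text) for text in engine_outputs}) != 1:
        raise ValueError("outputs must be aligned to the same length")
    result = []
    for chars in zip(*engine_outputs):
        char, count = Counter(chars).most_common(1)[0]
        result.append(char if count > len(chars) / 2 else unresolved)
    return "".join(result)


# Three made-up engine outputs, each wrong in at most one place.
combined = vote(["Penn State", "Pcnn State", "Penn Stale"])
print(combined)  # Penn State

# No majority on the middle character, so it is flagged for review.
flagged = vote(["cat", "cot", "cut"])
print(flagged)  # c~t
```

With three engines, an error made by any single engine is outvoted by the other two, which is why fewer characters are left for manual correction.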

Primary Contact: Albert Rozo