Digitising South African books: Dilli
Dilli is an initiative to digitise South African books and make them available on new platforms, in new formats and to new markets. This includes distributing digital content via the web and offering print-on-demand services for publishers. Dilli is a joint project between ViaAfrika, Media24 and NB Publishers (encompassing Tafelberg, Human & Rousseau, Jonathan Ball and more).
Making South African books available electronically meant that:
- One centralised, electronic resource would be much more accessible and manageable;
- Electronic searches by keywords, titles and other tags would make finding relevant publications extremely easy;
- It would open up access to a far wider customer and publisher base, creating a larger market;
- New revenue streams could be explored, including:
  - Making previously out-of-print titles available once again;
  - Selling electronic versions of books, rather than just hardcopies; and
  - Offering print-on-demand options for titles;
- Storage space and up-front publishing overheads would be reduced significantly as large print runs would not be required anymore.
The client’s requirements
FirstCoast Technologies was commissioned to help with the creation of this digital book library, which consists of these critical elements:
- Content creation: The actual scanning of old books and creating electronic content in various file formats.
- The book repository: A place where the digital content can be stored and managed.
- The middleware: Interactive tools which connect the front-end distribution media (e.g. websites) and the repository.
- The front end: Various media for distribution of content to internal and external parties, e.g. websites.
Before FirstCoast could recommend a solution, we needed to establish the following about Dilli’s requirements:
- What the end product would be (and whether there would be more than one for each item)
- What storage space limitations there were
- What distribution channels were to be put in place
- The volumes involved and the file formats needed
- The level of quality and associated monitoring that was desired
- Optical character recognition (OCR) and text requirements: would the end users want to search just titles or the text within the books?
- Black-and-white, greyscale and colour considerations (these influence whether dual streaming is necessary)
- The need for image editing during the scanning process.
FirstCoast was up for the challenge
With the answers to these questions, FirstCoast prepared a solution that incorporated a variety of world-leading products.
Content creation
While more recent publications are readily available in electronic format, many older books exist only in hardcopy. It was this content that needed to be scanned, converted into searchable assets and saved into the repository. Scanning on this scale needs a well-developed process that can handle the demands of both volume and quality. The quality of the scanned content is particularly important, as it affects the ‘sellability’ of the end products accessed through the website, on-demand printing and other channels. Editing images is expensive and time-consuming, so it was important to minimise it through high-quality scanning. FirstCoast recommended two products: Inotec Scamax M06 scanners, which can scan up to 150 pages per minute at high quality; and the Zeutschel OS 12000 book scanner, which also offers excellent image quality and is particularly suited to handling bound documents.
The book repository
The repository and the management system that stores the content are the main asset and the most critical part of the project. Without a secure, searchable, robust database, the repository would fail once large volumes were reached. FirstCoast recommended OpenText Alchemy for this, as it provides a strong, flexible storage solution with excellent data security. It also allows for the creation of multiple asset and product databases, which can be saved on a network or locally. FirstCoast also developed a management tool and a relational database, which houses all the content’s metadata.
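To make the idea of a relational metadata store concrete, here is a minimal sketch of how titles and keyword tags could be held and searched. It is an illustration only: the schema FirstCoast built is not described in this case study, so the table names, column names and sample titles below are hypothetical, and SQLite is used purely as a convenient stand-in rather than OpenText Alchemy.

```python
import sqlite3


def build_catalogue(conn):
    """Create two hypothetical tables: one row per book, one row per keyword tag."""
    conn.execute(
        """CREATE TABLE book (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               publisher TEXT,
               in_print INTEGER NOT NULL DEFAULT 1
           )"""
    )
    conn.execute(
        """CREATE TABLE keyword (
               book_id INTEGER NOT NULL REFERENCES book(id),
               word TEXT NOT NULL
           )"""
    )


def add_book(conn, title, publisher, keywords, in_print=True):
    """Insert a book and its keyword tags; returns the new book id."""
    cur = conn.execute(
        "INSERT INTO book (title, publisher, in_print) VALUES (?, ?, ?)",
        (title, publisher, int(in_print)),
    )
    book_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO keyword (book_id, word) VALUES (?, ?)",
        [(book_id, word.lower()) for word in keywords],
    )
    return book_id


def search(conn, term):
    """Return titles whose title contains the term or that carry it as a keyword."""
    term = term.lower()
    rows = conn.execute(
        """SELECT DISTINCT b.title
           FROM book b
           LEFT JOIN keyword k ON k.book_id = b.id
           WHERE lower(b.title) LIKE ? OR k.word = ?
           ORDER BY b.title""",
        (f"%{term}%", term),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_catalogue(conn)
    # Placeholder records, not real Dilli titles.
    add_book(conn, "Example Title One", "Hypothetical Press", ["history", "cape"])
    add_book(conn, "Another Example", "Hypothetical Press", ["poetry"], in_print=False)
    print(search(conn, "history"))
```

Keeping keywords in their own table means a title can carry any number of tags, and the in_print flag shows one way out-of-print titles could be tracked alongside current ones.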
Middleware and front-end
‘Middleware’ describes the tools that enable the repository, or book database, to interact with the distribution media. In this case the front-end distribution media took the form of internal and external channels, including websites, print-on-demand facilities, research, mobile phone access and more. Each medium requires the content to be set up in a specific way so that the end user receives it in the correct format. FirstCoast also ensured that access to certain content and products could be restricted. The result is that Dilli now has a robust, easy-to-use repository to which it can constantly add new material. A facility like this allows Dilli to capitalise on new revenue streams, such as making out-of-print books available once more.
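The middleware role described above, choosing the right format for each channel and enforcing access restrictions, can be sketched roughly as follows. This is not FirstCoast's implementation: the channel names, the format assigned to each channel, the file paths and the Asset structure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from distribution channel to the file format it expects.
CHANNEL_FORMATS = {
    "website": "pdf",
    "print_on_demand": "print_pdf",
    "mobile": "epub",
}


@dataclass
class Asset:
    """A repository item: one title, several renditions, a channel whitelist."""
    title: str
    files: dict  # format name -> path of the stored rendition
    allowed_channels: set = field(default_factory=set)


def resolve(asset, channel):
    """Return the path of the rendition a channel should receive.

    Raises ValueError for an unknown channel, PermissionError if the asset
    is restricted for that channel, and LookupError if the required
    rendition has not been created yet.
    """
    if channel not in CHANNEL_FORMATS:
        raise ValueError(f"unknown channel: {channel}")
    if channel not in asset.allowed_channels:
        raise PermissionError(f"{asset.title!r} is not released to {channel}")
    fmt = CHANNEL_FORMATS[channel]
    if fmt not in asset.files:
        raise LookupError(f"no {fmt} rendition stored for {asset.title!r}")
    return asset.files[fmt]


if __name__ == "__main__":
    # Placeholder asset, not a real Dilli title.
    asset = Asset(
        title="Example Title",
        files={"pdf": "example/web.pdf", "epub": "example/mobile.epub"},
        allowed_channels={"website"},
    )
    print(resolve(asset, "website"))
```

Separating the channel-to-format table from the access check keeps the two concerns independent: a new channel can be added without touching the restriction logic, and a title can be withheld from one channel while remaining available on others.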
The products that FirstCoast used and recommended to create a digital archive for Dilli were:
- OpenText Alchemy: For creating a digital archive and searchable materials.
- Inotec Scamax M06 scanners (up to 150 pages per minute): For fast, high-quality scanning.
- Zeutschel OS 12000 C book scanner: For excellent image quality, particularly with bound documents.
For more information, or to find out how FirstCoast can help your company digitise and archive its materials, please contact FirstCoast Technologies.