After two decades of development, Africa Media Online has launched Preservatio, an integrated solution for digital preservation and access made up of a number of digital curation systems to ensure digital asset management and preservation.
Understanding the Need
Any organisation wanting to preserve its identity needs a Digital Archive
Most organisations have a gap appearing in their history between the point at which their physical archive ends and the present, with born-digital files generated over the past decade left ungathered and unprocessed.
This gap is often filled with lost digital files. They may have been saved onto a stiffy disk that can no longer be read, a DVD may have become corrupted, a hard drive may ‘tick’ every time you plug it in, your computer may have contracted a virus, or you may have neglected to rename files and so overwritten older ones when copying them to the same folder. Digital files are easy to create and even easier to lose!
Even if the born-digital files exist, they are often scattered across the organisation on various computers or hard drives and “we’re sure that so-and-so has them on a DVD somewhere” becomes a phrase that is all too familiar.
Even if files are gathered onto a server, they are often saved in a deep structure of folders, subfolders and sub-subfolders that only one or two people know how to navigate. If those people leave the organisation, no one can locate anything.
Even if files are on a server, usually that server can only be accessed by internal staff, and so your Digital Archive is not available to the whole community you serve.
Even if you were to make the files available to your community or target audience, you may fear what they will do with them and how you will manage access and use rights. Further, you may dread the time and energy likely to be required to service the masses of requests you will receive from your community.
The solution is a digital preservation and access ecosystem that incorporates an archival digital repository system, a digital asset management system and an engaging, fully responsive user interface that can be accessed on all types of devices. All organisations have a Raw Digital Archive (the digital files scattered across the organisation). Some have a Processed Digital Archive (the files gathered in one place, usually on a server, and placed in some sort of order). Few organisations, however, select the best of these and place them in a Public Digital Archive. A Digital Archive hosted on a digital preservation and access ecosystem is the permanent home for the Public Digital Archive and becomes the official record of your organisation.
Preservatio is made up of a number of layers
Like an aircraft, which is made up of multiple sub-systems, a digital preservation ecosystem is made up of numerous interrelated sub-systems. Building on the approach we took in our earlier systems, Preservatio is built from components that interface with one another through Application Programming Interfaces (APIs). Each component is designed to do one thing well and to interface with other self-contained components. This means that we can easily swap old components for newer ones that use the latest technology. While such components occur at every level throughout the system, this approach has also been applied at the macro level of the system, to what we call Layers.
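The swappable-component idea can be illustrated with a minimal sketch. It assumes a hypothetical storage interface (`StorageComponent` with `put` and `get`) and two hypothetical implementations; it is not Preservatio’s actual API, only an example of code that depends on an interface rather than on a particular component.

```python
import zlib
from typing import Dict, Protocol


class StorageComponent(Protocol):
    """Hypothetical interface: any storage component must offer put() and get()."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class PlainStore:
    """An 'older' component that keeps bytes exactly as received."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class CompressedStore:
    """A 'newer' component: same interface, different internals (zlib)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = zlib.compress(data)

    def get(self, key: str) -> bytes:
        return zlib.decompress(self._blobs[key])


class IngestService:
    """Depends only on the interface, so either store can be plugged in."""

    def __init__(self, storage: StorageComponent) -> None:
        self.storage = storage

    def ingest(self, key: str, data: bytes) -> bool:
        self.storage.put(key, data)
        # Read back to confirm the component stored the bytes faithfully.
        return self.storage.get(key) == data
```

Replacing `PlainStore` with `CompressedStore` requires no change to `IngestService`, which is the property that makes component upgrades cheap.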
The Infrastructure Layer is the hardware upon which the ecosystem runs. This is made up of servers, backup hardware and the network between them. Africa Media Online’s specialised Infrastructure Layer is known as our ‘Preservation Cloud’.
On top of the Infrastructure Layer is the Operating System Layer. Just as on your own computer, all programmes need to run on top of an operating system. In the case of Preservatio, various subsystems run on various platforms, but most run on the Open Source LAMP stack: the Linux operating system with the Apache web server, the MySQL database and the PHP programming language.
The Asset Management Layer is the real power at the heart of Preservatio. It is the part of the system that actually looks after the digital files and ensures they are backed up. It is made up of a Digital Vault, which houses the Security Boxes in which the digital files reside, and it keeps track of all those files and their derivatives. Preservatio’s asset management layer is called ArchiVault and is developed in Perl, a powerful Open Source programming language.
ArchiVault is not the only Digital Asset Management System (DAMS) we use. Before files are ingested into ArchiVault, they are gathered in our work-in-progress module called MediaGraph. This DAMS is ideal for gathering files from across your organisation, managing their use rights and preparing them for ingestion into the permanent repository.
Another DAMS, called Picvario, is used at the access end of the ecosystem to present the files to users.
The Cataloguing Layer is the first of the user interface layers. Preservatio enables the capture of metadata at multiple points in the lifecycle of a file, and embedded metadata can be preserved right through the workflow. For instance, you may capture item-level metadata in a desktop programme such as Adobe Lightroom, Adobe Bridge or Photo Mechanic. That embedded metadata is maintained and can be worked on in MediaGraph while files are being prepared for ingestion into ArchiVault, or in Picvario once they are being presented to the public.
In addition, Preservatio has a specialist cataloguing layer known as the Metadata App. It enables the capture of metadata in the fields of various open metadata schemata.
The Arrangement Layer is also part of Preservatio’s MEMAT user interface and is known as the MEMAT AL. The MEMAT AL allows the digital archive to reflect the arrangement of the physical archive of which it is a surrogate, enabling archivists to conform the arrangement of the digital archive to the arrangement described in the finding aid of a collection. We are just getting going on the development of this layer, and there are exciting developments to come.
The Presentation Layer is the base of Preservatio’s primary user interface and is known as the MEMAT PL. It enables users to search for digital files and explore them in engaging ways. It has been developed to incorporate or interface with numerous Open Source systems: it is built on WordPress, the Open Source content management system (CMS) written in PHP, and it also uses IIIF technology.
The Curation Layer, known as the MEMAT CL, allows system administrators and curators to curate material drawn from the digital archive and showcase the selected material to users of the system. The Curation Layer has seen significant development recently, allowing administrators to create not only galleries but also timelines and stories. The MEMAT CL has been built by incorporating other Open Source systems such as Timeline JS and Pageflow.
The Market Layer, or MEMAT ML, is the part of the user interface that handles all aspects of a user’s access to files, including requesting files, the release of files by the system administrator, the transfer of use rights and even, where appropriate, licensing through an e-commerce payment gateway. Again, all this functionality has been built with Open Source systems, including WooCommerce.
The Design Layer, or MEMAT DL, is the skin on top of the user interface that enables the interface to take on the branding and look and feel of an organisation or institution. It can draw on all the functionality that WordPress supplies. Because the MEMAT user interface has been developed as a WordPress plugin with shortcodes, designers can use any WordPress theme when developing the look and feel of the web interface.
Infrastructure Layer: Preservation Cloud
The foundation of the Preservatio digital ecosystem is our Preservation Cloud infrastructure.
Cloud storage enables the distribution of media files across servers situated all over the world, which presents both a benefit and a security challenge. When Africa Media Online set out to build the Preservation Cloud, we were aware of the need to construct a solution that benefits from cloud storage while minimising the security risk. Preservation files are therefore stored on servers in our own buildings, where we have direct control over them, and only derivative files are made available through web servers that are located in South Africa and therefore fall under South African law. You can know where your data is and still access it securely from anywhere in the world.
The Africa Media Online Server Room is located in a different building from our main offices, where our Backup Room is sited. The Server Room is fitted with biometric access and monitoring, fire suppression systems and other security measures. For power supply, the servers are backed up by a UPS, which is in turn backed up by an inverter, which is in turn backed up by a generator. The Server Room hosts the main storage servers and the servers that generate derivative files when files are loaded.
The Backup Room is located within Africa Media Online’s main offices. It is connected to the Server Room via fibre optic cable, so backups are automatically stored off-site from the servers. The Backup Room houses our LTO tape libraries and can back up to both LTO-6 and LTO-7 tapes. In addition, we have a system to burn data onto BluRay disks, providing another level of offline backup on write-once media. This room is temperature and humidity controlled and has a backup power supply similar to the Server Room’s: the systems are supported by a UPS, which is supported in turn by an inverter, which is supported in turn by a generator.
The LTO tapes and BluRay disks that have data written to them are stored in a safe within a fire-proof building in a remote area 20 km away from our main offices.
Access files are synchronised from the storage servers in our Server Room to our external web servers, located in secure data centres in different parts of Gauteng Province, South Africa. The two web servers mirror one another, providing a fail-over solution for maximum uptime.
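Synchronisation and fail-over of this kind can be sketched in a few lines. This is a minimal illustration, assuming each side keeps a `{filename: checksum}` manifest and that the server names and the `fetch` callable are hypothetical stand-ins; it is not Africa Media Online’s actual sync tooling.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sync_plan(source: Dict[str, str], mirror: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare {filename: checksum} manifests and decide what to copy or delete."""
    to_copy = sorted(name for name, digest in source.items() if mirror.get(name) != digest)
    to_delete = sorted(name for name in mirror if name not in source)
    return to_copy, to_delete


def fetch_with_failover(path: str, servers: Sequence[str],
                        fetch: Callable[[str, str], bytes]) -> bytes:
    """Try each mirrored web server in turn and return the first success."""
    errors = []
    for server in servers:
        try:
            return fetch(server, path)
        except OSError as exc:  # typical network failures (refused connections, timeouts)
            errors.append(f"{server}: {exc}")
    raise ConnectionError("all mirrors failed: " + "; ".join(errors))
```

Comparing checksums rather than timestamps means only files whose content actually changed are re-sent to the mirrors.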
Asset Management Layer: ArchiVault
Preservatio’s ArchiVault preserves and secures the digital files
With over a decade of experience in working with media files to global standards, having headed up the team that wrote the best practices section of a National Policy on the Digitisation of Heritage Resources, and having helped heritage and scientific organisations write strategies for digital archiving, Africa Media Online brings a wealth of knowledge in producing media files at the right quality. These standards are built into ArchiVault and help ensure the long-term accessibility and usability of media files and their associated metadata.
ArchiVault is a multimedia system that preserves and gives access to digital image, manuscript, audio and video files. ArchiVault supports all media types and preservation-quality media file formats. Our emphasis is on open file formats that are extensively supported internationally, to meet long-term preservation requirements.
Digital files are ingested into ArchiVault’s Digital Vault, where they are stored permanently in DVD- or BluRay-sized Security Boxes. Because each box fits on a single disk, the boxes can be copied onto both LTO tape and BluRay disks for offline and write-once backup. Once a digital file is ingested into a Security Box, it is immutable. It can be superseded by a new version but cannot be changed, thereby preserving digital authenticity.
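One way to fill disk-sized boxes is first-fit packing, sketched below. The capacities are the standard single-layer DVD and Blu-ray (BD-25) figures; the packing rule, the `SecurityBox` class and sealing every box at the end of a batch are assumptions for illustration, not ArchiVault’s documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

DVD_BYTES = 4_700_000_000      # single-layer DVD
BLURAY_BYTES = 25_000_000_000  # single-layer Blu-ray (BD-25)


@dataclass
class SecurityBox:
    capacity: int
    files: List[Tuple[str, int]] = field(default_factory=list)
    sealed: bool = False

    @property
    def used(self) -> int:
        return sum(size for _, size in self.files)

    def add(self, name: str, size: int) -> bool:
        """Add a file if it fits; sealed boxes refuse all changes."""
        if self.sealed:
            raise PermissionError("sealed Security Boxes are immutable")
        if self.used + size > self.capacity:
            return False
        self.files.append((name, size))
        return True


def pack(files: Iterable[Tuple[str, int]], capacity: int = BLURAY_BYTES) -> List[SecurityBox]:
    """First-fit packing: each file goes into the first box with room for it."""
    boxes: List[SecurityBox] = []
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} does not fit in a single box")
        # any() stops at the first box whose add() succeeds.
        if not any(box.add(name, size) for box in boxes):
            new_box = SecurityBox(capacity)
            new_box.add(name, size)
            boxes.append(new_box)
    for box in boxes:
        box.sealed = True  # after ingest, contents can no longer change
    return boxes
```

Keeping each box no larger than one disk is what allows a box to be written to BluRay as a single unit.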
When digital files are ingested into ArchiVault’s Digital Vault, the derivative files used for access, including thumbnails, previews and low-res files for image, document, audio and video formats, are autogenerated from the preservation file. This means that only the preservation-quality file needs to be ingested; the system does the rest of the work.
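The “ingest one file, get many derivatives” idea can be sketched as a planning step that maps each preservation file to the access files to be generated. The recipe names and sizes below are hypothetical; only the standard-library `mimetypes` lookup is real.

```python
import mimetypes
from typing import Dict, List

# Hypothetical recipes: derivative names and sizes are illustrative only.
RECIPES: Dict[str, List[str]] = {
    "image": ["thumb_150px.jpg", "preview_1024px.jpg", "lowres_2000px.jpg"],
    "document": ["thumb_150px.jpg", "preview_1024px.jpg", "lowres.pdf"],
    "audio": ["preview_30s.mp3", "lowres_128kbps.mp3"],
    "video": ["thumb_frame.jpg", "preview_480p.mp4", "lowres_720p.mp4"],
}


def media_kind(filename: str) -> str:
    """Classify a preservation file by its guessed MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"unknown file type: {filename}")
    major = mime.split("/")[0]
    if major in ("image", "audio", "video"):
        return major
    if mime == "application/pdf":
        return "document"
    raise ValueError(f"no derivative recipe for {mime}")


def derivative_plan(preservation_file: str) -> List[str]:
    """List the access files to generate from one ingested preservation file."""
    stem = preservation_file.rsplit(".", 1)[0]
    return [f"{stem}_{suffix}" for suffix in RECIPES[media_kind(preservation_file)]]
```

A separate worker (not shown) would then render each planned file from the preservation master.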
When digital files are ingested into ArchiVault’s Digital Vault, they are checksummed so that the system can detect changes to a file over time. A change indicates file corruption, and the corrupted file can then be restored from backups.
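A fixity check of this kind can be sketched with Python’s `hashlib`. The choice of SHA-256 and the `{filename: checksum}` manifest format are assumptions; the source does not say which algorithm ArchiVault uses.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the files whose current checksum no longer matches the ingest checksum."""
    return [name for name, expected in manifest.items()
            if sha256_of(root / name) != expected]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "img001.tif").write_bytes(b"original pixels")
        manifest = {"img001.tif": sha256_of(root / "img001.tif")}  # recorded at ingest
        (root / "img001.tif").write_bytes(b"bit rot!")               # simulate corruption
        print(verify_fixity(manifest, root))  # files to restore from backup
```

Any name returned by `verify_fixity` is a candidate for restoration from the tape or BluRay copies.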
A vital preservation service that ArchiVault performs, in addition to maintaining your digital archive, is the backup of digital files. The files are stored in disk arrays that use RAID 5 and are also backed up onto LTO tapes for off-site storage and onto BluRay disks for write-once offline backup. Keeping copies on multiple media also supports the migration of media files to new storage media over time.
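RAID 5 protects against the failure of any single disk by storing an XOR parity block alongside the data blocks in each stripe. The sketch below shows that arithmetic for one stripe only; real arrays also rotate the parity block across disks, which is omitted here.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity function)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes]) -> bytes:
    """XOR of all surviving blocks in a stripe recreates the one lost block."""
    return xor_blocks(surviving)
```

For example, with data blocks `b"\x01\x02"`, `b"\x10\x20"` and `b"\xff\x00"`, the parity is `b"\xee\x22"`, and XOR-ing the first, third and parity blocks returns the second. Parity only covers disk failure, not deletion or corruption written to every disk, which is why the separate tape and BluRay copies matter.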
ArchiVault also stores all metadata relating to a particular file and makes that available to the Elasticsearch search engine so that the Preservatio ecosystem can draw on the data to enable search and discovery of the file on its MEMAT user interface.
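Metadata is typically loaded into Elasticsearch through its Bulk API, whose body is newline-delimited JSON: one action line followed by one document line per record, with a trailing newline. The index name and field names below are hypothetical, and this is not ArchiVault’s actual export code.

```python
import json
from typing import Any, Dict, List


def bulk_payload(index: str, records: List[Dict[str, Any]]) -> str:
    """Build an Elasticsearch _bulk body: one action line, then one source line, per record."""
    lines = []
    for record in records:
        source = {k: v for k, v in record.items() if k != "id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": record["id"]}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline
```

The resulting string would be sent with `POST /_bulk` and the `Content-Type: application/x-ndjson` header.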
Cataloguing Layer: MEMAT Metadata App
MEMAT’s Metadata App enables the assigning, capture and quality control of metadata
The Administrator Login allows the Administrator to navigate the structure of the archive, identify groups of material and assign them to appropriate Metadata Capturers. The Administrator also receives completed work back from those Capturers and can quality control it, correcting mistakes and providing feedback on the work done. When the Administrator approves the work, it is made live on the MEMAT PL, where users search and browse. Administrators can also create multiple capturer accounts on the system.
When Metadata Capturers log into the system, they can see the work assigned to them, select items to work on and add metadata. The Metadata App supports a number of metadata schemata, including Dublin Core, IPTC and, for science museums, Darwin Core. A Capturer can choose to work on more than one item at a time if common elements need to be added to all selected items. Once the work is complete, committing the changes sends it back to the Administrator for quality control.
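The assign, commit, review and publish cycle described above can be modelled as a small state machine. The state names, the transition table and the Dublin Core-style field keys are assumptions for illustration, not the Metadata App’s internals.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    ASSIGNED = "assigned"
    SUBMITTED = "submitted"
    RETURNED = "returned"
    LIVE = "live"


# (current state, action) -> next state; anything else is rejected.
TRANSITIONS = {
    (State.ASSIGNED, "commit"): State.SUBMITTED,
    (State.RETURNED, "commit"): State.SUBMITTED,
    (State.SUBMITTED, "approve"): State.LIVE,
    (State.SUBMITTED, "return"): State.RETURNED,
}


@dataclass
class Item:
    item_id: str
    capturer: str
    metadata: Dict[str, str] = field(default_factory=dict)
    state: State = State.ASSIGNED
    feedback: str = ""

    def apply(self, action: str, note: str = "") -> None:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} an item that is {self.state.value}")
        self.state = TRANSITIONS[key]
        if action == "return":
            self.feedback = note  # Administrator's quality-control feedback


def batch_edit(items: Iterable[Item], common: Dict[str, str]) -> None:
    """Apply the same fields to several selected items, as a Capturer might."""
    items = list(items)
    locked = [i.item_id for i in items if i.state not in (State.ASSIGNED, State.RETURNED)]
    if locked:
        raise ValueError(f"not editable: {locked}")
    for item in items:
        item.metadata.update(common)
```

Checking every selected item before editing any of them keeps a batch edit all-or-nothing.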
Presentation Layer: The PL
MEMAT’s Presentation Layer is the web interface for the MEMAT system
The MEMAT 4 PL allows for different levels of password-protected access. A user who is not logged in can search, browse and view galleries. To make selections and order files, however, the user must first register on the system and log in. An Administrator has access to the backend of the system and can manage users and their access, manage file orders, create galleries and edit metadata. Designers can also be given a login to make changes to the look and feel of the web interface.
The MEMAT 4 PL has a Design Layer through which the web interface can take on the unique look and feel of your organisation. Its fully responsive design works equally well on a large desktop screen or a smartphone screen.
The MEMAT 4 PL enables Administrators to create galleries of selected material for all users of the system to see. It also allows users to create private selections that they can share with others. A selection can be seen only by the logged-in user who created it and by those with whom that user chooses to share it.
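The access levels and private selections can be summarised as a role-to-permission table plus a visibility rule. The role names, permission strings and `Selection` class are hypothetical labels modelled on the description above, not MEMAT’s actual WordPress capabilities.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

BASIC = {"search", "browse", "view_gallery"}

# Hypothetical role -> permission table modelled on the access levels described above.
PERMISSIONS = {
    "anonymous": BASIC,
    "registered": BASIC | {"select", "order"},
    "designer": BASIC | {"edit_design"},
    "administrator": BASIC | {"select", "order", "manage_users", "manage_orders",
                              "create_gallery", "edit_metadata"},
}


def can(role: str, action: str) -> bool:
    """Unknown roles get no permissions at all."""
    return action in PERMISSIONS.get(role, set())


@dataclass
class Selection:
    owner: str
    items: List[str] = field(default_factory=list)
    shared_with: Set[str] = field(default_factory=set)

    def visible_to(self, user: Optional[str]) -> bool:
        """Private unless the viewer is the owner or someone it was shared with."""
        return user is not None and (user == self.owner or user in self.shared_with)
```

Passing `None` for a visitor who is not logged in means no private selection is ever visible to them.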
Drawing on the search engine in the AML, the MEMAT 4 PL presents search results in an engaging way. For manuscripts, it draws on the Optical Character Recognition (OCR) text and associated metadata to return results that match the search term. A logged-in registered user who clicks on a particular result can then search for that term within the manuscript. Users can also browse the structure of the archive, both in the browse tab and from the preview page of a particular search result.
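Searching within a manuscript amounts to locating the term in each page’s OCR text. The sketch below assumes the OCR output is available as one string per page and returns page and character positions a viewer could highlight; the function and its login check are illustrative, not MEMAT’s implementation.

```python
import re
from typing import List, Tuple


def search_within(pages: List[str], term: str, logged_in: bool) -> List[Tuple[int, int]]:
    """Return (page number, character offset) for each case-insensitive hit in the OCR text."""
    if not logged_in:
        raise PermissionError("search within a manuscript requires a registered, logged-in user")
    # re.escape treats the search term literally, even if it contains '.' or '('.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(page_no, match.start())
            for page_no, text in enumerate(pages, start=1)
            for match in pattern.finditer(text)]
```

For the pages `"The Land Act of 1913"` and `"land claims were lodged"`, a search for `land` returns `[(1, 4), (2, 0)]`.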
MEMAT aims to provide both digital preservation and access, and it manages the processes around the ordering and delivery of files. Users cannot gain access to preservation-quality digital files unless they are logged in and have ordered the file. MEMAT 4 puts control over vetting orders and releasing files in the hands of the System Administrator.
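The order-and-release gate can be sketched as an order that starts out pending, is vetted once by the System Administrator, and unlocks the preservation file only for the user who placed it. The `Order` class, its statuses and the `masters/` path are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    RELEASED = "released"
    DECLINED = "declined"


@dataclass
class Order:
    user: str
    file_id: str
    status: Status = Status.PENDING

    def vet(self, approve: bool) -> None:
        """The System Administrator approves or declines a pending order once."""
        if self.status is not Status.PENDING:
            raise ValueError("this order has already been vetted")
        self.status = Status.RELEASED if approve else Status.DECLINED


def preservation_file_path(order: Order, user: Optional[str]) -> str:
    """Only the logged-in user who placed a released order may fetch the master file."""
    if user is None or user != order.user or order.status is not Status.RELEASED:
        raise PermissionError("preservation file has not been released to this user")
    return f"masters/{order.file_id}"  # hypothetical storage path
```

Derivative access files need no such gate; only the preservation-quality master sits behind it.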
Africa Media Online is the first organisation in Africa to adopt the International Image Interoperability Framework (IIIF), the new international standard for presenting heritage and scientific collections online. IIIF enables the rapid roll-out of immersive online experiences for interacting with digitised collections, such as deep zoom into materials and searching within manuscripts.
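Deep zoom works because a IIIF viewer can request any region of an image at any size through a predictable URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}` (IIIF Image API). The sketch below builds such URLs, including tiles for a deep-zoom grid; the base URL, identifier and 512-pixel tile size are assumptions, not Preservatio’s configuration.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full", size: str = "max",
                   rotation: int = 0, quality: str = "default", fmt: str = "jpg") -> str:
    """{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} (IIIF Image API 3.0)."""
    ident = quote(identifier, safe="")  # a '/' inside the identifier must be escaped as %2F
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def tile_url(base: str, identifier: str, col: int, row: int,
             width: int, height: int, tile: int = 512) -> str:
    """URL of one deep-zoom tile; edge tiles are clipped to the image bounds."""
    x, y = col * tile, row * tile
    if x >= width or y >= height:
        raise ValueError("tile lies outside the image")
    w, h = min(tile, width - x), min(tile, height - y)
    return iiif_image_url(base, identifier, region=f"{x},{y},{w},{h}", size=f"{w},")
```

Clipping edge tiles keeps each request at or below the region’s native width, so no tile asks the server to upscale.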
Africa Media Online uses the open source WordPress Content Management System (CMS) as the base for the MEMAT PL. With 33% of all websites on the internet powered by WordPress, it provides a familiar backend for System Administrators, enhancing MEMAT’s user-friendliness. While WordPress has a reputation for security issues, at its core it is just as secure as Drupal and Joomla. Africa Media Online has built on that core while disallowing extraneous plugins. WordPress also allows us to develop new features rapidly.