Archiving Workflow Considerations
The steps below, in roughly chronological order, outline best practices for an archiving workflow.
Accession (choices of media and method for delivery)
- Type of asset: born digital. The moving image you’re preserving was captured and finished in a file-based way and has been delivered to your archive that way.
- Type of asset: digitization of legacy works (scanning, processing, restoration). The moving image has been scanned or encoded from an analog source, and is usually sent to you in a “raw” and a “finished” state. For example, “raw” scans capture the film frame as the scanner recorded it, before any digital restoration work; “finished” files are the final, color-corrected, cleaned images. Click here for a template RFP for scanning services.
- Mode of access for archival data (LTO, drives, etc.; production upload direct-to-cloud). Moving images have typically been delivered to archives on data tapes and drives, but, increasingly, cloud-based delivery is becoming an option. The archivist should specify as much as possible how these assets are to be delivered, up to and including: digest/checksum inclusion, file-naming conventions, file-based specifications on format, content in line with industry standards, minimum metadata, etc.
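A delivery specification that requires a digest/checksum manifest lets the receiving archive check completeness and integrity mechanically. The following is a minimal sketch, assuming a manifest whose lines have the form `<hex digest>  <relative path>` (the layout produced by common checksum tools) and SHA-256 as the agreed algorithm. The function names and the report layout are illustrative assumptions, not part of any standard.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files never load fully into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_manifest(manifest_text):
    """Parse lines of the form '<hex digest>  <relative path>' into a dict."""
    expected = {}
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, rel_path = line.split(maxsplit=1)
        # Some tools prefix the path with '*' to mark binary mode.
        expected[rel_path.lstrip("*")] = digest.lower()
    return expected


def verify_delivery(root, expected, algorithm="sha256"):
    """Compare files under root with the manifest and return a report dict."""
    root = Path(root)
    found = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    report = {"ok": [], "mismatch": [], "missing": [], "unexpected": []}
    for rel_path, digest in sorted(expected.items()):
        if rel_path not in found:
            report["missing"].append(rel_path)      # listed but not delivered
        elif file_digest(root / rel_path, algorithm) == digest:
            report["ok"].append(rel_path)
        else:
            report["mismatch"].append(rel_path)     # delivered but altered
    # Files that were delivered but never listed in the manifest.
    report["unexpected"] = sorted(found - set(expected))
    return report
```

An empty `mismatch`, `missing`, and `unexpected` list would indicate that the delivery matches its manifest; anything else is a prompt to go back to the sender before ingest.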
Validation and Ingest
- Confirmation of delivery. Ensure that you have a complete asset from your delivery on removable media or cloud. Is it one file? If it is more than one file, how many? Does the asset meet the specs you require? Compare the digest/checksum against the one supplied. Do they match? Has there been any data loss in transit? If the data has been delivered on tapes, can you restore it? Can you read that particular flavor of LTFS, or unbundle a RAR file?
- Normalization of file types, creation of proxies. In certain cases, archivists may choose to normalize highly proprietary file types into more standardized, ubiquitous file types. Another step that can be taken is to create video proxies of image-based file sets. For example, creating a proxy of all raw scans captured for a restoration can be a simple and useful way of having access to what was on an original negative.
- Validation. Validation is a detailed technical and human process of ensuring that the set of files you will be preserving has arrived in its intended state. A lot can go wrong with digital files in transit from one place to another, and a lot can go wrong when productions are rushed to move a specific set of files to tapes or drives. Validation can include visual QC as well as technical QC for errors such as changes in file size. This validation document walks through example validation steps for new and newly-restored motion pictures.
- Documentation and metadata. For moving images, the types of metadata that are important to collect and maintain in systems/databases are the following:
- Descriptive metadata: This refers to cast, crew, summaries, ratings, and other pertinent data. This metadata is often held by entertainment companies in many languages. This is title-based metadata.
- Technical metadata: Information documenting the technical information about a specific asset, including file type, version, aspect ratio, length, resolution, etc.
- Material history metadata: Information on dates of creation of different materials, vendors used to create materials, sources used to make new materials, preservation and restoration reports, etc.
Technical metadata is the most critical type for archival processing, and has to be as accurate as possible. Much of its collection can be automated. Tools such as MediaInfo can extract metadata from file headers and the files themselves. If assets are delivered to pre-determined specs, the spec information can also be used as technical metadata. Accurate technical metadata is the key to automation of archival and other workflows. Without consistent, accurate information, procedures cannot be run across archival repositories to serve the many distribution needs an entertainment company might have.
Because the archivist is often processing many files at once, either for a single asset or multiple assets, tools that help collect and bulk-upload technical metadata are now commonplace. A digital archivist rarely enters attribute values by hand.
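As an illustration of bulk collection, the sketch below walks a delivered folder and builds one technical-metadata row per file, then serializes the rows as CSV for a bulk upload. It uses only the Python standard library, so it records file-level facts (path, extension, size, SHA-256); codec-level fields such as resolution or aspect ratio would come from a tool like MediaInfo and are not gathered here. The column names and the `asset_id` value are hypothetical and would need to match the target system's import template.

```python
import csv
import hashlib
import io
from pathlib import Path

# Hypothetical column layout for a bulk-upload sheet.
FIELDS = ["asset_id", "relative_path", "extension", "size_bytes", "sha256"]


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def collect_records(root, asset_id):
    """Build one metadata record per file found under root."""
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        records.append({
            "asset_id": asset_id,
            "relative_path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower().lstrip("."),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return records


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `records_to_csv(collect_records("/path/to/delivery", "FEATURE_001"))` would yield text ready to save as a `.csv` file; both the path and the identifier are placeholders.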
- Ingest, write-out, final file movement. Archives will have either an archival infrastructure on-prem, a cloud-based infrastructure, or an off-line data storage system. The protocols for moving files to their final archival location differ, but it’s critical that technical metadata be kept in a repository. If the files are kept off-line, automation of asset transformation processes will not be possible. Storage policies for preservation are critical and can be automated.
- Fixity. This is the process of using checksums (also called digests or hashes) to monitor the integrity of a file over time. Fixity checking can be made part of an automated storage policy that includes saving replicas so that corrupted files can be swapped out with good copies. This paper written for the ETC in 2019 explores the issues around fixity in the cloud:
- Types of digital storage. Most entertainment companies are going to have company-wide approaches to digital storage. Archivists working in those companies will likely have access to that storage for their archival needs and will need to make sure that storage policies are implemented in the on-prem or cloud-based solutions. If the archivist must manage their own digital storage that is not part of an on-prem or cloud-based infrastructure, they must select types of storage that meet their requirements. In entertainment companies, most off-line archives are on tape, specifically “LTO” tape. Off-line tape-based data storage offers lower-cost, stable storage. Its limitations include difficulty of access, expensive migration, and limited automation.
This AMIA webinar provides a good overview of other related issues:
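The fixity policy described above (recorded checksums plus replicas that can replace a corrupted copy) can be sketched in a few lines. This is a simplified illustration under stated assumptions: replicas are ordinary files reachable by path, the recorded digest is SHA-256, and the function names are hypothetical. A production storage system would also log each audit and handle tape or object storage through its own interfaces.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_replicas(recorded_digest, replica_paths):
    """Classify each replica as 'ok', 'corrupt', or 'missing'."""
    status = {}
    for path in map(Path, replica_paths):
        if not path.is_file():
            status[str(path)] = "missing"
        elif sha256_of(path) == recorded_digest:
            status[str(path)] = "ok"
        else:
            status[str(path)] = "corrupt"
    return status


def repair_replicas(recorded_digest, replica_paths):
    """Overwrite failed replicas from a verified copy; return the paths repaired."""
    status = audit_replicas(recorded_digest, replica_paths)
    good = [p for p, state in status.items() if state == "ok"]
    if not good:
        raise RuntimeError("no replica matches the recorded digest")
    repaired = []
    for path, state in status.items():
        if state == "ok":
            continue
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(good[0], path)
        # Re-hash the new copy so a failed write is not recorded as a repair.
        if sha256_of(path) != recorded_digest:
            raise RuntimeError(f"repair of {path} failed verification")
        repaired.append(path)
    return repaired
```

Run on a schedule, `audit_replicas` gives the periodic health check and `repair_replicas` the swap-out step; if no replica verifies, the sketch raises an error rather than guessing which copy is correct.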
- Digital Asset Management Systems. Most entertainment companies have digital asset management and/or media asset management systems. The role of these systems is to provide a repository for digital assets as well as metadata associated with those assets. The systems aid search and discovery of these assets but also often integrate automated services such as transcoding, cloud integration, work-in-progress collaboration, and many others. The archivist working with large-scale digital preservation at an entertainment company will want to build digital preservation requirements (such as storage policies and fixity) into their company system. Another useful concept for a DAM/MAM is a one-to-many relationship between an asset record and its specific files. A final digital intermediate can be 20,000 files, and preservation requires that both the asset as a whole and each of its component files can be addressed.
- Creating photochemical elements of digitally captured assets. Entertainment companies sometimes create film elements from final digital assets. “YCMs” or “separation masters” are sometimes created for new features. These black-and-white film records of the three primary colors are seen as an extra archival step. Similarly, on digital restorations, a “film out” is often done at the end of the restoration to photochemically capture the work that was done digitally.
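Returning to the one-to-many relationship between an asset record and its files noted under Digital Asset Management Systems, the sketch below shows one way such a record might be modeled so that the asset and each component file are both addressable. The class and field names are hypothetical and do not reflect any particular DAM/MAM product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ComponentFile:
    """One physical file belonging to an asset (for example, a single frame)."""
    relative_path: str
    size_bytes: int
    sha256: str


@dataclass
class AssetRecord:
    """One logical asset (for example, a digital intermediate) and its files."""
    asset_id: str
    title: str
    files: List[ComponentFile] = field(default_factory=list)

    def add_file(self, component: ComponentFile) -> None:
        self.files.append(component)

    def total_size(self) -> int:
        """Size of the whole asset, summed over its component files."""
        return sum(f.size_bytes for f in self.files)

    def find(self, relative_path: str) -> Optional[ComponentFile]:
        """Address a single component file by its path within the asset."""
        for f in self.files:
            if f.relative_path == relative_path:
                return f
        return None
```

With this shape, a storage policy or fixity check can operate on the asset as a unit through `files`, while a restore request for a single damaged frame can go through `find`.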