Storage

Archiving is, in large part, a question of data storage.

There are many dimensions to archival storage today. There is a massive amount of film in many gauges, from 8mm to 70mm, in studio libraries and archives around the world. While there is no single standard for film storage, there are abundant and reliable recommended practices. A respected source for information on how to store film is the Image Permanence Institute, based at the Rochester Institute of Technology.

Archives also contain massive amounts of television film (both edited and unedited) and analog and digital video. The Image Permanence Institute is an excellent source of information and recommended practices for this type of material as well. The Library of Congress also disseminates information on all aspects of conservation.

There is a history of digital storage media composed of many platform-specific forms. Digital containers including Digital Betacam, the “D” series of videotape (especially D-5), and Sony’s HDCAM series of cassettes were important in the transition from analog to digital image storage. In audio, a number of digital cassettes (the DA-88, for example) became very popular for capture and short-term storage. Many of these formats were used as direct follow-on media in the transition out of analog video production. However, many were also used as the basis of early digital video production and the production of digital features. These formats are all “platform dependent” and obsolete insofar as the technologies are no longer supported. Use of these formats was specified and controlled by their manufacturers, and in these cases the format specifications are found in the technical manuals for the devices.

From about the 1990s on, a new phase of digital media began to emerge. Requirements of interoperability and mobility led increasingly away from platform-based solutions and toward the separation of containers from data content. Thus, a file-based system of media began to emerge. In this era, many storage types came into use, from AIT and S-AIT, Exabyte, Zip drives, hard-disk drives, RAID arrays, DCT, and DLT to the early LTO tape generations and the T10000 series of computer data storage. The importance of this phase was the gradual emergence of file-based data that was largely free of any specific physical container and could be moved into and out of computers and onto any form of digital storage. Nevertheless, this era was impeded by the widespread success of Sony’s HDCAM SR format, which carried multiple audio tracks along with digital image and audio that could easily be moved into computers or other systems for production, broadcast, or storage. At this point, HDCAM SR was the media industry’s de facto standard for all who could afford it.

The watershed moment for the emergence of file-based workflows came on March 11, 2011. The Tohoku earthquake that devastated Japan destroyed Sony’s Sendai Technology Center, the world’s sole manufacturing plant for HDCAM SR tape, leading to a rapidly developing shortage of new HDCAM SR tape stock. This development forced many production facilities to abandon HDCAM SR and finally enter the digital era through file-based workflows, which did not depend on this very specific, high-quality proprietary product. It was at this point that the widespread industry solution of LTO tape came to the fore.

Data in the media business now starts with camera files, in any of a number of formats, and the audio stems, normally in WAV format.

Camera file formats are designed and specified for each camera by its manufacturer (e.g., Arri, Red, Sony). These proprietary file formats usually involve large amounts of compressed data representing luminance and color. This may be full RGB data, or direct output from an image sensor with a color-filter array (such as a Bayer filter); in the latter case, the image data must be demosaiced (“debayered”) to obtain the full RGB data. There are also other elements that involve context and sequencing. But at the base of the archive are massive numbers of code values describing color and luminance for every pixel or pixel block.
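Below is a minimal sketch of what “debayering” means in practice, assuming an RGGB color-filter layout and synthetic sensor values; real camera SDKs and grading tools use far more sophisticated interpolation, and the function name and data here are purely illustrative.

```python
import numpy as np

def debayer_rggb_half(raw: np.ndarray) -> np.ndarray:
    """Collapse each 2x2 RGGB block into one RGB pixel (half-resolution demosaic)."""
    r = raw[0::2, 0::2]                              # red photosites
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0    # average of the two green photosites
    b = raw[1::2, 1::2]                              # blue photosites
    return np.stack([r, g, b], axis=-1)

# Synthetic 12-bit-like sensor data stands in for a real camera file read via the vendor SDK.
raw = np.random.randint(0, 4096, size=(2160, 4096)).astype(np.float32)
rgb = debayer_rggb_half(raw)
print(rgb.shape)  # (1080, 2048, 3)
```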

These proprietary formats are transcoded to files in any of a variety of standardized formats, such as TIFF, DPX, or EXR. (See the page on File Formats and Codecs.) These standardized files allow the data to be processed by a wide variety of software across diverse hardware platforms. A second set of code values, proprietary and diverse like camera files, has to do with the codecs used in the creation of visual effects. These are largely specific to software products (e.g., 3ds Max, After Effects, Flame, Houdini, Nuke, Photoshop, ZBrush); once these were mainly hardware-specific, but now they tend to be platform-agnostic.
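As a hedged illustration of this transcode step, the sketch below batch-converts a folder of EXR frames to TIFF by shelling out to OpenImageIO’s oiiotool; the folder names are assumptions, and studios use many different tools for this stage of the pipeline.

```python
import pathlib
import subprocess

SOURCE_DIR = pathlib.Path("camera_originals")    # hypothetical folder of source EXR frames
DEST_DIR = pathlib.Path("standardized_tiff")     # hypothetical output folder
DEST_DIR.mkdir(exist_ok=True)

for frame in sorted(SOURCE_DIR.glob("*.exr")):
    out = DEST_DIR / (frame.stem + ".tif")
    # oiiotool reads the source frame and writes it in the format implied by the output extension.
    subprocess.run(["oiiotool", str(frame), "-o", str(out)], check=True)
```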

These data files can be stored on any available storage device, including hard-disk drives, RAID arrays, solid-state devices (flash drives), optical storage devices (CDs, DVDs, and Blu-rays, which are standardized in their consumer forms but less standardized for archival data storage), and data tape. These storage objects are data-agnostic: it is possible to store any kind of data on them. At present, the most ubiquitous is LTO, a high-capacity tape storage format that follows a development path prescribed by rules set down by a consortium. (See also “Linear Tape-Open.”)
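Because these storage objects are data-agnostic, what travels with the files matters as much as the medium. A minimal sketch, assuming a hypothetical delivery folder and manifest name: inventory every file with its size and SHA-256 checksum before it is written to tape, disk, or cloud, so later copies can be verified against the manifest.

```python
import csv
import hashlib
import pathlib

DELIVERY = pathlib.Path("delivery")    # hypothetical folder headed for tape, disk, or cloud

def sha256_of(path: pathlib.Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):  # hash 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()

with open("manifest.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "bytes", "sha256"])
    for item in sorted(DELIVERY.rglob("*")):
        if item.is_file():
            writer.writerow([str(item), item.stat().st_size, sha256_of(item)])
```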

Cloud Storage and Digital Preservation

Entertainment companies have embraced “the cloud” for its potential to help support production and distribution. It enables flexible computing resources for processes like special effects, remote access to key materials from pre-vis through post-production, and delivery to fans at home. As a result, archivists and other technology stakeholders responsible for the longevity of films and television programs are seeing the storage of assets migrate to the cloud. The benefits are clear: ready access to assets, now and in the future, for long-term distribution through the ever-growing streaming options for fans, and availability of those same assets for transformation to the next generation of display technology or to yet-to-be-imagined viewing experiences.

Some key issues archivists face in caring for assets in the cloud:

  • The concept of fixity: At present, this is how archivists definitively prove that assets have not been changed or corrupted, typically by recording and re-checking checksums (a minimal verification sketch follows this list). It is very much an on-prem concept and is discussed further at this link (ETC) and this talk (Sam).
  • Unique identifiers for assets: software and current processes often use a file name to identify a file as an asset. A file name is not unique, and retrieving that file from a cloud service without a unique identifier can be very problematic.
  • Replication: In on-prem preservation, infrastructure replication is a key principle. Multiple copies of assets are registered and stored at specific sites for disaster recovery and fixity. In the cloud, replication happens as a matter of course but is not necessarily tracked asset by asset.
  • Multiple vendors: Archivists are using multiple cloud vendors to store assets as a way of mitigating the losses that can result from hacking, natural disaster, a vendor’s financial collapse, or other risks.
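A minimal fixity sketch, assuming the manifest.csv format from the storage section above: each retrieved or replicated copy is re-hashed and compared against its recorded SHA-256 value, and each asset is given a UUID so its identity does not depend on a non-unique file name. The manifest layout and identifier scheme are illustrative assumptions, not a prescribed practice.

```python
import csv
import hashlib
import pathlib
import uuid

def sha256_of(path: pathlib.Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

with open("manifest.csv", newline="") as manifest:
    for row in csv.DictReader(manifest):
        path = pathlib.Path(row["path"])
        asset_id = uuid.uuid4()    # identifier that does not depend on the file name
        status = "OK" if sha256_of(path) == row["sha256"] else "FIXITY FAILURE"
        print(f"{asset_id}  {path}  {status}")
```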