
NARA to Declassify 400 Million Pages of Documents in Three Years 2011/12/06

Posted by nydawg in Archives, Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Media, Records Management.

For a very long time, I have been trying to ask anyone who knows (from my colleagues to the AOTUS himself) why we are even attempting to preserve 250 million emails created during the Bush Administration.  As I’ve mentioned before, that works out to nearly one email every second for eight years!  (And remember, part of that time included Bush’s annual month-long vacations.)  So this story really seemed to give a bit of context on the ways that the National Archives (NARA) deals with processing large backlogs of materials.  “All of these pages had been piling up here, literally,” said Sheryl J. Shenberger, a former CIA official who is the head of the National Declassification Center (NDC) at the National Archives. “We had to develop a Costco attitude: We had 400 million pages . . . and we have three years to do them in.”

If you read Saturday’s article in the Washington Post, you’ll learn that “All of the backlogged documents date back 25 years or more, and most are Cold War-era files from the departments of Defense, State and Justice, among other agencies. The CIA manages the declassification of its own files,” and that “The current backlog is so huge that Americans are being denied the ability to hold government officials accountable for their actions,” [AOTUS David] Ferriero said. “By streamlining the declassification process, the NDC will usher in a new day in the world of access.”

If NARA is really trying to declassify, process, catalog, describe, preserve and make these pages available, I hope they’re planning on hiring some more archivists!  The problem is that when institutions are dealing with mass quantities of materials, the (quantitative) metrics we use may actually hurt us in the future.  In the archival world, the prevailing wisdom seems to be MPLP (More Product, Less Process), but I would argue that archivists need to have qualitative metrics as well, if only to ensure that they are reducing redundancies and weeding out older, unneeded versions.  This gets to the crux of the distinction between best practices for records managers and best practices for digital asset managers (or digital archivists).  Ideally, a knowledgeable professional will collect and appraise these materials, and describe them in such a way that a future plan can be created to ensure that these assets (or records) can be migrated forward into new formats accessible on emerging (or not-yet-invented) media players and readers.

Ultimately, this leads to the most serious problem facing archivists: the metadata schemas that are most popular (Dublin Core, IPTC, DACS, EAD, etc.) are not specific enough to help archivists plan for the future.  Until our metadata schemas can be updated to ensure that content, context, function, structure, brand, storage media and file formats can be specifically and granularly identified and notated, we will continue paddling frantically against the digital deluge with no workable strategy or plan, and no awareness of potential problems (e.g. vendor lock-in, non-backwards compatible formats, etc.).  Sadly, in the face of huge quantities of materials (emails and pages), NARA will probably embrace MPLP, and ultimately hinder and hurt future access to the most important specific files, pages, emails, etc., because they will refuse to hire more professionals to do this work, and will (probably) rely on computer scientists and defense contractors to whitewash the problems and sell more software.

Day of Digital Archives: McLuhan “The [digital] medium is [no longer] the [only] message.” 2011/10/06

Posted by nydawg in Digital Archives, Digital Archiving, Digital Preservation, Education, Information Technology (IT), Media.
Day of Digital Archives October 6, 2011 Marshall McLuhan: “The Medium Is the Message?” or “The [digital] medium is [no longer] the [only] message.”

This year marks the 100th anniversary of the birth of “the new spokesman of the electronic age”, Marshall (Understanding Media) McLuhan, and digital archivists should take a moment to think about how media, digital and analog, hot and cool, and in many different formats change our jobs, lives and responsibilities. With threats of technological obsolescence, vendor lock-in, hardware failure, bit rot and link rot, non-backwards compatible software, and format and media obsolescence, digital archivists need a system to accurately describe digital objects and assets in their form and function, content, subject, object and context. If we miss key details, we run the risk of restricting access in the future because, for example, data may not be migrated or media refreshed as needed. By studying and understanding media, digital archivists can propose a realistic and trustworthy digital strategy and implement better and best practices to guarantee more efficiency from capture (and digitization or ingest) and appraisal (selection and description), to preservation (storage) and access (distribution).

Over the last ten, forty, one hundred and twenty thousand years, we have crossed many thresholds and lived through many profound media changes: from oral culture to hieroglyphic communications to the alphabet and the written word, from scrolls to books, and most recently from the Atomic Age (age of atoms) to the Information Age (era of bits). While not all changes were paradigm shifts, many helped shift currencies of trust and convenience to establish new brand loyalties built on threats of imminent obsolescence and vendor lock-in. As digital archivists, we stand at the line separating data from digital assets, so we need to ensure that we are archiving and preserving the assets and describing the content, technical and contextual metadata as needed.

Today, Day of Digital Archives, is a good day to consider Marshall McLuhan’s most famous aphorism, “The medium is the message,” and update it for the Information Age. In a nutshell, McLuhan argues that “the medium is the message” because an electric light bulb (medium) is pure information (light). He goes on to state: “This fact, characteristic of all media, means that the “content” of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph.” (Understanding Media, 23-24) But in the Information Age, the [digital] medium is [no longer] the [only] message. Every born-digital or digitized file is a piece of the environment in which it was created or is accessed, and needs to be described on multiple planes to articulate its technical specifications (hardware & software versions, operating system, storage media, file format, encryption) as well as its content. Many archivists and librarians describing content, the medium and the message use MARC, Dublin Core and VRA Core as guides, but PBCore provides a richly defined set of technical, content and Intellectual Property metadata fields to ensure all stakeholders, including IT staff, will be able to efficiently access, copy or use the asset (or a copy).
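To make the idea of describing a file "on multiple planes" a little more concrete, here is a minimal sketch in Python. It is only an illustration: the class names, field names and the `missing_technical_fields` helper are hypothetical and are not taken from PBCore, Dublin Core or any other schema. It simply keeps the technical plane separate from the content plane and reports which technical details a cataloger left blank.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TechnicalDescription:
    """Hypothetical technical plane: how the file is stored and played back."""
    file_format: str
    codec: Optional[str] = None
    storage_medium: Optional[str] = None
    operating_system: Optional[str] = None
    software: Optional[str] = None
    encrypted: bool = False


@dataclass
class AssetRecord:
    """Hypothetical record joining the content plane and the technical plane."""
    title: str
    subject: List[str]
    rights: str
    technical: TechnicalDescription


def missing_technical_fields(record: AssetRecord) -> List[str]:
    """Return the names of technical fields that were never filled in."""
    return sorted(
        name for name, value in asdict(record.technical).items() if value is None
    )


# A generic, MPLP-style description: content is covered, the medium is not.
video = AssetRecord(
    title="Oral history interview",
    subject=["oral history"],
    rights="Copyright held by the repository",
    technical=TechnicalDescription(file_format="Flash Video"),
)
print(missing_technical_fields(video))
```

A report like this could flag records that would be hard to migrate later, but which fields actually matter is a judgment for the archivist and depends on the schema in use.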

With More Product, Less Process [MPLP] the prevailing processing strategy, many libraries, archives and museums encourage simplified descriptions to catalog digital objects, but these generic descriptions (e.g. moving image, video or digital video) do not provide the most critical information to ensure future users can watch the video online, on an iPad or with a DVD player (or VHS player or film projector). Until digital objects and assets are described in their granular, multi-dimensional digital splendor, we are hurting ourselves and archival access in the future. Once we understand that the medium and message are split into many different categories, we can focus descriptive metadata on critical access points (subject, format or function), and we will not need to panic and makework every time a new [moving image] format [or codec] gains temporary popularity. With better description and critical appraisal at ingest, digital archivists will understand that the medium, the message and the content, subject, structure, form, format and other aspects are all integral parts. At that point we will start to change the commonly held mindset and accept that “The [digital] medium is [no longer] the [only] message.”

Whither Appraisal?: David Bearman’s “Archival Strategies” 2011/08/22

Posted by nydawg in Archives, Best Practices, Curating, Digital Archives, Digital Preservation, Education, Electronic Records, Information Technology (IT), Media, Records Management.

Back in Fall 1995, American Archivist published one of the most controversial and debate-inspiring essays written by archival bad-boy David Bearman of Archives & Museum Informatics from Pittsburgh (now living in Canada).  The essay, “Archival Strategies” pointed to several problems (challenges/obstacles) in archival methods and strategies which, at the time, threatened to make the profession obsolete.   The piece was a follow-up to his “Archival Methods” from 1989 and showed “time and again that archivists have themselves documented order of magnitude and greater discrepancies between our approaches and our aims, they call for a redefinition of the problems, the objectives, the methods or the technologies appropriate to the archival endeavor.”  As he points out in Archival Strategies, “In Archival Methods, I argued that “most potential users of archives don’t,” and that “those who do use archives are not the users we prefer.””

This disconnect between archives and their future users led Bearman to write: “I urged that we seek justification in use, and that we become indispensable to corporate functioning as the source of information pertaining to what the organization does, and as the locus of accountability.”  With well-stated, pithy aphorisms like “most potential users of archives don’t,” and “those who do use archives are not the users we prefer,” he was able to point to the serious problem facing us today: past practices have led us to preserve the wrong stuff for our unpreferred users!  Of course Information Technology has led us down this road, since computer storage is marketed as so cheap (and always getting cheaper), and it seems much easier to store everything than to let an archivist do his job starting with selection and appraisal, retention and preservation, arrangement and description, and access and use.

Ultimately, his essay is a clarion call for archivists to establish a clear goal for the profession, namely to accept their role in risk management and providing accountability for the greater societal goal.  The role of an archivist, in my opinion, is to serve as an institution’s conscience!  Perhaps that is the reason why library science and archival studies are considered sciences.  He suggests that strategic thinking is required: “Because strategic thinking focuses on end results, it demands “outcome” oriented, rather than “output” oriented, success measures. For example, instead of measuring the number of cubic feet of accessions (an output of the accessioning process), we might measure the percentage of requests for records satisfied (which comes closer to reflecting the purpose of accessioning).”

This seminal essay is a fascinating read and groundbreaking analysis of the sorry state of appraisal.  “What we have actually been doing is scheduling records to assure that nothing valuable is thrown away, but this is not at all equivalent to assuring that everything valuable is kept.  Instead, these methods reduce the overall quantity of documentation; presumably we have felt that if the chaff was separated from the wheat it would be easier to identify what was truly important.  The effect, however, is to direct most records management and archival energy into controlling the destruction of the 99 percent of records which are of only temporary value, rather than into identifying the 1 percent we want, and making efforts to secure them.”

Using incendiary language, Bearman goes on to state the obvious:  “Appraisal, which is the method we have employed to select or identify records, is bankrupt.  Not only is it hopeless to try to sort out the cascade of “values” that can be found in records and to develop a formula by which these are applied to records, it wastes resources and rarely even encounters the evidence of those business functions which we most want to document.”

2D lifecycle or 3D continuum

This is a revolutionary essay, and I strongly encourage every archivist to read it and think about it deeply.  The ideas have mostly languished and been ignored in this country as we continue to use the life cycle model, but Bearman’s ideas are written into the international standard for records management (ISO 15489) and widely embraced in Australia (and China).  There, over the last two decades, they have conceptualized and implemented the “Australian records continuum” model to great effect and, in doing so, they are looking at born-digital assets and electronic records from the perspectives of all users, functions, and needs.  In my opinion, the continuum model is a 3D version of the lifecycle, which reminds me of this image from A Wrinkle in Time in which Mrs. Who and Mrs. Whatsit explain time travel to Meg and Charles Wallace by showing how an ant can quickly move across a string if the two ends are brought closer together.  In other words, if archivists look at the desired end result, they can appraise and process accordingly.


After reading the Bearman essay for the first time and seeing how it has caused such dramatic changes in archival conceptualizations, methods, strategies and processes elsewhere, but is still not taught in any depth in US library or archival studies schools, I spoke with other nydawg members, and we decided to use it as the text for our next discussion group on Tuesday August 23.  I hope to revisit this topic later.

One last point.  Because of the deluge of materials accessioned by archives, “uncataloged backlog among manuscripts collections was a mean of nearly one-third of repository holdings”, leading the authors to claim “Cataloging is a function that is not working.”  With budgets cut and small staffs unable to make progress, Mark Greene and Dennis Meissner wrote another revolutionary piece titled “More Product, Less Process: Pragmatically Revamping Traditional Processing Approaches to Deal with Late 20th-Century Collections” [MPLP], which was a plea for minimal processing.

Unlike Bearman’s “Archival Strategies”, MPLP leads archivists to believe that we must remain passive filers or describers or catalogers or undertakers.  But without a better understanding of appraisal and how to do it, we are doomed with analog, paper, born-digital or electronic records!  The clearest example of this is the National Archives and Records Administration’s Electronic Records Archives (ERA).  According to Archivist of the United States David Ferriero, “At the moment, most of the electronic records in ERA are Presidential records from the George W. Bush White House.  This important collection includes more than 200 million e-mail messages and more than 3 million digital photographs, as well as more than 30 million additional electronic records in other formats.”

A few weeks ago, I actually crunched the numbers and figured out that 200 million emails over the course of eight years works out to nearly one email a second!  (365 days a year x 8 years = 2,920 days, plus 2 leap-year days = 2,922 days; 2,922 days x 24 hours a day = 70,128 hours; 70,128 hours x 60 minutes an hour = 4,207,680 minutes; 4,207,680 minutes x 60 seconds a minute = 252,460,800 seconds.)
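For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes the eight years run from noon on January 20, 2001 to noon on January 20, 2009 (my choice of dates for illustration; the calculation above only relies on eight years containing two leap days), and the names `emails_per_second` and `SECONDS_IN_TWO_TERMS` are made up for this example.

```python
from datetime import datetime

# Assumed start and end of the two terms (inauguration to inauguration).
START = datetime(2001, 1, 20, 12)
END = datetime(2009, 1, 20, 12)

# Total seconds in the period; the span includes Feb 29 of 2004 and 2008.
SECONDS_IN_TWO_TERMS = int((END - START).total_seconds())


def emails_per_second(total_emails: int, seconds: int = SECONDS_IN_TWO_TERMS) -> float:
    """Average number of emails per second over the whole period."""
    return total_emails / seconds


print(SECONDS_IN_TWO_TERMS)                        # 252460800
print(round(emails_per_second(200_000_000), 2))    # 0.79
print(round(emails_per_second(250_000_000), 2))    # 0.99
```

So 200 million emails averages out to roughly four emails every five seconds, around the clock, and a figure of 250 million would be almost exactly one per second.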
After doing the math, my first thought was, “if we’re trying to process and preserve every email sent every second by the George W. Bush Administration, we must be doing something wrong.”  And now, I think I understand the problem: we’re not doing effective appraisal.  Although we still have to wait for public access to the emails, I am fairly confident that researchers will find that nearly 90 percent of the collection consists of duplicates, or that they are keeping copies of the sent email, the different received emails, plus backups of all of them.  With better appraisal, this task should not be so difficult, and would leave more time for catalogers to do more detailed descriptions (which will be more important later, especially with different formats of “moving images” which are not compatible with newer versions of hardware, e.g. iPads don’t play Flash Video).
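As a rough illustration of how the duplication guess could be tested once the collection is accessible, here is a minimal Python sketch that fingerprints each message and measures the share of redundant copies. Everything in it is an assumption for the sake of example: the dictionary keys (`sender`, `recipients`, `subject`, `body`) and the function names are hypothetical, and a real appraisal workflow would also have to weigh headers such as Message-ID, attachments and near-duplicates.

```python
import hashlib
from typing import Dict, List


def fingerprint(message: Dict) -> str:
    """Hash the normalized sender, recipients, subject and body of a message."""
    parts = [
        message["sender"].strip().lower(),
        ",".join(sorted(r.strip().lower() for r in message["recipients"])),
        message["subject"].strip(),
        message["body"].strip(),
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def duplicate_share(messages: List[Dict]) -> float:
    """Fraction of messages that are redundant copies of another message."""
    if not messages:
        return 0.0
    unique = {fingerprint(m) for m in messages}
    return 1 - len(unique) / len(messages)


# Made-up sample: a sent copy, a received copy and a backup of the same email,
# plus one unrelated message.
memo = {"sender": "A@example.gov", "recipients": ["b@example.gov"],
        "subject": "Schedule", "body": "Meeting at 10."}
sample = [
    memo,
    dict(memo, sender="a@example.gov "),   # received copy, different casing
    dict(memo),                            # backup copy
    {"sender": "c@example.gov", "recipients": ["a@example.gov"],
     "subject": "Re: Schedule", "body": "Confirmed."},
]
print(duplicate_share(sample))
```

Exact-match hashing like this only catches identical copies, so it would give a lower bound on redundancy rather than an appraisal decision.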