NARA to Declassify 400 Million Pages of Documents in Three Years 2011/12/06

Posted by nydawg in Archives, Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Media, Records Management.

For a very long time, I have been asking anyone who might know (from my colleagues to the AOTUS himself) why we are even attempting to preserve 250 million emails created during the Bush Administration.  As I’ve mentioned before, that works out to nearly one email every second for eight years!  (And remember, part of that time included Bush’s annual month-long vacations.)  So this story gave a bit of context on how the National Archives (NARA) deals with processing large backlogs of materials.  “All of these pages had been piling up here, literally,” said Sheryl J. Shenberger, a former CIA official who is the head of the National Declassification Center (NDC) at the National Archives. “We had to develop a Costco attitude: We had 400 million pages . . . and we have three years to do them in.”

If you read Saturday’s article in the Washington Post, you’ll learn that “All of the backlogged documents date back 25 years or more, and most are Cold War-era files from the departments of Defense, State and Justice, among other agencies. The CIA manages the declassification of its own files,” and that “‘The current backlog is so huge that Americans are being denied the ability to hold government officials accountable for their actions,’ [AOTUS David] Ferriero said. ‘By streamlining the declassification process, the NDC will usher in a new day in the world of access.’”

If NARA is really trying to declassify, process, catalog, describe, preserve and make these pages available, I hope they’re planning on hiring some more archivists!  The problem is that when institutions are dealing with mass quantities of materials, the (quantitative) metrics we use may actually hurt us in the future.  In the archival world, the prevailing wisdom seems to be MPLP (More Product, Less Process), but I would argue that archivists need qualitative metrics as well, if only to ensure that they are reducing redundancies and weeding out older, unneeded versions.  This gets to the crux of the distinction between best practices for records managers and best practices for digital asset managers (or digital archivists).  Ideally, a knowledgeable professional will collect and appraise these materials and describe them in such a way that a future plan can be created to ensure that these assets (or records) can be migrated forward into new formats accessible on emerging (or not-yet-invented) media players and readers.

Ultimately, this leads to the most serious problem facing archivists: the metadata schemas that are most popular (Dublin Core, IPTC, DACS, EAD, etc.) are not specific enough to help archivists plan for the future.  Until our metadata schemas can be updated to ensure that content, context, function, structure, brand, storage media and file formats can be specifically and granularly identified and notated, we will continue paddling frantically against the digital deluge with no workable strategy or plan, and no awareness of potential problems (e.g. vendor lock-in, non-backwards-compatible formats, etc.).  Sadly, in the face of huge quantities of materials (emails and pages), NARA will probably embrace MPLP, and ultimately hinder and hurt future access to the most important specific files, pages, emails, etc., because they will refuse to hire more professionals to do this work, and will (probably) rely on computer scientists and defense contractors to whitewash the problems and sell more software.

Whither Appraisal?: David Bearman’s “Archival Strategies” 2011/08/22

Posted by nydawg in Archives, Best Practices, Curating, Digital Archives, Digital Preservation, Education, Electronic Records, Information Technology (IT), Media, Records Management.

Back in Fall 1995, American Archivist published one of the most controversial and debate-inspiring essays written by archival bad-boy David Bearman of Archives & Museum Informatics from Pittsburgh (now living in Canada).  The essay, “Archival Strategies” pointed to several problems (challenges/obstacles) in archival methods and strategies which, at the time, threatened to make the profession obsolete.   The piece was a follow-up to his “Archival Methods” from 1989 and showed “time and again that archivists have themselves documented order of magnitude and greater discrepancies between our approaches and our aims, they call for a redefinition of the problems, the objectives, the methods or the technologies appropriate to the archival endeavor.”  As he points out in Archival Strategies, “In Archival Methods, I argued that “most potential users of archives don’t,” and that “those who do use archives are not the users we prefer.””

This disconnect between archives and their future users led Bearman to write: “I urged that we seek justification in use, and that we become indispensable to corporate functioning as the source of information pertaining to what the organization does, and as the locus of accountability.”  With pithy aphorisms like “most potential users of archives don’t” and “those who do use archives are not the users we prefer,” he was able to point to the serious problem facing us today: past practices have led us to preserve the wrong stuff for our unpreferred users!  Of course, Information Technology has led us down this road, since computer storage is marketed as so cheap (and always getting cheaper), and it seems much easier to store everything than to let an archivist do his job, starting with selection and appraisal, retention and preservation, arrangement and description, and access and use.

Ultimately, his essay is a clarion call for archivists to establish a clear goal for the profession, namely to accept their role in risk management and in providing accountability for the greater societal goal.  The role of an archivist, in my opinion, is to serve as an institution’s conscience!  Perhaps that is the reason why library science and archival studies are considered science.  He suggests that strategic thinking is required: “Because strategic thinking focuses on end results, it demands “outcome” oriented, rather than “output” oriented, success measures. For example, instead of measuring the number of cubic feet of accessions (an output of the accessioning process), we might measure the percentage of requests for records satisfied (which comes closer to reflecting the purpose of accessioning).”

This seminal essay is a fascinating read and groundbreaking analysis of the sorry state of appraisal.  “What we have actually been doing is scheduling records to assure that nothing valuable is thrown away, but this is not at all equivalent to assuring that everything valuable is kept.  Instead, these methods reduce the overall quantity of documentation; presumably we have felt that if the chaff was separated from the wheat it would be easier to identify what was truly important.  The effect, however, is to direct most records management and archival energy into controlling the destruction of the 99 percent of records which are of only temporary value, rather than into identifying the 1 percent we want, and making efforts to secure them.”

Using incendiary language, Bearman goes on to state the obvious:  “Appraisal, which is the method we have employed to select or identify records, is bankrupt.  Not only is it hopeless to try to sort out the cascade of “values” that can be found in records and to develop a formula by which these are applied to records, it wastes resources and rarely even encounters the evidence of those business functions which we most want to document.”

2D lifecycle or 3D continuum

This is a revolutionary essay, and I strongly encourage every archivist to read it and think about it deeply.  The ideas have mostly languished and been ignored in this country as we continue to use the life cycle model, but Bearman’s ideas are written into the international standard for records management (ISO 15489) and widely embraced in Australia (and China) where, over the last two decades, archivists have conceptualized and implemented the “Australian records continuum” model to great effect; in doing so, they are looking at born-digital assets and electronic records from the perspectives of all users, functions, and needs.  In my opinion, the continuum model is a 3D version of the lifecycle, which reminds me of this image from A Wrinkle in Time in which Mrs. Who and Mrs. Whatsit explain time travel to Meg and Charles Wallace by showing how an ant can quickly move across a string if the two ends are brought closer together.  In other words, if archivists look at the desired end result, they can appraise and process accordingly.


After reading the Bearman essay for the first time and seeing how it has caused such dramatic changes in archival conceptualizations, methods, strategies and processes elsewhere, but is still not taught in any depth in US library or archival studies schools, I spoke with other nydawg members, and we decided to use it as the text for our next discussion group on Tuesday August 23.  I hope to revisit this topic later.

One last point.  Because of the deluge of materials accessioned by archives, “uncataloged backlog among manuscripts collections was a mean of nearly one-third repository holdings”, leading the authors to claim “Cataloging is a function that is not working.”  With budgets cut and small staffs unable to make progress, Mark Greene and Dennis Meissner wrote another revolutionary piece titled “More Product, Less Process: Pragmatically Revamping Traditional Processing Approaches to Deal with Late 20th-Century Collections” [MPLP] which was a plea for minimal processing.

Unlike Bearman’s “Archival Strategies”, MPLP leads archivists to believe that we must remain passive filers or describers or catalogers or undertakers.  But without a better understanding of appraisal and how to do it, we are doomed, whether with analog, paper, born-digital or electronic records!  The clearest example of this is the National Archives and Records Administration’s Electronic Records Archive (ERA), of which Archivist of the United States David Ferriero says: “At the moment, most of the electronic records in ERA are Presidential records from the George W. Bush White House.  This important collection includes more than 200 million e-mail messages and more than 3 million digital photographs, as well as more than 30 million additional electronic records in other formats.”

A few weeks ago, I actually crunched the numbers and figured out that 200 million emails over the course of eight years works out to nearly one email a second!  (365 days a year x 8 years = 2,920 days, plus 2 leap-year days = 2,922 days; 2,922 x 24 hours a day = 70,128 hours; 70,128 x 60 minutes an hour = 4,207,680 minutes; 4,207,680 x 60 seconds a minute = 252,460,800 seconds.)
After doing the math, my first thought was, “if we’re trying to process and preserve every email sent every second by the George W. Bush Administration, we must be doing something wrong.”  And now, I think I understand the problem: we’re not doing effective appraisal.  Although we still have to wait for public access to the emails, I am fairly confident that researchers will find that nearly 90 percent of the collection consists of duplicates, or that they are keeping copies of the sent email, the different received emails, plus backups of all of them.  With better appraisal, this task should not be so difficult, and would leave more time for catalogers to do more detailed descriptions, which will be more important later, especially with different formats of “moving images” that are not compatible with newer versions of hardware (e.g. iPads don’t play Flash Video).
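
For anyone who wants to check the back-of-the-envelope arithmetic, here is a minimal sketch in Python.  It rests on an assumption of mine: that the eight years run from inauguration day 2001 to inauguration day 2009.  The function name is made up for illustration, and the email totals are just the figures quoted on this blog.

```python
from datetime import datetime

# Assumption: the eight-year span runs from inauguration day 2001
# to inauguration day 2009 (it contains two leap days: 2004 and 2008).
start = datetime(2001, 1, 20)
end = datetime(2009, 1, 20)

days = (end - start).days          # whole days in the span
seconds = days * 24 * 60 * 60      # days -> hours -> minutes -> seconds


def emails_per_second(total_emails, span_seconds=seconds):
    """Average number of emails per second over the whole span."""
    return total_emails / span_seconds


if __name__ == "__main__":
    print(f"{days:,} days = {seconds:,} seconds")
    # Email totals as quoted in these posts (200 million and 250 million).
    for total in (200_000_000, 250_000_000):
        print(f"{total:,} emails: {emails_per_second(total):.2f} per second")
```

By this count, 200 million emails comes to roughly 0.8 per second and 250 million to roughly one per second, around the clock, for eight years.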


Libraries Outsource Digitization, Get Access, Give Away IP: Preservation Is Not a One-Time Cost 2011/08/21

Posted by nydawg in Archives, Digital Archives, Digital Preservation, Information Technology (IT), Intellectual Property.

One thing that still baffles me is the idea (still taught in many library schools) that once done, digitization projects preserve materials forever.  Occasionally, a few people mention that “standards change” or “software becomes obsolete” or “media rot” or “links rot” or “bits rot,” or bring up ideas of “media refreshment” or “migration” or emulation or whatever, but mostly, the conventional wisdom is that when a collection is digitized it will remain accessible online and will be preserved “forever.”

So I was gratified to notice this important bit in the Blue Ribbon Task Force on Sustainable Digital Preservation and Access’s Interim Report, “Sustaining the Digital Investment:  Issues and Challenges of Economically Sustainable Digital Preservation” (PDF; not the FINAL report):  “More than this, preservation is not a one-time cost; instead, it is a commitment to an ongoing series of costs, in some cases stretching over an indefinite time horizon.” (p. 18)  Since this is such an important point, especially when so many digitization projects are funded using transient or “one-time” vehicles, I was a little disappointed that it was not included in the Final Report.

The closest their “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information” Final Report comes to mentioning this is: “Decisions about longevity are made throughout the digital lifecycle. Bench scientists face a choice at the end of every research project about what to do with their data. . . . The curatorial team at a museum that is bringing down an exhibition for which many original images and design materials were created faces similar decisions about what to retain and whose responsibility it is to provide and fund long-term retention. Preservation decision makers run the gamut from university provosts, foundations, and philanthropic funders to anyone who has created a website that has potential value for reuse.” (p. 11)

Then I saw this piece, “Step Easily into the Digital Future,” in American Libraries Magazine, which shows that the idea of one-time digitization has taken hold among cash-starved libraries looking for an easy (and cheap!) way to digitize their collections once and for all.  Unfortunately, the trade-off seems to be that they are willing to pay for it AND let the digitizers keep the originals, while libraries can copy access copies. . . .  Hmm.

“Libraries know the future is digital, but how do we get there in these times of shrinking budgets and staffs? In a tough economy, a collaborative approach makes digitization possible for many libraries. By joining a mass digitization collaborative, the historical society, museum, public library, or academic institution new to digitization can launch a small project and unlock the doors to their hidden collections for the first time; the larger university or cultural heritage institution can mount a large-scale project and quickly achieve a digitization goal at low cost.  The Lyrasis Mass Digitization Collaborative (MDC) is an example of a sustainable model that does not rely exclusively on grants or one-time funding; the collaborative works for libraries and cultural heritage institutions of all types and sizes.”

. . . “The MDC, administered by Lyrasis in partnership with the Internet Archive, is arguably the best deal going for libraries and similar institutions to get significant quantities of printed materials digitized and online-accessible very quickly and inexpensively,” said Gregory S. Sigman, acting librarian for the Music/Dance Library at Ohio University, in Lyrasis’s Solutions Magazine.

So how does it work?  It sounds so easy, as long as you don’t think too much about it.  (Wait, the MDC gets to own the Intellectual Property too?)  “Participating in the collaborative makes digitization easy for participants, whatever the size of their collection and budget, and whether or not they have experience and staff expertise in digitization. In the collaborative model, many steps along the way to digitization are already in place.

Participants do not need to purchase equipment, select a metadata schema or digitization standards, set up a technical infrastructure for digitization and delivery, or provide for hosting, storage, and preservation. They follow best practices and collection development guidelines established by the collaborative. The entire project workflow is already set up and streamlined. The process is extremely simple and conducive to very quick turnaround: Libraries place an order; select items for digitization; prepare metadata; and ship or deliver to the scanning center. The collaborative shares the new digital resources on the web through its partnership with the Internet Archive and the archive’s involvement in the Open Content Alliance. Participants may also download copies of the digital resources to add to their own digital collections.”

While cost-effective and efficient, passing off these responsibilities to third parties will not necessarily help libraries for much longer.  As the Interim Report points out: “This preliminary finding (the authors note that their work is early and suggestive rather than exhaustive and definitive) points to the importance of effective management strategies early in the life cycle of information, confirming archivists long held belief, based on their experience, that preservation begins at creation. Not all material acquired may have been created with preservation in mind, however; the observation is most relevant for organizational settings where there is a requirement and a commitment to maintaining a record for the long-term. Further, acquisition and ingest tends to have a high setup cost, and particularly in the early days of digital preservation, every acquisition seems unique, requiring specialized lengthy analysis and processing strategies. Acquisition of at least partially processed material will become more routine and, most likely, more standardized over time.”

Let’s hope so, but it might be helpful if the BRTF’s Final Report addressed many of the good ideas presented in the Interim Report, and if libraries began to understand that the most cost-effective way to digitize collections (after initial start-up costs and equipment purchases) may be DIY.

Oh, and meanwhile, Internet Archive Canada is laying off 75% of its staff!  “Though the office had initially experimented with automated scanning robots, the machines were unable to adapt to the wide variety of manuscripts and books. In 2005, the Internet Archive developed their own machines called Scribes, equipped with two high-resolution digital cameras poised above a v-shaped desk.  These machines require human operators to turn the pages, meaning that they are more expensive to run than automated robots, but can handle fragile texts. An experienced operator can turn and scan two pages every six seconds, but layoffs mean the number of operators will drop from 27 down to 11. Output is expected to drop significantly, from current levels of around 1,500 books a month to 250.”



Salman Rushdie’s Papers Accessioned by Emory; Access Thru Emulation 2011/08/19

Posted by nydawg in Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Records Management.

One of the early stories that encouraged discussion among early nydawg members was this story in the NYTimes about Emory accessioning author Salman Rushdie’s papers, including diaries, notebooks, journals, notes, stickies, four Apple computers, a hard drive and 18 Gigabytes of born-digital materials.  The article, “Fending Off Digital Decay, Bit by Bit,” is an interesting look at one institution’s attempt to capture and appraise the work of a living artist and to use emulation and migration as a preservation strategy.  A few months later, there was a fascinating multi-part series, “Born-Digital: The New Archive part 3”, from World Policy Institute Blog which mentioned the Rushdie model.

“In 2007, Emory acquired Salman Rushdie’s papers, which included a “hundred linear feet of his paper material, including diaries, notebooks, library books, first-edition novels, notes scribbled on napkins, but also forty thousand files and eighteen gigabytes of data on a Mac desktop, three Mac laptops, and an external hard drive.” Much has been written about Emory’s important achievement, but it should be noted that Emory only focused on Rushdie’s Macintosh Performa 5400 to test the emulation of the complete desktop environment.

As the authors of “Digital Materiality” note, Rushdie’s use of Stickies (electronic Post-It notes) on his early Mac “provides insights into [Rushdie’s] tendencies to meld the personal and the literary” and reinforces the “importance of providing both file-level access and operating system-level access.” According to Kenneth Thibodeau, in his report on “The State of Digital Preservation,” Emory’s emulation is technically a step in the right direction but ultimately a deficient one.
. . .

The Times article goes on to point out:

“Leslie Morris, a curator at the Houghton Library, said, ‘We don’t really have any methodology as of yet’ to process born-digital material. ‘We just store the disks in our climate-controlled stacks, and we’re hoping for some kind of universal Harvard guidelines,’ she added.

Among the challenges facing libraries: hiring computer-savvy archivists to catalog material; acquiring the equipment and expertise to decipher, transfer and gain access to data stored on obsolete technologies like floppy disks; guarding against accidental alterations or deletions of digital files; and figuring out how to organize access in a way that’s useful.

At Emory, Mr. Rushdie’s outdated computers presented archivists with a choice: simply save the contents of files or try to also salvage the look and organization of those early files.” and “At the Emory exhibition, visitors can log onto a computer and see the screen that Mr. Rushdie saw, search his file folders as he did, and find out what applications he used. (Mac Stickies were a favorite.) They can call up an early draft of Mr. Rushdie’s 1999 novel, “The Ground Beneath Her Feet,” and edit a sentence or post an editorial comment.  “I know of no other place in the world that is providing access through emulation to a born-digital archive,” said Erika Farr, the director of born-digital initiatives at the Robert W. Woodruff Library at Emory. (The original draft is preserved.)”

In fact, come to think of it, this was probably the first mention we archivists ever heard of digital forensics!  “Located in Silicon Valley, Stanford has received a lot of born-digital collections, which has pushed it to become a pioneer in the field. This past summer the library opened a digital forensics laboratory — the first in the nation.  The heart of the lab is the Forensic Recovery of Evidence Device, nicknamed FRED, which enables archivists to dig out data, bit by bit, from current and antiquated floppies, CDs, DVDs, hard drives, computer tapes and flash memories, while protecting the files from corruption.”

As former head of NARA’s Electronic Records Archive Kenneth Thibodeau wrote in his 2002 “Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years”:

“Every digital object is a physical object, a logical object, and a conceptual object, and its properties at each of those levels can be significantly different. A physical object is simply an inscription of signs on some physical medium. A logical object is an object that is recognized and processed by software. The conceptual object is the object as it is recognized and understood by a person, or in some cases recognized and processed by a computer application capable of executing business transactions.”

In other words, the metadata of a digital object (asset, record, electronic record) needs to accurately describe its content and format and/or medium, context including copyrights, permissions, operating systems, and Intellectual Property holders, and function, purpose, intended audience, etc.


Excuse Me:. . . Some Digital Preservation Fallacies? 2011/08/13

Posted by nydawg in Digital Archives, Digital Preservation.

This article by Chris Rusbridge [Director, Digital Curation Centre, University of Edinburgh] seemed to generate the most interest among nydawg googlegroup members last year, so re-read it or share it with someone else on #AskArchivists or #DODA day!

“Since then, a number of common assertions, or perhaps assumptions, about digital preservation have begun to worry me. No one person has said all these things, but increasingly they seem to be in the background of conversations. I will put these forward as a list of statements, but, in some respects at least, I think they are fallacies:

1. Digital preservation is very expensive [because]
2. File formats become obsolete very rapidly [which means that]
3. Interventions must occur frequently, ensuring that continuing costs remain high.
4. Digital preservation repositories should have very long timescale aspirations,
5. ‘Internet-age’ expectations are such that the preserved object must be easily and instantly accessible in the format de jour, and
6. the preserved object must be faithful to the original in all respects.

These statements seem reasonable, and perhaps they are. However, I feel we might benefit from a rather jaundiced look at them. So that is what I thought I would attempt for this article. Beware, the arguments presented here are not settled in my mind; indeed this is to some extent part of an argument with myself!”



SAA Announces Digital Archives Specialist (DAS) Certificate Curriculum 2011/08/13

Posted by nydawg in Archives, Digital Archives, Digital Archiving, Education.

This is really exciting to me! Last year, I volunteered my services and time and Society of American Archivists [SAA] President Dr. Helen Tibbo appointed me to act as a member of the 5-person Digital Archives Continuing Education Task Force.  We had a few conference calls and met in Chicago for most of a weekend last October, and I’m thrilled to see that our DAS certificate program proposal and revisions have been accepted by the Education Committee and will be offered beginning in September 2011!

I should give a big thanks and congratulations to all the hard-working members of the DACE Task Force including Chairman Geof Huth (Director of Government Records, New York State Archives),  SAA President Dr. Helen Tibbo (Distinguished Professor at University of North Carolina), Jackie Esposito (University Archivist, Penn State University), Mahnaz Ghaznavi (Archivist, Loyola Marymount University), Solveig DeSutter (SAA Director of Education), and yours truly, David Kay, MLS (Director of Archives & Digital Archivist, Little Airplane Productions).

“SAA is committed to providing education and training to ensure that archivists adopt appropriate practices for appraising, capturing, preserving, and providing access to electronic records. That’s why we’ve developed the Digital Archives Specialist (DAS) Curriculum and Certificate Program, designed to provide you with the information and tools you need to manage the demands of born-digital records.  The DAS Curriculum, developed by experts in the field of digital archives, is structured in tiers of study that guide you to choose courses based on your specific knowledge, training, and needs. You can choose individual courses—or you can take your learning to the next level by earning a Digital Archives Specialist Certificate from SAA after completing required coursework and passing both course and comprehensive examinations.”

Read all the details and check out course offerings here:

nydawg “Digital Archiving in the Information Age” 2010/12/22

Posted by nydawg in Archives, Education.

nydawg is the new york digital archivists working group.  Founded by David Kay, MLS in January 2010, we are archivists, media librarians and records managers concerned about the transition from the Atomic Age (the age of atoms) to the Digital Era & the Information Age (the age of bits).  This blog provides a platform to select & share GoogleGroup discussions with the public at large as we analyze news, stories, changes, documentation strategies and technical specifications, and pursue open standards for near-term access, middle-term storage and long-term preservation of today’s born-digital assets and electronic records. We hope one day through research to find new methods, define better and best practices, and provide training to current and future archivists.  In short, I want this nydawg blog to discuss “Digital Archiving in the Information Age.”

If you want to join our GoogleGroup (with posting permissions), please email me directly.