NARA to Declassify 400 Million Pages of Documents in Three Years 2011/12/06

For a very long time, I have been asking anyone who knows (from my colleagues to the AOTUS himself) why we are even attempting to preserve 250 million emails created during the Bush Administration. As I’ve mentioned before, that works out to nearly one email every second for eight years! (And remember, part of that time included Bush’s annual month-long vacations.) So this story gave some useful context for the ways the National Archives (NARA) deals with processing large backlogs. “All of these pages had been piling up here, literally,” said Sheryl J. Shenberger, a former CIA official who is the head of the National Declassification Center (NDC) at the National Archives. “We had to develop a Costco attitude: We had 400 million pages . . . and we have three years to do them in.”
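Here is a quick back-of-envelope check of both numbers in this post: the “nearly one email every second” figure and the NDC’s 400-million-page “Costco” workload. This Python sketch assumes 365.25-day years and, for the NDC, work on every calendar day with no weekends or holidays, so the real daily load on staff would be heavier.

```python
# Back-of-envelope throughput for the two backlogs discussed above.
# Assumptions: 365.25-day years; the NDC works every calendar day.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def emails_per_second(total_emails: int, years: float) -> float:
    """Average rate needed to produce total_emails over the given span."""
    return total_emails / (years * SECONDS_PER_YEAR)


def pages_per_day(total_pages: int, years: float) -> float:
    """Average daily throughput needed to clear total_pages in the given span."""
    return total_pages / (years * 365.25)


bush_rate = emails_per_second(250_000_000, 8)
ndc_rate = pages_per_day(400_000_000, 3)

print(f"Bush-era emails: {bush_rate:.2f} per second")
print(f"NDC backlog: {ndc_rate:,.0f} pages per day")
```

Run as written, it prints about 0.99 emails per second and roughly 365,047 pages per day. Counting only working days would push the page figure higher still.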

If you read Saturday’s article in the Washington Post, you’ll learn that “All of the backlogged documents date back 25 years or more, and most are Cold War-era files from the departments of Defense, State and Justice, among other agencies. The CIA manages the declassification of its own files.” You’ll also learn that “The current backlog is so huge that Americans are being denied the ability to hold government officials accountable for their actions,” [AOTUS David] Ferriero said. “By streamlining the declassification process, the NDC will usher in a new day in the world of access.”

If NARA is really trying to declassify, process, catalog, describe, preserve and make these pages available, I hope they’re planning on hiring some more archivists! The problem is that when institutions are dealing with mass quantities of materials, the (quantitative) metrics we use may actually hurt us in the future. In the archival world, the prevailing wisdom seems to be MPLP (More Product, Less Process), but I would argue that archivists need qualitative metrics as well, if only to ensure that they are reducing redundancies and weeding out older, unneeded versions. This gets to the crux of the distinction between best practices for records managers and best practices for digital asset managers (or digital archivists). Ideally, a knowledgeable professional will collect and appraise these materials and describe them in enough detail that a plan can be created to migrate these assets (or records) forward into new formats, accessible on emerging (or not-yet-invented) media players and readers.

Ultimately, this leads to the most serious problem facing archivists: the most popular metadata schemas (Dublin Core, IPTC, DACS, EAD, etc.) are not specific enough to help archivists plan for the future. Until our metadata schemas are updated so that content, context, function, structure, brand, storage media and file formats can be identified and recorded specifically and granularly, we will continue paddling frantically against the digital deluge with no workable strategy or plan, and no awareness of potential problems (e.g., vendor lock-in or formats that are not backward compatible). Sadly, in the face of huge quantities of materials (emails and pages), NARA will probably embrace MPLP and ultimately hinder future access to the most important specific files, pages and emails, because it will refuse to hire more professionals to do this work and will (probably) rely on computer scientists and defense contractors to whitewash the problems and sell more software.
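To make “specifically and granularly” concrete, here is a minimal Python sketch of a record that captures each facet listed above as its own field, plus a check that flags risks such as vendor lock-in. The class, its field names and the `preservation_flags` helper are hypothetical illustrations, not part of Dublin Core, IPTC, DACS, EAD or any other existing standard.

```python
from dataclasses import dataclass, field


# Hypothetical record layout: field names are illustrative only and are not
# taken from Dublin Core, IPTC, DACS, EAD or any other existing schema.
@dataclass
class PreservationRecord:
    identifier: str
    content: str          # what the item is about
    context: str          # who created it, when, and why
    function: str         # the activity the item documents
    structure: str        # e.g. "single image file", "email with attachments"
    brand: str            # hardware/software vendor the item depends on
    storage_media: str    # e.g. "3.5-inch floppy", "desktop hard drive"
    file_format: str      # e.g. "PowerPoint (.ppt)"
    format_is_open: bool  # is there a published, vendor-neutral specification?
    notes: list[str] = field(default_factory=list)


def preservation_flags(record: PreservationRecord, defunct_vendors: set[str]) -> list[str]:
    """Return warnings an archivist might act on when planning a migration."""
    flags = []
    if not record.format_is_open:
        flags.append("proprietary format: risk of vendor lock-in")
    if record.brand in defunct_vendors:
        flags.append(f"vendor {record.brand!r} is no longer in business")
    return flags


# Invented example, echoing the Wang imaging systems of 1990-91.
memo = PreservationRecord(
    identifier="example-0001",
    content="Scanned policy memo",
    context="Agency office, 1991",
    function="Policy coordination",
    structure="single image file",
    brand="Wang",
    storage_media="optical disk",
    file_format="proprietary imaging format",
    format_is_open=False,
)
print(preservation_flags(memo, defunct_vendors={"Wang"}))
```

Because brand and format are recorded as separate fields rather than buried in a free-text description, a migration plan could query for every holding that depends on a defunct vendor or a closed format.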

NARA’s Erratic ERA Offers No Content-Searching 2011/10/29

Many of us have been watching the unruly boondoggle of NARA’s ERA over the years, but this story seems a bit overdue. . . . In a nutshell: “Searching text impossible on NARA’s e-Records Archive.” I hope they’ll soon take on the task of separating the wheat from the chaff among those 250 million Bush emails (nearly one [out-of-office?] email every second for eight years).

“People trying to search the text of documents through the National Archives and Records Administration’s $430 million Electronic Records Archive are going to be disappointed, according to the agency’s inspector general. Under the currently deployed system, users can search only by metadata. That typically includes tags for information such as name of the original publication, date of publication, agency that originated the document, and a small number of keywords. Users who hope to locate a document by a word or phrase that isn’t part of the metadata will be unable to. . . .

“The public’s ability to use the ERA is likely to be hampered because of the lack of a full text-based search capability, which would be similar to what is available on Google.com or other commercial search engines, NARA Inspector General Paul Brachfeld said in an interview on Oct. 26. Lack of full text search ‘is one of the profound problems with the ERA at this point,’ Brachfeld said. ‘Metadata alone does not tell the story of what is in the documents.’”
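The difference the inspector general describes can be shown with a toy example. This Python sketch is an illustration under simplifying assumptions, not how ERA is actually built: `metadata_search` looks only at title, agency and keyword fields, while `full_text_search` uses a simple inverted index (a map from each word to the documents containing it) built over the document bodies. The sample document is invented, and both searches handle single words only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    agency: str
    keywords: list[str]
    body: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def metadata_search(docs: list[Document], term: str) -> list[str]:
    """Match a single word against metadata fields only (title, agency, keywords)."""
    term = term.lower()
    hits = []
    for doc in docs:
        fields = tokenize(doc.title) + tokenize(doc.agency) + [k.lower() for k in doc.keywords]
        if term in fields:
            hits.append(doc.title)
    return hits


def build_inverted_index(docs: list[Document]) -> dict[str, set[int]]:
    """Map every word in every document body to the positions of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for position, doc in enumerate(docs):
        for word in tokenize(doc.body):
            index[word].add(position)
    return dict(index)


def full_text_search(docs: list[Document], index: dict[str, set[int]], term: str) -> list[str]:
    """Look a single word up in the inverted index built over document bodies."""
    return [docs[i].title for i in sorted(index.get(term.lower(), set()))]


# Invented sample record, for illustration only.
docs = [
    Document(
        title="Weekly briefing",
        agency="Department of Defense",
        keywords=["briefing", "slides"],
        body="Slides making the case for a new weapons program.",
    ),
]
index = build_inverted_index(docs)

print(metadata_search(docs, "weapons"))          # []
print(full_text_search(docs, index, "weapons"))  # ['Weekly briefing']
```

The word “weapons” appears only in the body, so the metadata-only search comes back empty while the full-text index finds the document, which is exactly Brachfeld’s point that metadata alone does not tell the story.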


CLIR: Future Generations Will Know More About the Civil War than the Gulf War 2011/09/22

When I was in the Queens College Graduate Library School six years ago, I took Professor Santon’s excellent course in Records Management, which led me to understand that every institution has to manage its records, its assets and its intellectual property. The vital role that archives and records centers play in both everyday use and long-term functions was made clear by the fact that records have a life cycle: basically creation, use, and destruction or disposition. The course was valuable despite the fact that the main textbooks we used were from the early 1990s (and included a 3½″ floppy that ran on Windows 3.1).

While doing an assignment, I found a more recent article that led me to a revelation: electronic records will cause a lot of problems! The passage that stuck out most, and that I still remember to this day, was in “Record-breaking Dilemma,” a 2002 article in Government Technology: “The Council on Library and Information Resources, a nonprofit group that supports ways to keep information accessible, predicts that future generations will know more about the Civil War than the Gulf War. Why? Because the software that enables us to read the electronic records concerning the events of 1991 have already become obsolete. Just ask the folks who bought document-imaging systems from Wang the year that Saddam Hussein invaded Kuwait. Not only is Wang no longer in business, but locating a copy of the proprietary software, as well as any hardware, used to run the first generation of imaging systems is about as easy as finding a typewriter repairman.”

That article greatly influenced my thinking about the Digital Dark Ages, and it got me wondering what the best practices will be for managing born-digital assets or electronic records for increasingly long periods of time on storage media that are guaranteed for decreasing periods of time. Or, as one source put it, “We’re constantly asking ourselves, ‘How do we retain and access electronic records that must be stored permanently?’” Well, this gets to the crux of the issue, especially when records managers and archivists aren’t invited into the conversations with IT. Meanwhile, just as we are using more and more hard drives (or larger servers, even in the cloud), “Hard-drive Makers Weaken Warranties.” In a nutshell: “Three of the major hard-drive makers will cut down the length of warranties on some of their drives, starting Oct. 1, to streamline costs in the low-margin desktop disk storage business.”

So if we’re storing more data on storage media that are not designed for long-term preservation, then records and archival management must be an ongoing relay race, with appropriate ongoing funding and support, as more and more materials are periodically copied or moved from one storage medium to another every 3–5 years (or maybe soon every 1–3 years?). Benign neglect is no longer a sound records management strategy.
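To see what that relay race means in practice, here is a small Python sketch that lists the years at which holdings would have to be copied forward over a century. It assumes media are replaced exactly at the end of each refresh interval, that no migration is needed in the final year of the horizon, and that every copy succeeds; real migrations are messier and cost money each time.

```python
def migration_years(horizon_years: int, refresh_interval_years: int) -> list[int]:
    """Years (after the initial write) at which holdings must be copied to new media.

    Assumes media are replaced exactly at the end of each refresh interval and
    that no migration is needed once the retention horizon is reached.
    """
    if refresh_interval_years <= 0:
        raise ValueError("refresh interval must be positive")
    return list(range(refresh_interval_years, horizon_years, refresh_interval_years))


for interval in (5, 3, 1):
    years = migration_years(100, interval)
    print(f"every {interval} year(s): {len(years)} migrations over a century")
```

This prints 19 migrations for a 5-year cycle, 33 for a 3-year cycle and 99 for a 1-year cycle. Shrinking warranties roughly multiply the number of hand-offs, and each one needs staff and funding.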

That’s the technological challenge. But there’s more! I’ve gone on and on and on before about NARA’s ERA program and how one top priority is to ingest 250 million emails from the Bush Administration. (I’ve done the math: it works out to nearly one email every second for eight years.) So we know that NARA is interested in preserving electronic records. But a couple of years ago I read this scary Fred Kaplan piece, “PowerPoint to the People: The urgent need to fix federal archiving policies,” in which he reported that “Finally—and this is simply stunning—the National Archives’ technology branch is so antiquated that it cannot process some of the most common software programs. Specifically, the study states, the archives ‘is still unable to accept Microsoft Word documents and PowerPoint slides.’”

Uhhhhh, wait! Well, at least that was written in 2009, so we can hope they have gotten their act together. But if you think about it too much, you might wonder: IS EVERYTHING THAT NEEDS TO BE ARCHIVED IN MICROSOFT’S PROPRIETARY FORMATS? Or you might just be inspired to ask whether anyone really uses PowerPoint in the military. Well, as Kaplan points out, “This is a huge lapse. Nearly all internal briefings in the Pentagon these days are presented as PowerPoint slides. Officials told me three years ago that if an officer wanted to make a case for a war plan or a weapons program or just about anything, he or she had better make the case in PowerPoint—or forget about getting it approved.” Or consider this piece from the New York Times, “We Have Met the Enemy and He Is PowerPoint,” in which “Commanders say that behind all the PowerPoint jokes are serious concerns that the program stifles discussion, critical thinking and thoughtful decision-making. Not least, it ties up junior officers — referred to as PowerPoint Rangers — in the daily preparation of slides, be it for a Joint Staff meeting in Washington or for a platoon leader’s pre-mission combat briefing in a remote pocket of Afghanistan.”

NARA, Why Is the Government Destroying Our History? 2011/09/07

A colleague posted this sad (but true) story about the National Archives, “Why Is the Government Destroying Our History?”, and I noticed this setup: “The U.S. National Archives and Records Administration (NARA) said it will destroy millions of federal court records and bankruptcy files from 1970 through 1995 but will hold those records the government deems ‘historically valuable.’ . . . Ok, for those of you who think archivists and information purists are dirty curmudgeons who toil away amid dust balls to avoid socializing (on say, Facebook!), consider what is actually lost when these records are destroyed:

. . . Incrimination.  You are about to hire an executive. You call us to do a background check.  We find out he was charged with running a prostitution ring in the ’80s. Or, you are about to hire a new CFO.  You call us and during our research we find he has filed for personal bankruptcy protection three times in the last 15 years. ”

So, this is very troubling. Offhand I don’t know what the retention schedules for court records and bankruptcy files are, but it seems that the historians at NARA are convinced they can judge these files by their “historic” value while ignoring their “evidential” or “transactional” value. Professional records managers are not making these decisions at NARA; if they were, they would recognize the legal value. So NARA, in its “infinite wisdom,” will decide whether or not large parts of our shared legal history have “historical value,” at the same time that it believes that redundant digital junk (e.g., 250 million George W. Bush emails) merits long-term preservation, but court records related to criminal activity may not have value in the eyes of a . They’re going to throw out the original, authentic records and create a black hole in our shared knowledge of our judicial system!

Does anyone remember when George W. Bush signed Executive Order 13233 to limit access to President Reagan’s records? Well, now imagine NARA doing the same thing with federal records. So what does the Federal Records Act (FRA) have to say about court records, and how does NARA handle court records and bankruptcy files?

Well, as of spring 2008, this responsibility rested with the federal records centers (FRCs) of the National Archives and Records Administration (NARA). In a promotional piece, “Ready Access: NARA’s Federal Records Centers Offer Agencies Storage, Easy Use for 80 Billion Pages of Documents,” NARA described the ready access the centers provide: “However, the majority of federal records—approximately 95 percent—are considered ‘temporary records.’ Every temporary record has an official records retention schedule—that is, the amount of time it must legally be preserved for use before it is destroyed (usually by recycling). Retention schedules for temporary federal records vary widely, ranging from a few months to more than a century. For example, most agency information request correspondence is kept for less than a year. Individual tax returns are preserved for seven years. Corporate tax returns, while not considered ‘permanent,’ must be retained for 75 years. And certain aircraft certification engineering files must be kept for 100 years.”
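A retention schedule boils down to a lookup from record type to a retention period, plus a rule for when the clock starts. This minimal Python sketch uses only the example periods quoted above; the table layout, the `disposition_year` helper and the single “cutoff year” starting point are illustrative assumptions, not NARA’s actual rules, which define their own cutoff and disposition instructions.

```python
from typing import Optional

# Retention periods in years, using the examples quoted from NARA's piece.
# None is used here (hypothetically) to mark a record that is never destroyed.
RETENTION_YEARS: dict[str, Optional[int]] = {
    "individual tax return": 7,
    "corporate tax return": 75,
    "aircraft certification engineering file": 100,
    "permanent record": None,
}


def disposition_year(record_type: str, cutoff_year: int) -> Optional[int]:
    """Earliest year a temporary record may be destroyed, or None if it is permanent.

    Raises KeyError for a record type with no schedule entry, so a missing
    entry fails loudly instead of silently allowing destruction.
    """
    years = RETENTION_YEARS[record_type]
    return None if years is None else cutoff_year + years


print(disposition_year("individual tax return", 1995))  # 2002
print(disposition_year("corporate tax return", 1995))   # 2070
print(disposition_year("permanent record", 1995))       # None
```

The interesting decisions are not in the arithmetic but in the table: whoever assigns a record type its period (or marks it permanent) is deciding what survives.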

I’m not exactly sure what they are doing, but I assume it’s something like saying that since the papers were digitized (scanned), the originals are no longer needed. But for public records, NARA is the steward: “The public can also access federal court records held by FRCs. These records include files from U.S. bankruptcy courts, the U.S. court of appeals, and U.S. district court civil and criminal files. FRCs make court documents available for researchers such as reporters writing stories on high-profile cases, former bankruptcy court litigants applying for mortgages or other loans, companies conducting background checks on individuals, and legal professionals researching precedents.”

Okay, so it’s an interesting piece from NARA, but this part really stopped me in my tracks: “The federal records centers have ably served the federal government and the citizens of the United States for more than 50 years. As the needs of federal agencies change and grow, NARA’s FRCs are also changing and growing to ensure that they will continue to protect the information assets of the federal government.”

I hope I’m not the only person to cry foul on this! It drives me crazy, especially when you check the FRC website and see how heavily invested they are in having a social media presence (Twitter, Facebook).