
NARA to Declassify 400 Million Pages of Documents in Three Years 2011/12/06

Posted by nydawg in Archives, Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Media, Records Management.

For a very long time, I have been trying to ask anyone who might know (from my colleagues to the AOTUS himself) why we are even attempting to preserve 250 million emails created during the Bush Administration.  As I’ve mentioned before, that works out to nearly one email every second for eight years!  (And remember, part of that time included Bush’s annual month-long vacations.)  So this story gave some useful context on how the National Archives (NARA) deals with processing large backlogs of material.  “All of these pages had been piling up here, literally,” said Sheryl J. Shenberger, a former CIA official who is the head of the National Declassification Center (NDC) at the National Archives. “We had to develop a Costco attitude: We had 400 million pages . . . and we have three years to do them in.”

If you read Saturday’s article in the Washington Post, you’ll learn that “All of the backlogged documents date back 25 years or more, and most are Cold War-era files from the departments of Defense, State and Justice, among other agencies. The CIA manages the declassification of its own files.”  You’ll also learn that “The current backlog is so huge that Americans are being denied the ability to hold government officials accountable for their actions,” [AOTUS David] Ferriero said. “By streamlining the declassification process, the NDC will usher in a new day in the world of access.”

If NARA is really trying to declassify, process, catalog, describe, preserve and make these pages available, I hope they’re planning on hiring some more archivists!  The problem is that when institutions deal with mass quantities of materials, the (quantitative) metrics we use may actually hurt us in the future.  In the archival world, the prevailing wisdom seems to be MPLP (More Product, Less Process), but I would argue that archivists need qualitative metrics as well, if only to ensure that they are reducing redundancies and weeding out older, no-longer-needed versions.  This gets to the crux of the distinction between best practices for records managers and best practices for digital asset managers (or digital archivists).  Ideally, a knowledgeable professional will collect and appraise these materials and describe them in a way that lets a future plan be created, ensuring that these assets (or records) can be migrated forward into new formats accessible on emerging (or not-yet-invented) media players and readers.

Ultimately, this leads to the most serious problem facing archivists: the metadata schemas that are most popular (DublinCore, IPTC, DACS, EAD, etc.) are not specific enough to help archivists plan for the future.  Until our metadata schemas are updated to ensure that content, context, function, structure, brand, storage media and file formats can be specifically and granularly identified and notated, we will continue paddling frantically against the digital deluge with no workable strategy or plan, and no awareness of potential problems (e.g. vendor lock-in, non-backwards-compatible formats, etc.).  Sadly, in the face of huge quantities of materials (emails and pages), NARA will probably embrace MPLP, and ultimately hinder and hurt future access to the most important files, pages and emails, because it will refuse to hire more professionals to do this work and will (probably) rely on computer scientists and defense contractors to whitewash the problems and sell more software.

Comparing Documentation Strategy of Civil War and First Gulf War 2011/11/21

Posted by nydawg in Archives, Best Practices, Digital Archives, Digital Preservation, Media, Records Management.

I’ve said it before, and I’ll say it again (paraphrasing someone else): “We are at risk of knowing less about the events leading up to the First Gulf War than the events leading up to the Civil War, because all of the records and documents from the Civil War were conserved and preserved, whereas the records from the First Gulf War were created on Wang word processors, never migrated forward, and are now lost forever.”

Case in point: Lincoln at Gettysburg; photo by Mathew Brady
http://blogs.archives.gov/prologue/?p=2564

Or 1991 Gulf War speech by Sec of Def Cheney:
http://en.wikipedia.org/wiki/File:Cheney_Gulf_War_news_conference.jpg
http://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Powell,_Schw…

or http://www.pbs.org/mediashift/2007/08/the-tangled-state-of-archived-n…

NARA’s Erratic ERA Offers No Content-Searching 2011/10/29

Posted by nydawg in Archives, Digital Archives, Electronic Records, Records Management.

Many of us have been watching the unruly boondoggle of NARA’s ERA over the years, but this story seems a bit overdue. . . . In a nutshell, “Searching text impossible on NARA’s e-Records Archive”.  I hope soon they’ll take on the task of separating the wheat from the chaff of those 250 million Bush emails (nearly one [out-of-office?] email every second for 8 years).

“People trying to search the text of documents through the National Archives and Records Administration’s $430 million Electronic Records Archive are going to be disappointed, according to the agency’s inspector general.  Under the currently deployed system, users can search only by metadata. That typically includes tags for information such as name of the original publication, date of publication, agency that originated the document, and a small number of keywords. Users who hope to locate a document by a word or phrase that isn’t part of the metadata will be unable to. . . .

The public’s ability to use the ERA is likely to be hampered because of the lack of a full text-based search capability, which would be similar to what is available on Google.com or other commercial search engines, NARA Inspector General Paul Brachfeld said in an interview on Oct. 26.  Lack of full text search “is one of the profound problems with the ERA at this point,” Brachfeld said. “Metadata alone does not tell the story of what is in the documents.””

http://fcw.com/articles/2011/10/26/nara-electronic-archive-has-fundamental-flaw-in-search–it-says.aspx

Day of Digital Archives: McLuhan “The [digital] medium is [no longer] the [only] message.” 2011/10/06

Posted by nydawg in Digital Archives, Digital Archiving, Digital Preservation, Education, Information Technology (IT), Media.

This year marks the 100th anniversary of the birth of “the new spokesman of the electronic age”, Marshall (Understanding Media) McLuhan, and digital archivists should take a moment to think about how media, digital and analog, hot and cool, and in many different formats change our jobs, lives and responsibilities. With threats of technological obsolescence, vendor lock-in, hardware failure, bit rot and link rot, non-backwards compatible software, and format and media obsolescence, digital archivists need a system to accurately describe digital objects and assets in their form and function, content, subject, object and context. If we miss key details, we run the risk of restricting access in the future because, for example, data may not be migrated or media refreshed as needed. By studying and understanding media, digital archivists can propose a realistic and trustworthy digital strategy and implement better and best practices to guarantee more efficiency from capture (and digitization or ingest) and appraisal (selection and description), to preservation (storage) and access (distribution).

Over the last ten, forty, and one hundred and twenty thousand years, we have crossed many thresholds and lived through many profound media changes: from oral culture to hieroglyphic communications to the alphabet and the written word, from scrolls to books, and most recently transitioning from the Atomic Age (age of atoms) to the Information Age (era of bits). While not all of these changes were paradigm shifts, many helped shift currencies of trust and convenience to establish new brand loyalties built on threats of imminent obsolescence and vendor lock-in. As digital archivists, we stand at the line separating data from digital assets, so we need to ensure that we are archiving and preserving the assets and describing the content, technical and contextual metadata as needed.

Today, Day of Digital Archives, is a good day to consider Marshall McLuhan’s most famous aphorism, “The medium is the message,” and update it for the Information Age. In a nutshell, McLuhan argues that “the medium is the message” because an electric light bulb (medium) is pure information (light). He goes on to state: “This fact, characteristic of all media, means that the “content” of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph.” (Understanding Media, 23-24) But in the Information Age, the [digital] medium is [no longer] the [only] message. Every born-digital or digitized file is a piece of the environment in which it was created or is accessed, and it needs to be described on multiple planes to articulate technical specifications (hardware & software versions, operating system, storage media, file format, encryption) as well as its content. For archivists and librarians describing content, the medium and the message, many use MARC, DublinCore and VRA Core as guides, but PBCore provides a richly defined set of technical, content and Intellectual Property metadata fields to ensure that all stakeholders, including IT staff, will be able to efficiently access, copy or use the asset (or a copy).
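To make the granularity argument concrete, here is a rough sketch of the kind of record this implies. The structure is loosely inspired by PBCore’s split between content, technical (instantiation) and rights description, but this is not a valid PBCore record, and every technical value below (codec, storage medium, software versions, file size) is a made-up example:

```python
# Hypothetical asset record: content + technical + rights planes.
# All specific values are invented for illustration.
asset_record = {
    # Content description (what most descriptive schemas already capture)
    "title": "Cheney Gulf War news conference",
    "description": "Press briefing by the Secretary of Defense, 1991.",
    "subject": ["Gulf War", "Department of Defense"],
    # Technical description (what a generic label like "moving image" omits)
    "instantiation": {
        "format": "video/quicktime",
        "codec": "ProRes 422",
        "storage_medium": "LTO-6 tape",
        "file_size_bytes": 12_884_901_888,
        "created_with": "Final Cut Pro 7 on Mac OS X 10.6",
    },
    # Intellectual Property description
    "rights": {"holder": "Public domain (U.S. government work)"},
}

# Because the technical plane is recorded, a migration plan can query it:
needs_migration = asset_record["instantiation"]["storage_medium"].startswith("LTO")
print(needs_migration)  # True
```

The point is not these particular fields but that a future migration plan can only be computed against records that captured format, medium and creating environment in the first place.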

With More Product, Less Process [MPLP] the prevailing processing strategy, many libraries, archives and museums encourage simplified descriptions to catalog digital objects, but these generic descriptions (e.g. moving image, video or digital video) do not provide the most critical information to ensure that future users can watch the video online, on an iPad, or with a DVD player (or VHS player or film projector). Until digital objects and assets are described in their granular, multi-dimensional digital splendor, we are hurting ourselves and future archival access. Once we understand that the medium and message split into many different categories, we can focus descriptive metadata on critical access points (subject, format or function), and we will not need to panic and make work every time a new [moving image] format [or codec] gains temporary popularity. With better description and critical appraisal at ingest, digital archivists will understand that the medium, the message and the content, subject, structure, form, format and other aspects are all integral parts. At that point we will start to change the commonly held mindset and recognize that “The [digital] medium is [no longer] the [only] message.”

WikiLeaks’ Cablegate Links State Dept. Bureau of Diplomatic Security to Madness 2011/09/28

Posted by nydawg in Archives, Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Media, Privacy & Security, Records Management, WikiLeaks.

For the last year or so, I’ve been fascinated by the whole WikiLeaks Cablegate story.  As I posted previously, there are a number of factors that contribute to this story which make it particularly interesting for people concerned with records management and best practices for accessing and sharing information.  In my opinion, Private First Class Bradley Manning is a fall guy (lipsynching to Lady Gaga), but the problems revealed are serious systemic malfunctions.  So I was very interested to read this article by Andy Kroll: “The Only State Dept. Employee Who May Be Fired Over WikiLeaks“.

“Peter Van Buren is no insurgent. Quite the opposite: For 23 years he’s worked as a foreign service officer at the State Department, and a damn good one from the looks of it. He speaks Japanese, Mandarin Chinese, and Korean; served his country from Seoul to Sydney, Tokyo to Baghdad; and has won multiple awards for his disaster relief work. So why was Van Buren treated like a terror suspect by his own employer? For linking to a single leaked cable dumped online by WikiLeaks earlier this month.”

Well, this led me to read a TomDispatch.com posting by Van Buren himself which offers a clear-headed look at the madness!  For one thing, Van Buren got into a heap of trouble and was “under investigation for allegedly disclosing classified information” for LINKING to a WikiLeaks document which was already on the Web!  As he put it: “two DS agents stated that the inclusion of that link amounted to disclosing classified material. In other words, a link to a document posted by who-knows-who on a public website available at this moment to anyone in the world was the legal equivalent of me stealing a Top Secret report, hiding it under my coat, and passing it to a Chinese spy in a dark alley.”

Van Buren goes on to analyze the situation by stating: “Let’s think through this disclosure of classified info thing, even if State won’t. Every website on the Internet includes links to other websites. It’s how the web works. If you include a link to say, a CNN article about Libya, you are not “disclosing” that information — it’s already there. You’re just saying: “Have a look at this.”  It’s like pointing out a newspaper article of interest to a guy next to you on the bus.  (Careful, though, if it’s an article from the New York Times or the Washington Post.  It might quote stuff from Wikileaks and then you could be endangering national security.)”

And, for me, the cherry on the top, and something I’ve been trying to state for most of the last year (including at the Archivists Round Table of Metropolitan New York meeting in January 2011), is the fact that “No one will ever be fired at State because of WikiLeaks — except, at some point, possibly me. Instead, State joined in the Federal mugging of Army Private Bradley Manning, the person alleged to have copied the cables onto a Lady Gaga CD while sitting in the Iraqi desert. That all those cables were available electronically to everyone from the Secretary of State to a lowly Army private was the result of a clumsy post-9/11 decision at the highest levels of the State Department to quickly make up for information-sharing shortcomings. Trying to please an angry Bush White House, State went from sharing almost nothing to sharing almost everything overnight. They flung their whole library onto the government’s classified intranet, SIPRnet, making it available to hundreds of thousands of Federal employees worldwide. . . . . State did not restrict access. If you were in, you could see it all. There was no safeguard to ask why someone in the Army in Iraq in 2010 needed to see reporting from 1980s Iceland. . . . . Most for-pay porn sites limit the amount of data that can be downloaded. Not State. Once those cables were available on SIPRnet, no alarms or restrictions were implemented so that low-level users couldn’t just download terabytes of classified data. If any activity logs were kept, it does not look like anyone checked them.

In other words, pointing the finger of blame at a few (two) bad apples (Pfc. Manning and Foreign Service Officer/author Van Buren) “… gets rid of a “troublemaker,” and the Bureau of Diplomatic Security people can claim that they are “doing something” about the WikiLeaks drip that continues even while they fiddle.”  Yet the State Department and the Department of Defense still refuse to acknowledge the systemic problems of trying to provide UNRESTRICTED and UNTRACEABLE ACCESS to ALL CABLES for all LEVELS of employees, from the highest administrative levels at State and Defense to the lowliest of the low (a Private First Class on probation, or a contractor like Aaron Barr working in White Hat or Black Hat ops).  And according to Homeland Security Today, there are 3 million people (not just Americans, btw) with “secret” clearance and “only” half a million with access to SIPRNet!

This still strikes me as an example of the US acting like an ostrich and burying its head so that we will not have to acknowledge the serious problems all around us.  Mark my words: the system is still broken, and even though certain changes have been instituted (thumb-drive bans), we have a much more serious and systemic problem which few dare to acknowledge.  What’s the solution?  Better appraisal and better records management!


Three Screens and a Cloud: Netflix’s Qwikster, Facebook & Amazon 2011/09/23

Posted by nydawg in Copyright, Curating, Digital Archives, Digital Archiving, Information Literacy, Information Technology (IT), Intellectual Property, Media.

One of the most pressing and intimidating challenges digital archivists face today is the fact that there is so much content offered in so many quick-changing distribution formats and accessible on short-lived storage media.  I found that the easiest way to describe this is “three screens and a cloud,” or as former Microsoft chief software architect Ray Ozzie put it: “how we consume IT is really shifting from a machine-centric viewpoint to what we refer to as three screens and a cloud:  the phone, the PC, and the TV ultimately, and how we deliver value to them.” [I would change that to IP, but hey, I’m not CEO of Microsoft.]

So as archivists concerned with the distribution and accessibility of our digital assets, it is important to ask early: “What format or what media will be required, and who is the targeted end user on what appliance?”  In other words, you probably don’t want to send a hi-def Blu-ray digital video stream meant for a big-screen TV to a tiny smartphone!  And you probably don’t want to stream a FlashVideo version to an iPad user.
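As a toy illustration of asking “who is the end user on what appliance,” here is a minimal sketch of rendition selection. The device names and delivery specs are invented for the example, not drawn from any real delivery system:

```python
# Hypothetical delivery renditions keyed by target device.
# Codecs/resolutions are illustrative assumptions only.
RENDITIONS = {
    "smartphone": {"codec": "H.264", "resolution": "640x360",   "container": "mp4"},
    "tablet":     {"codec": "H.264", "resolution": "1280x720",  "container": "mp4"},
    "tv":         {"codec": "H.264", "resolution": "1920x1080", "container": "mp4"},
}

def pick_rendition(device: str) -> dict:
    """Return the delivery spec for a device, defaulting to the smallest."""
    return RENDITIONS.get(device, RENDITIONS["smartphone"])

print(pick_rendition("tv")["resolution"])        # 1920x1080
print(pick_rendition("unknown")["resolution"])   # 640x360
```

Note that the archive only has to describe the preservation master well; delivery renditions like these can be generated on demand.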

But, on the other hand, archivists may not need to archive or preserve (for long-term functions) every possible variation of each format version (for smartphone, netbook/iPad or television).   By articulating what is really needed, archivists can streamline processes and avoid making mountains where molehills are sufficient.  Archivists who can see the forest for the trees will be able to describe fewer assets more completely, so that specific needles can be found within the haystacks.

This leads me to the real ground-shifting news stories that happened this week.  The first is that Netflix is splitting its DVDs-by-mail service from its streaming.  According to Huffington Post: “In a post on The Netflix Blog that went up Sunday night, the company’s CEO, Reed Hastings, announced that Netflix would split its DVD-by-mail service and its streaming-video service into two companies. The new DVD-only company, called “Qwikster,” will be completely separate from the streaming business. Hastings also expressed contrition for the way the company rolled out its recent price hike, which alienated many customers. . . . It is clear from the feedback over the past two months that many members felt we lacked respect and humility in the way we announced the separation of DVD and streaming, and the price changes. That was certainly not our intent, and I offer my sincere apology.”

Well, obviously, many people are up in arms and think this is the biggest boneheaded marketing move since Coke introduced New Coke! The NY Times’s David Pogue does a pretty good job of getting his dander up as he parses the Netflix apology without fully acknowledging the economics of the “streaming” game.  I won’t get too much into the legal issues (which I don’t fully understand), but I do remember from when I was working in “streaming media” as Senior Encoder at SonicNet (and Streamland) that licensing costs and marketing dollars generally shift from one medium (VHS, CD or radio) to another (DVD, streaming media or satellite radio).   It seems inevitable that Netflix realizes, as Blockbuster did years ago, that physical media will soon be obsolete . . .  so they’re trying to split themselves in order to have different licensing deals with different stakeholders and end users . . . and Blockbuster, long ago doomed, seeks to get in on the action too!

But ultimately, “An issue that both Netflix and Dish face, even when they don’t want to admit it, is the inconsistency of broadband connectivity across the United States.”

Another huge news story from this week came out of f8, where Facebook founder and CEO Mark Zuckerberg announced major Facebook renovations.  “Millions of people curate stories of their lives on Facebook every day and have no way to share them once they fall off your profile page…we have been working on ‘timeline’ all year…it’s the story of your life and a completely new way to express yourself.  It has three pieces: all your stories, your apps and a new way to express who you are.”  Zuckerberg said he wanted people to be able to share “their entire lives” on Facebook and have “total control” over how their content appeared online.

Zuckerberg “also announced a series of partnerships with music, media and games companies –including Spotify, Netflix, Zynga [the maker of Farmville] and The Washington Post.”  So this brings us back to Netflix, which “announced it is integrating its video streaming service with Facebook — allowing users to watch videos on either site and see what people on their friends lists are viewing.  It will be available in 44 countries except in Netflix’s biggest market — the United States, because of the 1998 Video Privacy Protection Act that prohibits the disclosure of video sales or rental records, the company explained.”

So what does this all mean for “Three Screens and a Cloud?”  Well, it’s important to remember that “Netflix is the biggest driver of U.S. Internet traffic, according to one study. As Internet service providers begin capping or tiering their data plans, that could cause consumers to watch fewer streaming videos on Netflix, analysts say.”  So as phone companies begin capping data plans for distribution (streaming), then another part of the archival equation is the storage medium. . . . and, as many people know, the battle is in the Clouds!

CLIR: Future Generations Will Know More About the Civil War than the Gulf War 2011/09/22

Posted by nydawg in Archives, Best Practices, Digital Archives, Education, Electronic Records, Information Technology (IT), Records Management.

When I was in the Queens College Graduate Library School six years ago, I took Professor Santon’s excellent course in Records Management, which led me to understand that every institution has to manage its records, its assets and its Intellectual Property.   The vital role the archive and records center play in everyday use and long-term functions was made clear by the fact that records have a life cycle: basically creation, use, and destruction or disposition.   The course was excellent, despite the fact that the main textbooks we used were from the early 1990s (and included a 3 1/2″ floppy that ran on Windows 3.1).

While doing an assignment, I found a more recent article which really led me to a revelation: electronic records will cause a lot of problems!  The part that stuck out most, and that I still remember to this day, was in a 2002 article, “Record-breaking Dilemma,” in Government Technology: “The Council on Library and Information Resources, a nonprofit group that supports ways to keep information accessible, predicts that future generations will know more about the Civil War than the Gulf War. Why? Because the software that enables us to read the electronic records concerning the events of 1991 have already become obsolete. Just ask the folks who bought document-imaging systems from Wang the year that Saddam Hussein invaded Kuwait. Not only is Wang no longer in business, but locating a copy of the proprietary software, as well as any hardware, used to run the first generation of imaging systems is about as easy as finding a typewriter repairman.” (emphasis added)

Obviously that article greatly impacted my thinking about the Digital Dark Ages, and it got me wondering what best practices will be for managing born-digital assets or electronic records for increasingly long periods of time on storage media that are guaranteed for decreasing periods of time.  Or, as the article put it: “‘We’re constantly asking ourselves, “How do we retain and access electronic records that must be stored permanently?”’ she said.”  Well, this gets to the crux of the issue, especially when records managers and archivists aren’t invited into the conversations with IT.  And as we use more and more hard drives (or larger servers, even in the cloud), “Hard-drive Makers Weaken Warranties“.  In a nutshell: “Three of the major hard-drive makers will cut down the length of warranties on some of their drives, starting Oct. 1, to streamline costs in the low-margin desktop disk storage business.”

So if we’re storing more data on storage media that are not built for long-term preservation, then records and archival management must be an ongoing relay race, with appropriate ongoing funding and support, as more and more materials are copied or moved from one storage medium to another, periodically, every 3-5 years (or maybe that will soon be 1-3 years?).   Benign neglect is no longer a sound records management strategy.
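A quick back-of-envelope sketch shows what that relay race implies. The retention period and refresh cycles below are illustrative assumptions, not figures from any standard:

```python
import math

def migrations_needed(retention_years: int, cycle_years: int) -> int:
    """Number of copy/refresh cycles over a retention period
    (the initial write is not counted as a migration)."""
    return math.ceil(retention_years / cycle_years) - 1

# A 'permanent' record kept for 100 years:
print(migrations_needed(100, 5))  # 19 migrations on a 5-year cycle
print(migrations_needed(100, 3))  # 33 migrations on a 3-year cycle
```

Every one of those migrations needs staff, funding and a plan, which is exactly why benign neglect fails for digital records in a way it never quite did for paper.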

That’s the technological challenge.  But there’s more!  I’ve gone on and on and on before about NARA’s ERA program and how one top priority is to ingest 250 million emails from the Bush Administration.  (I’ve done the math; it works out to nearly one email every second for eight years.)  So we know that NARA is interested in preserving electronic records.  But a couple years ago I read this scary Fred Kaplan piece, “PowerPoint to the People: The urgent need to fix federal archiving policies,” in which he learned that “Finally—and this is simply stunning—the National Archives’ technology branch is so antiquated that it cannot process some of the most common software programs. Specifically, the study states, the archives “is still unable to accept Microsoft Word documents and PowerPoint slides.””
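For what it’s worth, that one-email-per-second figure holds up. A quick sanity check:

```python
# 250 million emails over the eight-year Bush administration:
emails = 250_000_000
seconds_in_8_years = 8 * 365.25 * 24 * 60 * 60  # about 252.5 million seconds

rate = emails / seconds_in_8_years
print(round(rate, 2))  # ~0.99 emails per second
```

So “nearly one email every second for eight years” is almost exactly right.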

Uhhhhh, wait!  Well, at least that was written in 2009, so we can hope they have gotten their act together, but if you think about it too much, you might wonder: is EVERYTHING NEEDED TO ARCHIVE IN MICROSOFT’S PROPRIETARY FORMATS?  Or you might just be inspired to ask if anyone really uses PowerPoint in the military.  Well, as Kaplan points out: “This is a huge lapse. Nearly all internal briefings in the Pentagon these days are presented as PowerPoint slides. Officials told me three years ago that if an officer wanted to make a case for a war plan or a weapons program or just about anything, he or she had better make the case in PowerPoint—or forget about getting it approved.”  Or see this piece from the NYTimes, “We Have Met the Enemy and He Is PowerPoint,” in which “Commanders say that behind all the PowerPoint jokes are serious concerns that the program stifles discussion, critical thinking and thoughtful decision-making. Not least, it ties up junior officers — referred to as PowerPoint Rangers — in the daily preparation of slides, be it for a Joint Staff meeting in Washington or for a platoon leader’s pre-mission combat briefing in a remote pocket of Afghanistan.”


Keep Bit Rot at Bay: Change is Afoot as LoC’s DPOE Trains the Trainers 2011/09/20

Posted by nydawg in Archives, Best Practices, Digital Archives, Digital Archiving, Digital Preservation, Information Technology (IT), Media.

This was forwarded to me by a nydawg member who subscribes to the UK’s Digital Preservation listserv.  I don’t know if it’s been posted publicly in the US, but I gather this first one is by invitation-only.  I would LOVE to hear what they are teaching and how they are doing it, so I hope someday to attend as well.

Library of Congress To Launch New Corps of Digital Preservation Trainers

The Digital Preservation Outreach and Education program at the Library of Congress will hold its first national train-the-trainer workshop on September 20-23, 2011, in Washington, DC.

The DPOE Baseline Workshop will produce a corps of trainers who are equipped to teach others, in their home regions across the U.S., the basic principles and practices of preserving digital materials.  Examples of such materials include websites; emails; digital photos, music, and videos; and official records.

The 24 students in the workshop (first in a projected series) are professionals from a variety of backgrounds who were selected from a nationwide applicant pool to represent their home regions, and who have at least some familiarity with community-based training and with digital preservation. They will be instructed by the following subject matter experts:

*   Nancy McGovern, Inter-university Consortium for Political and Social Research, University of Michigan
*   Robin Dale, LYRASIS
*   Mary Molinaro, University of Kentucky Libraries
*   Katherine Skinner, Educopia Institute and MetaArchive Cooperative
*   Michael Thuman, Tessella
*   Helen Tibbo, School of Information and Library Science, University of North Carolina at Chapel Hill, and Society of American Archivists.

The curriculum has been developed by the DPOE staff and expert volunteer advisors and informed by DPOE-conducted research–including a nationwide needs-assessment survey and a review of curricula in existing training programs. An outcome of the September workshop will be for each participant to, in turn, hold at least one basic-level digital-preservation workshop in his or her home U.S. region by mid-2012.

The intent of the workshop is to share high-quality training in digital preservation, based upon a standardized set of core principles, across the nation.  In time, the goal is to make the training available and affordable to virtually any interested organization or individual.

The Library’s September 2011 workshop is invitation-only, but informational and media inquiries are welcome to George Coulbourne, DPOE Program Director, at gcou@loc.gov.

The Library created DPOE in 2010.  Its mission is to foster national outreach and education to encourage individuals and organizations to actively preserve their digital content, building on a collaborative network of instructors, contributors and institutional partners. The DPOE website is www.loc.gov/dpoe (see also http://digitalpreservation.gov/education/).  Check out the curriculum and course offerings here.

 

dk
###

Curating Google Doodle Highlights incl. Freddie Mercury’s Tribute 2011/09/06

Posted by nydawg in Curating, Digital Archives, Digital Archiving, Digital Preservation, Information Technology (IT), Intellectual Property, Media.
Tags: , , , , ,
add a comment

Hi everyone: Maybe this isn’t totally an archival or curatorial issue, but in some ways these GoogleDoodles do what a good archive strives to do: provide easy access to available information and resources.  So pump up the volume, click on today’s GoogleDoodle, look for the cc [closed-captioning] button for lyrics, and sing along as you watch an animated music-video tribute to the late, great Queen singer Freddie Mercury.  http://www.google.com/

and check out Queen guitarist Brian May’s blog tribute here.

But if you want more of those awesome GoogleDoodles, don’t forget some of my favorites, including: Alexander Calder’s moving mobiles; the playable and recordable Les Paul guitar; John Lennon’s hand-drawn Imagine (animation); Martha Graham’s “Thought of You” dance; Mr. Men and Little Miss; Charlie Chaplin’s 122nd Birthday; and who can forget GoogleDoodle Dots, Jules Verne or the Google PacMan?

Those are some of my favorites, but I can probably think of a dozen more if I put my mind to it. . . . If you’re interested in learning about the doodle history, check it out here.  And if I’m missing any good ones, please let me know!

WikiLeaks’ Cablegate and Systemic Problems 2011/09/06

Posted by nydawg in Best Practices, Digital Archives, Electronic Records, Information Technology (IT), Media, Privacy & Security, Records Management, WikiLeaks.
Tags: , , , , ,
1 comment so far

WikiLeaks Cablegate

Since late November of last year, the whole world has been watching as WikiLeaks got its hands on and slowly released thousands of classified cables created and distributed by the US over the last four decades.  As you may recall, the suspected leaker was Army Private First Class (Pfc.) Bradley Manning who, undetected, was able to locate all the cables, copy them to his local system, burn them to CD-R (while allegedly lip-syncing Lady Gaga), and upload an encrypted file to WikiLeaks.  (I’ve written about this previously, so I won’t get too detailed here.)

But last week, the story changed dramatically when The Guardian revealed that “A security breach has led to the WikiLeaks archive of 251,000 secret US diplomatic cables being made available online, without redaction to protect sources.  WikiLeaks has been releasing the cables over nine months by partnering with mainstream media organisations.  Selected cables have been published without sensitive information that could lead to the identification of informants or other at-risk individuals.”  To further confuse matters related to the origin of this newest leak, “A Twitter user has now published a link to the full, unredacted database of embassy cables. The user is believed to have found the information after acting on hints published in several media outlets and on the WikiLeaks Twitter feed, all of which cited a member of rival whistleblowing website OpenLeaks as the original source of the tipoffs.”  The Cablegate story, with all its twists and turns over the months, has left a big impression on me and, as an archivist and records manager, I think it is important to strip this story of all its emotionality and look at it calmly and rationally so that we can get to the bottom of this madness.

The first problem I have with the story, or more specifically with the records management practices of the Defense Department, is the scary fact that a low-level Private First Class (Pfc.) would have full access to the Army’s database.  This became a bit scarier when we learned that Pfc. Manning used SIPRNet (Secret Internet Protocol Router Network) to gain full access to JWICS (Joint Worldwide Intelligence Communications System) as well as the [civilian/non-military] diplomatic cables generated by the State Department.

So the first question I had to ask was: why does DoD have access to the State Department’s diplomatic cables?  Are they spying on the State Department?!  Well, maybe, but even if not, this staggering fact from a different Guardian article sent shivers down my spine:  “The US general accounting office identified 3,067,000 people cleared to “secret” and above in a 1993 study. Since then, the size of the security establishment has grown appreciably. Another GAO report in May 2009 said: “Following the terrorist attacks on September 11 2001 the nation’s defence and intelligence needs grew, prompting increased demand for personnel with security clearances.” A state department spokesman today refused to say exactly how many people had access to Siprnet.”

Other factors that scare the heck out of me related to “bad records management” and WikiLeaks Cablegate include:

*   There is a lack of CONTROL of these assets (they store everything online?!  Really?!).
*   The DoD and State Department don’t use ENCRYPTION, cryptographic keys or protected distribution systems.
*   The names of confidential sources were not REDACTED at the embassy before the cables were uploaded and shared with the world.
*   Their RETENTION SCHEDULES do not allow for some cables to be declassified and/or destroyed, so everything stays online for years or decades.
*   The majority of cables were UNCLASSIFIED, suggesting that so many cables are created that there isn’t enough staff to describe and CLASSIFY them in a better way.
*   The DoD didn’t have a method for setting ACCESS PRIVILEGES, PERMISSIONS or AUTHORIZATION to ensure that a Pfc. on probation could not access (and copy, and burn to portable media) all those cables undetected.
*   There’s a question about password protection and authorization, but those problems could probably be covered with better ACCESS PRIVILEGES and PERMISSIONS.
*   There seems to be limited version control: once a cable is completed, someone immediately uploads it, and if the cable is later updated and revised, a second cable is created and uploaded.  Multiple copies that may reflect differing viewpoints are not a very smart way to control the information.
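To make the ACCESS PRIVILEGES and PERMISSIONS point concrete, here is a minimal sketch of a least-privilege check.  Everything in it (the roles, clearance levels, compartment names, and the bulk-export cap) is my own invention for illustration, not a description of any real DoD or State Department system:

```python
# Hypothetical illustration of a least-privilege access check.
# Clearance levels, compartments, and rules are invented for this sketch;
# they do not describe any real DoD or State Department system.

CLEARANCE = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

class User:
    def __init__(self, name, clearance, need_to_know, on_probation=False):
        self.name = name
        self.clearance = clearance              # e.g. "secret"
        self.need_to_know = set(need_to_know)   # compartments this user may read
        self.on_probation = on_probation

def may_read(user, cable_classification, cable_compartment):
    """Grant access only if clearance is sufficient AND the user has a
    demonstrated need-to-know for this compartment."""
    if CLEARANCE[user.clearance] < CLEARANCE[cable_classification]:
        return False
    if cable_compartment not in user.need_to_know:
        return False
    return True

def may_bulk_export(user, n_records, limit=100):
    """Bulk export is a separate, stricter privilege: deny probationers
    and cap how many records one request may pull."""
    return not user.on_probation and n_records <= limit

analyst = User("analyst1", "secret", {"iraq-sigacts"}, on_probation=True)
print(may_read(analyst, "secret", "iraq-sigacts"))       # True: clearance + need-to-know
print(may_read(analyst, "secret", "state-dept-cables"))  # False: no need-to-know
print(may_bulk_export(analyst, 251000))                  # False: probation, over the cap
```

Under a scheme like this, "full access to the Army's database" would simply never be one person's default; compartments and export caps would have to be granted one by one.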

But perhaps the scariest part of the whole WikiLeaks Cablegate madness is simply that there was no TRACKING or TRACING mechanism so that the DoD could, through LOGS, trace data flows and show that one person (or one machine, or one room in one building, or whatever) had just downloaded a whole collection of CLASSIFIED materials!  [From the IT perspective, large flows of data may actually slow the network for other soldiers on the same segment!]  And the fact that Pfc. Manning was able to burn the data to CD-R suggests that when IT deployed the systems they forgot or neglected to DISABLE the burn function on a classified network!  (Okay, they’ve made some recent changes, but is it too late?!)
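Here is a toy sketch of the kind of LOG-based tracking that was apparently missing: flag any account whose download volume in the audit log is wildly above normal.  The log format and the threshold are invented for this illustration:

```python
# Hypothetical illustration of audit-log tracking: flag users whose total
# download volume exceeds a threshold. Log format and limit are invented.

from collections import defaultdict

def flag_bulk_downloads(log_entries, max_records=500):
    """log_entries: iterable of (user, action, record_count) tuples.
    Returns {user: total} for users whose downloads exceed max_records."""
    totals = defaultdict(int)
    for user, action, count in log_entries:
        if action == "download":
            totals[user] += count
    return {u: n for u, n in totals.items() if n > max_records}

log = [
    ("analyst1", "download", 251000),  # one account pulls the whole archive
    ("analyst2", "download", 12),
    ("analyst2", "search", 0),
]
print(flag_bulk_downloads(log))  # {'analyst1': 251000}
```

Even a crude check like this, run daily against server logs, would have made a 251,000-cable exfiltration light up like a flare instead of going unnoticed.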

Many assume that Digital Forensics will provide a new way to authenticate data.  Well, if so, then why can’t they run a program on the cables, find out which system was used to burn the data, and trace the information back to the person who was using the machine at that time, as opposed to putting a soldier in jail, in solitary confinement, awaiting trial, held merely on the basis of a hearsay online chat he had with a known hacker?!  One other important consideration also scares me: the military uses Outlook for its email correspondence, and Outlook creates multiple PST files.  As the National Journal puts it: “So how did Manning allegedly manage to get access to the diplomatic cables? They’re transmitted via e-mail in PDF form on a State Department network called ClassNet, but they’re stored in PST form on servers and are searchable. If Manning’s unit needed to know whether Iranian proxies had acquired some new weapon, the information might be contained within a diplomatic cable. All any analyst has to do is to download a PST file with the cables, unpack them, SNAP them up or down to a computer that is capable of interacting with a thumb drive or a burnable CD, and then erase the server logs that would have provided investigators with a road map of the analyst’s activities.”

Obviously the system was broken, informants’ security was compromised, our secrets are exposed, and the cat is out of the bag!  Yet even now, many are unwilling to listen to or heed the lessons we need to learn from this debacle.  Back in January, I attended a WikiLeaks panel discussion hosted by the Archivists Round Table of Metropolitan New York and was surprised to hear most of the issues raised above ignored.  I tried to ask a question regarding the systemic problems (don’t blame Manning), but even that was mostly ignored (or misunderstood) and went unanswered by the panel.

In my opinion, we have very serious problems related to best practices for records management.  If you look closely at DoD 5015.2, you can see that the problems are embedded in the language of the software requirements, and nobody is looking at these problems in the ways that many archivists or records managers do (or should).  But honestly, the most insightful analysis and explanation came from Manning himself: “I would come in with music on a CD-RW labeled with something like ‘Lady Gaga,’ erase the music then write a compressed split file,” he was quoted in the logs as saying. “[I] listened and lip-synced to Lady Gaga’s ‘Telephone’ while exfiltrating possibly the largest data spillage in American history. Weak servers, weak logging, weak physical security, weak counter-intelligence, inattentive signal analysis … a perfect storm.”

So maybe it is time for the military, the US National Archives, and all computer scientists and IT professionals to stop relying solely on computer processing and automated machine actions and start thinking of better ways to actually protect and control their classified and secret data.   Perhaps a good first move would be to hire more archivists and try to minimize the backlog of Unclassified cables!  Or maybe it’s time to make sure that the embassies take responsibility for redacting the names of their sources before uploading the cables to a shared network?  And maybe it is time to consider a different model than the life cycle model, one that accounts for the fact that these cables will often be used for different functions by different stakeholders over the course of their existence.
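As a thought experiment on that redaction point: if source names were tagged in a structured way at the embassy, a pre-upload pass could scrub them automatically.  The <<SOURCE: …>> markup below is entirely my own invention for this sketch (real cables used free-text annotations, which is part of why redaction was so hard):

```python
# Hypothetical illustration of a pre-upload redaction pass.
# The <<SOURCE: name>> markup convention is invented for this sketch.

import re

SOURCE_TAG = re.compile(r"<<SOURCE:\s*([^>]+)>>")

def redact(cable_text):
    """Replace every tagged source name with a placeholder before the
    cable leaves the embassy for a shared network."""
    return SOURCE_TAG.sub("[NAME REDACTED]", cable_text)

cable = "Our contact <<SOURCE: J. Doe>> reports unrest in the capital."
print(redact(cable))  # Our contact [NAME REDACTED] reports unrest in the capital.
```

The deeper lesson is the same one archivists keep making: redaction is a description problem.  If sources are never marked up at creation, no downstream software can protect them reliably.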