NARA’s Erratic ERA Offers No Content-Searching 2011/10/29

Posted by nydawg in Archives, Digital Archives, Electronic Records, Records Management.

Many of us have been watching the unruly boondoggle of NARA’s ERA (Electronic
Records Archives) over the years, but this story seems a bit overdue. In a
nutshell: “Searching text impossible on NARA’s e-Records Archive.”
I hope they’ll soon take on the task of separating the wheat from the
chaff in those 250 million Bush emails (nearly one [out-of-office?]
email every second for 8 years).

“People trying to search the text of documents through the National
Archives and Records Administration’s $430 million Electronic Records
Archive are going to be disappointed, according to the agency’s
inspector general.  Under the currently deployed system, users can
search only by metadata. That typically includes tags for information
such as name of the original publication, date of publication, agency
that originated the document, and a small number of keywords. Users
who hope to locate a document by a word or phrase that isn’t part of
the metadata will be unable to. . . .

The public’s ability to use the ERA is likely to be hampered because
of the lack of a full text-based search capability, which would be
similar to what is available on Google.com or other commercial search
engines, NARA Inspector General Paul Brachfeld said in an interview on
Oct. 26.  Lack of full text search “is one of the profound problems
with the ERA at this point,” Brachfeld said. “Metadata alone does not
tell the story of what is in the documents.””

http://fcw.com/articles/2011/10/26/nara-electronic-archive-has-fundamental-flaw-in-search–it-says.aspx
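To make the difference concrete, here's a minimal Python sketch. It is my own illustration, not how ERA actually works. It uses a toy collection of records where metadata-only search misses a word that full-text search, via a simple inverted index, finds easily. The record fields, the function names and the sample data are all hypothetical, made up for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    """A toy archival record: descriptive metadata plus the document's full text."""
    title: str
    agency: str
    date: str
    keywords: list
    text: str

def tokenize(s: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", s.lower()))

def metadata_search(records, term):
    """Match the term only against metadata fields (title, agency, date, keywords)."""
    term = term.lower()
    return [r.title for r in records
            if term in tokenize(" ".join([r.title, r.agency, r.date, *r.keywords]))]

def build_full_text_index(records):
    """Build an inverted index mapping each word in the text to the records containing it."""
    index = {}
    for r in records:
        for word in tokenize(r.text):
            index.setdefault(word, set()).add(r.title)
    return index

def full_text_search(index, term):
    """Look the term up in the inverted index; return matching titles in sorted order."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample records, invented for this example.
records = [
    Record("Budget memo", "GSA", "2009-03-02", ["budget"],
           "Funding for the archive contract was approved."),
    Record("Staff notice", "NARA", "2010-07-15", ["personnel"],
           "The archive reading room will close early."),
]

print(metadata_search(records, "archive"))                          # []
print(full_text_search(build_full_text_index(records), "archive"))  # ['Budget memo', 'Staff notice']
```

The word "archive" appears in both documents but in neither record's metadata. That is exactly the gap the Inspector General describes: "Metadata alone does not tell the story of what is in the documents."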

WikiLeaks’ Cablegate and Systemic Problems 2011/09/06

Posted by nydawg in Best Practices, Digital Archives, Electronic Records, Information Technology (IT), Media, Privacy & Security, Records Management, WikiLeaks.

WikiLeaks Cablegate

Since late November of last year, the whole world has been watching as WikiLeaks got its hands on, and slowly released, thousands of classified cables created and distributed by the US over the last four decades.  As you may recall, the suspected leaker was Army Private First Class (Pfc) Bradley Manning, who, undetected, was able to locate all the cables, copy them to his local system, burn them to CD-R (while allegedly lip-syncing to Lady Gaga), and upload an encrypted file to WikiLeaks.  (I’ve written about this previously, so I won’t get too detailed here.)

But last week, the story changed dramatically when The Guardian revealed that “A security breach has led to the WikiLeaks archive of 251,000 secret US diplomatic cables being made available online, without redaction to protect sources.  WikiLeaks has been releasing the cables over nine months by partnering with mainstream media organisations.  Selected cables have been published without sensitive information that could lead to the identification of informants or other at-risk individuals.”  To further confuse matters related to the origin of this newest leak, “A Twitter user has now published a link to the full, unredacted database of embassy cables. The user is believed to have found the information after acting on hints published in several media outlets and on the WikiLeaks Twitter feed, all of which cited a member of rival whistleblowing website OpenLeaks as the original source of the tipoffs.”  The Cablegate story, with all its twists and turns over the months, has left a big impression on me and, as an archivist and records manager, I think it is important to strip this story of all its emotionality and look at it calmly and rationally so that we can get to the bottom of this madness.

The first problem I have with the story, or more specifically, with the records management practices of the Defense Department, is the scary fact that a low-level Private First Class (Pfc) would have full access to the Army’s database.  This became a bit scarier when we learned that Pfc Manning used SIPRNet (Secret Internet Protocol Router Network) to gain full access to JWICS (Joint Worldwide Intelligence Communications System) as well as to the [civilian/non-military] diplomatic cables generated by the State Department.

So the first question I had to ask was: why does DoD have access to the State Department’s diplomatic cables?  Are they spying on the State Department?!  Well, maybe, but even if not, this staggering fact from a different Guardian article sent shivers down my spine:  “The US general accounting office identified 3,067,000 people cleared to “secret” and above in a 1993 study. Since then, the size of the security establishment has grown appreciably. Another GAO report in May 2009 said: “Following the terrorist attacks on September 11 2001 the nation’s defence and intelligence needs grew, prompting increased demand for personnel with security clearances.” A state department spokesman today refused to say exactly how many people had access to Siprnet.”

Other factors related to “bad records management” in WikiLeaks’ Cablegate that scare the heck out of me include the following:

- There is a lack of CONTROL of these assets (they store everything online?!  Really?!).
- The DoD and State Department don’t use ENCRYPTION, cryptographic keys or protected distribution systems.
- The names of confidential sources were not REDACTED at the embassies before the cables were uploaded and shared with the world.
- Their RETENTION SCHEDULES do not allow for some cables to be declassified and/or destroyed, so they keep everything online for years or even decades.
- The majority of cables were UNCLASSIFIED.  This suggests that so many cables are created that there isn’t even enough staff to describe and CLASSIFY them properly.

The DoD apparently had no method for setting ACCESS PRIVILEGES, PERMISSIONS or AUTHORIZATION to ensure that a Pfc on probation could not access (and copy and burn to portable media) all those cables undetected?!  There’s also a question about password protection and authorization, but those problems could probably be addressed with better ACCESS PRIVILEGES and PERMISSIONS.

Another issue that leaves archivists confused is that there seems to be limited version control.  In other words, it seems as if once a cable is completed, someone immediately uploads it.  Then, if the cable is updated and revised, a second cable is created and uploaded.  This doesn’t seem like a very smart way to control information when multiple copies may present differing viewpoints.
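Here's a hypothetical sketch of the kind of access check I mean.  Reading a cable requires both a high enough clearance level and need-to-know for the cable's subject area (its "compartment").  The clearance ladder, the compartment names and the `can_read` function are all assumptions for illustration, not how SIPRNet or any DoD system actually works.

```python
from dataclasses import dataclass

# Hypothetical clearance ladder, lowest to highest.
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}

@dataclass(frozen=True)
class User:
    name: str
    clearance: str
    compartments: frozenset  # hypothetical need-to-know areas assigned to this user

@dataclass(frozen=True)
class Cable:
    cable_id: str
    classification: str
    compartment: str

def can_read(user: User, cable: Cable) -> bool:
    """Allow access only if the clearance is high enough AND the user has need-to-know."""
    cleared = LEVELS[user.clearance] >= LEVELS[cable.classification]
    need_to_know = cable.compartment in user.compartments
    return cleared and need_to_know

analyst = User("analyst", "SECRET", frozenset({"IRAQ"}))
print(can_read(analyst, Cable("cable-001", "SECRET", "IRAQ")))      # True
print(can_read(analyst, Cable("cable-002", "SECRET", "ICELAND")))   # False: no need-to-know
print(can_read(analyst, Cable("cable-003", "TOP SECRET", "IRAQ")))  # False: clearance too low
```

The point of the second condition is that a SECRET clearance alone shouldn't open every SECRET cable in the database.  Access should also be scoped to the work the person is actually assigned to do.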

But perhaps the scariest part of the whole WikiLeaks’ Cablegate madness is simply that there was no TRACKING or TRACING mechanism so that the DoD could, through LOGS, trace data flows to show that one person (or one machine or one room in one building or whatever) had just downloaded a whole collection of CLASSIFIED materials!  [From the IT perspective, large flows of data may actually impact data flow speeds for other soldiers on the same network!]  And the fact that Pfc Manning was able to burn the data to CD-R suggests that when IT deployed the systems they forgot or neglected to DISABLE the burn function on a classified network!  (Okay, they’ve made some recent changes, but is it too late?!)
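Here's a minimal sketch of what such log monitoring might look like.  It assumes a hypothetical access log of (timestamp, user, classification) entries and flags any user who pulls more than a set number of classified items within a sliding time window.  The threshold and window are arbitrary placeholders, and real audit systems would be far more involved.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flag_bulk_downloads(log, window=timedelta(hours=1), threshold=100):
    """Return the set of users who downloaded more than `threshold` classified
    items within any sliding time `window`.

    `log` is an iterable of (timestamp, user, classification) tuples, a
    hypothetical log format assumed for this example."""
    times_by_user = defaultdict(list)
    for ts, user, classification in log:
        if classification != "UNCLASSIFIED":  # only track classified material
            times_by_user[user].append(ts)

    flagged = set()
    for user, times in times_by_user.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Advance the window's start until it spans no more than `window`.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(user)
                break
    return flagged

base = datetime(2010, 5, 1, 9, 0)
log = (
    [(base + timedelta(seconds=i), "user_a", "SECRET") for i in range(500)]         # bulk pull
    + [(base + timedelta(minutes=i), "user_b", "SECRET") for i in range(50)]        # normal use
    + [(base + timedelta(seconds=i), "user_c", "UNCLASSIFIED") for i in range(500)]
)
print(flag_bulk_downloads(log))  # {'user_a'}
```

Of course, this only helps if the logs themselves are protected.  Someone who can erase the server logs can also erase the evidence this check relies on.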

Many assume that Digital Forensics will provide a new way to authenticate data.  Well, if so, then why can’t they run a program on the cables to find out which system was used to burn the data, and then trace the information back to the person who was using that machine at the time?  That would be better than putting a soldier in jail, in solitary confinement, awaiting trial, accused merely on the basis of an online chat he had with a known hacker?!  One other important consideration also scares me: the military uses Outlook for email correspondence, and Outlook creates multiple PST files.  As the National Journal puts it: “So how did Manning allegedly manage to get access to the diplomatic cables? They’re transmitted via e-mail in PDF form on a State Department network called ClassNet, but they’re stored in PST form on servers and are searchable. If Manning’s unit needed to know whether Iranian proxies had acquired some new weapon, the information might be contained within a diplomatic cable. All any analyst has to do is to download a PST file with the cables, unpack them, SNAP them up or down to a computer that is capable of interacting with a thumb drive or a burnable CD, and then erase the server logs that would have provided investigators with a road map of the analyst’s activities.”

Obviously the system was broken, informants’ security was compromised, our secrets are exposed, and the cat is out of the bag!  Yet even now, many are unwilling to listen to or heed the lessons we need to learn from this debacle.  Back in January, I attended a WikiLeaks panel discussion hosted by the Archivists Round Table of Metropolitan New York and was surprised that most of the issues raised above were ignored.  I tried to ask a question about the systemic problems (don’t blame Manning), but even that was mostly ignored (or misunderstood), and not everyone on the panel answered it.

In my opinion, we have very serious problems related to best practices for records management.  If you look closely at DoD 5015.2, you can see that the problems are embedded in the language of the software requirements.  Nobody is looking at these problems the way many archivists and records managers do (or should).  But honestly, the most insightful analysis and explanation came from Manning himself: ““I would come in with music on a CD-RW labeled with something like ‘Lady Gaga,’ erase the music then write a compressed split file,” he was quoted in the logs as saying. “[I] listened and lip-synced to Lady Gaga’s ‘Telephone’ while exfiltrating possibly the largest data spillage in American history. Weak servers, weak logging, weak physical security, weak counter-intelligence, inattentive signal analysis … a perfect storm.””

So maybe it is time for the military, the US National Archives, and all computer scientists and IT professionals to stop relying on computer processing and automated machine actions.  Instead, they should start thinking of better ways to actually protect and control their classified and secret data.  Perhaps a good first move would be to hire more archivists and try to reduce the backlog of unclassified cables!  Or maybe it’s time to make sure that the embassies take responsibility for redacting the names of their sources before uploading the cables to a shared network?  And maybe it is time to consider a model other than the life cycle model, one that accounts for the fact that these cables will often be used for different functions by different stakeholders over the course of their existence.

Arab Spring Diplomatics & Libyan Records Management 2011/09/05

Posted by nydawg in Archives, Best Practices, Digital Archives, Digital Preservation, Electronic Records, Information Technology (IT), Media, Records Management.

At the 75th Annual Meeting of the SAA (Society of American Archivists) last week, I had the good fortune to attend many very interesting panels, speeches and discussions.  Topics included archives, archival education, standards, electronic records, digital forensics, photography archives and digital media, and my mind is still reeling.  But when I heard this story on news radio, I needed to double-check.

As you all know, the Arab Spring is a revolutionary wave of demonstrations and protests in the Arab world. Since 18 December 2010 there have been:

- revolutions in Tunisia and Egypt;
- a civil war in Libya resulting in the fall of the regime there;
- civil uprisings in Bahrain, Syria, and Yemen;
- major protests in Algeria, Iraq, Jordan, Morocco, and Oman;
- and minor protests in Kuwait, Lebanon, Mauritania, Saudi Arabia, Sudan, and Western Sahara!

Egyptian President Hosni Mubarak resigned (or retired), and there’s a civil war going on in Libya.  Meanwhile, thanks to poor records management, documents were found at the headquarters of Libya’s External Security agency showing that the US was firmly on its side in the War on Terror:

“CIA moved to establish “a permanent presence” in Libya in 2004, according to a note from Stephen Kappes, at the time the No. 2 in the CIA’s clandestine service, to Libya’s then-intelligence chief, Moussa Koussa.  Secret documents unearthed by human rights activists indicate the CIA and MI6 had very close relations with Libya’s 2004 Gadhafi regime.

The memo began “Dear Musa,” and was signed by hand, “Steve.” Mr. Kappes was a critical player in the secret negotiations that led to Libyan leader Col. Moammar Gadhafi’s 2003 decision to give up his nuclear program. Through a spokeswoman, Mr. Kappes, who has retired from the agency, declined to comment.  A U.S. official said Libya had showed progress at the time. “Let’s keep in mind the context here: By 2004, the U.S. had successfully convinced the Libyan government to renounce its nuclear-weapons program and to help stop terrorists who were actively targeting Americans in the U.S. and abroad,” the official said.””

Shudder.

So I guess that means that if all of those CIA documents are secret, there would be no method for tracing a record (at least on the US side).  In other words, every time a record is sent, copied or moved, a new version is created, but where is the original?  Depending on the operating system, the metadata may show a new Date Created.  How will anybody be able to find an authentic electronic record when it’s still stored on one person’s local system, which is probably upgraded every few years?
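Archivists usually answer the “where is the original?” question with fixity checks rather than timestamps.  Here's a small Python sketch of that idea, my own illustration using a throwaway temp file.  A plain copy gets a new modification time, but its SHA-256 checksum matches the original's, so the content can be shown to be unchanged.  Modification time stands in for Date Created here because creation time is exposed differently on each operating system.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path: str) -> str:
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "memo.txt")  # hypothetical record file
    with open(original, "wb") as f:
        f.write(b"Hypothetical record text.")
    os.utime(original, (0, 0))  # backdate the original's access/modification times to 1970

    duplicate = os.path.join(d, "memo_copy.txt")
    shutil.copy(original, duplicate)  # copies content and permission bits, not timestamps

    same_timestamp = os.path.getmtime(original) == os.path.getmtime(duplicate)
    same_checksum = sha256_of(original) == sha256_of(duplicate)

print(same_timestamp)  # False: the copy looks "newer"
print(same_checksum)   # True: the content is bit-for-bit identical
```

A checksum only proves that two files match each other.  It doesn't say which one came first, so the hash still has to be recorded when the record is created or accessioned.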

There is a better way, a paradigm shift.  The Australian records continuum model “certainly provides a better view of reality than an approach that separates space and time,” and by looking to it we can find a way to avoid aggregating all the [useless] data that gets created.  Suppose we do more and better appraisal of critical, analytical, technical and IP content.  Then we can select and describe born-digital assets more completely and separate the wheat from the chaff, the needles from the haystacks, the molehills from the mountains, and (wait for it) . . . see the forest for the trees.  By storing fewer assets and electronic records more carefully, we can actually achieve better results.  Otherwise, we are simply pawns in the games of risk played (quite successfully) by IT departments assuring (but not insuring) the higher-ups that “we are archiving: we back up every week.” [For those who are wondering: when institutions “back up,” they back up the assets one week, move the tapes offsite, and overwrite them the following week.  They don’t archive to tape for long-term preservation.]
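Here's a toy Python simulation of why a backup rotation is not an archive.  It assumes a simple two-tape rotation; real schedules vary, and `rotate_backups` is a hypothetical helper.  Each week's backup overwrites the oldest tape, so only the most recent weeks survive.

```python
def rotate_backups(weekly_snapshots, tapes=2):
    """Simulate a simple tape rotation: each week's backup overwrites the tape
    used `tapes` weeks earlier. Returns (week, snapshot) pairs still on tape."""
    slots = [None] * tapes
    for week, snapshot in enumerate(weekly_snapshots):
        slots[week % tapes] = (week, snapshot)  # overwrite whatever was on this tape
    return sorted(s for s in slots if s is not None)

snapshots = [f"week {n} files" for n in range(6)]
surviving = rotate_backups(snapshots)
print([week for week, _ in surviving])  # [4, 5]: weeks 0-3 were overwritten
```

Anything deleted or altered in week 1 is simply gone by week 4.  An archive, by contrast, is meant to keep every appraised version for the long term.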

Diplomatics may offer ethical archivists a way into the world of IT, especially when it comes to Digital Forensics.  But the point I’m ultimately trying to make, I think, is that electronic (or born-digital) records management requires new skills, strategies, processes, standards, plans, goals and better practices than the status quo.  And this seems to be the elephant in the room that nobody dares describe.

FYI: EMR = EHR 2011/08/17

Posted by nydawg in Archives, Digital Archives, Electronic Records, Information Technology (IT), Privacy & Security, Records Management.

This news story from Reuters seems bizarre given the known limitations (and challenges) of iPads: no support for USB drives, no reading or editing of PowerPoints, no streaming of Flash video, etc. (Is there more?)

“The iPad may help electronic medical records (EMR, sometimes also referred to as electronic health records, or EHR) finally gain wide adoption, thanks in part to a new program that will see the federal government dispersing grants to doctors who make use of a free native EMR iPad app.” Here’s the article: “Electronic Medical Records Get a Boost from iPad, Federal Funding”.

So ultimately the question will be: if doctors are using portable iPads for their EMR (EHR) duties, how will they distribute the information?  Email?  Shared network?  And what will the security risks be?  And will the copy of the “record” be stored as a read/write version, and will it be shareable with other systems and, if so, in what format and what software version, etc.?!