From Scroll to Screen and Back: Vendor Lock-In and eBooks 2011/09/04
Posted by nydawg in Best Practices, Digital Archives, Digital Humanities, Digital Preservation, Information Literacy, Information Technology (IT). Tags: Amazon, Apple, ebooks, Kindle, saa, society of american archivists, vendor lock-in, web 2.0
I saw an interesting article in the NYTimes titled “From Scroll to Screen” which looks at the transition in print media over the last two thousand years. While the author is specifically looking at the transition from books to eBooks, he declares, “The last time a change of this magnitude occurred was circa 1450, when Johannes Gutenberg invented movable type. But if you go back further there’s a more helpful precedent for what’s going on. Starting in the first century A.D., Western readers discarded the scroll in favor of the codex — the bound book as we know it today.”
Like many archivists and librarians, I am highly interested in how this transition will play out. Last year I read William Powers’ excellent Hamlet’s Blackberry, which led me to new ways of thinking about media and the different formats used to carry data and information between stakeholders across time and space. . . .
So this NYTimes article by book critic Lev Grossman caught my interest in its discussion of how one format replaces another: “In the classical world, the scroll was the book format of choice and the state of the art in information technology. Essentially it was a long, rolled-up piece of paper or parchment. To read a scroll you gradually unrolled it, exposing a bit of the text at a time; when you were done you had to roll it back up the right way, not unlike that other obsolete medium, the VHS tape.”
He goes on to explain how those scrolls were items of prestige, probably because of the “scarcity” of scroll-creators. “Scrolls were the prestige format, used for important works only: sacred texts, legal documents, history, literature. To compile a shopping list or do their algebra, citizens of the ancient world wrote on wax-covered wooden tablets using the pointy end of a stick called a stylus. Tablets were for disposable text — the stylus also had a flat end, which you used to squash and scrape the wax flat when you were done. At some point someone had the very clever idea of stringing a few tablets together in a bundle. Eventually the bundled tablets were replaced with leaves of parchment and thus, probably, was born the codex. But nobody realized what a good idea it was until a very interesting group of people with some very radical ideas adopted it for their own purposes. Nowadays those people are known as Christians, and they used the codex as a way of distributing the Bible.”
And if you have ever tried to compare two or more passages in a book on an eReader, you may be interested to read: “The codex also came with a fringe benefit: It created a very different reading experience. With a codex, for the first time, you could jump to any point in a text instantly, nonlinearly.” This doesn’t work as easily in the tablet or eReader age, but stay tuned; I imagine at some point the technology will improve. “If the fable of the scroll and codex has a moral, this is it. We usually associate digital technology with nonlinearity, the forking paths that Web surfers beat through the Internet’s underbrush as they click from link to link. But e-books and nonlinearity don’t turn out to be very compatible.”
So as we move from the tried and trusted durable medium of the codex and hard-cover book (even if printed on cheap paper) to the electronic tablet (early versions, soon-to-be-obsolete operating systems, outdated software), our content management expertise and digital curation skills should become more valuable as new technologies evolve, media formats become obsolete and disposable, and our culture is put at risk.
But our hands are tied. Even to address the pressing concerns of eBooks and eReaders and tablets, archivists are left out in the cold. We dare not say anything about vendor lock-in regarding the Kindle’s proprietary formats, because they are Amazon’s Intellectual Property. We cannot say anything about vendor lock-out in regard to Apple’s iPad not playing (hot) Flash videos (see Steve Jobs’ “Thoughts on Flash”), which causes problems when accessing Fedora through its Flash application. We cannot even mention the fact that iPads have no support for portable SD cards or USB 2.0 external drives. In other words, if you want to get information on (or off) your iPad, you probably have to email or upload it. . . 😦
So what can we do? Or, more clearly, what should a digital archivist know to catalog and describe when working with born-digital materials?! Well, of course there’s so much (not everything entirely relevant though!), but at the very least, better format (not just medium, but format, and maybe codec) descriptions can create better strategies leading to better plans, processes and best practices. And keep your damn books!
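As a toy illustration of what minimal format-level description might look like (the function name and fields here are my own invention; a real workflow would use a tool such as DROID, FITS, or MediaInfo for deeper format and codec identification):

```python
import hashlib
import mimetypes
from pathlib import Path

def describe_file(path):
    """Build a minimal format-level description of a born-digital file.

    Records a format guess (from the filename extension) plus a fixity
    checksum -- the bare minimum needed before planning migration or
    emulation strategies.
    """
    p = Path(path)
    data = p.read_bytes()
    mime, _ = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "size_bytes": len(data),
        "mime_type": mime or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Even a record this small lets a future archivist ask "which files claim to be PDFs?" or "has this file changed since ingest?" without opening the bitstream.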
I’ll admit that I finally broke down and bought an eReader. Since I was going to be travelling to Chicago for the SAA annual meeting, there were many articles I wanted to read and think about in advance, so for the last few months I had been searching for the right one. Of course, I was quite wary of Kindles because of the proprietary format and wasn’t sure how well one would handle PDFs (or if it could at all), but friends suggested the Digital Reader or the Sony eReader, and at least one friend suggested checking out the Nook from Barnes and Noble.
I wasn’t really sure what I wanted, and I didn’t really care that much. Basically, I wanted anything that would let me read PDFs, most of which I would download from the internet or, specifically, from the Society of American Archivists’ (SAA) American Archivist journal. Of course, I also found some excellent reads on Project Gutenberg and the Internet Archive (where, btw, they’re preserving physical books!). I’m really interested in e-ink technology, so that was one factor; the other was that I didn’t want to pay more than $130. (Another factor, on which I thought I would have to compromise, was that I wanted an eReader with two screens that would open like a book.)
Well, as you might expect, my research was not leading me to any answers, and I had almost decided to just go with one or the other (whichever was cheapest), knowing it would be a temporary solution until I could afford a nice tablet computer. But then one day I got an email offer from Woot for a one-day-only clearance sale on the Entourage Pocket Edge, a 2-screen dualbook (eReader & tablet)! So I picked that up and I love it. (Yes, there are problems, but it works for reading PDFs, annotating them, drawing in a journal, surfing the web on the tablet side, etc.) Maybe I’ll write more on it later, but for now, I hope you’ll give a long thought to what we’ll lose if we give up functional non-linearity in our published works! (And I don’t mean Digital Humanities people with their datamining techniques.)
Football GOOOOOOL: Describing Every Action or Each Event 2011/08/30
Posted by nydawg in Digital Archives, Information Technology (IT), Intellectual Property, Media. Tags: football, on-demand, soccer, sports, streaming media
When I was a kid, I used to play soccer and I loved playing and watching the game. To me, the sport was all about teamwork, continuous movement, and evolving strategies. Now, years later, I still like the sport, but never play and only occasionally watch it on TV. Looking at it from an archivist’s perspective, there are a few points that merit mentioning.
The first point is the distinction between soccer and football. In the US we call it soccer, but almost everywhere else in the world it is known as football (not “American football”). This got me thinking about the way a culture names a sport, a location (the “English” Channel), or even an art movement (Dada?). If digital humanities specialists were trying to research the development of the sport in this country and around the world, it might be a more difficult proposition because the terms used are different . . . and may have changed over the years.
Another interesting difference between soccer and football on TV is the way the commentators describe it. It is infuriating to watch on American TV because the English-speaking commentators are usually so caught up in telling the dramatic (or financial) backstory of the players that they neglect to accurately describe the action on the field as it happens. Although I’m not fluent in Spanish, I much prefer watching soccer in Spanish because the announcers are telling a story in real time . . . and then when someone shoots and scores? GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOLLLLLL! The announcers are excited as the plot develops. The point I’m trying to make is that although the “content” of the game (the action, the video, the moving images) is the same, the Spanish-language broadcast (with its play-by-play) is, in my opinion, a more valuable commodity because it is more descriptive (even though I don’t speak the language), especially for those who do not know (or care about) Manchester United or the football clubs in the UK (or elsewhere). I’m not interested in learning that David Beckham used to play with so-and-so, but if someone describes the six passes leading to a shot on goal, please do tell! Like other soccer fans, I am interested in the process and how the game develops, or, specifically, in knowing that the left wing crosses the ball to the forward who is guarded by the fullback (or whatever). If the people at FIFA or the World Cup wanted to build up an audience, I think they should consider using Spanish-language commentators for the play-by-play and then hiring translators to put it in English. (Yes, I’m sure that could probably be automated.) Let’s give soccer its due.
And this brings me to the point about the game, why I loved it so much, how it’s different from baseball (or American football), and how it relates to archiving. In baseball, tennis, and American football, the whole game is set up as one event (pitch, serve, or down) after another, and each event is described (and archived) as something with a beginning, middle, and end. (“The pitcher winds up, releases and throws, called strike one.”) In soccer, though, because it is so dependent on the uninterrupted passage of time, the whole game is about process and the subtle changes that happen as players work together to accomplish a “goal”. I find this fascinating because, as in archiving, parameters need to be set in order to determine best practices. For example, professional games usually run 90 minutes, but only the referees keep accurate time, so even if the clock expires, the game continues until the referee says it is finished. It’s not overtime; the referees keep track of how much “real” playing time is “owed” from injuries, fouls, and other circumstances that arise in play.
In some ways, soccer is like basketball, but even that sport is set up and described as a series of events, and it moves at such a fast pace that play-by-play announcers can’t announce every pass, so they usually focus on the end result: “He shoots and scores!” I guess what I’m trying to say is that baseball and those other sports can be described mechanistically, or their descriptions can be automated.
A few years ago, I worked at a video indexing and retrieval software company called Virage (later bought by Autonomy, which was itself recently bought by HP), in a small department in New Jersey that “captured” (encoded, digitized, etc.) and cataloged every Major League Baseball game of the 2001 season, from the opening pitch of the first game to the last out of the last game of the World Series. Since I was a video technician, I was mostly responsible for making sure the video feeds (content) were captured, saved, renamed, and uploaded so they would be accessible (and fully searchable) online within an hour after the game concluded, but we also had stringers in the ballparks keeping score of the games and electronically sending the data to our computers in New Jersey. At the end of the game, the data and footage would be combined so a fan or any subscriber could search for any event (e.g. every double Derek Jeter hit in the month of May [but not every pitch]) and create a highlights reel to be streamed or (maybe) saved locally. MLB still does it, probably using the same software, but now they catalog every pitch and every commercial, and rather than simply rely on the software to combine the data, they hire seasonal workers to describe everything. What’s interesting to me is that the original project was simplified and stripped down, and it provided an excellent way to standardize scoring and create a functional system.
Working with these brilliant software engineers (some now work for Google), I learned a lot about how structured data can be used to create a more valuable commodity. If the software is built to match the scoring against the video feeds, then it becomes a mathematical exercise in which an event like a home run will take about 45 seconds, whereas a strikeout (third strike) may take less than a second. By using data in this way, Virage figured out how to quickly and efficiently make every baseball game fully searchable with a very small group of people. Today, the cataloging has gotten more descriptive, but almost to the point of being too descriptive, and not really functional anymore (in my opinion). Who would ever search for “ground out,” “from knees,” “last out,” or “premier plays” instead of “no-hitter”!
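To illustrate the general idea (with made-up names, fields, and numbers; this is not Virage’s actual schema), once a structured scoring log is aligned with video timecodes, building a highlights reel becomes a simple filter over the event data:

```python
from dataclasses import dataclass

@dataclass
class Event:
    batter: str
    result: str      # e.g. "double", "strikeout"
    month: int       # month of the season
    start: float     # seconds into the captured video feed
    duration: float  # rough clip length for this event type

# Hypothetical event log; in the real system, stringers' scoring data
# was combined with the captured feeds after each game.
events = [
    Event("Jeter", "double", 5, 1200.0, 45.0),
    Event("Jeter", "strikeout", 5, 3300.0, 8.0),
    Event("Jeter", "double", 6, 900.0, 45.0),
]

def highlight_reel(events, batter, result, month):
    """Return (start, end) clip ranges for every matching event."""
    return [
        (e.start, e.start + e.duration)
        for e in events
        if e.batter == batter and e.result == result and e.month == month
    ]

# "Every double Derek Jeter hit in the month of May":
print(highlight_reel(events, "Jeter", "double", 5))  # [(1200.0, 1245.0)]
```

The design choice worth noticing: because the event type implies an approximate clip length, no human has to mark in/out points for every play.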
I suppose it’s a testament to the original idea that MLB is using the software on a larger scale, with more (internal) stakeholders than simply fans creating on-demand highlight reels, but I’m still left wondering why, and how, any logger would want to tag everything: “It is not only the game action that is tagged. If a squirrel runs onto the field, the play will be tagged with ‘animal.’ If there is a shot of a man sipping a beer, there is a ‘drinking’ option under the ‘fans’ category.” I suppose if you’re not a professional archivist or cataloger it might seem “cool” to be able to do that, but as an archivist I’m left thinking, “Are some taggers wasting their time tagging thousands of events in a 3-hour game?”
But, I guess if there’s an app for that, there must be some function somewhere at some time.
And of course, there are some other very cool things that MLB is able to do with its content in order to maximize subscriptions, but really, do you need to hire someone to point out that a squirrel ran on the field one day?
Paradigm Shift from Economics of Scarcity to Abundance & Scarcity of Common Sense 2011/08/24
Posted by nydawg in Archives, Best Practices, Digital Archives, Information Technology (IT), Intellectual Property. Tags: access, common sense, data deluge, digital deluge, information age, saa, society of american archivists
One of the most exciting (and scary!) aspects of being a digital archivist these days is that everyone is living through a transition from the Atomic Age (age of atoms) to the Information (Digital) Age (age of bits), but archivists are also living through a professional paradigm shift from the economics of scarcity to the economics of abundance. There is so much born-digital information created every day, month, year, decade, etc., that it is overwhelming just to contemplate how much information is created (and stored), and, while it seems like archivists are doing more and more work, there is some question about the metrics used to show that we are preserving the most significant material. (e.g. NARA is accessioning 200 million emails from the George W. Bush Administration which, as I’ve blogged previously, works out to nearly one email every second of the Administration).
For me, this is a fascinating time for archivists because few people seem to understand how significant this transition is and will be. In fact, from my experiences in library schools, many older faculty members seem unwilling (or unable?) to articulate this transition and, by extension, cannot even teach younger students how these changes will impact their lives and professions. So rather than address these issues head-on, some educators ignore them and assign student readings from books written in the early 1990s or before. (I have nothing against the study of “history”, but practicality would be helpful for students trying to get jobs as Information or Knowledge Managers.)
Years ago, for example, when President John F. Kennedy wrote a memo or correspondence, his secretary would type it up in triplicate and send one copy to the intended recipient, file a second copy in the office, and send the third copy to the archives. Decades later, if somebody wanted to find the original, the office copy, or the archived version, it would most likely be filed away and accessible in its original paper format. This system worked very well for hundreds and probably thousands of years! In the Information Age, a similar memo for President Barack Obama might be created by the secretary as a born-digital Word file, and copies of the file could (or should) be distributed in a similar manner (or perhaps converted to ISO 32000 PDF/A format for stable long-term preservation). This may or may not be happening, but one big difference is that these electronic records (or born-digital files) are dependent on the software used to create them, and if the software is upgraded or replaced and newer versions are not backwards compatible, it may prove difficult to find, access, and open those files. (It is also important to note that those files may have been stored on any variety of media formats which are no longer supported or accessible — e.g. remember Jaz drives, Zip disks, CD-ROMs, or 5.25″ floppy disks?)
To prevent losing mass quantities of materials, many libraries subscribe to LOCKSS, or Lots Of Copies Keep Stuff Safe. This may work for electronic journals created in PDF/A format, but it doesn’t work so well if ALL those copies are in a format (or font) that is obsolete or unsupported, and/or are stored on a medium (floppies) that is no longer accessible on newer technologies (e.g. iPads don’t have a DVD-ROM drive or a USB port)!
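The core of the "lots of copies" idea can be reduced to a sketch: keep several copies and audit them against each other, so a damaged copy can be detected and repaired from the majority. (This is a toy illustration of the principle, not the actual LOCKSS polling protocol, and as noted above it only helps if the format itself stays readable.)

```python
import hashlib
from collections import Counter

def audit_copies(copies):
    """Given the raw bytes of several copies of one object, return the
    majority checksum and the indices of copies that disagree with it
    (candidates for repair from a good copy)."""
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    majority, _ = Counter(digests).most_common(1)[0]
    bad = [i for i, d in enumerate(digests) if d != majority]
    return majority, bad

good = b"PDF/A bytes ..."
copies = [good, good, b"bit-rotted bytes"]
_, damaged = audit_copies(copies)
print(damaged)  # [2]
```

Note that the audit detects bit rot but says nothing about format obsolescence: three perfectly intact copies of an unreadable file are still unreadable.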
But this strategy may not work for Digital Archives because of the difference between accessibility and access or, as James Gleick, author of The Information: A History, A Theory, A Flood, puts it: “We’re in the habit of associating value with scarcity, but the digital world unlinks them. You can be the sole owner of a Jackson Pollock or a Blue Mauritius but not of a piece of information — not for long, anyway. Nor is obscurity a virtue. A hidden parchment page enters the light when it molts into a digital simulacrum. It was never the parchment that mattered.”
As Maria Popova puts it in her excellent essay “Accessibility vs. access: How the rhetoric of ‘rare’ is changing in the age of information abundance”: “Because in a culture where abundance has replaced scarcity as our era’s greatest information problem, without these human sensemakers and curiosity sherpas, even the most abundant and accessible information can remain tragically ‘rare.’”
Archivists and librarians have mastered the processes and practices of an earlier era of scarcity (e.g. item-level description) and seem unwilling (or unable) to consider a new and more efficient model. I was trying to think of an analogy for this, and it hit me at Kennedy Airport, where I am waiting for my flight to SAA’s annual meeting in Chicago: for hundreds, even thousands, of years, men and women have moved around while struggling to pack and carry their luggage, but it wasn’t until 1970 that Bernard Sadow “invented” the suitcase with wheels. What took so long?
It’s hard to say exactly what took so long, but it seems likely that travelers (especially macho travelers) had gotten so used to the inconvenience of lugging their heavy luggage through changing transportation systems that no one considered an easier, faster and better way. But ultimately “common sense” won out, and now just about everyone (except me) has wheels on his/her luggage. Why am I still holding out?! I’m still waiting for suitcases that fly!
MP3tunes Cloud Music Storage Is Not a Crime 2011/08/23
Posted by nydawg in Digital Archives, Digital Preservation, Information Technology (IT), Intellectual Property, Media. Tags: cloud storage, music, record labels
Michael Robertson is a name that is very familiar to those of us who have been involved in digital media for the last decade or so. He founded his first company, MP3.com, in 1997 and, after a successful IPO, was targeted and shut down by the record labels. Obviously he still has a few disruptive ideas that directly threaten the labels’ existence, including MP3tunes. So this sounds like big news: “The disk drives powering Dropbox, Amazon’s Cloud Drive, and Google Music likely issued a small sigh of relief Monday, after a federal court judge found that the MP3tunes cloud music service didn’t violate copyright laws when it used only a single copy of a MP3 on its servers, rather than storing 50 copies for 50 users.”
The recording industry had argued that music locker sites operating without licenses from copyright holders are illegal.
Specifically, the part that stands out for archivists and online storage lockers is this bit from Ars Technica: “The ruling contains even more good news for music locker sites and fans of sensible copyright laws. As we reported last month, a key 2008 decision had suggested that locker sites would be more vulnerable to copyright infringement claims if they used deduplication technology to save hard drive space. That ruling was based on the theory that keeping a single “master copy” of a work and sending it to multiple users would constitute an infringing public performance.”
But Judge William H. Pauley III said in this case that MP3tunes’ system complies with that ruling. “Importantly, the system preserves the exact digital copy of each song uploaded to MP3tunes.com,” Pauley ruled. “Thus, there is no ‘master copy’ of any of EMI’s songs stored on MP3tunes’ computer servers.” Curiously, the judge goes on to state that “MP3tunes does not use a ‘master copy’ to store or play back songs stored in its lockers. Instead, MP3tunes uses a standard data compression algorithm that eliminates redundant digital data.” Sherwin Siy of Public Knowledge hailed the ruling, saying it “paves the way for both cloud locker services and integrated media search engines.”
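The deduplication idea at issue can be sketched as a toy content-addressed store (all names here are hypothetical, and this is in no way MP3tunes’ actual architecture): identical uploads share one stored blob, while each user’s locker keeps its own reference to it.

```python
import hashlib

class Locker:
    """Toy content-addressed storage: identical uploads are stored once,
    but every user's locker keeps its own independent reference."""

    def __init__(self):
        self.blobs = {}    # digest -> bytes (each unique blob stored once)
        self.lockers = {}  # user -> list of digests they uploaded

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)          # dedup: store once
        self.lockers.setdefault(user, []).append(digest)
        return digest

store = Locker()
song = b"mp3 bytes"
store.upload("alice", song)
store.upload("bob", song)
print(len(store.blobs))  # 1 -- fifty users would still mean one stored copy
```

The legal nuance in the ruling is precisely about whether "store once, reference many times" counts as keeping a "master copy"; the mechanics themselves are this simple.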
We’ll see how this plays out in the future, but you can bet that sites like Google, Dropbox, and Amazon, which offer similar services (without making a deal with the record labels), are very relieved. Meanwhile, we will see how Apple iTunes deals with this in the future. (Will they no longer need agreements with the record labels?!) Read about it in Wired and Ars Technica.
Libraries Outsource Digitization, Get Access, Give Away IP: Preservation Is Not a One-Time Cost 2011/08/21
Posted by nydawg in Archives, Digital Archives, Digital Preservation, Information Technology (IT), Intellectual Property. Tags: BRTF, digital longevity, digital preservation, Internet Archive, media
One thing that still baffles me is the idea (still taught in many library schools) that, once done, digitization projects preserve materials forever. Occasionally a few people mention that “standards change” or “software becomes obsolete” or “media rot” or “links rot” or “bits rot,” or ideas of “media refreshment” or “migration” or emulation or whatever, but mostly the conventional wisdom is that once a collection is digitized it will remain accessible online and will be preserved “forever.”
So I was gratified to notice this important bit in the Blue Ribbon Task Force on Sustainable Digital Preservation and Access’s Interim Report, “Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation” (not the FINAL report): “More than this, preservation is not a one-time cost; instead, it is a commitment to an ongoing series of costs, in some cases stretching over an indefinite time horizon.” (p. 18) Since this is such an important point, especially when so many digitization projects are funded using transient or “one-time” vehicles, I was a little disappointed that it was ignored and not included in the Final Report.
The closest their “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information” Final Report comes to mentioning this is: “Decisions about longevity are made throughout the digital lifecycle. Bench scientists face a choice at the end of every research project about what to do with their data. . . . The curatorial team at a museum that is bringing down an exhibition for which many original images and design materials were created faces similar decisions about what to retain and whose responsibility it is to provide and fund long-term retention. Preservation decision makers run the gamut from university provosts, foundations, and philanthropic funders to anyone who has created a website that has potential value for reuse.” (p. 11)
Then I saw this piece, “Step Easily into the Digital Future,” in American Libraries Magazine, which shows that this idea has taken hold among cash-starved libraries looking for an easy (and cheap!) way to digitize their collections once and for all. Unfortunately, the trade-off seems to be that they are willing to pay for it AND let the digitizers keep the master copies, while libraries can download access copies. . . Hmm.
“Libraries know the future is digital, but how do we get there in these times of shrinking budgets and staffs? In a tough economy, a collaborative approach makes digitization possible for many libraries. By joining a mass digitization collaborative, the historical society, museum, public library, or academic institution new to digitization can launch a small project and unlock the doors to their hidden collections for the first time; the larger university or cultural heritage institution can mount a large-scale project and quickly achieve a digitization goal at low cost. The Lyrasis Mass Digitization Collaborative (MDC) is an example of a sustainable model that does not rely exclusively on grants or one-time funding; the collaborative works for libraries and cultural heritage institutions of all types and sizes.”
. . . “The MDC, administered by Lyrasis in partnership with the Internet Archive, is arguably the best deal going for libraries and similar institutions to get significant quantities of printed materials digitized and online-accessible very quickly and inexpensively,” said Gregory S. Sigman, acting librarian for the Music/Dance Library at Ohio University, in Lyrasis’s Solutions Magazine.
So how does it work? It sounds so easy, as long as you don’t think too much about it. (Wait, the MDC gets to own the Intellectual Property too?) “Participating in the collaborative makes digitization easy for participants, whatever the size of their collection and budget, and whether or not they have experience and staff expertise in digitization. In the collaborative model, many steps along the way to digitization are already in place.
“Participants do not need to purchase equipment, select a metadata schema or digitization standards, set up a technical infrastructure for digitization and delivery, or provide for hosting, storage, and preservation. They follow best practices and collection development guidelines established by the collaborative. The entire project workflow is already set up and streamlined. The process is extremely simple and conducive to very quick turnaround: Libraries place an order; select items for digitization; prepare metadata; and ship or deliver to the scanning center. The collaborative shares the new digital resources on the web through its partnership with the Internet Archive and the archive’s involvement in the Open Content Alliance. Participants may also download copies of the digital resources to add to their own digital collections.”
While cost-effective and efficient, passing off these responsibilities to third parties will not necessarily help libraries for much longer. As the Interim Report points out: “This preliminary finding (the authors note that their work is early and suggestive rather than exhaustive and definitive) points to the importance of effective management strategies early in the life cycle of information, confirming archivists’ long-held belief, based on their experience, that preservation begins at creation. Not all material acquired may have been created with preservation in mind, however; the observation is most relevant for organizational settings where there is a requirement and a commitment to maintaining a record for the long term. Further, acquisition and ingest tends to have a high setup cost, and particularly in the early days of digital preservation, every acquisition seems unique, requiring specialized lengthy analysis and processing strategies. Acquisition of at least partially processed material will become more routine and, most likely, more standardized over time.”
Let’s hope so, but it might be helpful if the BRTF’s Final Report addressed many of the good ideas presented in the Interim Report, and libraries would begin to understand that the most cost-effective way to digitize collections (after initial start-up costs and equipment purchases) may be DIY.
Oh, and meanwhile, Internet Archive Canada is laying off 75% of its staff! “Though the office had initially experimented with automated scanning robots, the machines were unable to adapt to the wide variety of manuscripts and books. In 2005, the Internet Archive developed their own machines called Scribes, equipped with two high-resolution digital cameras poised above a v-shaped desk. These machines require human operators to turn the pages, meaning that they are more expensive to run than automated robots, but can handle fragile texts. An experienced operator can turn and scan two pages every six seconds, but layoffs mean the number of operators will drop from 27 down to 11. Output is expected to drop significantly, from current levels of around 1,500 books a month to 250.”
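A quick back-of-the-envelope check on the figures quoted above (taking the reported numbers at face value) shows the expected output drop is much steeper than the cut in operators alone would suggest:

```python
# Figures as reported: scanning operators cut from 27 to 11,
# output expected to fall from ~1,500 to ~250 books a month.
operators_before, operators_after = 27, 11
books_before, books_after = 1500, 250

staff_cut = 1 - operators_after / operators_before    # ~0.59
output_cut = 1 - books_after / books_before           # ~0.83

print(f"{staff_cut:.0%} fewer operators, {output_cut:.0%} less output")
# prints: 59% fewer operators, 83% less output
```

Whether the gap reflects lost economies of scale or simply conservative projections, the arithmetic makes clear these layoffs cut deeper than the headcount alone implies.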
dk
###