From Scroll to Screen and Back: Vendor Lock-In and eBooks 2011/09/04
Posted by nydawg in Best Practices, Digital Archives, Digital Humanities, Digital Preservation, Information Literacy, Information Technology (IT). Tags: Amazon, Apple, ebooks, Kindle, saa, society of american archivists, vendor lock-in, web 2.0
I saw an interesting article in the NYTimes titled “From Scroll to Screen” which looks at the transition in print media over the last two thousand years. While the author is specifically looking at the transition from books to eBooks, he declares, “The last time a change of this magnitude occurred was circa 1450, when Johannes Gutenberg invented movable type. But if you go back further there’s a more helpful precedent for what’s going on. Starting in the first century A.D., Western readers discarded the scroll in favor of the codex — the bound book as we know it today.”
Like many archivists and librarians, I am highly interested in how this transition will play out. Last year I read William Powers’ excellent Hamlet’s Blackberry, which led me to new ways of thinking about media and the different formats used to carry data and information between stakeholders across time and space. . . .
So this NYTimes article by book critic Lev Grossman caught my interest in its discussion of how one format replaces another: “In the classical world, the scroll was the book format of choice and the state of the art in information technology. Essentially it was a long, rolled-up piece of paper or parchment. To read a scroll you gradually unrolled it, exposing a bit of the text at a time; when you were done you had to roll it back up the right way, not unlike that other obsolete medium, the VHS tape.”
He goes on to explain how those scrolls were items of prestige, probably because of the “scarcity” of scroll-creators. “Scrolls were the prestige format, used for important works only: sacred texts, legal documents, history, literature. To compile a shopping list or do their algebra, citizens of the ancient world wrote on wax-covered wooden tablets using the pointy end of a stick called a stylus. Tablets were for disposable text — the stylus also had a flat end, which you used to squash and scrape the wax flat when you were done. At some point someone had the very clever idea of stringing a few tablets together in a bundle. Eventually the bundled tablets were replaced with leaves of parchment and thus, probably, was born the codex. But nobody realized what a good idea it was until a very interesting group of people with some very radical ideas adopted it for their own purposes. Nowadays those people are known as Christians, and they used the codex as a way of distributing the Bible.”
And if you have ever tried to compare two or more passages in a book on an eReader, you may be interested to read: “The codex also came with a fringe benefit: It created a very different reading experience. With a codex, for the first time, you could jump to any point in a text instantly, nonlinearly.” This doesn’t work as easily in the tablet or eReader age, but stay tuned; I imagine at some point the technology will improve. “If the fable of the scroll and codex has a moral, this is it. We usually associate digital technology with nonlinearity, the forking paths that Web surfers beat through the Internet’s underbrush as they click from link to link. But e-books and nonlinearity don’t turn out to be very compatible.”
So as we move from the tried and trusted durable medium of the codex and hard-cover book (even if printed on cheap paper) to the electronic tablet (early versions, soon-to-be-obsolete operating systems, outdated software), our content management expertise and digital curation skills should become more valuable as new technologies evolve, media formats become obsolete and disposable, and our culture is put at risk.
But our hands are tied. Even to address the pressing concerns of eBooks and eReaders and tablets, archivists are left out in the cold. We dare not say anything about vendor lock-in regarding the Kindle’s proprietary formats, because they are Amazon’s Intellectual Property. We cannot say anything about vendor lock-out in regard to Apple’s iPad not playing (hot) Flash videos (see Steve Jobs’ “Thoughts on Flash”), which causes problems when accessing Fedora through its Flash application. We cannot even mention the fact that iPads have no support for portable SD cards or USB 2.0 external drives. In other words, if you want to get information on (or off) your iPad, you probably have to email or upload it. . . 😦
So what can we do? Or, more clearly, what should a digital archivist know to catalog and describe when working with born-digital materials?! Well, of course there’s so much (not everything entirely relevant though!), but at the very least, better format (not just medium, but format, and maybe codec) descriptions can create better strategies leading to better plans, processes and best practices. And keep your damn books!
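As a toy illustration of what minimal format-level description might look like (the function name and fields here are my own invention; a real workflow would use a tool such as DROID, FITS, or MediaInfo for deeper format and codec identification):

```python
import hashlib
import mimetypes
from pathlib import Path

def describe_file(path):
    """Build a minimal format-level description of a born-digital file.

    Records a format guess (from the filename extension) plus a fixity
    checksum -- the bare minimum needed before planning migration or
    emulation strategies.
    """
    p = Path(path)
    data = p.read_bytes()
    mime, _ = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "size_bytes": len(data),
        "mime_type": mime or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Even a record this small lets a future archivist ask "which files claim to be PDFs?" or "has this file changed since ingest?" without opening the bitstream.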
I’ll admit that I finally broke down and bought an eReader. Since I was going to be travelling to Chicago for the SAA annual meeting, there were many articles I wanted to read and think about in advance, so for the last few months I had been searching for the right one. Of course, I was quite wary of Kindles because of the proprietary format and wasn’t sure how well one would handle PDFs (or if it could at all), but friends suggested the Digital Reader or the Sony eReader, and at least one friend suggested checking out the Nook from Barnes and Noble.
I wasn’t really sure what I wanted, and I didn’t really care that much. Basically, I wanted anything that would let me read PDFs, most of which I would download from the internet or, specifically, from the Society of American Archivists’ (SAA) American Archivist journal. Of course, I also found some excellent reads on Project Gutenberg and the Internet Archive (where, btw, they’re preserving physical books!). I’m really interested in e-ink technology, so that was one factor; the other was that I didn’t want to pay more than $130. (Another factor, on which I thought I would have to compromise, was that I wanted an eReader with two screens that would open like a book.)
Well, as you might expect, my research was not leading me to any answers, and I had almost decided to just go with one or the other (whichever was cheapest), knowing it would be a temporary solution until I could afford a nice tablet computer. But then one day I got an email offer from Woot for a one-day-only clearance sale on the Entourage Pocket Edge, a 2-screen dualbook (eReader & tablet)! So I picked that up and I love it. (Yes, there are problems, but it works for reading PDFs, annotating them, drawing in a journal, surfing the web on the tablet side, etc.) Maybe I’ll write more on it later, but for now, I hope you’ll give a long thought to what we’ll lose if we give up functional non-linearity in our published works! (And I don’t mean Digital Humanities people with their datamining techniques.)
Football GOOOOOOL: Describing Every Action or Each Event 2011/08/30
Posted by nydawg in Digital Archives, Information Technology (IT), Intellectual Property, Media. Tags: football, on-demand, soccer, sports, streaming media
When I was a kid, I used to play soccer and I loved playing and watching the game. To me, the sport was all about teamwork, continuous movement, and evolving strategies. Now, years later, I still like the sport, but never play and only occasionally watch it on TV. Looking at it from an archivist’s perspective, there are a few points that merit mentioning.
The first point is the distinction between soccer and football. In the US we call it soccer, but almost everywhere else in the world it is known as football (not “American football”). This got me thinking about the way a culture names a sport, a location (the “English” Channel), or even an art movement (Dada?). If digital humanities specialists were trying to research the development of the sport in this country and around the world, it might be a more difficult proposition because the terms used are different . . . and may have changed over the years.
Another interesting difference between soccer and football on TV is the way the commentators describe it. It is infuriating to watch on American TV because the English-speaking commentators are usually so caught up in telling the dramatic (or financial) backstory of the players that they neglect to accurately describe the action on the field as it happens. Although I’m not fluent in Spanish, I much prefer watching soccer in Spanish because the announcers are telling a story in real time . . . and then when someone shoots and scores? GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOLLLLLL! The announcers are excited as the plot develops. The point I’m trying to make is that although the “content” of the game (the action, the video, the moving images) is the same, the Spanish-language broadcast (with its play-by-play) is, in my opinion, a more valuable commodity because it is more descriptive (even though I don’t speak the language), especially for those who do not know (or care about) Manchester United or the football clubs in the UK (or elsewhere). I’m not interested in learning that David Beckham used to play with so-and-so, but if someone describes the six passes leading to a shot on goal, please do tell! Like other soccer fans, I am interested in the process and how the game develops, or, specifically, in knowing that the left wing crosses the ball to the forward who is guarded by the fullback (or whatever). If the people at FIFA or the World Cup wanted to build up an audience, I think they should consider using Spanish-language commentators for the play-by-play and then hiring translators to put it in English. (Yes, I’m sure that could probably be automated.) Let’s give soccer its due.
And this brings me to the point about the game, why I loved it so much, how it’s different from baseball (or American football), and how it relates to archiving. In baseball, tennis, and American football, the whole game is set up as one event (pitch, serve, or down) after another, and each event is described (and archived) as something with a beginning, middle, and end. (“The pitcher winds up, releases and throws, called strike one.”) In soccer, though, because it is so dependent on the uninterrupted passage of time, the whole game is about process and the subtle changes that happen as players work together to accomplish a “goal”. I find this fascinating because, as in archiving, parameters need to be set in order to determine best practices. For example, professional games usually run 90 minutes, but only the referees keep accurate time, so even if the clock expires, the game continues until the referee says it is finished. It’s not overtime; the referees keep track of how much “real” playing time is “owed” from injuries, fouls, and other circumstances that arise in play.
In some ways, soccer is like basketball, but even that sport is set up and described as a series of events, and it moves at such a fast pace that play-by-play announcers can’t announce every pass, so they usually focus on the end result: “He shoots and scores!” I guess what I’m trying to say is that baseball and those other sports can be described mechanistically, or their descriptions can be automated.
A few years ago, I worked at a video indexing and retrieval software company called Virage (later bought by Autonomy, which was itself recently bought by HP), in a small department in New Jersey that “captured” (encoded, digitized, etc.) and cataloged every Major League Baseball game of the 2001 season, from the opening pitch of the first game to the last out of the last game of the World Series. Since I was a video technician, I was mostly responsible for making sure the video feeds (content) were captured, saved, renamed, and uploaded so they would be accessible (and fully searchable) online within an hour after the game concluded, but we also had stringers in the ballparks keeping score of the games and electronically sending the data to our computers in New Jersey. At the end of the game, the data and footage would be combined so a fan or any subscriber could search for any event (e.g. every double Derek Jeter hit in the month of May [but not every pitch]) and create a highlights reel to be streamed or (maybe) saved locally. MLB still does it, probably using the same software, but now they catalog every pitch and every commercial, and rather than simply rely on the software to combine the data, they hire seasonal workers to describe everything. What’s interesting to me is that the original project was simplified and stripped down, and it provided an excellent way to standardize scoring and create a functional system.
Working with these brilliant software engineers (some now work for Google), I learned a lot about how structured data can be used to create a more valuable commodity. If the software is built to match the scoring against the video feeds, then it becomes a mathematical exercise in which an event like a home run will take about 45 seconds, whereas a strikeout (third strike) may take less than a second. By using data in this way, Virage figured out how to quickly and efficiently make every baseball game fully searchable with a very small group of people. Today, the cataloging has gotten more descriptive, but almost to the point of being too descriptive, and not really functional anymore (in my opinion). Who would ever search for “ground out,” “from knees,” “last out,” or “premier plays” instead of “no-hitter”!
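To illustrate the general idea (with made-up names, fields, and numbers; this is not Virage’s actual schema), once a structured scoring log is aligned with video timecodes, building a highlights reel becomes a simple filter over the event data:

```python
from dataclasses import dataclass

@dataclass
class Event:
    batter: str
    result: str      # e.g. "double", "strikeout"
    month: int       # month of the season
    start: float     # seconds into the captured video feed
    duration: float  # rough clip length for this event type

# Hypothetical event log; in the real system, stringers' scoring data
# was combined with the captured feeds after each game.
events = [
    Event("Jeter", "double", 5, 1200.0, 45.0),
    Event("Jeter", "strikeout", 5, 3300.0, 8.0),
    Event("Jeter", "double", 6, 900.0, 45.0),
]

def highlight_reel(events, batter, result, month):
    """Return (start, end) clip ranges for every matching event."""
    return [
        (e.start, e.start + e.duration)
        for e in events
        if e.batter == batter and e.result == result and e.month == month
    ]

# "Every double Derek Jeter hit in the month of May":
print(highlight_reel(events, "Jeter", "double", 5))  # [(1200.0, 1245.0)]
```

The design choice worth noticing: because the event type implies an approximate clip length, no human has to mark in/out points for every play.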
I suppose it’s a testament to the original idea that MLB is using the software on a larger scale, with more (internal) stakeholders than simply fans creating on-demand highlight reels, but I’m still left wondering why, and how, any logger would want to tag everything: “It is not only the game action that is tagged. If a squirrel runs onto the field, the play will be tagged with ‘animal.’ If there is a shot of a man sipping a beer, there is a ‘drinking’ option under the ‘fans’ category.” I suppose if you’re not a professional archivist or cataloger it might seem “cool” to be able to do that, but as an archivist I’m left thinking, “Are some taggers wasting their time tagging thousands of events in a 3-hour game?”
But, I guess if there’s an app for that, there must be some function somewhere at some time.
And of course, there are some other very cool things that MLB is able to do with its content in order to maximize subscriptions, but really, do you need to hire someone to point out that a squirrel ran on the field one day?
Paradigm Shift from Economics of Scarcity to Abundance & Scarcity of Common Sense 2011/08/24
Posted by nydawg in Archives, Best Practices, Digital Archives, Information Technology (IT), Intellectual Property. Tags: access, common sense, data deluge, digital deluge, information age, saa, society of american archivists
One of the most exciting (and scary!) aspects of being a digital archivist these days is that everyone is living through a transition from the Atomic Age (age of atoms) to the Information (Digital) Age (age of bits), but archivists are also living through a professional paradigm shift from the economics of scarcity to the economics of abundance. There is so much born-digital information created every day, month, year, decade, etc., that it is overwhelming just to contemplate how much information is created (and stored), and, while it seems like archivists are doing more and more work, there is some question about the metrics used to show that we are preserving the most significant material. (e.g. NARA is accessioning 200 million emails from the George W. Bush Administration which, as I’ve blogged previously, works out to nearly one email every second of the Administration).
For me, this is a fascinating time for archivists because few people seem to understand how significant this transition is and will be. In fact, from my experiences in library schools, many older faculty members seem unwilling (or unable?) to articulate this transition and, by extension, cannot even teach younger students how these changes will impact their lives and professions. So rather than address these issues head-on, some educators ignore them and assign student readings from books written in the early 1990s or before. (I have nothing against the study of “history”, but practicality would be helpful for students trying to get jobs as Information or Knowledge Managers.)
Years ago, for example, when President John F. Kennedy wrote a memo or correspondence, his secretary would type it up in triplicate and send one copy to the intended recipient, file a second copy in the office, and send the third copy to the archives. Decades later, if somebody wanted to find the original, the office copy, or the archived version, it would most likely be filed away and accessible in its original paper format. This system worked very well for hundreds and probably thousands of years! In the Information Age, a similar memo for President Barack Obama might be created by the secretary as a born-digital Word file, and copies of the file could (or should) be distributed in a similar manner (or perhaps converted to ISO 32000 PDF/A format for stable long-term preservation). This may or may not be happening, but one big difference is that these electronic records (or born-digital files) are dependent on the software used to create them, and if the software is upgraded or replaced and newer versions are not backwards compatible, it may prove difficult to find, access, and open those files. (It is also important to note that those files may have been stored on any variety of media formats which are no longer supported or accessible — e.g. remember Jaz drives, Zip disks, CD-ROMs, or 5.25″ floppy disks?)
To prevent losing mass quantities of materials, many libraries subscribe to LOCKSS, or Lots Of Copies Keep Stuff Safe. This may work for electronic journals created in PDF/A format, but it doesn’t work so well if ALL those copies are in a format (or font) that is obsolete or unsupported, and/or are stored on a medium (floppies) that is no longer accessible on newer technologies (e.g. iPads don’t have a DVD-ROM drive or a USB port)!
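The core of the "lots of copies" idea can be reduced to a sketch: keep several copies and audit them against each other, so a damaged copy can be detected and repaired from the majority. (This is a toy illustration of the principle, not the actual LOCKSS polling protocol, and as noted above it only helps if the format itself stays readable.)

```python
import hashlib
from collections import Counter

def audit_copies(copies):
    """Given the raw bytes of several copies of one object, return the
    majority checksum and the indices of copies that disagree with it
    (candidates for repair from a good copy)."""
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    majority, _ = Counter(digests).most_common(1)[0]
    bad = [i for i, d in enumerate(digests) if d != majority]
    return majority, bad

good = b"PDF/A bytes ..."
copies = [good, good, b"bit-rotted bytes"]
_, damaged = audit_copies(copies)
print(damaged)  # [2]
```

Note that the audit detects bit rot but says nothing about format obsolescence: three perfectly intact copies of an unreadable file are still unreadable.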
But this strategy may not work for Digital Archives because of the difference between accessibility and access or, as James Gleick, author of The Information: A History, A Theory, A Flood, puts it: “We’re in the habit of associating value with scarcity, but the digital world unlinks them. You can be the sole owner of a Jackson Pollock or a Blue Mauritius but not of a piece of information — not for long, anyway. Nor is obscurity a virtue. A hidden parchment page enters the light when it molts into a digital simulacrum. It was never the parchment that mattered.”
As Maria Popova puts it in her excellent essay “Accessibility vs. access: How the rhetoric of ‘rare’ is changing in the age of information abundance”: “Because in a culture where abundance has replaced scarcity as our era’s greatest information problem, without these human sensemakers and curiosity sherpas, even the most abundant and accessible information can remain tragically ‘rare.’”
Archivists and librarians have mastered the processes and practices of an earlier era of scarcity (e.g. item-level description) and seem unwilling (or unable) to consider a new and more efficient model. I was trying to think of an analogy for this, and it hit me at Kennedy Airport, where I am waiting for my flight to SAA’s annual meeting in Chicago: for hundreds, even thousands, of years, men and women have moved around while struggling to pack and carry their luggage, but it wasn’t until 1970 that Bernard Sadow “invented” the suitcase with wheels. What took so long?
It’s hard to say exactly what took so long, but it seems likely that travelers (especially macho travelers) had gotten so used to the inconvenience of lugging their heavy luggage through changing transportation systems that no one considered an easier, faster and better way. But ultimately “common sense” won out, and now just about everyone (except me) has wheels on his/her luggage. Why am I still holding out?! I’m still waiting for suitcases that fly!
MP3tunes Cloud Music Storage Is Not a Crime 2011/08/23
Posted by nydawg in Digital Archives, Digital Preservation, Information Technology (IT), Intellectual Property, Media. Tags: cloud storage, music, record labels
Michael Robertson is a name that is very familiar to those of us who have been involved in digital media for the last decade or so. He founded his first company, MP3.com, in 1997 and, after a successful IPO, was targeted and shut down by the record labels. Obviously he still has a few disruptive ideas that directly threaten the labels’ existence, including MP3tunes. So this sounds like big news: “The disk drives powering Dropbox, Amazon’s Cloud Drive, and Google Music likely issued a small sigh of relief Monday, after a federal court judge found that the MP3tunes cloud music service didn’t violate copyright laws when it used only a single copy of a MP3 on its servers, rather than storing 50 copies for 50 users.”
The recording industry had argued that music locker sites operating without licenses from copyright holders are illegal.
Specifically, the part that stands out for archivists and online storage lockers is this bit from Ars Technica: “The ruling contains even more good news for music locker sites and fans of sensible copyright laws. As we reported last month, a key 2008 decision had suggested that locker sites would be more vulnerable to copyright infringement claims if they used deduplication technology to save hard drive space. That ruling was based on the theory that keeping a single “master copy” of a work and sending it to multiple users would constitute an infringing public performance.”
But Judge William H. Pauley III said in this case that MP3tunes’ system complies with that ruling. “Importantly, the system preserves the exact digital copy of each song uploaded to MP3tunes.com,” Pauley ruled. “Thus, there is no ‘master copy’ of any of EMI’s songs stored on MP3tunes’ computer servers.” Curiously, the judge goes on to state that “MP3tunes does not use a ‘master copy’ to store or play back songs stored in its lockers. Instead, MP3tunes uses a standard data compression algorithm that eliminates redundant digital data.” Sherwin Siy of Public Knowledge hailed the ruling, saying it “paves the way for both cloud locker services and integrated media search engines.”
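The deduplication idea at issue can be sketched as a toy content-addressed store (all names here are hypothetical, and this is in no way MP3tunes’ actual architecture): identical uploads share one stored blob, while each user’s locker keeps its own reference to it.

```python
import hashlib

class Locker:
    """Toy content-addressed storage: identical uploads are stored once,
    but every user's locker keeps its own independent reference."""

    def __init__(self):
        self.blobs = {}    # digest -> bytes (each unique blob stored once)
        self.lockers = {}  # user -> list of digests they uploaded

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)          # dedup: store once
        self.lockers.setdefault(user, []).append(digest)
        return digest

store = Locker()
song = b"mp3 bytes"
store.upload("alice", song)
store.upload("bob", song)
print(len(store.blobs))  # 1 -- fifty users would still mean one stored copy
```

The legal nuance in the ruling is precisely about whether "store once, reference many times" counts as keeping a "master copy"; the mechanics themselves are this simple.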
We’ll see how this plays out in the future, but you can bet that sites like Google, Dropbox, and Amazon, which offer similar services (without making a deal with the record labels), are very relieved. Meanwhile, we will see how Apple iTunes deals with this in the future. (Will they no longer need agreements with the record labels?!) Read about it in Wired and Ars Technica.
Libraries Outsource Digitization, Get Access, Give Away IP: Preservation Is Not a One-Time Cost 2011/08/21
Posted by nydawg in Archives, Digital Archives, Digital Preservation, Information Technology (IT), Intellectual Property. Tags: BRTF, digital longevity, digital preservation, Internet Archive, media
One thing that still baffles me is the idea (still taught in many library schools) that, once done, digitization projects preserve materials forever. Occasionally a few people mention that “standards change” or “software becomes obsolete” or “media rot” or “links rot” or “bits rot,” or ideas of “media refreshment” or “migration” or emulation or whatever, but mostly the conventional wisdom is that once a collection is digitized it will remain accessible online and will be preserved “forever.”
So I was gratified to notice this important bit in the Blue Ribbon Task Force on Sustainable Digital Preservation and Access’s Interim Report, “Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation” (not the FINAL report): “More than this, preservation is not a one-time cost; instead, it is a commitment to an ongoing series of costs, in some cases stretching over an indefinite time horizon.” (p. 18) Since this is such an important point, especially when so many digitization projects are funded using transient or “one-time” vehicles, I was a little disappointed that it was ignored and not included in the Final Report.
The closest their “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information” Final Report comes to mentioning this is: “Decisions about longevity are made throughout the digital lifecycle. Bench scientists face a choice at the end of every research project about what to do with their data. . . . The curatorial team at a museum that is bringing down an exhibition for which many original images and design materials were created faces similar decisions about what to retain and whose responsibility it is to provide and fund long-term retention. Preservation decision makers run the gamut from university provosts, foundations, and philanthropic funders to anyone who has created a website that has potential value for reuse.” (p. 11)
Then I saw this piece, “Step Easily into the Digital Future,” in American Libraries Magazine, which shows that this idea has taken hold among cash-starved libraries looking for an easy (and cheap!) way to digitize their collections once and for all. Unfortunately, the trade-off seems to be that they are willing to pay for it AND let the digitizers keep the master copies, while libraries can download access copies. . . Hmm.
“Libraries know the future is digital, but how do we get there in these times of shrinking budgets and staffs? In a tough economy, a collaborative approach makes digitization possible for many libraries. By joining a mass digitization collaborative, the historical society, museum, public library, or academic institution new to digitization can launch a small project and unlock the doors to their hidden collections for the first time; the larger university or cultural heritage institution can mount a large-scale project and quickly achieve a digitization goal at low cost. The Lyrasis Mass Digitization Collaborative (MDC) is an example of a sustainable model that does not rely exclusively on grants or one-time funding; the collaborative works for libraries and cultural heritage institutions of all types and sizes.”
. . . “The MDC, administered by Lyrasis in partnership with the Internet Archive, is arguably the best deal going for libraries and similar institutions to get significant quantities of printed materials digitized and online-accessible very quickly and inexpensively,” said Gregory S. Sigman, acting librarian for the Music/Dance Library at Ohio University, in Lyrasis’s Solutions Magazine.
So how does it work? It sounds so easy, as long as you don’t think too much about it. (Wait, the MDC gets to own the Intellectual Property too?) “Participating in the collaborative makes digitization easy for participants, whatever the size of their collection and budget, and whether or not they have experience and staff expertise in digitization. In the collaborative model, many steps along the way to digitization are already in place.
“Participants do not need to purchase equipment, select a metadata schema or digitization standards, set up a technical infrastructure for digitization and delivery, or provide for hosting, storage, and preservation. They follow best practices and collection development guidelines established by the collaborative. The entire project workflow is already set up and streamlined. The process is extremely simple and conducive to very quick turnaround: Libraries place an order; select items for digitization; prepare metadata; and ship or deliver to the scanning center. The collaborative shares the new digital resources on the web through its partnership with the Internet Archive and the archive’s involvement in the Open Content Alliance. Participants may also download copies of the digital resources to add to their own digital collections.”
While cost-effective and efficient, passing off these responsibilities to third parties will not necessarily help libraries for much longer. As the Interim Report points out: “This preliminary finding (the authors note that their work is early and suggestive rather than exhaustive and definitive) points to the importance of effective management strategies early in the life cycle of information, confirming archivists’ long-held belief, based on their experience, that preservation begins at creation. Not all material acquired may have been created with preservation in mind, however; the observation is most relevant for organizational settings where there is a requirement and a commitment to maintaining a record for the long term. Further, acquisition and ingest tends to have a high setup cost, and particularly in the early days of digital preservation, every acquisition seems unique, requiring specialized lengthy analysis and processing strategies. Acquisition of at least partially processed material will become more routine and, most likely, more standardized over time.”
Let’s hope so, but it might be helpful if the BRTF’s Final Report addressed many of the good ideas presented in the Interim Report, and libraries would begin to understand that the most cost-effective way to digitize collections (after initial start-up costs and equipment purchases) may be DIY.
Oh, and meanwhile, Internet Archive Canada is laying off 75% of its staff! “Though the office had initially experimented with automated scanning robots, the machines were unable to adapt to the wide variety of manuscripts and books. In 2005, the Internet Archive developed their own machines called Scribes, equipped with two high-resolution digital cameras poised above a v-shaped desk. These machines require human operators to turn the pages, meaning that they are more expensive to run than automated robots, but can handle fragile texts. An experienced operator can turn and scan two pages every six seconds, but layoffs mean the number of operators will drop from 27 down to 11. Output is expected to drop significantly, from current levels of around 1,500 books a month to 250.”
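A quick back-of-the-envelope check on the figures quoted above (taking the reported numbers at face value) shows the expected output drop is much steeper than the cut in operators alone would suggest:

```python
# Figures as reported: scanning operators cut from 27 to 11,
# output expected to fall from ~1,500 to ~250 books a month.
operators_before, operators_after = 27, 11
books_before, books_after = 1500, 250

staff_cut = 1 - operators_after / operators_before    # ~0.59
output_cut = 1 - books_after / books_before           # ~0.83

print(f"{staff_cut:.0%} fewer operators, {output_cut:.0%} less output")
# prints: 59% fewer operators, 83% less output
```

Whether the gap reflects lost economies of scale or simply conservative projections, the arithmetic makes clear these layoffs cut deeper than the headcount alone implies.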
dk
###