Catching up with past NDSA Innovation Awards Winners: Samantha Abrams

Nominations are now being accepted for the NDSA 2020 Innovation Awards.

Samantha Abrams won a 2016 Innovation Award in the Future Steward category. Samantha was recognized for her work with the Madison Public Library and its Personal Archiving Lab as well as her initiative to create innovative projects and classes. She is currently the Web Resources Collection Librarian for the Ivy Plus Libraries Confederation.

What have you been doing since receiving an NDSA Innovation Award?

Since May 2017, I’ve been the Web Resources Collection Librarian (read: Web Archivist) for the Ivy Plus Libraries Confederation, based at Columbia University. (When I received the award, I was the Community Archivist at StoryCorps.) In support of the Confederation, I manage the Web Collecting Program, a collaborative collection development effort to build curated, thematic collections of freely available, but at-risk, web content in order to support research at participating Libraries and beyond. Right now, we have 21 public collections on topics ranging from elections, to webcomics, to video games, to vaccination, and just about everything in between. We’re also working on a Coronavirus collection that will document social responses to the virus — music videos, art, writing, and more — in countries all over the world.

I’ve also been teaching Introduction to Web Archiving at the University of Wisconsin-Madison’s iSchool — one course for Master’s students, and one for Continuing Education students. I’ve engineered the course to be an overview of web archiving concepts: students pick a theme around which to collect, write a collection policy, use Archive-It, Conifer, and Perma.cc to crawl identified websites, and then, at the end of the course, make the case — or not! — for web archiving at their institution. It’s been a lot of fun!

What did receiving the NDSA Award mean to you?

I received the Future Steward award in 2016, right before I graduated with my Master’s degree in Library and Information Studies, which felt like a huge vote of confidence, personally — like I had picked the right path, and that I should keep at it. Receiving the award also motivated me to get more involved with both the National Digital Stewardship Alliance and the Digital Library Federation — I served as a member of the Planning Committee for the DLF Forum in 2019, and as a member of NDSA’s Innovation Working Group (which selects Innovation Award winners) in both 2019 and 2020. Both are excellent organizations with which to work!

What efforts, advances, or ideas over the last few years have you been impressed with or admired in the area of digital stewardship?

I still deeply admire the work of Documenting the Now. After the murder of George Floyd, Documenting the Now launched Archivists Supporting Activists, which connects archivists and memory workers willing to volunteer their time and expertise with activists interested in documenting their vital work. In the same vein, the Blackivists ("a collective of trained Black archivists who prioritize Black cultural heritage preservation and memory work") recently released a call to action to "ethically and comprehensively archive" both the Black experience during the ongoing Covid-19 pandemic and the current uprisings brought about by racist police violence against Black people. Both organizations, and others like Project Stand, encourage archivists to ethically and carefully engage with the communities they serve and document, and to be deliberate in their work as they collect and attempt to make sense of current events. I also remain deeply inspired by so many of my students: their creativity and willingness to approach web archiving and digital preservation with a careful eye is refreshing, and it constantly recenters and reframes my own day-to-day work.

The 2020 NDSA Agenda discusses a number of web and social media archiving challenges, one of them being how labor-intensive much of the work is. How do you make your labor visible? Do you have any tips on advocating for additional resources for web archiving?

Oh, this is such a good question. I’m lucky because my position with the Confederation is full-time — I spend 40 hours per week on web archiving, and nothing else. (I’m also deeply indebted to Jean Park, the Program’s Bibliographic Assistant, who helps with metadata creation, quality assurance, and just about everything in between.) I make sure I’m direct with supervisors and colleagues about how long it will take to make our collections public — I can’t just drop a bunch of sites into Archive-It and make them available to researchers the next day: there’s metadata creation, running crawls, and quality assurance. And it’s quality assurance that takes the longest: I’ve easily spent a few hundred work hours since April performing quality assurance on the Confederation’s forthcoming Coronavirus collection — does this video play? Was this spreadsheet captured? Do the images look like their counterparts on the live site? Our goal, Program-wise, is to view each crawled website at least once before it’s made public — and sometimes that means a site won’t be made public for weeks or months. (I also have my students spend a week on quality assurance, which I know they don’t love — but it prepares them to go back to their own supervisors and directors and really push for the resources — and time — they’ll need to adequately support a fleshed-out web archiving program at their own institutions.)

Is there anything we didn’t ask you that you want to add?

We’re still accepting nominations for the 2020 Innovation Awards! Please help acknowledge and celebrate a new cohort of innovators by submitting worthy nominees — or by nominating yourself — via this form. Nominations are due by Friday, September 4, 2020.

Catching up with past NDSA Innovation Awards Winners: Archive Team

Nominations are now being accepted for the NDSA 2020 Innovation Awards.

Archive Team won a 2013 Innovation Award in the Organization category. Archive Team was recognized both for its aggressive, vital work in preserving websites and digital content slated for deletion and for its work advocating for the preservation of digital culture within the technology and computing sectors. The answers below were provided by Jason Scott.

What has Archive Team been doing since receiving an NDSA Innovation Award? 

Since receiving the award, Archive Team has gone through a half-dozen generations of volunteers, entering as idealistic young eggs and leaving as some sort of burnt-crisp buffalo wings. Our numbers have grown and shrunk but are generally high, as people realize how fundamentally fragile and undependable the web continues to be, and how much someone, anyone, is needed to provide a decent mirror of user data.

We’ve been involved in well over 100 major projects to save websites, especially user-created works, over the years, plus untold thousands of tiny one-off jobs that our automated mirroring service, ArchiveBot, has been sent to do. On an average day, we generate a terabyte of preserved web content that often ends up at the Internet Archive.

At the NDSA event, we announced that we had modified WGET to support WARC. Just this year, we released a new version of our WGET build with a strong focus on compression, meaning we’re saving a lot of space. For a rag-tag set of maniacs, that’s pretty good.

What did receiving the NDSA Award mean to you?

It mostly let us poke our heads into the established archivist world, which is a nice world even if we’re not a part of it. Being an activist can make you start to believe you’re the only force in the world, running forward without any need for collaboration or peers. The award gave us contact with others and the awareness that we’re not alone, which made us better and more mindful of practices and efforts doing work similar to ours. We’re still the funniest, though.

What efforts, advances, or ideas over the last few years have you been impressed with or admired in the area of digital stewardship?

The Software Heritage foundation has been tireless in recognizing the importance of software, and the source code behind it, and in keeping that level of history alive. Going from a few people advocating for it to companies like GitHub now working to mirror as many repositories as possible in solid data stores is a big deal.

There has also been the discovery and rediscovery that history is not only written by the winners, but stored by them as well. Along with this comes a greater need for records of websites and history to prove points and provide evidence, and we’ve been delighted to be part of that.

One of the 2020 NDSA Agenda research priorities is environmental sustainability and sustainability of digital collections. How is Archive Team addressing these issues?

We’re not; we’re too busy saving thousands of URLs that are dying out from underneath us. We haven’t caught a breath in 11 years.

Is there anything we didn’t ask that you want to add? 

Hello, established Archive People! Archive Team is always looking for you to moonlight and join our rough-and-tumble band, the singing, dancing chorus line of volunteers we run through like off-brand batteries. Test yourself at https://archiveteam.org.

Catching up with past NDSA Innovation Awards Winners: DataUp

Nominations are now being accepted for the NDSA 2020 Innovation Awards.

DataUp from the California Digital Library won a 2013 Innovation Award in the Project category. DataUp was recognized for creating an open-source tool uniquely built to assist individuals aiming to preserve research datasets by guiding them through the digital stewardship workflow, from dataset creation and description to the deposit of their datasets into public repositories. The following individuals are recognized for their contributions to DataUp and subsequent projects, and for their responses to this Q&A.

The original CDL DataUp team included:

  • Stephen Abrams, then CDL Associate Director of the UC Curation Center, currently Head of Digital Preservation, Harvard Library
  • Patricia Cruse, then CDL Director of the UC Curation Center, subsequently Executive Director of DataCite, and now retired
  • John Kunze, CDL Identifier Systems Architect
  • Carly Strasser, then CDL Data Curation Project Manager, currently Program Manager for Open Science at the Chan Zuckerberg Initiative

Current CDL staff responsible for the successor Dash and Dryad projects are:

  • John Chodacki, CDL Director of the UC Curation Center
  • Daniella Lowenberg, CDL Research Data Specialist and Product Manager

What has DataUp been doing since receiving an NDSA Innovation Award?

DataUp was conceived by the University of California Curation Center (UC3) at the California Digital Library (CDL) as an immediate response to the needs of researchers for an intuitive, effective, and self-service data curation platform.  DataUp initially targeted support for tabular datasets via an easy-to-use UI accessible to researchers themselves, rather than requiring mediation by librarians or archivists.  At the same time, CDL was engaged in other related initiatives, including the DataShare open data publication system.  Over time, the curatorial intentions and functional capabilities of both systems began to overlap considerably.  Consequently, in 2014 CDL decided to converge the two systems into a common technical platform under the Dash name.  More recently, similar synergies were recognized between Dash and the Dryad research data repository, which led to the integration of the Dash system as the new Dryad technology platform.  Throughout this multi-year evolution, the core principles and goals of the original DataUp project have remained steadfast: providing the best possible support to the scholarly community for the long-term curation, publication, and reuse of critical research data.

What did receiving the NDSA Award mean to you?

Receiving the NDSA Innovation Award was very gratifying as a public affirmation by a significant stakeholder community of the value and beneficial impact of the DataUp vision, project, product, and service. While the DataUp team was convinced of that value right from the start, it is always nice to have those beliefs recognized and confirmed by colleagues and peers.

What efforts, advances, or ideas over the last few years have you been impressed with or admired in the area of digital stewardship?

Tremendous strides forward have been made in digital stewardship over the past years. This has been facilitated in large part by mutual recognition among all implicated stakeholders (scholars, administrators, librarians, archivists, funders) of the nature of common problems and needs and of the necessity for a coordinated response. Positive outcomes have followed from the open contribution of their individual perspectives and strengths in collaborative efforts. For example, the success of the DataUp/DataShare/Dash/Dryad activity depended on active participation over many years by the CDL, the University of California Libraries, the DataONE network, Microsoft Research, the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, DataCite, the Make Data Count initiative, and the Dryad community. Looking towards the future, there are very promising avenues of exploration regarding the application of big data and machine learning techniques to the proactive curation of research data and other forms and genres of digital content deserving long-term stewardship.

The DataUp project began in 2011 – nearly a decade ago! Various challenges of preserving and providing access to research data sets continue to be discussed, and have been addressed in the 2014, 2015, and 2020 NDSA Agendas for Digital Stewardship. Where do we go from here?

The guiding tenets originally encapsulated by DataUp and its DataShare, Dash, and Dryad successors are fully consistent with the NDSA Agenda’s recommendations for organizing and ensuring long-term access to scientific data sets, including support for at-scale curation, promotion of the FAIR principles, and collaborative attention to innovation and sustainability (https://osf.io/7sfc6/, p. 26). Three specific concerns seem particularly challenging and call for concerted attention. First, the academy as a whole needs to continue developing more flexible and sustainable financial practices for the curation of all legitimate research outputs, including research data, so as not to disincentivize or confound widespread adoption of effective RDM tools and practices. Second, greater automation and more intuitive self-service operation are still needed for the contribution of research data to managed curation environments such as Dryad. Ideally, these actions would be automatic side effects of other, more primary activities and workflows in which scholars and researchers are already engaged. And third, more can be done to create actionable linkages between research publications, research data, and research software, all of which interact within a cohesive and co-dependent web of scholarly activity and communication. We feel that DataUp was a pioneering attempt at addressing these issues, and we look forward to continuing progress towards these important goals.
