NARA and the web harvest: a discussion of the issues

As I wrote last Thursday, the National Archives has decided not to conduct a harvest of Federal web sites at the end of this Presidential administration. In my post, I referred to this as a “public relations error.” It looks like I was right. Take a look at some of these links if you want to see how this story is being portrayed on the web:

After my post went up, I was encouraged to look into this situation more carefully. Many of the issues at stake in this controversy have their roots in key archival principles, and I think it’s our duty as archivists to bring understanding of those issues to the public debate. I’ll provide some basic background first, then discuss some of the appraisal and resource issues.

In January 2001 NARA collected a “web snapshot” by having each Federal agency’s CIO capture the agency’s public web site and transfer the copy to NARA. The intent was to “ensure that we are able to document at least in part agency use of the Internet at the end of the Clinton Administration.” You can read more about this effort here. I do not believe the web records collected by this effort are currently available online.

In January 2005 NARA issued its Guidance on Web Records, which clarified that Federal web sites (both public-facing and intranets) are Federal records and must be scheduled like any other Federal record. It is the responsibility of Federal agencies and their records managers to develop schedules for their web records and submit them to NARA for approval. Those aspects of the agency web site which are determined by the agency and NARA to be of permanent value will be transferred to NARA custody in accordance with the disposition instructions in the schedule and NARA’s transfer guidance.

Around the same time the web record guidance came out, NARA conducted a web harvest of all government web sites as they existed prior to January 20, 2005. You can see the records harvested and more background here. NARA conducted another harvest of House and Senate public web sites as they existed prior to December 11, 2006, which you can see here.

There were many issues with these web harvests. They did not necessarily capture the entire public site because they only captured up to four levels of depth. They did not capture agency or Congressional intranet sites. They provided only a snapshot of the sites at one particular moment. They were very expensive.

For archivists, these web harvests should be troubling because they dispense with the process of appraisal. In effect, anything on the top four levels of an agency’s web site was determined to be of permanent value. For NARA, they also established a troublesome precedent. Would NARA routinely conduct these harvests? If NARA was already capturing their web sites, why should agencies bother to schedule or transfer their web records?

Having conducted the harvest at the end of the last Presidential administration, NARA was now faced with the decision of whether or not to do another such harvest next year. Here are some factors that might have been taken into account in their decision making:

  • Unlike in 2004, NARA has had guidance for agency web records in place for several years now. Agencies should clearly understand what their responsibilities are and the process they need to follow.
  • If NARA conducted another web harvest, it might send the wrong message to agencies. Those looking for an excuse not to schedule their web records could point out that NARA is “preserving them anyway.”
  • Such web harvests are very expensive, costing perhaps millions of dollars, and NARA, like most parts of the government, is strapped for resources.
  • There are other organizations, such as the Internet Archive and one of NARA’s affiliated archives, the University of North Texas Libraries, which have taken on the function of preserving some aspects of Federal web sites. UNT, for example, preserves “deceased federal agency web sites, the Congressional Research Service Reports electronic archive, and more.”
  • The harvests obligate NARA to permanently devote resources to preserve records that are not necessarily of permanent value.
  • The harvest process is in direct opposition to the archival process of appraisal.

For me, as an archivist and a former NARA employee, that’s a pretty compelling list of reasons against making another harvest.

On the other side of the argument is the public expectation, created by the previous harvests, that this is something NARA regularly does. In fact, on its own web site about the harvests, NARA states:

“The National Archives and Records Administration (NARA) conducts a harvest (i.e., capture) of Federal Agency and Congressional public web sites as they exist at the end of each Presidential term and a harvest of Congressional web sites at the end of the Congressional term that does not coincide with a Presidential term.”

So deciding not to do a harvest is a break with existing practice and public statements, if not actual policy. There is also the possibility that agencies aren’t properly scheduling or transferring their web records, and that conducting a harvest preserves some records that would otherwise not be preserved by NARA (although they might be preserved by third parties, such as the Internet Archive).

I think NARA made the right decision. I now regret that in my previous post I agreed with the statement that NARA was abdicating their responsibility. They are complying with their responsibility by following the regulations and processes already in place for scheduling, appraising, accessioning, and preserving Federal records of permanent value. If there are concerns about what web records are being preserved, the available resources should be dedicated to addressing those concerns within the existing process. If the process needs to change in response to the shorter lifecycle of web records, then the process needs to be changed, not abandoned.

What I strongly disagree with is the way NARA presented their decision to the public. It appears as if the decision was announced, with very little justification or discussion, in a memo circulated only to Federal records officers. I don’t know if there was a plan for communicating the decision to the general public, but the memo made its way to a journalist. You saw the outcome in the list of links at the top of this post. Now they are having to justify a decision in the face of public outcry, and that is never a good place to be. If they had communicated their decision more effectively, and laid out all the reasoning behind it, they might still face public concern, but they’d be in a much stronger position.

Now I am afraid that they will be forced to do another crawl, spending millions of dollars that may not have been budgeted for this activity. Agencies will have another excuse not to schedule their records, and NARA’s public image has a bit of a black eye. I believe on some blogs people are even speculating that this is some kind of Bush administration-backed effort to destroy evidence (which I have every reason to think is not true).

There are issues of archival principle here and issues of resources. It’s a real-life case study playing out in front of us, and one that could have dramatic implications for NARA. What do you think about the decision and how it was handled?

This entry was posted in Electronic records, Government information, National Archives & Records Administration (NARA), Web 1.0 and archives.
