The First Free Research-Sharing Site, arXiv, Turns 20 With an Uncertain Future

August 10, 2011, 1:16 pm

The pioneering effort to share scientific research without the restrictions of journal prices and embargoes, arXiv, turns 20 this month. Its founder, the physicist Paul Ginsparg, reflects on his effort “to level the research playing field” in the August 11 issue of Nature. That’s the kind of expensive journal Mr. Ginsparg originally wanted to work around. We can share some of his published thoughts here.

ArXiv, back in 1991 and still today, focuses on physics. “The original plan was for roughly 100 full-text article submissions each year,” writes Mr. Ginsparg, who works at Cornell University. Today the site gets about 75,000 of these “preprints” every year, and it serves up about one million full-text downloads to about 400,000 users every week. It holds roughly 700,000 texts.

Physicists had no problem jumping ahead of journal publication, Mr. Ginsparg notes, but other fields vary in how they assign claims to a discovery. “It baffles me that scientists in some fields can announce results in a public forum, such as a meeting, while another group can reproduce the results, publish first in a journal, and be given complete intellectual precedence,” he writes. Journals and referees need to take more care to give credit where credit is due.

He has noticed that, in a world where citations count for tenure and money, some researchers try to game the arXiv system. They time their submissions to arrive just after his daily deadline, which gives them first place in the next day’s announcement about new submissions. And that results in a greater number of citations.

The next steps for arXiv are going to happen without Mr. Ginsparg in charge. It “was supposed to be a three-hour tour, not a life sentence,” he writes. In September the site will be completely run by staff from the Cornell library. Of course, it costs money to keep the site running and updated with the latest technology to handle images and interactivity embedded in today’s papers. Last year the library asked for contributions from other institutions and got pledges from 85 of them for about $300,000.

But arXiv needs a more sustainable model. So next month, Mr. Ginsparg writes, Cornell will hold a meeting of institutions and stakeholders to talk about changing arXiv to a “collaboratively governed, community-supported resource.” That’s in the absence, he notes, of “a wealthy donor willing to provide a small endowment in exchange for far more name recognition than any traditional building donation (hint, hint).”

  • mottgreene

    This is one of the most interesting developments in open access I have seen in many years.
    Collaboratively governed = something like a board of editors; community-supported = a subscription base; run by Cornell library staff = editorial offices.
    ArXiv is a superlative journal and this is a definitive experiment, about to morph into a structure that could become limited access (editorial decisions, reading or subscription fees) and subject to a different structure of norms, more traditional in each aspect.
    As for the gaming of the timing of submissions, Derek Price pointed out 50 years ago that the “scientific paper” is much more a system designed to assign credit than to distribute knowledge, and has been so since its invention in the 17th century.

  • http://twitter.com/JordonAndrade Jordon Andrade

    Happy 20th ArXiv!

  • http://www.crsc.uqam.ca/ Stevan Harnad

    ARXIV’S FUNDING PAINS MAY BE A WAKE-UP CALL: INSTITUTIONAL VERSUS CENTRAL ARCHIVES 

    Anonymous FTP archives: Arxiv (1991) was an invaluable milestone on the road to Open Access. But it was not the first free research-sharing site: That began in the 1970s with the internet itself, with authors making their papers freely accessible to all users net-wide by self-archiving them in their own local institutional “anonymous FTP archives”: http://www.w3.org/Protocols/rfc959/2_Overview.html

    Distributed local websites: With the creation of the world wide web in 1990, HTTP began replacing FTP sites for the self-archiving of papers on authors’ institutional websites. FTP and HTTP sites were mostly local and distributed, but accessible free for all, webwide. Arxiv was the first important central HTTP site for research self-archiving, with physicists webwide all depositing their papers in one central locus (first hosted at Los Alamos). Arxiv’s remarkable growth and success were due to both its timeliness and the fact that it had emerged from a widespread practice among high energy physicists that had already predated the web, namely, to share hard copies of their papers before publication by mailing them to central preprint distribution sites such as SLAC and CERN. 

    Central harvesting and search: At the same time, while physicists were taking to central self-archiving, in other disciplines (particularly computer science), distributed self-archiving continued to grow. Later web developments, notably Google and webwide harvesting and search engines, continued to make distributed self-archiving more and more powerful and attractive. Meanwhile, under the stimulus of Arxiv itself, the Open Archives Initiative (OAI) was created in 1999, with a metadata-harvesting protocol that made all distributed OAI-compliant websites interoperable, as if their distributed local contents were all in one global, searchable archive.

    No need for direct central deposit in Google: Together, Google and OAI probably marked the end of the need for central archives. The cost and effort can instead be distributed across institutions, with all the essential search and retrieval functionality provided by automated central “overlay” services for harvesting, indexing, search and retrieval (e.g., OAIster, Scirus, BASE and Google Scholar). Arxiv continues to flourish, because two decades of invaluable service to the physics community have left several generations of users deeply committed to it. But no other dedicated central archive has arisen since. Like computer scientists, whose local, distributed self-archiving is harvested centrally by CiteSeerX, economists, for example, self-archive institutionally, with central harvesting by RePEc.

    Mandating self-archiving: In biomedicine, PubMed Central looks to be an exception, with direct central depositing rather than local. But PubMed Central was not a direct author initiative, like anonymous FTP, author websites or Arxiv. It was designed by NLM, deposit was mandated by NIH, and deposit is done not only by authors but by publishers.

    Institutions are the universal research providers: Open Access is still growing far more slowly than it might, and one of the factors holding it back might be notional conflicts between institutional and central archiving. It is clear that Open Access self-archiving will have to be universally mandated, if all disciplines are to enjoy its benefits (maximized research access, uptake, usage and impact, minimized costs). The universal providers of all research paper output, funded and unfunded, are the world’s universities and research institutions, distributed globally across all scholarly and scientific disciplines, all languages, and all national boundaries.

    Deposit institutionally, harvest centrally: Hence funder self-archiving mandates like NIH’s and institutional self-archiving mandates like Harvard’s need to join forces to reinforce one another, and the most natural, efficient and economical way to do this is to mandate that all self-archiving should be done locally, in the author’s institutional OAI-compliant repository. The contents of the institutional repositories can then be harvested automatically by central OAI-compliant repositories such as PubMed Central (as well as by Google and other central harvesters) for global indexing and search.

    Distribute the archiving, rather than a central cost: In this light, Arxiv’s self-funding pains may be a wake-up call: Why should Cornell University (or a “wealthy donor”) subsidize a cost that institutions can best “sponsor” by each doing (and mandating) their own distributed archiving locally (thereby reducing total cost, to boot)? After all, no one deposits directly in Google…

    See: “How to Integrate University and Funder Open Access Mandates”
    http://openaccess.eprints.org/index.php?/archives/369-guid.htm

    Stevan Harnad
    EnablingOpenScholarship
    http://www.openscholarship.org/