Yassine Gargouri¹, Steven Harnad¹,² & Chawki Hajjem¹
(1) Cognition & Communication Laboratory
(2) Canada Research Chair in Cognitive Sciences, Université du Québec à Montréal
To maximize the benefits of research, findings need to be disseminated as broadly as possible so that they are accessible to other researchers and to the wider community. Some have argued that the Open Access (OA) citation advantage might be entirely or mostly a quality (self-selection) bias, with authors preferentially making their best articles OA. This study, carried out on a sample of 27,197 articles published from 2002 to 2006 in 1,984 journals, aims to show that the OA citation advantage is not necessarily the result of authors self-selecting their best articles for OA. To show this, we compare the citation counts of articles whose authors are subject to an institutional self-archiving mandate with those of non-mandated articles (across all fields) published in the same journal and year. Mandated repositories are open institutional repositories in which affiliated researchers are required to deposit their publications regardless of quality; hence, for mandated articles, there is no self-selection effect. Results show an OA advantage for mandated articles as well as for non-mandated ones. In addition, a logistic regression analysis was conducted to study the multiple correlations between citation count (the dependent variable) and a set of potential explanatory variables.
The 25,000 peer-reviewed journals and conference proceedings that exist today publish about 2.5 million articles per year, across all disciplines, languages and nations. No university or research institution anywhere, not even the richest, can afford to subscribe to all or most of the journals that its researchers may need to use (Odlyzko 2006). Consequently, articles are currently losing part of their potential research impact (usage and citations) because they are not accessible online to all of their potential users.
This is confirmed by recent findings, independently replicated by many investigators, showing that articles whose authors have supplemented subscription-based access to the publisher’s version by self-archiving their own final drafts free for all on the web (“Open Access”, OA) are downloaded and cited twice as much across all 12 scientific, biological, social science and humanities disciplines analyzed so far (Lawrence 2001; Brody & Harnad 2004; Hajjem et al. 2005; Moed 2005; Kurtz & Brody 2006). Hence, OA is not only about human rights and the wider circulation of knowledge; it is also about increasing research impact. A work’s research impact is not only a measure of what it contributes to the work of others; it also contributes to the recognition and reputation of its author.
Only 15% of the 2.5 million articles published annually are currently being self-archived spontaneously worldwide (Hajjem et al. 2005). Creating an Institutional Repository (IR) and encouraging staff to self-archive their articles in it is a good first step, but it is not sufficient to raise the self-archiving rate appreciably above the spontaneous rate.
Merely requesting or recommending deposit does not work. Comparisons of mandated and non-mandated self-archiving rates have shown that mandates (and only mandates) work, raising self-archiving to about 60% to 100% of annual institutional research output within a few years. Of the researchers sampled by Swan & Brown (2005), 95% reported that they would self-archive if required to do so by their employers and/or funders (81% willingly and 14% reluctantly); only 5% would not comply with the requirement.
Universities' own IRs are the natural locus for the direct deposit of their research output: universities are the research providers and have a direct interest in archiving, monitoring, measuring, evaluating and showcasing their own research assets, as well as in maximizing their uptake, usage and impact. An OA self-archiving mandate enhances the visibility and impact of the work of the university as well as that of its individual researchers (Swan 2008).
The Electronics & Computer Science (ECS) department of the University of Southampton was the first department or institution in the world to adopt a self-archiving mandate, in 2001. A growing number of institutions (more than 61) have since adopted mandates. Most of them are in the U.K. and other European countries, but institutions in Asia, Australia and North America have followed suit. These institutions and their policies are listed in ROARMAP¹.
The European University Association (EUA)², mindful of the benefits of mandating OA, has recommended self-archiving mandates for its 800 universities. The EUA unanimously recommended that all European universities create IRs and mandate that all research publications be deposited in them immediately upon publication (and made OA as soon as possible thereafter), as already mandated by RCUK (Research Councils UK), the ERC (European Research Council) and the NIH (National Institutes of Health), and as recommended by EURAB (European Research Advisory Board). The EUA further recommends that these self-archiving mandates be extended to all research results arising from EU research project funding.
The study was carried out on the article output of four institutions that have mandated self-archiving in their IRs:
- University of Southampton, Electronics & Computer Science Department, U.K. (since January 2003);
- University of Minho, Portugal (since December 2004);
- Queensland University of Technology, Australia (since February 2004);
- CERN (European Organization for Nuclear Research), Switzerland (since November 2003).
The aim is to compare the citation counts of mandated articles (M) with those of non-mandated articles (N). The study covers articles published between 2002 and 2006; more recent articles were not considered because they have had too little time to accumulate citations for any advantage to be detectable. Article metadata were collected from the four institutional repositories as well as from the ISI database. Citation counts were extracted from Thomson ISI in November 2008.
The way to test the OA impact advantage is not to compare the citation impact factors of OA and non-OA journals, but to compare the citation counts of individual OA and non-OA articles appearing in the same (non-OA) journals (Harnad & Brody 2004). For each mandated article $M_i$, we therefore collected as controls all articles $N_j$ published in the same journal, volume and year. The articles appeared in 1,984 journals, distributed by year from 2002 to 2006 as shown in Table 1:
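A minimal sketch of this matched-control design, assuming a hypothetical `Article` record carrying journal, volume, year and a mandate flag (the study does not describe its actual data format). Here the controls for a mandated article are simply the non-mandated articles of the same journal, volume and year; the narrowing to the 10 semantically closest controls is a separate step.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:
    # Hypothetical record layout, for illustration only.
    article_id: str
    journal: str
    volume: str
    year: int
    mandated: bool

def match_controls(articles):
    """Map each mandated article id to the ids of the non-mandated
    articles published in the same journal, volume and year."""
    by_issue = defaultdict(list)
    for a in articles:
        by_issue[(a.journal, a.volume, a.year)].append(a)

    controls = {}
    for a in articles:
        if a.mandated:
            same_issue = by_issue[(a.journal, a.volume, a.year)]
            # Other mandated articles in the same issue are never controls.
            controls[a.article_id] = [c.article_id for c in same_issue if not c.mandated]
    return controls
```

Grouping by the (journal, volume, year) key first keeps the matching linear in the number of articles rather than comparing every pair.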
Table 1: Journal counts per year
OA journals were excluded from our sample, so that no article counts as OA simply because its journal is OA rather than because its authors self-archived it. According to the Directory of Open Access Journals (DOAJ), which presently indexes 4,025 journals, only 2.10% of the journals in our sample are OA; these were excluded from the analysis.
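The exclusion step can be sketched as a set-membership filter. This assumes, hypothetically, that each article record carries its journal's ISSN under an `issn` key and that the DOAJ title list has already been loaded as a set of ISSNs; neither the record format nor how the DOAJ data were obtained is specified in the study.

```python
def exclude_oa_journals(articles, doaj_issns):
    """Drop articles whose journal ISSN is in the DOAJ set.

    Returns the kept articles and the percentage of distinct sample
    journals that were found in DOAJ (hypothetical 'issn' key)."""
    journals = {a["issn"] for a in articles}
    oa_journals = journals & doaj_issns
    kept = [a for a in articles if a["issn"] not in doaj_issns]
    share = 100.0 * len(oa_journals) / len(journals) if journals else 0.0
    return kept, share
```

The percentage is computed over distinct journals, matching the way the share of OA journals is reported above, not over articles.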
To reduce the article sample to a manageable processing size, we limited the journal/volume/year-matched controls for each $M_i$ to the 10 articles $N_j$ semantically closest to it. This narrowing of content should also make the control articles more comparable than sampling the entire spectrum of the journal's content. (Semantic closeness is computed from the words shared by the titles, omitting stop words.) The total sample from 2002 to 2006 comprised 27,197 articles (6,215 mandated articles and 20,982 controls³).
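The title-based closeness measure can be sketched as a count of shared non-stop words. The stop-word list below is a hypothetical abbreviation, and ranking candidates by raw overlap count (ties broken by original order) is an assumption; the study does not specify its exact tokenization or scoring.

```python
import re

# Hypothetical, abbreviated stop-word list; the study's list is not given.
STOP_WORDS = {"a", "an", "and", "as", "at", "by", "for", "from", "in",
              "into", "of", "on", "or", "the", "to", "with"}

def title_words(title):
    """Lower-cased alphanumeric title words, stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower())
            if w not in STOP_WORDS}

def closest_controls(mandated_title, candidate_titles, k=10):
    """Return up to k candidate titles, ranked by the number of
    non-stop words they share with the mandated article's title."""
    target = title_words(mandated_title)
    scored = [(len(target & title_words(t)), i, t)
              for i, t in enumerate(candidate_titles)]
    # Highest overlap first; equal overlaps keep their original order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, _, t in scored[:k]]
```

For example, `closest_controls("Neutrino mass measurements at the LHC", ["Measurements of neutrino mass", "Protein folding in cells", "LHC detector calibration"], k=2)` ranks the first candidate (three shared words) ahead of the third (one shared word).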
The full-text OA status of the articles in our sample was verified using an automated web-wide search robot (Hajjem et al. 2005), and the result was consolidated using a second robot based on Google Scholar searches. Figure 1 and Table 2 show the percentage of OA articles among ISI articles per institution and per year:
Figure 1: OA percentage for ISI articles per year
Table 2: OA percentage for ISI articles per year
(I think the figure is sufficient. Should we remove the table, or move it to an appendix?)
Although the self-archiving compliance rate of the mandated institutions is not the hoped-for 100%, it seems acceptable given how recent the mandates are: the average OA rate for mandated institutions is between 56% and 62% from 2002 to 2006. These rates are far higher than those for non-mandated institutions (13% to 16% over the same period). Although authors in both groups may choose whether or not to archive their articles, this large difference between the M and N OA rates indicates that, at least for mandated articles, there is no self-selection of the best articles: mandated authors simply archive their articles to comply with the mandate.
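The OA rates compared here are simple proportions per group and year. A minimal sketch, assuming hypothetical `(year, mandated, is_oa)` records produced by the OA-detection step (the study's own record format is not described):

```python
from collections import defaultdict

def oa_rates(records):
    """records: iterable of (year, mandated, is_oa) tuples.
    Returns {(year, 'M' or 'N'): percentage of OA articles}."""
    totals = defaultdict(int)
    oa_counts = defaultdict(int)
    for year, mandated, is_oa in records:
        key = (year, "M" if mandated else "N")
        totals[key] += 1
        oa_counts[key] += int(is_oa)
    return {key: 100.0 * oa_counts[key] / totals[key] for key in totals}
```

With toy input in which 3 of 4 mandated and 1 of 4 non-mandated articles from one year are OA, the function returns 75.0 for M and 25.0 for N.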
To evaluate the OA citation advantage, four sets of articles are compared:
- OM: self-archived mandated articles;
- ØM: non-archived mandated articles;
- ON: self-archived non-mandated articles;
- ØN: non-archived non-mandated articles.
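Assigning each article to one of these four sets depends only on two flags: its mandate status and its OA status. A small sketch, with the hypothetical helper names `article_set` and `count_sets`:

```python
from collections import Counter

def article_set(mandated: bool, is_oa: bool) -> str:
    """Return the set label OM, ØM, ON or ØN for one article."""
    return ("O" if is_oa else "Ø") + ("M" if mandated else "N")

def count_sets(records):
    """Count articles per set from (mandated, is_oa) pairs."""
    return Counter(article_set(m, o) for m, o in records)
```

The pooled groups used later (O and Ø) are then just the unions OM plus ON and ØM plus ØN.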
These sets of articles are compared using average citation counts within each journal/year. Because the mandates took effect on slightly different dates at the four institutions (from 2003 to 2004), we conducted the comparison both for all institutions together and for each institution separately. The separate analyses show more clearly the evolution since each mandate's effective date, whereas the pooled analysis consolidates the data, enlarges the sample and avoids effects specific to individual institutions.
Comparisons are made between the following pairs of groups: O/Ø, OM/ON, ON/ØN, OM/ØM, OM/Ø, ON/Ø and OM/ØN, where O and Ø denote all self-archived and all non-archived articles respectively. Each comparison uses the average logarithmic ratio of the first group's citations to the second group's. For instance, to compare OA mandated articles with OA non-mandated ones, we compute the arithmetic mean of the logarithmic ratios of the citation averages OMj and ONj calculated for each journal j.
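One possible reading of this measure, written out as a formula under stated assumptions: $OM_j$ and $ON_j$ are taken to be the average citation counts of the OM and ON articles in journal $j$, and $n$ the number of journals containing articles of both groups (journals lacking either group, or with a zero average, are assumed to be left out, since the ratio or its logarithm is then undefined). This is a sketch, not necessarily the authors' exact formula.

```latex
R(OM/ON) = \frac{1}{n} \sum_{j=1}^{n} \log \frac{OM_j}{ON_j}
```

A positive $R$ means the first group is cited more on average within journals; with the natural logarithm, $e^{R}$ is the geometric mean of the per-journal citation ratios.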
The logarithm is used to normalize the data and to reduce the influence of articles whose citation counts are unusually high relative to the whole sample. Moreover, the comparison is done per journal to control for differences in journal impact factor.
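A minimal Python sketch of this per-journal log-ratio comparison follows. The input format (tuples of journal key, set label and citation count), the function name `mean_log_ratio`, the use of the natural logarithm and the rule of skipping journals where either group is missing or has a zero average are all assumptions for illustration. The journal key can be a (journal, year) tuple to match the journal/year grouping, and groups are passed as sets of labels so that pooled groups such as Ø (ØM plus ØN) can be expressed.

```python
import math
from collections import defaultdict

def mean_log_ratio(articles, first, second):
    """Mean over journals of log(avg citations of `first` / avg citations of `second`).

    articles: iterable of (journal_key, set_label, citations).
    first, second: sets of labels, e.g. {"OM"} and {"ØM", "ØN"}.
    Journals lacking either group, or with a zero average, are skipped
    (an assumption: the log ratio is undefined there)."""
    per_journal = defaultdict(list)
    for journal, label, cites in articles:
        per_journal[journal].append((label, cites))

    logs = []
    for items in per_journal.values():
        a = [c for label, c in items if label in first]
        b = [c for label, c in items if label in second]
        if not a or not b:
            continue
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        if mean_a > 0 and mean_b > 0:
            logs.append(math.log(mean_a / mean_b))
    return sum(logs) / len(logs) if logs else float("nan")
```

For example, `mean_log_ratio(data, {"OM"}, {"ON"})` gives the OM/ON comparison, and `mean_log_ratio(data, {"OM", "ON"}, {"ØM", "ØN"})` gives O/Ø. Taking the ratio inside each journal before averaging is what removes between-journal differences in citation level.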