Logistic regression by Impact Factor interval:
In order to compare articles published in comparable journals, we divided our sample into four quartile ranges by journal Impact Factor (IF), each range covering 25% of the articles:
IF_1 : 0 ≤ IF < 0.633
IF_2 : 0.633 ≤ IF < 1.053
IF_3 : 1.053 ≤ IF < 1.782
IF_4 : 1.782 ≤ IF ≤ 29.957
The top quartile alone spans a very wide range of IFs, from 1.782 to 29.957. As we are also interested in the variability within this quartile, we further subdivided it into two subgroups, each covering 12.5% of all the articles. Subdividing more finely would make the sample sizes too small to detect effects of interest. This yields five IF ranges:
IF_1 : 0 ≤ IF < 0.633
IF_2 : 0.633 ≤ IF < 1.053
IF_3 : 1.053 ≤ IF < 1.782
IF_4 : 1.782 ≤ IF < 2.468
IF_5 : 2.468 ≤ IF ≤ 29.957
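The assignment of an article to one of the five ranges listed above can be sketched as follows. This is only an illustration, not the procedure actually used in the study: the function name `if_range` and the constants are hypothetical, and the IF_2/IF_3 boundary is taken as 1.053, the value used elsewhere in the text.

```python
from bisect import bisect_right

# Lower bounds of IF_2 .. IF_5, taken from the ranges listed in the text.
# Assumption: the IF_2/IF_3 boundary is 1.053.
IF_CUTPOINTS = [0.633, 1.053, 1.782, 2.468]
IF_MAX = 29.957  # highest journal IF in the sample


def if_range(impact_factor: float) -> str:
    """Return the label (IF_1 .. IF_5) of the range containing impact_factor."""
    if not 0 <= impact_factor <= IF_MAX:
        raise ValueError(f"impact factor {impact_factor} is outside 0..{IF_MAX}")
    # bisect_right counts the cutpoints that are <= impact_factor, so a value
    # equal to a cutpoint falls in the upper range (lower bounds are inclusive).
    return f"IF_{bisect_right(IF_CUTPOINTS, impact_factor) + 1}"
```

Because every lower bound is inclusive, a journal with an IF of exactly 1.782 is placed in IF_4 rather than IF_3.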
The same regression is run separately for each IF range, controlling for all the variables except IF. The following tables summarize the values of Exp(ß) for the controlled variables in each IF range.
Our earlier remark also applies to these regressions: the Exp(ß) values of the variables have the same polarity and pattern whether or not we exclude self-citations from the citation counts.
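As a reading aid for the Exp(ß) values, the standard logistic regression form can be written as below. This is the textbook model, not a restatement of the exact specification used here: the symbols p, x_k and ß_k are generic notation, and p is assumed to be the probability that an article falls in a given citation range.

```latex
\[
\ln\frac{p}{1-p} = \beta_0 + \sum_{k} \beta_k x_k
\qquad\Longrightarrow\qquad
\frac{p}{1-p} = e^{\beta_0} \prod_{k} \left(e^{\beta_k}\right)^{x_k}
\]
```

Under this form, a one-unit increase in x_k multiplies the odds by Exp(ß_k) when the other variables are held fixed, so Exp(ß_k) > 1 indicates a positive association with the outcome and Exp(ß_k) < 1 a negative one.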
When articles are published in a low IF journal, article citation counts are positively correlated with Age, Ref_N, Auth_N, OA and M. The OA effect increases for higher citation count intervals. For the low article citation range, the Age*OA interaction is significant, but OA itself is not.
Figure 6: The Exp(ß) values for logistic regressions (IF 1)
For articles in journals with IFs between 0.633 and 1.053, the pattern is quite similar, except the Age*OA interaction is absent and OA itself (alongside Age, as separate variables) is significant.
Figure 7: The Exp(ß) values for logistic regressions (IF 2)
For articles in journals with IFs between 1.053 and 1.782, the pattern is again quite similar. The USA and Review variables now also correlate with citation increase. In this IF range, some institutions (QUT, Southampton and CERN) have a small citation advantage. However, removing the articles from one of these institutions does not change the pattern for the other variables.
Figure 8: The Exp(ß) values for logistic regressions (IF 3)
For journals with IFs between 1.782 and 2.468, longer articles (Page_N) have more citations. The OA citation advantage is only significant for the higher citation count ranges. Also, the number of coauthors (Auth_N) is less correlated with increased citations as the citation range gets higher. CERN has a citation advantage in this IF range. However, removing CERN articles does not change the pattern for the other variables.
Figure 9: The Exp(ß) values for logistic regressions (IF 4)
For journals with IFs between 2.468 and 29.957, the OA advantage is significant for the highest citation ranges. The increased citations for USA and Review articles are more significant.
Figure 10: The Exp(ß) values for logistic regressions (IF 5)
Overall, OA is correlated with a significant citation advantage for all journal IF intervals as well as for the sample as a whole. This advantage is greater for the higher citation ranges. Moreover, no specific institution has a significant effect compared with the rest of the institutions, hence there is no need to exclude any specific institution from our sample.
When regressions are done separately for the different IF ranges, the Age*OA interaction disappears, but OA and Age (as separate variables) are significant.
Discussion
This study confirms that the OA advantage is a statistically significant, independent positive increment, even when we control for just about every other variable one can think of (article age, journal impact factor, number of authors, number of pages, number of references cited, Review articles, USA author, Science/non-Science). All these other variables are of course correlated with citation counts, so the fact that OA continues to correlate with an independent positive increase in citation counts even when all the other correlates are partialled out is quite a strong outcome.
Moreover, the OA advantage is just as big when the deposit is mandated as it is when it is non-mandated. That makes it extremely unlikely that the OA advantage is all or mostly the result of an author self-selection bias. Indeed, articles from the four mandated institutions seem to have a further independent citation advantage of their own, but this is probably a temporary chance artifact of mandate compliance rates, which vary from about 60% to 90%. The effects are not due to institutional citation advantages, as institutions were also included among the independent predictor variables; moreover, the profile of results and their significance are not altered by removing CERN, the only one of the four institutions that might conceivably have biased the outcome because its papers were all in one field and tended to be of higher quality, hence citability.
Since, with the exception of CERN, the articles in the three other mandated institutional repositories cover all the disciplines and are mostly not in fields that habitually self-archive their unrefereed preprints well before publication (as many physicists and astrophysicists do), nor in fields that already have effective OA for their published postprints (as in astronomy), it is unlikely that the OA advantage is all or mostly just an early-access advantage either. We cannot ascertain this for sure, however, because we do not have reliable deposit-date data relative to publication date. In any case, an early-access advantage in a preprint-depositing field translates into a generic OA advantage in a non-preprint-depositing field in which postprints are accessible only to subscribers.
This study confirms that the OA advantage is greater for articles published in higher-impact journals, and it is also greater in the higher-citation ranges for individual papers within each journal-impact level.
The Seglen effect, that 80-90% of citations go to the top 10-20% of articles, has also been confirmed. In other words, OA will not make an uncitable paper more citable and many papers are not worth citing. But, wherever there is toll-based access-denial, OA will increase the usage and impact of the citable papers, probably in proportion with their importance and quality, hence citability.
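The skew described by the Seglen effect can be checked on any list of per-article citation counts with a calculation along the following lines. This is a minimal sketch, not the analysis performed in this study; the function name `top_share` and its rounding rule for the size of the top group are assumptions made for illustration.

```python
def top_share(citations, top_fraction):
    """Fraction of all citations received by the most-cited top_fraction of articles."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(citations)
    if total == 0:
        return 0.0  # no citations at all, so no share to report
    ranked = sorted(citations, reverse=True)
    # Size of the top group, rounded to the nearest article (at least one).
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / total
```

For the made-up counts `[50, 30, 5, 5, 3, 3, 2, 1, 1, 0]`, `top_share(counts, 0.2)` returns 0.8: the two most-cited articles out of ten hold 80 of the 100 citations.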
