## Logistic regression by Impact Factor interval
To compare articles from comparable journals, we divided our sample into four quartile ranges by journal Impact Factor (IF), each range covering 25% of the articles:
IF_1 : 0 ≤ IF < 0.633
IF_2 : 0.633 ≤ IF < 1.053
IF_3 : 1.053 ≤ IF < 1.782
IF_4 : 1.782 ≤ IF ≤ 29.957
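A quartile split of this kind could be derived roughly as follows. This is a minimal sketch, assuming one IF value per article; the function names and the toy IF values are hypothetical and are not taken from the study.

```python
from statistics import quantiles

def quartile_cut_points(article_ifs):
    """Return the three IF cut points that split the articles into four groups."""
    # method="inclusive" treats the list as the whole population of articles.
    return quantiles(article_ifs, n=4, method="inclusive")

def count_per_range(article_ifs, cut_points):
    """Count articles in each half-open range [lower, upper)."""
    counts = [0] * (len(cut_points) + 1)
    for value in article_ifs:
        # The range index is the number of cut points at or below the value.
        index = sum(value >= cut for cut in cut_points)
        counts[index] += 1
    return counts

# Toy IF values, one per article (illustrative only, not the study's data).
toy_ifs = [0.2, 0.4, 0.7, 0.9, 1.2, 1.5, 2.0, 5.0]
cuts = quartile_cut_points(toy_ifs)
print(cuts)
print(count_per_range(toy_ifs, cuts))
```

With real data, many articles share the same journal IF, so ties at a cut point can make the four groups only approximately equal in size.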
The top quartile alone spans journals with IFs from 1.782 to 29.957. As we are also interested in the variability within this quartile, we further subdivided it into two subgroups, each covering 12.5% of all the articles. Subdividing more finely would make the sample sizes too small to detect effects of interest. This yields five IF ranges:
IF_1 : 0 ≤ IF < 0.633
IF_2 : 0.633 ≤ IF < 1.053
IF_3 : 1.053 ≤ IF < 1.782
IF_4 : 1.782 ≤ IF < 2.468
IF_5 : 2.468 ≤ IF ≤ 29.957
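The assignment of an article to one of the five ranges can be sketched as below. This is an illustrative sketch only: the helper name `if_range` is hypothetical, and the IF_2/IF_3 boundary is assumed to be 1.053, in line with the upper bound given for IF_2.

```python
from bisect import bisect_right

# Lower bounds of IF_2..IF_5 (the IF_2/IF_3 boundary is assumed to be 1.053).
CUT_POINTS = [0.633, 1.053, 1.782, 2.468]
LABELS = ["IF_1", "IF_2", "IF_3", "IF_4", "IF_5"]
IF_MAX = 29.957  # highest journal IF in the sample

def if_range(impact_factor: float) -> str:
    """Map a journal IF to its range label; lower bounds are inclusive."""
    if not 0 <= impact_factor <= IF_MAX:
        raise ValueError(f"IF outside the sampled range: {impact_factor}")
    # bisect_right counts the cut points <= impact_factor, so a value equal
    # to a cut point falls into the range that starts at that cut point.
    return LABELS[bisect_right(CUT_POINTS, impact_factor)]

print(if_range(0.5), if_range(0.633), if_range(2.468))
```

Using `bisect_right` keeps each lower bound inclusive and each upper bound exclusive, except for IF_5, whose upper bound is the sample maximum.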
The same regression is run separately for each IF range, controlling for all the variables (except IF). The following tables summarize the values of **Exp(ß)** for the controlled variables in each IF range.
Our earlier remark also applies to these regressions: the Exp(ß) values of the variables have the same polarity and pattern whether or not we exclude self-citations from the citation counts.
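For readers less familiar with logistic regression output, the sketch below shows how an Exp(ß) value is usually read: it is the multiplicative change in the odds associated with a one-unit increase in a predictor, so values above 1 indicate a positive association and values below 1 a negative one. The function names and the coefficients used are hypothetical illustrations, not results from this study.

```python
import math

def exp_beta(beta: float) -> float:
    """Convert a logistic regression coefficient into an odds ratio, Exp(beta)."""
    return math.exp(beta)

def polarity(odds_ratio: float) -> str:
    """Direction of the association implied by an Exp(beta) value."""
    if odds_ratio > 1:
        return "positive"
    if odds_ratio < 1:
        return "negative"
    return "none"

def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1) * 100

# Hypothetical coefficients, for illustration only.
for beta in (0.25, 0.0, -0.10):
    ratio = exp_beta(beta)
    print(beta, round(ratio, 3), polarity(ratio), round(percent_change_in_odds(ratio), 1))
```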
When articles are published in a low IF journal, article citation counts are positively correlated with Age, Ref_N, Auth_N, OA and M. The OA effect increases for higher citation count intervals. For the low article citation range, the Age*OA interaction is significant, but OA itself is not.
**Figure 6: The Exp(ß) values for logistic regressions (IF 1)**
For articles in journals with IFs between 0.633 and 1.053, the pattern is quite similar, except the Age*OA interaction is absent and OA itself (alongside Age, as separate variables) is significant.
**Figure 7: The Exp(ß) values for logistic regressions (IF 2)**
For articles in journals with IFs between 1.053 and 1.782, the pattern is again quite similar. The USA and Review variables now also correlate with citation increase. In this IF range, some institutions (QUT, Southampton and CERN) have a small citation advantage. However, removing the articles from one of these institutions does not change the pattern for the other variables.
**Figure 8: The Exp(ß) values for logistic regressions (IF 3)**
For journals with IFs between 1.782 and 2.468, longer articles (Page_N) have more citations. The OA citation advantage is only significant for the higher citation count ranges. Also, the number of co-authors (Auth_N) is less correlated with increased citations as the citation range gets higher. CERN has a citation advantage in this IF range. However, removing CERN articles does not change the pattern for the other variables.
**Figure 9: The Exp(ß) values for logistic regressions (IF 4)**
For journals with IFs between 2.468 and 29.957, the OA advantage is significant for the highest citation ranges. The increased citations for USA and Review articles are more significant.
**Figure 10: The Exp(ß) values for logistic regressions (IF 5)**
Overall, OA is correlated with a significant citation advantage for all journal IF intervals as well as for the sample as a whole. This advantage is greater for the higher citation ranges. Moreover, no specific institution has a significant effect compared to the other institutions, hence there is no need to exclude any specific institution from our sample.
When regressions are done separately for the different IF ranges, the Age*OA interaction disappears, but OA and Age (as separate variables) are significant.
## Discussion
This study confirms that the OA advantage is a statistically significant, independent positive increment, even when we control for just about every other variable one can think of (article age, journal impact factor, number of authors, number of pages, number of references cited, Review articles, USA author, Science/nonScience). All these other variables are of course correlated with citation counts, so the fact that OA continues to correlate with an independent positive increase in citation counts even when all the other correlates are partialled out is quite a strong outcome.
Moreover, the OA advantage is just as big when the deposit is mandated as when it is not. That makes it extremely unlikely that the OA advantage is all or mostly the result of an author self-selection bias. Indeed, articles from the four mandated institutions seem to have a further independent citation advantage of their own, but this is probably a temporary chance artifact of mandate compliance rates, which vary from about 60% to 90%. The effects are not due to institutional citation advantages, as institutions were also included among the independent predictor variables; moreover, the profile of results and their significance are not altered by removing CERN, the only one of the four institutions that might conceivably have biased the outcome because its papers were all in one field and tended to be of higher quality, hence citability.
Apart from CERN, the articles in the three other mandated institutional repositories cover all disciplines and are mostly not in fields that habitually self-archive their unrefereed preprints well before publication (as many physicists and astrophysicists do), nor in fields that already have effective OA for their published postprints (as in astronomy). It is therefore unlikely that the OA advantage is all or mostly just an early-access advantage either. We cannot ascertain this for sure, however, because we do not have reliable data on deposit date relative to publication date. In any case, an early-access advantage in a preprint-depositing field translates into a generic OA advantage in a non-preprint-depositing field in which postprints are accessible only to subscribers.
This study confirms that the OA advantage is greater for articles published in higher-impact journals, and it is also greater in the higher-citation ranges for individual papers within each journal-impact level.
The Seglen effect, that 80-90% of citations go to the top 10-20% of articles, has also been confirmed. In other words, OA will not make an uncitable paper more citable, and many papers are not worth citing. But wherever there is toll-based access-denial, OA will increase the usage and impact of the citable papers, probably in proportion to their importance and quality, hence citability.
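The kind of concentration described by the Seglen effect can be checked on any list of per-article citation counts with a calculation like the one below. This is a minimal sketch; the function name `top_share` and the toy citation counts are hypothetical and are not the study's data.

```python
def top_share(citations, top_fraction):
    """Share of all citations received by the most-cited fraction of articles."""
    ranked = sorted(citations, reverse=True)
    # Number of articles in the top group (at least one article).
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total

# Toy citation counts for ten articles (illustrative only).
toy_citations = [50, 30, 5, 5, 3, 3, 2, 1, 1, 0]
print(top_share(toy_citations, 0.20))
```

In this toy list the two most-cited articles, i.e. the top 20%, account for 80 of the 100 citations.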