Just published: change and growth in open access journal publishing and charging trends 2011 – 2021

Morrison, H., Borges, L., Zhao, X., Kakou, T. L., & Shanbhoug, A. N. (2022). Change and growth in open access journal publishing and charging trends 2011–2021. Journal of the Association for Information Science and Technology, 1–13. https://doi.org/10.1002/asi.24717

OA version is available through the University of Ottawa institutional repository here: https://ruor.uottawa.ca/handle/10393/44191

Housekeeping: effective Nov. 7, 2022, SKC posts are no longer available via Twitter.

Sustaining the Knowledge Commons: final report

This post concludes the 7-year Sustaining the Knowledge Commons (SKC) research program, for which I gratefully acknowledge generous support from Canada’s Social Sciences and Humanities Research Council (SSHRC) through an Insight Development Grant (2014 – 2016) and an Insight Grant (2016 – 2021). I also gratefully acknowledge the hard work, team spirit and initiative of the many members of the SKC team over the years; their names are listed on the About the Team page, and bios reflect their status the last time they participated in the project. Following are my key recommendations for funders (including libraries & policy-makers), takeaways for future APC researchers, select portions of my final report to SSHRC, and my final thoughts and next directions.

Key recommendations for funders (including libraries & policy-makers):

  • Recommendation #1: Support small scholar-led publishers (e.g. journals and books published by or at universities and scholarly associations) to transition to open access because this sector can thrive with modest support (best economic choice) and is the sector in the best position to prioritize the values of academia over profit and achieve global equity in inclusion of scholars around the world in a global knowledge commons.
  • Recommendation #2: Support existing and emerging open access scholarly publishers reliant on open access article processing charges (APCs) with caution and back-up built into policy. There are 2 main reasons for caution. The first is that there are already a large number of journals and publishers that are “no longer in DOAJ”, many of which are still publishing. While current APC publishers such as the Public Library of Science have earned top reputations for publishing quality scholarship, it is clear that the APC model has also opened a door to a for-profit sector with less than clear commitment to scholarly quality. The second reason for caution is evidence of price rises beyond inflation among commercial and professional not-for-profit APC-based publishers. It is not clear that the economics of this model are sustainable. To put a back-up plan into policy, require that researchers deposit work in an open access repository. Meeting OA policy through open access publishing alone makes works available open access today with no guarantee for the future.
  • Recommendation #3: Look beyond traditional print-based formats such as journals and books. The open research SKC blog, featuring immediate release of the results of over 200 small research projects to inform decision-making in real time, and the OA APC dataverse, are illustrations of what we can do. Innovation should be a priority, not an afterthought.
  • Recommendation #4: Make global equity and inclusion a top priority in setting policy, including deciding which initiatives to support financially. The key question is: will this policy or initiative tend to facilitate a global knowledge commons that gives voice to all qualified researchers around the world, or will it further entrench existing interests?

Takeaways for future APC researchers:

  • The Sustaining the Knowledge Commons blog features a rich set of small research projects, many on individual APC-charging publishers, that are not available anywhere else. The blog will remain as is for some time and will be archived with the assistance of the University of Ottawa Library before it is decommissioned.
  • The most complete dataset in the OA APC dataverse is OA Main 2019. This is a unique contribution as journals once included (journal or publisher was once included in DOAJ) are retained from year to year. Data including APC amounts for several years derived from a number of sources is available for close to 20 thousand journals. This dataset is for serious researchers as it takes some time to read the documentation and understand the datapoints; misinterpretation would be easy given that the data is derived from multiple sources.
  • The dataset in the OA APC dataverse that includes journals for which we have data for the longest period of time is the 2011 – 2021 dataset. Most of the 2011 dataset is included in OA Main 2019; however, in preparing the analysis we found that some journals were missing as they had been removed from DOAJ prior to our first sampling (2014).
  • The published open data in the OA APC dataverse reflects a small portion of the data that we have collected and analyzed over the years. The reason for not publishing all of the data as open data is the complexity and extra work required to create publishable documentation. If you are looking for historical APC data for research purposes, don’t hesitate to ask what I (Heather Morrison) might have. No guarantees that what you need will match what I have.

Excerpts from the SSHRC Insight Grant final report

Summary: The purpose of the Sustaining the Knowledge Commons project was to conduct research to inform the process of transformation of the underlying economics of scholarly publishing from the demand (purchase / subscription) to the supply side (support for production) to achieve sustainable and globally equitable open access. The resource requirements for small scholar-led publishers project confirmed the modest financial needs of this sector, considered the best option to prioritize academic quality over profit. A longitudinal study of article processing charges (APCs) found that this model, working well in some sectors, nevertheless poses some challenges to academic quality as illustrated by a large number of APC-based journals and publishers in the category “no longer in the Directory of Open Access Journals”. The APC commercial and professional not-for-profit market is showing problematic signs of a tendency to increase prices beyond inflation, another reason to consider alternatives. One approach to analyzing open access policy and initiatives, based on Ostrom’s work Governing the Commons, was identified as useful to analyze policy and initiatives from the perspective of global equity (inclusion of all qualified scholars to contribute to our common knowledge). A key conclusion and recommendation is that the optimal way to achieve sustainable and equitable high quality academic publishing for traditional publication forms such as journals and books, prioritizing academic values over profit, is to transition economic support to prioritize small scholar-led publication, and in particular the university sector. The open research approach employed in this project illustrates the benefits of going beyond traditional forms optimized for print. Major findings have been consistently and quickly published on the course blog, supporting decision-makers engaged in the process of transition, and open data shared via the dataverse.

Outcomes: Sustaining the knowledge commons (SKC) has provided independent third-party evidence to support the growing non-commercial, scholar-led sector of scholarly publishing. SKC research demonstrates the desirability of supporting this sector from an economic point of view as overall less costly, more equitable, and in a good position to prioritize academic quality over profit. The internet has created an environment in which universities and scholarly societies can, with reasonable ease and modest support, create, sustain and globally disseminate their own publications. For example, in the Directory of Open Access Journals (DOAJ) as of December 2021, the country with the highest number of open access journals is Indonesia, with 1,896 titles, followed by, in order, the UK (1,885 titles), Brazil (1,636), the U.S. (969), Spain (882), Poland (785), and Iran (662). DOAJ is a far more diverse collection of titles, linguistically and culturally, than is found in typical library packages in countries like Canada. This evidence is useful to policy-makers such as research funders and services that support scholarly publishing such as libraries and library consortia.

Audiences: The primary audiences that can benefit from the research conducted by the Sustaining the Knowledge Commons project are the organizations ultimately responsible for funding the production and dissemination of scholarly works – universities and other research organizations, their libraries and library consortia, research funding agencies, scholarly societies, and individual academic researchers who support scholarly publication through their labour and research funding. Academics, students, and the general public benefit indirectly through open access to scholarly works; for example, when health care practitioners have access to the results of medical research, we all benefit from improved evidence-based practice.

Research products: https://sustainingknowledgecommons.org/ and https://dataverse.scholarsportal.info/dataverse/oaapc are, respectively, a research blog and open dataverse that demonstrate the open research approach employed in the Sustaining the Knowledge Commons project. These are the most comprehensive resources for outputs from this project. The blog features over 200 original research posts, of which most are brief original research pieces written by research assistants and associates under the supervision of the Principal Investigator. Only a small fraction of this output would be found in traditional research formats such as journal articles and books.

The dataverse: The https://dataverse.scholarsportal.info/dataverse/oaapc dataverse features open data and documentation from the longitudinal open access APC study that exemplifies the open data approach. The datasets are the most complete source of historical information for many journals and publishers that are no longer active, open access, and/or listed in the Directory of Open Access Journals, and will make it possible for future researchers to conduct robust longitudinal studies in future. OA Main 2019 final is the most complete dataset (close to 20,000 journals, up to 280 data points / journal, range 2011 – 2019). The most recent dataset is 2011_2021_APCs_open_data.

Final thoughts and new directions: finally, I would like to thank all of the readers of this blog and particularly those who took the time to comment, whether on the blog or on the listservs and other projects that I have participated in over the years, particularly the Global Open Access List, Scholcomm, the Radical Open Access list, and the Open Access Tracking Project, and everyone – all the authors, editors, publishers, research funders and activists – who have moved OA forward through its first generation. My perspective is that OA has now moved into a second generation that is quite different from the first and leadership is from the institutions and organizations that provide the support for scholarly publishing – universities & their libraries and library consortia, research funders and scholarly publishers, and is no longer reliant on individual activists like me. This is a good thing, an accomplishment in and of itself and one that bodes well for ongoing transition to full open access. On a personal note, while I remain available should my expertise (or datasets) be needed, it is my intention to shift my research to one or more areas more in need of attention, particularly in the area of information policy.

Cite as: Morrison, H. (2021). Sustaining the knowledge commons: final report. Sustaining the Knowledge Commons Dec. 22, 2021 https://sustainingknowledgecommons.org/2021/12/22/sustaining-the-knowledge-commons-final-report/

Housekeeping: SKC project status

The Sustaining the Knowledge Commons project was made possible through a SSHRC Insight Development Grant (2014 – 2016) and a SSHRC Insight Grant (2016 – 2021). SSHRC has graciously granted a one-year extension for project completion due to COVID. Between now and spring 2022, the work of SKC will focus on completing projects already started, blog wrap-up, and a final report and summary. Thanks to everyone who contributed to the SKC team over the years, read, shared and/or commented on posts.

Global University Rankings book launch Monday Sept. 20 4 p.m. EST

The book launch video is now available.

Please join us for a launch of the book “Global University Rankings”. The session will be recorded and made available for later viewing. I briefly introduced the book and my chapter in it in this post: https://sustainingknowledgecommons.org/2021/08/03/irrational-rationality-critique-of-metrics-based-evaluation-of-researchers-and-universities/

DOAJ publisher size analysis: Long-tail and APC charging trends

by: Xuan Zhao & Heather Morrison

Abstract

The 2021 DOAJ (The Directory of Open Access Journals) publisher group comprises a large number of very small publishers (typically with only one journal), and very few large publishers. Our purpose is to demonstrate the long-tail distribution of publisher size and reveal its connection with the tendency to publish APC (article processing charge) or NO APC journals. We find a “long-tail” of publisher size in all three groups (all publishers, APC publishers and NO APC publishers) and a “the smaller, the NO-APCer” tendency.

Introduction

Why does the relation between OA APCs (open access article processing charges) and publisher size need to be brought to the forefront? To answer this question, we need to understand a bit more about the context of OA publishing activities. As illustrated by Crow (2006), Edgar & Willinsky (2010) and Morrison (2012), from the mid-20th century onwards, the composition of scholarly publishing began to shift: non-profit university and society publishers started to lose ground, while commercial publishers entered an era of rapid development. Unlike for-profit publishers, which are better able to survive commercialization, non-profit publishers, especially the smaller ones, face various internal and external challenges listed by Crow [1]: “market consolidation”, “aggressive pricing”, “flat library budgets”, “migration to online distribution”, “structural constraints”, “undercapitalization”, etc. We emphasize the “smaller ones” because “the vast majority of society and non–profit publishers run independent and very small journal publishing operations” [2]. Thus, the limitations mentioned above describe the plight of most society and non-profit publishers, especially small publishers who prioritize academic quality or social needs to the extent that they cannot balance profit and non-profit goals.

Scholars have proposed several solutions. For example, Crow [3] suggests publishing cooperatives, which would allow small non-profit publishers “to remain independent while operating collectively to overcome both structural and strategic disadvantages”. Another solution is offered by Edgar & Willinsky (2010) and Morrison (2012): open access. Many studies have shown that open access could bring “growth rates in new titles, participation rates from developing countries, and extremely low operating budgets”, and maximize “access to research and scholarship, as an alternative to traditional scholarly society and commercial publishing routes” [4].

Although the transition from offline to online open access publishing requires human and material investment, it is an increasingly attractive option in the context of today’s widespread web presence. In recent years, in addition to the large commercial publishers who dominate the major publishing markets, the global OA market has been “marked by a very long-tail and extensive involvement by very small, often university or society publishers”, as Morrison pointed out in 2018 [5]. In Gold Open Access Journals 2011-2015, Crawford (2016) presented abundant material on the correlation between journal size, the tendency to charge or not charge, and the amounts charged, and found that small journals are less likely to charge. In other words, the larger the journal, the higher the APC. We would like to corroborate these statements with our research on DOAJ 2021 metadata.

Definitions & Explanation

In this study, we consider DOAJ publishers who released 10 or fewer journals (at the time of being sampled) as relatively “small” publishers and those who released more than 100 journals as rather “large” publishers. The rest are grouped as “medium” publishers with 11-100 journals. These definitions only aim to better distinguish publishers of different sizes in our data scope.
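The size bands defined above can be written as a small helper. This is only an illustrative sketch of the thresholds stated in this post, not the procedure the SKC team actually used; the function name `size_band` is hypothetical.

```python
def size_band(journal_count: int) -> str:
    """Return the size band used in this post for a publisher's journal count."""
    if journal_count < 1:
        raise ValueError("a publisher listed in DOAJ has at least one journal")
    if journal_count <= 10:
        return "small"    # 10 or fewer journals
    if journal_count <= 100:
        return "medium"   # 11-100 journals
    return "large"        # more than 100 journals


# Boundary values, plus Hindawi Limited's 229 journals mentioned later in the post.
for n in (1, 10, 11, 100, 101, 229):
    print(n, size_band(n))
```

Checking the boundary values (10 vs. 11, 100 vs. 101) is the main point of the sketch, since those are where the bands meet.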

We divide the DOAJ publishers into three primary groups: all publishers’ group, APC charging publishers’ group and NO APC publishers’ group. We use “mixed publishers” to describe publishers that appeared in both APC and NO APC lists. Our research is carried out from three aspects: observation of the three primary groups, observation of the “non-mixed” publishers’ group and observation of the “mixed” publishers’ group.

The data in this project was initially downloaded from DOAJ (Directory of Open Access Journals) metadata (15,691 journals, 4,292 APC journals, 11,399 NO APC journals), then cleaned up by our SKC (Sustaining the Knowledge Commons) team. The clean-up work revolved around correcting misplaced data and creating a modified publisher name column for this exercise. During the work, we realized that creating a consistent publisher name list was challenging. As reported in Some Limitations of DOAJ Metadata for Research Purposes (Zhao, Borges & Morrison, 2021), there were a large number of variations and inconsistencies in publisher names, such as duplicates with differences in punctuation and/or characters (e.g. “Abant İzzet Baysal Üniversitesi” vs. “Abant İzzet Baysal University”), extra spaces at the beginning or the end of names (e.g. “Abant İzzet Baysal University” vs. “Abant İzzet Baysal University⎕”), invalid URLs, etc. More details can be found in the open dataset “DOAJ_metadata_2021_01_05_with_SKC_clean_up” (Zhao, Borges & Morrison, 2021).
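The kind of publisher name clean-up described above might look like the following sketch. It is an assumption about how such a step could be automated, not the SKC team's actual workflow: the function `clean_publisher_name` and the `ALIASES` table are hypothetical. Whitespace problems can be fixed mechanically, but language variants such as “Üniversitesi” vs. “University” need a hand-curated mapping.

```python
import unicodedata

# Hypothetical alias table: variants that a person has confirmed refer to the
# same publisher. Only the example pair from this post is included.
ALIASES = {
    "Abant İzzet Baysal Üniversitesi": "Abant İzzet Baysal University",
}


def clean_publisher_name(raw: str) -> str:
    """Return a normalized publisher name for grouping purposes."""
    # Normalize Unicode so that visually identical strings compare equal.
    name = unicodedata.normalize("NFC", raw)
    # Strip leading/trailing whitespace and collapse internal runs of spaces.
    name = " ".join(name.split())
    # Map known variants onto one preferred form; unknown names pass through.
    return ALIASES.get(name, name)


print(clean_publisher_name("Abant İzzet Baysal University "))
print(clean_publisher_name("Abant İzzet Baysal Üniversitesi"))
```

Both calls return the same string, so the two variants would be counted as one publisher.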

This research covers only the journals and publishers listed in DOAJ as of Jan. 5, 2021. There are other fully open access journals and publishers not listed in DOAJ, or previously listed but then de-listed for unknown reasons. We understand that it is challenging to create a precise list of publishers because of the complexity of publishers’ backgrounds (Morrison, 2019). In this study, we concentrate on trends rather than precise details. Moreover, we focus solely on whether the journals or publishers charge APCs, not how much is charged.

Observation of the three primary groups

First, we separate the DOAJ publishers into three groups: all publishers’ group (Table 1 & Chart 1), APC’s group (Table 2 & Chart 2) and NO APC’s group (Table 3 & Chart 3).

Table 1 – ALL DOAJ publishers’ group (2021)

(Total DOAJ journals’ number: 15,691)

Chart 1 – ALL DOAJ publishers’ group (2021)

Table 2 – DOAJ APC group (2021)

(Total count of DOAJ APC journals: 4,292)

Chart 2 – DOAJ APC group (2021)

Table 3 – DOAJ NO APC group (2021)

(Total count of DOAJ NO APC journals: 11,399)

Chart 3 – DOAJ NO APC group (2021)

Individually, each group shows an evident “long-tail”. In the ALL publishers’ group (see Table 1 & Chart 1), among the 6,804 publishers identified in DOAJ, 1,349 published APC journals, and 5,807 published NO APC journals (the numbers do not add up to 6,804 because some of them are “mixed” publishers). 77% of this group are small publishers with only one journal. Small publishers also make up the majority of the other two groups (see Tables 2 and 3 & Charts 2 and 3): 76% of the APC group and 78% of the NO APC group.

Second, the APC and NO APC groups can be compared. Although small publishers occupy a similar share in each of the three groups, their absolute numbers differ greatly, as illustrated in Table 4 & Chart 4 below:

Table 4 – DOAJ ALL publishers – APC group vs NO APC group (2021)

Chart 4 – DOAJ ALL publishers – APC group vs NO APC group (2021)

In the range of “publishers with 1 journal”, the number of NO APC publishers (4,568) is about 4 times that of the APC publishers (1,034); in the range of 2-10, NO APC publishers are about 3 times more than APC publishers; in the range of 11-25, the number is about 4 times more. However, for publishers with 51+ journals, the number of APC publishers is greater than or equal to the number of NO APC publishers. In the largest publishers’ range (200+ journals), there are 4 charging publishers and only 1 non-charging publisher.
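The ratio and shares for one-journal publishers can be recomputed from the counts quoted in this post. The sketch below assumes that the one-journal counts from Table 4 (1,034 and 4,568) are the numerators behind the 76% and 78% shares reported earlier; the variable names are illustrative only.

```python
# Counts reported in this post (DOAJ metadata as of Jan. 5, 2021).
apc_publishers = 1_349       # publishers with at least one APC journal
no_apc_publishers = 5_807    # publishers with at least one NO APC journal
apc_one_journal = 1_034      # APC group, "publishers with 1 journal"
no_apc_one_journal = 4_568   # NO APC group, "publishers with 1 journal"

# How many NO APC one-journal publishers there are per APC one-journal publisher.
ratio = no_apc_one_journal / apc_one_journal

# Share of one-journal publishers within each group, in percent.
apc_share = 100 * apc_one_journal / apc_publishers
no_apc_share = 100 * no_apc_one_journal / no_apc_publishers

print(f"NO APC : APC one-journal publishers = {ratio:.1f} : 1")
print(f"one-journal share, APC group: {apc_share:.1f}%")
print(f"one-journal share, NO APC group: {no_apc_share:.1f}%")
```

Under that assumption the ratio comes out at about 4.4 to 1 and the shares at roughly 76.6% and 78.7%, which suggests the 76% and 78% quoted in the post were truncated to whole percentages, not rounded.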

Thus, without considering the “mixed” publishers’ situation, we infer that although both the APC and NO APC groups show a “long-tail” (1-journal publishers make up 76% of the APC group and 78% of the NO APC group), small DOAJ publishers seem more likely to publish non-charging journals, and large DOAJ publishers seem more likely to publish charging journals. We boldly name this tendency “the smaller, the NO-APCer” trend.

It is also essential to note some exceptions. Some large publishers release more NO APC journals than APC journals (details in Table 7): Wolters Kluwer Medknow Publications has 46 APC journals and 161 NO APC journals; SpringerOpen has 96 APC journals and 114 NO APC journals; Sciendo has 44 APC journals and 257 NO APC journals. We discuss these further in the following sections.

Observation of the “non-mixed” publishers’ group

By comparing the APC and NO APC groups, we identify 352 publishers that appear in both, i.e. 352 “mixed” publishers. To make the research results more rigorous, we exclude the “mixed” group and study the rest of the publishers. We find that the tendencies of “long-tail” and “the smaller, the NO-APCer” are still evident in the “non-mixed” group. Please see Table 5 & Chart 5 below:

Table 5 – “Non-mixed” publishers – APC group vs NO APC group (2021)

Chart 5 – “Non-mixed” publishers – APC group vs NO APC group (2021)

Among 6,452 identified “non-mixed” publishers, 82% are small publishers with only 1 journal. Compared with the 77% of 1-journal publishers in the ALL publishers’ group (Table 1 & Chart 1), 82% indicates a similar “long-tail”.

“The smaller, the NO-APCer” trend is also evident. Comparing the percentages of the APC and NO APC groups in this chart, 69% of “non-mixed” publishers are non-charging 1-journal publishers, far more than the 13% that are charging 1-journal publishers.
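Identifying “mixed” publishers amounts to intersecting the APC and NO APC publisher lists, and the totals reported in this post can be cross-checked by inclusion-exclusion. The following is a minimal sketch under that reading; `split_mixed` and the toy publisher names are hypothetical, while the three counts are taken from the post.

```python
def split_mixed(apc_publishers: set, no_apc_publishers: set):
    """Split publishers into those on both lists ("mixed") and the rest."""
    mixed = apc_publishers & no_apc_publishers
    non_mixed = (apc_publishers | no_apc_publishers) - mixed
    return mixed, non_mixed


# Toy example with made-up publisher names.
apc = {"Publisher A", "Publisher B", "Publisher C"}
no_apc = {"Publisher B", "Publisher D"}
mixed, non_mixed = split_mixed(apc, no_apc)
print(sorted(mixed), sorted(non_mixed))

# Cross-check of the totals reported in this post.
APC_COUNT, NO_APC_COUNT, MIXED_COUNT = 1_349, 5_807, 352
all_publishers = APC_COUNT + NO_APC_COUNT - MIXED_COUNT  # count "mixed" once
non_mixed_count = all_publishers - MIXED_COUNT
print(all_publishers, non_mixed_count)
```

The cross-check reproduces the 6,804 publishers in the ALL group and the 6,452 “non-mixed” publishers, so the three reported counts are mutually consistent.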

Observation of the “mixed” publishers’ group

We study this group separately because, from the research above, we know that almost all the large DOAJ publishers (100+ journals) are “mixed” (except for Hindawi Limited with 229 journals, which is a pure APC publisher within the scope of our data). We are curious about whether the “long-tail” and “the smaller, the NO-APCer” trends also exist in this group.

The first discovery is an explicit “long-tail” because 75% of the “mixed” publishers are small. Please see Chart 6 below:

Chart 6 – DOAJ “mixed” publishers’ long-tail analysis (2021)

Then we see a recognizable “the smaller, the NO-APCer” trend. After a comparison between the count of APC journals and the count of NO APC journals published by the same “mixed” publisher, we identify three relations: “number of APC journals = number of NO APC journals”, “number of APC journals > number of NO APC journals” and “number of APC journals < number of NO APC journals”.

We consider the inequalities (“>” and “<”) between the counts of APC journals and the counts of NO APC journals as “active” tendency indicators and the equality relation (“=”) as an “inactive” element. Thus, to highlight the tendency, we exclude all the “=” cases and concentrate only on “>” and “<”. This step produces Table 6 below:

Table 6 – DOAJ “mixed” publishers’ trends (2021)

For those who release 3 journals, other than the 101-200 group, which publishes more APC journals than NO APC journals, and the 200+ group with an “inactive” “=”, the other publishers with fewer journals are biased toward NO APC publication. At this point, we confirm “the smaller, the NO-APCer” trend in the “mixed” publishers’ group.

For further discussion, we investigate the large “mixed” publishers’ group (100+ journals), as shown in Table 7 below:

Table 7 – Investigation of “Mixed” Publishers with 100+ journals (2021)

In this group, 6 publishers publish more APC journals than NO APC journals, while 3 publish more NO APC journals. For those 6 publishers, the difference between the counts can be large: for example, Wiley (133 charging journals > 8 non-charging journals), Taylor & Francis Group (143 charging journals > 21 non-charging journals), SAGE Publishing (151 charging journals > 23 non-charging journals), etc. Because of these enormous differences, even though 166 “mixed” publishers publish more non-charging journals, far more than the 92 that release more charging journals, the total count of non-charging journals (2,608) is still very close to the total count of charging journals (2,583).
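The “>”, “<” and “=” comparison used for the “mixed” group can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Table 6 or Table 7: the `relation` helper is hypothetical, and the data covers only the six large “mixed” publishers whose counts are quoted in this post.

```python
from collections import Counter

# (APC journals, NO APC journals) for the large "mixed" publishers
# whose counts are quoted in this post; Table 7 lists more publishers.
LARGE_MIXED = {
    "Wiley": (133, 8),
    "Taylor & Francis Group": (143, 21),
    "SAGE Publishing": (151, 23),
    "Wolters Kluwer Medknow Publications": (46, 161),
    "SpringerOpen": (96, 114),
    "Sciendo": (44, 257),
}


def relation(apc: int, no_apc: int) -> str:
    """Compare a publisher's APC journal count with its NO APC journal count."""
    if apc > no_apc:
        return ">"
    if apc < no_apc:
        return "<"
    return "="


# Tally the relations, then keep only the "active" indicators (">" and "<").
tally = Counter(relation(apc, no_apc) for apc, no_apc in LARGE_MIXED.values())
active = {sign: count for sign, count in tally.items() if sign != "="}
print(active)
```

Because only six publishers are included here, the tally differs from the 6 vs. 3 split reported for the full 100+ group; the same function could be applied per size band to the full publisher list.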

Discussion

APC trends can also be analyzed in terms of other influencing factors: publisher type, subject of journal, country of publication, etc. Researchers can perform more diverse analyses based on more layers of data, just as Crawford (2016) did. In in-progress SKC research, Morrison and the research team (Morrison et al., 2021) investigated APCs by publisher type (government, institute, non-profit, independent, society or institution, university press, commercial, society, university) according to DOAJ data in 2019. They found that universities published the largest number of no-fee journals (7,857, 75% of the 10,463 no-fee journals in total), and society publishers came second (1,414). Commercial publishers stood out by having many more charging journals than no-fee journals (1,575 vs 275). Combined with our study, it can be speculated that most small DOAJ publishers are university or society publishers with a no-fee tendency. This corroborates Morrison’s thoughts in 2018 (Morrison, 2018b). In addition, the tendency of commercial publishers to charge fees coincides with our findings for the large publishers’ group.

In addition, we must admit that if the amount of the APC were included in the scope of the study, the results might change somewhat. Some publishers charge modestly and some ask for very high prices (especially for-profit publishers), and it is unfair to mix them without careful investigation (Crawford, 2011). For publishers in the charging group, their listing does not mean that their fees are necessarily unreasonable. Therefore, it is necessary to emphasize that our study concentrates on the rough trends of charging/no charging, using publisher size as the dividing criterion.

For a more in-depth discussion, we add the perspective of longitudinal analysis. We focus our discussion on two contrasting groups, large and small publishers. From our study, we know that almost all the large DOAJ publishers (100+ journals) are “mixed”, and most of them are commercial publishers, including the four largest traditional commercial publishers (Elsevier, SpringerNature, which includes SpringerOpen and BMC, Taylor & Francis, and Wiley). Based on SKC research (Morrison, 2018a), Elsevier, the world’s largest scholarly publisher, was “mixed” in 2017, with a large number of non-charging journals. But despite attempts at strategies, Elsevier lost many non-charging journals produced in partnership with societies and universities in 2018. Now, as we can see in our study, they may have fewer non-charging journals.

From this point, we speculate that some publishers can publish relatively more non-charging journals probably because they receive support from universities and governments. For example, the large “mixed” publisher Sciendo has many more non-charging journals than charging journals based on our data. According to Pashaei & Morrison (2019), Sciendo added more than 300 OA journals in 2019, most of which were “published through collaboration with different universities and academic societies and institutions in Europe”. A recent study of OA diamond journals [6] also confirmed that the economy of these journals “largely depends on volunteers, universities and government” (Bosman, Frantsvåg, Kramer, Langlais & Proudman, 2021).

Even large publishers with relatively solid financial resources may lose journals due to financial problems or other reasons, so we can imagine how much more difficult the situation is for small publishers, especially those with only 1 journal. Perhaps small non-profit publishers with limited financial resources should explore the possibilities of more non-charging OA models, instead of going with the flow and just raising prices in the for-profit competition. It is important to maintain operations while guarding the freedom and fairness of academic publishing. Based on the current situation, we need more patience to establish a healthy competitive publishing environment.

Not only do small publishers need to figure out how to grow in the long run and attract more authors and readers, but authors can also reach out to small publishers and discover their value, and the same goes for readers. This brings us to another purpose of our study: to call attention to small publishers and encourage interaction between authors, readers, funding sources and small publishers. There are often misunderstandings between these groups that are harmful to the OA movement, as discussed by Peter Suber in an interview (Hulagabali & Suber, 2019). For example, the beliefs that “most OA journals charge APCs” and “most OA journals are low in quality” are widespread but not true. Our study helps to dispel these misunderstandings by demonstrating that many journals are entirely free and that many exist for academic purposes: just because they are small does not mean they are not of high quality.

In addition to the issues above, small publishers also face other challenges. For example, in terms of longevity of data preservation, small publishers are more likely to lose long-term access (Crawford, 2011, p. 32). On this point, DOAJ published an article in 2020 announcing that they would collaborate with the CLOCKSS Archive, Internet Archive, Keepers Registry/ISSN International Centre and Public Knowledge Project (PKP) to improve the preservation of small OA journals.

Conclusion

From the three observations above, we can conclude that whether the “mixed” publishers are included in or excluded from the scope of our research, the “long-tail” and “the smaller, the NO-APCer” trends are always evident. Small non-profit publishers, being so numerous, need to look for various breakthroughs if they want to survive and grow.

Notes

  1. Crow, 2006, “The market context for society publishers”.
  2. Crow, 2006, “The market context for society publishers”, with reference to the Ulrich’s Periodicals Directory in 2005, http://www.ulrichsweb.com/ulrichsweb/
  3. Crow, 2006, “Abstract”.
  4. Edgar & Willinsky, 2010, p. 1.
  5. Morrison, 2018b, “Abstract”.
  6. As indicated in The OA Diamond Journals Study. Part 1: Findings. Jeroen Bosman, Jan Erik Frantsvåg, Bianca Kramer, Pierre-Carl Langlais, Vanessa Proudman. (2021, March 9). http://doi.org/10.5281/zenodo.4558704. p. 8. “OA diamond journals” are “journals that publish without charging authors and readers, in contrast to APC Gold OA or subscription journals”.

References

Bosman, J., Frantsvåg, J., Kramer, B., Langlais, P.-C., & Proudman, V. (2021). OA Diamond Journals Study. Part 1: Findings. Zenodo. http://doi.org/10.5281/zenodo.4558704

Crawford, W. (2011). Open access: what you need to know now. American Library Association.

Crawford, W. (2020). Gold Open Access Journals 2011 – 2015. https://waltcrawford.name/goaj1115.pdf

Crow, R. (2006). Publishing cooperatives: An alternative for not-for-profit publishers. First Monday, 11(9). https://firstmonday.org/ojs/index.php/fm/article/view/1396/1314

Directory of Open Access Journals. https://doaj.org/

Directory of Open Access Journals. DOAJ to lead a collaboration to improve the preservation of open access journals. DOAJ News Service. https://blog.doaj.org/2020/11/05/doaj-to-lead-a-collaboration-to-improve-the-preservation-of-open-access-journals/

Edgar, B. D., & Willinsky, J. (2010). A survey of the scholarly journals using open journal systems. Scholarly and Research Communication. https://src-online.ca/index.php/src/article/view/24

Hulagabali, S. C., & Suber, P. (2019). Peter Suber: The largest obstacles to open access are unfamiliarity and misunderstanding of open access itself. Open Interview. https://openinterview.org/2019/06/29/peter-suber-the-largest-obstacles-to-open-access-are-unfamiliarity-and-misunderstanding-of-open-access-itself/

Morrison, H. (2012). Freedom for scholarship in the internet age [Simon Fraser University]. http://summit.sfu.ca/item/12537 

Morrison, H. (2018a). Elsevier in 2018: Decrease in number of fully OA journals. Sustaining the Knowledge Commons. Retrieved from https://sustainingknowledgecommons.org/2018/12/13/elsevier-in-2018-decrease-in-number-of-fully-oa-journals/

Morrison, H. (2018b). Global OA APCs (APC) 2010–2017: Major Trends (L. Chan & P. Mounier, Trans.). ELPUB 2018. https://hal.archives-ouvertes.fr/hal-01816699

Morrison, H. (2019). Publisher: N/A, or the complexity of understanding “the publisher” (method notes). Sustaining the Knowledge Commons. 2019. https://sustainingknowledgecommons.org/2019/08/22/publisher-n-a-or-the-complexity-of-understanding-the-publisher-method-notes/

Morrison, H. & al. (2021). A comparison of open access journals using article processing charges in 2011 and 2021. (In progress)

Pashaei, H., & Morrison, H. (2019). De Gruyter and Sciendo Open Access journals expanding in 2019. Sustaining the Knowledge Commons. Retrieved from https://sustainingknowledgecommons.org/2019/10/16/de-gruyter-and-sciendo-open-access-journals-expanding-in-2019/

Zhao, X., Borges, L., & Morrison, H. (2021). Some limitations of DOAJ metadata for research purposes. Sustaining the Knowledge Commons. https://sustainingknowledgecommons.org/2021/02/10/some-limitations-of-doaj-metadata-for-research-purposes/

Open data references:

Directory of Open Access Journals; Zhao, X., Borges, L., & Morrison, H. (2021). “DOAJ_metadata_2021_01_05_with_SKC_clean_up”, Scholars Portal Dataverse, V1. https://doi.org/10.5683/SP2/G5LEXG

Cite as: Zhao, X. & Morrison, H. (2021). DOAJ publisher size analysis: Long-tail and APC charging trends. Sustaining the Knowledge Commons. https://sustainingknowledgecommons.org/2021/09/09/doaj-publisher-size-analysis-long-tail-and-apc-charging-trends

Irrational rationality: critique of metrics-based evaluation of researchers and universities

According to one of the most consulted of the global university rankings services, the QS World University Rankings 2022, the University of Toronto is the top ranked university in Canada. It shouldn’t take more than a brief pause to reflect on this statement to see the fiction in what is presented as objective empirical information (pseudoscience). In the real world, it is mid-June, 2021. The empirical “facts” on which QS is based are still in progress, in a year of pandemic with considerable uncertainty. It is not possible to complete data on 2021 until the year is over. Meanwhile, QS is already reporting stats for 2022; perhaps they are psychic?

Scratching slightly at the surface, anyone with even a little familiarity with the universities in Canada is probably aware that the University of Toronto is currently under a rare censure due to a “serious breach of the principles of academic freedom” in a hiring decision. Censure is a “rarely invoked sanction in which academic staff in Canada and internationally are asked to not accept appointments, speaking engagements or distinctions or honours at the University of Toronto, until satisfactory changes are made”. I don’t know the details of the QS algorithms, but I think it’s fair to speculate that neither support for academic freedom nor a university’s ability to attract top faculty for appointments, speeches, distinctions or honours is factored in or, if factored in, weighted appropriately.

Digging just a little bit deeper, someone with a modicum of understanding of the university system in Canada and Ontario in particular would know that the University of Toronto is one of Ontario’s 23 public universities, all of which have programs approved and regularly reviewed for quality by the same government, and funded under the same formulae and provide the same economic support for students. Degrees at a particular level are considered equivalent locally and courses are often transferable between institutions. When not under censure, the University of Toronto is indeed a high quality university; so is the University of Ottawa, where I work, Carleton (the other Ottawa-based university), and all the other Ontario universities. Specific programs frequently undergo additional accreditation. My department offers a Master’s of Information Studies program that is accredited by the American Library Association (ALA). Both the Ontario government and ALA require actual data in their QA / accreditation process. This includes evidence of strategic planning, but not guesswork about future output.

If QS is this far off base in its assessment of universities in the largest province of a G7 country (the epitome of the Global North), how accurate are QS and other global university rankings in the Global South? According to Stack (2021) and the authors of the newly released book Global University Rankings and the Politics of Knowledge (http://hdl.handle.net/2429/78483), global university rankings such as QS and THE, and the push for the Global South to develop globally competitive “world class universities”, are more about reproducing colonial relations, marketizing higher education and commercializing research than assuring high quality education. The attention paid to such rankings distracts universities and even countries from what matters locally. As Chou points out, the focus on rankings leads scholars in Taiwan to publish in English rather than Mandarin although Mandarin is the local language. A focus on publishing in international, English language journals creates a disincentive to conduct research of local importance almost everywhere.

My chapter in this work focuses on the intersection of the critique of metrics-based evaluation of research and how this evaluation feeds into the university rankings system. The first part of the chapter, Dysfunction in knowledge creation and moving beyond, provides a brief history and context of bibliometrics, the development of traditional and new metrics-based approaches, and major critique and advocacy efforts to change practice (the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto). The unique contribution of this chapter is a critique of the underlying belief behind both traditional and alternative metrics-based approaches to assessing research and researchers, that is, the assumption that impact is good and an indicator of quality research and that it therefore makes sense to measure impact, with the only question being whether particular technical measures of impact are accurate or not. For example, if impact is necessarily good, then the retracted study by Wakefield et al. that falsely correlated vaccination with autism is good research by any metric: it has many academic citations both before and after retraction and many citations in popular and social media, and it is arguably a factor in the real-world impact of the anti-vaccination movement, the subsequent return of preventable illnesses like measles, and the challenge of fighting COVID through vaccination. An alternative approach is suggested, using the University of Ottawa’s traditional collective agreement with APUO (union of full-time professors) as a means of evaluation that considers many different types of publications and considers quantity of publication in a way that gives evaluators the flexibility to take into account the kind of research and research output.

References

Morrison, H. (2021). What counts in research? Dysfunction in knowledge creation and moving beyond. http://ruor.uottawa.ca/handle/10393/39088 In: Stack, M. (2021). Global University Rankings and the Politics of Knowledge, pp. 109 – 130. http://hdl.handle.net/2429/78483

Stack, M. (2021). Global University Rankings and the Politics of Knowledge. http://hdl.handle.net/2429/78483

Open access article processing charges 2011 – 2021

by: Heather Morrison, Luan Borges, Xuan Zhao, Tanoh Laurent Kakou & Amit Nataraj Shanbhoug

Abstract

This study examines trends in open access article processing charges (APCs) from 2011 – 2021, building on a 2011 study by Solomon & Björk (2012). Two methods are employed, a modified replica and a status update of the 2011 journals. Data is drawn from multiple sources and datasets are available as open data (Morrison et al, 2021). Most journals do not charge APCs; this has not changed. The global average per-journal APC increased slightly, from 906 USD to 958 USD, while the per-article average increased from 904 USD to 1,626 USD, indicating that authors choose to publish in more expensive journals. Publisher size, type, impact metrics and subject affect charging tendencies, average APC and pricing trends. About half the journals from the 2011 sample are no longer listed in DOAJ in 2021, due to ceased publication or publisher de-listing. Conclusions include a caution about the potential of the APC model to increase costs beyond inflation, and a suggestion that support for the university sector, responsible for the majority of journals, nearly half the articles, with a tendency not to charge and very low average APCs, may be the most promising approach to achieve economically sustainable no-fee OA journal publishing.

A preprint of the full article is available here: https://ruor.uottawa.ca/handle/10393/42327

The two base datasets and their documentation are available as open data:

Morrison, Heather et al., 2021, “2011 – 2021 OA APCs”, https://doi.org/10.5683/SP2/84PNSG, Scholars Portal Dataverse, V1

Citation: cite the original URL rather than this blogpost URL (article); if citing data, use the citation above.

Morrison, H., Borges, L., Zhao, X., Kakou, T.L., Shanbhoug, A.N. (2021). Open access article processing charges 2011 – 2021. Preprint. Sustaining the Knowledge Commons. https://ruor.uottawa.ca/handle/10393/42327

Improving the DOAJ metadata – Why and how

by: Xuan Zhao & Heather Morrison

Abstract

The Directory of Open Access Journals (DOAJ, http://doaj.org/) is an essential world-wide open access service (16,134 journals listed, as of March 29, 2021), which promotes quality, peer-reviewed open access journals. The journals included can get higher and broader visibility. To make the most of this service, journal editors need to pay attention to the accuracy of their entries in the DOAJ metadata (journal-title, publisher information, location information, subject, language, URLs, etc.). This post aims to explain the benefits for journals of improving the quality of metadata and what journal editors can do. 

Our discussion is mainly based on recent research of the Sustaining the Knowledge Commons team and cites some other researchers’ findings. 

For journals, what are the benefits of improving the DOAJ metadata?

As detailed on the DOAJ website (DOAJ, https://doaj.org/apply/why-index/), there are five benefits for journals indexed in DOAJ, and accordingly, five reasons to improve the metadata: 

  1. “Reputation and prominence”

“DOAJ is the most important community-driven, open access service in the world and has a reputation for advocating best practices and standards in open access. By indexing your journal in DOAJ, its reputation and prominence will be enhanced.”

We assume that journals with accurate and precise entries give the impression of a serious, actively maintained publication, which helps them maintain this reputation. 

  2. “Standards and best practice”

“DOAJ’s basic criteria for inclusion have become the accepted way of measuring an open access journal’s adherence to standards in scholarly publishing. We can help you adopt a range of ethical and quality standards, making your journals more attractive publishing channels. DOAJ is committed to combatting questionable publishers and questionable publishing practices, helping to protect researchers from becoming trapped by unethical journals.”

Because open access journals are listed in a quality standards system like DOAJ, it is important to make sure that their information is correct so that they can be clearly distinguished from questionable journals. 

  3. “Funding and compliance”

“Open access publication funds often require that authors who want funding must publish in journals that are included in DOAJ. Indexing in DOAJ makes your journals compliant with many initiatives and programmes around the world, for example Plan S in Europe or Capes/Qualis in Brazil.”

With correct entries in metadata, the DOAJ journals can be more easily discovered by foundations, related programmes and organizations.

  4. “Discoverability and visibility”

“DOAJ metadata is free for anyone to collect and use, which means it is easily incorporated into search engines and discovery services. It is then propagated across the internet. If you provide us with article metadata for your journal, this will be supplied to all the major aggregators and the many research organisations and university library portals who use our widgets, RSS feeds, API and other services. Indexing your journal in DOAJ is likely to increase traffic to your website and give greater exposure to your published content. Levels of traffic to a journal website typically increase threefold after inclusion in DOAJ. Your journal’s visibility in search engines, such as Google, will improve.”

Indexing journals in DOAJ means they are more easily discovered and cited by other researchers. Correcting metadata will help raise the chances that people working in the same area will find the relevant research they need.

  5. “International coverage”

“Our database includes more open access journals from a diverse list of countries than any of the other major indexing services. We have a global editorial team via a network of Managing Editors, Ambassadors and volunteers, so we will do our best to offer local support in your language. We promise you that information about your journal will be seen around the world.”

The DOAJ journals are aimed at readers from all over the world and may be seen by people who are not proficient in the journals’ language. In this case, journal editors need to ensure the correctness of data entry so that readers can read with confidence. 

What’s more, a higher quality database will be more valuable for researchers and promote the entire OA ecosystem. This is especially true for services like university libraries, which tend to keep up with the latest content and take advantage of metadata corrections. 

In brief, keeping the entries of DOAJ metadata correct reinforces the advantages for journals mentioned above and benefits the users of DOAJ. 

As journal editors, what can we do?

As demonstrated in a study of the SKC (Zhao, Borges & Morrison, 2021), “as of January 5, 2021, only 30% of DOAJ journals have a ‘last update’ date within the previous year (2020)”, which means only 30% of DOAJ journals fully or partially updated their information in the DOAJ system during that year. To make the best use of DOAJ, journal editors should regularly check their entries to ensure that their data is correct and up to date. For example, if a journal’s URL is not kept up to date, the incorrect URL means, at best, that the journal cannot be found. Crawford (2016), in a study of DOAJ journals, found journals that were flagged as malware (or as containing malware) by Malwarebytes, Windows Defender, McAfee Site Advisor or Office 2013. 

Most of the visible inconsistencies in the metadata are input errors or location errors (listed below). Most of the input errors are “small differences in punctuation and/or characters, extra spaces at the beginning and/or at the end”, as reported by SKC (Zhao, Borges & Morrison, 2021). Combined with the findings of Crawford (2016), we list the data to be modified by categories as follows:

  • Input error or location error in:

wrong column, journal title, special character, keywords, copyright information URL, plagiarism information URL, URL for journal’s instructions for authors, other submission fees information URL, preservation services, preservation service: national library, preservation information URL, deposit policy directory, persistent article identifiers, URL for journal’s open access statement, etc. 

  • Publisher name duplicates:

Extra or missing spaces, minor detail (e.g. non-English character in one but not the other), minor difference in punctuation and/or characters (e.g. “Abant İzzet Baysal Üniversitesi” vs. “Abant İzzet Baysal University”), abbreviation in one but not the other (e.g. “Asociación Interuniversitaria de Investigación Pedagógía” vs. “Asociación Interuniversitaria de Investigacion Pedagogica (AIDIPE)”), etc.

  • “APC-charging journals that don’t clearly state the amount charged” (Crawford, 2016)

Sometimes it is hard to determine “who is the publisher”. We list some situations below:

  • When there are branch publishers under one publisher, and all of them are recorded in DOAJ, especially when their journals’ websites do not have any clear indications;
  • When a publisher has more than one active name (perhaps due to different sponsors of one publisher, or the nature of commercial publishers), but their journals’ websites do not have any clear indications;
  • When journals changed their websites but didn’t renew the URLs in the DOAJ database;
  • Invalid URLs;
  • Unmatched publisher name/journal name and URLs.

DOAJ also provides article-level search and is working to encourage more journals to provide article-level metadata. It makes both the journal-level and article-level metadata available for anyone to download (DOAJ, https://doaj.org/docs/public-data-dump/). Journal editors should therefore also ensure that their articles’ information is correct. 

References

Crawford, W. (2016). Gold Open Access Journals 2011 – 2015. https://waltcrawford.name/goaj1115.pdf

Directory of Open Access Journals. Retrieved March 29, 2021, from http://doaj.org/

Public data dump. Directory of Open Access Journals. Retrieved March 29, 2021, from https://doaj.org/docs/public-data-dump/

Why index your journal in DOAJ? Directory of Open Access Journals. Retrieved March 29, 2021, from https://doaj.org/apply/why-index/

Zhao, X., Borges, L., & Morrison, H. (2021). Some limitations of DOAJ metadata for research purposes. Sustaining the Knowledge Commons. https://sustainingknowledgecommons.org/2021/02/10/some-limitations-of-doaj-metadata-for-research-purposes/

Some limitations of DOAJ metadata for research purposes

by: Xuan Zhao, Luan Borges, & Heather Morrison

Abstract

The Directory of Open Access Journals http://doaj.org is an excellent service that fulfills many important functions, in particular facilitating access to a vetted collection of over 15,000 freely available peer-reviewed journals. The DOAJ search services and metadata download are very useful for researchers as well. The purpose of this post is to alert researchers to some of the limitations of the DOAJ metadata that researchers need to take into account to avoid drawing erroneous conclusions.

First, when downloading DOAJ metadata, it is necessary to open the .csv file in Unicode in order to retain non-English characters. We open the file in Open Office for this reason, then save it as an Excel file. The nature of the metadata means that some data is inserted in the wrong column; clean-up, as discussed below, is necessary before data analysis. When journal editors or others working on their behalf enter metadata into DOAJ, research is not the primary purpose of this exercise; for this reason, in-depth assessment and corrections may be necessary before analysis. Below, we present publisher size analysis as an example of what researchers may encounter.

Finally, because the main purpose of DOAJ is connecting readers with content, the metadata of interest to a particular research project may not be up to date. As demonstrated below, as of Jan. 5, 2021, only 30% of DOAJ journals have a “last update” date within the previous year (2020). We do not know whether the “last update” date reflects a full or partial metadata review. We illustrate the potential impact on research results with the example of the SKC longitudinal APC study. Of the 4,292 DOAJ journals that responded “yes” to the APC question, only 30% have a last update date of 2020 or 2021. Even for this 30% of journals, we have no way of knowing whether the APC status and/or amount per se was updated, or only other unrelated metadata. This means that if we compare 2019 prices obtained from publisher websites in 2019 with 2021 DOAJ APC metadata, we will almost certainly get incorrect results, for example falsely assuming that matching APC amounts mean no change in the prices.

DOAJ provides rich and useful metadata for the researcher, and the research question “is this journal listed in DOAJ?” is of value in and of itself. For this reason, we intend to continue using DOAJ metadata in addition to data derived from other sources, particularly data derived directly from publisher websites. See below for a link to an open data version of the DOAJ metadata reflecting the corrections explained in this post.

Details

Correcting for displaced observations

As previously mentioned, the first step to confidently use the DOAJ metadata for analysis and research is identifying and correcting data inserted in the wrong column, herein also called displaced observations. 

Below we can see an example of a displaced observation from the DOAJ metadata. Column BB has no assigned variable, yet it contains some observations, apparently displaced one column to the right. 

Table 1 – An example of misplaced data from 2021 DOAJ metadata

Users may follow different steps to correct for displaced data. Here we explain in more detail how we have identified these displacements and corrected them.  

Before proceeding with any analysis, it is important to become familiar with the DOAJ metadata. We recommend that users read the DOAJ Guide to applying, available online, because the metadata reflects responses to questions asked in the application process. The DOAJ metadata, as of 5 Jan. 2021, contains 53 variables ranging from Journal Title to Country to Most recent article added. It may be helpful to start correcting observations from variables with easily identifiable responses, such as « Country » or « Country of Publisher », or variables that allow only two types of answers (i.e., Yes or No), such as Author holds copyright without restrictions and APC. We recommend creating a pivot table for each variable to identify displaced observations, and repeating this process until no observations are found in a wrong column. 

When cleaning up the DOAJ metadata, users will notice that in some cases only one observation was displaced; in other cases, an entire row was displaced beginning at a specific variable. In the example highlighted in yellow below, all observations beginning at variable Publisher were displaced one column to the right. 

Table 2 – Line 36 illustrates an example of an entire row with displaced observations
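The pivot-table check described above can also be scripted. The following is a minimal sketch, not the procedure the SKC team used (the team worked in a spreadsheet). The four-column sample and its values are hypothetical stand-ins for the 53-variable DOAJ export. The sketch counts the distinct values of a Yes/No variable and flags rows whose value is not an allowed answer or that carry more fields than the header.

```python
import csv
import io
from collections import Counter

# Hypothetical extract: the real DOAJ export has 53 variables.
# In "Journal B" an empty cell pushes every later value one column to the right.
SAMPLE = """Journal title,Publisher,Country of publisher,APC
Journal A,Press One,Canada,No
Journal B,,Press Two,Brazil,Yes
Journal C,Press Three,France,Yes
"""


def value_counts(rows, header, column):
    """Pivot-table analogue: count each distinct value found in one column."""
    idx = header.index(column)
    return Counter(row[idx] if idx < len(row) else "" for row in rows)


def find_displaced(rows, header, column, allowed):
    """Return (line_number, row) pairs that look displaced: the value in
    `column` is not an allowed answer, or the row has more fields than the header."""
    idx = header.index(column)
    flagged = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        value = row[idx] if idx < len(row) else ""
        if value not in allowed or len(row) > len(header):
            flagged.append((line_no, row))
    return flagged


# For a real file, use open(path, newline="", encoding="utf-8") instead of
# io.StringIO so that non-English characters are retained.
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
rows = list(reader)

apc_counts = value_counts(rows, header, "APC")
displaced = find_displaced(rows, header, "APC", {"Yes", "No"})
```

Any value other than Yes or No in `apc_counts` (here, a country name) points to a displaced row. As with the pivot table, the check would be repeated for other variables until nothing is flagged, and each flagged row still needs to be corrected by hand.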

Data entry inconsistencies

When correcting for displaced observations, we have also identified some inconsistencies in the way observations are registered in the DOAJ metadata. The table below lists the main visible inconsistencies found for some variables. In the majority of instances, the inconsistencies will not impact DOAJ users looking up information for a particular journal. However, it is important to take into account these inconsistencies before proceeding to any automated statistical analysis. For example, DOAJ metadata as is can be used to identify the number of journals with persistent article identifiers, but automated counting of DOI v. ARK or other approaches would require some advance data manipulation.

| Variable | Example |
| --- | --- |
| Alternative title | Some journals’ alternative titles may be registered as a number. Some examples are “2300-6633” and “0”. |
| Keywords | Some observations have some special characters as follows: “6.         rheology, tribology, hydrodynamics, thermodynamics, mechanics of structures, mechatronics.”; “           water cycles, water environment, water treatment and reuse, water resource, water quality, hydrology”; “ •          natural sciences, •      environmental sciences, •      social sciences, agricultural sciences, veterinary medicine, medical sciences” |
| Copyright information URL | Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning and an « l » at the end of the link. ttp://www.emeraldgrouppublishing.com/services/publishing/jiuc/authors.htm |
| Plagiarism information URL | Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning and an « l » at the end of the link. ttp://www.emeraldgrouppublishing.com/services/publishing/jiuc/authors.htm |
| URL for journal’s instructions for authors | Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning of the URL. ttps://revistas.unasp.edu.br/LifestyleJournal/about/submissions |
| Other submission fees information URL | Some URLs have extra letters. The example below, for instance, has a letter « i » at the beginning of the URL. ihttps://journals.univie.ac.at/index.php/voebm/m/index Some URLs lack a letter « h » at the beginning or the end. The examples below illustrate this small error. There should be an « h » at the beginning of the URL. ttp://psr.ui.ac.id/index.php/journal/about/submissions#authorGuidelines ttps://www.karger.com/Journal/Guidelines/261897#sec62 |
| Preservation Services | Preservation services can be registered as a name or a website |
| Preservation Service: national library | Preservation services – national library can be registered as a name or a website |
| Preservation information URL | Some URLs lack a letter « h » at the beginning or the end. The examples below, for instance, have a small error. There should be an « h » at the beginning of the URL. tps://periodicos.uff.br/revistagenero/about/editorialPolicies#focusAndScope ttp://ejournal.stkip-pgri-sumbar.ac.id/index.php/economica |
| Deposit policy directory | Deposit policy directory can be registered as a name or a website |
| Persistent article identifiers | Persistent article identifiers can be registered as an acronym (UDC, DOI, ARK), but also as a website, such as dc.identifier.uri (DSpaceUnipr) or NBN http://www.depositolegale.it/national-bibliography-number/ Another example is the occurrences UDC and UDC (Universal decimal Classification), which are equivalents but were registered differently |
| URL for journal’s Open Access statement | Some URLs lack a letter « h » at the beginning or at the end, or they have an extra h at the beginning of the URL. The example below has an extra letter « h » at the beginning of the URL. hhttp://www.revistas.usp.br/gestaodeprojetos/about |

Table 3 – Visible inconsistencies identified in the DOAJ metadata
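Damaged URL prefixes of the kind listed in Table 3 can be flagged automatically. The sketch below is illustrative only: the function name `check_url` and the repair rule (guess the scheme from the last letters before `://`) are assumptions made for this example, not part of the SKC clean-up. A suggested repair should be verified by visiting the link. A missing letter at the end of a URL, as in the « l » example, cannot be detected this way.

```python
import re

# A URL is treated as well-formed here if it starts with http:// or https://.
VALID_SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def check_url(url):
    """Return (is_valid, suggestion).

    `suggestion` is a guessed repair for a damaged scheme such as
    "ttp://", "tps://", "ihttps://" or "hhttp://", or None when no
    guess is made. The guess is a heuristic and must be verified manually.
    """
    url = url.strip()
    if VALID_SCHEME.match(url):
        return True, None
    prefix, separator, rest = url.partition("://")
    if not separator:
        return False, None  # no "://" at all: nothing to guess from
    prefix = prefix.lower()
    if prefix.endswith("ps"):
        return False, "https://" + rest
    if prefix.endswith("p"):
        return False, "http://" + rest
    return False, None


# Examples taken from Table 3.
examples = [
    "ttps://revistas.unasp.edu.br/LifestyleJournal/about/submissions",
    "ihttps://journals.univie.ac.at/index.php/voebm/m/index",
    "hhttp://www.revistas.usp.br/gestaodeprojetos/about",
]
results = [check_url(u) for u in examples]
```

Each entry in `results` pairs a validity flag with a candidate repair, which can be written to a review sheet rather than applied directly to the metadata.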

Publisher’s names duplicates investigation and clean-up

The purpose of this project is to prepare the data needed to develop a rough picture of publisher size to compare with Solomon & Björk’s findings (2012). In order to better perform publisher size analysis, we have specifically investigated the publisher duplicates and corrected most of the obvious errors, such as small differences in punctuation and/or characters, extra spaces at the beginning and/or at the end, and minor differences in how the same publisher name was entered. (Please see examples in Table 4 – Investigative Strategies – Publisher Names Duplicates).

The clean-up process was divided into three stages. Firstly, we created a pivot table for the publisher column to identify entries that were slightly different and therefore had not been grouped together. Secondly, when potential duplicates were found, we conducted an investigation to confirm duplicates and/or to decide which name to keep (in priority order: use the name with the most journal entries; correct a name with an obvious typo; use the first name listed). Please see the investigative strategies below:

Table 4 – Investigative Strategies – Publisher Names Duplicates

Thirdly, after identifying inconsistencies in publisher names, we created a table (please see Table 5 – Corrections Gathering – Publisher Names Duplicates) to register all the corrections on the variable Publisher. About 500 inconsistencies were corrected. As a result, the number of publishers in the pivot table decreased from 7218 entries (data source: pivot table based on DOAJ metadata) to 6804 entries (data source: pivot table based on the cleaned-up version of the database).

Table 5 – Corrections Gathering – Publisher Names Duplicates
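The first stage (spotting names that differ only in spacing, punctuation or capitalization) lends itself to automation. The following is a minimal sketch under stated assumptions: the helper names `normalize`, `group_variants` and `pick_name` and the sample publisher names are hypothetical, and the sketch implements only the first of the priority rules above (keep the name with the most journal entries). Correcting obvious typos and confirming duplicates remain manual steps.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(name):
    """Comparison key: Unicode-normalized, case-folded, punctuation removed,
    runs of whitespace collapsed and leading/trailing spaces dropped."""
    text = unicodedata.normalize("NFKC", name).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def group_variants(names):
    """Map each key to the publisher names sharing it, keeping only keys
    under which more than one distinct spelling was entered."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: found for key, found in groups.items() if len(set(found)) > 1}


def pick_name(found):
    """Keep the spelling used by the most journal entries."""
    return Counter(found).most_common(1)[0][0]


# Hypothetical publisher column: one value per journal row.
publisher_column = [
    "Example University Press",
    "Example University Press ",   # trailing space
    "Example University Press",
    "example university press.",   # case and punctuation differ
    "Another Press",
]

variants = group_variants(publisher_column)
canonical = {key: pick_name(found) for key, found in variants.items()}
```

Keeping the original spellings alongside the chosen name, as in the corrections table described above, makes every automated decision reviewable and reversible.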

As illustrated in the two tables above, there were different types of data inconsistencies. In order to respect the original metadata as far as possible, we acted prudently when making decisions. In some cases of minor variation, we followed the URLs to check publisher websites and to collect convincing evidence. However, we met some complex challenges.

One of the challenges was language. Due to the large number and wide range of publishers (124 countries, 80 languages, DOAJ, 7 Feb. 2021) [https://doaj.org/], we were unable to identify all of the sources of information. In addition, when there were invalid URLs or unmatched information, it was difficult to reach any precise conclusion. What’s more, among the 7218 entries of publisher names, some of the potential duplicates weren’t grouped together because they begin with different words. For example, “Editora da Universidade Estadual de Maringá (Eduem)” vs. “Eduem – Editora da Universidade Estadual de Maringá” and “Academica Brâncuşi” vs. “Editura Academica Brâncuşi”. Such names were usually far apart in the sorted list and hard to detect. More details can be found in Table 6 below:

Different beginning words (examples):

“Academica Brâncuşi” vs. “Editura Academica Brâncuşi”;
“Alexandru Ioan Cuza University of Iaşi” vs. “Editura Universităţii ‘Alexandru Ioan Cuza’ Iaşi”;
“Editora da Universidade Estadual de Maringá (Eduem)” vs. “Eduem – Editora da Universidade Estadual de Maringá”

Table 6 – (1)
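Names that begin with different words sort far apart, so an alphabetical pivot table does not bring them together. One possible way to surface such candidates, offered here as a sketch rather than as the method used in this project, is to compare the sets of words in each name regardless of order. The function names `tokens`, `likely_same` and `jaccard` are hypothetical. Short generic names will produce false matches, so results are only candidates for manual investigation.

```python
import re
import unicodedata


def tokens(name):
    """Set of case-folded words in a publisher name, ignoring punctuation and word order."""
    text = unicodedata.normalize("NFKC", name).casefold()
    return frozenset(re.findall(r"\w+", text))


def likely_same(a, b):
    """True when all the words of one name also appear in the other."""
    ta, tb = tokens(a), tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


def jaccard(a, b):
    """Share of words the two names have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Pairs taken from Table 6 - (1).
pair_1 = ("Academica Brâncuşi", "Editura Academica Brâncuşi")
pair_2 = ("Editora da Universidade Estadual de Maringá (Eduem)",
          "Eduem – Editora da Universidade Estadual de Maringá")
pair_3 = ("Alexandru Ioan Cuza University of Iaşi",
          "Editura Universităţii ‘Alexandru Ioan Cuza’ Iaşi")

flagged = [likely_same(*pair) for pair in (pair_1, pair_2, pair_3)]
overlap_3 = jaccard(*pair_3)
```

The first two pairs are flagged by the subset test. The third is not, because one name is in English and the other in Romanian; its partial word overlap illustrates why the language barrier described above cannot be removed by string matching alone.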

Unmatched publisher names (examples):

| Original publisher names | Possible correct names | URLs |
| --- | --- | --- |
| Canadian Society for the Study of Education. | The Canadian Association for Curriculum Studies | https://jcacs.journals.yorku.ca/index.php/jcacs/index |
| Badan Penelitian dan Pengembangan Kesehatan | URL directs to a new web link: https://ejournal2.litbang.kemkes.go.id/index.php/jki/index whose publisher name is: Pusat Penelitian dan Pengembangan Biomedis dan Teknologi Dasar Kesehatan | http://ejournal.litbang.kemkes.go.id/index.php/jki |
| Shaheed Beheshti University of Medical Sciences and Health Services | Kowsarmedical | http://journals.sbmu.ac.ir/jme |

Table 6 – (2)

Invalid URLs (examples):

Original publisher name | Original URL (invalid)
Alborz University of Medical Sciences (the URLs wrongly direct to a website whose contents are meaningless; when we searched for the journal title, we were directed to this website: https://enterpathog.abzums.ac.ir/) | http://enterpathog.com/?page=home ; https://jehe.abzums.ac.ir/index.php?slc_lang=en&sid=1
Instituto Nacional de Salud (INS) | http://revistas.ins.gov.py/index.php/rspp/
Instituto Superior de Ciências de Educação do Huambo | http://revista.isced-hbo.ed.ao/rop/index.php/ROP/index
Table 6 – (3)
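Checks like those behind Table 6 – (2) and (3) could in principle be partly automated. The sketch below is an illustration under stated assumptions rather than a description of how the SKC team worked: `check_url` and its report fields are hypothetical names, and it relies only on Python's standard `urllib`. It records whether a URL opens at all and whether it ends up on a different host after redirection.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Try to open a URL and report failures and redirects to another host.

    `opener` can be swapped for a stand-in so the function can be tried offline.
    """
    report = {"url": url, "ok": False, "final_url": None,
              "host_changed": False, "error": None}
    try:
        with opener(url, timeout=timeout) as response:
            final_url = response.geturl()  # URL actually reached after redirects
            report["ok"] = True
            report["final_url"] = final_url
            report["host_changed"] = (
                urlsplit(url).hostname != urlsplit(final_url).hostname
            )
    except urllib.error.HTTPError as exc:
        report["error"] = f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        report["error"] = str(exc)
    return report
```

A check of this kind only flags dead links and redirects. It cannot tell whether a page that loads successfully is the right journal, as in the Alborz University of Medical Sciences example, so manual review would still be needed.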

Given the barriers and challenges described above, we can draw a conclusion about the limitations of the publisher name clean-up project. Precision is not possible in this project because the question “who is the publisher” is complex. Rather than make any definitive claims about publisher size, we are primarily interested in whether the long tail effect (a few big publishers, a few more middle-sized, most very small) reported by Solomon & Björk (2012) can still be observed in DOAJ in 2021.

DOAJ metadata update analysis

The following analysis was conducted to determine whether DOAJ metadata on article processing charges (APCs), that is, charging status and amount, would be sufficient for SKC’s longitudinal study on APC trends over time. The answer is clearly no. The metadata for the vast majority of journals in DOAJ (overall and APC charging) has not been updated for more than a year, and it is unknown whether the most recent update would have included an update to APC or other metadata. We will continue to use DOAJ metadata, as it is rich and the question “is this journal listed in DOAJ” is of value in and of itself. For price comparisons, however, we cannot rely on this data, as doing so would likely lead to erroneous conclusions.

DOAJ journals by year of last update.

This chart illustrates the percentage of DOAJ journals by year of last update. Detailed figures are in the table below. Note that just under half the journals were last updated 2 or more years ago (2018 or earlier).

DOAJ last update as of Jan. 5, 2021
Year | # journals last updated | % journals last updated
2015 | 294 | 2%
2016 | 1,469 | 9%
2017 | 2,864 | 18%
2018 | 2,951 | 19%
2019 | 3,412 | 22%
2020 | 4,662 | 30%
2021 | 39 | 0%
Total | 15,691 | 100%
Table 7
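A tabulation like Table 7 can be produced by counting journals per year of last update. The following is a minimal sketch, assuming each journal's last-update value is a text string that starts with a four-digit year (for example an ISO-style timestamp); the function name and the sample values are hypothetical and are not taken from the DOAJ file.

```python
from collections import Counter


def last_update_by_year(last_updated_values):
    """Count journals per year of last update and give each year's rounded share.

    Assumes every non-empty value starts with a four-digit year.
    Returns a list of (year, count, percent) tuples in year order.
    """
    years = Counter(value[:4] for value in last_updated_values if value)
    total = sum(years.values())
    return [
        (year, count, round(100 * count / total))
        for year, count in sorted(years.items())
    ]


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = ["2015-03-01T00:00:00Z", "2020-01-01", "2020-05-05", "", "2021-01-02"]
    for year, count, percent in last_update_by_year(sample):
        print(year, count, f"{percent}%")
```

Restricting the input to journals that report charging APCs would give the APC-only version of the same tabulation.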

DOAJ APC charging journals by year of last update

The chart above illustrates the percentage of journals that answered “yes” to the DOAJ question about charging APCs by year of last update. The table below provides the detailed figures. Note that only 30% of DOAJ journals that charge APCs were updated in the past year (2020 or 2021). It is also unknown whether in these cases the last update was a thorough review of the metadata, or might have been an update of non-APC data.

DOAJ last update APC journals only Jan. 5, 2021
Year of last update | # journals last updated | % journals last updated
2015 | 47 | 1%
2016 | 238 | 6%
2017 | 499 | 12%
2018 | 930 | 22%
2019 | 1,286 | 30%
2020 | 1,276 | 30%
2021 | 16 | 0%
Total | 4,292 | 100%
Table 8
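The two headline figures (just under half of all journals last updated in 2018 or earlier; about 30% of APC-charging journals updated in 2020 or 2021) can be re-derived from the counts in Tables 7 and 8. This short sketch uses only those published counts; the variable and function names are illustrative.

```python
# Counts of journals by year of last update, copied from Tables 7 and 8.
ALL_JOURNALS = {2015: 294, 2016: 1469, 2017: 2864, 2018: 2951,
                2019: 3412, 2020: 4662, 2021: 39}
APC_JOURNALS = {2015: 47, 2016: 238, 2017: 499, 2018: 930,
                2019: 1286, 2020: 1276, 2021: 16}


def share(counts, years):
    """Percentage of journals whose last update falls in the given years."""
    total = sum(counts.values())
    return 100 * sum(counts[year] for year in years) / total


# All journals last updated in 2018 or earlier.
stale_all = share(ALL_JOURNALS, range(2015, 2019))
# APC-charging journals last updated in 2020 or 2021.
recent_apc = share(APC_JOURNALS, (2020, 2021))

print(f"All journals last updated 2018 or earlier: {stale_all:.1f}%")
print(f"APC journals last updated 2020 or 2021: {recent_apc:.1f}%")
```

The column totals sum to 15,691 and 4,292, matching the totals in the tables.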

A version of the Jan. 5, 2021 DOAJ metadata file reflecting the corrections explained below is available as open data here:

Directory of Open Access Journals; Zhao, Xuan; Borges, Luan; Morrison, Heather, 2021, “DOAJ_metadata_2021_01_05_with_SKC_clean_up”, https://doi.org/10.5683/SP2/G5LEXG, Scholars Portal Dataverse, V1

References

The Directory of Open Access Journals (DOAJ) online: https://doaj.org/

Solomon, D. J., & Björk, B. (2012). A study of open access journals using article processing charges. Journal of the American Society for Information Science and Technology, 63(8), 1485–1495. https://doi.org/10.1002/asi.22673

Cite as: Zhao, X., Borges, L., & Morrison, H. (2021). Some limitations of DOAJ metadata for research purposes. Sustaining the Knowledge Commons. https://sustainingknowledgecommons.org/2021/02/10/some-limitations-of-doaj-metadata-for-research-purposes/

Preservation of Digital Blog-Posts

A Literature Review, January 2021

The goal of this literature review was to gain an understanding of the current status of research on the topic of digital blog preservation. A series of searches in the database LISTA (Library, Information Science, and Technology Abstracts) found little to no recent development in technology or research specifically for the access to or preservation of digital blog posts.

Unsurprisingly, much of the scholarly conversation about blog/microblog preservation took place between 2002 and 2010. 

Thoughts on Blog Preservation

Despite the varying opinions that blogs are either easier or more difficult to preserve than other digital communications, scholars agree that blogs and microblogs have unique qualities that deserve scholarly discussion.  

According to Patsy Baudoin, many blogging websites utilize software that automatically preserves the sequencing of posts (2008). This innate quality of the software supports the archiving principles of “original order” and “provenance”. However intelligent the blogging software appears to be, blogs and other user-generated content are especially vulnerable to link rot (Banks, 2010).

Blogs can become complex to preserve because they may contain various file formats, media, or have several owners (Baudoin, 2008). To add to this sentiment, Grimard (2005) states that the variety of formats adds to the “opaqueness” of digital records (opaqueness referring to the unnatural structure of electronic information that is only computer-readable).

To maintain the integrity of the blog during the preservation process, the digital archivist would have to consider preserving the additional external links within the original blog post. Furthermore, copyright can be an issue in certain blog preservation circumstances, as there have been several cases brought to the US Supreme Court (Chen, 2010).

Preservation Technology

Open-source technological advancements in blog preservation have been disappointing at best. According to Caroline Young, there have been several programs for blog preservation that have essentially failed soon after conception (2013).

Some examples are PANDORA by the National Library of Australia, and ArchivePress by the University of London’s Computer Centre and British Library Digital Preservation department. Young mentions a developing blog preservation software called BlogForever, which was still in development in 2013. Now, it seems to be available for use and claims to be a new system to harvest, preserve, manage and reuse blog content.

Young (2013), Banks (2010), Rosenthal (2017), and Chen (2010) all highlight the impact made by the introduction of the Internet Archive’s Wayback Machine. The Wayback Machine has improved the landscape of digital preservation of grey literature like blog posts; however, it is not without its challenges. Much like other archiving software, it has difficulty with images and audio files.

Solutions to the Preservation Problem

Though an older article, Grimard (2005) offers some solutions to digital preservation that are still relevant. One important recommendation is to standardize the format of the information. The recommendation is echoed by Young (2013). Both authors emphasize the importance of converting files to the most usable format. Since file formats are simply a set of conventions that software developers can change and alter, they may become obsolete. Young describes the universal XML format as being hierarchical and organized logically. 

LOCKSS is preservation software mentioned in both Leroy (2018) and Rosenthal (2017). It is open-source software designed with libraries in mind. It also claims to preserve animations, data sets, images, audio, and text content.

Conclusion

The scholarly conversation on the preservation and conservation of blog content has slowed in the past decade. This could be because the options currently available are adequate for the needs of blog preservation.

Blogs and microblogs comprise various formats, which can contribute to the challenges of digital preservation. According to research in the early 2010s, images, animations, and audio files, which blogs usually contain, are difficult to preserve with the Wayback Machine. This may have improved in more recent years.

There are also preservation software options, such as LOCKSS and BlogForever, that seem to be more targeted toward archiving blog content than the Wayback Machine is.

Reference List

Chen, X. (2010). Blog Archiving Issues: A Look at Blogs on Major Events and Popular Blogs. Internet Reference Services Quarterly, 15(1), 21–33. https://doi.org/10.1080/10875300903529571

Baudoin, P. (2008). On Preserving Blogs for Future Generations. The Serials Librarian, 53(4), 59–61. https://doi.org/10.1300/J123v53n04_04

Farace, D., & Schöpfel, J. (Eds.). (2010). Chapter 14. Blog Posts and Tweets: The Next Frontier for Grey Literature. In Grey Literature in Library and Information Studies (pp. 217–226). K. G. Saur. https://doi.org/10.1515/9783598441493.2.217

Grimard, J. (2005). Managing the Long-term Preservation of Electronic Archives or Preserving the Medium and the Message. Archivaria, 153–167.

Leroy, A. (2018). LOCKSS Distributed Digital Preservation Networks. Université libre de Bruxelles. Belgium. ISSN, 9. https://nusl.techlib.cz/en/conference/conference-proceedings

Rosenthal, D. S. H. (2017). The medium-term prospects for long-term storage systems. Library Hi Tech, 35(1), 11–31. https://doi.org/10.1108/LHT-11-2016-0128

Young, C. (2013). Oh My Blawg! Who Will Save the Legal Blogs? Law Library Journal, 105(4), 493–503.

Cite as: Pelland, K. (2021). Preservation of digital blog-posts. Sustaining the Knowledge Commons. https://sustainingknowledgecommons.org/2021/01/29/preservation-of-digital-blog-posts/