The 2021 DOAJ (Directory of Open Access Journals) publisher group comprises a large number of very small publishers (typically with only one journal) and very few large publishers. Our purpose is to demonstrate this long-tail tendency in publisher size and to reveal its connection with publishers’ tendency to charge, or not charge, an APC (article processing charge). As a result, we ascertain a “long-tail” of publisher size in all three groups and a “the smaller, the NO-APCer” tendency.
Why does the relation between OA APCs (open access article processing charges) and publisher size need to be brought to the forefront? To answer this question, we need to understand a bit more about the context of OA publishing. As illustrated by Crow (2006), Edgar & Willinsky (2010) and Morrison (2012), from the mid-20th century onwards, the composition of scholarly publishing began to shift: non-profit university and society publishers started to lose ground, while commercial publishers entered an era of rapid development. Unlike for-profit publishers, with their more robust survivability in a commercialized market, non-profit publishers, especially the smaller ones, face various internal and external challenges: “market consolidation”, “aggressive pricing”, “flat library budgets”, “migration to online distribution”, “structural constraints”, “undercapitalization”, etc., as Crow lists. We emphasize the “smaller ones” because “the vast majority of society and non-profit publishers run independent and very small journal publishing operations”. Thus, the limitations mentioned above describe the plight of most society and non-profit publishers, especially small publishers who prioritize academic quality or social needs to the extent that they cannot balance profit with their non-profit mission.
Several scholars have proposed remedies. For example, Crow (2006) suggests publishing cooperatives, which would allow small non-profit publishers “to remain independent while operating collectively to overcome both structural and strategic disadvantages”. Another solution is offered by Edgar & Willinsky (2010) and Morrison (2012): open access. Many studies have shown that open access could bring “growth rates in new titles, participation rates from developing countries, and extremely low operating budgets”, and maximize “access to research and scholarship, as an alternative to traditional scholarly society and commercial publishing routes”.
Although the transition from offline to online open access publishing requires human and material investment, it is an increasingly attractive option in the context of today’s widespread web presence. In addition to the large commercial publishers who dominate the major publishing markets, the global OA market is “marked by a very long-tail and extensive involvement by very small, often university or society publishers”, as Morrison pointed out in 2018. In Gold Open Access Journals 2011-2015, Crawford (2016) found, through abundant material on the correlation between journal size, charging tendencies and charge amounts, that small journals are less likely to charge. In other words, the larger the journal, the higher the APC. We would like to corroborate these statements with our research on the DOAJ 2021 metadata.
Definitions & Explanation
In this study, we consider DOAJ publishers that released 10 or fewer journals (at the time of sampling) as relatively “small” publishers, and those that released more than 100 journals as “large” publishers. The rest, with 11-100 journals, are grouped as “medium” publishers. These definitions aim only to better distinguish publishers of different sizes within our data scope.
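As a rough sketch of how this grouping can be computed, the snippet below tallies journals per publisher and applies the size cut-offs defined above. The publisher names are toy examples, not the real DOAJ column:

```python
# Sketch: classify publishers by journal count using the size
# definitions above (small: 1-10, medium: 11-100, large: 101+).
# The journal list is a toy example standing in for the cleaned
# publisher-name column of the DOAJ metadata (one entry per journal).
from collections import Counter

def size_group(n_journals: int) -> str:
    """Map a journal count to a size group."""
    if n_journals <= 10:
        return "small"
    if n_journals <= 100:
        return "medium"
    return "large"

journals = ["Press A", "Press A", "Press B", "Press C", "Press C"]
per_publisher = Counter(journals)                       # journals per publisher
groups = {pub: size_group(n) for pub, n in per_publisher.items()}
```

In practice the input would be one row per journal from the cleaned metadata, so the tally directly yields each publisher’s journal count.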
We divide the DOAJ publishers into three primary groups: all publishers’ group, APC charging publishers’ group and NO APC publishers’ group. We use “mixed publishers” to describe publishers that appeared in both APC and NO APC lists. Our research is carried out from three aspects: observation of the three primary groups, observation of the “non-mixed” publishers’ group and observation of the “mixed” publishers’ group.
The data in this project was initially downloaded from DOAJ (Directory of Open Access Journals) metadata (15,691 journals: 4,292 APC journals and 11,399 NO APC journals), then cleaned up by our SKC (Sustaining the Knowledge Commons) team. The clean-up work revolved around correcting misplaced data and creating a modified publisher name column for this exercise. During the work, we realized that creating a consistent publisher name list was challenging. As reported in Some Limitations of DOAJ Metadata for Research Purposes (Zhao, Borges & Morrison, 2021), there were a large number of variations and inconsistencies in publisher names, such as duplicates with differences in punctuation and/or characters (e.g. “Abant İzzet Baysal Üniversitesi” vs. “Abant İzzet Baysal University”), extra spaces at the beginning or end of names (e.g. “Abant İzzet Baysal University” vs. “Abant İzzet Baysal University⎕”), invalid URLs, etc. More details can be found in the open dataset “DOAJ_metadata_2021_01_05_with_SKC_clean_up” (Zhao, Borges & Morrison, 2021).
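A first-pass normalization of publisher names along the lines described above might look like the following sketch. It handles extra spaces and inconsistent accent encodings, but not translated variants such as “Üniversitesi” vs. “University”, which require manual review:

```python
# Sketch: minimal publisher-name normalization. This covers the
# mechanical inconsistencies described above (leading/trailing and
# doubled spaces, mixed Unicode accent encodings); it does NOT merge
# translated name variants, which the SKC team resolved by hand.
import unicodedata

def normalize_publisher(name: str) -> str:
    name = unicodedata.normalize("NFC", name)  # consistent accent encoding
    name = " ".join(name.split())              # trim and collapse whitespace
    return name
```

Applying such a function before counting journals per publisher reduces spurious duplicates in the publisher list.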
This research covers only the journals and publishers listed in DOAJ as of Jan. 5, 2021. There are other fully open access journals and publishers not listed in DOAJ, or previously listed but since de-listed for unknown reasons. We understand that it is challenging to create a precise list of publishers because of the complexity of publishers’ backgrounds (Morrison, 2019). In this study, we concentrate on trends rather than precise details. Furthermore, we focus solely on whether journals or publishers charge an APC, not on how much is charged.
Observation of the three primary groups
First, we separate the DOAJ publishers into three groups: all publishers’ group (Table 1 & Chart 1), APC’s group (Table 2 & Chart 2) and NO APC’s group (Table 3 & Chart 3).
Table 1 – ALL DOAJ publishers’ group (2021)
(Total DOAJ journals’ number: 15,691)
Chart 1 – ALL DOAJ publishers’ group (2021)
Table 2 – DOAJ APC group (2021)
(Total count of DOAJ APC journals: 4,292)
Chart 2 – DOAJ APC group (2021)
Table 3 – DOAJ NO APC group (2021)
(Total count of DOAJ NO APC journals: 11,399)
Chart 3 – DOAJ NO APC group (2021)
Individually, each group shows an evident “long-tail”. In the ALL publishers’ group (see Table 1 & Chart 1), among the 6,804 publishers identified in DOAJ, 1,349 published APC journals and 5,807 published NO APC journals (the numbers do not add up to 6,804 because some are “mixed” publishers). 77% of this group are small publishers with only one journal. Small publishers also occupy the main part of the other two groups (see Tables 2-3 & Charts 2-3): 76% of the APC group and 78% of the NO APC group.
Second, a comparison can be made between the APC and NO APC groups. Although small publishers occupy a similar share of each of the three groups, there is a big difference in their absolute numbers, as illustrated in Table 4 & Chart 4 below:
Table 4 – DOAJ ALL publishers – APC group vs NO APC group (2021)
Chart 4 – DOAJ ALL publishers – APC group vs NO APC group (2021)
In the range of “publishers with 1 journal”, the number of NO APC publishers (4,568) is about 4 times that of APC publishers (1,034); in the 2-10 range, NO APC publishers outnumber APC publishers by about 3 to 1; in the 11-25 range, by about 4 to 1. However, for publishers with 51+ journals, the number of APC publishers equals or exceeds the number of NO APC publishers. In the largest publishers’ range (200+ journals), there are 4 charging publishers and only 1 non-charging publisher.
Thus, without considering the “mixed” publishers’ situation, we conclude that even though both the APC and NO APC groups show a “long-tail” (76% 1-journal publishers in the APC group, 78% in the NO APC group), small DOAJ publishers seem more likely to publish non-charging journals, while large DOAJ publishers seem more likely to publish charging journals. We boldly name this “the smaller, the NO-APCer” trend.
It is also essential to note some exceptions. Some large publishers release more NO APC journals (details in Table 7): Wolters Kluwer Medknow Publications has 46 APC journals and 161 NO APC journals; SpringerOpen has 96 APC journals and 114 NO APC journals; Sciendo has 44 APC journals and 257 NO APC journals. We discuss these further in the following sections.
Observation of the “non-mixed” publishers’ group
By comparing the APC and NO APC groups, we identify 352 duplicated publishers, i.e. 352 “mixed” publishers. To make the results more rigorous, we exclude the “mixed” group and study the remaining publishers. We find that the “long-tail” and “the smaller, the NO-APCer” tendencies are still evident in the “non-mixed” group. Please see Table 5 & Chart 5 below:
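The identification of “mixed” publishers amounts to a set intersection of the APC and NO APC publisher lists, which can be sketched as follows (the publisher names are illustrative placeholders, not drawn from the dataset):

```python
# Sketch: "mixed" publishers are those appearing in BOTH the APC and
# NO APC publisher lists; "non-mixed" publishers appear in only one.
# Toy names; real input would be the cleaned publisher-name column
# split by the journal-level APC flag.
apc_publishers = {"Press A", "Press B", "Press C"}
no_apc_publishers = {"Press B", "Press C", "Press D"}

mixed = apc_publishers & no_apc_publishers       # in both lists
non_mixed_apc = apc_publishers - mixed           # APC-only publishers
non_mixed_no_apc = no_apc_publishers - mixed     # NO APC-only publishers
```

With the real data, `len(mixed)` would give the 352 figure reported above.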
Table 5 – “Non-mixed” publishers – APC group vs NO APC group (2021)
Chart 5 – “Non-mixed” publishers – APC group vs NO APC group (2021)
Among the 6,452 identified “non-mixed” publishers, 82% are small publishers with only 1 journal. Compared with the 77% of 1-journal publishers in the ALL publishers’ group (Table 1 & Chart 1), 82% represents a similar “long-tail”.
“The smaller, the NO-APCer” trend is also evident. Comparing the APC and NO APC shares in this chart, 69% of 1-journal publishers are non-charging, far more than the 13% that charge.
Observation of the “mixed” publishers’ group
We study this group separately because, from the research above, we know that almost all the large DOAJ publishers (100+ journals) are “mixed” (except Hindawi Limited, with 229 journals, a pure APC publisher in our dataset). We are curious about whether the “long-tail” and “the smaller, the NO-APCer” trends also exist in this group.
The first discovery is an explicit “long-tail” because 75% of the “mixed” publishers are small. Please see Chart 6 below:
Then we see a recognizable “the smaller, the NO-APCer” trend. After a comparison between the count of APC journals and the count of NO APC journals published by the same “mixed” publisher, we identify three relations: “number of APC journals = number of NO APC journals”, “number of APC journals > number of NO APC journals” and “number of APC journals < number of NO APC journals”.
We consider the inequality relations (“>” and “<”) between the counts of APC and NO APC journals as “active” tendency indicators, and the equality relation (“=”) as “inactive”. Thus, to highlight the tendency, we exclude all “=” cases and concentrate only on “>” and “<”. This produces Table 6 below:
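This classification step can be sketched in code: keep one (APC count, NO APC count) pair per “mixed” publisher, drop the “=” cases, and record the direction of the inequality. The Sciendo and SpringerOpen counts come from Table 7; “Hypothetical Press” is invented to illustrate the excluded “=” case:

```python
# Sketch: classify each "mixed" publisher by the relation between its
# APC and NO APC journal counts, keeping only "active" > and < cases.
mixed_counts = {
    "Sciendo": (44, 257),           # (APC journals, NO APC journals)
    "SpringerOpen": (96, 114),
    "Hypothetical Press": (5, 5),   # "inactive" "=" relation, excluded
}

active = {
    name: (">" if apc > no_apc else "<")
    for name, (apc, no_apc) in mixed_counts.items()
    if apc != no_apc                # drop "inactive" "=" cases
}
```

Aggregating these relation labels by publisher size group yields a table of the same shape as Table 6.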
Table 6 – DOAJ “mixed” publishers’ trends (2021)
For publishers that release 3 or more journals: apart from the 101-200 group, which publishes more APC journals than NO APC journals, and the 200+ group with an “inactive” “=”, the publishers with smaller journal volumes are biased toward NO APC publication. At this point, we confirm “the smaller, the NO-APCer” trend in the “mixed” publishers’ group.
For further discussion, we investigate the large “mixed” publishers’ group (100+ journals), as shown in Table 7 below:
Table 7 – Investigation of “Mixed” Publishers with 100+ journals (2021)
In this group, 6 publishers publish more APC journals than NO APC journals, while 3 publish more NO APC journals. The differences in journal counts for those 6 publishers can be significant: for example, Wiley (133 charging journals vs 8 non-charging), Taylor & Francis Group (143 vs 21) and SAGE Publishing (151 vs 23). Because of these enormous differences, even though 166 “mixed” publishers publish more non-charging journals, far more than the 92 that release more charging journals, the total count of non-charging journals (2,608) is still very close to the total of charging journals (2,583).
APC trends can also be analyzed in terms of other influencing factors: publisher type, journal subject, country of publication, etc. Researchers can perform more diverse analyses based on more layers of data, just as Crawford (2016) did. In in-progress SKC research, Morrison and the research team (Morrison et al., 2021) investigated APCs by publisher type (government, institute, non-profit, independent, society or institution, university press, commercial, society, university) according to DOAJ data in 2019. They found that universities published the largest number of no-fee journals (7,857, or 75% of the 10,463 no-fee journals in total), with society publishers second (1,414). Commercial publishers stood out for having far more charging journals than no-fee journals (1,575 vs 275). Combined with our study, it can be speculated that most small DOAJ publishers are university or society publishers with a no-fee tendency. This discovery corroborates Morrison’s observations in 2018 (Morrison, 2018b). The charging tendency of commercial publishers likewise coincides with our findings for the large publishers’ group.
In addition, we must admit that if the amount of the APC were included in the scope of the study, the results might change somewhat. Some publishers charge modestly and some ask for very high prices (especially for-profit high prices), and it would be unfair to mix them without careful investigation (Crawford, 2011). For publishers in the charging group, their listing does not mean that their fees are necessarily unreasonable. Therefore, it is necessary to emphasize that our study concentrates on the rough trends of charging/not charging, with publisher size as the division.
For a more in-depth discussion, we add a longitudinal perspective, focusing on two contrasting groups: large and small publishers. From our study, we know that almost all the large DOAJ publishers (100+ journals) are “mixed”, and most of them are commercial publishers, including the four largest traditional commercial publishers (Elsevier, SpringerNature, which includes SpringerOpen and BMC, Taylor & Francis, and Wiley). Based on SKC research (Morrison, 2018a), Elsevier, the world’s largest scholarly publisher, was “mixed” in 2017, with a large number of non-charging journals. But despite its strategies, Elsevier lost many non-charging journals produced in partnership with societies and universities in 2018. Now, as our study shows, it may have even fewer non-charging journals.
From this point, we speculate that some publishers can sustain relatively more non-charging publications because they receive support from universities and governments. For example, the large “mixed” publisher Sciendo has many more non-charging than charging journals in our data. According to Pashaei & Morrison (2019), Sciendo added more than 300 OA journals in 2019, most of which were “published through collaboration with different universities and academic societies and institutions in Europe”. A recent study of OA diamond journals also confirmed that the economy of these journals “largely depends on volunteers, universities and government” (Bosman, Frantsvåg, Kramer, Langlais & Proudman, 2021).
If even large publishers with relatively solid financial resources may lose journals due to financial or other problems, we can imagine the more difficult situation of small publishers, especially those with only 1 journal. Perhaps small non-profit publishers with limited financial resources should explore the possibilities of no-fee OA models, instead of going with the flow and simply raising prices in the for-profit competition. It is important to maintain operations while guarding the freedom and fairness of academic publishing. Given the current situation, we need patience to establish a healthy, competitive publishing environment.
Not only do small publishers need to figure out how to grow in the long run and attract more authors and readers; authors and readers can also reach out to small publishers and discover their value. Here lies another purpose of our study: to call attention to small publishers and encourage interaction between authors, readers, funding sources and small publishers. There are often misunderstandings between these groups that are harmful to the OA movement, as discussed by Peter Suber in an interview (Hulagabali & Suber, 2019). For example, the beliefs that “most OA journals charge APCs” and “most OA journals are low in quality” are widespread but not true. Our study helps to dispel these misunderstandings by demonstrating that many journals are entirely free and that many exist for academic purposes: just because they are small does not mean they are not of high quality.
In addition to the issues above, small publishers also face other challenges. For example, in terms of longevity of data preservation, small publishers are more likely to lose long-term access (Crawford, 2011, p. 32). On this point, DOAJ published an article in 2020 announcing that they would collaborate with the CLOCKSS Archive, Internet Archive, Keepers Registry/ISSN International Centre and Public Knowledge Project (PKP) to improve the preservation of small OA journals.
From the three observations above, we can conclude that whether the “mixed” publishers are included in or excluded from our research scope, the “long-tail” and “the smaller, the NO-APCer” trends are always evident. Small non-profit publishers, given their large numbers, need to look for various breakthroughs if they want to survive and grow.
Crow, 2006, “The market context for society publishers”.
As indicated in The OA Diamond Journals Study. Part 1: Findings (Bosman, J., Frantsvåg, J. E., Kramer, B., Langlais, P.-C. & Proudman, V., 2021, March 9, http://doi.org/10.5281/zenodo.4558704, p. 8), “OA diamond journals” are “journals that publish without charging authors and readers, in contrast to APC Gold OA or subscription journals”.
According to one of the most consulted global university rankings services, the QS World University Rankings 2022, the University of Toronto is the top-ranked university in Canada. It shouldn’t take more than a brief pause to reflect on this statement to see the fiction in what is presented as objective empirical information (pseudoscience). In the real world, it is mid-June 2021. The empirical “facts” on which QS is based are still in progress, in a pandemic year of considerable uncertainty. It is not possible to complete data on 2021 until the year is over. Meanwhile, QS is already reporting stats for 2022; perhaps they are psychic?
Scratching slightly at the surface, anyone with even a little familiarity with universities in Canada is probably aware that the University of Toronto is currently under a rare Censure due to a “serious breach of the principles of academic freedom” in a hiring decision. Censure is a “rarely invoked sanction in which academic staff in Canada and internationally are asked to not accept appointments, speaking engagements or distinctions or honours at the University of Toronto, until satisfactory changes are made”. I don’t know the details of the QS algorithms, but I think it’s fair to speculate that neither support for academic freedom nor a university’s ability to attract top faculty for appointments, speeches, distinctions or honours is factored in, or if factored in, weighted appropriately.
Digging just a little deeper, someone with a modicum of understanding of the university system in Canada, and Ontario in particular, would know that the University of Toronto is one of Ontario’s 23 public universities, all of which have programs approved and regularly reviewed for quality by the same government, are funded under the same formulae, and provide the same economic support for students. Degrees at a particular level are considered equivalent locally, and courses are often transferable between institutions. When not under censure, the University of Toronto is indeed a high-quality university; so is the University of Ottawa, where I work, Carleton (the other Ottawa-based university), and all the other Ontario universities. Specific programs frequently undergo additional accreditation. My department offers a Master’s of Information Studies program that is accredited by the American Library Association (ALA). Both the Ontario government and ALA require actual data in their QA / accreditation processes. This includes evidence of strategic planning, but not guesswork about future output.
If QS is this far off base in its assessment of universities in the largest province of a G7 country (the epitome of the Global North), how accurate are QS and other global university rankings in the Global South? According to Stack (2021) and the authors of the newly released book Global University Rankings and the Politics of Knowledge (http://hdl.handle.net/2429/78483), global university rankings such as QS and THE, and the push for the Global South to develop globally competitive “world class universities”, are more about reproducing colonial relations, marketizing higher education and commercializing research than about assuring high-quality education. The attention paid to such rankings distracts universities and even countries from what matters locally. As Chou points out, the focus on rankings leads scholars in Taiwan to publish in English rather than Mandarin, although Mandarin is the local language. A focus on publishing in international, English-language journals creates a disincentive to conduct research of local importance almost everywhere.
My chapter in this work focuses on the intersection of the critique of metrics-based evaluation of research and how this feeds into the university rankings system. The first part of the chapter, Dysfunction in knowledge creation and moving beyond, provides a brief history and context of bibliometrics, the development of traditional and new metrics-based approaches, and the major critique and advocacy efforts to change practice (the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto). The unique contribution of this chapter is its critique of the belief underlying both traditional and alternative metrics-based approaches to assessing research and researchers: the assumption that impact is good and an indicator of quality research, and that it therefore makes sense to measure impact, with the only question being whether particular technical measures of impact are accurate. For example, if impact is necessarily good, then the retracted study by Wakefield et al. that falsely correlated vaccination with autism is good research by any metric: many academic citations both before and after publication, citations in popular and social media, and arguably a factor in the real-world impact of the anti-vaccination movement, the subsequent return of preventable illnesses like measles, and the challenge of fighting COVID through vaccination. An alternative approach is suggested, using the University of Ottawa’s collective agreement with APUO (the union of full-time professors) as a means of evaluation that considers many different types of publications, and considers quantity of publication in a way that gives evaluators the flexibility to take into account the kind of research and research output.
by: Heather Morrison, Luan Borges, Xuan Zhao, Tanoh Laurent Kakou & Amit Nataraj Shanbhoug
This study examines trends in open access article processing charges (APCs) from 2011 to 2021, building on a 2011 study by Solomon & Björk (2012). Two methods are employed: a modified replica and a status update of the 2011 journals. Data is drawn from multiple sources, and the datasets are available as open data (Morrison et al., 2021). Most journals do not charge APCs; this has not changed. The global average per-journal APC increased slightly, from 906 USD to 958 USD, while the per-article average increased from 904 USD to 1,626 USD, indicating that authors choose to publish in more expensive journals. Publisher size, type, impact metrics and subject affect charging tendencies, average APCs and pricing trends. About half the journals from the 2011 sample are no longer listed in DOAJ in 2021, due to ceased publication or publisher de-listing. Conclusions include a caution about the potential of the APC model to increase costs beyond inflation, and a suggestion that support for the university sector, responsible for the majority of journals and nearly half the articles, with a tendency not to charge and very low average APCs, may be the most promising approach to achieving economically sustainable no-fee OA journal publishing.
The Directory of Open Access Journals (DOAJ, http://doaj.org/) is an essential worldwide open access service (16,134 journals listed as of March 29, 2021) that promotes quality, peer-reviewed open access journals. Journals included in it gain higher and broader visibility. To make the most of this service, journal editors need to pay attention to the accuracy of their entries in the DOAJ metadata (journal title, publisher information, location information, subject, language, URLs, etc.). This post aims to explain the benefits for journals of improving the quality of metadata, and what journal editors can do.
For journals, what are the benefits of improving the DOAJ metadata?
As detailed on the DOAJ website (DOAJ, https://doaj.org/apply/why-index/), there are five benefits for journals indexed in DOAJ, and accordingly, five reasons to improve the metadata:
“Reputation and prominence”
“DOAJ is the most important community-driven, open access service in the world and has a reputation for advocating best practices and standards in open access. By indexing your journal in DOAJ, its reputation and prominence will be enhanced.”
We assume that journals with accurate and precise entries give a serious and active impression, helping them maintain this reputation.
“Standards and best practice”
“DOAJ’s basic criteria for inclusion have become the accepted way of measuring an open access journal’s adherence to standards in scholarly publishing. We can help you adopt a range of ethical and quality standards, making your journals more attractive publishing channels. DOAJ is committed to combatting questionable publishers and questionable publishing practices, helping to protect researchers from becoming trapped by unethical journals.”
For open access journals listed in a quality-standards system like DOAJ, it is important to make sure their information is correct, so that they are clearly distinguished from questionable journals.
“Funding and compliance”
“Open access publication funds often require that authors who want funding must publish in journals that are included in DOAJ. Indexing in DOAJ makes your journals compliant with many initiatives and programmes around the world, for example Plan S in Europe or Capes/Qualis in Brazil.”
With correct entries in metadata, the DOAJ journals can be more easily discovered by foundations, related programmes and organizations.
“Discoverability and visibility”
“DOAJ metadata is free for anyone to collect and use, which means it is easily incorporated into search engines and discovery services. It is then propagated across the internet. If you provide us with article metadata for your journal, this will be supplied to all the major aggregators and the many research organisations and university library portals who use our widgets, RSS feeds, API and other services. Indexing your journal in DOAJ is likely to increase traffic to your website and give greater exposure to your published content. Levels of traffic to a journal website typically increase threefold after inclusion in DOAJ. Your journal’s visibility in search engines, such as Google, will improve.”
Indexing journals in DOAJ means they are more easily discovered and cited by other researchers. Correcting metadata will help raise the chances that people working in the same area will find the relevant research they need.
“Our database includes more open access journals from a diverse list of countries than any of the other major indexing services. We have a global editorial team via a network of Managing Editors, Ambassadors and volunteers, so we will do our best to offer local support in your language. We promise you that information about your journal will be seen around the world.”
DOAJ journals are aimed at readers all over the world and may be seen by people who are not proficient in the journal’s language. Journal editors therefore need to ensure the correctness of data entry so that readers can read with confidence.
What’s more, a higher-quality database is more valuable for researchers and promotes the entire OA ecosystem, especially for services like university libraries, which tend to keep up with the latest content and take advantage of metadata corrections.
In brief, keeping the entries of DOAJ metadata correct reinforces the advantages for journals mentioned above and benefits the users of DOAJ.
As journal editors, what can we do?
As demonstrated in an SKC study (Zhao, Borges & Morrison, 2021), “as of January 5, 2021, only 30% of DOAJ journals have a ‘last update’ date within the previous year (2020)”, meaning that only 30% of DOAJ journals fully or partially updated their information in the DOAJ system. To make the best use of DOAJ, journal editors should regularly check their entries to ensure that their data is correct and up to date. For example, if journal URLs are not kept up to date, an incorrect URL means, at best, that the journal cannot be found. Crawford (2016), in a study of DOAJ journals, found journals that were flagged as malware (or as containing malware) by Malwarebytes, Windows Defender, McAfee Site Advisor or Office 2013.
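The 30% figure is the kind of statistic that can be recomputed from a metadata download by filtering on the “Last updated” field; a minimal sketch, with invented dates and assuming a January 2021 snapshot:

```python
# Sketch: share of journals whose "Last updated" date falls in the
# year before the snapshot. The dates are invented; the real DOAJ
# field is an ISO timestamp that would need parsing first.
from datetime import date

snapshot_year = 2021
last_updated = [
    date(2020, 6, 1),    # updated in the previous year
    date(2018, 3, 15),
    date(2019, 11, 2),
    date(2020, 1, 4),    # updated in the previous year
]

recent = [d for d in last_updated if d.year == snapshot_year - 1]
share = len(recent) / len(last_updated)   # 2 of 4 in this toy sample
```

Run against the full metadata dump, this filter reproduces the kind of “updated within the previous year” percentage quoted above.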
Most of the visible inconsistencies in the metadata are input errors or location errors (listed below). Most of the input errors are “small differences in punctuation and/or characters, extra spaces at the beginning and/or at the end”, as reported by SKC (Zhao, Borges & Morrison, 2021). Combined with the findings of Crawford (2016), we list the data to be modified by categories as follows:
Input error or location error in:
wrong column, journal title, special character, keywords, copyright information URL, plagiarism information URL, URL for journal’s instructions for authors, other submission fees information URL, preservation services, preservation service: national library, preservation information URL, deposit policy directory, persistent article identifiers, URL for journal’s open access statement, etc.
Publisher name duplicates:
Extra or missing spaces, minor details (e.g. a non-English character in one but not the other), minor differences in punctuation and/or characters (e.g. “Abant İzzet Baysal Üniversitesi” vs. “Abant İzzet Baysal University”), an abbreviation in one but not the other (e.g. “Asociación Interuniversitaria de Investigación Pedagogía” vs. “Asociación Interuniversitaria de Investigacion Pedagogica (AIDIPE)”), etc.
“APC-charging journals that don’t clearly state the amount charged” (Crawford, 2016)
Sometimes it is hard to indicate “who is the publisher”. We list some situations below:
When there are branch publishers under one publisher, and all of them are recorded in DOAJ, especially when their journals’ websites do not give any clear indication;
When a publisher has more than one active name (perhaps due to different sponsors, or the nature of commercial publishers), but their journals’ websites do not give any clear indication;
When journals changed their websites but didn’t renew the URLs in the DOAJ database;
Unmatched publisher name/journal name and URLs.
DOAJ also provides article-level search and is working to encourage more journals to provide article-level metadata. It makes both the journal-level and article-level metadata available for anyone to download (DOAJ, https://doaj.org/docs/public-data-dump/). Thus, it would be better if journal editors could also ensure the correctness of the article-level information.
The Directory of Open Access Journals (http://doaj.org) is an excellent service that fulfills many important functions, in particular facilitating access to a vetted collection of over 15,000 freely available peer-reviewed journals. The DOAJ search services and metadata download are very useful for researchers as well. The purpose of this post is to alert researchers to some limitations of the DOAJ metadata that they need to take into account to avoid drawing erroneous conclusions.

First, when downloading DOAJ metadata, it is necessary to open the .csv file in Unicode in order to retain non-English characters. For this reason, we open the file in OpenOffice, then save it as an Excel file. The nature of the metadata means that some data is inserted in the wrong column; clean-up, as discussed below, is necessary before data analysis. When journal editors or others working on their behalf enter metadata into DOAJ, research is not the primary purpose of the exercise; for this reason, in-depth assessment and corrections may be necessary before analysis. Below, we present publisher size analysis as an example of what researchers may encounter.

Finally, because the main purpose of DOAJ is connecting readers with content, the metadata of interest to a particular research project may not be up to date. As demonstrated below, as of Jan. 5, 2021, only 30% of DOAJ journals have a “last update” date within the previous year (2020). We do not know whether the “last update” date reflects a full or partial metadata review. We illustrate the potential impact on research results with the example of the SKC longitudinal APC study. Of the 4,292 DOAJ journals that responded “yes” to the APC question, only 30% have a last update date of 2020 or 2021. Even for this 30% of journals, we have no way of knowing whether the APC status and/or amount per se was updated, or only other, unrelated metadata.
This means that if we compare 2019 prices obtained from publisher websites in 2019 with 2021 DOAJ APC metadata, we will almost certainly get incorrect results, for example by falsely assuming that matching APC amounts mean no change in prices. DOAJ provides rich and useful metadata for the researcher, and the research question “is this journal listed in DOAJ?” is of value in and of itself. For this reason, we intend to continue using DOAJ metadata in addition to data derived from other sources, particularly data derived directly from publisher websites. See below for a link to an open data version of the DOAJ metadata reflecting the corrections explained in this post.
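The Unicode step mentioned above can also be scripted rather than done by hand; the following is a minimal sketch (stdlib csv module, hypothetical sample rows and filename) showing that an explicit UTF-8 encoding preserves non-English characters on a round trip:

```python
import csv

# Hypothetical sample rows in the same shape as the DOAJ dump.
sample_rows = [
    ["Journal title", "Publisher", "Country"],
    ["Örnek Dergi", "Abant İzzet Baysal Üniversitesi", "Turkey"],
]

# Write and re-read with an explicit UTF-8 encoding; without it, some
# platforms default to a legacy encoding and mangle non-English characters.
with open("doaj_sample.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(sample_rows)

with open("doaj_sample.csv", "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows[1][1])  # Abant İzzet Baysal Üniversitesi
```

The same `encoding="utf-8"` argument applies when loading the real dump, whatever tool is used downstream.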
Correcting for displaced observations
As previously mentioned, the first step to confidently use the DOAJ metadata for analysis and research is identifying and correcting data inserted in the wrong column, herein also called displaced observations.
Below we can see an example of a displaced observation from the DOAJ metadata. Column BB has no assigned variable while containing some observations, apparently displaced one column to the right.
Users may follow different steps to correct for displaced data. Here we explain in more detail how we have identified these displacements and corrected them.
Before proceeding with any analysis, it is important to become familiar with the DOAJ metadata first. We recommend that users read the DOAJ Guide to applying, available online, because the metadata reflects responses to questions asked in the application process. The DOAJ metadata, as of 5 Jan. 2021, contains 53 variables ranging from Journal Title to Country to Most recent article added. It may be helpful to start correcting observations from variables with easily identifiable responses, such as « Country » or « Country of Publisher », or variables that allow only two types of answers (i.e. Yes or No), such as Author holds copyright without restrictions and APC. We recommend creating a pivot table to identify displaced observations, repeating this process until no observations remain in a wrong column.
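The pivot-table check can be approximated in a few lines of code; here is a sketch, with hypothetical sample values, that tallies a Yes/No variable so displaced observations stand out:

```python
from collections import Counter

# Hypothetical sample of the "APC" column; valid values are "Yes"/"No".
apc_column = ["Yes", "No", "No", "Germany", "Yes", "No"]

counts = Counter(apc_column)
print(counts)  # Counter({'No': 3, 'Yes': 2, 'Germany': 1})

# Anything other than Yes/No flags a row whose data was likely displaced.
suspect_values = [v for v in counts if v not in {"Yes", "No"}]
print(suspect_values)  # ['Germany']
```

A country name appearing in the APC tally is exactly the kind of signal the pivot table surfaces.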
When cleaning up the DOAJ metadata, users will notice that in some cases only one observation was displaced; in other cases, an entire row was displaced beginning at a specific variable. In the example highlighted in yellow below, all observations beginning at the variable Publisher were displaced one column to the right.
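A row displaced one column to the right from a given variable can be corrected mechanically; the following is a minimal sketch with a hypothetical column layout and row:

```python
# Hypothetical subset of the DOAJ columns, in order.
columns = ["Journal title", "Journal URL", "Publisher", "Country", "APC"]

# In this row, every value from "Publisher" onward slipped one cell to the
# right, leaving a stray empty cell where "Publisher" should be.
row = ["Acta Hypothetica", "https://example.org", "", "Example Press", "Brazil"]

def shift_left(row, columns, start_col):
    """Shift values back one column, from start_col to the end of the row."""
    i = columns.index(start_col)
    return row[:i] + row[i + 1:] + [""]  # drop the stray cell, pad the end

print(shift_left(row, columns, "Publisher"))
# ['Acta Hypothetica', 'https://example.org', 'Example Press', 'Brazil', '']
```

In practice the stray cell should be inspected before dropping it, since occasionally it holds a real value that belongs elsewhere.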
Data entry inconsistencies
When correcting for displaced observations, we also identified some inconsistencies in the way observations are registered in the DOAJ metadata. The table below lists the main visible inconsistencies found for some variables. In the majority of instances, the inconsistencies will not affect DOAJ users looking up information for a particular journal. However, it is important to take these inconsistencies into account before proceeding to any automated statistical analysis. For example, the DOAJ metadata as is can be used to identify the number of journals with persistent article identifiers, but automated counting of DOI vs. ARK or other approaches would require some data manipulation in advance.
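As an illustration of the data manipulation mentioned above, the following sketch maps free-text identifier registrations to a canonical type before counting; the mapping rules are ours and purely illustrative, not part of DOAJ:

```python
import re

def identifier_type(value):
    """Map a free-text persistent-identifier entry to a canonical label."""
    v = value.strip().lower()
    if "doi" in v:
        return "DOI"
    if re.match(r"udc\b", v):  # matches "UDC" and "UDC (Universal ...)"
        return "UDC"
    if "ark" in v:
        return "ARK"
    if "nbn" in v or "national-bibliography-number" in v:
        return "NBN"
    return "other"  # e.g. repository-specific values, left for manual review

print(identifier_type("UDC (Universal decimal Classification)"))  # UDC
print(identifier_type("NBN http://www.depositolegale.it/national-bibliography-number/"))  # NBN
```

Only after such normalization do automated counts of DOI vs. ARK vs. UDC become meaningful.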
Some journals’ alternative titles are registered as a number. Examples include “2300-6633” and “0”.
Some observations contain stray numbering or special characters, for example: “6. rheology, tribology, hydrodynamics, thermodynamics, mechanics of structures, mechatronics. water cycles, water environment, water treatment and reuse, water resource, water quality, hydrology” and “• natural sciences, • environmental sciences, • social sciences, agricultural sciences, veterinary medicine, medical sciences”.
Copyright information URL
Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning and an « l » at the end of the link: ttp://www.emeraldgrouppublishing.com/services/publishing/jiuc/authors.htm
Plagiarism information URL
Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning and an « l » at the end of the link. ttp://www.emeraldgrouppublishing.com/services/publishing/jiuc/authors.htm
URL for journal’s instructions for authors
Some URLs lack a letter « h » at the beginning or the end. The example below illustrates this small error. There should be an « h » at the beginning of the URL: ttps://revistas.unasp.edu.br/LifestyleJournal/about/submissions
Other submission fees information URL
Some URLs have extra letters; the first example below has an extra letter « i » at the beginning of the URL. Others lack a letter « h » at the beginning, as in the second and third examples:
ihttps://journals.univie.ac.at/index.php/voebm/m/index
ttp://psr.ui.ac.id/index.php/journal/about/submissions#authorGuidelines
ttps://www.karger.com/Journal/Guidelines/261897#sec62
Preservation services
Preservation services can be registered as a name or a website
Preservation Service: national library
Preservation services – national library can be registered as a name or a website
Preservation information URL
Some URLs lack one or two letters at the beginning. In the examples below, « ht » is missing from the first URL and « h » from the second:
tps://periodicos.uff.br/revistagenero/about/editorialPolicies#focusAndScope
ttp://ejournal.stkip-pgri-sumbar.ac.id/index.php/economica
Deposit policy directory
Deposit policy directory can be registered as a name or a website
Persistent article identifiers
Persistent article identifiers can be registered as an acronym (UDC, DOI, ARK), but also as a website, such as dc.identifier.uri (DSpaceUnipr) or NBN http://www.depositolegale.it/national-bibliography-number/. Another example is the occurrence of both UDC and UDC (Universal decimal Classification), which are equivalent but were registered differently
URL for journal’s Open Access statement
Some URLs lack a letter « h » at the beginning or at the end, or they have an extra h at the beginning of the URL. The example below has an extra letter « h » at the beginning of the URL. hhttp://www.revistas.usp.br/gestaodeprojetos/about
Table 3 – Visible inconsistencies identified in the DOAJ metadata
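Several of the URL inconsistencies above follow a small number of patterns (a stray leading character, a missing « h » or « ht » at the start). Here is a hedged sketch of rule-based repairs; the rules are ours, not DOAJ’s, and truncated endings such as a missing final « l » still need manual checking:

```python
import re

def repair_url(url):
    """Apply conservative fixes for the truncation patterns listed above."""
    url = url.strip()
    m = re.search(r"https?://", url)
    if m:  # a full scheme is present; drop any stray leading characters
        return url[m.start():]
    if url.startswith(("ttp://", "ttps://")):  # missing « h »
        return "h" + url
    if url.startswith(("tp://", "tps://")):    # missing « ht »
        return "ht" + url
    return url  # anything else is left for manual review

print(repair_url("ihttps://journals.univie.ac.at/index.php/voebm/m/index"))
# https://journals.univie.ac.at/index.php/voebm/m/index
print(repair_url("hhttp://www.revistas.usp.br/gestaodeprojetos/about"))
# http://www.revistas.usp.br/gestaodeprojetos/about
```

Each repaired URL should still be spot-checked in a browser, since a malformed entry can hide a deeper error such as a wrong domain.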
Publisher’s names duplicates investigation and clean-up
The purpose of this project is to develop a rough picture of publisher size to compare with Solomon & Björk’s findings (2012). In order to better perform publisher size analysis, we specifically investigated the publisher duplicates and corrected most of the obvious errors, such as small differences in punctuation and/or characters, extra spaces at the beginning and/or at the end, and minor differences in entering the publisher name when it is the same, etc. (Please see examples in Table 4 – Investigative Strategies – Publisher Names Duplicates).
The process of clean-up was divided into three stages. Firstly, we created a pivot table for the publisher column to identify the entries in rows which were slightly different but weren’t gathered. Secondly, when potential duplicates were found, we conducted an investigation to confirm duplicates and/or to decide which name to keep (in priority order: use the name with the most journal entries; correct name with obvious typo; use the first name listed). Please see the investigative strategies below:
Thirdly, after identifying inconsistencies in publisher names, we created a table (please see Table 5 – Corrections Gathering – Publisher Names Duplicates) to register all the corrections to the variable Publisher. About 500 inconsistencies were corrected. As a result, the number of publishers in the pivot table decreased from 7218 entries (data source: pivot table based on DOAJ metadata) to 6804 entries (data source: pivot table based on the cleaned-up version of the database).
As illustrated in the two tables above, there were different types of data inconsistencies. In order to respect the metadata to the greatest extent, we acted prudently when making decisions. In some cases of minor variation, we clicked on the URLs to check publisher websites and to collect convincing evidence. However, we encountered some intricate challenges.
One of the challenges was language. Given the large number and wide range of publishers (124 countries, 80 languages; DOAJ, 7 Feb. 2021 [https://doaj.org/]), we were unable to identify all of the sources of information. Besides, when there were invalid URLs or unmatched information, it was difficult to achieve any precision. What’s more, among the 7218 entries of publisher names, some of the potential duplicates were not gathered together because of their different beginning words, for example “Editora da Universidade Estadual de Maringá (Eduem)” vs. “Eduem – Editora da Universidade Estadual de Maringá” and “Academica Brâncuşi” vs. “Editura Academica Brâncuşi”. Such pairs usually sort far apart and are hard to detect. More details can be found in Table 6 below:
Different beginning words (examples)
“Academica Brâncuşi” vs. “Editura Academica Brâncuşi”; “Alexandru Ioan Cuza University of Iaşi” vs. “Editura Universităţii ‘Alexandru Ioan Cuza’ Iaşi”; “Editora da Universidade Estadual de Maringá (Eduem)” vs. “Eduem – Editora da Universidade Estadual de Maringá”
Alborz University of Medical Sciences (the URL wrongly directs to a website whose contents are meaningless; when we searched the journal title, we were directed to this website: https://enterpathog.abzums.ac.ir/)
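Pairs with different beginning words sort far apart in a pivot table, but they share almost all of their words. The following is a heuristic sketch (ours, not part of the SKC workflow) that compares names as accent-insensitive token sets:

```python
import re
import unicodedata

def tokens(name):
    """Accent- and case-insensitive set of alphanumeric words."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return set(re.findall(r"[a-z0-9]+", no_accents.lower()))

def likely_same_publisher(a, b):
    ta, tb = tokens(a), tokens(b)
    return ta <= tb or tb <= ta  # one name's words all appear in the other

print(likely_same_publisher("Academica Brâncuşi", "Editura Academica Brâncuşi"))  # True
print(likely_same_publisher(
    "Editora da Universidade Estadual de Maringá (Eduem)",
    "Eduem – Editora da Universidade Estadual de Maringá"))  # True
```

The subset test catches reordered and prefixed names but not translations (e.g. “University” vs. “Universităţii”), so manual review remains necessary.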
Given the barriers and challenges mentioned above, we can draw conclusions about the limitations of the publisher names clean-up project. Precision is not possible in this project because the question “who is the publisher” is complex. Instead of making any definitive claims about publisher size, we are primarily interested in whether the long tail effect (a few big publishers, a few more middle-sized, most very small) reported by Solomon & Björk (2012) can still be observed in DOAJ in 2021.
DOAJ metadata update analysis
The following analysis was conducted to determine whether DOAJ metadata on article processing charges (APCs) – charging status and amount – would be sufficient for SKC’s longitudinal study on APC trends over time. The answer is clearly no. The metadata for the vast majority of journals in DOAJ (overall and APC-charging) has not been updated for more than a year, and it is unknown whether the most recent update included an update to APC or other metadata. We will continue to use DOAJ metadata, as it is rich and the question “is this journal listed in DOAJ?” is of value in and of itself; however, for price comparisons we cannot rely on this data, as it would likely lead to erroneous conclusions.
DOAJ journals by year of last update.
This chart illustrates the percentage of DOAJ journals by year of last update. Detailed figures are in the table below. Note that just under half the journals were last updated 2 or more years ago (2018 or earlier).
DOAJ last update as of Jan. 5, 2021
# journals last updated
% journals last updated
DOAJ APC charging journals by year of last update
The chart above illustrates the percentage of journals that answered “yes” to the DOAJ question about charging APCs, by year of last update. The table below provides the detailed figures. Note that only 30% of DOAJ journals that charge APCs were updated in the past year (2020 or 2021). It is also unknown whether, in these cases, the last update was a thorough review of the metadata or only an update of non-APC data.
DOAJ last update APC journals only Jan. 5, 2021
Year of last update
# of journals last updated
% journals last updated
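The figures in the tables above are straightforward to derive from the “last updated” field; here is a sketch with a hypothetical handful of dates (the real file has over 15,000 rows):

```python
from collections import Counter

# Hypothetical "last updated" dates from the DOAJ dump (ISO format assumed).
last_updated = ["2020-06-14", "2018-03-02", "2021-01-04", "2019-11-20", "2020-09-30"]

by_year = Counter(date[:4] for date in last_updated)  # group by year
total = len(last_updated)
for year in sorted(by_year):
    share = 100 * by_year[year] / total
    print(f"{year}: {by_year[year]} journal(s) last updated ({share:.0f}%)")
```

The same tally, restricted to journals answering “yes” to the APC question, yields the APC-only table.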
A version of the Jan. 5, 2021 DOAJ metadata file reflecting the corrections explained in this post is available as open data here:
Directory of Open Access Journals; Zhao, Xuan; Borges, Luan; Morrison, Heather, 2021, “DOAJ_metadata_2021_01_05_with_SKC_clean_up”, https://doi.org/10.5683/SP2/G5LEXG, Scholars Portal Dataverse, V1
Solomon, D. J., & Björk, B.-C. (2012). A study of open access journals using article processing charges. Journal of the American Society for Information Science and Technology, 63(8), 1485–1495. https://doi.org/10.1002/asi.22673
The goal of this literature review was to gain an understanding of the current status of research on the topic of digital blog preservation. After conducting a series of searches within the database LISTA (Library, Information Science, and Technology Abstracts), one can determine that there have been few recent developments in technology or research specifically for the access and preservation of digital blog posts.
Unsurprisingly, much of the scholarly conversation about blog/microblog preservation took place between 2002 and 2010.
Thoughts on Blog Preservation
Despite the varying opinions that blogs are either easier or more difficult to preserve than other digital communications, scholars agree that blogs and microblogs have unique qualities that deserve scholarly discussion.
According to Patsy Baudoin (2008), many blogging websites utilize software that automatically preserves the sequencing of posts. This innate quality of the software supports the archival principles of “original order” and “provenance”. However intelligent the blogging software appears to be, blogs and other user-generated content are especially vulnerable to link rot (Banks, 2010).
Blogs can become complex to preserve because they may contain various file formats, media, or have several owners (Baudoin, 2008). To add to this sentiment, Grimard (2005) states that the variety of formats adds to the “opaqueness” of digital records (opaqueness referring to the unnatural structure of electronic information that is only computer-readable).
To maintain the integrity of the blog during the preservation process, the digital archivist would have to consider preserving the additional external links within the original blog post. Furthermore, copyright can be an issue in certain blog preservation circumstances, as there have been several cases brought to the US Supreme Court (Chen, 2005).
Open-source technological advancements in blog preservation have been disappointing at best. According to Caroline Young (2013), several programs for blog preservation have essentially failed soon after conception.
Some examples are PANDORA by the National Library of Australia, and ArchivePress by the University of London Computer Centre and the British Library Digital Preservation department. Young also mentions BlogForever, a blog preservation system that was still in development in 2013. It now seems to be available for use and claims to be a new system to harvest, preserve, manage and reuse blog content.
Young (2013), Banks (2010), Rosenthal (2016), and Chen (2010) all highlight the impact made by the introduction of the Internet Archive’s Wayback Machine. The Wayback Machine has improved the landscape of digital preservation of grey literature like blog posts; however, it is not without its challenges. Much like other archiving software, it has difficulty with images and audio files.
Solutions to the Preservation Problem
Though it is an older article, Grimard (2005) offers some solutions to digital preservation that are still relevant. One important recommendation is to standardize the format of the information; this recommendation is echoed by Young (2013). Both authors emphasize the importance of converting files to the most usable format: since file formats are simply sets of conventions that software developers can change and alter, they may become obsolete. Young describes the universal XML format as hierarchical and logically organized.
LOCKSS is blog preservation software mentioned in both Leroy (2018) and Rosenthal (2016). It is open-source software designed with libraries in mind, and it claims to preserve animations, data sets, images, audio, and text content.
The scholarly conversation on the preservation and conservation of blog content has slowed in the past decade. This could be because the options currently available are adequate for the needs of blog preservation.
Blogs and microblogs comprise various formats that can contribute to the challenges of digital preservation. According to research from the early 2010s, the images, animations, and audio files that blogs usually contain are difficult to preserve with the Wayback Machine. This may have improved in more recent years.
There are also preservation software options, like LOCKSS and BlogForever, that seem to be more targeted toward archiving blog content than the Wayback Machine is.
Farace, D., & Schöpfel, J. (Eds.). (2010). Chapter 14. Blog Posts and Tweets: The Next Frontier for Grey Literature. In Grey Literature in Library and Information Studies (pp. 217–226). K. G. Saur. https://doi.org/10.1515/9783598441493.2.217
Grimard, J. (2005). Managing the Long-term Preservation of Electronic Archives or Preserving the Medium and the Message. Archivaria, 153–167.
While many aspects of our lives and activities have slowed down during the COVID pandemic, this has not been the case with open access! The OA initiatives tracked through this series continue to show strong growth on an annual and quarterly basis. Important milestones are being reached, and others will be coming soon.
The Directory of Open Access Journals now lists over 15,000 fully open access, peer reviewed journals, having added 379 journals (> 4 per day) in the past quarter, and now provides searching for over 5 million articles at the article level.
A PubMed search for “cancer” limited to literature from the past 5 years now links to full-text for over 50% of the articles.
Anyone worried about running out of cultural materials during the pandemic will be relieved to note that the Internet Archive has exceeded a milestone of 6 million movies in addition to over 27 million texts (plus audio, concerts, TV, collections, webpages, and software).
Analysis of quarterly and annual growth for 39 indicators from 10 services reflecting open access publishing and archiving (Internet Archive, Bielefeld Academic Search Engine, Directory of Open Access Books, bioRxiv, PubMedCentral, PubMed, SCOAP3, Directory of Open Access Journals, RePEc and arXiv) demonstrates ongoing robust growth beyond the baseline growth of scholarly journals and articles of 3 – 3.5% per year. Growth rates for these indicators ranged from 4% to 100% (doubling). 26 indicators had a growth rate of over 10%, 15 had a growth rate of over 20%, and 6 had a growth rate of over 40%. The full list can be found in this table.
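The growth rates behind these counts are simple ratios between two snapshots; here is a sketch with purely illustrative snapshot numbers (not the actual figures in the table linked above):

```python
# Purely illustrative (start of period, end of period) counts per indicator;
# these are NOT the actual figures behind the linked table.
indicators = {
    "DOAJ journals": (14_620, 15_000),
    "Internet Archive movies": (5_000_000, 6_000_000),
    "bioRxiv preprints": (70_000, 100_000),
}

growth = {name: 100 * (end - start) / start
          for name, (start, end) in indicators.items()}
for name, pct in growth.items():
    print(f"{name}: {pct:.1f}% growth")

# Counting indicators above a threshold, as in the analysis above.
over_10 = sum(1 for pct in growth.values() if pct > 10)
print(f"{over_10} of {len(growth)} indicators grew more than 10%")
```

A 100% growth rate in this formula corresponds to a doubling over the period, matching the phrasing above.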
Thank you to everyone in the open access movement for continuing the hard work that makes this growth possible.
Some articles will be familiar to readers of Sustaining the knowledge commons, as the work of the team; others are new research projects by Tanoh. The video Qu’est-ce que la revue Afroscopie?, an interview with Benoit Awazi, is enlightening for anyone who is interested in research in francophone Africa.
Thank you and congratulations to our Tanoh Laurent Kakou, a doctoral candidate in communication (and graduate of ÉSIS) on passing his comprehensive exam this summer! Best wishes to Tanoh and his research.