Preservation of Digital Blog-Posts

A Literature Review, January 2021

The goal of this literature review was to understand the current state of research on digital blog preservation. A series of searches in the database LISTA (Library, Information Science, and Technology Abstracts) suggests that there have been few, if any, recent developments in technology or research aimed specifically at access to and preservation of digital blog posts.

Unsurprisingly, much of the scholarly conversation about blog/microblog preservation took place between 2002 and 2010. 

Thoughts on Blog Preservation

Although opinions vary on whether blogs are easier or more difficult to preserve than other digital communications, scholars agree that blogs and microblogs have unique qualities that deserve scholarly discussion.

According to Patsy Baudoin, many blogging websites utilize software that automatically preserves the sequencing of posts (2008). This innate quality of the software supports the archiving principles of “original order” and “provenance”. However intelligent the blogging software appears to be, blogs and other user-generated content are especially vulnerable to link rot (Banks, 2010).

Blogs can be complex to preserve because they may contain various file formats and media, or have several owners (Baudoin, 2008). Adding to this point, Grimard (2005) states that the variety of formats contributes to the “opaqueness” of digital records (opaqueness referring to the unnatural structure of electronic information that is only computer-readable).

To maintain the integrity of the blog during the preservation process, the digital archivist would have to consider preserving the external links within the original blog post. Furthermore, copyright can be an issue in certain blog preservation circumstances, as there have been several cases brought to the US Supreme Court (Chen, 2010).
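
As a rough illustration of what taking stock of a post's external links might involve, the Python sketch below lists every link or embedded resource in a post's HTML whose host differs from the blog's own. The function name `external_links`, the helper class, and any sample URLs are hypothetical and are not drawn from the reviewed articles; real blog platforms may require more careful handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href and src attribute values from a post's HTML."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def external_links(post_html, post_url):
    """Return sorted absolute URLs whose host differs from the post's own host.

    Assumption: "external" simply means a different network location
    than post_url; an archivist may want a broader or narrower rule.
    """
    collector = _LinkCollector()
    collector.feed(post_html)
    own_host = urlparse(post_url).netloc
    found = set()
    for raw in collector.urls:
        absolute = urljoin(post_url, raw)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc != own_host:
            found.add(absolute)
    return sorted(found)
```

The resulting list could then be checked for link rot or submitted to a web archive alongside the post itself.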

Preservation Technology

Open-source technological advancements in blog preservation have been disappointing at best. According to Caroline Young, several blog preservation programs essentially failed soon after their conception (2013).

Some examples are PANDORA by the National Library of Australia, and ArchivePress by the University of London’s Computer Centre and British Library Digital Preservation department. Young also mentions BlogForever, a blog preservation program that was still in development in 2013. It now appears to be available for use and describes itself as a new system to harvest, preserve, manage, and reuse blog content.

Young (2013), Banks (2010), Rosenthal (2017), and Chen (2010) all highlight the impact made by the introduction of the Internet Archive’s Wayback Machine. The Wayback Machine has improved the landscape of digital preservation of grey literature like blog posts; however, it is not without its challenges. Much like other archiving software, it has difficulty with images and audio files.

Solutions to the Preservation Problem

Though an older article, Grimard (2005) offers some solutions to digital preservation that are still relevant. One important recommendation is to standardize the format of the information, a recommendation echoed by Young (2013). Both authors emphasize the importance of converting files to the most usable format. Because a file format is simply a set of conventions that software developers can change, any given format may become obsolete. Young describes the universal XML format as being hierarchical and organized logically.
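
To make the idea of a hierarchical, logically organized format concrete, the Python sketch below writes a single blog post as a small XML record. The element names (`post`, `metadata`, `body`, `paragraph`) and the function name `post_to_xml` are illustrative assumptions only; they are not a schema proposed by Young or Grimard, nor any established preservation standard.

```python
import xml.etree.ElementTree as ET


def post_to_xml(title, author, published, paragraphs):
    """Build a small hierarchical XML record for one blog post.

    The element names are hypothetical, chosen for readability.
    """
    post = ET.Element("post")

    # Descriptive metadata sits in its own branch of the tree.
    meta = ET.SubElement(post, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "author").text = author
    ET.SubElement(meta, "published").text = published

    # The post's content keeps its original paragraph order.
    body = ET.SubElement(post, "body")
    for text in paragraphs:
        ET.SubElement(body, "paragraph").text = text

    return ET.tostring(post, encoding="unicode")
```

Because every piece of the post is labeled by a nested element, a future reader or program can recover the title, author, date, and paragraph order without the original blogging software.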

LOCKSS is preservation software mentioned in both Leroy (2018) and Rosenthal (2017). It is open-source software designed with libraries in mind, and it claims to preserve animations, data sets, images, audio, and text content.


The scholarly conversation on the preservation and conservation of blog content has slowed in the past decade. This could be because the options currently available are adequate for the needs of blog preservation.

Blogs and microblogs comprise various formats, which can contribute to the challenges of digital preservation. According to research in the early 2010s, images, animations, and audio files, which blogs usually contain, are difficult to preserve with the Wayback Machine. This may have improved in more recent years.

There are also preservation software options, such as LOCKSS and BlogForever, that seem to be more targeted toward archiving blog content than the Wayback Machine is.

Reference List

Chen, X. (2010). Blog Archiving Issues: A Look at Blogs on Major Events and Popular Blogs. Internet Reference Services Quarterly, 15(1), 21–33.

Baudoin, P. (2008). On Preserving Blogs for Future Generations. The Serials Librarian, 53(4), 59–61.

Farace, D., & Schöpfel, J. (Eds.). (2010). Chapter 14. Blog Posts and Tweets: The Next Frontier for Grey Literature. In Grey Literature in Library and Information Studies (pp. 217–226). K. G. Saur.

Grimard, J. (2005). Managing the Long-term Preservation of Electronic Archives or Preserving the Medium and the Message. Archivaria, 153–167.

Leroy, A. (2018). LOCKSS Distributed Digital Preservation Networks. Université libre de Bruxelles. Belgium. ISSN, 9.

Rosenthal, D. S. H. (2017). The medium-term prospects for long-term storage systems. Library Hi Tech, 35(1), 11–31.

Young, C. (2013). Oh My Blawg! Who Will Save the Legal Blogs? Law Library Journal, 105(4), 493–503.

Cite as: Pelland, K. (2021). Preservation of digital blog-posts. Sustaining the Knowledge Commons.