Building and Using a Sustainability Information Systems Dictionary

Principal Investigator:
Thomas Abraham

Co-PIs:
Viet Dao, Nesreen El-Rayes

Abstract:
In this study, we build a dictionary of sustainability information systems (SIS) terms using a literature-based process. We refine the dictionary by applying it to sustainability reports and comparing the results to an earlier manual process. We use a text mining tool called WordStat to help us extend and refine our SIS dictionary. Sustainability reports provide a valuable secondary source for researching SIS. Previously, finding the information systems described in these reports required tedious manual searches through the documents to identify both the systems and the context in which they were used. Our results include an SIS dictionary artifact and the information systems identified in the sustainability reports of three organizations in different industries.

Description of Research:
Building the Dictionary
In this study, we follow a six-step semi-automatic dictionary-building process (S-DBP) (Deng et al., 2019; Brier & Hopp, 2011). The first step is to clarify the objective of the dictionary. In our case, the purpose of the dictionary is to help identify sustainability information systems within sustainability reports. The second step is corpus creation. The corpus consists of the documents used as the source material to help identify terms used in the dictionary. Instead of building a corpus from scratch, we used data from an earlier study (El-Rayes et al., 2022), which extracted the abstracts, titles, and keywords from a large collection of research articles in the area of sustainability information systems from the Scopus database. The researchers ran a frequency analysis of terms using a text mining tool called VOSViewer to extract over 20,000 terms from this extensive corpus. They then applied a pre-processing technique called cutoff criteria, selecting the 604 terms that met the criterion of at least ten occurrences. Our study continued the pre-processing step by extracting roughly 70 terms from that list based on their relationship to information systems or information technology. We cleaned the data by removing stop words. Unfortunately, common words such as “is” and “it” are often used as acronyms in this domain (IS for information systems, IT for information technology), which complicates stop-word removal. We also eliminated words or phrases that were essentially duplicates. After pre-processing the terms, we categorized them as either Green Information Technology or Green Information Systems. To expand our scope to include social sustainability, we searched the Scopus database for additional terms using the AMCIS 2024 tracks as a source. Specifically, we searched for articles that included terms such as digital equity, digital social entrepreneurship, social computing, and health informatics.
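The pre-processing steps described above (frequency cutoff, stop-word removal, and duplicate elimination) can be illustrated with a minimal Python sketch. This is not the procedure used in the study, which relied on VOSViewer and manual review. The stop-word list, the acronym set, the normalization rule, and all function names are assumptions made for illustration only.

```python
MIN_OCCURRENCES = 10  # cutoff criterion mentioned in the text

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"is", "it", "the", "of", "and", "a"}

# Assumption: uppercase forms are treated as domain acronyms and kept.
ACRONYMS = {"IS", "IT"}


def apply_cutoff(term_counts, min_occurrences=MIN_OCCURRENCES):
    """Keep only terms that occur at least `min_occurrences` times."""
    return {t: c for t, c in term_counts.items() if c >= min_occurrences}


def is_stop_word(term):
    """Treat a term as a stop word unless it is a protected acronym."""
    if term in ACRONYMS:
        return False
    return term.lower() in STOP_WORDS


def normalize(term):
    """Collapse case, hyphens, and extra whitespace to detect near-duplicates."""
    return " ".join(term.lower().replace("-", " ").split())


def deduplicate(terms):
    """Keep the first spelling seen for each normalized form."""
    seen = {}
    for term in terms:
        seen.setdefault(normalize(term), term)
    return list(seen.values())


def preprocess(term_counts):
    """Apply cutoff, stop-word removal, and de-duplication in order."""
    kept = apply_cutoff(term_counts)
    kept = [t for t in kept if not is_stop_word(t)]
    return deduplicate(kept)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    counts = {"green IT": 25, "Green IT": 12, "it": 300, "IT": 40,
              "smart grid": 9, "smart-grid": 15, "the": 1000}
    print(preprocess(counts))
```

Protecting acronyms by case is only one possible way to handle the “is”/“it” problem; in practice such terms would likely still need manual review in context.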
Applying the Dictionary to Sustainability Reports
We created a WordStat project with five Biogen GRI documents that we had previously coded manually. These were sustainability reports from 2009 to 2015. We analyzed the documents by applying the SIS dictionary to them. We then used the Keyword Retrieval feature to extract and save the original paragraphs containing the terms in the five cases (GRI documents). This step produced 108 paragraphs containing terms from the dictionary. The analysis took only a few minutes for five reports totaling several hundred pages. We compared the paragraphs retrieved using the text mining software and the SIS dictionary with the results of the manual search we had previously used. For the 2009 report, WordStat found ten paragraphs with terms that matched dictionary entries, whereas the manual search found three entries. We refined our dictionary based on the results and applied it to five years of sustainability reports from Coca-Cola and Caterpillar. After each run, we refined our dictionary. The final results provided us with a comprehensive list of sustainability information systems used by the three organizations. We believe this is a successful proof of concept for using a dictionary and text mining tools to extract useful data from sustainability reports.
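To make the retrieval step concrete, the following minimal Python sketch shows one way to pull out paragraphs that contain dictionary terms. It is a plain-Python stand-in, not WordStat's Keyword Retrieval feature or its API. The sample dictionary entries, the document text, the paragraph-splitting rule (blank lines), and the function names are all hypothetical.

```python
import re


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multiword phrases are preferred over substrings.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def retrieve_paragraphs(documents, dictionary):
    """Return every paragraph that contains at least one dictionary term.

    `documents` maps a document name to its full text.
    `dictionary` maps a category name to a list of terms.
    """
    all_terms = [t for terms in dictionary.values() for t in terms]
    pattern = build_pattern(all_terms)
    hits = []
    for name, text in documents.items():
        # Assumption: paragraphs are separated by blank lines.
        for paragraph in re.split(r"\n\s*\n", text):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            matched = sorted({m.group(0).lower()
                              for m in pattern.finditer(paragraph)})
            if matched:
                hits.append({"document": name,
                             "paragraph": paragraph,
                             "terms": matched})
    return hits


if __name__ == "__main__":
    # Hypothetical dictionary entries and report text, not study data.
    sample_dictionary = {
        "Green IS": ["energy management system"],
        "Green IT": ["data center", "virtualization"],
    }
    sample_documents = {
        "report_2009": (
            "We installed an energy management system at two sites.\n\n"
            "Our employees volunteered locally.\n\n"
            "A new data center uses virtualization."
        )
    }
    for hit in retrieve_paragraphs(sample_documents, sample_dictionary):
        print(hit["document"], hit["terms"])
```

Returning the whole paragraph rather than only the matched term keeps the context in which a system is mentioned, which is what the comparison with the manual coding depends on.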
