1. Introduction
In 2024, the International Union of Geological Sciences (IUGS) made a balanced decision to reject the proposal to include the Anthropocene in the Geologic Time Scale as a geological epoch distinct from the Holocene, based on chronostratigraphic criteria. The IUGS, however, recognized that “the Anthropocene as a concept will continue to be widely used not only by Earth and environmental scientists, but also by social scientists, politicians and economists, as well as by the public at large. As such, it will remain an invaluable descriptor in human-environment interactions” [1] (p. 3). Thus, the Anthropocene, a term coined and popularized by Crutzen and Stoermer [2], has moved beyond formal nomenclature and beyond the more complex scientific debates about whether it accurately reflects the geology of humankind [3], its chronological framework (e.g., the rise of agriculture, the invention of the steam engine, or the first detonation of an atomic bomb) [4,5,6], its connection to the Great Acceleration [7], or the accumulation of human-made materials on Earth [8]. Instead, it has become a key element of the narrative landscape surrounding conservation efforts [9]. This trajectory reveals contrasting philosophies and the various meanings assigned to the term over many years of scientific debate.
In this paper, the concept of the Anthropocene is used as a conservation narrative [9], highlighting humans’ impacts on Earth’s oceans, geology, landforms, landscapes, freshwater systems, hydrology, ecosystems, biodiversity, and climate. This concept is linked to the idea of Accelerating Change [10], which refers to the rapid and exponential rate of technological advancement observed in recent history. This acceleration may indicate a more abrupt and profound transformation in the social, political, economic, and cultural factors that shape human-nature relationships [11] and, to some extent, shifts in the beliefs, perceptions, or preferences of public opinion [12].
In geology, the terms “golden peak” and “golden spike” refer to markers that define the boundaries of geological time periods. Similarly, we can identify disruptions in conservation narratives and practices that help us understand the accelerating changes of the Anthropocene narrative. First, there is the politicization of conservation science [13,14,15]. Key concepts in environmental discourse, such as conservation, biodiversity, climate change, and sustainability, are largely reformulations of scientific ideas that have been around for decades. Today, these terms often serve to support political decisions and advance policy agendas.
Second, ‘bureaucratic capture’ [16] and the corporate mindset of the conservation establishment, such as international Environmental Non-Governmental Organizations [17] (pp. 7-8), have led to the articulation of conservation priorities and needs that align with prevailing development policies [18,19]. As several authors argue, action to protect nature happens when arguments are framed in terms that resonate with the combination of imagination, feelings, and rationality that guides decision-making in people’s everyday lives [20,21]; the near-avoidance of public recognition of this fact could be a marker of a social Anthropocene. Third, as a plausible extension of the previous point, the denial of scientific findings in environmental science and policy, especially regarding existential risks, mostly climate change and emerging diseases [22], is neither a new phenomenon nor disconnected from conspiracy theories and distrust of scientists’ practices and integrity [23,24,25]. The extent and severity of this phenomenon might be a marker of the Anthropocene’s counter-narratives.
However, the accelerating change in the Anthropocene narrative [10] is rooted in technological change and progress: the advent of the Internet and the World Wide Web [26], and the technology of digitizing printed material [27,28]. The recording of internet users’ searches, mostly through the Google web search engine and its Google Trends service, provided the technical foundations for establishing culturomics [29] and conservation culturomics [30,31], as well as for processing internet search records in almost real time. The recent Google Trends extension Glimpse [32], which records the Absolute Search Volume in Google (ASVG) without normalization, further increased this technological capacity, extending the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities [29] (p. 176). The same holds for conservation culturomics [30] (p. 269), which focuses on demonstrating public interest in nature, identifying conservation emblems, providing new metrics and tools for near-real-time environmental monitoring, supporting conservation decision-making, framing conservation issues, and promoting public understanding.
These gains are further multiplied by the technological capabilities of The GDELT Project [33], summarized in the document Culturomics 2.0 [34]. As Leetaru [34] explains, “the traditional Culturomics approach treats every word or phrase as a generic object with no associated meaning and measures only the change in the frequency of its usage over time. The Culturomics 2.0 approach extends this model by imbuing the system with higher-level knowledge about each word, specifically focusing on ‘news tone’ and geographic location, given their importance to understanding news coverage. Translating textual geographic references into mappable coordinates and quantifying the latent ‘tone’ of news into computable numeric data permits an entirely new class of research questions to be explored via the news media, not possible through the traditional frequency count approach”. The GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages.
Conservation Culturomics 2.0 is a subtle nod to the original Culturomics 2.0 and should be regarded as a subfield of the almost limitless GDELT thesaurus, which evolves in real time [33]. This study explores how technological advances, specifically the monitoring of public opinion via internet user searches (using tools such as Google Trends-Glimpse), can complement efforts to analyze how the ‘same’ public, as receivers of information, interacts with key concepts such as conservation, biodiversity, climate change, and sustainability (as captured by the GDELT Project). By examining dominant communication streams, the research addresses doubts, inconsistencies, and variations in the existing literature concerning the early phase of conservation culturomics [35]. After all, as described on the Society for Conservation Biology’s ‘Conservation Culturomics’ website [36], the scientific program of conservation culturomics focuses on ‘insights that could prove to be key in allocating funds for conservation projects, increasing their visibility and success, and ultimately incorporating social aspects of conservation on a broad scale in a quantitative fashion’.
As a reminder, three key elements highlight the controversies that characterized the early phase of Conservation Culturomics from 2013 to 2020. First, there are questions regarding the validity and replicability of normalized Google Trends data reflecting public searches on relevant keywords [37,38,39,40,41]. Second, the language in which these searches were conducted raises additional concerns [42]. Third, various independent confounding factors, such as internet penetration rates by country, literacy levels, press freedom, and access to information, further complicate the analysis [43].
In this contribution, we aim to shift the focus away from the typical questions raised during the early phase of conservation culturomics, as outlined above. We aim to harness the methodological similarities between ecology, economics, and control theory [44,45]. Our objective is to identify descriptive statistics of environmental topics related to the Anthropocene narrative across the major languages used in online searches. This is particularly important in a world where the flow of information has significant practical implications: environmental awareness is a fundamental motivator for individuals to engage in behaviors that protect the environment [46].
However, this massive flow of information (the volume of data created, captured, copied, and consumed globally was projected to reach 149 zettabytes by 2024) often exceeds individuals’ capacity to remember and understand it. People have a limited capacity for remembering pieces of information, and the entropy of that information can be expected to influence the rate at which they distort it or make mistakes [47]. This can lead to undesirable secondary social phenomena, such as the diffusion of damaging rumors [48,49], the spread of misinformation, or the distortion of truth [50].
We examine two questions: (1) Can we compare measures of the size vs. the variation of public interest in the flag keywords of the Anthropocene narrative, i.e., conservation, biodiversity, sustainability, and climate change, across the major languages of web searches, as a proxy for the conceptual differentiation of human-nature relationships worldwide? (2) Is it feasible to estimate the effect of information entropy on the retention of environmental awareness in people’s memory across languages, as a proxy for social and cultural conditions? We expect that answering these questions might help uncover public interest deficits in various cultural settings that warrant increased efforts from the conservation community.
2. Materials and Methods
Consider a population of $N$ individuals whose searches on $k$ environmental topics related to the Anthropocene narrative ($k = 1, \dots, 4$: conservation, biodiversity, sustainability, climate change) are recorded over a period $t$ ($t_y = 1, \dots, 21$ years, or $t_m = 1, \dots, 252$ months, from January 1, 2004, to December 2024). These individuals are categorized into $l$ major Internet languages ($l = 1, \dots, 9$: English, Chinese, Spanish, Portuguese, French, German, Arabic, Russian, and Japanese). We focused on nine of the top ten languages consistently present on internet-monitoring web pages [51]. The tenth language was excluded from our analysis because it was inconsistent over the observation period, alternating among Italian, Indo-Malaysian, Turkish, and Persian.
The methodology consists of two parts. The first part studies the evolution of the ratio of the population of internet users, serving as a proxy for public opinion, to the volume of mass media information on environmental topics related to the Anthropocene narrative over a specific period. We refer to this ratio as the Public Opinion Response to Information Entropy (PRIE). The second part extends this analysis with Shannon information entropy and a ‘conservation awareness factor’.
To analyze PRIE, we calculate its expected (or average) value and standard deviation over a defined period $t$. PRIE can be considered either as a multi-language ($l$) ensemble for a specific flag keyword ($k$) or, conversely, as a multi-keyword ($k$) ensemble for a particular language ($l$). In both cases, the individual elements comprising the variable $\mathrm{PRIE}_t$ must be weighted so that the weights sum to 1 (100%).
This methodology requires matrix algebra and is analogous to calculating the “efficient frontier”, inspired by H. Markowitz’s risk-return relationship for multi-asset portfolios [52]. In the present context, the ‘efficient frontier’ is the set of weight combinations that yield the highest expected PRIE for a given PRIE standard deviation (SD), or the lowest SD for a given expected PRIE. Specifically, the PRIE time series involve $l = 8$ languages for each keyword ensemble (or $k = 4$ keywords for each language ensemble) and a total of $t_m = 96$ months, covering January 2017 to December 2024. Our goal is to identify the combinations that yield the highest expected $\mathrm{PRIE}_t$ for a given SD, or the lowest SD for a given expected $\mathrm{PRIE}_t$.
Two important remarks concern the data. (1) The Absolute Search Volume in Google-Glimpse (ASVG henceforth) for 2004-2024 across the $k$ topics in the top nine languages used for internet searches was adjusted to account for the evolution of the internet penetration rate. Notably, the total number of internet users was approximately 1.023 billion in 2005, whereas it reached 5.5 billion by 2024 [51]. (2) To make these adjustments, we calculated each language’s cumulative growth rate (either per year or per month), using the formula
where $t_y$ represents the time period. We then multiplied the ASVG for each $l$ and $k$ per time unit accordingly over the entire observation period. The search keywords were translated into each language with Google Translate (see Appendix A). We also collected data on the geographical distribution of ASVG frequencies by language and keyword and produced a world map highlighting the countries and territories with the highest frequencies for each language and keyword.
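The growth-rate adjustment can be illustrated with a short sketch. It is a minimal example under stated assumptions: the exact formula used in the study is not reproduced here, so the sketch assumes the standard compound growth rate g = (U_end / U_start)^(1 / t_y) - 1 between two internet-user counts, and it leaves open which per-period factors multiply the ASVG series. All function names are hypothetical.

```python
from __future__ import annotations


def compound_growth_rate(start: float, end: float, periods: int) -> float:
    """Average compound growth rate per period between two user counts.

    Assumption: g = (end / start) ** (1 / periods) - 1, the standard
    compound-growth formula; the study's own formula may differ.
    """
    if start <= 0 or end <= 0 or periods <= 0:
        raise ValueError("start and end must be positive and periods >= 1")
    return (end / start) ** (1.0 / periods) - 1.0


def cumulative_factors(rate: float, periods: int) -> list[float]:
    """Cumulative growth factors (1 + rate) ** t for t = 0, 1, ..., periods."""
    return [(1.0 + rate) ** t for t in range(periods + 1)]


def scale_series(values: list[float], factors: list[float]) -> list[float]:
    """Multiply a search-volume series element-wise by per-period factors.

    Hypothetical helper: whether the factors are growth factors or their
    reciprocals depends on how the adjustment is defined.
    """
    if len(values) != len(factors):
        raise ValueError("series and factors must have the same length")
    return [v * f for v, f in zip(values, factors)]


# Usage with the global user counts quoted above (2005 -> 2024, 19 years).
g = compound_growth_rate(1.023e9, 5.5e9, 19)
print(f"compound annual growth rate: {g:.4f}")  # prints 0.0926
factors = cumulative_factors(g, 19)
```

The same functions work per month by passing the number of months as `periods`.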
For the GDELT Project, we focused on the period from January 2017 to December 2024, for which its data were publicly available (refer to the Introduction section). This timeframe represents a total of $t_y = 8$ years or $t_m = 96$ months. The GDELT data collection follows the same rationale, using the same $k$ keywords and $l$ languages while analyzing information sources around the world. The specific GDELT query we employed is https://api.gdeltproject.org/api/v2/summary/summary?d=web&t=compare [53]. This query assesses the volume of up to four web and TV news keywords across various networks. It also enables us to analyze the average “tone” of human faces in images, capturing both positive and negative expressions. By combining these expressions, we generated raw data reflecting the frequency of appearance of each keyword, presented as a precise time series without any smoothing applied. Note that Japanese data were excluded at this stage, as we received no GDELT returns for Japanese after the end of 2022.
The variable of interest in this analysis is the ratio of ASVG to the GDELT appearance frequency for each keyword and language over a specified period. The information entropy $H_{f_i}$ of $\mathrm{PRIE}(t,k,l)$ is calculated using the classical Shannon formula:

$$H_{f_i} = -\sum_{i=1}^{4} f_i \ln f_i$$

where $f_i$ is the frequency of information type $i$, corresponding to the PRIE value for keyword $k_i$ ($i = 1, \dots, 4$) for a specific language and time $t_i$. A high $f_i$ indicates that interested individuals frequently receive this information or that the GDELT value for $k_i$ is elevated. In general, information on $k$ environmental topics related to the Anthropocene narrative can be expressed as a binary string of length $k$ (for instance, ‘1011’ when $k = 4$), giving $2^k$ possible types of information, each labeled with an integer $0 \le i \le 2^k - 1$ (i.e., $0 \le i \le 15$ for $k = 4$). Entropy increases as the distribution of information types approaches uniformity and decreases when one type begins to dominate ($f_i \to 1$, so that $H_{f_i} \to 0$). In the present application, $i$ indexes the four keywords, so the maximum is reached when $f_i \to 1/4$, giving $H_{max} = \ln 4$.
Although it is a broad generalization, we assume that an individual searcher/receiver of information is an average representative of a $\mathrm{PRIE}(t,k,l)$. This assumption allows us to connect information entropy, memory span, and mental speed, thereby determining the information entropy associated with short-term memory capacity, which in turn serves as a limiting factor for cognitive functions [47]. Furthermore, this approach enables us to link language and cognition with the likelihood of a person or group remembering, understanding, interpreting, or correctly diffusing information. This probability is represented by a formula inspired by [50]:
The coefficient $A_c$ is referred to as the ‘conservation awareness factor’. It represents an individual’s (or group’s) ability to accurately retain the true or undisputed meaning of the flag keywords in each language. This factor is essential for calculating the likelihood of adopting firmly held beliefs within the various components of the Anthropocene narrative.
By definition, $A_c \ge 0$. When $A_c$ is large, the ability to control information distortion is strong, and vice versa. As $H_{f_i}$ approaches $H_{max}$, the probability of distortion approaches its maximal value of 0.5, whereas as $H_{f_i}$ approaches zero, it approaches its minimal value of $1/(e^{A_c} + 1)$.
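The two limits above constrain, but do not uniquely determine, the distortion probability. As a hedged illustration only, one simple logistic form that satisfies both limits is P = 1 / (exp(A_c (1 - H/H_max)) + 1). This form is an assumption for demonstration and is not necessarily the formula adapted from [50]; the function name is hypothetical.

```python
import math

H_MAX = math.log(4)  # maximum entropy for four keywords


def distortion_probability(h: float, a_c: float, h_max: float = H_MAX) -> float:
    """Assumed logistic form: P = 1 / (exp(A_c * (1 - h / h_max)) + 1).

    Gives P = 0.5 when h = h_max and P = 1 / (e**A_c + 1) when h = 0.
    """
    if a_c < 0:
        raise ValueError("A_c must be non-negative")
    if not 0.0 <= h <= h_max:
        raise ValueError("entropy must lie in [0, h_max]")
    return 1.0 / (math.exp(a_c * (1.0 - h / h_max)) + 1.0)


for h in (0.0, 0.5 * H_MAX, H_MAX):
    p = distortion_probability(h, a_c=2.0)
    print(f"H/Hmax = {h / H_MAX:.1f}: P(distortion) = {p:.3f}")
# prints 0.119, 0.269 and 0.500 for the three entropy levels
```

Under this form, a larger $A_c$ lowers the distortion probability at low entropy, while every curve converges to 0.5 at $H_{max}$.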
In the context of the ‘efficient frontier’ of PRIE, there are four categories of interest: $l$, representing the languages studied ($l = 1, \dots, 8$); $m$, referring to the months of observation ($m = 1, \dots, 96$, covering January 2017 to December 2024); $k$, denoting the keywords of interest ($k = 1, \dots, 4$); and $w$, indicating the weights assigned per language (or keyword), which sum to 1 (100%). The ‘return’ $R$ of the composite PRIE is represented by a matrix of size $m \times l$ for a given $k$:

$$R = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,8} \\ r_{2,1} & r_{2,2} & \cdots & r_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ r_{96,1} & r_{96,2} & \cdots & r_{96,8} \end{pmatrix}$$
For notational simplicity, each element $r_{m,l}$ of the matrix denotes the ratio $\mathrm{ASVG}_{m,l}/\mathrm{GDELT}_{m,l}$. Averaging each column of the matrix (i.e., each language $l$) over the months yields the vector of ‘expected’ (or average) monthly ‘returns’

$$\bar{r}_l = \frac{1}{t_m}\sum_{m=1}^{t_m} r_{m,l}, \qquad l = 1, \dots, 8.$$

The ‘return’ for the entire observation period is obtained by multiplying this vector by $t_m = 96$, giving $\mathbf{r} = t_m \bar{\mathbf{r}}$. The overall expected PRIE return is computed by multiplying this result by the $l \times 1$ vector of language weights $\mathbf{w}$. Thus, the expected PRIE return is

$$E[\mathrm{PRIE}] = \mathbf{w}^{\mathsf{T}}\mathbf{r}$$

(note that $\mathbf{w}$ and $\mathbf{r}$ are $l \times 1$ column vectors, so $\mathbf{w}^{\mathsf{T}}$ is a $1 \times l$ row vector). The PRIE ‘risk’, or standard deviation, is derived from the covariance matrix of the ensemble of languages $l$, given a keyword, over the observation period, as follows.
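Covariance matrix:

$$\Sigma = \frac{1}{m} X^{\mathsf{T}} X$$

where:
- $X$ is the $m \times l$ matrix of ‘excess returns’, e.g., $\ln(r_{l,t}/r_{l,t+1})$ or $(r_{l,t} - \bar{r}_l)$, for each individual language;
- $X^{\mathsf{T}}$ is the $l \times m$ transpose of $X$;
- $m$ is the number of months of observation;
- $\Sigma$ is the $l \times l$ covariance matrix.

The PRIE variance is $\sigma^2 = \mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w}$, with dimensions $(1 \times l)(l \times l)(l \times 1)$, and the PRIE SD is $\sigma = \sqrt{\mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w}}$.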
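These matrix operations can be sketched in plain Python on a small hypothetical matrix. The sketch uses deviations from the column means as excess returns (the second option listed above) and $\Sigma = X^{\mathsf{T}}X/m$; with the log-ratio option, the series would need centering before $X^{\mathsf{T}}X/m$ could be read as a covariance. Function names and data are illustrative, not the authors' spreadsheet.

```python
from __future__ import annotations

import math
from statistics import fmean


def column_means(R: list[list[float]]) -> list[float]:
    """Average monthly 'return' for each column (language) of an m x l matrix."""
    return [fmean(row[j] for row in R) for j in range(len(R[0]))]


def excess_returns(R: list[list[float]]) -> list[list[float]]:
    """X: deviation of each entry from its column mean, (r_{m,l} - mean_l)."""
    means = column_means(R)
    return [[x - mu for x, mu in zip(row, means)] for row in R]


def covariance(X: list[list[float]]) -> list[list[float]]:
    """Sigma = X^T X / m, an l x l matrix."""
    m, l = len(X), len(X[0])
    return [[sum(X[t][a] * X[t][b] for t in range(m)) / m for b in range(l)]
            for a in range(l)]


def expected_prie(R: list[list[float]], w: list[float], t_m: int) -> float:
    """E[PRIE] = w^T r, with r = t_m * (vector of column means)."""
    r = [t_m * mu for mu in column_means(R)]
    return sum(wi * ri for wi, ri in zip(w, r))


def prie_sd(R: list[list[float]], w: list[float]) -> float:
    """sigma = sqrt(w^T Sigma w)."""
    S = covariance(excess_returns(R))
    l = len(w)
    var = sum(w[a] * S[a][b] * w[b] for a in range(l) for b in range(l))
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


# Hypothetical 4-month x 2-language matrix of ASVG/GDELT ratios.
R = [[1.0, 2.0], [3.0, 4.0], [2.0, 2.5], [2.0, 3.5]]
w = [0.5, 0.5]
print(expected_prie(R, w, t_m=4), prie_sd(R, w))
```

Swapping the roles of languages and keywords (an $m \times k$ matrix for one language) requires no change to the functions.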
The same procedure for calculating expected PRIE returns and SD can be applied to an $m \times k$ matrix $R_l = (r_{m,k})$ for a specific language $l$. Figure 1 illustrates an indicative case of data collection, organization, and calculations in Excel exploring the relationship between the expected (or average) value of PRIE and its standard deviation for $l = 8$ languages and the keyword ‘biodiversity’, over 96 months (January 2017 to December 2024). This operation involves $10^3$ Monte Carlo simulations to assign relative weights to each of the eight languages within the examined linguistic ensemble. We used the Excel function ‘RANDARRAY(1,8)’ to generate random (dummy) weight values, which we then summed. The weight assigned to each language is its random weight divided by the total random weight, ensuring that the weights sum to 1 (100%). The expected PRIE is then calculated as the weighted sum $E[\mathrm{PRIE}] = \sum_{l=1}^{8} w_l\,\mathrm{PRIE}_l$, where $\mathrm{PRIE}_l$ is the period PRIE of language $l$.
Note that the monthly variation of PRIE per language can be calculated in two ways: either as $\ln(\mathrm{PRIE}_{l,t}/\mathrm{PRIE}_{l,t+1})$ or as $(\mathrm{PRIE}_{l,t} - \overline{\mathrm{PRIE}}_l)$. This helps to normalize (or compress) the variations in both the numerator and the denominator, which may differ across languages owing to varying internet penetration rates and user population sizes. Overall, this procedure generates a ‘cloud’ of expected PRIE vs. SD points, whose upper boundary corresponds to the ‘efficient frontier’.
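The Monte Carlo weighting and frontier extraction described above can be sketched in Python. As in the Excel procedure, uniform random draws (analogous to RANDARRAY) are rescaled to sum to 1, and each weight vector yields one (SD, expected PRIE) point. The frontier is taken here as the non-dominated points, i.e., those for which no other point has lower or equal SD and higher expected PRIE. This is a minimal sketch on a hypothetical matrix that assumes deviation-based excess returns and $\Sigma = X^{\mathsf{T}}X/m$; the function names are illustrative.

```python
from __future__ import annotations

import math
import random
from statistics import fmean


def portfolio_point(R: list[list[float]], w: list[float], t_m: int) -> tuple[float, float]:
    """Return (SD, expected PRIE) for weights w over an m x l matrix R."""
    m, l = len(R), len(w)
    means = [fmean(row[j] for row in R) for j in range(l)]
    X = [[row[j] - means[j] for j in range(l)] for row in R]  # excess returns
    cov = [[sum(x[a] * x[b] for x in X) / m for b in range(l)] for a in range(l)]
    var = sum(w[a] * cov[a][b] * w[b] for a in range(l) for b in range(l))
    expected = t_m * sum(w[j] * means[j] for j in range(l))
    return math.sqrt(max(var, 0.0)), expected


def random_weights(rng: random.Random, l: int) -> list[float]:
    """Uniform draws in [0, 1) (cf. Excel RANDARRAY(1, l)) rescaled to sum to 1."""
    raw = [rng.random() for _ in range(l)]
    total = sum(raw)
    return [x / total for x in raw]


def monte_carlo_cloud(R, t_m, n=1000, seed=42):
    """Cloud of n (SD, expected PRIE) points from random weight vectors."""
    rng = random.Random(seed)
    return [portfolio_point(R, random_weights(rng, len(R[0])), t_m) for _ in range(n)]


def efficient_frontier(points):
    """Keep the points with the highest expected PRIE at their SD level."""
    frontier, best = [], -math.inf
    # Sort by SD ascending; for equal SD, examine the higher expected PRIE first.
    for sd, exp_prie in sorted(points, key=lambda p: (p[0], -p[1])):
        if exp_prie > best:
            frontier.append((sd, exp_prie))
            best = exp_prie
    return frontier


# Hypothetical 6-month x 3-language matrix of ASVG/GDELT ratios.
R = [
    [1.2, 0.8, 2.0],
    [1.0, 0.9, 2.6],
    [1.4, 0.7, 1.8],
    [1.1, 1.0, 2.9],
    [1.3, 0.8, 2.2],
    [1.2, 0.9, 2.5],
]
cloud = monte_carlo_cloud(R, t_m=6, n=1000)
frontier = efficient_frontier(cloud)
print(f"{len(frontier)} frontier points out of {len(cloud)}")
```

Random sampling of weights only approximates the frontier; an exact long-only frontier would require solving a quadratic program for each target SD.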
The model that extends the expected PRIE vs. SD analysis by incorporating information entropy and the ‘conservation awareness factor’ $A_c$ also required Monte Carlo simulations to explore the range of $A_c$. This approach combines PRIE calculations (as illustrated in Figure 1) with information entropy estimates over time periods $t$, keywords $k$, and languages $l$, leading to two main outcomes: (1) it represents the ‘efficient frontier’ for expected PRIE vs. its SD, and (2) it enables the calculation of confidence intervals and probabilities of misunderstandings, memory failures, or lapses among individuals across languages and keywords.
Simulations were conducted using TreePlan SimVoi 2024. Monte Carlo simulations for $A_c$ are based on a truncated normal distribution (‘randtruncnormal’) parameterized by the mean, SD, and minimum and maximum entropy values, with truncation limits from 0 to 4.
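The $A_c$ simulation can be approximated outside SimVoi with a short sketch. It draws $A_c$ from a normal distribution truncated to [0, 4] by rejection sampling (an assumed stand-in for SimVoi's truncated-normal sampler, whose internals are not reproduced here) and feeds each draw into an assumed logistic distortion form, P = 1 / (exp(A_c (1 - H/H_max)) + 1), which matches the two limits stated in the Methods. The mean, SD, entropy ratio, and function names are hypothetical.

```python
from __future__ import annotations

import math
import random
from statistics import median, quantiles


def rand_trunc_normal(rng: random.Random, mean: float, sd: float, lo: float, hi: float) -> float:
    """Normal draw truncated to [lo, hi] by rejection sampling."""
    while True:
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:
            return x


def distortion_probability(h_ratio: float, a_c: float) -> float:
    """Assumed form P = 1 / (exp(A_c * (1 - H/Hmax)) + 1), with h_ratio = H/Hmax."""
    return 1.0 / (math.exp(a_c * (1.0 - h_ratio)) + 1.0)


def simulate(h_ratio: float, mean: float, sd: float, n: int = 10_000, seed: int = 1):
    """Return (5th percentile, median, 95th percentile) of P(distortion)."""
    rng = random.Random(seed)
    probs = [distortion_probability(h_ratio, rand_trunc_normal(rng, mean, sd, 0.0, 4.0))
             for _ in range(n)]
    cuts = quantiles(probs, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], median(probs), cuts[-1]


# Hypothetical inputs: H/Hmax = 0.875 and A_c ~ N(2, 1) truncated to [0, 4].
low, mid, high = simulate(h_ratio=0.875, mean=2.0, sd=1.0)
print(f"P(distortion): median {mid:.3f}, 90% interval [{low:.3f}, {high:.3f}]")
```

Rejection sampling is simple and exact for truncation windows that keep most of the normal mass, as here; for narrow windows far in the tails, an inverse-CDF sampler would be more efficient.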
Figure 2 presents a specific case in which the relationship between the expected (or average) value of PRIE and its SD is consistent with the earlier Excel calculations. Here, we focus on the four Anthropocene flag keywords in the studied languages over the same 96-month period (January 2017 to December 2024).
3. Results
We now present a selection of results that summarize the entire procedure. The first result (Table 1) provides an overview of the descriptive statistics for Internet usage across eight languages and four key terms of interest. This summary serves as the foundation for all calculations related to the efficient frontier, which illustrates the relationship between expected PRIE and the standard deviation of PRIE for the $l$ languages and $k$ keywords that highlight the concepts of the Anthropocene narrative.
The findings in Table 1 indicate that there is no clear, uniform global pattern in public perceptions of the Anthropocene narrative. The population size for each language and the relative volume of searches suggest that historical and cultural factors play a significant role. For instance, conservation is a priority in English-speaking countries, while sustainability is emphasized in Germany. Socio-political conditions, such as those in China, and a particular interest in nature observed in the Global South (as indicated by Spanish and Portuguese searches in South America) also influence these perceptions. While interest in climate change is growing, it cannot be concluded that this interest consistently dominates the Anthropocene narrative over the long term.
Figure 3 illustrates the evolution of the four Anthropocene-related keywords in the three languages most commonly used for internet searches (English, Chinese, and Spanish) from 2004 to 2024, providing an additional perspective on the quantitative results in Table 1. Beyond the significant differences in the evolution of public interest between Chinese-language searches and those in the two languages of European origin, two key observations can be made: (1) conceptual priorities vary among cultures; for example, English speakers prioritized conservation for most of the studied period, whereas Spanish speakers focused more on biodiversity; (2) since around 2020, interest in climate change has increased noticeably in all three languages, with several monthly peaks evident in the data.
Figure 4 illustrates the results of the Expected PRIE vs. PRIE SD methodology for the time-by-language matrix ($m \times l$), for each of the $k = 4$ keywords. Each concept related to the Anthropocene narrative is analyzed separately, and the results are displayed on the same graph to facilitate comparison between keywords. A total of one thousand points per keyword were simulated using a Monte Carlo (normal) procedure. The centroids of the keyword clouds indicate the perceptual distance between concepts on a global scale, with all languages combined, for the period 2017-2024.
Notably, the concepts of conservation and biodiversity appear to share a common conceptual framework that transcends cultures and languages, as their centroids are nearly aligned on the graph. On the other hand, sustainability is the most compact concept, as the area covered by its corresponding data cloud is significantly smaller than that of the other concepts. In contrast, the data cloud for climate change exhibits the highest standard deviation (SD) in PRIE, suggesting substantial variations in perceptions of this phenomenon across different languages and cultures.
Figure 5 illustrates the Efficient Frontier line for the $m \times k$ matrix of each language $l$. The y-axis represents the Expected PRIE, while the x-axis shows the standard deviation of PRIE (PRIE SD). Values that fall on the Efficient Frontier are considered optimal, as they provide the highest possible Expected PRIE for a given level of PRIE SD. In contrast to the results shown in Figure 4, the Efficient Frontier highlights the trade-off between Expected PRIE and PRIE SD. Values located below the frontier are sub-optimal, as they do not yield sufficient Expected PRIE for the corresponding level of PRIE SD. Notably, Spanish and Russian exhibit the steepest Efficient Frontiers, suggesting that these languages show considerable variability in public interest across keywords, regardless of the size of the interested population.
Finally, Figure 6 combines the previous partial results, enhanced by the information entropy related to time, language, and keywords (see Figure 2 in the Methods section for details). The noteworthy aspect of this exercise is that it allows us to calculate confidence intervals and to assess the high and low probability ranges of memory failures or misunderstandings of the flag keywords. An awareness ratio of 1 indicates that a person or the public fully remembers or understands the meaning of all the keywords, while a value of 0 signifies a complete disconnection from, or distortion of, their meanings.
There is no clear reason to believe that people who speak different languages have a natural tendency to forget or misunderstand information on Anthropocene-related concepts. However, several intangible factors might contribute to these differences. These factors could include the structure of the language itself, social and political conditions, varying perceptions of nature, the influence of dominant narratives, differences in literacy levels, and familiarity with online searching.
4. Discussion
This discussion is a self-reflective essay on our contributions. We want to emphasize that the most significant element of the paper’s title is the question mark that follows “Conservation Culturomics 2.0.” We argue that the evidence supports the feasibility of the proposed methodology, particularly in light of the shared functionalist foundations of ecology and economics [44].
Our findings confirm that ideas expressed years ago remain relevant today when examined from a fresh perspective. For example, there is growing global concern about issues related to the Anthropocene narrative, especially conservation [54]. Support for conservation policies has been increasing worldwide; however, pro-conservation behaviors appear to be inconsistent and to vary among cultures [55]. The emergence of online search technologies and the development of the culturomics and conservation culturomics research programs have helped address several classic methodological challenges in social science that previously hindered comprehensive assessments of people’s perceptions of conservation. Traditional approaches, such as participatory appraisal, ethnographic studies, focus groups, electronic and mail surveys, public comments, and knowledge co-production, have often proven time-consuming and resource-intensive [56].
Adapting Markowitz’s multi-asset portfolio optimization theory and combining it with information entropy is technically feasible and substantially extends the traditional use of online public searches for issues and themes related to the Anthropocene narrative.
However, a careful evaluation of our findings leads us to express reasonable reservations about their generalization. Although the rationale and computations seem solid and error-free, one could question the noisy historical and social-ecological environment in which the primary data are generated.
First, as emphasized in the Introduction, there is a notable trend of reinterpreting longstanding ecological concepts to support new political agendas and policy proposals. For example, the roots of conservation can be traced back to influential figures such as Shelford and to organizations such as The Ecological Society of America and The Nature Conservancy, established in 1915 and 1940, respectively. The idea of biodiversity is grounded in Tansley’s concept of the ecosystem, while the term itself is a catchy adaptation of Lovejoy’s original notion of biological diversity [57,58]. Additionally, Arrhenius laid the groundwork for explaining global warming [59] long before J. Hansen alerted Congress to the dangers of climate change in 1988. The concept of sustainability, popularized by the commission led by Norwegian politician Gro Harlem Brundtland, has roots that reach back to ancient farming practices [60]. In an age when science has lost some of the esteemed status it gained during the Enlightenment, much public space is now open to science denial and conspiracy theories [61,62].
A significant source of error arises from translating English terms into other languages, even with tools such as Google Translate. For example, the official English term ‘sustainability’ can evoke both ‘sustainable development’ and ‘sustainable growth’. These concepts are distinct, but the distinction often goes unnoticed in public discourse, where they are frequently and mistakenly regarded as equivalent [63]. In French, the term ‘développement durable’ dominates public discussions, while in German, the distinction between ‘Biodiversität’ and ‘Artenvielfalt’ creates confusion among the public, as evidenced by Eurobarometer polls.
The methodology uses historical data. Its most interesting aspect is that it might uncover existing comparative deficits between languages and cultures. Further, it could help build scenarios using sensitivity analysis techniques. It also helps to understand the cultural and linguistic compromise discernible in the foundational conceptual model of IPBES [64], in which two cultural archetypes clearly coexist: “Quality of life – Human well-being – Living in harmony with nature” and “Nature’s contributions to people – Nature gift – Ecosystem goods and services” express different worldviews that will unavoidably be reflected in public opinion and in people’s online searches for information.
Focusing exclusively on broadcast news lends credibility to the information analyzed, but it overlooks a vast amount of content, particularly the opinions that thrive on social media platforms such as Twitter, Instagram, and Facebook [66]. While these platforms have significantly expanded the reach of awareness-raising campaigns and public discussions about conservation, they also have a negative side: they contribute to the spread of rumors, distorted information, and anti-science ideologies. Striking a balance, or filtering authoritative information from such platforms, would increase both the volume of data and the accuracy of our methodology.