Mis- and disinformation has surfaced as a major research topic over the past five years. Emerging fields can be rather parochial, however, with minimal cross-fertilization and dialogue across disciplines. Information disorder as a field of study has followed that pattern. Much of the early research focused on political movements and elections—particularly the 2016 elections—and much of the work was conducted in the global North. The challenge of understanding and combating mis- and disinformation in an organized and expanded fashion was a key focus of the 2020 Exploring Media Ecosystems conference.
Now, more than two years and one viral pandemic later, it is clear that mis- and disinformation goes far beyond politics and far beyond borders. It has become a major challenge—and a risk—for organizations working to improve health and well-being around the world. COVID-19, which spread globally with astonishing speed, illustrates the importance of cross-border research. At the same time, the field of mis- and disinformation studies has grown in response to both the uses and misuses of digital communication tools. Back in mid-2020, with support from the Bill & Melinda Gates Foundation, Media Ecosystems Analysis Group (MEAG) launched the Health Discourse Research Initiative (HDRI) with a two-year goal of building research collaborations across institutions and global geographies. The resulting collaborations have taken a multidisciplinary approach to analyzing mis- and disinformation phenomena within global health topic areas and are helping to inform assumptions about how mis- and disinformation impacts global health and the organizations working in this space.
On July 12, 2022, MEAG hosted a one-day symposium at MIT’s Samberg Center to present research and results from key HDRI-funded collaborators. Presentations covered the use and abuse of e-prints to further conspiracy theories on discussion forums, the analysis of link-sharing behavior in tracking information disorder, the role of imagery in the spread of misinformation, and the role of influencers on the global stage. The day ended with wide-ranging discussions about lessons learned and lessons yet to be learned. Let’s dive more deeply into the research and the discussion.
Satrio Yudhoatmojo began by discussing research titled “Understanding the Use of e-Prints on Reddit and 4chan,” conducted with fellow members of the International Data-driven Research for Advanced Modeling and Analysis Lab (iDRAMA Lab) at Binghamton University, led by co-founder Jeremy Blackburn.
E-print/preprint servers provide online access to research prior to peer review, often without subscription fees. While e-prints can help experts in a field see what new research is emerging, they are not as reliable as peer-reviewed research and may be retracted or substantially revised if they fail peer review. Using datasets from a variety of sources, including preprint servers, Yudhoatmojo and his team characterized the presence of e-prints on Reddit, identified their types and similarities, performed quantitative analysis of top threads, and more generally explored how e-prints and preprints are used within Web communities.
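As a concrete illustration of one step in this kind of analysis, the sketch below shows how posts from a Reddit or 4chan dataset might be scanned for links to preprint servers. This is a minimal sketch, not the iDRAMA Lab’s actual code; the domain list and function names are illustrative assumptions.

```python
import re

# Hypothetical list of preprint-server domains; the study's actual
# source list and matching rules may differ.
PREPRINT_DOMAINS = ["arxiv.org", "biorxiv.org", "medrxiv.org", "ssrn.com"]

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

def extract_preprint_links(post_text: str) -> list[str]:
    """Return any URLs in a post that point to a known preprint server."""
    urls = URL_PATTERN.findall(post_text)
    return [u for u in urls if any(d in u.lower() for d in PREPRINT_DOMAINS)]

# Toy example: scan (post_id, text) records and keep posts that cite an e-print.
posts = [
    ("t3_abc", "New result: https://www.medrxiv.org/content/10.1101/2020.01.01.000001v1"),
    ("t3_def", "No links here, just discussion."),
]
hits = {pid: links for pid, text in posts if (links := extract_preprint_links(text))}
print(f"{len(hits)} of {len(posts)} posts link to a preprint server")
```

From a table of matches like this, one can then measure how often e-prints appear in different communities and pull the surrounding threads for quantitative and qualitative analysis.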
The group found a substantial difference between the content of e-prints and the online conversation about that content, which points to the use of preprints to propagate or discuss conspiracy theories and/or extremist ideologies. The group concluded that, given the explosion of preprints during the COVID-19 pandemic, the scientific community needs to take more care to ensure that pre-peer-review information is as accurate as possible before it is made public.
Gianluca Stringhini of Boston University then discussed his group’s research into tracking image-based mis- and disinformation.
Images play a large role in online communication and are frequently used to spread misinformation. However, processing and analyzing images is more challenging than analyzing text for a number of reasons. Stringhini and colleagues have developed an analysis pipeline that collects data from online platforms (including Twitter, Reddit, and 4chan), then applies perceptual hashing, database indexing, and clustering to prepare images for further analysis. The group used Snopes and PesaCheck to explore how specific images moved around the Web, applied manual annotation to better understand emerging narratives, and then applied these methods to image-based COVID-19 misinformation, including Bill Gates conspiracies, false cures, and the ways political leanings influence imagery.
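To make the hashing and clustering steps more concrete, here is a minimal sketch (assuming the Python Pillow and ImageHash libraries, and a simple greedy grouping rather than the group’s actual database-indexed pipeline) that treats images whose perceptual hashes are within a small Hamming distance as visual near-duplicates.

```python
from pathlib import Path

import imagehash            # perceptual hashing: pip install ImageHash
from PIL import Image       # pip install Pillow

HAMMING_THRESHOLD = 8       # assumed cutoff; near-duplicate images differ by only a few bits

def phash_images(image_dir: str) -> dict[str, imagehash.ImageHash]:
    """Compute a perceptual hash for every JPEG in a directory."""
    return {p.name: imagehash.phash(Image.open(p)) for p in Path(image_dir).glob("*.jpg")}

def cluster_by_hash(hashes: dict[str, imagehash.ImageHash]) -> list[set[str]]:
    """Greedy single-pass clustering: assign each image to the first cluster whose
    representative hash is within the Hamming-distance threshold."""
    clusters: list[set[str]] = []
    reps: list[imagehash.ImageHash] = []        # one representative hash per cluster
    for name, h in hashes.items():
        for i, rep in enumerate(reps):
            if h - rep <= HAMMING_THRESHOLD:    # ImageHash subtraction = Hamming distance
                clusters[i].add(name)
                break
        else:
            clusters.append({name})
            reps.append(h)
    return clusters

clusters = cluster_by_hash(phash_images("collected_images/"))
print(f"{len(clusters)} visual clusters found")
```

Clusters of near-duplicate images can then be indexed and matched against fact-checked images from sources such as Snopes and PesaCheck to trace how a given visual moves across platforms.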
Stringhini’s research suggests that misinformation images are not reshared more frequently than factually correct ones, which contradicts previous research (Vosoughi, Roy & Aral, 2018). Could this be because previous research is biased toward information that is perceived to be more worthy of fact-checking? One thing that seems clear in this work is that misinformation imagery is longer lived than “regular,” accurate, or factually correct imagery.
In general, Stringhini et al. found it challenging to apply automated techniques to content moderation because images are often modified or altered not with the intent to mislead but for satirical or ironic purposes. Marked cultural nuances also influence how imagery is shared and altered. Looking at the behavior of the actors who spread such imagery may help distinguish original, unaltered images from those that are manipulated for any number of reasons.
What improvements are needed to turn this image analysis pipeline from a research tool into a more general-purpose tool for fact-checkers? Stringhini and team are considering making it a web-based service, which in turn raises questions of how to maintain such a service over the long term.
Next, Fabio Giglietto presented research conducted by his group at the University of Urbino Carlo Bo on “coordinated link sharing behavior in Nigeria and India.” Coordinated Link Sharing Behavior (CLSB) occurs when a group of user accounts, either controlled by a single user or independent actors working in concert, shares the same content over a short period of time. CooRnet (a package for the R statistical programming language) detects this behavior, tracks the accounts, and extracts links to launch a CooRnet iteration that can then be analyzed. Giglietto et al. started with a curated list of 25,000 URLs (the IFCN Coronavirus Alliance Dataset) that had been flagged as false by Facebook fact-checkers and identified 98 coordinated networks within this list of URLs. Despite the perceived global focus, on closer examination the researchers found that 49% of the stories and 34 of the networks originated in the United States.
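The core idea behind CLSB detection can be conveyed with a short sketch. The example below is a simplified illustration in Python, not CooRnet itself (which is written in R and estimates the coordination interval from the data); the fixed 60-second window and column names are assumptions made for the sake of the example.

```python
from collections import defaultdict
from itertools import combinations

import pandas as pd

COORDINATION_WINDOW_S = 60   # assumed fixed window; CooRnet derives this from the data

def coordinated_pairs(shares: pd.DataFrame) -> dict[tuple[str, str], int]:
    """Count how often each pair of accounts shared the same URL within the
    coordination window. Expects columns: url, account, timestamp."""
    pair_counts: dict[tuple[str, str], int] = defaultdict(int)
    for _, group in shares.groupby("url"):
        rows = group.sort_values("timestamp")
        for (_, a), (_, b) in combinations(rows.iterrows(), 2):
            dt = (b["timestamp"] - a["timestamp"]).total_seconds()
            if dt <= COORDINATION_WINDOW_S and a["account"] != b["account"]:
                pair_counts[tuple(sorted((a["account"], b["account"])))] += 1
    return dict(pair_counts)

# Toy example: three accounts share the same flagged URL; only the first two
# do so within the 60-second window and are therefore counted as coordinated.
shares = pd.DataFrame({
    "url": ["http://example.com/story"] * 3,
    "account": ["page_a", "page_b", "page_c"],
    "timestamp": pd.to_datetime(["2020-05-01 10:00:00",
                                 "2020-05-01 10:00:30",
                                 "2020-05-01 10:05:00"]),
})
print(coordinated_pairs(shares))   # {('page_a', 'page_b'): 1}
```

Pairs that repeatedly co-share within the window form the edges of a coordination network, which can then be examined for the kinds of country-level patterns described below.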
In Nigeria, many different news websites use a similar template to certify the quality of a given news outlet. However, despite appearing legitimate, much of the content on Nigerian news sites shows a pattern of copying and pasting. Religion is a significant factor on Hausa-language sites in particular, and this helped the researchers detect problematic networks.
In contrast to Nigerian websites, Indian websites tend to be more anonymous. However, because more accounts were sharing links related to politics and political media, it was easier to distinguish information arising from legitimate sources from that arising from illegitimate ones. Link sharing originating in Nepal was often related to false medical cures.
Professor Giglietto and team concluded that overall, CLSB is a solid strategy for studying the attention economy. The group looks at behavior, not content, but judging from what they see in their analyses, coordinated link sharing networks are surprisingly deep relative to the number of initial URLs. One very large challenge in this work was involving collaborators from the global South.
Closing the morning session of research findings, Jonathan Corpus Ong and Nora Suren from the University of Massachusetts Amherst presented their work on “Covid Influencers in a Global Context.” Their work examines the possibilities of digital activism and political accountability in countries with populist leadership, focusing on the Philippines, Indonesia, and Turkey. Ong, Suren, and colleagues concluded that, on the whole, populist leaders in these three countries (unlike Trump in the US and Bolsonaro in Brazil) were not adversarial toward their respective health ministers. However, the research did uncover evidence of science and scientific data becoming politicized, and of the COVID crisis presenting an opportunity to target dissenters, resulting in setbacks for media freedom.
In the Philippines, the researchers found evidence of a narrative in which undisciplined citizens and Western countries were blamed for vaccine hoarding. In addition, they saw online contestation between supporters of former President Duterte and anti-government activists, as well as some amount of xenophobia directed toward Chinese workers. In Indonesia, the researchers saw blame directed at undisciplined citizens and the poor; however, influencers enlisted for government PR and propaganda limited the dissent to some degree. Turkey experienced a social media backlash aimed at Turkish tourism, the exposure of progressive discourses that pushed back on medical populism, dissent on social media that tapped into rising nationalist sentiment, and xenophobic and racist expressions that proliferated in comparisons involving Turkish tourism workers.
Outcomes of this work at UMass included consultations with Facebook to flag content found via the research, presentation of the Turkish case study at the International Communication Association (ICA) conference, a book chapter contributed to an edited volume about the “Influence Industry”, a new project called Survey of Spiritual Entrepreneurs on Social Media, and ongoing collaborations with various media partners.
Special guest Jonathan Nelson of Constella Intelligence offered a short presentation about “LATAM & Brasil: Trends and Vulnerabilities in Digital Conversations Related to COVID-19 and Vaccines”, a project that attempts to define the scale and scope of information pollution in Latin America.
Researchers at Constella identified keywords used to spread information pollution in 4,000 relevant conversations. They performed a qualitative analysis and tagged 1.4% of activity as health information pollution. This content included major clusters around anti-vaccine movements and alternative treatments, pandemic conspiracy theories and rejection of government measures, claims that COVID-19 was created in China, anti-lockdown messaging, and vaccine side effects. Disordered information linked to possible side effects of vaccines and perceived benefits of alternative COVID-19 treatments comprised 40% of the results.
Further analysis indicated that high-activity users account for only 1% of all users but are involved in 25% of the conversations showing evidence of information pollution. Half of these users are found in the “Covid deniers community in Spanish” cluster, but a high number of influencers from outside the region are also present. Caribbean countries are the most influenced by information pollution imported from other countries, mainly from the US, Spain, and the UK. The Constella group also noted that 16% of disordered-information influencers had been suspended within two months of the study. Finally, YouTube was the most shared domain.
In the afternoon we moved from formal research presentations to a more informal exploration of the field. Facilitated by Ethan Zuckerman, this set of discussions was guided by two questions: What have we learned about mis/disinformation and what do we need to learn going forward?
What have we learned?
First, Ethan pointed out that there are two overarching discourses about mis/disinformation. One revolves around the enormous amount of public interest in mis/disinformation and the other involves a global media discourse looking for a simple diagnosis for the complicated problems of information disorder. Ethan asked, “Can we seriously consider that both of these audiences matter? We want to produce important scholarly work, but we also want to engage wider audiences. That said, what do we feel that we know with a certain amount of certainty about disinformation and information disorder, that we could share with wider, more public audiences? For example, we know from our work with Yochai Benkler at Harvard that information disorder is more of a problem of amplification than it is a problem of digital spread. Let’s go around the room…”
Fabio Giglietto: Based on traditional findings about how influence works in tandem with public opinion, influence nowadays passes through social media. We have a tendency to look at official sources of information, but we are suggesting that, increasingly, the spaces of influence are sports, entertainment, and memes that turn out to be politically influential speech.
Jeremy Blackburn: What do we feel comfortable asserting about those meme spaces? Are we certain that these problems are not just a right-wing problem? These weird spaces exist on the left and are becoming increasingly radicalized. For example, there are “tankies,” extreme communists—the 4chan of the left. These groups were banned from Reddit at the onset of the Ukraine war and are authoritarian genocide deniers. They also claim to be anti-fascist.
Ethan Zuckerman: In a world where we all started studying Twitter because it is the model organism of social media, we know a decent amount about the space under that lamp post. However, actors are doing coordination work on platforms we have very little access to. Who has thoughts about this?
Kiran Garimella: Platforms like WhatsApp are big but not fringe, and there’s no easy way to study them. To take platforms like WhatsApp seriously, we need ways of studying them. What we do know about WhatsApp is that insider networks are very common, and insider networks may have different dynamics.
Jonathan Nelson: Zeynep Tufekci’s comment about “politics as a football match” resonates with me: the way people embrace political labels and identities and their desire to obliterate the enemy, the idea of embracing identity and trying to assert those divides. In that framing, misinformation becomes a fair tactic to demonize “the other side.” Misinformation can also be about signaling; if you amplify something that you know is misinformation, you are making a costly signal. Perhaps we should also think about media outreach from the get-go instead of treating it as an afterthought.
Ethan then asked attendees what they would want to communicate to a broad audience about how information disorder works or doesn’t work.
Not all influence is bad, noted Catherine D'Ignazio. Take, for example, advocates who try to influence the news media to inject feminist critique into coverage of gender-based violence. Fear and mistrust are absolutely worth fighting in narrative, but we should also monitor the 1% of influencers putting out 25% of the content.
Are we seeing that 1% of influencers change over time? Jonathan Nelson pointed out that all nation states, groups, and individuals use narratives and messaging in service of strategic objectives. Clearly, after years of analyzing these trends, we can conclude that there are subversive tactics to amplify certain narratives — and we need to ask why these tactics are being employed and to what end. Ultimately, there are always influencers in social media, but if the influencers all believe the same thing, there might be something wrong with that ecosystem.
Many questions about misinformation center on authenticity and inauthenticity. Mainstream media has been worried about the idea that there is too much spam on social networks. Satrio Yudhoatmojo pointed to questions of proportionality in how we answer questions about disordered information — what percentage of content is authentic and what percentage is inauthentic? When users on Reddit and 4chan share links to preprint archives, there are two hypotheses about their intentions: they are part of an organized disinformation campaign, or they are genuinely deluded and trying to understand the research. In the latter case, sharing misinformation may itself be authentic behavior.
So, what is the proportion of the intentional to the deluded? Based on his research, Gianluca Stringhini concluded that the deluded make up the higher proportion. But the underlying question is where the seeds of delusion were planted. (Perhaps QAnon? Or Pizzagate?) Assuming that most disinformation is inauthentic and carried out by bots is an oversimplification. Most disinformation is posted by real people, but these people may be following narratives planted by malicious actors. And there are known state-sponsored troll accounts that jump on existing narratives and then deliberately act to polarize discussion. The reason disinformation or disordered information can be so powerful is that most of the people spreading it are not bad actors; they are simply spreading something that they believe is true or want to believe is true. In some cases it is organic misinformation; in others, the misinformation is a seed planted by a bad actor and spread by people who genuinely believe it.
Thinking about questions of authenticity, Fabio Giglietto pointed out that a certain degree of inauthenticity will always be necessary for influence to work. Traditional advertising does not work as well as it once did because we now disregard it: people don’t notice advertisements or figure out how to avoid them, and young people don’t click through. If this is the case, how do you influence people? One example in the case of young people is TikTok, where people can be influenced by sources they do not expect to be influenced by.
Final thoughts about how information disorder works or doesn’t work? Emily Ndulue noted that the attention economy is a highly impactful factor that will increase the scope of the problem.
One thing we also know, said Justin Arenstein, is that a lot of the dynamics being shared here aren’t replicable on “dark social.” Eno Darkwa pointed out that information (and mis/disinformation) has a huge impact on individual, family, and community dynamics. Rahul Bhargava noted that knowledge production is also influenced by information disorder; in other words, tools and understanding won’t remain static. Finally, Fernando Bermejo noted that media effects grow out of two main factors: whether media provides a single view of the world (and if it does, you should be very worried), and whether media has a direct impact on people and individuals. Nowadays we worry because we are presented with so many different versions of the world and cannot tell which is the right one. The reality is that societies have for decades worried about the influence of powerful media; with the rise of mis/disinformation, these same worries have simply shifted to the other side of the equation.
What do we need to learn?
Closing out the day, Rob Faris and Justin Arenstein led discussions about what we have yet to learn when it comes to mis- and disinformation.
Rob hypothesized that our underlying media economy is what allows disinformation to work at scale, and that a coalition model is needed to take a more expansive approach to understanding disinformation. He argued that information processing and knowledge creation are fundamentally social phenomena, and that these “epistemic systems” are an area that needs much more attention and research.
People become engrossed in specific epistemic systems: they believe certain sources and disbelieve others. We saw an ironic example, a video of Donald Trump being unable to say the word “vaccine” to his followers without alienating them. He had helped to create an epistemic system so distrusting of mainstream media that even discussing vaccines risked alienating his base.
The epistemic system Rob described consists of three layers. First, there is a social layer, where people opt in to the system to figure out what’s going on in the world, to ingest information, and to decide what to propagate further. Second, there is a narrative layer, made up of a tapestry of different ideas, theories, articles, and stories put out by the epistemic system. Referring back to the example, Donald Trump is “driving the bus” and has a long list of talking points that are popular within his group and replicated by it. Finally, underpinning the top two layers is an operating-system layer that contains the rules and norms. This layer is not immediately apparent, but it shapes the way the entire system operates and influences which voices and narratives are authoritative, which are to be trusted, which lines are accepted or rejected, and who is or is not part of the system.
To understand misinformation we need to understand how it becomes embedded in these epistemic systems. There is a big difference between incidental misinformation and misinformation that is socially and politically salient and becomes fixed in people’s social identity, in how they see themselves and how they view the world. We also need to differentiate between misinformation that aligns with other pieces of the collectively curated narrative fabric and misinformation that does not. For example, casual or incidental misinformation is easier to address because it doesn’t tend to line up with identity, with pieces of misinformation that are part of who you are as a person. If we don’t understand how misinformation becomes embedded in these epistemic systems, we will have a hard time understanding it.
So, how do we view mis- and disinformation through a social lens? What do we need? What are the approaches, methods, and data needed to study this operating system? Rob offered the case-study approach as a promising one, but argued that we need to go further than case studies, and that we need more research as well as more longitudinal data, such as Twitter feeds spanning many years.
Justin Arenstein (of Code for Africa) thinks we have so far approached information disorder too simplistically. In his work, which he approaches very differently than many other attendees at the symposium, he breaks information disorder into distinct problem types. First, there’s the “whack-a-mole” fact-checking journalistic approach: you find something misleading, debunk it, and publish the debunk, hoping that solves the problem (it doesn’t).
Second, once you’ve whacked the mole and found the spot where there’s questionable information, you try to understand how it’s all linked together. This requires a different approach (for example, a data-science or forensic-research approach as opposed to a journalistic one). Then, using a variety of tools (including Media Cloud), you attempt to map the networks and understand the meta-narratives that might make people more susceptible to the original misleading claims, as well as to the networks that connect these narratives together. Justin noted that his group is doing something very few others are: working with organizations like the Global Disinformation Index to understand the underlying economy that allows disinformation structures to work at scale. This collection of organizations is working to understand how, for example, programmatic advertising supports some of this misleading content.
Since the problem of information disorder is too large for one organization to solve, Justin’s group has created an observatory model—a Disinformation Observatory—that pulls in non-profits, civic watchdogs, and others alongside for-profit operators as a place where non-public information can be safely shared. They’ve also created an international network of fact-checking organizations.
Justin’s group has found that the seeders of disinformation content (usually in the form of infotainment) are often for-hire operators who are paid — either by large, global state operators or by domestic interest groups — to grow an audience. Justin’s group has also identified many specific social media accounts that push narratives from opposing ideological and political camps. These networks are almost always regional or hyperlocal but have surprisingly global connections, and they are adept at staying ahead of authorities.
Most recently, there is growing collaboration in these disinformation networks around topics such as COVID-19, Ukraine, racism, pan-Africanism, and anti-colonialism, pushing narratives that are deliberately designed to create dissent and conflict so that another group — religious, economic, or political — can step in with a solution, all with the goal of capture, be it of a procurement system, a government, or an information consumption agenda.
What is missing that would allow us to address these problems? Justin noted that, first, there are almost no actor watch lists. His group is now working with various partners in civil society to map these actors in various African countries, along with their social identifiers, meaning who is legitimate and who is not. Second, there are very few vernacular lexicons and databases in Africa that allow for tracking the content or the narratives, so the group is working to map the vernacular in order to understand how it translates into street slang and then into global languages. Third, there are no analysis or monitoring tools; TikTok, for example, represents a huge hole because it is hard to do network analysis or formal fact-checking on it, and significant human resources are needed to monitor dark social in a useful way. Fourth, we need to build insight into cross-pollination between platforms. Fifth, we have yet to find a way to archive original evidence and links at scale. Finally, we need better resources for sharing research findings: there is currently too little focus on turning the most promising research methodologies and tooling into practical, scalable resources that can be used in the field. Justin’s group is working to address these missing pieces, but progress is not as quick as they would like.
In conclusion, it was a day full of information and discussion. The research presented at the symposium demonstrated how different methodological approaches can help us understand and fight mis- and disinformation, and showed that only complex approaches will work against such global and multidimensional phenomena. The afternoon conversations were an illustration of the need to think systemically and to continue the learning process required to work toward a healthy information ecosystem online.
Fernando Bermejo, the symposium organizer, noted that MEAG plans to continue organizing research in this field and to find and bring collaborators together. We hope to make this symposium an annual event, and we hope to see this group of mis- and disinformation researchers, as well as new collaborators, next year to review our progress in the field.