Ethics in Alternative Social Media Research: A Forum
Earlier this month, news broke that an anonymous research team from the University of Zurich had conducted an experiment on Reddit’s “Change My View” subreddit. Briefly, the experiment involved using generative AI to persuade people to change their minds on a range of topics. The researchers revealed their experiment after the fact. There was no informed consent, and the moderators of the “Change My View” (CMV) subreddit were not informed about the experiment until after it was completed. This led to massive condemnation of the experiment as a violation of research ethics: see here, here, and here for some of the reporting. Hours ago, the CMV moderators posted a note saying the research team had apologized.
Rather than recapitulate the criticism of the University of Zurich AI persuasion experiment on Reddit, here on the Network of Alternative Social Media Researchers blog, we’re going to hold a forum-style conversation. The topic? Lessons we alternative social media researchers can draw from the Zurich experiment on Reddit. Several members of the ASM email list have agreed to take turns addressing this topic. Each person will both respond to the broad question of ASM research ethics as well as to previous parts of the conversation.
Robert W. Gehl: Pay Up
When I heard about this experiment, my first thought was that this is merely a peek through the curtain. For all their faults, the Zurich researchers at least revealed what they did. No doubt the researchers thought that the value of their work would outweigh the discomfort of the Redditors under study – that after a bit of shock, the Redditors would agree that the experiment was noble and of course could not have been done with informed consent, so that basic tenet of research was unnecessary. The researchers probably believed their findings would ultimately be welcomed by the Change My View users – and possibly by the rest of us.
The researchers were very, very wrong, of course. But they at least revealed their work.
I’m afraid that for every Zurich team that admits they’ve experimented on social media users, there are ten thousand experiments conducted by marketers, governments, and the social media companies themselves: attempts to see what types of posts get the most interaction, what keywords work best, the best photos and videos to attach, the best ways to customize posts based on psychographics. Few of us are privy to them – they are veiled, locked behind trade secrets and non-disclosure agreements, in service to the predominant political economy of corporate social media: surveillance capitalism (or, as Aral Balkan calls it, “people farming”). The people doing these research projects learned their lesson from the fallout of the 2014 “emotional contagion” study on Facebook: it’s best not to reveal what you’re up to.
These forms of manipulation are a big reason why activists developed non-corporate alternative social media, and why people have shifted to posting to places such as the fediverse. This obviously has major implications for how we conduct research on alternative social media.
No doubt others in this forum will point to many ways to ethically engage in social research in alternative social media. For my part, I will argue that an examination of the political economy of non-corporate social media – particularly the fediverse – suggests an ethical principle that may not be obvious at first: we need to pay up.
Here’s why. Most of the instances on the fediverse are run on a not-for-profit basis. As I found in my research, the models vary, from informal, “toss me a few dollars to help out” approaches to formally organized non-profits. In all of these cases, people are paying for hosting and bandwidth out-of-pocket – they aren’t funding their instances by selling ads or user data.
In light of that, I would argue that ethical alternative social media research often should include funding or material support provided to the communities under study. For example, if someone has a large grant to study ASM, part of the grant budget should include direct payments to the affected instances. This is especially the case when the research involves bandwidth-heavy tools, like extensive use of APIs. Other forms of support could be help with moderation or co-designed research to solve specific problems faced by ASM admins and members.
Indeed, to circle back to the Zurich Reddit manipulation study, the apology from the researchers to the community included an offer of help to detect AI-generated content. Imagine if the researchers had offered that from the outset. What might have been different?
There’s much more to say, so I will turn things over to our next forum participant, Marc Damie, who is mapping relations between fediverse instances.
Marc Damie: We Need a Fediverse Research API
To understand my answer, you should know a few things about my work. As a PhD student designing privacy-preserving technologies, I want to develop protocols for the Fediverse, which requires understanding its structure. To achieve this, I created simple crawlers to “take pictures” of the Fediverse; i.e., graphs representing interactions between servers. The resulting dataset is already available online, and a paper is on its way.
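To give a concrete sense of what such a crawl can look like, here is a minimal sketch – not Marc’s actual crawler. It queries Mastodon’s public peers endpoint, which reports which other servers an instance federates with. That is a much simpler signal than the interaction graphs Marc describes, but it shows the basic shape of a “picture” of the network; the seed servers are just examples.

```python
# Minimal sketch of a Fediverse snapshot crawler (illustrative only).
# It builds an adjacency list from the public /api/v1/instance/peers endpoint,
# assuming the seed servers expose that endpoint publicly.
import time
import requests

SEEDS = ["mastodon.social", "fosstodon.org"]  # hypothetical seed servers


def fetch_peers(domain: str) -> list[str]:
    """Return the list of domains this server reports federating with."""
    url = f"https://{domain}/api/v1/instance/peers"
    resp = requests.get(url, timeout=10, headers={"User-Agent": "asm-research-sketch"})
    resp.raise_for_status()
    return resp.json()


def snapshot(seeds: list[str]) -> dict[str, set[str]]:
    """One 'picture' of the network: an adjacency list of federation edges."""
    graph: dict[str, set[str]] = {}
    for domain in seeds:
        try:
            graph[domain] = set(fetch_peers(domain))
        except requests.RequestException:
            graph[domain] = set()  # unreachable, or the peers endpoint is disabled
        time.sleep(2)  # crawl slowly, out of courtesy to volunteer-run servers
    return graph


if __name__ == "__main__":
    for server, peers in snapshot(SEEDS).items():
        print(f"{server}: {len(peers)} known peers")
```

A real study would need many more seeds, repeated snapshots over time, and the precautions Marc describes below.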
This Reddit controversy naturally attracted my attention as it shows us what we should not do. However, this controversy is also an extreme example: the researchers actively interfered with real-world human beings, while most research on social media (like mine) consists of passive data gathering. For example, some researchers have created the “Webis Mastodon corpus”: a corpus of 700M Mastodon posts to foster research in information retrieval. Many Mastodon users were unhappy to learn their posts (potentially containing sensitive information) had been included in a dataset without their approval.
While the Reddit case is a useful starting point, the Webis corpus is more relevant to our discussion. Unlike the clearly unethical study on the “Change My View” subreddit, the Webis corpus occupies a gray area, and it involves alternative social media. This raised a question that has troubled me since beginning my research: Is my work fully ethical?
Before I started crawling, I spent a year consulting legal teams from my two research institutes. We adopted strict ethical practices: querying only public APIs, slow data scraping, respecting crawler policies, open-sourcing code, and publishing only aggregated data (no personal information). After three months of crawling, I’ve received no complaints; a Mastodon developer even “starred” my crawler’s GitHub repo!
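As an illustration of what those precautions can look like in code, here is a minimal sketch – again, not Marc’s actual implementation. It checks an instance’s robots.txt before fetching anything, identifies itself with a user agent, throttles requests, and is meant to feed only aggregate figures into anything published; the user agent and delay are made-up values.

```python
# Sketch of "polite" data gathering: respect crawler policies, rate-limit,
# and keep only aggregates. Names and constants here are illustrative.
import time
import urllib.robotparser
import requests

USER_AGENT = "asm-research-sketch"   # hypothetical identifier
CRAWL_DELAY = 5                      # seconds between requests; deliberately slow


def allowed_by_robots(domain: str, path: str) -> bool:
    """Check the instance's robots.txt before touching any endpoint."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False  # if the policy can't be read, err on the side of not crawling
    return rp.can_fetch(USER_AGENT, f"https://{domain}{path}")


def polite_get(domain: str, path: str):
    """Fetch a public endpoint only if robots.txt allows it, then wait."""
    if not allowed_by_robots(domain, path):
        return None
    resp = requests.get(f"https://{domain}{path}",
                        timeout=10, headers={"User-Agent": USER_AGENT})
    time.sleep(CRAWL_DELAY)
    return resp.json() if resp.ok else None

# Downstream, only aggregated figures (e.g., peer counts per instance) would be
# published -- never raw posts or account-level records.
```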
Despite my precautions, ethical ambiguity remains because academia lacks clear guidelines for research on ASM. Existing frameworks for centralized platforms don’t easily apply.
Robert’s point about payment is interesting. While I understand the motivation, I wouldn’t prioritize it yet. Practical implementation seems challenging: should developers be compensated? How should the payment be split between instances? Should instances be paid a fixed amount regardless of their location?
However, it leads me to two follow-up considerations:
- What do researchers bring to the Fediverse? Funding is one possible contribution, but Fediverse actors might also value the research outcomes themselves. For example, my work could improve spam detection. This aligns with Robert’s proposal for “co-designed research”. Highlighting the potential research output is common in research collaborations with companies, and the practice could be carried over to collaborations with ASM entities.
- How should we handle research requiring intensive API usage? Some of my crawlers need (moderately) intensive API calls to gather the necessary data. Usually, they gather raw data and then aggregate it. If there were a dedicated API endpoint serving the aggregated data directly, the computation would take a second on the server and require a single API call (see the sketch after this list). Research APIs have historically existed on Twitter and were vital for scientific research. I believe the Fediverse may need a research API, and I hope my work can help demonstrate the value of such an API.
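To illustrate the contrast, here is a rough sketch. The first function aggregates client-side from Mastodon’s real public timeline endpoint, which takes many requests and moves raw posts onto the researcher’s machine; the second assumes a research endpoint (`/api/v1/research/peer_interactions` is a made-up name; nothing like it exists today) that returns the aggregate in a single call. The “peer count” metric is only an example, not Marc’s actual measure.

```python
# Client-side aggregation today vs. a hypothetical server-side research endpoint.
import requests


def peer_counts_client_side(domain: str, pages: int = 50) -> dict[str, int]:
    """Today's approach: page through the public timeline and aggregate locally.
    Many requests per snapshot, and raw posts pass through the researcher's machine."""
    counts: dict[str, int] = {}
    max_id = None
    for _ in range(pages):
        params = {"limit": 40}
        if max_id:
            params["max_id"] = max_id
        resp = requests.get(f"https://{domain}/api/v1/timelines/public",
                            params=params, timeout=10)
        posts = resp.json()
        if not posts:
            break
        for post in posts:
            acct = post["account"]["acct"]                  # "user" or "user@remote.server"
            peer = acct.split("@")[1] if "@" in acct else domain
            counts[peer] = counts.get(peer, 0) + 1
        max_id = posts[-1]["id"]
    return counts


def peer_counts_research_api(domain: str) -> dict[str, int]:
    """With a hypothetical research endpoint, the server computes the aggregate
    itself: one request, and no raw posts ever leave the instance."""
    resp = requests.get(f"https://{domain}/api/v1/research/peer_interactions",
                        timeout=10)  # this endpoint does not exist today
    return resp.json()
```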
While creating a research API is straightforward on centralized social media, the decentralized nature of ASM introduces technical challenges (that we can reasonably overcome). For such an API to succeed, academic institutions must contribute both financially and technically. This development would also present an opportunity to establish a code of conduct for ASM research.
For example, a Fediverse research API could formally gather consent from Fediverse instances. Currently, researchers rely on public APIs, assuming by default that instances consent to data processing. A dedicated research API would allow instances to actively opt in (or out) of research studies, finally moving ASM researchers out of the ethical “gray area.”
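One way to picture such a consent mechanism – purely as an assumption about how a research API gateway might work, not an existing feature – is a registry of explicit opt-ins that a study must check before touching an instance:

```python
# Hypothetical sketch of instance-level research consent; none of these
# structures, field names, or records exist today.
from dataclasses import dataclass


@dataclass
class ResearchConsent:
    instance: str
    opted_in: bool                  # explicit opt-in, not assumed from a public API
    allowed_uses: tuple[str, ...]   # e.g. ("aggregate-statistics", "spam-research")
    contact: str                    # admin contact for questions or withdrawal


REGISTRY = {
    "example.social": ResearchConsent("example.social", True,
                                      ("aggregate-statistics",), "admin@example.social"),
}


def may_study(instance: str, purpose: str) -> bool:
    """Only proceed when an instance has actively opted in for this purpose."""
    consent = REGISTRY.get(instance)
    return bool(consent and consent.opted_in and purpose in consent.allowed_uses)


print(may_study("example.social", "aggregate-statistics"))  # True
print(may_study("example.social", "content-analysis"))      # False: not consented
print(may_study("unknown.town", "aggregate-statistics"))    # False: no record, no consent
```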
Mareike Lisker: Pay Attention to Instance Norms
Like Marc Damie, I am also interested in the “gray areas.” And, like Marc, I am researching alternative social media as part of my PhD studies, so the question of how to ethically research and collect data in the Fediverse (specifically Mastodon) also applies to me.
Recently, I co-authored a review of the data practices of 29 studies that collected data from Mastodon. The starting point for this review was the fact that several instances explicitly prohibit data gathering in their rules and policies, which is barely reflected in current research. Only a few studies acknowledge, let alone adhere to, the instances’ rules on data gathering.
Marc very likely used the term “passive” to contrast his work with the intrusive Zurich experiment and to signal that gathering data on social media involves no intervention. Still, I want to challenge the notion that gathering social media data is passive: it conceals the fact that using an API to gather data is a very active action. The idea of passivity can contribute to the perception that researchers are uninvolved in and unaccountable to the communities they study, in the same way that using the Mastodon API can flatten the social complexity of networks and alienate researchers from those communities.
Another distinction that I would like to bring into the discussion is between legal and ethical considerations. Legally, aside from the GDPR, which applies to anybody working with data in the EU, the rules and policies of a Mastodon instance are binding only on its registered users. Unregistered users and users from other instances do not have to commit to them. Unless a researcher happens to be registered on an instance from which they gather data, and that instance prohibits data gathering in its policy documents, there is no legal violation. The federated nature of Mastodon only adds complexity, since when a toot is boosted on another instance, the originating instance’s rules no longer apply.
Ethically, however, researchers—or as a matter of fact the institutional ethics boards overseeing the research—could still feel obliged to individually review the terms and policies of each instance from which they intend to collect data.
Finally, to shift the focus from the philosophical to a more hands-on perspective: I find Rob’s proposal to include financial, material, time, or knowledge/co-design support promising, and I will consider how it could be implemented in my project. It might be a worthwhile endeavor to formulate recommendations that take various aspects into consideration, such as whom to “compensate” (developers, admins, moderators, users) and how, possibly depending on the location or size of the server.
I support Marc’s idea of a Research API. In our paper, we proposed the technical idea of formalizing instance rules and policies to make them machine-readable and versioned, so that they can be referred to in research. This could be incorporated into a Research API.
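As a rough illustration of that idea, an instance’s rules might be published as a versioned, machine-readable document that a study can cite precisely. The field names and structure below are assumptions for the sake of the example, not an existing Mastodon feature or the exact proposal from our paper.

```python
# Sketch of a machine-readable, versioned statement of instance rules.
# Field names and values are illustrative assumptions only.
import json

instance_policy = {
    "instance": "example.social",
    "policy_version": "2025-04-30",           # versioned, so studies can cite it exactly
    "data_collection": {
        "public_api_scraping": "prohibited",   # mirrors rules some instances already state in prose
        "aggregate_research": "allowed_with_notice",
        "contact": "admin@example.social",
    },
}

# A study could then record exactly which policy version it relied on:
print(json.dumps(instance_policy, indent=2))
```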
What most research on (alternative) social media seems to have in common is that it can happen entirely covertly by default. Once the research is published, it has already been conducted and the data already gathered. This inherent covertness is something we as researchers need to account for.
Comments
For each of these posts, we will also post to Mastodon. If you have a fediverse account and reply to the ASM Network Mastodon post, your reply will show up as a comment on this blog unless you set its visibility to followers-only or direct message. Content warnings will work. You can delete your comment by deleting it through Mastodon.