The CITRIS Policy Lab, Berkeley Center for Law and Technology, and Center for Long-Term Cybersecurity are co-hosting a series of by-invitation workshops on the ethical and legal implications of platform data scraping, data licensing and sharing agreements, and compelled platform data disclosures for research, oversight, and commercial purposes.
Third-party scraping of platform data for research and commercial purposes is raising a growing number of ethical and legal uncertainties. In 2021, Meta disabled the Facebook accounts of researchers at the NYU Ad Observatory for collecting political ads that willing research participants had shared with them. The following year, a federal district court found that hiQ Labs had breached LinkedIn's user agreement by scraping public profiles. This came even though the Ninth Circuit had upheld an injunction barring LinkedIn from selectively blocking hiQ, a potential competitor, from accessing publicly available data on its platform, reasoning that such blocking could constitute "unfair competition under California law." In 2023, Elon Musk's X Corp sued four "John Does" in Texas for "unlawfully scraping data" and, at the same time, introduced technical barriers on the platform to further curb third-party scraping.
Participants will examine the concept of fair breach for research and commercial scraping in relation to platforms' terms of service, data protection laws, and intellectual property rights. The workshop will also facilitate discussion of appropriate definitions for "researcher," "public interest research," and "allowable commercial purposes."
In response to the ethical and legal uncertainty surrounding data scraping, researchers have proposed alternative approaches to data collection and management. Data licensing in particular has been gaining traction among developers. However, key questions remain unanswered, such as who has the right to license and share data, and what types of data can be licensed. This workshop will bring together researchers, practitioners, and policymakers to map out opportunities and strategies for operationalizing data licensing.
Social media platforms hold vast amounts of data that are valuable for enabling public-interest research, supporting evidence-based policy interventions, and spurring new industries. While some platforms have made data available to third parties through public application programming interfaces (APIs) or partnerships, these arrangements are becoming increasingly tenuous. In response, lawmakers have proposed legislation in the United States, such as the Platform Accountability and Transparency Act, and enacted legislation in the European Union, such as the Digital Services Act, that compels platforms to make certain types of data available to third parties for research and oversight purposes.
While these efforts are promising, compelled data disclosures are meeting pushback. In Washington Post v. McManus, the Fourth Circuit held that a Maryland law requiring online platforms to publish and retain records of political ads violated the platforms' First Amendment rights. Others have raised concerns that compelled data disclosure may conflict with established data protection laws. This workshop will bring together researchers, practitioners, and policymakers to explore these ethical and legal quandaries and to map out opportunities and strategies for supporting third-party access to platform data.