How to fix transparency for online platforms

(Status: Draft)

Online platforms such as social networking sites (also called intermediaries) present regulators with a Gordian knot that seems to have no winning solution. States are bound to safeguard citizens, public debate, and economic freedoms; they therefore cannot simply implement harsh measures that favor one of these three at the expense of the others.

Regulating platforms

At the same time, the sheer size of platform corporations and the growing number of evidently problematic incidents make it hard to remain inactive. So governments reach for strategies that are often rooted in their own regulatory history. Among the nuanced approaches, three major models are seriously considered:

  • Full regulation (as in: child pornography, libel, copyright)
    Platforms are directly bound by law and must implement visibility actions - e.g. hiding, showing, removing, emphasizing, or de-emphasizing content.

  • Self-regulation (as in: community standards)
    Platforms implement their own guidelines for handling content and exercise autonomous control.

  • Regulated self-regulation (as in: German hate speech regulation)
    Platforms are bound to implement content moderation following a state-mandated framework. The actual execution of content moderation is performed by the platform with some (often considerable) leeway in case-by-case decisions.

Each of these approaches has its drawbacks, since it is impossible to simultaneously offer strong protection and retain meaningful democratic legitimacy. That's why indirect strategies have attracted substantial interest.

Indirect regulation

  • Media literacy
    Citizens play a pivotal multiplying role in the regulatory equation. Users with strong autonomy and cognitive resilience would free platforms as well as regulators from the burden of having to protect them. Susceptible individuals, on the other hand, would call for stronger measures. The challenge with media literacy is that it is built neither quickly nor easily, and potential gains may be modest.

  • Transparency
    Transparency mandates that force the disclosure of, e.g., modes of operation (how does an algorithm work?) and/or platform data are an interesting regulatory approach because they operate extremely indirectly. Transparency by itself does not alleviate any problems, and its direct impact on platforms is fairly limited and predictable. What transparency does is bind platforms to societal norms by making it harder to obscure platform-internal events. It thereby enables societal actors to exert pressure on platforms, leaving the state out of the equation. That makes transparency one of the most promising approaches right now.

A transparent black box?

At first glance, it seems obvious that increased transparency - for example laying open the precise mechanisms of algorithms - would help everyone. Users could better understand how the platform affects them, society could exert vigilant supervision, public debate could reflect upon the overall impact and platforms would benefit from an increase in trust.

The past has shown, however, that platforms are vigorously opposed to such ideas - for reasons both good and hollow. Open-sourcing a search algorithm, for example, has severe unintended consequences: Competitors could copy it, affecting entrepreneurial freedom. Worse, malevolent actors could use the information to craft optimized attacks aimed at distorting the platform. Another issue is that the algorithm itself may be highly dynamic; its effects cannot be assessed without looking at real inputs - rendering such transparency ineffective.

On the hollow side, platforms will of course want to avoid any kind of public scrutiny that could cause reputational damage. Even before the Cambridge Analytica fiasco, platform corporations were acutely aware of the dangerous international mix of political demands levied against them, and they have reacted with political and legal hardening.

The fix

So what's my proposal to fix this mess? My suggestion is pretty simple:

Platforms, don't create opaque, secretive API access for elite institutions.
Instead, whitelist universities for browser-based access. Let them see all public pages in any quantity they want.

Why do I think this is a good idea? Let me explain. Current attempts to enable transparency try to circumvent the dangers of fully opening up: They try, for example, to limit the amount of personal data that is exposed and may be misused, as was the case with Cambridge Analytica. So far, they also rely on vetting processes that grant special access to individuals or institutions. Social Science One is one example of an initiative that sought to bring scientists and Facebook together to enable large-scale research on the platform.

Such attempts are noble, but they have been plagued by numerous problems, as researchers such as Axel Bruns and Deen Freelon have shown. Self-censorship may prevent scholars from pursuing challenging questions; institutional Matthew effects disadvantage non-US institutions; and the data that is provided excludes some publicly available information, such as the names of commenting users.

My proposal makes all of this much simpler. Instead of creating a separate, complex, and again opaque means of access, platforms should simply take what they already have - the plain display of information that is visible to everyone in the world in their browser - and give universities unrestricted access to it (for reading, not posting). So far, technical measures prevent anyone from opening too many pages on Facebook within a short time frame, for example. After a certain number of pages, we see the well-known captchas that make sure we "are human".
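To make the mechanism concrete, here is a minimal sketch of how a platform could exempt whitelisted university networks from its usual throttling and captcha logic. The network ranges, the rate limit, and the function names are my own illustrative assumptions, not any platform's actual implementation.

    from ipaddress import ip_address, ip_network

    # Hypothetical allow-list of university network ranges; real entries
    # would come from a vetted registry of research institutions. These
    # are placeholder documentation prefixes, not real university networks.
    UNIVERSITY_NETWORKS = [
        ip_network("192.0.2.0/24"),
        ip_network("198.51.100.0/24"),
    ]

    # Assumed ordinary per-client limit before throttling/captchas kick in.
    REQUESTS_PER_MINUTE_LIMIT = 60

    def is_university_client(client_ip: str) -> bool:
        # True if the request originates from a whitelisted university range.
        addr = ip_address(client_ip)
        return any(addr in net for net in UNIVERSITY_NETWORKS)

    def should_throttle(client_ip: str, requests_last_minute: int) -> bool:
        # Whitelisted read-only access to public pages is never throttled;
        # everyone else keeps the usual rate limit and captcha challenge.
        if is_university_client(client_ip):
            return False
        return requests_last_minute > REQUESTS_PER_MINUTE_LIMIT

    # Example: a university machine fetching thousands of public pages is
    # left alone, while an unknown client at the same volume is challenged.
    print(should_throttle("192.0.2.42", 10_000))   # False
    print(should_throttle("203.0.113.7", 10_000))  # True

The point of the sketch is simply that the exemption reuses the existing path for serving public pages; nothing new is exposed, only the throttling step is skipped for a known set of networks.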

The beauty of this proposal is that it does not create any additional information channels, ensuring that no additional privacy risks are created. Any information that would be accessible through such unencumbered access would also be available through, for example, a wide-scale network of individual computers. If Facebook were to drop automated throttling mechanisms for university computers, researchers could measure precisely what they are interested in: the publicly visible ecosystem of public information and public debate. At the same time, they would not gain any access that did not exist before.

It remains important to investigate and discuss research questions that would not be covered by such a measure, but I expect that it would take a lot of pressure out of the debate and substantially strengthen civil society's capacity to monitor what is going on online.
