A question that has been bothering me since the start of the Fellow Program is how communication scientists can handle the disclosure of data in a sensible way. The advantages of open data are obvious: reproducibility and the reusability of already processed data increase the quality of research and save resources. The demand for open data that is spilling over to us from the natural sciences and psychology is therefore understandable, and the desire for disclosure is generally very welcome. In my experience, however, especially in our small discipline, ownership thinking ("I collected it, so this is MY data"), the fear of giving away competitive advantages and a reluctance to show one's cards are particularly strong. I can understand that; I have been socialized in this way, too.
But to accuse communication science of simply clinging to old ways of thinking and lacking innovative power would be too simplistic. In fact, an "AG Forschungsdaten" (research data working group) in the DGPuK has been dealing with this topic since last year. So far, the group has communicated a general commitment to open data, but nothing very concrete yet: it has not released any best-practice examples or guidelines so far (although that is certainly planned). I think it is especially necessary to take a closer look at the specific kinds of data we process. Communication science data is very diverse and differs from the data of other social sciences. There are special challenges that cannot always be solved by long-term planning, such as obtaining informed consent from respondents in advance for the re-use and publication of their data.
In the area where I am currently working, i.e. the social media analysis of digital trace data, obtaining informed consent in this way is unrealistic: it would severely limit representativeness and require a great deal of resources. Moreover, re-publishing social media posts is diametrically opposed to the "right to be forgotten", and even anonymization strategies help only to a limited extent, since they have too often proved ineffective. Not to mention that there is already an ethical debate about whether public posts by private users on Facebook, Instagram, Twitter and in blogs should be investigated at all without the users' consent (boyd & Crawford 2012, Zimmer 2010). Even if institutions or non-private individuals are of interest, as in my fellowship project, there are legal hurdles: the copyright of the content usually lies with the platform operator or with the users themselves.
Since I am very interested in a solution to the problem (after all, I would like to make the data from my "Visual framing of politicians on Facebook" project openly available), I went looking for positive examples to guide me. And I actually found something: the data from the "Social Media Monitoring of the German Federal Election Campaign 2017" project has been online in the GESIS datorium since the end of February. You can find it here: http://dx.doi.org/10.4232/1.12992. I was curious about the datasets because Facebook and Twitter posts were collected for this project, although it was only concerned with the textual content and not the pictures. Unfortunately, the result is somewhat disappointing. For legal reasons, the Facebook posts were not published at all, and for the Twitter posts only the IDs were released. With the help of the IDs you can of course try to reconstruct the tweets afterwards. Because the data is volatile, however, this is only a workaround, and it will become less and less practicable as time passes (cf. Bachl 2018). Of course, the dataset is still great: it offers a good starting point when planning a study on political actors and media on social media channels. However, to me it would make even more sense to maintain, in addition to the original data, a database that could be updated and expanded collaboratively (just thinking aloud).
Of course, this whole problem of the impossibility of publishing communication data would not arise at all if the platform providers worked with scientists and made the data available on a broad basis. Demands for this collaboration were recently published in an open letter by Axel Bruns, AoIR president. I do not want to discuss the matter here, but I find the open letter entirely worthy of support. Nevertheless, I wonder how realistic the demands for unhindered access for scientists on a broad basis really are in the medium term. And what do we do in the meantime? Young scientists in particular have no time to wait for implementation by platform operators and politicians.
Perhaps we should set up a joint infrastructure in which the data is collected and stored? At the moment, in my estimation, many resources are tied up because researchers and institutes each build duplicate structures or, worse, are not able to carry out a comprehensive social media study on their own. A joint data repository would therefore be really desirable! This could perhaps work, at least at the national level or within scientific societies.
Well, I still don't have a solution for publishing my project data. But simply knowing who has collected which data in the first place is a first step towards openness. Perhaps this will at least make it possible to re-use the data, link relevant data sources or even establish new collaborations.