Do you believe your (social media) data? A personal story on location data biases, errors, and plausibility as well as their visualization

Tobias Isenberg, Zujany Salazar, Rafael Blanco, Catherine Plaisant

Presentation time: 2022-10-19T19:12:00Z
Exemplar figure, described by caption below
In our paper we describe a journey driven by a personal dataset, in which we discovered and then analyzed various sources of data errors and data bias. This investigation led us to a new visual data representation we call Motion Plausibility Profiles, which allow us to analyze a person's history of posting geo-located images on social media. The image shows the Motion Plausibility Profile of a Flickr user. It indicates that they posted only one or two images with plausible coordinate data in 2008; that starting from 2010 they apparently manipulated the geo-location of each posted image individually, as evident in the implausible (color-coded) speeds between image locations (including some that would require jet airplane travel between photo sites); and that from sometime in 2013 they assigned the exact same geo-location to all of the several images they posted on a given day (indicated in blue).

Prerecorded Talk

The live footage of the talk, including the Q&A, can be viewed on the session page, Questioning Data and Data Bias.

Keywords

Social media data; Flickr; Panoramio; iNaturalist; data bias; data error; data plausibility; data obfuscation; citizen science

Abstract

We present a case study of a journey that began with a personal data collection of carnivorous plant species habitats and led to a scientific exploration of location data biases, data errors, location hiding, and data plausibility. While initially driven by personal interest, our work led to the analysis and development of various means for visualizing threats to insight from geo-tagged social media data. In the course of this endeavor we analyzed local and global geographic distributions and their inaccuracies. We also contribute Motion Plausibility Profiles, a new means for visualizing how believable a specific contributor’s location data is or whether it was likely manipulated. We then compared our own repurposed social media dataset with data from a dedicated citizen science project. Compared to the biases and errors reported in the literature on traditional citizen science data, our visualizations also let us identify some new types or show new aspects of known ones. Moreover, we demonstrate several types of errors and biases for repurposed social media data.
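The figure caption describes Motion Plausibility Profiles in terms of the speeds implied between consecutive image locations. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each photo has a timestamp and a latitude/longitude pair, uses the haversine great-circle distance, and classifies each step with speed thresholds. The `Photo` type, the function names, and the threshold values in `SPEED_CLASSES` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical upper speed limits in km/h (assumed, not taken from the paper).
SPEED_CLASSES = [
    (6.0, "walking"),
    (130.0, "car"),
    (1000.0, "jet airplane"),
]


@dataclass
class Photo:
    """Hypothetical record for one geo-tagged image."""
    taken: datetime
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photo locations in kilometers."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def classify_step(a: Photo, b: Photo) -> str:
    """Label the movement from photo a to the later photo b."""
    dist = haversine_km(a, b)
    if dist == 0.0:
        # Identical coordinates, e.g. one location assigned to a whole batch.
        return "same location"
    hours = (b.taken - a.taken).total_seconds() / 3600.0
    if hours <= 0:
        # A different place at the same instant cannot be reached.
        return "implausible"
    speed = dist / hours
    for limit, label in SPEED_CLASSES:
        if speed <= limit:
            return label
    return "implausible"


def motion_profile(photos: list[Photo]) -> list[str]:
    """Classify every consecutive pair of photos in chronological order."""
    ordered = sorted(photos, key=lambda p: p.taken)
    return [classify_step(a, b) for a, b in zip(ordered, ordered[1:])]
```

Calling `motion_profile` on one contributor's photos yields one label per consecutive pair, which could then be mapped to colors along a timeline. Exact coordinate equality is used for the "same location" case because the caption refers to the exact same geo-location being assigned to several images; a tolerance might suit noisier data better.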