Cardiff Online Social Media Observatory (COSMOS): Social Media and Data Mining
The Cardiff Online Social Media Observatory (COSMOS) is a half-a-million-pound investment by the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee (JISC) that brings together social, political, health, mathematical and computer scientists to study the methodological, theoretical, empirical and policy dimensions of Big ‘Social’ Data. Our objective is to establish a coordinated international social science response to this new form of data in order to address next-generation research questions.
Our empirical research programme is contextualized in terms of the ‘coming crisis of empirical sociology’ (Savage and Burrows, 2007), which is located in the increasing asymmetry between traditional social scientific methods and the power of transactional data generated through the internet. This has led some commentators to question the extent to which university-based sociology and social science can compete with the data-rich resources built into the marketing and data generation strategies of the large multi-national corporations that hold and marshal much of this transactional data.
Big Social Data, generated in large part by Social Media interactions, are distinct from corporate transactional data and ‘conventional’ social science data in that they are naturally occurring or ‘user-generated’ and are largely accessible to researchers. COSMOS technologies and digital social research tools capture Big Social Data at the level of populations in real or near-real time. This offers researchers the hitherto unrealizable possibility of studying social processes as they unfold at the level of populations, as contrasted with their official construction through the use of ‘terrestrial’ research instruments (surveys, interviews etc.) and curated and administrative data-sets. Systematic data mining and mixed-method analysis in relation to key social science concerns and questions are now possible; COSMOS provides a means of operationalising the next-generation ‘social computational tool kit’. It also provides a means of augmenting social science research training through the provision of new methodological tools and options for researchers conducting social inquiry in the 21st century. This process is informed by our recent work on the political and ethical implications of Big Data, which focuses on the tensions between the ‘panoptic’ and ‘synoptic’ powers of digital observatories and the allied possibilities of a ‘signature science’.
COSMOS on whether NSM methods are affecting research quality
Social Media and Prediction: Crime Sensing, Data Integration and Statistical Modelling. ESRC/NCRM; April 2013-September 2014 (£195K funded).
(Matthew Williams, William Housley, Adam Edwards, Luke Sloan, Pete Burnap, Omer Rana, Alex Voss and Rob Procter)
The key objective of this project is to repurpose user-generated social media data for social research by developing innovative methodological and computational tools for establishing the link between online and offline behaviour. This will entail building statistical models based on social media data that forecast offline social phenomena. Project partners include the Metropolitan Police Service and the Office for National Statistics.
'Hate' Speech and Social Media: Understanding Users, Networks and Information Flows. ESRC/Google; April 2013-March 2014 (£125K funded).
(William Housley, Matthew Williams, Adam Edwards, Pete Burnap, Omer Rana, Alex Voss, Rob Procter and Vince Knight)
The aim of the project is to develop a probabilistic model-based methodology and resultant computational tool to inform the social scientific interpretation of the formation and spread of hate speech and antagonistic content in social media networks, as well as its consequences and reactions to it. Project partners include Google.
Anomaly Detection in Big Social Data. ESRC DTC Doctoral Studentship; Oct 2012 - Sept 2016 (£86K funded).
This project extends the Cardiff Online Social Media Observatory (COSMOS) framework by including the ability to automatically detect anomalies in time sequence big social datasets.
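As a rough illustration of what detecting anomalies in a time-sequenced dataset can involve, the sketch below flags points in a series of hourly post counts that deviate sharply from a trailing window. It is not the method used in the studentship, which the description above does not specify; the function name `flag_anomalies`, the window size, the threshold and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` sample
    standard deviations from the mean of the preceding `window` values.

    Hypothetical helper for illustration only, not part of COSMOS.
    """
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change at all as anomalous.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up hourly post counts with one obvious spike at index 7.
hourly_posts = [100, 104, 98, 101, 103, 99, 102, 250, 101, 100]
print(flag_anomalies(hourly_posts))
```

A trailing window is used so that each point is judged only against data that would already have been collected, which suits near-real-time monitoring. One side effect is that a large spike inflates the window's variance for the next few points, so a production approach would likely need something more robust.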
Digital Social Research Tools, Tension Indicators and Safer Communities: a demonstration of the Cardiff Online Social Media Observatory (COSMOS)
(Matthew Williams, William Housley, Adam Edwards, Malcolm Williams, Omer Rana and Nick Avis)
Tension indicators or ‘community monitoring systems’ have been developed by police services for the purposes of anticipatory governance to provide early warning of civil unrest and its escalation into major instances of collective violence. Hitherto this community monitoring has been terrestrial, premised on qualitative intelligence from front-line police officers and other ‘sentinels’ such as watch committees, residents' and tenants' associations, local media and criminal justice data, including records of court proceedings. The development of digital social research tools, particularly for mining social media, can make a major contribution to the indication of tensions in anticipation of major civil unrest. Furthermore, existing research has framed the issue of tension indicators and community safety in ‘panoptic’ terms, reflecting the interests of public authorities in enhancing their surveillance powers for monitoring populations of interest. A major implication of the social media explosion facilitated through Web 2.0 technologies and other digital technology (such as mobile telephones), however, is the rise of the ‘synoptic’ power of the many to watch the few, and of citizens to better hold public authorities, such as police forces, to account for their actions. This also provides opportunities for investigating rival accounts of civil unrest, in particular through accessing the sentiments expressed by those directly involved. The potential of COSMOS to mine and analyse social media also provides resources for non-governmental organisations and the wider citizenry to draw on digital social research in relation to major social-political problems, such as ‘community cohesion’, thereby supporting deliberative democratic processes that can enhance civil liberties.
Automating Sentiment Analysis from Social Data: A Scoping Study
CUROP (Cardiff University)
The growth of the “Social Web” and the corresponding rise in available “emotional text” (through on-line social network platforms such as Facebook and blogging platforms such as BlogSpot) over the past few years has led to an increased interest in sentiment analysis. Research that makes use of such analysis primarily focuses on extracting text fragments that contain a particular viewpoint, in order to support the development of recommendation systems based on data acquired from a large user community. Aggregating the outcome of such an analysis with demographic information enables a better understanding of how a particular community “feels” at a given point in time. This provides social scientists with a very powerful, automated research tool for understanding how a community responds to a particular geo-political event. This multi-disciplinary project will make use of the “We Feel Fine” Application Programming Interface (API) from Stanford University and investigate how such a tool could facilitate social sciences research. The project will link in with work on social network analysis (using data mining and graph analysis techniques) within COMSC. It will also build upon a strategic research direction between the two schools: the establishment of a SOCSI-COMSC research group to investigate how automated social data/media analysis can facilitate social science research.
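The aggregation step described above, combining sentiment output with demographic information to see how a community “feels” at a point in time, can be sketched as a simple group-by. This is a minimal sketch under stated assumptions: it presumes each post has already been given a numeric sentiment score by some upstream tool, and the record layout, the age-band labels and the function name `mean_sentiment_by_group` are hypothetical. It does not call the “We Feel Fine” API.

```python
from collections import defaultdict


def mean_sentiment_by_group(records):
    """Average sentiment per (day, demographic group).

    `records` is an iterable of (day, group, score) tuples; this layout
    is an assumption made for illustration.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for day, group, score in records:
        bucket = totals[(day, group)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Made-up scored posts: (day, age band, sentiment in [-1, 1]).
posts = [
    ("2012-07-27", "18-24", 0.5),
    ("2012-07-27", "18-24", -0.5),
    ("2012-07-27", "25-34", 1.0),
    ("2012-07-28", "18-24", 0.25),
]
summary = mean_sentiment_by_group(posts)
for key in sorted(summary):
    print(key, summary[key])
```

Keying on both day and group keeps the temporal dimension, so a shift in one group's average around a given event remains visible rather than being averaged away.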
Supporting Empirical Digital Social Research for the Social Sciences with a Virtual Research Environment. JISC; 2012 – 2013 (£55,000 funded).
The Schools of Social Sciences (SOCSI) and Computer Science & Informatics (COMSCI) at Cardiff University have, over the past 18 months, established the SOCSI/COMSCI research network, an interdisciplinary research group with academic staff from both schools collaborating and sharing best practice in research and teaching. The SOCSI/COMSCI research network has already secured a funded ESRC Wales DTC four-year postgraduate studentship, and an ESRC research grant to develop data harvesting and analysis methods and tools to detect tension and cohesion in online social networks. The ESRC grant has supported the network in developing the Cardiff Online Social Media Observatory (COSMOS), an information collection, archival and analysis engine for harvesting freely available socially significant data from sources such as social networking sites, blogs, micro-blogs, RSS feeds and Open Data (e.g. crime rates), and analyzing the harvested dataset to detect community tension and cohesion indicators. We propose to enhance COSMOS and engage the wider social scientific research community by extending it to provide an innovative virtual research environment (VRE). Researchers need to be able to use COSMOS data and pose hypothetical “what-if” questions, trying different combinations of social data analysis methods to confirm or refute an informal hypothesis, and then stress testing it further until a coherent and arguable position emerges.
Requirements Analysis for Social Media Analysis Research Tools, ESRC, DSR Community Fund (£5000)
The explosion of ‘born-digital’ data generated as a by-product of the increasing adoption of social media means that the social sciences are facing a data deluge that promises to revolutionise research, but which the research community is presently not equipped to exploit. While the sheer volume of such data presents challenges for the social sciences, such data is now being routinely analysed by industry for its own purposes. Where, in the past, academic social science was an obligatory point of passage for those wanting to learn about social phenomena, there is now a danger that social scientific research is simply bypassed by powerful actors with access to vast datasets. COSMOS is dedicated to helping social researchers meet this challenge and to re-invigorate their interest and leadership in the development of research methods. The methodology we have developed combines techniques that make use of computer-based tools to explore and structure this new form of data.
Postgraduate Projects (COMSC/SOCSI)
Edwin Chappell (COMSC)
David Hannerford (COMSC)
Clare Ruth Wright (2012 – 2016, ESRC Studentship 1+3) Anomaly Detection in Social Data
Publications
Housley, W., Williams, M.L., Edwards, A., Burnap, P. (2015) Digital Societies: Theory, Method and Data, London: Sage
Williams, M.L., and Wall, D.S. (Eds.) (2013) 'Policing Cybercrime: Networked and Social Media Technologies and the Challenges for Policing', Policing and Society Special Issue
Burnap, P., Housley, W., Morgan, J., Sloan, L., Williams, M.L., Avis, N., Edwards, A., Rana, O. and Williams, M. (2013) 'Social Media Analysis, Twitter and the London Olympics 2012', Cardiff School of Social Sciences Working Paper 153, Cardiff: Cardiff University.
Housley, W., Williams, M.L., Williams, M. and Edwards, A. (Eds.) (2013) 'Computational Social Science: Research Strategies, Design and Methods', International Journal of Social Research Methodology Special Issue, 16:2
Edwards, A., Housley, W., Sloan, L., Williams, M.L. and Williams, M. (2013) ‘Digital Social Research and the Sociological Imagination: Surrogacy, Augmentation and Re-orientation’, International Journal of Social Research Methodology, Computational Social Science: Research Strategies, Design and Methods, Housley, W., Williams, M.L., Williams, M. and Edwards, A. (Eds.) Special Issue, Volume 16:2
Williams, M.L., Edwards, A., Housley, W., Burnap, P., Rana, O., Avis, N., Morgan, J. and Sloan, L. (2013) ‘Policing Cyber-Neighbourhoods: Tension Monitoring and Social Media Networks’, Policing & Society Special Issue
Burnap, P., Rana, O., Avis, N., Williams, M., Housley, W., Edwards, A. (forthcoming), ‘Detecting Tension in Online Communities with Computational Twitter Analysis’, Technological Forecasting and Social Change
Morgan, J., Sloan, L., Housley, W., Williams, M.L., Edwards, A., Burnap, P. & Rana, O. (forthcoming) 'Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter', Sociological Research Online
National and International Collaborators
University of Warwick
University of St Andrews
University of Oxford
University of Queensland
Australian National University
University of Illinois, Urbana-Champaign
New York University (Centre for Urban Science and Progress (CUSP))
Office for National Statistics
Economic and Social Data Service
Metropolitan Police Service
Association of Chief Police Officers
South Wales Police