Thanks to Sara Day Thomson and Katrin Weller for contributions to this list. The list was published in January 2016; suggestions for further items can be sent to info@osc.cam.ac.uk.
Key background readings
Tools for capturing social media data
Investment in social media platforms
Challenges in collecting tweets/data quality
Using social media in research
Social media archiving case studies - research and heritage
Web studies
Legal, ethical and regulatory challenges
Activities of the Library of Congress
Publicly shared Twitter datasets (a sample)
Key background readings
Boyd, D. & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15:5, 662-679.
Haddad, F. (2012), ‘An Undiscovered Archive? Online Video Sharing, Alternative Narratives and the Documentation of History’, New Media, Alternative Politics Working Papers, No. 3, March
Morstatter, F., Pfeffer, J., Liu, H., and Carley, K. (2013). “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose.”
SalahEldeen, H and Nelson, M (2012), ‘Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?’.
Weller, K and Kinder-Kurlanda, K (2015), ‘Uncovering the Challenges in Collection, Sharing and Documentation: the Hidden Data of Social Media Research?’, Standards and Practices in Large-Scale Social Media Research: Papers from the 2015 ICWSM Workshop.
Tools for capturing social media data
Name of tool | Information | Open source? |
ARCOMEM | The ARCOMEM consortium defined a number of pre-packaged tools which can be used independently from each other for implementation of a socially aware and semantic driven Web preservation model | Yes |
COSMOS | COSMOS is the collaborative online social media observatory - an integrated social media analysis tool, developed for open access within academia. COSMOS is underpinned by a scalable Hadoop infrastructure and can support the rapid analysis of large data-sets and the orchestration of workflows between tools with limited human effort. | Yes |
Lentil | Lentil is an open-source program created by the North Carolina State University Libraries that allows for the harvesting of Instagram images through the use of hashtags. | Yes |
Social Feed Manager | Social Feed Manager is an application developed by George Washington University Libraries to collect social media data from Twitter. It connects to Twitter's approved API to collect data in bulk and makes it possible for scholars, students, and librarians to identify, select, collect, and preserve Twitter data for research purposes. | Yes |
TWARC | A command line tool (and Python library) for archiving Twitter JSON | Yes |
Tools and methods to capture social media data – reading list
NCSU Social Media Archives Toolkit
Bandziulis, L (15 July 2014), ‘How to Download and Archive Your Social Media Memories’, WIRED.com
Borra, E., & Rieder, B. (2014). Programmed method: developing a toolset for capturing and analyzing tweets. Aslib Journal of Information Management, 66(3), 262–278.
Bruns, A., & Liang, Y. E. (2012). Tools and methods for capturing Twitter data during natural disasters. First Monday, 17(4).
Burnap, P, Rana, O, Williams, M, Housley, W, et al. (2014), ‘COSMOS: Towards an Integrated and scalable service for analysing social media on demand’, International Journal of Parallel, Emergent and Distributed Systems, 30:2, 80-100
Gaffney, D., & Puschmann, C. (2014). Data collection on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 55–68). New York: Peter Lang (BOOK).
Hockx-Yu, H (2014), ‘Archiving Social Media in the Context of Non-print Legal Deposit’, IFLA WLIC Libraries, Citizens, Societies: Confluence for Knowledge in Lyon
Kaczmirek, L, Mayr, P, Vatrapu, R, et al. (31 March 2014), ‘Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter’, GESIS Working Papers
Kaczmirek, L, and Mayr, P (2015), ‘German Bundestag Elections 2013: Twitter usage by electoral candidates.’ GESIS Data Archive, Cologne, DOI: 10.4232/1.12319
Risse, T, Peters, W, Senellart, P, and Maynard, D (2014) ‘Documenting Contemporary Society by Preserving Relevant Information from Twitter’, in Weller, K, et al. (Eds), Twitter and Society, NYC, NY: Peter Lang Publishing.
Investment in social media platforms
Brustein, J (1 October 2014), ‘Twitter Gives MIT $10 Million to Study the Social Impact of Tech’, Bloomberg Business
Gillis, M (1 October 2014), ‘Investing in MIT’s new Laboratory for Social Machines’, Twitter blog
Halstead, N (10 March 2015), ‘DataSift Partners with Facebook to Bring Facebook Topic Data to Marketers’, DataSift Blog
Messerschmidt, J (15 April 2014), ‘Twitter welcomes Gnip to the Flock’, Twitter blog
MIT News (1 October 2014), ‘MIT launches Laboratory for Social Machines with major Twitter investment’, MIT News
Challenges in collecting tweets/data quality
Bruns, A. (21 June 2011), 'Switching from Twapperkeeper to yourTwapperkeeper'. Mapping Online Publics.
Bruns, A. and Stieglitz, S. (2014), 'Twitter data: what do they represent?', it - Information Technology, 56:5, 240-245, DOI: 10.1515/itit-2014-1049.
Jungherr, A., Jürgens, P. and Schoen, H. (2012), 'Why the Pirate Party won the German Election of 2009 or The trouble with predictions: a response to Tumasjan, A., Sprenger, T. O., Sander, P. G. and Welpe, I. M. Predicting elections with Twitter: what 140 characters reveal about political sentiment', Social Science Computer Review, 30:2, 229-34, DOI: 10.1177/0894439311404119.
Morstatter, Fred, Jürgen Pfeffer, Huan Liu, and Kathleen M. Carley. (2013), 'Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose', ICWSM 2013.
Using social media in research
Ahmed, W (10 July 2015), ‘Using Twitter as a data source: An overview of current social media research tools’, The Impact Blog.
Housley, W and Williams, M, et al. (Eds) (2013), ‘Computational Social Science: Research Strategies, Design and Methods’, International Journal of Social Research Methodology, Special Issue, 16, 2.
Organisation for Economic Co-operation and Development (OECD) (February 2013), ‘New Data for Understanding the Human Condition’, OECD Global Science Forum Report.
Sloan L, Morgan J, Burnap P, Williams M (2015), 'Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data'. PLoS ONE, 10:3.
Summers, E. (14 April 2015), ‘Tweets and Deletes: Silences in the Social Media Archive’, On Archivy.
UK Data Forum (2013), ‘UK Strategy for Data Resources for Social and Economic Research’.
Williams, S. A., Terras, M. M., & Warwick, C. (2013) 'What do people study when they study Twitter? Classifying Twitter related academic papers'. Journal of Documentation, 69(3): 384-410.
Williams, S. A., Terras, M. M., & Warwick, C. (2013) 'How Twitter Is Studied in the Medical Professions: A Classification of Twitter Papers Indexed in PubMed'. Medicine 2.0, 2(2)
Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization 41(3), 238-248.
Weller, K and Kinder-Kurlanda, K (2015). ‘Uncovering the Challenges in Collection, Sharing and Documentation: the Hidden Data of Social Media Research?’, Standards and Practices in Large-Scale Social Media Research: Papers from the 2015 ICWSM Workshop.
Zimmer, M., & Proferes, J.N. (2014). 'A topology of Twitter research: disciplines, methods, and ethics'. Aslib Journal of Information Management, 66(3), 250–261. DOI: 10.1108/AJIM-09-2013-0083
Social media archiving case studies - research and heritage
Archive Team (2009), ‘GeoCities’, ArchiveTeam.org.
British Library (October 2013), ‘Accessing Web Archives’.
D'Orazio, D (25 October 2014), ‘Twitpic saved by Twitter just hours before planned shut down’, The Verge.
Espley, S., Carpentier, F., Pop, R., Medjkoune, L. (August 2014), ‘Collect, Preserve, Access: Applying the Governing Principles of the National Archives UK Government Web Archive to Social Media Content’, Alexandria: The Journal of National and International Library and Information Issues, 25(1-2), 31-50. DOI: 10.7227/ALX.0019
Harrower, N and Heravi, B (2015), ‘How to Archive an Event: The Social Repository of Ireland Project’, 1st Annual Conference on Digital Preservation for the Arts, Humanities, and Social Sciences (DPASSH2015), Dublin, Ireland, 25-26 June 2015.
North Carolina State University (NCSU) Libraries (2014-15), ‘Social Media Archives Toolkit’.
Scola, N (11 July 2015) ‘Library of Congress’ Twitter Archive is a Huge #FAIL’, Politico.com.
Storrar, T (8 May 2014), ‘Archiving social media’, TNA Blog.
US Government Accountability Office (31 March 2015), ‘Library of Congress: Strong Leadership Needed to Address Serious Information Technology Management Weaknesses’, Report to Congressional Committees.
Web studies
Banks, M (2008), On the Way to the Web: the Secret History of the Internet and Its Founders, Berkeley, CA: Apress, [e-Book].
Dijck, J. van (2013), The Culture of Connectivity: A Critical History of Social Media, Oxford Scholarship Online, [e-Book]. DOI:10.1093/acprof:oso/9780199970773.001.0001
Helmond, A (23 September 2015), ‘The Web as Platform: Data Flows in Social Media’, PhD Dissertation, University of Amsterdam.
Pennock, M (2013), Web-Archiving. DPC Technology Watch Report 13-01. DOI: 10.7207/twr13-01.
SalahEldeen, H and Nelson, M (2012), ‘Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?’.
Webster, P (20 March 2015), ‘How fast does the web change and decay? Some evidence’, Web Archives for Historians.
Legal, ethical and regulatory challenges
Beurskens, M. (2014). Legal Questions of Twitter Research. In K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 123-133). New York: Peter Lang (BOOK).
Cate, F (2012) ‘Notice and Consent in a World of Big Data’, Microsoft Global Privacy Summit Summary Report and Outcomes.
Digital Curation Centre, ‘Funders' data policies’, DCC website.
Executive Office of the President (May 2014), ‘Big Data and Privacy: A Technological Perspective’, Report to the President, President’s Council of Advisors on Science and Technology.
Foursquare (last updated 5 November 2014), ‘API Platform and Data Use Policy’.
Gates, C (4 June 2015), ‘Eulogy for Politwoops’, Sunlight Foundation blog.
Google+ (last updated 26 February 2013), ‘Platform Terms of Service’.
Koops, B (2011), ‘Forgetting Footprints, Shunning Shadows. A Critical Analysis of the “Right To Be Forgotten” In Big Data Practice.’ SCRIPTed, 8:3, 229-256. DOI: 10.2966/scrip.080311.229.
Lanigan, C (29 May 2015), ‘Archiving Tweets: Reckoning with Twitter’s Policy’, Insight News Lab.
Mantelero, A (2013), ‘The EU Proposal for a General Data Protection Regulation and the roots of the “right to be forgotten.”’ Computer Law & Security Review, 29:3, 229-235. DOI: 10.1016/j.clsr.2013.03.010.
Markham, A. and Buchanan, E. (2012), 'Ethical decision-making and internet research 2.0: recommendations from the AoIR Ethics Working Committee'.
Puschmann, C and Burgess, J (2014), ‘The Politics of Twitter Data’, In K Weller et al. (Eds) Twitter and Society, New York: Peter Lang Publishing (BOOK).
Schroeder, R (December 2014), ‘Big Data and the brave new world of social media research’, Big Data & Society, DOI: 10.1177/2053951714563194
Twitter (last updated 18 May 2015). Developer Agreement & Policy.
Weller, Katrin, and Katharina E. Kinder-Kurlanda. 2014. "I love thinking about ethics: Perspectives on ethics in social media research." In Selected Papers of Internet Research (SPIR). Proceedings of ir15 - Boundaries and Intersections.
Zimmer, M. & Proferes, J.N. (2014). Privacy on Twitter, Twitter on privacy. In K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 169-182), New York: Peter Lang (BOOK).
Zimmer, M. (2010), “But the data is already public: on the ethics of research in Facebook”, Ethics and Information Technology, Vol. 12 No. 4, pp. 313-25, DOI: 10.1007/s10676-010-9227-5.
Activities of the Library of Congress
Allen, E. (2013, January 4). Update on the Twitter Archive at the Library of Congress. Library of Congress.
Library of Congress (April 2010), ‘Twitter Donates Entire Tweet Archive to Library of Congress’, Library of Congress.
Library of Congress (January 2013), ‘Update on the Twitter Archive at the Library of Congress’, White Paper. Library of Congress.
McLemee, S. (2015). The Archive is closed. Inside Higher Ed.
Raymond, M. (2010). How Tweet It Is! Library Acquires Entire Twitter Archive. Library of Congress.
Zimmer, M (6 July 2015), ‘The Twitter Archive at the Library of Congress: Challenges for information practice and information policy’, First Monday, Volume 20, Number 7, DOI: 10.5210/fm.v20i7.5619.
Publicly shared Twitter datasets (a sample)
Hadgu & Jäschke 2014 dataset on Github
MPI-SWS. The Twitter Project Page at MPI-SWS
sananalytics (2011). Public domain twitter sentiment corpus. Twitter Developers Forums.
Kaczmirek, Lars; Mayr, Philipp (2015): German Bundestag Elections 2013: Twitter usage by electoral candidates. GESIS Data Archive, Cologne. ZA5973 Data file Version 1.0.0. DOI:10.4232/1.12319.