Thanks to Sara Day Thomson and Katrin Weller for contributions to this list. The list was published in January 2016; suggestions for further items can be sent to info@osc.cam.ac.uk.

Key background readings
Tools for capturing social media data
Investment in social media platforms
Challenges in collecting tweets/data quality
Using social media in research
Social media archiving case studies - research and heritage
Web studies
Legal, ethical and regulatory challenges
Activities of the Library of Congress
Publicly shared Twitter datasets (a sample)

Key background readings

Boyd, D. & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15:5, 662-679.

Haddad, F. (2012), ‘An Undiscovered Archive? Online Video Sharing, Alternative Narratives and the Documentation of History’, New Media, Alternative Politics Working Papers, No. 3, March

Morstatter, F., Pfeffer, J., Liu, H., and Carley, K. (2013). “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose.”

SalahEldeen, H and Nelson, M (2012), ‘Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?’.

Weller, K and Kinder-Kurlanda, K (2015), ‘Uncovering the Challenges in Collection, Sharing and Documentation: the Hidden Data of Social Media Research?’, Standards and Practices in Large-Scale Social Media Research: Papers from the 2015 ICWSM Workshop.


Tools for capturing social media data

Name of tool | Information | Open source?
ARCOMEM | The ARCOMEM consortium defined a number of pre-packaged tools which can be used independently from each other for implementation of a socially aware and semantic driven Web preservation model. | Yes
COSMOS | COSMOS is the collaborative online social media observatory - an integrated social media analysis tool, developed for open access within academia. COSMOS is underpinned by a scalable Hadoop infrastructure and can support the rapid analysis of large data-sets and the orchestration of workflows between tools with limited human effort. | Yes
Lentil | Lentil is an open-source program created by the North Carolina State University Libraries that allows for the harvesting of Instagram images through the use of hashtags. | Yes
Social Feed Manager | Social Feed Manager is an application developed by George Washington University Libraries to collect social media data from Twitter. It connects to Twitter's approved API to collect data in bulk and makes it possible for scholars, students, and librarians to identify, select, collect, and preserve Twitter data for research purposes. | Yes
TWARC | A command line tool (and Python library) for archiving Twitter JSON. | Yes

Tools and methods to capture social media data – reading list

NCSU Social Media Archives Toolkit

Bandziulis, L (15 July 2014), ‘How to Download and Archive Your Social Media Memories’, WIRED.com

Borra, E., & Rieder, B. (2014). Programmed method: developing a toolset for capturing and analyzing tweets. Aslib Journal of Information Management, 66(3), 262–278.

Bruns, A., & Liang, Y. E. (2012). Tools and methods for capturing Twitter data during natural disasters. First Monday, 17(4).

Burnap, P, Rana, O, Williams, M, Housley, W, et al. (2014), ‘COSMOS: Towards an Integrated and scalable service for analysing social media on demand’, International Journal of Parallel, Emergent and Distributed Systems, 30:2, 80-100

Gaffney, D., & Puschmann, C. (2014). Data collection on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 55–68). New York: Peter Lang (BOOK).

Hockx-Yu, H (2014), ‘Archiving Social Media in the Context of Non-print Legal Deposit’, IFLA WLIC Libraries, Citizens, Societies: Confluence for Knowledge in Lyon

Kaczmirek, L, Mayr, P, Vatrapu, R, et al. (31 March 2014), ‘Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter’, GESIS Working Papers

Kaczmirek, L, and Mayr, P (2015), ‘German Bundestag Elections 2013: Twitter usage by electoral candidates.’ GESIS Data Archive, Cologne, DOI: 10.4232/1.12319

Risse, T, Peters, W, Senellart, P, and Maynard, D (2014) ‘Documenting Contemporary Society by Preserving Relevant Information from Twitter’, in Weller, K, et al. (Eds), Twitter and Society, NYC, NY: Peter Lang Publishing.

Investment in social media platforms

Brustein, J (1 October 2014), ‘Twitter Gives MIT $10 Million to Study the Social Impact of Tech’, Bloomberg Business

Gillis, M (1 October 2014), ‘Investing in MIT’s new Laboratory for Social Machines’, Twitter blog

Halstead, N (10 March 2015), ‘DataSift Partners with Facebook to Bring Facebook Topic Data to Marketers’, DataSift Blog

Messerschmidt, J (15 April 2014), ‘Twitter welcomes Gnip to the Flock’, Twitter blog

MIT News (1 October 2014), ‘MIT launches Laboratory for Social Machines with major Twitter investment’, MIT News

Challenges in collecting tweets/data quality

Bruns, A. (21 June 2011), 'Switching from Twapperkeeper to yourTwapperkeeper'. Mapping Online Publics.

Bruns, A. and Stieglitz, S. (2014), 'Twitter data: what do they represent?', it - Information Technology, 56:5, 240-245, DOI: 10.1515/itit-2014-1049.

Jungherr, A., Jurgens, P. and Schoen, H. (2012), 'Why the Pirate Party won the German Election of 2009 or The trouble with predictions: a response to Tumasjan, A., Sprenger, T. O., Sander, P. G. and Welpe, I. M. Predicting elections with Twitter: what 140 characters reveal about political sentiment', Social Science Computer Review, 30:2, 229-34, DOI: 10.1177/0894439311404119.

Morstatter, Fred, Jürgen Pfeffer, Huan Liu, and Kathleen M. Carley. (2013), 'Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose', ICWSM 2013.

Using social media in research

Ahmed, W (10 July 2015), ‘Using Twitter as a data source: An overview of current social media research tools’, The Impact Blog.

Housley, W and Williams, M, et al. (Eds) (2013), ‘Computational Social Science: Research Strategies, Design and Methods’, International Journal of Social Research Methodology, Special Issue, 16, 2.

Organisation for Economic Co-operation and Development (OECD) (February 2013), ‘New Data for Understanding the Human Condition’, OECD Global Science Forum Report.

Sloan L, Morgan J, Burnap P, Williams M (2015), 'Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data'. PLoS ONE, 10:3.

Summers, E. (14 April 2015), ‘Tweets and Deletes: Silences in the Social Media Archive’, On Archivy.

UK Data Forum (2013), ‘UK Strategy for Data Resources for Social and Economic Research’.

Williams, S. A., Terras, M. M., & Warwick, C. (2013) 'What do people study when they study Twitter? Classifying Twitter related academic papers'. Journal of Documentation, 69(3): 384-410.

Williams, S. A., Terras, M. M., & Warwick, C. (2013) 'How Twitter Is Studied in the Medical Professions: A Classification of Twitter Papers Indexed in PubMed'. Medicine 2.0, 2(2)

Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization 41(3), 238-248.

Weller, K and Kinder-Kurlanda, K (2015). ‘Uncovering the Challenges in Collection, Sharing and Documentation: the Hidden Data of Social Media Research?’, Standards and Practices in Large-Scale Social Media Research: Papers from the 2015 ICWSM Workshop.

Zimmer, M., & Proferes, J.N. (2014). 'A topology of Twitter research: disciplines, methods, and ethics'. Aslib Journal of Information Management, 66(3), 250–261. DOI: 10.1108/AJIM-09-2013-0083

Social media archiving case studies - research and heritage

Archive Team (2009), ‘GeoCities', ArchiveTeam.org.

British Library (October 2013), ‘Accessing Web Archives'.

D'Orazio, D (25 October 2014), ‘Twitpic saved by Twitter just hours before planned shut down’, The Verge.

Espley, S., Carpentier, F., Pop, R., Medjkoune, L. (August 2014), ‘Collect, Preserve, Access: Applying the Governing Principles of the National Archives UK Government Web Archive to Social Media Content’, Alexandria: The Journal of National and International Library and Information Issues, 25(1-2), 31-50. DOI: 10.7227/ALX.0019

Harrower, N and Heravi, B (2015), ‘How to Archive an Event: The Social Repository of Ireland Project’, 1st Annual Conference on Digital Preservation for the Arts, Humanities, and Social Sciences (DPASSH2015), Dublin, Ireland, 25-26 June 2015.

North Carolina State University (NCSU) Libraries (2014-5), ‘Social Media Archives Toolkit’.

Scola, N (11 July 2015) ‘Library of Congress’ Twitter Archive is a Huge #FAIL’, Politico.com.

Storrar, T (8 May 2014), ‘Archiving social media’, TNA Blog.

US Government Accountability Office (31 March 2015), ‘Library of Congress: Strong Leadership Needed to Address Serious Information Technology Management Weaknesses’, Report to Congressional Committees.

Web studies

Banks, M (2008), On the Way to the Web: the Secret History of the Internet and Its Founders, Berkeley, CA: Apress, [e-Book].

Dijck, J. van (2013), The Culture of Connectivity: A Critical History of Social Media, Oxford Scholarship Online, [e-Book]. DOI:10.1093/acprof:oso/9780199970773.001.0001 

Helmond, A (23 September 2015), ‘The Web as Platform: Data Flows in Social Media’, PhD Dissertation, University of Amsterdam.

Pennock, M (2013), Web-Archiving. DPC Technology Watch Report 13-01. DOI: 10.7207/twr13-01.

SalahEldeen, H and Nelson, M (2012), ‘Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?’.

Webster, P (20 March 2015), ‘How fast does the web change and decay? Some evidence’, Web Archives for Historians.

Legal, ethical and regulatory challenges

Beurskens, M. (2014). Legal Questions of Twitter Research. In K. Weller, A. Bruns, J. Burgess., M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 123-133). New York: Peter Lang (BOOK).

Cate, F (2012) ‘Notice and Consent in a World of Big Data’, Microsoft Global Privacy Summit Summary Report and Outcomes.

Digital Curation Centre, ‘Funders' data policies’, DCC website.

Executive Office of the President (May 2014), ‘Big Data and Privacy: A Technological Perspective’, Report to the President, President’s Council of Advisors on Science and Technology.

Foursquare (last updated 5 November 2014), ‘API Platform and Data Use Policy’.

Gates, C (4 June 2015), ‘Eulogy for Politwoops’, Sunlight Foundation blog.

Google+ (last updated 26 February 2013), ‘Platform Terms of Service’.

Koops, B (2011), ‘Forgetting Footprints, Shunning Shadows. A Critical Analysis of the “Right To Be Forgotten” In Big Data Practice.’ SCRIPTed, 8:3, 229-256. DOI: 10.2966/scrip.080311.229.

Lanigan, C, (29 May 2015), ‘Archiving Tweets: Reckoning with Twitter’s Policy’, Insight News Lab.

Mantelero, A (2013), ‘The EU Proposal for a General Data Protection Regulation and the roots of the “right to be forgotten”’, Computer Law & Security Review, 29:3, 229-235. DOI: 10.1016/j.clsr.2013.03.010.

Markham, A. and Buchanan, E. (2012), 'Ethical decision-making and internet research 2.0: recommendations from the AoIR Ethics Working Committee'.

Puschmann, C and Burgess, J (2014), ‘The Politics of Twitter Data’, In K Weller et al. (Eds) Twitter and Society, New York: Peter Lang Publishing (BOOK).

Schroeder, R (December 2014), ‘Big Data and the brave new world of social media research’, Big Data & Society, DOI: 10.1177/2053951714563194

Twitter (last updated 18 May 2015). Developer Agreement & Policy.

Weller, Katrin, and Katharina E. Kinder-Kurlanda. 2014. "I love thinking about ethics: Perspectives on ethics in social media research." In Selected Papers of Internet Research (SPIR). Proceedings of ir15 - Boundaries and Intersections.

Zimmer, M. & Proferes, J.N. (2014). Privacy on Twitter, Twitter on privacy. In K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 169-182), New York: Peter Lang (BOOK).

Zimmer, M. (2010), “But the data is already public: on the ethics of research in Facebook”, Ethics and Information Technology, Vol. 12 No. 4, pp. 313-25, DOI: 10.1007/s10676-010-9227-5.

Activities of the Library of Congress

Allen, E. (2013, January 4). Update on the Twitter Archive at the Library of Congress. Library of Congress.

Library of Congress (April 2010), ‘Twitter Donates Entire Tweet Archive to Library of Congress’, Library of Congress.

Library of Congress (January 2013), ‘Update on the Twitter Archive at the Library of Congress’, White Paper. Library of Congress.

McLemee, S. (2015). The Archive is closed. Inside Higher Ed.

Raymond, M. (2010). How Tweet It Is! Library Acquires Entire Twitter Archive. Library of Congress.

Zimmer, M (6 July 2015), ‘The Twitter Archive at the Library of Congress: Challenges for information practice and information policy’, First Monday, Volume 20, Number 7, DOI: 10.5210/fm.v20i7.5619.

Publicly shared Twitter datasets (a sample)

CrisisLex on Github

Hadgu & Jäschke 2014 dataset on Github

ICWSM 2012 datasets

ICWSM 2014 datasets

MPI-SWS. The Twitter Project Page at MPI-SWS

TREC 2011

sananalytics (2011). Public domain twitter sentiment corpus. Twitter Developers Forums.

Kaczmirek, Lars; Mayr, Philipp (2015): German Bundestag Elections 2013: Twitter usage by electoral candidates. GESIS Data Archive, Cologne. ZA5973 Data file Version 1.0.0. DOI:10.4232/1.12319.
