Big Data Analysis for Social Scientists course - ACSPRI Winter Program 2016
Note: "vosonSML" is the new name for the R package that was previously called "SocialMediaLab". These archived documents may contain materials (e.g. R scripts) that refer to the old package. For up-to-date references, please visit the Training workshops page or the vosonSML page.
This page provides resources for participants in the Big Data Analysis for Social Scientists short course taught by Robert Ackland (Australian National University) and Timothy Graham (University of Queensland & Australian National University). The course is running as part of the ACSPRI Winter Program at the University of Queensland, Brisbane, 4-8 July 2016.
Please note that this page is under development, and will change throughout the course.
On this page, you can find:
- R scripts used in the course
- Datasets used in the course
- SocialMediaLab and VOSON
- Information on using R
- Information on using Gephi
- Obtaining API access (Twitter, YouTube, Facebook, Instagram)
- Some background slides
Timetable (updated: 4 July)
R scripts used in the course
- R refresher: r_refresher.R
- The following script is needed if you are running R/RStudio from the ACSPRI-provided USB key: getDependentPackages.R
- SocialMediaLab Tutorial: pdf, Rmd
- Introduction to SNA with igraph: pdf, Rmd
- Text analysis in R (introduction): pdf, Rmd
- Word cloud and comparison cloud: wordCloud.R, comparisonCloud.R
- Gender analysis: pdf, Rmd
- Optimising R code: optimising_R_code_ACSPRI.r
- Dynamic network analysis: dynamic_network_analysis_ACSPRI.r
- Dynamic network visualisation using ndtv: dynamicViz_ndtv.R
- Converting SocialMediaLab networks (facebook, twitter) to ndtv format: export_ndtv.R
- Dynamic network visualisation using ndtv - Facebook example: dynamicViz_ndtv_ex2.R
- Text analysis in R (topic models): text_analysis_session_code.R
- Text analysis in R (supervised machine learning): v2_ACSPRI_machine_learning_for_text_classification.R
- Deep learning: deep_learning_example_acspri.R
Datasets used in the course
SocialMediaLab and VOSON
We have developed two R packages that will be used in this course:
SocialMediaLab. SocialMediaLab is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis. Some background on SocialMediaLab.
VOSON. VOSON is software for collecting and analysing WWW hyperlink networks and website text content. Note: you will need to apply for a VOSON account from the Uberlink website (Uberlink is a company that was created by Robert Ackland in order to commercialise the VOSON software). You do not need to pay for a VOSON account, and at the end of the ACSPRI course we will be happy to delete your VOSON account if you wish. If you do apply for a VOSON account, please mention the ACSPRI course in the "what do you want to use our software for" box.
Information on using R
The course will be run in a lab with Windows computers. You are welcome to use your own laptop for the course, but we may not be able to offer support if you run into technical problems. The following are instructions for installing R:
- Windows users should download the latest executable from the R-Project website.
- Mac users should download the latest "pkg" binary from the R-Project website.
- Linux users can also obtain R binaries for different Linux distributions.
There are many free resources on the web for learning how to program in R. A Very Short Introduction to R provides a concise introduction to installing RStudio (a graphical user interface for R) and to major aspects of R such as console commands, data structures, functions, plots, and reading/writing files.
For users who prefer a more 'interactive' approach to learning R, Data Camp's free introduction to R is a very good resource. The focus of these tutorials is more on data structures in R, but the 'hands on' approach enables users to quickly learn how to operate in the R environment.
Norman Matloff's book, The Art of R Programming, is available for free as a PDF. This book is an excellent introduction and 'refresher' to R, and is also applicable for intermediate R users.
Hadley Wickham's companion site for Advanced R provides an extremely in-depth guide for intermediate or advanced R users who wish to hone their skills and knowledge.
Information on using Gephi
While our course is mainly R-based, we will also use the Gephi open source software for network visualisation. The Gephi website has some excellent information on learning how to use Gephi, and there is also a quick start tutorial.
Obtaining API access (Twitter, YouTube, Facebook, Instagram)
SocialMediaLab collects Twitter, YouTube, Facebook and Instagram data via the respective free APIs. You can find details on how to get API access here.
Some background slides
The following are some background slides on social media network analysis, from Robert Ackland's masters courses at ANU.
- Social media & big data
- Social network analysis
- WWW hyperlink networks
- Threaded conversation networks
- Supervised machine learning for automated coding of websites