February 2019

ACSPRI Summer Program 2019 - Big Data Analysis for Social Scientists

This page provides resources for participants in the Big Data Analysis for Social Scientists short course taught by Robert Ackland (Australian National University).  The course is running as part of the ACSPRI Summer Program 2019 at the University of Melbourne, 4-8 February 2019.

Draft timetable for the course

Please note that this page is under development, and will be added to throughout the course.

Computational Social Science - Draft Teaching Guide

The material in this section is password protected: an email with the password has been sent to you.

We will be working through this draft teaching guide: it contains demonstration R code, exercises, and links to the datasets we will be using.  The guide is available in three versions: HTML, PDF, and Rmd. The R Markdown (Rmd) files are the "source files" for the guide, and they can be opened in RStudio for easy access to the R code.  Alternatively, you can copy and paste from the HTML version (the PDF version may truncate long lines of code).
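If you would rather work from a plain R script than from the Rmd file, the code chunks can be pulled out with the knitr package. The following is a minimal sketch, assuming knitr is installed; the file name teaching_guide.Rmd and the helper extract_r_code() are hypothetical, so substitute the name of the file you actually downloaded.

```r
# Sketch: copy the R code chunks of an R Markdown file into a plain .R script.
# extract_r_code() is a hypothetical helper, not part of the course materials.
extract_r_code <- function(rmd_file, out_file = sub("\\.Rmd$", ".R", rmd_file)) {
  if (!file.exists(rmd_file)) {
    message("File not found: ", rmd_file)
    return(invisible(NULL))
  }
  if (!requireNamespace("knitr", quietly = TRUE)) {
    message("Please install knitr first: install.packages(\"knitr\")")
    return(invisible(NULL))
  }
  # documentation = 0 keeps only the code and drops the surrounding text
  knitr::purl(rmd_file, output = out_file, documentation = 0, quiet = TRUE)
}

# "teaching_guide.Rmd" is an assumed file name
extract_r_code("teaching_guide.Rmd")
```

The function returns the path of the script it wrote, or NULL if the Rmd file (or knitr) is missing.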

Note that this guide is in active development and will be updated during the week! Please check that your version (see the date stamp) matches the one I'm using on screen.

R SCRIPTS AND OTHER MATERIALS USED IN THE COURSE

DATASETS USED IN THE COURSE

  • 2015-03-01_to_2015-03-02_StarWars_FacebookData.csv
  • 2015-03-01_to_2015-03-02_StarTrek_FacebookData.csv
  • g_bimodal_facebook_star_wars.graphml
  • g_bimodal_facebook_star_trek.graphml
  • twitter_pelosi.graphml
  • twitter_trump.graphml
  • Jul_02_21_01_53_2015_AEST_FacebookDynamicBimodalNetwork.graphml
  • facebook_temporal_data_starwars.csv
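As a rough sketch of how these files can be read into R: the CSV files load with base R, and the GraphML files load with the igraph package. This assumes the files have been downloaded into your current working directory and that igraph is installed; load_course_file() is a hypothetical helper written for this illustration, not a function from the course materials.

```r
# Sketch: load a course dataset according to its file extension.
# Assumes the file sits in the current working directory.
load_course_file <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  ext <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    # CSV files are returned as a data frame
    read.csv(path, stringsAsFactors = FALSE)
  } else if (ext == "graphml") {
    if (!requireNamespace("igraph", quietly = TRUE)) {
      stop("Please install igraph first: install.packages(\"igraph\")")
    }
    # GraphML files are returned as an igraph graph object
    igraph::read_graph(path, format = "graphml")
  } else {
    stop("Unsupported file type: ", ext)
  }
}

posts <- load_course_file("facebook_temporal_data_starwars.csv")
g <- load_course_file("twitter_trump.graphml")

if (!is.null(g)) {
  # Basic size of the network: number of nodes and edges
  print(igraph::vcount(g))
  print(igraph::ecount(g))
}
```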

VOSON Lab software

We use the following software that has been developed in the VOSON Lab at ANU:

  • vosonSML ("social media lab").  An R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (currently: Twitter, YouTube and Reddit) and generating different types of networks for analysis.  You can find more information on vosonSML, including links to the CRAN and github versions here.

  • VOSON Dashboard. An R/Shiny web app (although we currently run it via RStudio) for network/text collection (via vosonSML) and analysis.  More details, including the link to the GitHub page, are here.

  • VOSON for hyperlinks.  VOSON is a web app for collecting and analysing WWW hyperlink networks and website text content. Note: you will need to apply for a VOSON account from the Uberlink website (Uberlink is a company that was created by Robert Ackland in order to commercialise the VOSON software).  You do not need to pay for a VOSON account, and at the end of the ACSPRI course we will be happy to delete your VOSON account if you wish.  If you do apply for a VOSON account, please mention the ACSPRI course in the "what do you want to use our software for" box.

General information on using R

Installing R and RStudio

This year, participants will be using their own laptops for the course. It is important that you have R and RStudio installed and working on your computer before the course commences. I will be able to provide technical assistance with using R and RStudio (and we will have a brief refresher/introduction on the first day), but the software must already be installed.  The following are instructions for installing R:

You can install RStudio Desktop (Open Source Licence) here.
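Once R and RStudio are installed, a quick check like the one below can confirm that things are working before the course starts. This is only a sketch: the package list is an assumption based on the software and file formats mentioned on this page (vosonSML, plus igraph for the GraphML files), not an official list of course requirements.

```r
# Print the installed R version
cat(R.version.string, "\n")

# Assumed package list; adjust it to whatever the teaching guide asks for
needed <- c("igraph", "vosonSML")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
  message("Install with: install.packages(c(",
          paste0("\"", not_installed, "\"", collapse = ", "), "))")
} else {
  message("All listed packages are installed.")
}
```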

Using R

There are a lot of free resources on the web for learning how to program in R. A Very Short Introduction to R gives a concise introduction to installing RStudio (an integrated development environment for R) and covers the basics: console commands, data structures, functions, plots, and reading/writing files.

For users who prefer a more 'interactive' approach to learning R, DataCamp's free introduction to R is a very good resource. The focus of these tutorials is more on data structures in R, but the 'hands on' approach enables users to quickly learn how to operate in the R environment.

Norman Matloff's book, The Art of R Programming, is available for free as a PDF. This book is an excellent introduction to R, works well as a 'refresher', and is also useful for intermediate R users.

Hadley Wickham's companion site for Advanced R provides an extremely in-depth guide for intermediate or advanced R users who wish to hone their skills and knowledge.

General information on using Gephi

While our course is mainly R-based, we will also use the Gephi open source software for network visualisation.  The Gephi website has some excellent information on learning how to use Gephi, and there is also a quick start tutorial.

Obtaining API access (Twitter, YouTube, Reddit)

SocialMediaLab collects Twitter, YouTube and Reddit data via the respective free APIs. You can find details on how to get API access here.  Please note that the APIs change over time, so some of the information on that page may be out of date: if you have trouble, I will help during the course.
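Once you have API credentials, one common way to keep them out of your scripts is to store them in environment variables (for example in your .Renviron file) and read them at run time. The sketch below uses base R only; the variable name YOUTUBE_API_KEY and the helper get_api_key() are hypothetical, and this is a general suggestion rather than something the course requires.

```r
# Sketch: read an API key from an environment variable instead of
# hard-coding it in a script. get_api_key() is a hypothetical helper.
get_api_key <- function(var_name) {
  key <- Sys.getenv(var_name, unset = "")
  if (!nzchar(key)) {
    message("Environment variable ", var_name, " is not set.")
    return(NULL)
  }
  key
}

# "YOUTUBE_API_KEY" is an assumed variable name; use whatever name you chose
youtube_key <- get_api_key("YOUTUBE_API_KEY")
```

The value returned can then be passed to whichever collection function you are using, so the key itself never appears in code you share.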

Background slides

The following are some background slides on social media network analysis, from Robert Ackland's masters courses at ANU.

  • Online Research Methods - Introduction
  • Social network analysis