July 2015

Big Data Analysis for Social Scientists course - ACSPRI Winter Program 2015


Note: "vosonSML" is the new name for the R package that was previously called "SocialMediaLab". These archived documents may contain materials (e.g. R scripts) that refer to the old package. For up-to-date references please visit the Training workshops page or the vosonSML page.

This page provides resources for participants in the Big Data Analysis for Social Scientists short course taught by Robert Ackland (Australian National University) and Timothy Graham (University of Queensland).  The course is running as part of the ACSPRI Winter Program at the University of Queensland, Brisbane, 29 June - 3 July 2015.

More details on the overall objectives of the course are here, and the present page is where you can find everything else: timetable, course notes, information on software, datasets, R scripts etc. Please note that this page is under development and will change throughout the course.  We advise you not to print any of the notes or handouts on this site, as we will bring hard copies to the course (and there may be version changes).

On this page, you can find:

Timetable and course slides

Timetable (updated: 27 Jun)

Slides:

R scripts used in the course

Note: some of these scripts involve the use of SocialMediaLab (version 0.13) and vosonR (version 0.12), which are now out-of-date (as of 13 July 2015), so some modifications may be required for use with the current versions of SocialMediaLab and vosonR.

Datasets used in the course

  • Sat_Jun_27_13:04:34_2015_EST_YoutubeActorNetwork_FiveFracking.graphml
  • rawtil4_facebook_data.csv
  • thebananagirl_facebook_data.csv
  • Tue_Jun_30_09-05-18_2015_AEST_YoutubeActorNetwork_GitHub.graphml
  • May_26_11_59_47_2015_AEST_FacebookBIMODALNetwork2_TheBananaGirl.graphml
  • StarWars_facebook_bimodal_ACSPRI_Gender_Analysis.graphml
  • abortion4_new_freqALL.csv
  • facebook_temporal_data_starwars.csv
  • Jul_02_21_01_53_2015_AEST_FacebookDynamicBimodalNetwork.graphml
  • Jul_02_16_29_44_2015_EST_FacebookDynamicBimodalNetwork_StopCoalSeamGasBlueMountains.graphml

Exercises

SocialMediaLab and vosonR

We have developed two R packages that will be used in this course:

  • SocialMediaLab.  SocialMediaLab is a tool for collecting social media data and generating networks for analysis.  Version 0.13 (out-of-date): package, documentation

  • vosonR.  vosonR is an R client for the VOSON software (for collecting and analysing WWW hyperlink networks and website text content). Note: you will need to apply for a VOSON account to use this software.  Version 0.12 (out-of-date): package, documentation

General information on using R

Installing R

The course will be run in a lab with Windows computers. You are welcome to use your own laptop for the course,  but we may not be able to offer support if you run into technical problems.  The following are instructions for installing R:

Using R

There are a lot of free resources on the web for learning how to program in R. A Very Short Introduction to R provides a concise introduction to installing RStudio (a graphical user interface for R), and various major aspects such as console commands, data structures, functions, plots, and reading/writing files.

For users who prefer a more 'interactive' approach to learning R, DataCamp's free introduction to R is a very good resource. The focus of these tutorials is more on data structures in R, but the 'hands on' approach enables users to quickly learn how to operate in the R environment.

Norman Matloff's book, The Art of R Programming, is available for free as a PDF. This book is an excellent introduction and 'refresher' to R, and is also applicable for intermediate R users.

Hadley Wickham's companion site for Advanced R provides an extremely in-depth guide for intermediate or advanced R users who wish to hone their skills and knowledge.

General information on using Gephi

While our course is mainly R-based, we will also use the Gephi open source software for network visualisation.  The Gephi website has some excellent information on learning how to use Gephi, and there is also a quick start tutorial.

If you are going to install Gephi yourself, please note that, as discussed here, it does not run on Java (JDK) 1.8. Java 1.7 needs to be installed as well, and a config file needs to be edited to point Gephi to the Java 1.7 directory; otherwise the application won't open.

Here are some additional instructions for installing Gephi on a Mac (this should also work for Java 1.7):

  1. Download and install Java 1.6: http://support.apple.com/kb/DL1572
  2. After downloading Gephi, move it to the Applications directory.
  3. Delete your Gephi settings directory by running the following command at your terminal: rm -r ~/Library/Application\ Support/gephi
  4. Find your Java home by running the following command at your terminal: /usr/libexec/java_home -v 1.6 This should print something like this: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
  5. Edit /Applications/Gephi/Contents/Resources/gephi/etc/gephi with a text editor (right-click on the Gephi application icon, choose "Show Package Contents", then follow the directory path), adding the Java home path line (as found in step 4) to the beginning of the file. This will likely be the following line: jdkhome="/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home"
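For those who prefer the terminal to a text editor, steps 3 to 5 above could be sketched roughly as the shell function below. This is only an illustration under stated assumptions: the function name set_gephi_jdkhome and the .bak backup copy are our own additions and are not part of Gephi, and the config file path is simply the one given in step 5, so check that it matches your installation before running anything.

```shell
# Hypothetical helper (not part of Gephi): put a jdkhome line at the top of
# a Gephi config file, keeping a backup copy next to it.
set_gephi_jdkhome() {
    conf_file="$1"   # path to the Gephi config file (step 5)
    jdk_home="$2"    # Java home directory (output of step 4)

    [ -f "$conf_file" ] || { echo "no such file: $conf_file" >&2; return 1; }

    tmp_file="$(mktemp)" || return 1
    # New first line, followed by the original contents.
    printf 'jdkhome="%s"\n' "$jdk_home" > "$tmp_file"
    cat "$conf_file" >> "$tmp_file"

    cp "$conf_file" "$conf_file.bak"   # backup of the unedited file
    cat "$tmp_file" > "$conf_file"     # overwrite in place, keeping permissions
    rm -f "$tmp_file"
}

# Example use on a Mac, following steps 3 to 5 (left commented out so that
# nothing is deleted or edited just by loading this file):
# rm -r ~/Library/Application\ Support/gephi
# set_gephi_jdkhome /Applications/Gephi/Contents/Resources/gephi/etc/gephi \
#     "$(/usr/libexec/java_home -v 1.6)"
```

If Gephi still fails to open afterwards, the original file can be restored from the .bak copy created alongside it.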

Obtaining API access (Twitter, YouTube, Facebook, VOSON)

With SocialMediaLab, you can collect network and text data from Twitter, YouTube and Facebook.  With vosonR, you can collect WWW hyperlink and website text content via VOSON.  However, for all four data sources (Twitter, YouTube, Facebook and VOSON) you will require access to the respective application programming interfaces (APIs).  This section provides some information on how to get these API credentials.  Note: we will be providing datasets in the course, so you only need to apply for your own API keys if you want to collect your own data from the above sources.

Twitter

To access the Twitter API, you need to have a Twitter account.  Once logged into Twitter, go to the Twitter Apps site and press the "Create a new app" button.  You need to fill in some information, including the name of the app (e.g. "BigDataCourse"), a description (whatever you want) and a website (again, you can put anything here).  You do not need to supply the "Callback URL".  After agreeing to the terms and conditions and (if you haven't already done so) supplying a valid phone number in your Twitter profile, your app will be created.  Go to the "Keys and Access Tokens" tab.  The "API Key" and "API Secret" will need to be supplied to SocialMediaLab.  You should also generate the Access Token and Token Secret: by supplying these to SocialMediaLab, you will avoid having to authenticate via a browser.  For more on Twitter apps, see the Twitter Developers Site.

YouTube

To access the YouTube API, you need to have a Google account.  Once logged into Google, go to the Google APIs Console and create a project (if you don't already have one).  Then go to the APIs&auth->APIs link on the left-hand side and, on the API Library tab, select and enable YouTube Data API.  This API should then appear in the Enabled APIs tab.  Then go to APIs&auth->Credentials and generate a Public API access key.  The API key then needs to be supplied to SocialMediaLab.

Facebook

To access the Facebook API, you need to have a Facebook account.  Once logged into Facebook, go to the Facebook Developers Site and open the "MyApps" page.  There, click on the "Add a New App" button.  You are then asked to select a platform to get started (iOS, Android, Facebook Canvas, WWW); select "Website".  You then reach a page where you are asked to supply a display name (e.g. "BigDataCourse") and choose a category (this is for commercial applications, so you can select any category), and then you press "create App ID".  After getting through the CAPTCHA, you should see a page with your newly created app.  The App ID and the App Secret are what you need to supply to SocialMediaLab.

VOSON

To use vosonR you need a (free) VOSON account - you can get this from the Uberlink website.  Note that Uberlink is a company that was created by Robert Ackland in order to commercialise the VOSON software.  You do not need to pay for a VOSON account, and at the end of the ACSPRI course we will be happy to delete your VOSON account if you wish.  If you do apply for a VOSON account, please mention the ACSPRI course in the "what do you want to use our software for" box.