January - February 2018
ACSPRI Summer Program 2018 - Big Data Analysis for Social Scientists
This page provides resources for participants in the Big Data Analysis for Social Scientists short course taught by Robert Ackland (Australian National University). The course is running as part of the ACSPRI Summer Program 2018 at the University of Melbourne, 29 January - 2 February 2018.
Please note that this page is under development, and will be added to throughout the course.
On this page, you can find:
- R scripts used in the course
- Datasets used in the course
- Other information:
  - SocialMediaLab and VOSON
  - Using R
  - Using Gephi
  - Obtaining API access (Twitter, YouTube, Facebook)
  - Some background slides
R scripts and other materials used in the course
- Introduction to R and RStudio: Rmd, pdf
- Introduction to R exercise: pdf; Answers: pdf
- Divided They Blog - Introduction to SNA with R/igraph (part 1): Rmd, pdf
- SocialMediaLab Introduction: pdf
- SocialMediaLab Tutorial: Rmd, pdf
- Introduction to SNA with R/igraph (part 2): Rmd, pdf
- WWW abortion debate SNA exercise: WWW_Abortion_Debate_exercise.pdf
- Introduction to Text Analysis in R - Environmental Activist Websites: Rmd, pdf
- WWW abortion debate text analysis exercise: WWW_Abortion_Debate_Text_Analysis_exercise.pdf
- Gender analysis: Rmd, pdf
- Using igraph for graph projection: projectionsFacebook_using_igraph.R
- Using Jaccard similarity index for graph projection: projectionsFacebook_with_Jaccard.R
- Dynamic network analysis: dynamic_network_analysis_ACSPRI.r
- Collecting and aggregating multiple networks - Twitter: collect_and_aggregate_networks_Twitter.R
- Dynamic network visualisation using ndtv - Facebook example: dynamicViz_ndtv_ex3.R
- Function for converting SocialMediaLab networks (facebook, twitter) to ndtv format: export_ndtv.R
- Dynamic network visualisation using ndtv - Twitter example: dynamicViz_ndtv_twitter.R
- Revised function for converting SocialMediaLab networks (facebook, twitter) to ndtv format: export_ndtv_v2.R
- Sentiment analysis - Facebook comment data: sentiment_analysis_Facebook_comments.R
- Constructing semantic networks with ngrams: semanticNetworks.R
- Please note that the scripts below are from the 2017 course and some of them will be modified for 2018! We will be covering similar content in 2018 but the following should only be used as a guide. You are welcome to work ahead but please understand that scripts may be different by the time we get to them in the course.
- Read graph and dataframe (from Facebook): readFacebookDataframeGraph.R
- Create a corpus using text files in a directory: createCorpus.R
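To illustrate the idea behind projecting a bimodal network with the Jaccard similarity index, here is a minimal base R sketch. It is not the course script (projectionsFacebook_with_Jaccard.R); the toy incidence matrix, the user and page names, and the helper name jaccard_projection are all assumptions made for illustration. Two pages are tied with a weight equal to the number of users they share divided by the number of users who are connected to either page.

```r
# Toy bimodal (two-mode) data: rows are users, columns are pages.
# A 1 means the user interacted with the page. All values are made up.
inc <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                1, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("user", 1:4),
                              c("pageA", "pageB", "pageC")))

# Hypothetical helper: project the columns of an incidence matrix onto a
# one-mode network whose tie weights are Jaccard similarities.
jaccard_projection <- function(inc) {
  inc <- (inc > 0) * 1                      # binarise the incidence matrix
  shared <- crossprod(inc)                  # users shared by each pair of pages
  sizes <- diag(shared)                     # number of users per page
  union_size <- outer(sizes, sizes, "+") - shared
  sim <- ifelse(union_size > 0, shared / union_size, 0)
  diag(sim) <- 0                            # no self-ties in the projection
  sim
}

sim <- jaccard_projection(inc)
print(sim)
```

The resulting matrix can be treated as a weighted adjacency matrix for the page-by-page network. Applying the same helper to t(inc) would give the user-by-user projection instead.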
Datasets used in the course
- Divided They Blog (40 A-listers) csv files: edges, vertices
- Abortion debate WWW hyperlink network csv files: edges, vertices
- Environmental activist website meta keywords: nano2seeds_v2.csv
- YouTube actor network (anti-fracking): Sat_Jun_27_13-04-34_2015_EST_YoutubeActorNetwork_FiveFracking.graphml
- Facebook bimodal network (Coal Seam Gas): g_bimodal_facebook_csg.graphml
- Facebook bimodal network (Star Wars): g_bimodal_facebook_star_wars.graphml
- Facebook bimodal network (Star Trek): g_bimodal_facebook_star_trek.graphml
- Facebook bimodal network (Star Wars - large): Jul_02_21_01_53_2015_AEST_FacebookDynamicBimodalNetwork.graphml
- Facebook bimodal dynamic network (Star Wars): g_bimodal_facebook_starwars_dynamic.graphml
- Facebook dataframe (Coal Seam Gas): 2015-06-23_to_2015-07-23_StopCoalSeamGasBlueMountains_FacebookData.csv
- Abortion debate WWW hyperlinks and meta keywords: ProChoiceProLife_withKeywords.csv
- Facebook dataframe (Star Wars): 2015-03-01_to_2015-03-02_StarWars_FacebookData.csv
- Facebook dataframe (Star Trek): 2015-03-01_to_2015-03-02_StarTrek_FacebookData.csv
- Facebook dataframe (Star Wars - large): facebook_temporal_data_starwars.csv
- Data for network aggregation (Twitter): data_for_agg.zip
- Aggregated/combined Twitter network: twitter_combined.graphml
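Several of the datasets above come as a pair of csv files (edges and vertices). The following is a minimal sketch of how such a pair might be turned into an igraph network. The column names (from, to, name, leaning) and the toy rows are assumptions for illustration only and are not the contents of the course files.

```r
library(igraph)

# Toy stand-ins for an edges file and a vertices file (hypothetical columns).
# With real files, read.csv() on each csv would replace these two data frames.
edges <- data.frame(from = c("blogA", "blogB", "blogC"),
                    to   = c("blogB", "blogC", "blogA"),
                    stringsAsFactors = FALSE)
vertices <- data.frame(name    = c("blogA", "blogB", "blogC"),
                       leaning = c("left", "right", "left"),
                       stringsAsFactors = FALSE)

# Build a directed network; extra vertex columns become vertex attributes.
g <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

vcount(g)        # number of nodes
ecount(g)        # number of ties
V(g)$leaning     # vertex attribute carried over from the vertices table
```

The graphml datasets do not need this step: igraph's read_graph() function with format = "graphml" reads them directly.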
SocialMediaLab and VOSON
We use the following in the course:
- SocialMediaLab. SocialMediaLab is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Facebook, Twitter, and YouTube) and generating different types of networks for analysis. Some background on SocialMediaLab.
- VOSON. VOSON is software for collecting and analysing WWW hyperlink networks and website text content. Note: you will need to apply for a VOSON account from the Uberlink website (Uberlink is a company that was created by Robert Ackland in order to commercialise the VOSON software). You do not need to pay for a VOSON account, and at the end of the ACSPRI course we will be happy to delete your VOSON account if you wish. If you do apply for a VOSON account, please mention the ACSPRI course in the "what do you want to use our software for" box.
General information on using R
Installing R and RStudio
The course this year is being run with participants using their own laptops, so you need to have R and RStudio installed and working on your computer before the course commences. I will be able to provide technical assistance with using R and RStudio (and we will have a brief refresher/introduction on the first day), but the software itself must already be installed. The following are instructions for installing R:
- Windows users should download the latest executable from the R-Project website.
- Mac users should download the latest "pkg" binary from the R-Project website.
- Linux users can also obtain R binaries for different Linux distributions.
You can install RStudio (Open Source Licence) here.
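Once R is installed, a quick check along the following lines can confirm that it runs and show which packages still need to be installed. This is only a sketch: the helper name missing_packages is hypothetical, and the package list is an assumption based on the tools named on this page rather than an official list of course requirements.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Print the version of R that is running.
cat(R.version.string, "\n")

# Assumed package list, taken from tools mentioned on this page.
course_pkgs <- c("igraph", "SocialMediaLab", "ndtv")
to_install <- missing_packages(course_pkgs)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install; whether each package is available from your
  # configured repository is not guaranteed.
  # install.packages(to_install)
} else {
  message("All listed packages are installed.")
}
```

Running this in the RStudio console before the first day gives a short list of anything that still needs attention.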
Using R
There are a lot of free resources on the web for learning how to program in R. A Very Short Introduction to R provides a concise introduction to installing RStudio (a graphical user interface for R), and various major aspects such as console commands, data structures, functions, plots, and reading/writing files.
For users who prefer a more 'interactive' approach to learning R, DataCamp's free introduction to R is a very good resource. The focus of these tutorials is more on data structures in R, but the 'hands on' approach enables users to quickly learn how to operate in the R environment.
Norman Matloff's book, The Art of R Programming, is available for free as a PDF. This book is an excellent introduction to R, works well as a 'refresher', and is also suitable for intermediate R users.
Hadley Wickham's companion site for Advanced R provides an extremely in-depth guide for intermediate or advanced R users who wish to hone their skills and knowledge.
General information on using Gephi
While our course is mainly R-based, we will also use the Gephi open source software for network visualisation. The Gephi website has some excellent information on learning how to use Gephi, and there is also a quick start tutorial.
Obtaining API access (Twitter, YouTube, Facebook)
SocialMediaLab collects Twitter, YouTube and Facebook data via the respective free APIs. You can find details on how to get API access here.
Background slides
The following are some background slides on social media network analysis, from Robert Ackland's masters courses at ANU.
- Online Research Methods - Introduction
- Social network analysis
- Homophily
- WWW hyperlink networks
- Threaded conversation networks
- Microblogs