RStudio Public Workshop - San Francisco
Monday, April 28, 2014 at 9:00 AM - Tuesday, April 29, 2014 at 5:00 PM (PDT)
San Francisco, California
April 28-29, 2014 - Introduction to Data Science with R
RStudio is hosting our two-day Introduction to Data Science with R course in the San Francisco Bay Area this April. The workshop is designed to provide a comprehensive introduction to R that will have you analyzing and modeling data with R in no time. All participants will receive copies of the slides, exercises, data sets, and R scripts used in the course.
Course Instructor: Garrett Grolemund - RStudio Master Instructor
Discount pricing is available for academics and nonprofits (33% off) and students (66% off). Space is limited; please contact us to confirm your eligibility.
Who should take this course?
This class will be a good fit for you if you are just starting with R, or if you have dabbled in R but wish to improve your skills. No prior experience with R or data science is required. A basic familiarity with linear models is helpful but not necessary.
What will you learn?
Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to build linear and non-linear models in R. A full list of topics for each day is below.
What should you bring?
Be ready to learn. Bring your laptop with the latest version of R installed. We also recommend that you download the RStudio IDE, which provides a great learning environment for beginners as well as tools you will keep using as you become an advanced user.
Our two-day course teaches you how to analyze data with R. The course is designed for non-programmers as well as data scientists who are switching to R from other software, such as SAS or Excel. The course has been tested by over 300 students and has been honed to provide a clear and painless introduction to R.
Topics cover the three skill sets of data science: computer programming (with R), manipulating data sets (including loading, cleaning, and visualizing data), and modeling data with statistical methods. You will learn R’s syntax and grammar as well as how to load, save, and transform data, generate beautiful graphs, and fit statistical models to your data. We’ll give you a theoretical framework to help you understand the process of data science, but our focus is on practical tools that you can use as soon as you get back from the course.
All techniques are motivated by real problems, and you’ll be exposed to a number of real datasets throughout the course. We alternate brief lectures with hands-on practice: you’ll get plenty of experience actually using R (not just hearing about it!), and there’s plenty of help available if you get stuck.
Each day is organized into four topics.
Day 1 - Getting started and working with data
Monday, April 28, 2014 9:00am - 5:00pm
An Introduction to R - R does more than most statistical software packages. R is a programming language in its own right, an environment for interactive data analysis, and a community of passionate users. This orientation to the R language will get you up and running with R and RStudio. You'll learn:
- How to find resources and help for R
- How to use the R interface and workflow
- How to store and work with data objects in R
R Syntax - Learning to speak R begins with R’s syntax. R has a special notation system that allows you to easily extract, use, or manipulate information inside data objects. In this module you’ll use R’s syntax to clean data and automate tasks that would be nearly impossible to do by hand.
- Learn R’s notation language
- Perform targeted searches within your data
- Use subsetting and missing values to clean data
Visualizing Data - R is well known for its beautiful graphics. R packages like ggplot2 provide an expressive and logical language for building clear and effective data visualizations.
- Visualize the distribution of a variable
- Explore and plot relationships between variables
- Plot very large data sets without overplotting
- Display multivariate relationships in 2D graphs
Customizing Graphs - R gives you complete control over the appearance of your graphics. You can customize them for publication, to highlight important findings, or to enhance your corporate branding.
- Add titles, legends, and guides to your plots
- Control labels and coordinate systems in your plots
- Customize the color schemes in your plots
Day 2 - Manipulating and modeling data in R
Tuesday, April 29, 2014 9:00am - 5:00pm
Loading and Cleaning Data - Data comes in many formats, but R prefers just one. You can save yourself hours of time, and build good habits, by shaping your data sets into the optimum layout for R.
- Loading different data formats into R
- Working with factors in R
- Cleaning poorly formatted data
- Saving your data
Manipulating Data - R’s methods for data manipulation make it easy and fast to extract information from data sets and to prepare raw data for analysis. In this module, you’ll learn how to:
- Subset, transform, summarize, and reorder data sets
- Perform targeted, groupwise operations on data
- Join multiple data sets together
Linear Models - Knowing which variables to include in a model, and what you can infer from the results, are two of the trickiest skills in modeling. They are also two of the most useful. This module teaches you the R tools that can help.
- Fit linear models with R's modeling syntax
- Build and interpret univariate, multivariate, and second order models
- Interpret model coefficients with t-tests and ANOVA tests
- Calculate model statistics such as R², Cp, AIC, and BIC
Modeling Complex Relationships - There are more modeling methods that target more types of data than we could cover in a single day. This module gets your hands dirty with the most generally useful algorithms and ends with an informative survey of all the rest.
- Perform variable selection with best subsets and stepwise regression
- Generalize linear modeling to non-linear relationships with polynomials and splines
- Explore generalized additive models and logistic regression
- Learn the best R packages for more specific modeling methods
In certain cases, we may need to cancel this workshop due to circumstances beyond our control or otherwise. If this happens, RStudio will refund all registration fees for those who signed up. RStudio is not responsible for any related expenses incurred by registered attendees (including but not limited to travel and hotel expenses).
This workshop will be videotaped for use by RStudio. Student faces will not appear in filmed materials.
Until April 13, 2014 - Full refund
April 14, 2014 to April 20, 2014 - 50% refund of registration fees
April 21, 2014 and after - No refund available
All public workshops hosted by RStudio come with a no-questions-asked money-back guarantee.
RStudio is a company dedicated to providing software, education, and services for the R statistical computing environment. We started RStudio because we were excited and inspired by R. To learn more about us, visit our website.