Jun 26-27, 2014 - Introduction to Data Science with R
NYC Data Science Academy, the training sub-brand of SupStat (an official training partner of RStudio, Inc.), is hosting its two-day Introduction to Data Science with R course in New York City this June. The workshop is designed to provide a comprehensive introduction to R, and we'll get you programming and analyzing data with R in no time. All participants will receive copies of all slides, exercises, data sets, and R scripts used in the course.
We will emphasize how you can get work done easily with the RStudio IDE.
Vivian S. Zhang (CTO of SupStat, Organizer of NYC Open Data Meetup, Founder of NYC Data Science Academy)
Check out our past students' testimonials.
Discount pricing is available for academics (33% off) and students (66% off). Space is limited; please email us to confirm your eligibility.
Who should take this course?
This class will be a good fit for you if you are just starting with R or have dabbled in R, but wish to improve your skills. No prior experience with R or data science is required.
What will you learn?
Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to do basic programming in R. A full list of topics for each day is below.
What should you bring?
Be ready to learn. You need your laptop with the latest version of R installed. We also recommend downloading the RStudio IDE, as it provides a great learning environment for beginners as well as tools you'll keep using as you become an advanced user.
Are you interested in better understanding your data, and not so interested in mastering a programming language? Have you tried learning R from a book or website, but have been discouraged? If so, this is the course for you. We assume that you've never programmed before (although some experience doesn't hurt), and we teach you the best tools to help analyze your data.
You won't be a master programmer by the end of this two-day course, but through immersion you will have learned the basics of R's syntax and grammar, and you'll have started building an effective R vocabulary for visualizing, transforming, and modeling data. You will learn how to load, save, and transform data as well as how to write functions, generate beautiful graphs, and fit basic statistical models to your data. We'll give you a theoretical framework to help you understand the process of data analysis, but our focus is on practical tools that you can use as soon as you get back from the course.
All techniques are motivated by real problems, and you'll be exposed to a number of real datasets throughout the course. We alternate brief lectures with hands-on practice: you'll get plenty of experience actually using R (not just hearing about it!), and there's plenty of help available if you get stuck. The course concludes with a 90-minute data analysis project. You can use this as an opportunity to start using R with your data, or work on answering some of our questions about a dataset.
This tried-and-true course has been taken by over 200 students, from biologists to humanists, many of whom had never programmed before. It teaches the basic skills needed by anyone seriously interested in data.
Day 1 - Getting started and working with data
Thursday, June 26th, 2014
An Introduction to R and data analysis - R is more than just a programming language. R is a statistical software application in its own right, an environment for interactive data analysis, and a community of passionate users. This orientation to the R language will help you get up and running.
- How to download and update R and RStudio
- How to find resources and help for R
- Stages of data analysis
- Best practices of data analysis
Visualizing data - R is well known for its beautiful graphics. R packages, like `ggplot2`, provide an expressive and logical language for building clear and effective data visualizations.
- Visualize the distribution of a variable
- Exploring and plotting relationships between variables
- Display very large data sets through graphs without over-plotting
- Use best practices for Exploratory Data Analysis in R code
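To give a flavor of this session, here is a minimal sketch (our illustration, not the course material) using `ggplot2` and two data sets that ship with it, `diamonds` and `mpg`. It assumes `ggplot2` is installed (`install.packages("ggplot2")`); 2-D binning is just one of several ways to avoid overplotting tens of thousands of points.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Distribution of one variable: histogram of diamond prices
p_dist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

# Relationship between two variables, with a loess smoother for the trend
p_rel <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Large data (about 54,000 rows): count points per 2-D tile
# instead of drawing every point on top of the others
p_big <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 50)
```

Printing any of these objects, e.g. `print(p_big)`, draws the plot. Because each tile shows a count, dense regions stay readable where a plain scatter plot would become a solid blob.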
Working with data - R is a programming language with a purpose: to analyze data. Learning how R stores and handles data will help you apply R to any data source.
- Loading different data formats into R
- Working with factors in R
- How to clean poorly formatted data
- Saving your data
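The load, clean, and save cycle can be sketched in base R as follows. The messy CSV and its columns `id`, `grade`, and `score` are invented for illustration and written to a temporary file so the example runs anywhere.

```r
# Write a small, deliberately messy CSV to a temporary file
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,grade,score",
  "1, high ,90",
  "2,low,75",
  "3,HIGH,n/a",
  "4,medium,82"
), raw_path)

# Load: treat "n/a" as missing; keep text as character so it can be cleaned
df <- read.csv(raw_path, na.strings = "n/a", stringsAsFactors = FALSE)

# Clean: trim stray whitespace and normalise case before making a factor
df$grade <- tolower(trimws(df$grade))
# Factor with an explicit, meaningful level order (not alphabetical)
df$grade <- factor(df$grade, levels = c("low", "medium", "high"))

# Save: RDS preserves column types and factor levels; CSV is more portable
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
clean <- readRDS(rds_path)
str(clean)
```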
Manipulating data - R's methods for data manipulation make it easy and fast to extract information from data sets and to prepare raw data for analysis.
- Subset, transform, summarize, and reorder data sets
- Perform targeted, groupwise operations on data
- Join multiple data sets together
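Here is a dependency-free base-R sketch of these operations on the built-in `mtcars` data. The course may use other tools, such as add-on packages; `subset()`, `transform()`, `order()`, `aggregate()`, and `merge()` are simply one way to do it, and the `cyl_labels` lookup table is made up for the example.

```r
cars_df <- mtcars
cars_df$model <- rownames(mtcars)

# Subset: cars with more than 4 cylinders, keeping selected columns
big <- subset(cars_df, cyl > 4, select = c(model, mpg, cyl, wt))

# Transform: derived column (mtcars weight is in units of 1000 lb)
big <- transform(big, wt_lb = wt * 1000)

# Reorder: best fuel economy first
big <- big[order(-big$mpg), ]

# Groupwise summary: mean mpg for each cylinder count
by_cyl <- aggregate(mpg ~ cyl, data = cars_df, FUN = mean)

# Join: attach a small lookup table on the shared key "cyl"
cyl_labels <- data.frame(cyl = c(4, 6, 8), size = c("small", "mid", "large"))
joined <- merge(by_cyl, cyl_labels, by = "cyl")
print(joined)
```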
Day 2 - Programming and modeling in R
Friday, June 27th, 2014
Programming in R - Many people use R as an application, a sort of statistical calculator, but R is also a programming language. Once you learn to program in R, you will be a more versatile and capable data analyst. You'll learn to write code that provides the precise solutions you are looking for.
- Write if/else statements
- Write and optimize for and while loops in R
- Use best practices for programming in R
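A minimal base-R sketch of the control-flow ideas above. The numbers are arbitrary; what matters is the shape of each construct and the habit of preallocating (or vectorizing) instead of growing a vector inside a loop.

```r
# if/else: classify a single value
x <- -3
if (x > 0) {
  label <- "positive"
} else if (x < 0) {
  label <- "negative"
} else {
  label <- "zero"
}

# for loop: preallocate the result rather than growing it each iteration
n <- 10
squares <- numeric(n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# Vectorised equivalent: shorter and usually faster than any loop
squares_vec <- seq_len(n)^2

# while loop: keep halving until the value drops below 1
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}
```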
R functions - Functions allow you to save your code for later or to share it with other R users. Knowing how to write a function will also streamline your workflow. Functions give code a more efficient structure that avoids duplication and aids debugging.
- Organize a problem into a series of functions
- Write a function in R
- Apply best practices for writing functions in R
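As an illustration, the hypothetical `standardize()` function below wraps a small computation once, validates its input, and is then reused across several columns with `lapply()` instead of being copy-pasted.

```r
# Hypothetical helper: convert a numeric vector to z-scores
standardize <- function(x, na.rm = TRUE) {
  # Fail early with a clear message on the wrong input type
  if (!is.numeric(x)) stop("`x` must be numeric")
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# Reuse the same function on several columns
z <- as.data.frame(lapply(mtcars[, c("mpg", "hp", "wt")], standardize))

# Each standardized column should have mean ~0 and standard deviation 1
round(colMeans(z), 10)
sapply(z, sd)
```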
Simulation in R - Simulating data provides a way to test hypotheses and discover the uncertainty in your estimates.
- Generate random numbers in R
- Visualize uncertainty with bootstrapping in R
- Construct a confidence interval with bootstrapping in R
- Test a hypothesis with a permutation test in R
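The sketch below uses the built-in `mtcars` data to compare the fuel economy of manual and automatic cars. The 5,000 resamples, the percentile interval, and the difference in means as the statistic are our assumptions for illustration; other bootstrap intervals and test statistics are equally valid.

```r
set.seed(42)  # make the random draws reproducible

# Random numbers: five draws from a standard normal distribution
rnorm(5)

# Fuel economy for automatic (am == 0) and manual (am == 1) cars
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
obs_diff <- mean(manual) - mean(auto)

# Bootstrap: resample each group with replacement, recompute the difference
n_boot <- 5000
boot_diffs <- replicate(n_boot, {
  mean(sample(manual, replace = TRUE)) - mean(sample(auto, replace = TRUE))
})
hist(boot_diffs)  # visualise the uncertainty in the estimate
# 95% percentile confidence interval
ci <- quantile(boot_diffs, c(0.025, 0.975))

# Permutation test: shuffle group labels to simulate "no difference"
pooled <- c(auto, manual)
n_manual <- length(manual)
perm_diffs <- replicate(n_boot, {
  idx <- sample(length(pooled), n_manual)
  mean(pooled[idx]) - mean(pooled[-idx])
})
# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
```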
Modeling in R - R excels at statistical analysis and modeling, but its methods for modeling may seem unintuitive at first.
- Write a formula in R
- Fit a model to data in R
- Compare models
- Explore data sets with models
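A short sketch of R's formula interface with `lm()` on `mtcars`; the two models are chosen only for illustration. `anova()` compares the nested fits with an F-test, and `AIC()` gives a second, complexity-penalized view of fit.

```r
cars_df <- mtcars

# Formula syntax: response ~ predictors
fit_wt    <- lm(mpg ~ wt, data = cars_df)
fit_wt_hp <- lm(mpg ~ wt + hp, data = cars_df)

summary(fit_wt_hp)  # coefficients, standard errors, R-squared

# Compare nested models: F-test for the extra term; lower AIC is better
anova(fit_wt, fit_wt_hp)
AIC(fit_wt, fit_wt_hp)

# Explore the data with a model: which cars does it fit worst?
cars_df$resid <- resid(fit_wt_hp)
top_resid <- head(cars_df[order(-abs(cars_df$resid)),
                          c("mpg", "wt", "hp", "resid")])
print(top_resid)
```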
In certain cases, we may need to cancel this workshop due to circumstances beyond our control or otherwise. If this happens, SupStat will refund all registration fees for those who signed up. SupStat is not responsible for any related expenses incurred by registered attendees (including but not limited to travel and hotel expenses).
Until June 15th, 2014 - Full refund, less 10% of registration fees
June 15th, 2014 to June 21st, 2014 - 50% refund of registration fees
June 22nd, 2014 and after - No refund available
All public workshops hosted by SupStat come with a no-questions-asked money-back guarantee.
Our services include consulting on statistical methods, software training in statistical computing and data analysis (mainly R), statistical graphics and data visualization, and statistical reporting. We have offices in Beijing, Shanghai, and New York. Our team includes Kagglers ranked in the top 0.2% (www.kaggle.com hosts excellent data mining competitions and has gathered more than 100K data scientists). For business inquiries, please email email@example.com.