Hands-on Introduction to Data Science - with Paco Nathan (Seattle)

Seattle Data Geeks (Official)

Wednesday, July 10, 2013 from 8:30 AM to 4:30 PM (PDT)

Seattle, WA

Ticket Information

Ticket Type    Sales End   Price     Fee     Quantity
Registration   Ended       $140.00   $4.49

Event Details

Paco Nathan, Chief Scientist at Concurrent and author of the upcoming O'Reilly book Enterprise Data Workflows, is coming to Seattle to teach a one-day, hands-on Introduction to Data Science. The class is based on his workshop for the Bay Area ACM and on the sold-out class he recently taught in Austin, TX.

Course Description

Big Data, Data Science, Cloud Computing... Lots of exciting stuff, lots of media buzz, lots of confusing descriptions. For a programmer armed with a laptop and some knowledge of Bash, Python, Java – what is a good way to begin working with these new tools for handling large-scale unstructured data?

In addition to examining “How” things work, we will take a detailed look at “Why” MapReduce emerged this way – what factors led to the popular frameworks and what typical issues confront large-scale deployments – so that each student is prepared to keep assessing and learning as the field continues to grow and evolve.

Agenda

  * data science history, with video clips from primary sources
  * survey of Big Data frameworks (gentle intro to using the CAP theorem to categorize them)
  * intro to RStudio, simple data visualization in R
  * Hadoop streaming in Python
  * Cascading intro
  * (for advanced students) explore a little Cascalog or Scalding
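
For readers who want a preview of the "Hadoop streaming in Python" item, here is a minimal word-count sketch. It is not taken from the course materials. It assumes the usual streaming contract: the mapper emits key/value pairs, the framework sorts them by key, and the reducer sums each run of identical keys. The function names (map_lines, reduce_pairs, run_local) are hypothetical, and the sort step is simulated in-process so the example runs on a laptop without Hadoop installed.

```python
import itertools


def map_lines(lines):
    """Mapper step: emit one (word, 1) pair per whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reduce_pairs(sorted_pairs):
    """Reducer step: sum the counts for each run of identical keys.

    Assumes the pairs arrive sorted by key, which is what the sort
    phase between mapper and reducer is expected to provide.
    """
    for word, group in itertools.groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in one process, without Hadoop."""
    return list(reduce_pairs(sorted(map_lines(lines))))


if __name__ == "__main__":
    sample = ["big data big buzz", "data science"]
    for word, count in run_local(sample):
        # Tab-separated key/value lines, the conventional streaming format.
        print("%s\t%d" % (word, count))
```

In a real streaming job the mapper and reducer would typically be two separate scripts reading stdin and writing stdout; keeping them as functions here only makes the data flow easier to follow.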

 

Installations Required

Recommended platforms: Linux or Mac OS X

Caveat: absolutely no Cygwin; it just doesn't work. Anyone on Windows will need a VM running Linux. Alternatively, I'll have an EC2 server running with several accounts and the installs already done. RStudio, however, will run great on Windows.

Git

- install according to vendor instructions

Python 2.7.x

- install according to vendor instructions

RStudio (latest version)

http://www.rstudio.com/ide/

Java 1.6.x

- install according to vendor instructions

Apache Hadoop 1.x

- be sure to install for "Standalone Operation"

Gradle 1.4 or later

- install according to vendor instructions

There will be a few other installs that we perform during the class.
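
One way to confirm before class that the command-line tools listed above are reachable is a short PATH check, sketched below. It is not part of the course materials. The command names (git, python, java, hadoop, gradle) are assumptions about how each tool is usually invoked, the helper names are hypothetical, and the script only reports whether an executable is found; it does not verify version numbers. RStudio is a desktop application and is left out.

```python
import os

# Assumed command names: the usual executable name for each required tool.
REQUIRED_TOOLS = ["git", "python", "java", "hadoop", "gradle"]


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_tools(tools=REQUIRED_TOOLS, path=None):
    """Map each tool name to the location found, or None if missing."""
    return dict((tool, find_on_path(tool, path)) for tool in tools)


if __name__ == "__main__":
    for tool, location in sorted(check_tools().items()):
        print("%-8s %s" % (tool, location or "NOT FOUND"))
```

The search is written by hand rather than with a library helper so the same script runs on Python 2.7 as well as newer interpreters. It targets Linux and Mac OS X and does not look for Windows-style .exe names.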

Speaker Bio:

Paco Nathan (@pacoid) is currently the Director of Data Science at Concurrent in SF and a committer on the Cascading open source project. A 25-year veteran of the tech industry, Paco has built and led data teams for the last ten years. He has a background in math/stats and distributed computing, and expertise in Hadoop, R, AWS, predictive analytics, machine learning, and NLP.

Paco is a frequent speaker at data conferences. Most recently, he spoke at Strata and gave the keynote at Data Day Texas. He will be speaking at OSCON in July.

Paco is the author of the upcoming O'Reilly book Enterprise Data Workflows with Cascading.
Paco's Wikipedia page
Paco on Twitter, LinkedIn, SlideShare, GitHub

 

If you have any questions regarding the class, send them to data@lynnbender.com

 

When & Where



Hotel 1000
1000 First Avenue
Seattle, WA 98104

Wednesday, July 10, 2013 from 8:30 AM to 4:30 PM (PDT)

