Introduction to Data Mining with Python

RCC Workshop

By Research Computing Center

Date and time

Thursday, November 18, 2021 · 2 - 4pm CST

Location

John Crerar Library - Zar Room

5730 South Ellis Avenue Chicago, IL 60637

About this event

Introduction to Data Mining -- Mining News, Public Sentiments, and Basic Textual Analysis with Python

Abstract

Between January 2020 and July 2021, The New York Times (NYT) published around 155 articles a day. Suppose you are a researcher who wants to read the NYT to extract information for your projects: at 10 minutes per article, you would need over 25 hours (155 × 10 minutes ≈ 25.8 hours) just to finish reading each day's NYT! You can forget about reading the Wall Street Journal, Bloomberg, or any other news outlet, or even sleeping, for that matter. How can you keep up when a single day's news takes more than 24 hours to read?

In this hands-on workshop, we will use data mining to automatically extract (mine) useful information from a large pool of data with computer programs. Data mining has already become an indispensable tool for researchers and industry professionals in many areas, from experimental data mining in a physics lab to social media data mining by advertisers. Researchers from all backgrounds can benefit from learning data mining techniques and best practices.

Objectives:

You will walk away from this workshop with some data mining tools:

  • A working example of textual analysis code
  • Knowledge of how to pull news data
  • Knowledge of how to perform web scraping with a Python API
  • Knowledge of NLP sentiment analysis, and how to mine public sentiments
  • Knowledge of how to perform basic string manipulations to handle any web scraping results that do not produce clean metadata
  • Knowledge of basic data structures that are useful for your own data mining endeavors in the wild, e.g., graphs, and how to work with them
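
To give a flavor of the last few items, here is a minimal, illustrative Python sketch. It is not taken from the workshop materials: the sample headlines, the tiny word lists, and all function names are made up for illustration, and the sentiment score is a toy word-count approach rather than a real NLP model. It only shows the general shape of cleaning scraped text, scoring it, and storing relationships in a graph.

```python
import re
from collections import defaultdict
from html import unescape

# Hypothetical toy lexicon; real sentiment tools rely on far larger
# word lists or trained models.
POSITIVE = {"good", "great", "gain", "improve"}
NEGATIVE = {"bad", "crisis", "loss", "decline"}


def clean_text(raw):
    """Strip HTML tags, decode entities, and collapse repeated whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(tokens):
    """Count positive words minus negative words (a deliberately crude score)."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def cooccurrence_graph(docs):
    """Build an undirected graph as an adjacency dict.

    Two words are connected if they appear in the same document.
    """
    graph = defaultdict(set)
    for tokens in docs:
        unique = set(tokens)
        for word in unique:
            graph[word] |= unique - {word}
    return graph


# Made-up, messy "scraped" headlines used only for this demonstration.
headlines = [
    "<h2>Markets  gain as jobs &amp; wages improve</h2>",
    "<h2>Energy crisis deepens; analysts fear loss</h2>",
]

docs = [tokenize(clean_text(h)) for h in headlines]
scores = [sentiment_score(d) for d in docs]
graph = cooccurrence_graph(docs)

for headline, score in zip(headlines, scores):
    print(clean_text(headline), "->", score)
print("Words linked to 'gain':", sorted(graph["gain"]))
```

Storing the graph as a dictionary of sets keeps the example dependency-free; a dedicated graph library would be a reasonable alternative for larger projects.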

Level: Introductory

Duration: 2 hours

Prerequisites: All participants are expected to join from a laptop running macOS, Linux, or Windows. If you do not already have a GitHub account, please create one before the workshop. Prior programming experience is helpful but not required.

GitHub repository: https://github.com/rcc-uchicago/introduction_to_data_mining
