Introduction to Data Mining with Python
RCC Workshop
Location
John Crerar Library - Zar Room
5730 South Ellis Avenue, Chicago, IL 60637
About this event
Introduction to Data Mining -- Mining News, Public Sentiments, and Basic Textual Analysis with Python
Abstract
The New York Times (NYT) published around 155 articles per day between January 2020 and July 2021. Suppose you are a researcher who reads the NYT to extract information for your projects. At 10 minutes per article, you would need over 25 hours (155 × 10 = 1,550 minutes) just to get through each day's NYT! Forget about reading the Wall Street Journal, Bloomberg, or any other news outlet, or even sleeping, for that matter. How can you get your work done when reading all the news would take more than 24 hours a day?
In this hands-on workshop, we will use data mining to automatically extract (mine) useful information from a large pool of data with computer programs. Data mining has already become an indispensable tool for researchers and industry professionals in many areas, from mining experimental data in a physics lab to mining social media data for advertisers. Researchers from all backgrounds can benefit from learning data mining techniques and best practices.
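As a minimal sketch of what "mining" text automatically can look like, the Python snippet below counts the most frequent content words across a handful of headlines. The headlines, the stop-word list, and the `top_words` helper are made-up placeholders for illustration, not NYT data and not the workshop's own code (the workshop materials live in the GitHub repository listed below).

```python
import re
from collections import Counter

# Hypothetical placeholder headlines standing in for a scraped news corpus.
headlines = [
    "Vaccine rollout speeds up as cases fall",
    "Markets rally as vaccine hopes grow",
    "Cases rise again in several states",
]

# A tiny, illustrative stop-word list; real projects usually use a larger one.
STOP_WORDS = {"as", "up", "in", "again", "the", "a", "of"}


def top_words(texts, n=3):
    """Return the n most common non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep only runs of letters as words.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_words(headlines))
```

The same loop scales from three headlines to thousands of articles, which is the point of mining: the program reads so you do not have to.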
Objectives:
You will walk away from this workshop with some data mining tools:
- A working example of textual analysis code
- Knowledge of how to pull news data
- Knowledge of how to perform web scraping with a Python API
- Knowledge of NLP sentiment analysis and how to mine public sentiment
- Knowledge of how to perform basic string manipulations to handle any web scraping results that do not produce clean metadata
- Knowledge of basic data structures that are useful for your own data mining endeavors in the wild (e.g., graphs), and how to work with them
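Several of these objectives fit together in one small pipeline. The sketch below assumes a hard-coded HTML snippet in place of a live web request. It uses Python's standard-library `html.parser` to pull out paragraph text, cleans that text with basic string manipulation, scores each paragraph against a tiny hypothetical sentiment lexicon, and stores which words appear together as a graph (an adjacency dictionary). The sample HTML, the lexicon, and all function names are illustrative assumptions; practical sentiment analysis typically relies on an established NLP library rather than a hand-made word list.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from itertools import combinations

# Hypothetical HTML standing in for a page fetched by a scraper.
SAMPLE_HTML = """
<html><body>
  <p>Markets   rally on  strong earnings&nbsp;news.</p>
  <p>Investors fear weak growth.</p>
</body></html>
"""

# Tiny illustrative sentiment lexicon (an assumption, not a real resource).
LEXICON = {"rally": 1, "strong": 1, "fear": -1, "weak": -1}


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags."""

    def __init__(self):
        # convert_charrefs defaults to True, so "&nbsp;" arrives as "\xa0".
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def clean(text):
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def sentiment(words):
    """Sum lexicon scores: above 0 leans positive, below 0 leans negative."""
    return sum(LEXICON.get(w, 0) for w in words)


def cooccurrence_graph(sentences):
    """Undirected graph: an edge links two words found in the same paragraph."""
    graph = defaultdict(set)
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


parser = ParagraphExtractor()
parser.feed(SAMPLE_HTML)
sentences = [clean(p).split() for p in parser.paragraphs]

scores = [sentiment(words) for words in sentences]
graph = cooccurrence_graph(sentences)
print(scores)
print(sorted(graph["rally"]))
```

Replacing `SAMPLE_HTML` with the text of a real page (for example, one downloaded with `urllib.request.urlopen`) would run the same steps on live data, subject to that site's terms of use. An adjacency dictionary of sets is chosen here because it makes "which words appear alongside X?" a single lookup.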
Level: Introductory
Duration: 2 hours
Prerequisites: All participants are expected to join from a laptop running Mac, Linux, or Windows. If you do not already have a GitHub account, you can create one on the GitHub website. Prior programming experience is helpful but not required.
GitHub repository: https://github.com/rcc-uchicago/introduction_to_data_mining