Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research - University of Houston

By HathiTrust Research Center

When and where

Date and time

Friday, May 18, 2018 · 9am - 4pm CDT


University of Houston M.D. Anderson Library, Digital Research Commons, Room 266-C, 4333 University Drive, Houston, TX 77204


This free, all-day workshop introduces text analysis research and the common methods and tools used in this emerging area of scholarship, with particular attention to the tools and data of the HathiTrust Research Center. You will learn the core concepts and methods of text mining and related areas of digital scholarship, begin building a framework for how libraries can support text data mining, and gain transferable skills useful in many other areas of digital scholarly inquiry.

Topics include:

  • Introduction to gathering, managing, analyzing, and visualizing textual data

  • Hands-on experience with text analysis tools and data, including the HTRC's off-the-shelf algorithms and datasets such as the HTRC Extracted Features

  • Using Python and the command line to run basic text analysis processes
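
As a rough illustration of what a basic text analysis process in Python can look like, here is a minimal sketch that counts the most frequent words in a passage. It is not taken from the workshop materials: the function name top_words and the sample sentence are hypothetical, and the tokenizer is a deliberately simple assumption (lowercase ASCII letters only).

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most frequent word tokens in text as (word, count) pairs.

    Hypothetical helper for illustration only. Tokenization is simplistic:
    the text is lowercased and split into runs of the letters a-z.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(n)


# Made-up sample text, not drawn from any HathiTrust dataset.
sample = "The whale, the sea, and the ship: the whale wins."
print(top_words(sample, 2))  # [('the', 4), ('whale', 2)]
```

Saved to a file, a script like this can be run from the command line with the python interpreter.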

No experience necessary! Attendees should bring a laptop.

The workshop will run from 9:00 a.m. to 4:00 p.m. with a one-hour break for lunch.

Contact htrc_workshop@library.illinois.edu if you have questions.

Funded by IMLS RE-00-15-0112-15.

About the organizer

The HathiTrust Research Center (HTRC) gives nonprofit and educational users computational access to published works in the public domain from the HathiTrust and, in the future, access on limited terms to in-copyright works.

The HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library. It helps researchers meet the technical challenges of working with massive amounts of digital text by developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to the growing digital record of human knowledge.

Leveraging data storage and computational infrastructure at Indiana University and the University of Illinois at Urbana-Champaign, the HTRC will provision a secure computational and data environment for scholars to perform research using the HathiTrust Digital Library. The center will break new ground in the areas of text mining and non-consumptive research, allowing scholars to make full use of the content of the HathiTrust Library while preventing intellectual property misuse within the confines of current U.S. copyright law.