HAI Seminar with Erik Altman

Visiting scholars share their research with the HAI community.

By Stanford HAI

Date and time

Wednesday, May 7 · 12 - 1:15pm PDT

Location

Gates Computer Science Building Room 119

353 Serra Mall, Stanford, CA 94305

About this event

  • Event lasts 1 hour 15 minutes

Synthetic Data Sets: Use Cases for the Financial Industry


IBM Synthetic Data Sets (SDS) were created for use cases in the financial industry. One key focus is fraud and other criminal activity, which costs hundreds of billions of dollars per year or more. SDS labels many of these criminal activities, including money laundering, credit card fraud, check fraud, APP (Authorized Push Payment) fraud (scams), and insurance claims fraud. As such, SDS data provides an attractive foundation for training AI detection models.

Unlike much current activity around synthetic data generation, SDS is not built using large language models. Instead, SDS uses an agent-based virtual world approach. A key advantage of the SDS design is that all labels are correct: all fraud is labelled fraud, and only fraud is labelled fraud. By contrast, much criminal activity in the real world goes undetected, including 95% of money laundering by a UN estimate. Hence, even when real data is available, its labels are often too incomplete for training detection models or for generating synthetic data.

In practice, access to real data is generally limited to a small number of people at the institution (e.g. a bank) that owns the data. As such, real data provides only a narrow view of activity at a single institution, as opposed to the global view provided by SDS data. The SDS approach also yields a broad set of synthetic personal information. This information is highly realistic despite using no information from real individuals.

Developing effective techniques for SDS has required deep expertise across diverse areas, as well as significant manual effort. How to automate some of this effort remains an open challenge, as do calibration, scaling, and other areas.


Details:

Time: 12:00 pm - 1:15 pm PT

Location: Gates Computer Science Building, Room 119, 353 Jane Stanford Way, Stanford, CA 94305
