$50 – $125

PyData Indy

Event Information

Date and Time

Location

Launch Fishers

12175 Visionary Way

Fishers, IN 46038

Refund Policy

Refunds up to 1 day before event

Event description

Join us for this one-day special event when we discuss best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization.

Sparkflow: Utilizing PySpark for Training TensorFlow Models on Large Datasets
By: Derek Miller, LifeOmic
30 mins, Intermediate

As more large public datasets become available, distributed data-processing tools such as Apache Spark are vital for data scientists. While SparkML provides many machine learning algorithms, standard pipelines, and a basic linear algebra library, it does not support training deep learning models. With the rise of TensorFlow over the last two years, LifeOmic built the Sparkflow library to combine the power of Spark's Pipeline API with training deep learning models in TensorFlow. Sparkflow uses the Hogwild algorithm to train deep learning models in a distributed manner, leveraging Spark's driver/executor architecture underneath to manage copied networks and gradients. In this session, we describe some of the lessons learned in building Sparkflow, the pros and cons of asynchronous distributed deep learning, how to use Spark Pipelines with TensorFlow in very few lines of code, and where the library is headed in the near future.
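Sparkflow itself runs on a Spark cluster, but the core Hogwild idea — many workers applying gradient updates to shared weights without any locking — can be illustrated with a toy, stdlib-plus-NumPy sketch (this is not Sparkflow's implementation, just the concept):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Toy Hogwild: four workers run SGD against one shared weight vector, no locks.
# Model: noiseless least squares y = X @ w_true, so SGD can converge exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ w_true

w = np.zeros(5)   # shared parameters, updated in place by every worker
lr = 0.01

def worker(indices):
    for i in indices:
        xi = X[i]
        grad = (xi @ w - y[i]) * xi   # gradient of 0.5 * (xi.w - y_i)^2
        w[:] -= lr * grad             # lock-free in-place update (Hogwild)

shards = np.array_split(rng.permutation(len(X)), 4)  # one data shard per worker
with ThreadPoolExecutor(max_workers=4) as ex:
    for _ in range(20):               # epochs
        list(ex.map(worker, shards))

print(np.round(w, 2))   # close to w_true despite unsynchronized updates
```

The surprising part, and the point of the Hogwild paper, is that the races between readers and writers of `w` do not prevent convergence when updates are sparse or small; Sparkflow applies the same idea with Spark executors in place of threads.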


Policy on a Page: Operational Workflow for Ad-Hoc Analyses
By: Aaron Burgess, State of Indiana Family & Social Services Administration (FSSA), Division of Data & Analytics
30-45 mins, Intermediate

The majority of requests to FSSA Data & Analytics are ad-hoc analyses, which had suffered from two major issues. One was an inferred belief that stakeholders wanted data points when they really wanted a statement of fact that could be cited or applied. The other was an over-simplified workflow for ad-hoc requests that featured no version control and "wild west" peer review.

The solution to managing the workflow for ad-hoc analyses was to immediately implement Git and (due to existing licenses) Bitbucket policies and procedures. This included a standard ad-hoc repo template and repeated training on best practices, such as immediately opening a pull request upon branch creation and clear peer-review responsibilities. JIRA was already in place for task management. In addition, Bamboo was used to introduce continuous integration, in which ad-hoc requests are run automatically on every data refresh, with change-detection scripts as an early warning system. Finally, using Jupyter Notebooks and their extensions to deliver well-groomed HTML exports of deliverables became standard practice. These deliverables focus on clearly defining an objective, methodology, results, and a "statement of fact" for use by stakeholders.

The implemented changes have resulted in clearer expectations for Data & Analytics team members. The standard Jupyter Notebook HTML extracts have been well received by stakeholders and have greatly reduced the level of "data heavy; information light" deliverables. The resulting trust from stakeholders has increased our request load and opened up opportunities to work on more complex modeling.
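The change-detection "early warning" step could take many forms; one minimal sketch (the file names and state layout here are hypothetical, not FSSA's actual scripts) is to record a hash of each data extract when the analysis is validated, then compare on every refresh:

```python
import hashlib
import json
from pathlib import Path

# Hypothetical per-repo state file mapping extract path -> last-seen SHA-256
STATE_FILE = Path("adhoc_state.json")

def extract_changed(extract_path: str) -> bool:
    """Return True if the data extract differs from the last recorded version."""
    digest = hashlib.sha256(Path(extract_path).read_bytes()).hexdigest()
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    changed = state.get(extract_path) != digest
    state[extract_path] = digest          # remember the new version
    STATE_FILE.write_text(json.dumps(state))
    return changed
```

A CI job (Bamboo, in this case) could then rerun and re-export the notebook only when `extract_changed(...)` reports a difference, flagging the request for review.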



Data Visualization with Bokeh
By: James Alexander, Leaf Software Solutions
30 mins, Beginner

Learn how to create interactive charts and graphs without writing any JavaScript. We'll use Python to generate simple interactive graphs and plots within Jupyter notebooks and embed them in a running Django site. I'll show examples of streaming data to a Bokeh instance and of interactively exploring a large dataset using Datashader.
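As a taste of what Bokeh looks like, a minimal interactive plot takes only a few lines of Python (a generic sketch, not the talk's actual examples):

```python
from bokeh.plotting import figure, show
# In a Jupyter notebook, also run:
#   from bokeh.io import output_notebook; output_notebook()

# An interactive line + scatter plot with pan/zoom/hover tools -- no JavaScript
p = figure(title="Sensor readings",
           x_axis_label="time (s)", y_axis_label="value",
           tools="pan,wheel_zoom,box_zoom,reset,hover")
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]
p.line(x, y, legend_label="reading", line_width=2)
p.scatter(x, y, size=8)
# show(p)  # opens the plot in a browser (or renders inline in a notebook)
```

Bokeh serializes the figure to a JSON document rendered by BokehJS in the browser, which is what makes the same object embeddable in a notebook or a Django template.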


Brief Intro to Natural Language Processing (NLP)
By: Andrew (AJ) Rader, DMC Insurance, Inc
45 mins, Beginner

Natural Language Processing (NLP) is a broad domain that deals with analyzing and understanding human text and words. Some typical areas of application for NLP include text classification, speech recognition, machine translation, chatbots, and caption generation. Fundamentally, NLP involves converting words into numbers and doing math on those numbers in order to identify relationships between the words and the documents they live in. The goal of this talk is to present the basic theory of what NLP is and demonstrate how to utilize machine learning approaches in Python to extract insights from text. An example text classification problem is presented, illustrating the steps required to ingest, preprocess, build, and test a model for an example text corpus.
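The "words into numbers" step is commonly done with TF-IDF; a minimal text-classification sketch in scikit-learn (using a made-up toy corpus, not the talk's dataset) looks like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus: 1 = positive review, 0 = negative review
docs = [
    "great product, works well",
    "terrible, it broke immediately",
    "works great, very happy with it",
    "awful experience, broke after a day",
]
labels = [1, 0, 1, 0]

# TF-IDF turns each document into a sparse numeric vector;
# logistic regression then learns weights over those word features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

pred = model.predict(["great, works perfectly"])[0]
```

Words unseen during training (like "perfectly" here) are simply dropped by the vectorizer, which is one reason real pipelines add preprocessing steps such as stemming or larger training corpora.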


Building IoT Data Pipelines with Python
By: Logan Wendholt, Bastian Solutions
30-40 mins, Beginner

So you've learned about the data analytics capabilities of Python, and now you're ready to start churning through data -- great! But do you know how to turn your snippet of code into a system capable of taking in streams of raw sensor data and spitting out insights? This presentation will lay out the basic components of a Python-based data pipeline built for Internet-of-Things (IoT) applications, and will highlight some of the common challenges associated with putting together an efficient data analytics and storage system.

Key topics include:

  • An overview of cloud-based "serverless" data pipelines
  • Pros and cons of locally-hosted or "edge computing" systems
  • Tradeoffs between cost, scalability, complexity, and development time for different architectures

By the end of this presentation, attendees will have gained a broad overview of the ecosystem needed to support a Python analytics solution: how to get data in and out, how to write and deploy scalable code, and how to manage system cost and complexity.
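At its smallest, one edge-style stage of such a pipeline is just parse → aggregate → emit; a stdlib-only sketch (message format and field names are invented for illustration) shows the shape:

```python
import json
import statistics
from collections import deque

# Hypothetical pipeline stage: raw JSON sensor messages -> parsed readings
# -> rolling aggregate -> "insight" records ready for storage or alerting.

def parse(messages):
    """Decode raw JSON messages into (sensor_id, temperature) tuples."""
    for raw in messages:
        msg = json.loads(raw)
        yield msg["sensor_id"], float(msg["temp_c"])

def rolling_mean(readings, window=3):
    """Emit a rolling-mean record for each incoming reading."""
    buf = deque(maxlen=window)
    for sensor_id, value in readings:
        buf.append(value)
        yield {"sensor_id": sensor_id,
               "mean_temp_c": round(statistics.fmean(buf), 2)}

raw_stream = [
    '{"sensor_id": "s1", "temp_c": 20.0}',
    '{"sensor_id": "s1", "temp_c": 22.0}',
    '{"sensor_id": "s1", "temp_c": 24.0}',
]
insights = list(rolling_mean(parse(raw_stream)))
print(insights[-1])  # {'sensor_id': 's1', 'mean_temp_c': 22.0}
```

Because each stage is a generator, the same code processes an unbounded stream one message at a time; swapping the list for an MQTT subscription or a cloud queue consumer changes the source, not the stages.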

Do You Want to Submit a Talk?

Talk Guidelines:

  • Beginner to Expert levels
  • 30-60 minutes
  • Must include Python

Potential Topics:

  • Data Analytics
  • AI/Machine Learning
  • VR
  • Blockchain
  • IoT
  • Privacy & Security

You can apply here: https://goo.gl/forms/nTc4ZqJrdj6e9Ok53


Program Sponsor:

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.


Organizer Sponsor:

Title Sponsor:

Giveaway Sponsors:


Media Sponsor:

