Build Your First Machine Learning Application
Get an introduction to data science and machine learning with our 2-day Bootcamp at Mosaik Academy. Learn how to develop, evaluate and optimize different machine learning algorithms with ease. No prior coding experience needed!
Data Scientists Needed
“Data Scientist” is the best job in the US according to Glassdoor’s job rankings. There is a huge shortage of qualified data scientists in both the US and Europe, which translates into attractive compensation packages. Our Bootcamp is a great opportunity to learn the basics you need to start working on projects and secure a job in the field.
Data-Driven Companies Win
Data-driven companies outperform their competitors financially. However, most organizations are still “data rich but insight poor” and struggle to turn available data into actionable business decisions. Take the first step toward delivering real results with your analysis.
Who is it for?
Anyone interested in machine learning who is thinking about a career change
Analysts who are familiar with statistics and want to build scalable prediction models
Developers who want to learn about machine learning libraries and practices
What will you learn?
Essential data science skills focusing on real world use cases
Getting, preparing and analyzing data sets
Basic theoretical background of machine learning techniques
State-of-the-art techniques for building models
Visualization techniques to present your findings
What technologies will you use?
Easy-to-use online notebooks (Jupyter)
Data handling libraries for fast data processing
Python and its machine learning libraries
Visualization libraries that make it easier to gain insight from data
“The shortage of data scientists is becoming a serious constraint in some sectors.” - D.J. Patil, U.S. Chief Data Scientist
“...by 2018, the U.S. alone may face a 50 percent to 60 percent gap between supply and requisite demand of deep analytical talent.” - McKinsey Report
“...the average salary for a data scientist is $118,709 versus $64,537 for a skilled programmer.” - Glassdoor Report
FAQ - IS IT FOR ME?
Do I need any programming experience (knowledge of Python) to attend?
If you have any prior programming experience (preferably in Python), you are good to go. If you do not have any prior programming skills, you can still attend our workshop, but we strongly recommend taking a look at Codecademy's beginner Python course. It is free, and in a couple of hours you can complete Units 1-5 and 7-8. If you have more time, you can finish the whole course.
What’s the experience like?
The Workshop is mostly practical: you will learn by applying techniques to real-world case studies while gaining an understanding of their conceptual and theoretical background.
Will I become a data scientist at the end of the workshop?
The Workshop is a great way to familiarize yourself with the basics of the field, dig deeper, and learn to perform a full analysis of a data set. You will build your own machine learning model and understand how such models can power a data-driven application. It is definitely a great start toward becoming one.
Should I bring a laptop with me?
Absolutely! We want you to be able to code along with us during the sessions, so you will need your own laptop (Mac or Windows). We will let you know about any further technical details. Note that you will not need to purchase any commercial software licenses: we use only free, open-source software.
Will you explain the math and statistics backgrounds of the methods?
In general, we will avoid detailed explanations and scientific equations during the lectures. We will focus on the practical applications of the algorithms, but you are encouraged to go deeper after the Bootcamp to learn more about regression and classification.
Introduction to Machine Learning
Overview of machine learning, introduction to supervised vs. unsupervised learning concepts. Basics of classification vs. regression algorithms and how they can be used in real-world applications.
Cleaning, Pre-Processing and Analyzing Data
Cleaning and mining of real-world data, pre-processing techniques. How to do exploratory data analysis with summary statistics and visualization.
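The cleaning and exploratory steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the records, the column names and the choice of median imputation are all hypothetical, and the standard-library `statistics` module stands in for the data libraries used in the sessions.

```python
import statistics

# Hypothetical raw records: None marks a missing value, and the last row is an exact duplicate.
raw_rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 75000},
    {"age": 41, "income": 75000},
]


def deduplicate(rows):
    """Drop exact duplicate rows while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def summarize(rows, column):
    """Basic summary statistics for exploratory data analysis."""
    values = [r[column] for r in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


clean = deduplicate(raw_rows)
for col in ("age", "income"):
    clean = impute_median(clean, col)

print(summarize(clean, "age"))
print(summarize(clean, "income"))
```

Median imputation is used here because it is less sensitive to outliers than the mean; dropping incomplete rows is the other common option when only a few rows are affected.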
Building Your First Classification Model
Building your first machine learning model for classification. Learning the K-Nearest Neighbors (KNN) algorithm and how to measure its performance.
Improving your model by fine-tuning its parameters and selecting the best variables. Introduction to the bias-variance trade-off.
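The K-Nearest Neighbors workflow described above (predict, measure accuracy, tune k) can be sketched from scratch in Python. This is a minimal illustration, not the Bootcamp's own material: the toy data set, which includes a deliberately mislabelled "b" point inside the "a" cluster, and the helper names `knn_predict` and `accuracy` are hypothetical, and Euclidean distance with a simple majority vote is assumed.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of point x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y)


# Hypothetical toy data: an "a" cluster, a "b" cluster, and one mislabelled "b" point at (1.5, 1.5).
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7), (1.5, 1.5)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

# Hypothetical held-out validation data used to choose k.
val_X = [(1.6, 1.6), (6.4, 6.4), (1.2, 1.1)]
val_y = ["a", "b", "a"]

# Try several values of k on the validation data and keep the best one.
scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in (1, 3, 9)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

With k = 1 the model follows the mislabelled point and misclassifies the validation point next to it (high variance); with k = 9 it votes over the whole training set and predicts the majority class everywhere (high bias); k = 3 sits between the two. This is the bias-variance trade-off in miniature.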
Mini-Project - Build Your Own Model
Build and optimize a classifier on a new real-world data set.
Drinks with fellow participants and lecturers.
Classification with Decision Trees
Learn one of the most popular families of classification tools: decision trees and the ensemble models built from them (Random Forests, Extremely Randomized Trees and other ensembles).
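A decision tree is grown by repeatedly choosing the split that best separates the classes. The sketch below shows a single such step (a "decision stump") on one numeric feature. It assumes Gini impurity as the split criterion (entropy is a common alternative), and the data and function names are hypothetical. Random Forests and Extremely Randomized Trees build many trees from steps like this and combine their votes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted Gini impurity."""
    n = len(labels)
    best = (None, gini(labels))  # (threshold, impurity); None means no split improves purity
    for threshold in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send points to both sides
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical data: hours of study vs. pass/fail outcome.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = ["fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]
print(best_split(hours, outcome))
```

A full tree would apply `best_split` recursively to the left and right groups, across all features, until the nodes are pure or a depth limit is reached.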
Automatically grouping similar objects into sets without knowing in advance how many categories there are. Learning how to visualize your results in 2D when your data has many dimensions.
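Grouping similar objects without fixing the number of groups in advance can be sketched with single-linkage clustering cut at a distance threshold: any two points within the threshold distance are linked, and each connected group becomes a cluster, so the threshold rather than the cluster count is the parameter you choose. This is one illustrative technique picked for the sketch, not necessarily the one taught in the session; the points and the `max_distance` value are hypothetical, and the 2D visualization step is not shown.

```python
import math


def threshold_clusters(points, max_distance):
    """Single-linkage clustering: points within max_distance of each other (directly or
    through a chain of neighbors) share a cluster, so the number of clusters is discovered."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link every pair of points that lie within the distance threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_distance:
                union(i, j)

    # Relabel clusters as 0, 1, 2, ... in order of first appearance.
    labels, ids = [], {}
    for i in range(len(points)):
        root = find(i)
        ids.setdefault(root, len(ids))
        labels.append(ids[root])
    return labels


points = [(0, 0), (0.5, 0.2), (0.3, 0.6), (5, 5), (5.4, 5.1), (10, 0)]
print(threshold_clusters(points, max_distance=1.0))
```

With a threshold of 1.0 the six points fall into three clusters; a much smaller threshold leaves every point on its own, and a very large one merges everything into a single cluster.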
Building prediction models with multivariable regression methods.
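Multivariable regression can be sketched as ordinary least squares solved through the normal equations, assuming a linear model. The data below is hypothetical and generated without noise from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients. The normal equations are used here for readability; library implementations typically rely on more numerically stable solvers.

```python
def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot to reduce rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.
    A column of ones is prepended so that b[0] is the intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """Intercept plus the weighted sum of the features."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * a + 3 * b for a, b in X]
coef = fit_linear_regression(X, y)
print([round(c, 6) for c in coef])
print(round(predict(coef, (2, 2)), 6))
```

On real, noisy data the recovered coefficients would only approximate the underlying relationship, which is why model evaluation on held-out data matters just as much for regression as for classification.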
Expert Talk and Panel Discussion
We invite a senior data scientist to share her experience of applying data science in business environments. After her presentation, we will sit down with her for an open panel discussion about the field and answer any questions the participants may have.
About the Instructors
Istvan Szukacs - CTO at StreamBright Data
Building data pipelines and analytical systems at massive scale. My experience lies in distributed systems, with a focus on large-scale, data-driven systems (10,000+ nodes).
For highly concurrent systems, my development environments of choice are Erlang (BEAM) and Clojure (JVM). Functional languages that support thousands of lightweight threads communicating via message passing, with inverted concurrency control, make it possible to build thread-safe software with low latency and high throughput.
Storing data at scale has long interested me: I am familiar with the RCFile whitepaper and the more recent publications on ORC and Parquet. I have used columnar stores alongside classic row-oriented stores (SQL servers) and key-value stores (Riak, Couchbase).
Analyzing large datasets can be challenging. Caching, sampling and a few other techniques make it possible to query them. I am familiar with a few query engines (Hive, PrestoDB, Tez).
Adam Jermann - CEO at StreamBright Data
Trained as a finance and investment professional, born with an entrepreneurial mindset and spirit. Worked at multinational advisory firms (the Big Four), then turned to small businesses and helped them start their journey in Budapest and San Diego.
Passionate about big data and data science, working at StreamBright Data to accelerate the adoption of big data and data science technologies among our clients.
More than 10 years ago, foresaw the growth of eSports from a subculture into the USD 300M+ industry it is today. One of the many millions of people who would rather watch a CS:GO Major final than the UEFA Champions League final.