Data Skeptic
Episodes
Mercedes Benz Machine Learning Research
14 Dec 2017
Contributed by Lukas
This episode features an interview with Rigel Smiroldo recorded at NIPS 2017 in Long Beach, California. We discuss data privacy, machine learning use...
[MINI] Parallel Algorithms
08 Dec 2017
Contributed by Lukas
When computers became commodity hardware and storage became incredibly cheap, we entered the era of so-called "big" data. Most definitions of big data w...
Quantum Computing
01 Dec 2017
Contributed by Lukas
In this week's episode, Scott Aaronson, a professor at the University of Texas at Austin, explains what a quantum computer is, various possible appli...
Azure Databricks
28 Nov 2017
Contributed by Lukas
I sat down with Ali Ghodsi, CEO and founder of Databricks, and John Chirapurath, GM for Data Platform Marketing at Microsoft, regarding the recent annou...
[MINI] Exponential Time Algorithms
24 Nov 2017
Contributed by Lukas
In this episode we discuss the complexity class EXP-Time, which contains algorithms that require $O(2^{p(n)})$ time to run. In other words, the w...
P vs NP
17 Nov 2017
Contributed by Lukas
In this week's episode, host Kyle Polich interviews author Lance Fortnow about whether P will ever be equal to NP and solve all of life's problems. Fo...
[MINI] Sudoku \in NP
10 Nov 2017
Contributed by Lukas
Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in how many steps an algorithm takes relati...
The Computational Complexity of Machine Learning
03 Nov 2017
Contributed by Lukas
In this episode, Professor Michael Kearns from the University of Pennsylvania joins host Kyle Polich to talk about the computational complexity of mac...
[MINI] Turing Machines
27 Oct 2017
Contributed by Lukas
TMs are a model of computation at the heart of algorithmic analysis. A Turing Machine has two components. An infinitely long piece of tape (memory...
The Complexity of Learning Neural Networks
20 Oct 2017
Contributed by Lukas
Over the past several years, we have seen many success stories in machine learning brought about by deep learning techniques. While the practical succ...
[MINI] Big Oh Analysis
13 Oct 2017
Contributed by Lukas
How long an algorithm takes to run depends on many factors including implementation details and hardware. However, the formal analysis of algorithms...
Data science tools and other announcements from Ignite
06 Oct 2017
Contributed by Lukas
In this episode, Microsoft's Corporate Vice President for Cloud Artificial Intelligence, Joseph Sirosh, joins host Kyle Polich to share some of the Mi...
Generative AI for Content Creation
29 Sep 2017
Contributed by Lukas
Last year, the film development and production company End Cue produced a short film, called Sunspring, that was entirely written by an artificial int...
[MINI] One Shot Learning
22 Sep 2017
Contributed by Lukas
One Shot Learning is the class of machine learning procedures that focuses on learning something from a small number of examples. This is in contrast t...
Recommender Systems Live from FARCON 2017
15 Sep 2017
Contributed by Lukas
Recommender systems play an important role in providing personalized content to online users. Yet, typical data mining techniques are not well suited ...
[MINI] Long Short Term Memory
08 Sep 2017
Contributed by Lukas
Thanks to our sponsor brilliant.org/dataskeptics. A Long Short Term Memory (LSTM) is a neural unit, often used in a Recurrent Neural Network (RNN), which...
Zillow Zestimate
01 Sep 2017
Contributed by Lukas
Zillow is a leading real estate information and home-related marketplace. We interviewed Andrew Martin, a data science Research Manager at Zillow, to ...
Cardiologist Level Arrhythmia Detection with CNNs
25 Aug 2017
Contributed by Lukas
Our guest Pranav Rajpurkar and his coauthors recently published Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks, a paper ...
[MINI] Recurrent Neural Networks
18 Aug 2017
Contributed by Lukas
RNNs are a class of deep learning models designed to capture sequential behavior. An RNN trains a set of weights which depend not just on new input ...
Project Common Voice
11 Aug 2017
Contributed by Lukas
Thanks to our sponsor Springboard. In this week's episode, guest Andre Natal from Mozilla joins our host, Kyle Polich, to discuss a couple exciting n...
[MINI] Bayesian Belief Networks
04 Aug 2017
Contributed by Lukas
A Bayesian Belief Network is an acyclic directed graph composed of nodes that represent random variables and edges that imply a conditional dependence...
pix2code
28 Jul 2017
Contributed by Lukas
In this episode, Tony Beltramelli of UIzard Technologies joins our host, Kyle Polich, to talk about the ideas behind his latest app that can trans...
[MINI] Conditional Independence
21 Jul 2017
Contributed by Lukas
In statistics, two random variables might depend on one another (for example, interest rates and new home purchases). We call this conditional depende...
Estimating Sheep Pain with Facial Recognition
14 Jul 2017
Contributed by Lukas
Animals can't tell us when they're experiencing pain, so we have to rely on other cues to help treat their discomfort. But it is often difficult to te...
CosmosDB
07 Jul 2017
Contributed by Lukas
This episode collects interviews from my recent trip to Microsoft Build where I had the opportunity to speak with Dharma Shukla and Syam Nair about ...
[MINI] The Vanishing Gradient
30 Jun 2017
Contributed by Lukas
This episode discusses the vanishing gradient - a problem that arises when training deep neural networks in which nearly all the gradients are very cl...
Doctor AI
23 Jun 2017
Contributed by Lukas
When faced with medical issues, would you want to be seen by a human or a machine? In this episode, guest Edward Choi, co-author of the study titled Do...
[MINI] Activation Functions
16 Jun 2017
Contributed by Lukas
In a neural network, the output value of a neuron is almost always transformed in some way using a function. A trivial choice would be a linear transf...
MS Build 2017
09 Jun 2017
Contributed by Lukas
This episode recaps the Microsoft Build Conference. Kyle recently attended and shares some thoughts on cloud, databases, cognitive services, and art...
[MINI] Max-pooling
02 Jun 2017
Contributed by Lukas
Max-pooling is a procedure in a neural network which has several benefits. It performs dimensionality reduction by taking a collection of neurons and ...
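As a rough illustration of the dimensionality reduction mentioned above, here is a minimal sketch of 2x2 max-pooling on a small grid of activations. The function name `max_pool_2x2`, the stride of 2, and the plain nested-list representation are assumptions made for this example; they are not taken from the episode.

```python
def max_pool_2x2(grid):
    """Downsample a 2-D grid by keeping the largest value in each 2x2 block.

    Assumes non-overlapping blocks (stride 2); a trailing odd row or
    column is ignored in this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


activations = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(activations)  # 4x4 input becomes a 2x2 output
```

Each output cell summarizes four input cells, so the grid shrinks by half along each axis.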
Unsupervised Depth Perception
26 May 2017
Contributed by Lukas
This episode is an interview with Tinghui Zhou. In the recent paper "Unsupervised Learning of Depth and Ego-motion from Video", Tinghui and collabor...
[MINI] Convolutional Neural Networks
19 May 2017
Contributed by Lukas
CNNs are characterized by their use of a group of neurons typically referred to as a filter or kernel. In image recognition, this kernel is repeated...
Multi-Agent Diverse Generative Adversarial Networks
12 May 2017
Contributed by Lukas
Despite the success of GANs in imaging, one of their major drawbacks is the problem of 'mode collapse,' where the generator learns to produce samples wi...
[MINI] Generative Adversarial Networks
05 May 2017
Contributed by Lukas
GANs are an unsupervised learning method involving two neural networks iteratively competing. The discriminator is a typical learning system. It attem...
Opinion Polls for Presidential Elections
28 Apr 2017
Contributed by Lukas
Recently, we've seen opinion polls come under some skepticism. But is that skepticism truly justified? The recent Brexit referendum and US 2016 Pr...
OpenHouse
21 Apr 2017
Contributed by Lukas
No reliable, complete database cataloging home sales data at a transaction level is available for the average person to access. To a data scientist in...
[MINI] GPU CPU
14 Apr 2017
Contributed by Lukas
There's more than one type of computer processor. The central processing unit (CPU) is typically what one means when they say "processor". GPUs were i...
[MINI] Backpropagation
07 Apr 2017
Contributed by Lukas
Backpropagation is a common algorithm for training a neural network. It works by computing the gradient of each weight with respect to the overall e...
Data Science at Patreon
31 Mar 2017
Contributed by Lukas
In this week's episode of Data Skeptic, host Kyle Polich talks with guest Maura Church, Patreon's data science manager. Patreon is a fast-growing ...
[MINI] Feed Forward Neural Networks
24 Mar 2017
Contributed by Lukas
In a feed forward neural network, neurons cannot form a cycle. In this episode, we explore how such a network would be ab...
Reinventing Sponsored Search Auctions
17 Mar 2017
Contributed by Lukas
In this Data Skeptic episode, Kyle is joined by guest Ruggiero Cavallo to discuss his latest efforts to mitigate the problems presented in this new wo...
[MINI] The Perceptron
10 Mar 2017
Contributed by Lukas
Today's episode overviews the perceptron algorithm. This rather simple approach is characterized by a few particular features. It updates its weights ...
The Data Refuge Project
03 Mar 2017
Contributed by Lukas
DataRefuge is a public collaborative, grassroots effort around the United States in which scientists, researchers, computer scientists, librarians and...
[MINI] Automated Feature Engineering
24 Feb 2017
Contributed by Lukas
If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the...
Big Data Tools and Trends
17 Feb 2017
Contributed by Lukas
In this episode, I speak with Raghu Ramakrishnan, CTO for Data at Microsoft. We discuss services, tools, and developments in the big data sphere as ...
[MINI] Primer on Deep Learning
10 Feb 2017
Contributed by Lukas
In this episode, we talk about a high-level description of deep learning. Kyle presents a simple game (pictured below), which is more of a puzzle re...
Data Provenance and Reproducibility with Pachyderm
03 Feb 2017
Contributed by Lukas
Versioning isn't just for source code. Being able to track changes to data is critical for answering questions about data provenance, quality, and rep...
[MINI] Logistic Regression on Audio Data
27 Jan 2017
Contributed by Lukas
Logistic Regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one...
Studying Competition and Gender Through Chess
20 Jan 2017
Contributed by Lukas
Prior work has shown that people's response to competition is in part predicted by their gender. Understanding why and when this occurs is important i...
[MINI] Dropout
13 Jan 2017
Contributed by Lukas
Deep learning can be prone to overfit a given problem. This is especially frustrating given how much time and computational resources are often requir...
The Police Data and the Data Driven Justice Initiatives
06 Jan 2017
Contributed by Lukas
In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House's Police Data Initiative and Data Dr...
The Library Problem
30 Dec 2016
Contributed by Lukas
We close out 2016 with a discussion of a basic interview question which might get asked when applying for a data science job. Specifically, how a libr...
2016 Holiday Special
23 Dec 2016
Contributed by Lukas
Today's episode is a reading of Isaac Asimov's Franchise. As mentioned on the show, this is just a work of fiction to be enjoyed and not in any way...
[MINI] Entropy
16 Dec 2016
Contributed by Lukas
Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is more useful to say it's a measure of the unpredictabi...
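To make the statistical reading concrete, the following is a minimal sketch of Shannon entropy computed from observed outcomes. The function name `shannon_entropy` and the choice of base-2 logarithms (bits) are assumptions for this example rather than details from the episode.

```python
import math
from collections import Counter


def shannon_entropy(outcomes):
    """Estimate entropy in bits from the empirical frequencies of a sequence."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # Sum p * log2(1/p) over each distinct outcome's empirical probability p.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


fair_coin = shannon_entropy("HT")      # two equally likely outcomes
certain = shannon_entropy("HHHH")      # a single, fully predictable outcome
```

A sequence with one repeated outcome has zero entropy, while two equally likely outcomes carry exactly one bit.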
MS Connect Conference
09 Dec 2016
Contributed by Lukas
Cloud services are now ubiquitous in data science and more broadly in technology as well. This week, I speak to Mark Souza, Tobias Ternström, and Cor...
Causal Impact
02 Dec 2016
Contributed by Lukas
Today's episode is all about Causal Impact, a technique for estimating the impact of a particular event on a time series. We talk to William Martin ab...
[MINI] The Bootstrap
25 Nov 2016
Contributed by Lukas
The Bootstrap is a method of resampling a dataset to possibly refine its accuracy and produce useful metrics on the result. The bootstrap is a useful...
[MINI] Gini Coefficients
18 Nov 2016
Contributed by Lukas
The Gini Coefficient (as it relates to decision trees) is one approach to determining the optimal decision to introduce which splits your dataset as p...
Unstructured Data for Finance
11 Nov 2016
Contributed by Lukas
Financial analysis techniques for studying numeric, well structured data are very mature. While using unstructured data in finance is not necessarily ...
[MINI] AdaBoost
04 Nov 2016
Contributed by Lukas
AdaBoost is a canonical example of the class of AnyBoost algorithms that create ensembles of weak learners. We discuss how a complex problem like pred...
Stealing Models from the Cloud
28 Oct 2016
Contributed by Lukas
Platform as a service is a growing trend in data science where services like fraud analysis and face detection can be provided via APIs. Such services...
[MINI] Calculating Feature Importance
21 Oct 2016
Contributed by Lukas
For machine learning models created with the random forest algorithm, there is no obvious diagnostic to inform you which features are more important i...
NYC Bike Share Rebalancing
14 Oct 2016
Contributed by Lukas
As cities provide bike sharing services, they must also plan for how to redistribute bicycles as they inevitably build up at more popular destination ...
[MINI] Random Forest
07 Oct 2016
Contributed by Lukas
Random forest is a popular ensemble learning algorithm which leverages bagging both for sampling and feature selection. In this episode we make an ana...
Election Predictions
30 Sep 2016
Contributed by Lukas
Jo Hardin joins us this week to discuss the ASA's Election Prediction Contest. This is a competition aimed at forecasting the results of the upcoming ...
[MINI] F1 Score
23 Sep 2016
Contributed by Lukas
The F1 score is a model diagnostic that combines precision and recall to provide a singular evaluation for model comparison. In this episode we disc...
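As a small worked example, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw confusion-matrix counts; the function name `f1_score` and the convention of returning 0.0 when precision or recall is undefined are assumptions made for this illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from confusion-matrix counts.

    Returns 0.0 when precision or recall is undefined (an assumed convention).
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 8 correct positive predictions, 2 false alarms, 2 missed positives.
score = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model cannot earn a high F1 by maximizing precision or recall alone.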
Urban Congestion
16 Sep 2016
Contributed by Lukas
Urban congestion affects every person living in a city of any reasonable size. Lewis Lehe joins us in this episode to share his work on downtown conge...
[MINI] Heteroskedasticity
09 Sep 2016
Contributed by Lukas
Heteroskedasticity is a term used to describe a relationship between two variables which has unequal variance over the range. For example, the varia...
Music21
02 Sep 2016
Contributed by Lukas
Our guest today is Michael Cuthbert, an associate professor of music at MIT and principal investigator of the Music21 project, which we focus our disc...
[MINI] Paxos
26 Aug 2016
Contributed by Lukas
Paxos is a protocol for arriving at a consensus in a distributed computing system which accounts for unreliability of the nodes. We discuss how this mi...
Trusting Machine Learning Models with LIME
19 Aug 2016
Contributed by Lukas
Machine learning models are often criticized for being black boxes. If a human cannot determine why the model arrives at the decision it made, there's...
[MINI] ANOVA
12 Aug 2016
Contributed by Lukas
Analysis of variance is a method used to evaluate differences between two or more groups. It works by breaking down the total variance of the sy...
Machine Learning on Images with Noisy Human-centric Labels
05 Aug 2016
Contributed by Lukas
When humans describe images, they have a reporting bias, in that they report only what they consider important. Thus, in addition to considering whethe...
[MINI] Survival Analysis
29 Jul 2016
Contributed by Lukas
Survival analysis techniques are useful for studying the longevity of groups of elements or individuals, taking into account time considerations and r...
Predictive Models on Random Data
22 Jul 2016
Contributed by Lukas
This week is an insightful discussion with Claudia Perlich about some situations in machine learning where models can be built, perhaps by well-intent...
[MINI] Receiver Operating Characteristic (ROC) Curve
15 Jul 2016
Contributed by Lukas
An ROC curve is a plot that compares the trade off of true positives and false positives of a binary classifier under different thresholds. The area u...
Multiple Comparisons and Conversion Optimization
08 Jul 2016
Contributed by Lukas
I'm joined by Chris Stucchio this week to discuss how deliberate or uninformed statistical practitioners can derive spurious and arbitrary results via...
[MINI] Leakage
01 Jul 2016
Contributed by Lukas
If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. For t...
Predictive Policing
24 Jun 2016
Contributed by Lukas
Kristian Lum (@KLdivergence) joins me this week to discuss her work at @hrdag on predictive policing. We also discuss Multiple Systems Estimation, a ...
[MINI] The CAP Theorem
17 Jun 2016
Contributed by Lukas
Distributed computing cannot simultaneously guarantee consistency, availability, and partition tolerance. Most system architects need to think carefully about how they s...
Detecting Terrorists with Facial Recognition?
10 Jun 2016
Contributed by Lukas
A startup is claiming that they can detect terrorists purely through facial recognition. In this solo episode, Kyle explores the plausibility of these...
[MINI] Goodhart's Law
03 Jun 2016
Contributed by Lukas
Goodhart's law states that "When a measure becomes a target, it ceases to be a good measure". In this mini-episode we discuss how this affects SEO, ca...
Data Science at eHarmony
27 May 2016
Contributed by Lukas
I'm joined this week by Jon Morra, director of data science at eHarmony to discuss a variety of ways in which machine learning and data science are be...
[MINI] Stationarity and Differencing
20 May 2016
Contributed by Lukas
Mystery shoppers and fruit cultivation help us discuss stationarity - a property of some time series that are invariant to time in several ways. Di...
Feather
13 May 2016
Contributed by Lukas
I'm joined by Wes McKinney (@wesmckinn) and Hadley Wickham (@hadleywickham) on this episode to discuss their joint project Feather. Feather is a file ...
[MINI] Bargaining
06 May 2016
Contributed by Lukas
Bargaining is the process of two (or more) parties attempting to agree on the price for a transaction. Game theoretic approaches attempt to find two...
deepjazz
29 Apr 2016
Contributed by Lukas
Deepjazz is a project from Ji-Sung Kim, a computer science student at Princeton University. It is built using Theano, Keras, music21, and Evan Chow's ...
[MINI] Auto-correlative functions and correlograms
22 Apr 2016
Contributed by Lukas
When working with time series data, there are a number of important diagnostics one should consider to help understand more about the data. The aut...
Early Identification of Violent Criminal Gang Members
15 Apr 2016
Contributed by Lukas
This week I spoke with Elham Shaabani and Paulo Shakarian (@PauloShakASU) about their recent paper Early Identification of Violent Criminal Gang Membe...
[MINI] Fractional Factorial Design
08 Apr 2016
Contributed by Lukas
A dinner party at Data Skeptic HQ helps teach the uses of fractional factorial design for studying 2-way interactions.
Machine Learning Done Wrong
01 Apr 2016
Contributed by Lukas
Cheng-tao Chu (@chengtao_chu) joins us this week to discuss his perspective on common mistakes and pitfalls that are made when doing machine learning....
Potholes
25 Mar 2016
Contributed by Lukas
Co-host Linh Da was in a biking accident after hitting a pothole. She sustained an injury that required stitches. This is the story of our quest to fi...
[MINI] The Elbow Method
18 Mar 2016
Contributed by Lukas
Certain data mining algorithms (including k-means clustering and k-nearest neighbors) require a user defined parameter k. A user of these algorithms i...
Too Good to be True
11 Mar 2016
Contributed by Lukas
Today on Data Skeptic, Lachlan Gunn joins us to discuss his recent paper Too Good to be True. This paper highlights a somewhat paradoxical / counteri...
[MINI] R-squared
04 Mar 2016
Contributed by Lukas
How well does your model explain your data? R-squared is a useful statistic for answering this question. In this episode we explore how it applies to ...
Models of Mental Simulation
26 Feb 2016
Contributed by Lukas
Jessica Hamrick joins us this week to discuss her work studying mental simulation. Her research combines machine learning approaches with beh...
[MINI] Multiple Regression
19 Feb 2016
Contributed by Lukas
This episode is a discussion of multiple regression: the use of observations that are a vector of values to predict a response variable. For this e...
Scientific Studies of People's Relationship to Music
12 Feb 2016
Contributed by Lukas
Samuel Mehr joins us this week to share his perspective on why people are musical, where music comes from, and why it works the way it does. We discus...
[MINI] k-d trees
05 Feb 2016
Contributed by Lukas
This episode reviews the concept of k-d trees: an efficient data structure for holding multidimensional objects. Kyle gives Linhda a dictionary and as...
Auditing Algorithms
29 Jan 2016
Contributed by Lukas
Algorithms are pervasive in our society and make thousands of automated decisions on our behalf every day. The possibility of digital discrimination i...