Data Skeptic
Episodes
[MINI] The Elbow Method
18 Mar 2016
Contributed by Lukas
Certain data mining algorithms (including k-means clustering and k-nearest neighbors) require a user-defined parameter k. A user of these algorithms i...
Too Good to be True
11 Mar 2016
Contributed by Lukas
Today on Data Skeptic, Lachlan Gunn joins us to discuss his recent paper Too Good to be True. This paper highlights a somewhat paradoxical / counteri...
[MINI] R-squared
04 Mar 2016
Contributed by Lukas
How well does your model explain your data? R-squared is a useful statistic for answering this question. In this episode we explore how it applies to ...
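This is a minimal Python sketch of the statistic discussed. It is our illustration, not code from the episode. R-squared is one minus the residual sum of squares divided by the total sum of squares around the mean. The function name r_squared and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when every y_true value is identical.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))          # 1.0: predictions match exactly
print(r_squared(y, [2.5] * 4))  # 0.0: no better than predicting the mean
```

A model that does worse than predicting the mean gets a negative value. For example, reversing the predictions above gives -3.0.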
Models of Mental Simulation
26 Feb 2016
Contributed by Lukas
Jessica Hamrick joins us this week to discuss her work studying mental simulation. Her research combines machine learning approaches with beh...
[MINI] Multiple Regression
19 Feb 2016
Contributed by Lukas
This episode is a discussion of multiple regression: the use of observations that are a vector of values to predict a response variable. For this e...
Scientific Studies of People's Relationship to Music
12 Feb 2016
Contributed by Lukas
Samuel Mehr joins us this week to share his perspective on why people are musical, where music comes from, and why it works the way it does. We discus...
[MINI] k-d trees
05 Feb 2016
Contributed by Lukas
This episode reviews the concept of k-d trees: an efficient data structure for holding multidimensional objects. Kyle gives Linhda a dictionary and as...
Auditing Algorithms
29 Jan 2016
Contributed by Lukas
Algorithms are pervasive in our society and make thousands of automated decisions on our behalf every day. The possibility of digital discrimination i...
[MINI] The Bonferroni Correction
22 Jan 2016
Contributed by Lukas
Today's episode begins by asking how many left-handed employees we should expect to be at a company before anyone should claim left-handedness discrim...
Detecting Pseudo-profound BS
15 Jan 2016
Contributed by Lukas
A recent paper in the journal Judgment and Decision Making titled On the reception and detection of pseudo-profound bullshit explores empirical qu...
[MINI] Gradient Descent
08 Jan 2016
Contributed by Lukas
Today's mini-episode discusses the widely known optimization algorithm gradient descent in the context of hiking on a foggy hillside.
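In the foggy-hillside picture, you cannot see the valley floor. You can only feel the slope under your feet, so you repeatedly take a small step downhill. Below is a minimal one-dimensional sketch. It assumes a hypothetical hillside f(x) = (x - 3)^2 and a fixed learning rate of 0.1.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Take repeated small steps in the direction opposite the gradient."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step downhill
    return x


def grad_f(x):
    # Derivative of the hypothetical "hillside" f(x) = (x - 3)**2,
    # whose lowest point is at x = 3.
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=10.0)
print(round(x_min, 4))  # 3.0
```

A learning rate that is too large overshoots the minimum. For this f, any rate above 1.0 makes each step land farther from the minimum, so the search diverges.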
Let's Kill the Word Cloud
01 Jan 2016
Contributed by Lukas
This episode is a discussion of data visualization and a proposed New Year's resolution for Data Skeptic listeners. Let's kill the word cloud.
2015 Holiday Special
25 Dec 2015
Contributed by Lukas
Today's episode is a reading of Isaac Asimov's The Machine that Won the War. I can't think of a story that's more appropriate for Data Skeptic.
Wikipedia Revision Scoring as a Service
18 Dec 2015
Contributed by Lukas
In this interview with Aaron Halfaker of the Wikimedia Foundation, we discuss his research and career related to the study of Wikipedia. In his paper ...
[MINI] Term Frequency - Inverse Document Frequency
11 Dec 2015
Contributed by Lukas
Today's topic is term frequency inverse document frequency, which is a statistic for estimating the importance of words and phrases in a set of docume...
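Here is a minimal sketch of one common TF-IDF formulation:

- The term frequency is the term's count divided by the document length.
- The inverse document frequency is the log of the total number of documents over the number of documents containing the term.

Libraries often add smoothing or normalization, so their exact values differ. The toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for every term in each tokenized document.

    tf  = count of term in document / number of terms in document
    idf = log(number of documents / number of documents containing term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
# "cat" is rarer across the collection than "the", so it scores higher in doc 0
print(w[0]["cat"] > w[0]["the"])
```

A term that appears in every document gets an idf of log(1) = 0, so it carries no weight.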
The Hunt for Vulcan
04 Dec 2015
Contributed by Lukas
Early astronomers could see several of the planets with the naked eye. The invention of the telescope allowed for further understanding of our solar s...
[MINI] The Accuracy Paradox
27 Nov 2015
Contributed by Lukas
Today's episode discusses the accuracy paradox. There are cases when one might prefer a less accurate model because it yields more predictive power or...
Neuroscience from a Data Scientist's Perspective
20 Nov 2015
Contributed by Lukas
... or should this have been called data science from a neuroscientist's perspective? Either way, I'm sure you'll enjoy this discussion with Laurie Sk...
[MINI] Bias Variance Tradeoff
13 Nov 2015
Contributed by Lukas
A discussion of the expected number of cars at a stoplight frames today's discussion of the bias-variance tradeoff. The central idea of this concept ...
Big Data Doesn't Exist
06 Nov 2015
Contributed by Lukas
The recent opinion piece Big Data Doesn't Exist on TechCrunch by Slater Victoroff is an interesting discussion about the usefulness of data both big ...
[MINI] Covariance and Correlation
30 Oct 2015
Contributed by Lukas
The degree to which two variables change together can be calculated in the form of their covariance. This value can be normalized to the correlation c...
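This short sketch shows the relationship. It is ours, not from the episode. Pearson correlation is covariance divided by the product of the two standard deviations, which rescales it to lie between -1 and 1. The sketch uses the sample (n - 1) convention, and the data is made up.

```python
import math


def covariance(xs, ys):
    """Sample covariance: average co-deviation from the means (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # ys = 2 * xs
print(covariance(xs, ys))   # 5.0 (depends on the units of xs and ys)
print(correlation(xs, ys))  # 1.0 (unit-free: a perfect positive linear relation)
```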
Bayesian A/B Testing
23 Oct 2015
Contributed by Lukas
Today's guest is Cameron Davidson-Pilon. Cameron has a master's degree in quantitative finance from the University of Waterloo. Think of it as statisti...
[MINI] The Central Limit Theorem
16 Oct 2015
Contributed by Lukas
The central limit theorem is an important statistical result which states that typically, the mean of a large enough set of independent trials is appr...
Accessible Technology
09 Oct 2015
Contributed by Lukas
Today's guest is Chris Hofstader (@gonz_blinko), an accessibility researcher and advocate, as well as an activist for causes such as improving access ...
[MINI] Multi-armed Bandit Problems
02 Oct 2015
Contributed by Lukas
The multi-armed bandit problem is named with reference to slot machines (one-armed bandits). Given the chance to play from a pool of slot machines, al...
Shakespeare, Abiogenesis, and Exoplanets
25 Sep 2015
Contributed by Lukas
Our episode this week begins with a correction. Back in episode 28 (Monkeys on Typewriters), Kyle made some bold claims about the probability that mon...
[MINI] Sample Sizes
18 Sep 2015
Contributed by Lukas
There are several factors that are important to selecting an appropriate sample size and dealing with small samples. The most important questions are ...
The Model Complexity Myth
11 Sep 2015
Contributed by Lukas
There's an old adage which says you cannot fit a model which has more parameters than you have data. While this is often the case, it's not a universa...
[MINI] Distance Measures
04 Sep 2015
Contributed by Lukas
There are many occasions in which one might want to know the distance or similarity between two things, for which the means of calculating that distan...
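Here are a few common distance and similarity measures, sketched in Python for illustration. This summary does not say which measures the episode covers; these are simply standard choices. Cosine similarity is undefined for an all-zero vector.

```python
import math


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences ("city block" distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


p, q = (0, 0), (3, 4)
print(euclidean(p, q))                    # 5.0
print(manhattan(p, q))                    # 7
print(cosine_similarity((1, 0), (0, 1)))  # 0.0 (perpendicular vectors)
```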
ContentMine
28 Aug 2015
Contributed by Lukas
ContentMine is a project which provides the tools and workflow to convert scientific literature into machine readable and machine interpretable data i...
[MINI] Structured and Unstructured Data
21 Aug 2015
Contributed by Lukas
Today's mini-episode explains the distinction between structured and unstructured data, and debates which of these categories best describes recipes.
Measuring the Influence of Fashion Designers
14 Aug 2015
Contributed by Lukas
Yusan Lin shares her research on using data science to explore the fashion industry in this episode. She has applied techniques from data mining, natu...
[MINI] PageRank
07 Aug 2015
Contributed by Lukas
PageRank is the algorithm most famous for being one of the original innovations that made Google stand out as a search engine. It was defined in the c...
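This is a minimal power-iteration sketch of PageRank. It assumes the commonly used damping factor of 0.85. It treats a page with no outgoing links as linking to every page, which is one of several conventions. The three-page web is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to (all targets must
    also be keys). Pages with no outgoing links spread rank over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share, then rank flows along links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: A and C both link to B, and B links back to A.
web = {"A": ["B"], "B": ["A"], "C": ["B"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # B
```

The ranks always sum to 1. Page C, which nothing links to, keeps only the random-jump share of 0.05.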
Data Science at Work in LA County
29 Jul 2015
Contributed by Lukas
In this episode, Benjamin Uminsky enlightens us about some of the ways the Los Angeles County Registrar-Recorder/County Clerk leverages data science a...
[MINI] k-Nearest Neighbors
24 Jul 2015
Contributed by Lukas
This episode explores the k-nearest neighbors algorithm, a supervised, non-parametric method that can be used for both classification and r...
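Here is a minimal k-nearest neighbors classifier for illustration. It stores labeled training points and labels a query by majority vote among the k closest ones. Distance is Euclidean, computed with math.dist, which requires Python 3.8+. Ties go to the label seen first among the nearest points. The data is hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    train is a list of (point, label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((6, 7), "blue"), ((7, 6), "blue")]
print(knn_classify(train, (2, 2)))  # red
print(knn_classify(train, (6, 5)))  # blue
```

There is no training step beyond storing the data, which is why k-NN is called a lazy, non-parametric method.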
Crypto
17 Jul 2015
Contributed by Lukas
How do people think rationally about small probability events? What is the optimal statistical process by which one can update their beliefs in light ...
[MINI] MapReduce
10 Jul 2015
Contributed by Lukas
This mini-episode is a high-level explanation of the basic idea behind MapReduce, which is a fundamental concept in big data. The origin of the idea ...
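The classic illustration of MapReduce is counting words. This single-process sketch mimics the three phases:

- Map emits key-value pairs.
- Shuffle groups them by key.
- Reduce combines each group.

It is not a real distributed framework, and the function names are hypothetical. A real system runs many map and reduce calls in parallel across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)


documents = ["the quick fox", "the lazy dog", "The end"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # 3
```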
Genetically Engineered Food and Trends in Herbicide Usage
03 Jul 2015
Contributed by Lukas
The Credible Hulk joins me in this episode to discuss a recent blog post he wrote about glyphosate and the data about how its introduction chang...
[MINI] The Curse of Dimensionality
26 Jun 2015
Contributed by Lukas
More features are not always better! With an increasing number of features to consider, machine learning algorithms suffer from the curse of dimensio...
Video Game Analytics
19 Jun 2015
Contributed by Lukas
This episode discusses video game analytics with guest Anders Drachen. The way in which people get access to games and the opportunity for game desi...
[MINI] Anscombe's Quartet
12 Jun 2015
Contributed by Lukas
This mini-episode discusses Anscombe's Quartet, a series of four datasets which are clearly very different but share some similar statistical proper...
Proposing Annoyance Mining
09 Jun 2015
Contributed by Lukas
A recent episode of the Skeptics' Guide to the Universe included a slight rant by Dr. Novella and the rogues about a shortcoming in operating systems. ...
Preserving History at CyArk
05 Jun 2015
Contributed by Lukas
Elizabeth Lee from CyArk joins us in this episode to share stories of the work done capturing important historical sites digitally. CyArk is a non-...
[MINI] A Critical Examination of a Study of Marriage by Political Affiliation
29 May 2015
Contributed by Lukas
Linhda and Kyle review a New York Times article titled How Your Hometown Affects Your Chances of Marriage. This article explores research about what...
Detecting Cheating in Chess
22 May 2015
Contributed by Lukas
With the advent of algorithms capable of beating highly ranked chess players, the temptation to cheat has emerged as a potential threat to the integ...
[MINI] z-scores
15 May 2015
Contributed by Lukas
This week's episode discusses z-scores, also known as standard scores. This score describes the distance (in standard deviations) that an observation i...
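This small sketch shows the calculation on illustrative data: subtract the mean, then divide by the standard deviation. It uses the population standard deviation (statistics.pstdev). Using the sample standard deviation instead is also common and gives slightly smaller scores.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, measured in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population sd 2
print(z_scores(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```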
Using Data to Help Those in Crisis
08 May 2015
Contributed by Lukas
This week Noelle Sio Saldana discusses her volunteer work at Crisis Text Line - a 24/7 service that connects anyone with crisis counselors. In the ep...
The Ghost in the MP3
01 May 2015
Contributed by Lukas
Have you ever wondered what is lost when you compress a song into an MP3? This week's guest Ryan Maguire did more than that. He worked on software to...
Data Fest 2015
28 Apr 2015
Contributed by Lukas
This episode contains coverage of the 2015 Data Fest hosted at UCLA. Data Fest is an analysis competition that gives teams of students 48 hours to...
[MINI] Cornbread and Overdispersion
24 Apr 2015
Contributed by Lukas
For our 50th episode, we indulge a bit by cooking Linhda's previously mentioned "healthy" cornbread. This leads to a discussion of the statistical ...
[MINI] Natural Language Processing
17 Apr 2015
Contributed by Lukas
This episode overviews some of the fundamental concepts of natural language processing including stemming, n-grams, part-of-speech tagging, and the ba...
Computer-based Personality Judgments
10 Apr 2015
Contributed by Lukas
Guest Youyou Wu discusses the work she and her collaborators did to measure the accuracy of computer-based personality judgments. Using Facebook "like...
[MINI] Markov Chain Monte Carlo
03 Apr 2015
Contributed by Lukas
This episode explores how going wine tasting could teach us about using Markov chain Monte Carlo (MCMC).
[MINI] Markov Chains
20 Mar 2015
Contributed by Lukas
This episode introduces the idea of a Markov Chain. A Markov Chain has a set of states describing a particular system, and a probability of moving fr...
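This is a minimal sketch of a Markov chain stored as a transition matrix. It assumes a hypothetical two-state weather model. Repeatedly applying the transition probabilities moves the starting distribution toward the chain's stationary distribution, which here is (5/6, 1/6).

```python
def step(distribution, transition):
    """Advance a probability distribution over states by one step.

    transition[i][j] is the probability of moving from state i to state j;
    each row sums to 1.
    """
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]


# Hypothetical two-state weather chain: 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
dist = [1.0, 0.0]  # start: certainly sunny
for _ in range(50):
    dist = step(dist, P)
print([round(p, 4) for p in dist])  # [0.8333, 0.1667]
```

The next state depends only on the current state, not on how the chain got there. That is the Markov property.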
Oceanography and Data Science
13 Mar 2015
Contributed by Lukas
Nicole Goebel joins us this week to share her experiences in oceanography studying phytoplankton and other aspects of the ocean and how data plays a ...
[MINI] Ordinary Least Squares Regression
06 Mar 2015
Contributed by Lukas
This episode explores Ordinary Least Squares or OLS - a method for finding a good fit which describes a given dataset.
NYC Speed Camera Analysis with Tim Schmeier
27 Feb 2015
Contributed by Lukas
New York State approved the use of automated speed cameras within a specific range of schools. Tim Schmeier did an analysis of publicly available d...
[MINI] k-means clustering
20 Feb 2015
Contributed by Lukas
The k-means clustering algorithm is an algorithm that computes a deterministic label for a given "k" number of clusters from an n-dimensional dataset. ...
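This is a minimal sketch of Lloyd's algorithm, the usual way k-means is computed. It repeats three steps until nothing changes:

- Assign each point to its nearest centroid.
- Move each centroid to the mean of its assigned points.
- Repeat.

With fixed starting centroids, as here, the result is deterministic. Different starting centroids can produce different clusterings. The points and starting centroids are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = k_means(points, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```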
Shadow Profiles on Social Networks
13 Feb 2015
Contributed by Lukas
Emre Sarigol joins me this week to discuss his paper Online Privacy as a Collective Phenomenon. This paper studies data collected from social netwo...
[MINI] The Chi-Squared Test
06 Feb 2015
Contributed by Lukas
The Chi-Squared test is a methodology for hypothesis testing. When one has categorical data, in the form of frequency counts or observations (e.g. Veg...
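This short sketch computes the test statistic on hypothetical die-roll counts: the sum of (observed - expected)^2 / expected over the categories. With 6 categories there are 5 degrees of freedom, and the 5% critical value is about 11.07. A statistic of 13.4 would therefore lead to rejecting the fair-die hypothesis at that level.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical die rolled 60 times; a fair die expects 10 of each face.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6
stat = chi_squared(observed, expected)
print(round(stat, 2))  # 13.4
```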
Mapping Reddit Topics with Randy Olson
30 Jan 2015
Contributed by Lukas
My guest this week is noteworthy AI researcher Randy Olson, who joins me to share his work creating the Reddit World Map - a visualization that ...
[MINI] Partially Observable State Spaces
23 Jan 2015
Contributed by Lukas
When dealing with dynamic systems that are potentially undergoing constant change, it's helpful to describe what "state" they are in. In many applica...
Easily Fooling Deep Neural Networks
16 Jan 2015
Contributed by Lukas
My guest this week is Anh Nguyen, a PhD student at the University of Wyoming working in the Evolving AI lab. The episode discusses the paper Deep N...
[MINI] Data Provenance
09 Jan 2015
Contributed by Lukas
This episode introduces a high level discussion on the topic of Data Provenance, with more MINI episodes to follow to get into specific topics. Thank...
Doubtful News, Geology, Investigating Paranormal Groups, and Thinking Scientifically with Sharon Hill
03 Jan 2015
Contributed by Lukas
I had the chance to speak with the well-known Sharon Hill (@idoubtit) for the first episode of 2015. We discuss a number of interesting topics includin...
[MINI] Belief in Santa
26 Dec 2014
Contributed by Lukas
In this quick holiday episode, we touch on how one would approach modeling the statistical distribution over the probability of belief in Santa Claus ...
Economic Modeling and Prediction, Charitable Giving, and a Follow Up with Peter Backus
19 Dec 2014
Contributed by Lukas
Economist Peter Backus joins me in this episode to discuss a few interesting topics. You may recall Linhda and I previously discussed his paper "The ...
[MINI] The Battle of the Sexes
12 Dec 2014
Contributed by Lukas
Love and Data is the continued theme in this mini-episode as we discuss the game theory example of The Battle of the Sexes. In this textbook example,...
The Science of Online Data at Plenty of Fish with Thomas Levi
05 Dec 2014
Contributed by Lukas
Can algorithms help you find love? Many happy couples successfully brought together via online dating websites show us that data science can help you...
[MINI] The Girlfriend Equation
28 Nov 2014
Contributed by Lukas
Economist Peter Backus put forward "The Girlfriend Equation" while working on his PhD - a probabilistic model attempting to estimate the likelihood o...
The Secret and the Global Consciousness Project with Alex Boklin
21 Nov 2014
Contributed by Lukas
I'm joined this week by Alex Boklin to explore the topic of magical thinking especially in the context of Rhonda Byrne's "The Secret", and the simila...
[MINI] Monkeys on Typewriters
14 Nov 2014
Contributed by Lukas
What is randomness? How can we determine if some results are randomly generated or not? Why are random numbers important to us in our everyday life? T...
Mining the Social Web with Matthew Russell
07 Nov 2014
Contributed by Lukas
This week's episode explores the possibilities of extracting novel insights from the many great social web APIs available. Matthew Russell's Mining ...
[MINI] Is the Internet Secure?
31 Oct 2014
Contributed by Lukas
This episode explores the basis of why we can trust encryption. Surprisingly, a discussion of looking up a word in the dictionary (binary search) and...
Practicing and Communicating Data Science with Jeff Stanton
24 Oct 2014
Contributed by Lukas
Jeff Stanton joins me in this episode to discuss his book An Introduction to Data Science, and some of the unique challenges and issues faced by som...
[MINI] The T-Test
17 Oct 2014
Contributed by Lukas
The t-test is this week's mini-episode topic. The t-test is a statistical testing procedure used to determine whether the means of two datasets differ by a...
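One common form of the two-sample test is Welch's t statistic, which compares two sample means without assuming equal variances. This is a minimal sketch with made-up samples. In practice, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same statistic along with a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means.

    Each sample keeps its own (n - 1) variance, so equal variances are not assumed.
    """
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


a = [5.1, 4.9, 5.0, 5.2, 4.8]  # hypothetical measurements, mean 5.0
b = [5.6, 5.4, 5.5, 5.7, 5.3]  # hypothetical measurements, mean 5.5
t = welch_t(a, b)
print(round(t, 3))  # -5.0: the means differ by about five standard errors
```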
Data Myths with Karl Mamer
10 Oct 2014
Contributed by Lukas
This week I'm joined by Karl Mamer to discuss the data behind three well known urban legends. Did a large blackout in New York and surrounding areas ...
Contest Announcement
08 Oct 2014
Contributed by Lukas
The Data Skeptic Podcast is launching a contest: not one of chance, but one of skill. Listeners are encouraged to put their data science skills to go...
[MINI] Selection Bias
03 Oct 2014
Contributed by Lukas
A discussion about conducting US presidential election polls helps frame a conversation about selection bias.
[MINI] Confidence Intervals
26 Sep 2014
Contributed by Lukas
Commute times and BBQ invites help frame a discussion about the statistical concept of confidence intervals.
[MINI] Value of Information
19 Sep 2014
Contributed by Lukas
A discussion about getting ready in the morning, negotiating a used car purchase, and selecting the best Airbnb place to stay at helps frame a conversa...
Game Science Dice with Louis Zocchi
17 Sep 2014
Contributed by Lukas
In this bonus episode, guest Louis Zocchi discusses his background in the gaming industry, specifically, how he became a manufacturer of dice designed...
Data Science at ZestFinance with Marick Sinay
12 Sep 2014
Contributed by Lukas
Marick Sinay from ZestFinance is our guest this week. This episode explores how data science techniques are applied in the financial world, specifi...
[MINI] Decision Tree Learning
05 Sep 2014
Contributed by Lukas
Linhda and Kyle talk about Decision Tree Learning in this mini-episode. Decision Tree Learning is the algorithmic process of trying to generate an op...
Jackson Pollock Authentication Analysis with Kate Jones-Smith
29 Aug 2014
Contributed by Lukas
Our guest this week is Hamilton physics professor Kate Jones-Smith who joins us to discuss the evidence for the claim that drip paintings of Jackson...
[MINI] Noise!!
22 Aug 2014
Contributed by Lukas
Our topic for this week is "noise" as in signal vs. noise. This is not a signal processing discussion, but rather a brief introduction to how the w...
Guerrilla Skepticism on Wikipedia with Susan Gerbic
15 Aug 2014
Contributed by Lukas
Our guest this week is Susan Gerbic. Susan is a skeptical activist involved in many activities; the one we focus on most in this episode is Guerrilla...
[MINI] Ant Colony Optimization
08 Aug 2014
Contributed by Lukas
In this week's mini episode, Linhda and Kyle discuss Ant Colony Optimization - a numerical / stochastic optimization technique which models its search...
Data in Healthcare IT with Shahid Shah
01 Aug 2014
Contributed by Lukas
Our guest this week is Shahid Shah. Shahid is CEO at Netspective, and writes three blogs: Health Care Guy, Shahid Shah, and HitSphere - the Healthcare...
[MINI] Cross Validation
25 Jul 2014
Contributed by Lukas
This mini-episode discusses the technique called Cross Validation - a process by which one randomly divides up a dataset into numerous small partitions...
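This is a minimal k-fold sketch. It shuffles the row indices once, seeded for reproducibility, and splits them into k folds. Each fold then takes one turn as the held-out test set while the remaining folds are used for training. The function name and the strided fold scheme are our own illustrative choices.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Indices 0..n-1 are shuffled once and dealt into k roughly equal folds;
    every index appears in exactly one test fold.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))  # 8 2, for each of the 5 folds
```

A model's scores across the k test folds are usually averaged into a single performance estimate.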
Streetlight Outage and Crime Rate Analysis with Zach Seeskin
18 Jul 2014
Contributed by Lukas
This episode features a discussion with statistics PhD student Zach Seeskin about a project he was involved in as part of the Eric and Wendy Schmidt D...
[MINI] Experimental Design
11 Jul 2014
Contributed by Lukas
This episode loosely explores the topic of Experimental Design including hypothesis testing, the importance of statistical tests, and an everyday and ...
The Right (big data) Tool for the Job with Jay Shankar
07 Jul 2014
Contributed by Lukas
In this week's episode, we discuss applied solutions to big data problems with big data engineer Jay Shankar. The episode explores approaches and des...
[MINI] Bayesian Updating
27 Jun 2014
Contributed by Lukas
In this minisode, we discuss Bayesian Updating - the process by which one can calculate which hypothesis is most likely to be true given one's older / p...
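This is a minimal sketch of discrete Bayesian updating. It assumes a hypothetical question: is a coin fair, or biased toward heads with P(heads) = 0.8? After each flip, the posterior becomes the prior for the next flip.

```python
def bayes_update(prior, likelihood):
    """Posterior is proportional to prior * likelihood, renormalized to sum to 1.

    prior and likelihood are dicts keyed by hypothesis; likelihood[h] is
    P(observed data | h).
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


p_heads = {"fair": 0.5, "biased": 0.8}  # hypothetical hypotheses
belief = {"fair": 0.5, "biased": 0.5}   # start undecided
for flip in "HHHT":
    likelihood = {h: p if flip == "H" else 1 - p for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihood)
print(round(belief["biased"], 3))  # 0.621
```

After three heads and one tail, the biased hypothesis is favored at about 62%.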
Personalized Medicine with Niki Athanasiadou
20 Jun 2014
Contributed by Lukas
In the second full length episode of the podcast, we discuss the current state of personalized medicine and the advancements in genetics that have ma...
[MINI] p-values
13 Jun 2014
Contributed by Lukas
In this mini, we discuss p-values and their use in hypothesis testing, in the context of a hypothetical experiment on plant flowering, and end with a...
Advertising Attribution with Nathan Janos
06 Jun 2014
Contributed by Lukas
A conversation with Convertro's Nathan Janos about methodologies used to help advertisers understand the effect each of their marketing efforts (print...
[MINI] type i / type ii errors
30 May 2014
Contributed by Lukas
In this first mini-episode of the Data Skeptic Podcast, we define and discuss type i and type ii errors (a.k.a. false positives and false negatives).
Introduction
23 May 2014
Contributed by Lukas
The Data Skeptic Podcast features conversations on topics related to data science, statistics, machine learning, artificial intelligence and the lik...