Data Skeptic
Episodes
Team Data Science Process
03 Dec 2019
Contributed by Lukas
Buck Woody joins Kyle to share experiences from the field and the application of the Team Data Science Process - a popular six-phase workflow for doi...
Ancient Text Restoration
01 Dec 2019
Contributed by Lukas
Thea Sommerschield joins us this week to discuss the development of Pythia - a machine learning model trained to assist in the reconstruction of anci...
ML Ops
27 Nov 2019
Contributed by Lukas
Kyle met up with Damian Brady at MS Ignite 2019 to discuss machine learning operations.
Annotator Bias
23 Nov 2019
Contributed by Lukas
The modern deep learning approaches to natural language processing are voracious in their demands for large corpora to train on. Folk wisdom estimat...
NLP for Developers
20 Nov 2019
Contributed by Lukas
While at MS Build 2019, Kyle sat down with Lance Olson from the Applied AI team about how tools like cognitive services and cognitive search enable no...
Indigenous American Language Research
13 Nov 2019
Contributed by Lukas
Manuel Mager joins us to discuss natural language processing for low and under-resourced languages. We discuss current work in this area and the Nak...
Talking to GPT-2
31 Oct 2019
Contributed by Lukas
GPT-2 is yet another in a succession of models like ELMo and BERT which adopt a similar deep learning architecture and train an unsupervised model on ...
Reproducing Deep Learning Models
23 Oct 2019
Contributed by Lukas
Rajiv Shah attempted to reproduce an earthquake-predicting deep learning model. His results exposed some issues with the model. Kyle and Rajiv dis...
What BERT is Not
14 Oct 2019
Contributed by Lukas
Allyson Ettinger joins us to discuss her work in computational linguistics, specifically in exploring some of the ways in which the popular natural l...
SpanBERT
08 Oct 2019
Contributed by Lukas
Omer Levy joins us to discuss "SpanBERT: Improving Pre-training by Representing and Predicting Spans". https://arxiv.org/abs/1907.10529
BERT is Shallow
23 Sep 2019
Contributed by Lukas
Tim Niven joins us this week to discuss his work exploring the limits of what BERT can do on certain natural language tasks such as adversarial attack...
BERT is Magic
16 Sep 2019
Contributed by Lukas
Kyle pontificates on how impressed he is with BERT.
Applied Data Science in Industry
06 Sep 2019
Contributed by Lukas
Kyle sits down with Jen Stirrup to inquire about her experiences helping companies deploy data science solutions in a variety of different settings.
Building the howto100m Video Corpus
19 Aug 2019
Contributed by Lukas
Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. The availability of ...
BERT
29 Jul 2019
Contributed by Lukas
Kyle provides a non-technical overview of why Bidirectional Encoder Representations from Transformers (BERT) is a powerful tool for natural language p...
ONNX
22 Jul 2019
Contributed by Lukas
Kyle interviews Prasanth Pulavarthi about the ONNX format for deep neural networks.
Catastrophic Forgetting
15 Jul 2019
Contributed by Lukas
Kyle and Linhda discuss some high-level theory of mind and give an overview of the machine learning concept of catastrophic forgetting.
Transfer Learning
08 Jul 2019
Contributed by Lukas
Sebastian Ruder is a research scientist at DeepMind. In this episode, he joins us to discuss the state of the art in transfer learning and his cont...
Facebook Bargaining Bots Invented a Language
21 Jun 2019
Contributed by Lukas
In 2017, Facebook published a paper called Deal or No Deal? End-to-End Learning for Negotiation Dialogues. In this research, the reinforcement learni...
Under Resourced Languages
15 Jun 2019
Contributed by Lukas
Priyanka Biswas joins us in this episode to discuss natural language processing for languages that do not have as many resources as those that are mor...
Named Entity Recognition
08 Jun 2019
Contributed by Lukas
Kyle and Linh Da discuss the class of approaches called "Named Entity Recognition" or NER. NER algorithms take any string as input and return a list...
The Death of a Language
01 Jun 2019
Contributed by Lukas
USC students from the CAIS++ student organization have created a variety of novel projects under the mission statement of "artificial intelligence f...
Neural Turing Machines
25 May 2019
Contributed by Lukas
Kyle and Linh Da discuss the concepts behind the neural Turing machine.
Data Infrastructure in the Cloud
18 May 2019
Contributed by Lukas
Kyle chats with Rohan Kumar about hyperscale, data at the edge, and a variety of other trends in data engineering in the cloud.
NCAA Predictions on Spark
11 May 2019
Contributed by Lukas
In this episode, Kyle interviews Laura Edell at MS Build 2019. The conversation covers a number of topics, notably her NCAA Final 4 prediction model...
The Transformer
03 May 2019
Contributed by Lukas
Kyle and Linhda discuss attention and the transformer - an encoder/decoder architecture that extends the basic ideas of vector embeddings like word2ve...
Mapping Dialects with Twitter Data
26 Apr 2019
Contributed by Lukas
When users on Twitter post with geographic tags, it creates the opportunity for a variety of interesting questions to be posed having to do with langu...
Sentiment Analysis
20 Apr 2019
Contributed by Lukas
This is an interview with Ellen Loeshelle, Director of Product Management at Clarabridge. We primarily discuss sentiment analysis.
Attention Primer
13 Apr 2019
Contributed by Lukas
A gentle introduction to the very high-level idea of "attention" in machine learning, as it will play a major role in some upcoming episodes over the ...
Cross-lingual Short-text Matching
05 Apr 2019
Contributed by Lukas
Modern messaging technology has facilitated a trend towards highly compact, short messages sent by users who can presume a great amount of context hel...
ELMo
29 Mar 2019
Contributed by Lukas
ELMo (Embeddings from Language Models) introduced the idea of deep contextualized word representations. It extends previous ideas like word2vec and Gl...
BLEU
23 Mar 2019
Contributed by Lukas
Bilingual evaluation understudy (or BLEU) is a metric for evaluating the quality of machine translation using human translation as examples of accepta...
Simultaneous Translation at Baidu
15 Mar 2019
Contributed by Lukas
While at NeurIPS 2018, Kyle chatted with Liang Huang about his work with Baidu research on simultaneous translation, which was demoed at the conferenc...
Human vs Machine Transcription
08 Mar 2019
Contributed by Lukas
Machine transcription (the process of translating audio recordings of language to text) has come a long way in recent years. But how do the errors mad...
seq2seq
01 Mar 2019
Contributed by Lukas
A sequence to sequence (or seq2seq) model is a neural architecture used for translation (and other tasks) which consists of an encoder and a decoder. Th...
Text Mining in R
22 Feb 2019
Contributed by Lukas
Kyle interviews Julia Silge about her path into data science, her book Text Mining with R, and some of the ways in which she's used natural languag...
Recurrent Relational Networks
15 Feb 2019
Contributed by Lukas
One of the most challenging NLP tasks is natural language understanding and reasoning. How can we construct algorithms that are able to achieve human ...
Text World and Word Embedding Lower Bounds
08 Feb 2019
Contributed by Lukas
In the first half of this episode, Kyle speaks with Marc-Alexandre Côté and Wendy Tay about Text World. Text World is an engine that simulates tex...
word2vec
01 Feb 2019
Contributed by Lukas
Word2vec is an unsupervised machine learning model which is able to capture semantic information from the text it is trained on. The model is based on...
Authorship Attribution
25 Jan 2019
Contributed by Lukas
In a recent paper, Leveraging Discourse Information Effectively for Authorship Attribution, authors Su Wang, Elisa Ferracane, and Raymond J. Mooney de...
Very Large Corpora and Zipf's Law
18 Jan 2019
Contributed by Lukas
The earliest efforts to apply machine learning to natural language tended to convert every token (every word, more or less) into a unique feature. Whi...
Semantic search at Github
11 Jan 2019
Contributed by Lukas
Github is many things besides source control. It's a social network, even though not everyone realizes it. It's a vast repository of code. It's a tick...
Let's Talk About Natural Language Processing
04 Jan 2019
Contributed by Lukas
This episode reboots our podcast with the theme of Natural Language Processing for the next few months. We begin with introductions of Yoshi and Linh ...
Data Science Hiring Processes
28 Dec 2018
Contributed by Lukas
Kyle shares a few thoughts on mistakes he has observed job applicants making and also shares a few procedural insights listeners at early stages in their careers...
Holiday Reading - Epicac
25 Dec 2018
Contributed by Lukas
Epicac by Kurt Vonnegut.
Drug Discovery with Machine Learning
21 Dec 2018
Contributed by Lukas
In today's episode, Kyle chats with Alexander Zhebrak, CTO of Insilico Medicine, Inc. Insilico describes itself as artificial intelligence for drug disc...
Sign Language Recognition
14 Dec 2018
Contributed by Lukas
At the NeurIPS 2018 conference, Stradigi AI premiered a training game which helps players learn American Sign Language. This episode brings the fir...
Data Ethics
07 Dec 2018
Contributed by Lukas
This week, Kyle interviews Scott Nestler on the topic of Data Ethics. Today, no ubiquitous, formal ethical protocol exists for data science, althoug...
Escaping the Rabbit Hole
30 Nov 2018
Contributed by Lukas
Kyle interviews Mick West, author of Escaping the Rabbit Hole: How to Debunk Conspiracy Theories Using Facts, Logic, and Respect about the nature of...
[MINI] Theorem Provers
23 Nov 2018
Contributed by Lukas
Fake news attempts to lead readers/listeners/viewers to conclusions that are not descriptions of reality. It does this most often by presenting fals...
Automated Fact Checking
16 Nov 2018
Contributed by Lukas
Fake news can be responded to with fact-checking. However, it's easier to create fake news than to fact-check it. Full Fact is the UK's independent ...
[MINI] Single Source of Truth
09 Nov 2018
Contributed by Lukas
In mathematics, truth is universal. In data, truth lies in the where clause of the query. As large organizations have grown to rely on their data mo...
Detecting Fast Radio Bursts with Deep Learning
02 Nov 2018
Contributed by Lukas
Fast radio bursts are an astrophysical phenomenon first observed in 2007. While many observations have been made, science has yet to explain the mecha...
Being Bayesian
26 Oct 2018
Contributed by Lukas
This episode explores the root concept of what it is to be Bayesian: describing knowledge of a system probabilistically, having an appropriate prior p...
Modeling Fake News
19 Oct 2018
Contributed by Lukas
This is our interview with Dorje Brody about his recent paper with David Meier, How to model fake news. This paper uses the tools of communication the...
The Louvain Method for Community Detection
12 Oct 2018
Contributed by Lukas
Without getting into definitions, we have an intuitive sense of what a "community" is. The Louvain Method for Community Detection is one of the best k...
Cultural Cognition of Scientific Consensus
05 Oct 2018
Contributed by Lukas
In this episode, our guest Dan Kahan discusses his research into how people consume and interpret science news. In an era of fake news, motivated reaso...
False Discovery Rates
28 Sep 2018
Contributed by Lukas
A false discovery rate (FDR) is a methodology that can be useful when struggling with the problem of multiple comparisons. In any experiment, if the e...
Deep Fakes
21 Sep 2018
Contributed by Lukas
Digital videos can be described as sequences of still images and associated audio. Audio is easy to fake. What about video? A video can easily be brok...
Fake News Midterm
14 Sep 2018
Contributed by Lukas
In this episode, Kyle reviews what we've learned so far in our series on Fake News and talks briefly about where we're going next.
Quality Score
07 Sep 2018
Contributed by Lukas
Two weeks ago we discussed click through rates or CTRs and their usefulness and limits as a metric. Today, we discuss a related metric known as qual...
The Knowledge Illusion
31 Aug 2018
Contributed by Lukas
Kyle interviews Steven Sloman, Professor in the school of Cognitive, Linguistic, and Psychological Sciences at Brown University. Steven is co-author o...
Click Through Rates
24 Aug 2018
Contributed by Lukas
A Click Through Rate (CTR) is the proportion of clicks to impressions of some item of content shared online. This terminology is most commonly used in...
Algorithmic Detection of Fake News
17 Aug 2018
Contributed by Lukas
The scale and frequency with which information can be distributed on social media makes the problem of fake news a rapidly metastasizing issue. To do ...
Ant Intelligence
10 Aug 2018
Contributed by Lukas
If you prepared a list of creatures regarded as highly intelligent, it's unlikely ants would make the cut. This is expected, as on an individual level...
Human Detection of Fake News
03 Aug 2018
Contributed by Lukas
With publications such as "Prior exposure increases perceived accuracy of fake news", "Lazy, not biased: Susceptibility to partisan fake news is bett...
Spam Filtering with Naive Bayes
27 Jul 2018
Contributed by Lukas
Today's spam filters are advanced data driven tools. They rely on a variety of techniques to effectively and often seamlessly filter out junk email fr...
The Spread of Fake News
20 Jul 2018
Contributed by Lukas
How does fake news get spread online? It's not just a matter of manipulating search algorithms. The social platforms for sharing play a major role in t...
Fake News
13 Jul 2018
Contributed by Lukas
This episode kicks off our new theme of "Fake News" with guests Robert Sheaffer and Brad Schwartz. Fake news is a new label for an old idea. For ou...
Dev Ops for Data Science
11 Jul 2018
Contributed by Lukas
We revisit the 2018 Microsoft Build in this episode, focusing on the latest ideas in DevOps. Kyle interviews Cloud Developer Advocates Damian Brady, P...
First Order Logic
06 Jul 2018
Contributed by Lukas
Logic is a fundamental of mathematical systems. Its roots are the values true and false, and its power is in what its rules allow you to prove. Prep...
Blind Spots in Reinforcement Learning
29 Jun 2018
Contributed by Lukas
An intelligent agent trained in a simulated environment may be prone to making mistakes in the real world due to discrepancies between the training an...
Defending Against Adversarial Attacks
22 Jun 2018
Contributed by Lukas
In this week's episode, our host Kyle interviews Gokula Krishnan from ETH Zurich, about his recent contributions to defenses against adversarial attac...
Transfer Learning
15 Jun 2018
Contributed by Lukas
On a long car ride, Linhda and Kyle record a short episode. This discussion is about transfer learning, a technique used in machine learning to lever...
Medical Imaging Training Techniques
08 Jun 2018
Contributed by Lukas
Medical imaging is a highly effective tool used by clinicians to diagnose a wide array of diseases and injuries. However, it often requires exceptiona...
Kalman Filters
01 Jun 2018
Contributed by Lukas
Thanks to our sponsor Galvanize. A Kalman Filter is a technique for taking a sequence of observations about an object or variable and determining the ...
AI in Industry
25 May 2018
Contributed by Lukas
There's so much to discuss on the AI side, it's hard to know where to begin. Luckily, Steve Guggenheimer, Microsoft's corporate vice president of AI...
AI in Games
18 May 2018
Contributed by Lukas
Today's interview is with the authors of the textbook Artificial Intelligence and Games.
Game Theory
11 May 2018
Contributed by Lukas
Thanks to our sponsor The Great Courses. This week's episode is a short primer on game theory. For tickets to the free Data Skeptic meetup in Chicago...
The Experimental Design of Paranormal Claims
04 May 2018
Contributed by Lukas
In this episode of Data Skeptic, Kyle chats with Jerry Schwarz from the Independent Investigations Group (IIG)'s SF Bay Area chapter about testing c...
Winograd Schema Challenge
27 Apr 2018
Contributed by Lukas
Our guest this week, Hector Levesque, joins us to discuss an alternative way to measure a machine's intelligence, called the Winograd Schema Challenge. ...
The Imitation Game
20 Apr 2018
Contributed by Lukas
This week on Data Skeptic, we begin with a skit to introduce the topic of this show: The Imitation Game. We open with a scene in the distant future. T...
Eugene Goostman
13 Apr 2018
Contributed by Lukas
In this episode, Kyle shares his perspective on the chatbot Eugene Goostman which (some claim) "passed" the Turing Test. As a second topic Kyle also d...
The Theory of Formal Languages
06 Apr 2018
Contributed by Lukas
In this episode, Kyle and Linhda discuss the theory of formal languages. Any language can (theoretically) be a formal language. The requirement is tha...
The Loebner Prize
30 Mar 2018
Contributed by Lukas
The Loebner Prize is a competition in the spirit of the Turing Test. Participants are welcome to submit conversational agent software to be judged b...
Chatbots
23 Mar 2018
Contributed by Lukas
In this episode, Kyle chats with Vince from iv.ai and Heather Shapiro who works on the Microsoft Bot Framework. We solicit their advice on building a ...
The Master Algorithm
16 Mar 2018
Contributed by Lukas
In this week's episode, Kyle Polich interviews Pedro Domingos about his book, The Master Algorithm: How the quest for the ultimate learning machine w...
The No Free Lunch Theorems
09 Mar 2018
Contributed by Lukas
What's the best machine learning algorithm to use? I hear that XGBoost wins most of the Kaggle competitions that aren't won with deep learning. Should...
ML at Sloan Kettering Cancer Center
02 Mar 2018
Contributed by Lukas
For a long time, physicians have recognized that the tools they have aren't powerful enough to treat complex diseases, like cancer. In addition to dat...
Optimal Decision Making with POMDPs
23 Feb 2018
Contributed by Lukas
In a previous episode, we discussed Markov Decision Processes or MDPs, a framework for decision making and planning. This episode explores the general...
AI Decision-Making
16 Feb 2018
Contributed by Lukas
Making a decision is a complex task. Today's guest Dongho Kim discusses how he and his team at Prowler have been building a platform that will be acces...
[MINI] Reinforcement Learning
09 Feb 2018
Contributed by Lukas
In many real world situations, a person/agent doesn't necessarily know their own objectives or the mechanics of the world they're interacting with. Ho...
Evolutionary Computation
02 Feb 2018
Contributed by Lukas
In this week's episode, Kyle is joined by Risto Miikkulainen, a professor of computer science and neuroscience at the University of Texas at Austin. T...
[MINI] Markov Decision Processes
26 Jan 2018
Contributed by Lukas
Formally, an MDP is defined as the tuple containing states, actions, the transition function, and the reward function. This podcast examines each of t...
Neuroscience Frontiers
19 Jan 2018
Contributed by Lukas
Last week on Data Skeptic, we visited the Laboratory of Neuroimaging, or LONI, at USC and learned about their data-driven platform that enables scient...
Neuroimaging and Big Data
12 Jan 2018
Contributed by Lukas
Last year, Kyle had a chance to visit the Laboratory of Neuroimaging, or LONI, at USC, and learn about how some researchers are using data science to ...
The Agent Model of Artificial Intelligence
05 Jan 2018
Contributed by Lukas
In artificial intelligence, the term 'agent' is used to mean an autonomous, thinking agent with the ability to interact with their environment. An age...
Artificial Intelligence, a Podcast Approach
29 Dec 2017
Contributed by Lukas
This episode kicks off the next theme on Data Skeptic: artificial intelligence. Kyle discusses what's to come for the show in 2018, why this topic i...
Holiday reading 2017
22 Dec 2017
Contributed by Lukas
We break format from our regular programming today and bring you an excerpt from Max Tegmark's book "Life 3.0". The first chapter is a short story t...
Complexity and Cryptography
15 Dec 2017
Contributed by Lukas
This week, our host Kyle Polich is joined by guest Tim Henderson from Google to talk about the computational complexity foundations of modern cryptogr...