Data Engineering Podcast
Episodes
Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer
09 Jun 2021
Contributed by Lukas
Summary The way to build maintainable software and systems is through composition of individual pieces. By making those pieces high quality and flexi...
Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook
03 Jun 2021
Contributed by Lukas
Summary SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky ...
Making Data Pipelines Self-Serve For Everyone With Shipyard
02 Jun 2021
Contributed by Lukas
Summary Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipel...
Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse
28 May 2021
Contributed by Lukas
Summary The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of...
Easily Build Advanced Similarity Search With The Pinecone Vector Database
25 May 2021
Contributed by Lukas
Summary Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the mode...
A Holistic Approach To Data Governance Through Self Reflection At Collibra
21 May 2021
Contributed by Lukas
Summary Data governance is a phrase that means many different things to many different people. This is because it is actually a concept that encompas...
Unlocking The Power of Data Lineage In Your Platform with OpenLineage
18 May 2021
Contributed by Lukas
Summary Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understan...
Building Your Data Warehouse On Top Of PostgreSQL
14 May 2021
Contributed by Lukas
Summary There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require ...
Making Analytical APIs Fast With Tinybird
11 May 2021
Contributed by Lukas
Summary Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full time job. The team at Tinybird wa...
Making Spark Cloud Native At Data Mechanics
07 May 2021
Contributed by Lukas
Summary Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of i...
The Grand Vision And Present Reality of DataOps
04 May 2021
Contributed by Lukas
Summary The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOp...
Self Service Data Exploration And Dashboarding With Superset
27 Apr 2021
Contributed by Lukas
Summary The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used...
Moving Machine Learning Into The Data Pipeline at Cherre
20 Apr 2021
Contributed by Lukas
Summary Most of the time when you think about a data pipeline or ETL job what comes to mind is a purely mechanistic progression of functions that mov...
Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand
13 Apr 2021
Contributed by Lukas
Summary "Business as usual" is changing, with more companies investing in data as a first class concern. As a result, the data team is grow...
Put Your Whole Data Team On The Same Page With Atlan
06 Apr 2021
Contributed by Lukas
Summary One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the...
Data Quality Management For The Whole Team With Soda Data
30 Mar 2021
Contributed by Lukas
Summary Data quality is on the top of everyone’s mind recently, but getting it right is as challenging as ever. One of the contributing factors...
Real World Change Data Capture At Datacoral
23 Mar 2021
Contributed by Lukas
Summary The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only w...
Managing The DoorDash Data Platform
16 Mar 2021
Contributed by Lukas
Summary The team at DoorDash has a complex set of optimization challenges to deal with using data that they collect from a multi-sided marketplace. I...
Leave Your Data Where It Is And Automate Feature Extraction With Molecula
09 Mar 2021
Contributed by Lukas
Summary A majority of the time spent in data engineering is copying data between systems to make the information available for different purposes. Th...
Bridging The Gap Between Machine Learning And Operations At Iguazio
02 Mar 2021
Contributed by Lukas
Summary The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. ...
Self Service Open Source Data Integration With AirByte
23 Feb 2021
Contributed by Lukas
Summary Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed p...
Building The Foundations For Data Driven Businesses at 5xData
16 Feb 2021
Contributed by Lukas
Summary Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data...
How Shopify Is Building Their Production Data Warehouse Using DBT
09 Feb 2021
Contributed by Lukas
Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of t...
System Observability For The Cloud Native Era With Chronosphere
02 Feb 2021
Contributed by Lukas
Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or b...
Making It Easier To Stick B2B Data Integration Pipelines Together With Hotglue
26 Jan 2021
Contributed by Lukas
Summary Businesses often need to be able to ingest data from their customers in order to power the services that they provide. For each new source th...
Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch
19 Jan 2021
Contributed by Lukas
Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a ...
Enabling Version Controlled Data Collaboration With TerminusDB
11 Jan 2021
Contributed by Lukas
Summary As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating o...
Bringing Feature Stores and MLOps to the Enterprise at Tecton
05 Jan 2021
Contributed by Lukas
Summary As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is ...
Off The Shelf Data Governance With Satori
28 Dec 2020
Contributed by Lukas
Summary One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a b...
Low Friction Data Governance With Immuta
21 Dec 2020
Contributed by Lukas
Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex asp...
Building A Self Service Data Platform For Alternative Data Analytics At YipitData
15 Dec 2020
Contributed by Lukas
Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At Yipit...
Proven Patterns For Building Successful Data Teams
07 Dec 2020
Contributed by Lukas
Summary Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is a...
Streaming Data Integration Without The Code at Equalum
30 Nov 2020
Contributed by Lukas
Summary The first stage of every good pipeline is to perform data integration. With the increasing pace of change and the need for up to date analyti...
Keeping A Bigeye On The Data Quality Market
23 Nov 2020
Contributed by Lukas
Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no ...
Self Service Data Management From Ingest To Insights With Isima
17 Nov 2020
Contributed by Lukas
Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form...
Building A Cost Effective Data Catalog With Tree Schema
10 Nov 2020
Contributed by Lukas
Summary A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external...
Add Version Control To Your Data Lake With LakeFS
03 Nov 2020
Contributed by Lukas
Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional com...
Cloud Native Data Security As Code With Cyral
26 Oct 2020
Contributed by Lukas
Summary One of the most challenging aspects of building a data platform has nothing to do with pipelines and transformations. If you are putting your...
Better Data Quality Through Observability With Monte Carlo
19 Oct 2020
Contributed by Lukas
Summary In order for analytics and machine learning projects to be useful, they require a high degree of data quality. To ensure that your pipelines ...
Rapid Delivery Of Business Intelligence Using Power BI
12 Oct 2020
Contributed by Lukas
Summary Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go...
Self Service Real Time Data Integration Without The Headaches With Meroxa
05 Oct 2020
Contributed by Lukas
Summary Analytical workloads require a well engineered and well maintained data integration process to ensure that your information is reliable and u...
Speed Up And Simplify Your Streaming Data Workloads With Red Panda
29 Sep 2020
Contributed by Lukas
Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popular...
Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor
22 Sep 2020
Contributed by Lukas
Summary Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which le...
Distributed In Memory Processing And Streaming With Hazelcast
15 Sep 2020
Contributed by Lukas
Summary In memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is...
Simplify Your Data Architecture With The Presto Distributed SQL Engine
07 Sep 2020
Contributed by Lukas
Summary Databases are limited in scope to the information that they directly contain. For analytical use cases you often want to combine data across ...
Building A Better Data Warehouse For The Cloud At Firebolt
01 Sep 2020
Contributed by Lukas
Summary Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in da...
Metadata Management And Integration At LinkedIn With DataHub
25 Aug 2020
Contributed by Lukas
Summary In order to scale the use of data across an organization there are a number of challenges related to discovery, governance, and integration t...
Exploring The TileDB Universal Data Engine
17 Aug 2020
Contributed by Lukas
Summary Most databases are designed to work with textual data, with some special purpose engines that support domain specific formats. TileDB is a da...
Closing The Loop On Event Data Collection With Iteratively
10 Aug 2020
Contributed by Lukas
Summary Event based data is a rich source of information for analytics, unless none of the event structures are consistent. The team at Iteratively a...
A Practical Introduction To Graph Data Applications
04 Aug 2020
Contributed by Lukas
Summary Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on t...
Build More Reliable Distributed Systems By Breaking Them With Jepsen
28 Jul 2020
Contributed by Lukas
Summary A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of s...
Making Wind Energy More Efficient With Data At Turbit Systems
21 Jul 2020
Contributed by Lukas
Summary Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overa...
Open Source Production Grade Data Integration With Meltano
13 Jul 2020
Contributed by Lukas
Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data i...
DataOps For Streaming Systems With Lenses.io
06 Jul 2020
Contributed by Lukas
Summary There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a strea...
Data Collection And Management To Power Sound Recognition At Audio Analytic
30 Jun 2020
Contributed by Lukas
Summary We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environme...
Bringing Business Analytics To End Users With GoodData
23 Jun 2020
Contributed by Lukas
Summary The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data incr...
Accelerate Your Machine Learning With The StreamSQL Feature Store
15 Jun 2020
Contributed by Lukas
Summary Machine learning is a process driven by iteration and experimentation which requires fast and easy access to relevant features of the data be...
Data Management Trends From An Investor Perspective
08 Jun 2020
Contributed by Lukas
Summary The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained ...
Building A Data Lake For The Database Administrator At Upsolver
02 Jun 2020
Contributed by Lukas
Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of c...
Mapping The Customer Journey For B2B Companies At Dreamdata
25 May 2020
Contributed by Lukas
Summary Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals ...
Power Up Your PostgreSQL Analytics With Swarm64
18 May 2020
Contributed by Lukas
Summary The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choi...
StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar
11 May 2020
Contributed by Lukas
Summary There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different are...
Enterprise Data Operations And Orchestration At Infoworks
04 May 2020
Contributed by Lukas
Summary Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a p...
Taming Complexity In Your Data Driven Organization With DataOps
28 Apr 2020
Contributed by Lukas
Summary Data is a critical element to every role in an organization, which is also what makes managing it so challenging. With so many different opin...
Building Real Time Applications On Streaming Data With Eventador
20 Apr 2020
Contributed by Lukas
Summary Modern applications frequently require access to real-time data, but building and maintaining the systems that make that possible is a comple...
Making Data Collection In Your Code Easy With Rookout
14 Apr 2020
Contributed by Lukas
Summary The software applications that we build for our businesses are a rich source of data, but accessing and extracting that data is often a slow ...
Building A Knowledge Graph Of Commercial Real Estate At Cherre
07 Apr 2020
Contributed by Lukas
Summary Knowledge graphs are a data resource that can answer questions beyond the scope of traditional data analytics. By organizing and storing data...
The Life Of A Non-Profit Data Professional
30 Mar 2020
Contributed by Lukas
Summary Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a sh...
Behind The Scenes Of The Linode Object Storage Service
23 Mar 2020
Contributed by Lukas
Summary There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes...
Building A New Foundation For CouchDB
17 Mar 2020
Contributed by Lukas
Summary CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interfa...
Scaling Data Governance For Global Businesses With A Data Hub Architecture
09 Mar 2020
Contributed by Lukas
Summary Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well con...
Easier Stream Processing On Kafka With ksqlDB
02 Mar 2020
Contributed by Lukas
Summary Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems th...
Shining A Light on Shadow IT In Data And Analytics
25 Feb 2020
Contributed by Lukas
Summary Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects ...
Data Infrastructure Automation For Private SaaS At Snowplow
18 Feb 2020
Contributed by Lukas
Summary One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Sno...
Data Modeling That Evolves With Your Business Using Data Vault
09 Feb 2020
Contributed by Lukas
Summary Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and...
The Benefits And Challenges Of Building A Data Trust
03 Feb 2020
Contributed by Lukas
Summary Every business collects data in some fashion, but sometimes the true value of the collected information only comes when it is combined with o...
Pay Down Technical Debt In Your Data Pipeline With Great Expectations
27 Jan 2020
Contributed by Lukas
Summary Data pipelines are complicated and business critical pieces of technical infrastructure. Unfortunately they are also complex and difficult to...
Replatforming Production Dataflows
20 Jan 2020
Contributed by Lukas
Summary Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business there can be unexpec...
Planet Scale SQL For The New Generation Of Applications With YugabyteDB
13 Jan 2020
Contributed by Lukas
Summary The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of depl...
Change Data Capture For All Of Your Databases With Debezium
06 Jan 2020
Contributed by Lukas
Summary Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a wa...
Building The DataDog Platform For Processing Timeseries Data At Massive Scale
30 Dec 2019
Contributed by Lukas
Summary DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to supp...
Building The Materialize Engine For Interactive Streaming Analytics In SQL
23 Dec 2019
Contributed by Lukas
Summary Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of reco...
Solving Data Lineage Tracking And Data Discovery At WeWork
16 Dec 2019
Contributed by Lukas
Summary Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and ...
SnowflakeDB: The Data Warehouse Built For The Cloud
09 Dec 2019
Contributed by Lukas
Summary Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage e...
Organizing And Empowering Data Engineers At Citadel
03 Dec 2019
Contributed by Lukas
Summary The financial industry has long been driven by data, requiring a mature and robust capacity for discovering and integrating valuable sources ...
Building A Real Time Event Data Warehouse For Sentry
26 Nov 2019
Contributed by Lukas
Summary The team at Sentry has built a platform for anyone in the world to send software errors and events. As they scaled the volume of customers an...
Escaping Analysis Paralysis For Your Data Platform With Data Virtualization
18 Nov 2019
Contributed by Lukas
Summary With the constant evolution of technology for data management it can seem impossible to make an informed decision about whether to build a da...
Designing For Data Protection
11 Nov 2019
Contributed by Lukas
Summary The practice of data management is one that requires technical acumen, but there are also many policy and regulatory issues that inform and i...
Automating Your Production Dataflows On Spark
04 Nov 2019
Contributed by Lukas
Summary As data engineers the health of our pipelines is our highest priority. Unfortunately, there are countless ways that our dataflows can break o...
Build Maintainable And Testable Data Applications With Dagster
28 Oct 2019
Contributed by Lukas
Summary Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and m...
Data Orchestration For Hybrid Cloud Analytics
22 Oct 2019
Contributed by Lukas
Summary The scale and complexity of the systems that we build to satisfy business requirements is increasing as the available tools become more sophi...
Keeping Your Data Warehouse In Order With DataForm
15 Oct 2019
Contributed by Lukas
Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps ...
Fast Analytics On Semi-Structured And Structured Data In The Cloud
08 Oct 2019
Contributed by Lukas
Summary The process of exposing your data through a SQL interface has many possible pathways, each with their own complications and tradeoffs. One of...
Ship Faster With An Opinionated Data Pipeline Framework
01 Oct 2019
Contributed by Lukas
Summary Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways that yo...
Open Source Object Storage For All Of Your Data
23 Sep 2019
Contributed by Lukas
Summary Object storage is quickly becoming the unifying layer for data intensive applications and analytics. Modern, cloud oriented data warehouses a...
Navigating Boundless Data Streams With The Swim Kernel
18 Sep 2019
Contributed by Lukas
Summary The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysi...
Building A Reliable And Performant Router For Observability Data
10 Sep 2019
Contributed by Lukas
Summary The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data t...
Building A Community For Data Professionals at Data Council
02 Sep 2019
Contributed by Lukas
Summary Data professionals are working in a domain that is rapidly evolving. In order to stay current we need access to deeply technical presentation...
Building Tools And Platforms For Data Analytics
26 Aug 2019
Contributed by Lukas
Summary Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users ...
A High Performance Platform For The Full Big Data Lifecycle
19 Aug 2019
Contributed by Lukas
Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of...