Data Engineering Podcast
Episodes
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore
30 May 2022
Contributed by Lukas
Summary A large fraction of data engineering work involves moving data from one storage location to another in order to support different access and ...
Unlocking The Value Of Data Across The Organization Through User Friendly Data Tools With Prophecy
23 May 2022
Contributed by Lukas
Summary The interfaces and design cues that a tool offers can have a massive impact on who is able to use it and the tasks that they are able to perf...
Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte
23 May 2022
Contributed by Lukas
Summary Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the en...
Designing And Deploying IoT Analytics For Industrial Applications At Vopak
16 May 2022
Contributed by Lukas
Summary Industrial applications are one of the primary adopters of Internet of Things (IoT) technologies, with business critical operations being inf...
Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way
16 May 2022
Contributed by Lukas
Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform t...
Scaling Analysis of Connected Data And Modeling Complex Relationships With The TigerGraph Graph Database
09 May 2022
Contributed by Lukas
Summary Many of the events, ideas, and objects that we try to represent through data have a high degree of connectivity in the real world. These conn...
Exploring The Insights And Impact Of Dan Delorey's Distinguished Career In Data
09 May 2022
Contributed by Lukas
Summary Dan Delorey helped to build the core technologies of Google's cloud data services for many years before embarking on his latest adventu...
Leading The Charge For The ELT Data Integration Pattern For Cloud Data Warehouses At Matillion
02 May 2022
Contributed by Lukas
Summary The predominant pattern for data integration in the cloud has become extract, load, and then transform or ELT. Matillion was an early innovat...
Evolving And Scaling The Data Platform at Yotpo
02 May 2022
Contributed by Lukas
Summary Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their...
Operational Analytics At Speed With Minimal Busy Work Using Incorta
24 Apr 2022
Contributed by Lukas
Summary A huge amount of effort goes into modeling and shaping data to make it available for analytical purposes. This is often due to the need to si...
Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs
24 Apr 2022
Contributed by Lukas
Summary There are very few tools which are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful ...
Connecting To The Next Frontier Of Computing With Quantum Networks
18 Apr 2022
Contributed by Lukas
Summary The next paradigm shift in computing is coming in the form of quantum technologies. Quantum processors have gained significant attention for t...
What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?
16 Apr 2022
Contributed by Lukas
Summary Putting machine learning models into production and keeping them there requires investing in well-managed systems to manage the full lifecycl...
DataOps As A Service For Your Data Integration Workflows With Rivery
11 Apr 2022
Contributed by Lukas
Summary Data engineering is a practice that is multi-faceted and requires integration with a large number of systems. This often means working across...
Synthetic Data As A Service For Simplifying Privacy Engineering With Gretel
10 Apr 2022
Contributed by Lukas
Summary Any time that you are storing data about people there are a number of privacy and security considerations that come with it. Privacy engineer...
Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder
03 Apr 2022
Contributed by Lukas
Summary The flexibility of software oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases i...
Repeatable Patterns For Designing Data Platforms And When To Customize Them
03 Apr 2022
Contributed by Lukas
Summary Building a data platform for your organization is a challenging undertaking. Building multiple data platforms for other organizations as a se...
Eliminate The Bottlenecks In Your Key/Value Storage With SpeeDB
27 Mar 2022
Contributed by Lukas
Summary At the foundational layer many databases and data processing engines rely on key/value storage for managing the layout of information on the ...
Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera
27 Mar 2022
Contributed by Lukas
Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The gr...
Exploring Incident Management Strategies For Data Teams
20 Mar 2022
Contributed by Lukas
Summary Data assets and the pipelines that create them have become critical production infrastructure for companies. This adds a requirement for reli...
Accelerate Your Embedded Analytics With Apache Pinot
20 Mar 2022
Contributed by Lukas
Summary Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user...
Accelerating Adoption Of The Modern Data Stack At 5X Data
14 Mar 2022
Contributed by Lukas
Summary The modern data stack is a constantly moving target which makes it difficult to adopt without prior experience. In order to accelerate the ti...
Taking A Multidimensional Approach To Data Observability At Acceldata
14 Mar 2022
Contributed by Lukas
Summary Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it...
Move Your Database To The Data And Speed Up Your Analytics With DuckDB
05 Mar 2022
Contributed by Lukas
Summary When you think about selecting a database engine for your project you typically consider options focused on serving multiple concurrent users...
Developer Friendly Application Persistence That Is Fast And Scalable With HarperDB
05 Mar 2022
Contributed by Lukas
Summary Databases are an important component of application architectures, but they are often difficult to work with. HarperDB was created with the c...
Reflections On Designing A Data Platform From Scratch
28 Feb 2022
Contributed by Lukas
Summary Building a data platform is a complex journey that requires a significant amount of planning to do well. It requires knowledge of the availab...
Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise
28 Feb 2022
Contributed by Lukas
Summary There are a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across ...
Build Your Python Data Processing Your Way And Run It Anywhere With Fugue
21 Feb 2022
Contributed by Lukas
Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning....
Understanding The Immune System With Data At ImmunAI
21 Feb 2022
Contributed by Lukas
Summary The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make...
Bring Your Code To Your Streaming And Static Data Without Effort With The Deephaven Real Time Query Engine
14 Feb 2022
Contributed by Lukas
Summary Streaming data sources are becoming more widely available as tools to handle their storage and distribution mature. However, it is still a cha...
Build Your Own End To End Customer Data Platform With Rudderstack
14 Feb 2022
Contributed by Lukas
Summary Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even mor...
Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions
07 Feb 2022
Contributed by Lukas
Summary Along with globalization of our societies comes the need to analyze the geospatial and geotemporal data that is needed to manage the growth i...
Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets
06 Feb 2022
Contributed by Lukas
Summary There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, dep...
A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know
31 Jan 2022
Contributed by Lukas
Summary The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, co...
Effective Pandas Patterns For Data Engineering
31 Jan 2022
Contributed by Lukas
Summary Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has be...
The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam
23 Jan 2022
Contributed by Lukas
Summary Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to mak...
Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig
23 Jan 2022
Contributed by Lukas
Summary Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate th...
Automated Data Quality Management Through Machine Learning With Anomalo
15 Jan 2022
Contributed by Lukas
Summary Data quality control is a requirement for being able to trust the various reports and machine learning models that are relying on the informa...
An Introduction To Data And Analytics Engineering For Non-Programmers
15 Jan 2022
Contributed by Lukas
Summary Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Now...
Open Source Reverse ETL For Everyone With Grouparoo
08 Jan 2022
Contributed by Lukas
Summary Reverse ETL is a product category that evolved from the landscape of customer data platforms with a number of companies offering their own im...
Data Observability Out Of The Box With Metaplane
08 Jan 2022
Contributed by Lukas
Summary Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used s...
Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary
02 Jan 2022
Contributed by Lukas
Summary Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means fo...
A Reflection On The Data Ecosystem For The Year 2021
02 Jan 2022
Contributed by Lukas
Summary This has been an active year for the data ecosystem, with a number of new product categories and substantial growth in existing areas. In an ...
Exploring The Evolving Role Of Data Engineers
27 Dec 2021
Contributed by Lukas
Summary Data Engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requi...
Revisiting The Technical And Social Benefits Of The Data Mesh
27 Dec 2021
Contributed by Lukas
Summary The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their an...
Fast And Flexible Headless Data Analytics With Cube.JS
21 Dec 2021
Contributed by Lukas
Summary One of the perennial challenges of data analytics is having a consistent set of definitions, along with a flexible and performant API endpoin...
Building A System Of Record For Your Organization's Data Ecosystem At Metaphor
20 Dec 2021
Contributed by Lukas
Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of i...
Building Auditable Spark Pipelines At Capital One
13 Dec 2021
Contributed by Lukas
Summary Spark is a powerful and battle tested framework for building highly scalable data pipelines. Because of its proven ability to handle large vo...
Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform
12 Dec 2021
Contributed by Lukas
Summary The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites...
Data Driven Hiring For Data Professionals With Alooba
04 Dec 2021
Contributed by Lukas
Summary Hiring data professionals is challenging for a multitude of reasons, and as with every interview process there is a potential for bias to cre...
Experimentation and A/B Testing For Modern Data Teams With Eppo
04 Dec 2021
Contributed by Lukas
Summary A/B testing and experimentation are the most reliable way to determine whether a change to your product will have the desired effect on your ...
Creating A Unified Experience For The Modern Data Stack At Mozart Data
27 Nov 2021
Contributed by Lukas
Summary The modern data stack has been gaining a lot of attention recently with a rapidly growing set of managed services for different stages of the...
Doing DataOps For External Data Sources As A Service at Demyst
27 Nov 2021
Contributed by Lukas
Summary The data that you have access to affects the questions that you can answer. By using external data sources you can drastically increase the r...
Exploring Processing Patterns For Streaming Data Integration In Your Data Lake
20 Nov 2021
Contributed by Lukas
Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. With the improvements in streami...
Laying The Foundation Of Your Data Platform For The Era Of Big Complexity With Dagster
20 Nov 2021
Contributed by Lukas
Summary The technology for scaling storage and processing of data has gone through massive evolution over the past decade, leaving us with the abilit...
Data Quality Starts At The Source
14 Nov 2021
Contributed by Lukas
Summary The most important gauge of success for a data platform is the level of trust in the accuracy of the information that it provides. In order t...
Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata
10 Nov 2021
Contributed by Lukas
Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata acros...
Business Intelligence Beyond The Dashboard With ClicData
06 Nov 2021
Contributed by Lukas
Summary Business intelligence is often equated with a collection of dashboards that show various charts and graphs representing data for an organizat...
Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL
05 Nov 2021
Contributed by Lukas
Summary The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository...
Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator
29 Oct 2021
Contributed by Lukas
Summary The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as ...
Streaming Data Pipelines Made SQL With Decodable
29 Oct 2021
Contributed by Lukas
Summary Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build re...
Data Exploration For Business Users Powered By Analytics Engineering With Lightdash
23 Oct 2021
Contributed by Lukas
Summary The market for business intelligence has been going through an evolutionary shift in recent years. One of the driving forces for that change ...
Completing The Feedback Loop Of Data Through Operational Analytics With Census
21 Oct 2021
Contributed by Lukas
Summary The focus of the past few years has been to consolidate all of the organization's data into a cloud data warehouse. As a result there h...
Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data
16 Oct 2021
Contributed by Lukas
Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams ac...
How And Why To Become Data Driven As A Business
14 Oct 2021
Contributed by Lukas
Summary Organizations of all sizes are striving to become data driven, starting in earnest with the rise of big data a decade ago. With the never-end...
Make Your Business Metrics Reusable With Open Source Headless BI Using Metriql
08 Oct 2021
Contributed by Lukas
Summary The key to making data valuable to business users is the ability to calculate meaningful metrics and explore them along useful dimensions. Bu...
Adding Support For Distributed Transactions To The Redpanda Streaming Engine
06 Oct 2021
Contributed by Lukas
Summary Transactions are a necessary feature for ensuring that a set of actions are all performed as a single unit of work. In streaming systems this...
Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike
02 Oct 2021
Contributed by Lukas
Summary Aerospike is a database engine that is designed to provide millisecond response times for queries across terabytes or petabytes. In this epis...
Delivering Your Personal Data Cloud With Prifina
30 Sep 2021
Contributed by Lukas
Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they us...
Digging Into Data Reliability Engineering
26 Sep 2021
Contributed by Lukas
Summary The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of s...
Massively Parallel Data Processing In Python Without The Effort Using Bodo
25 Sep 2021
Contributed by Lukas
Summary Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and...
Declarative Machine Learning Without The Operational Overhead Using Continual
19 Sep 2021
Contributed by Lukas
Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating ...
An Exploration Of The Data Engineering Requirements For Bioinformatics
19 Sep 2021
Contributed by Lukas
Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has gr...
Setting The Stage For The Next Chapter Of The Cassandra Database
12 Sep 2021
Contributed by Lukas
Summary The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has b...
A View From The Round Table Of Gartner's Cool Vendors
09 Sep 2021
Contributed by Lukas
Summary Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For busi...
Designing And Building Data Platforms As A Product
04 Sep 2021
Contributed by Lukas
Summary The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your orga...
Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana
02 Sep 2021
Contributed by Lukas
Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the ...
Do Away With Data Integration Through A Dataware Architecture With Cinchy
28 Aug 2021
Contributed by Lukas
Summary The reason that so much time and energy is spent on data integration is because of how our applications are designed. By making the software ...
Decoupling Data Operations From Data Infrastructure Using Nexla
25 Aug 2021
Contributed by Lukas
Summary The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of thi...
Let Your Analysts Build A Data Lakehouse With Cuelake
21 Aug 2021
Contributed by Lukas
Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and ...
Migrate And Modify Your Data Platform Confidently With Compilerworks
18 Aug 2021
Contributed by Lukas
Summary A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the...
Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop
15 Aug 2021
Contributed by Lukas
Summary The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do w...
Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma
10 Aug 2021
Contributed by Lukas
Summary All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don't have trust...
Data Discovery From Dashboards To Databases With Castor
07 Aug 2021
Contributed by Lukas
Summary Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread acro...
Charting A Path For Streaming Data To Fill Your Data Lake With Hudi
03 Aug 2021
Contributed by Lukas
Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With...
Adding Context And Comprehension To Your Analytics Through Data Discovery With SelectStar
31 Jul 2021
Contributed by Lukas
Summary Companies of all sizes and industries are trying to use the data that they and their customers generate to survive and thrive in the modern e...
Building a Multi-Tenant Managed Platform For Streaming Data With Pulsar at Datastax
28 Jul 2021
Contributed by Lukas
Summary Everyone expects data to be transmitted, processed, and updated instantly as more and more products integrate streaming data. The technology ...
Bringing The Metrics Layer To The Masses With Transform
23 Jul 2021
Contributed by Lukas
Summary Collecting and cleaning data is only useful if someone can make sense of it afterward. The latest evolution in the data ecosystem is the intr...
Strategies For Proactive Data Quality Management
20 Jul 2021
Contributed by Lukas
Summary Data quality is a concern that has been gaining attention alongside the rising importance of analytics for business success. Many solutions r...
Low Code And High Quality Data Engineering For The Whole Organization With Prophecy
16 Jul 2021
Contributed by Lukas
Summary There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is s...
Exploring The Design And Benefits Of The Modern Data Stack
13 Jul 2021
Contributed by Lukas
Summary We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there ...
Democratize Data Cleaning Across Your Organization With Trifacta
09 Jul 2021
Contributed by Lukas
Summary Every data project, whether it's analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and...
Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager
05 Jul 2021
Contributed by Lukas
Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a lar...
Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK
03 Jul 2021
Contributed by Lukas
Summary Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and...
A Candid Exploration Of Timeseries Data Analysis With InfluxDB
29 Jun 2021
Contributed by Lukas
Summary While the overall concept of timeseries data is uniform, its usage and applications are far from it. One of the most demanding applications o...
Lessons Learned From The Pipeline Data Engineering Academy
26 Jun 2021
Contributed by Lukas
Summary Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that,...
Make Database Performance Optimization A Playful Experience With OtterTune
23 Jun 2021
Contributed by Lukas
Summary The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the d...
Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk
18 Jun 2021
Contributed by Lukas
Summary Working with unstructured data has typically been a motivation for a data lake. The challenge is imposing enough order on the platform to mak...
Accelerating ML Training And Delivery With In-Database Machine Learning
15 Jun 2021
Contributed by Lukas
Summary When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object stora...
Taking A Tour Of The Google Cloud Platform For Data And Analytics
12 Jun 2021
Contributed by Lukas
Summary Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies t...