
Data Engineering Podcast

Technology Education

Episodes

Showing 201-300 of 508
Page 3 of 6

Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast

17 Jul 2022

Contributed by Lukas

Summary Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accele...

Charting the Path of Riskified's Data Platform Journey

10 Jul 2022

Contributed by Lukas

Summary Building a data platform is a journey, not a destination. Beyond the work of assembling a set of technologies and building integrations acros...

Maintain Your Data Engineers' Sanity By Embracing Automation

10 Jul 2022

Contributed by Lukas

Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to ...

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff

03 Jul 2022

Contributed by Lukas

Summary The perennial challenge of data engineers is ensuring that information is integrated reliably. While it is straightforward to know whether a ...

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

03 Jul 2022

Contributed by Lukas

Summary The ecosystem for data tools has been going through rapid and constant evolution over the past several years. These technological shifts have...

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

27 Jun 2022

Contributed by Lukas

Summary The proliferation of sensors and GPS devices has dramatically increased the number of applications for spatial data, and the need for scalabl...

Strategies And Tactics For A Successful Master Data Management Implementation

27 Jun 2022

Contributed by Lukas

Summary The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Da...

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

19 Jun 2022

Contributed by Lukas

Summary Data analysis is a valuable exercise that is often out of reach of non-technical users as a result of the complexity of data systems. In orde...

Level Up Your Data Platform With Active Metadata

19 Jun 2022

Contributed by Lukas

Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. A variety of platforms have b...

Discover And De-Clutter Your Unstructured Data With Aparavi

13 Jun 2022

Contributed by Lukas

Summary Unstructured data takes many forms in an organization. From a data engineering perspective that often means things like JSON files, audio or ...

Hire And Scale Your Data Team With Intention

13 Jun 2022

Contributed by Lukas

Summary Building a well rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. ...

Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

06 Jun 2022

Contributed by Lukas

Summary The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that t...

Bringing The Modern Data Stack To Everyone With Y42

06 Jun 2022

Contributed by Lukas

Summary Cloud services have made highly scalable and performant data platforms economical and manageable for data teams. However, they are still chal...

Data Cloud Cost Optimization With Bluesky Data

30 May 2022

Contributed by Lukas

Summary The latest generation of data warehouse platforms has brought unprecedented operational simplicity and effectively infinite scale. Along wit...

A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore

30 May 2022

Contributed by Lukas

Summary A large fraction of data engineering work involves moving data from one storage location to another in order to support different access and ...

Unlocking The Value Of Data Across The Organization Through User Friendly Data Tools With Prophecy

23 May 2022

Contributed by Lukas

Summary The interfaces and design cues that a tool offers can have a massive impact on who is able to use it and the tasks that they are able to perf...

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

23 May 2022

Contributed by Lukas

Summary Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the en...

Designing And Deploying IoT Analytics For Industrial Applications At Vopak

16 May 2022

Contributed by Lukas

Summary Industrial applications are one of the primary adopters of Internet of Things (IoT) technologies, with business critical operations being inf...

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

16 May 2022

Contributed by Lukas

Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform t...

Scaling Analysis of Connected Data And Modeling Complex Relationships With The TigerGraph Graph Database

09 May 2022

Contributed by Lukas

Summary Many of the events, ideas, and objects that we try to represent through data have a high degree of connectivity in the real world. These conn...

Exploring The Insights And Impact Of Dan Delorey's Distinguished Career In Data

09 May 2022

Contributed by Lukas

Summary Dan Delorey helped to build the core technologies of Google’s cloud data services for many years before embarking on his latest adventu...

Leading The Charge For The ELT Data Integration Pattern For Cloud Data Warehouses At Matillion

02 May 2022

Contributed by Lukas

Summary The predominant pattern for data integration in the cloud has become extract, load, and then transform or ELT. Matillion was an early innovat...

Evolving And Scaling The Data Platform at Yotpo

02 May 2022

Contributed by Lukas

Summary Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their...

Operational Analytics At Speed With Minimal Busy Work Using Incorta

24 Apr 2022

Contributed by Lukas

Summary A huge amount of effort goes into modeling and shaping data to make it available for analytical purposes. This is often due to the need to si...

Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs

24 Apr 2022

Contributed by Lukas

Summary There are very few tools which are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful ...

Connecting To The Next Frontier Of Computing With Quantum Networks

18 Apr 2022

Contributed by Lukas

Summary The next paradigm shift in computing is coming in the form of quantum technologies. Quantum processors have gained significant attention for t...

What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?

16 Apr 2022

Contributed by Lukas

Summary Putting machine learning models into production and keeping them there requires investing in well-managed systems to manage the full lifecycl...

DataOps As A Service For Your Data Integration Workflows With Rivery

11 Apr 2022

Contributed by Lukas

Summary Data engineering is a practice that is multi-faceted and requires integration with a large number of systems. This often means working across...

Synthetic Data As A Service For Simplifying Privacy Engineering With Gretel

10 Apr 2022

Contributed by Lukas

Summary Any time that you are storing data about people there are a number of privacy and security considerations that come with it. Privacy engineer...

Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder

03 Apr 2022

Contributed by Lukas

Summary The flexibility of software oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases i...

Repeatable Patterns For Designing Data Platforms And When To Customize Them

03 Apr 2022

Contributed by Lukas

Summary Building a data platform for your organization is a challenging undertaking. Building multiple data platforms for other organizations as a se...

Eliminate The Bottlenecks In Your Key/Value Storage With SpeeDB

27 Mar 2022

Contributed by Lukas

Summary At the foundational layer many databases and data processing engines rely on key/value storage for managing the layout of information on the ...

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

27 Mar 2022

Contributed by Lukas

Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The gr...

Exploring Incident Management Strategies For Data Teams

20 Mar 2022

Contributed by Lukas

Summary Data assets and the pipelines that create them have become critical production infrastructure for companies. This adds a requirement for reli...

Accelerate Your Embedded Analytics With Apache Pinot

20 Mar 2022

Contributed by Lukas

Summary Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user...

Accelerating Adoption Of The Modern Data Stack At 5X Data

14 Mar 2022

Contributed by Lukas

Summary The modern data stack is a constantly moving target, which makes it difficult to adopt without prior experience. In order to accelerate the ti...

Taking A Multidimensional Approach To Data Observability At Acceldata

14 Mar 2022

Contributed by Lukas

Summary Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it...

Move Your Database To The Data And Speed Up Your Analytics With DuckDB

05 Mar 2022

Contributed by Lukas

Summary When you think about selecting a database engine for your project you typically consider options focused on serving multiple concurrent users...

Developer Friendly Application Persistence That Is Fast And Scalable With HarperDB

05 Mar 2022

Contributed by Lukas

Summary Databases are an important component of application architectures, but they are often difficult to work with. HarperDB was created with the c...

Reflections On Designing A Data Platform From Scratch

28 Feb 2022

Contributed by Lukas

Summary Building a data platform is a complex journey that requires a significant amount of planning to do well. It requires knowledge of the availab...

Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise

28 Feb 2022

Contributed by Lukas

Summary There is a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across ...

Build Your Python Data Processing Your Way And Run It Anywhere With Fugue

21 Feb 2022

Contributed by Lukas

Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning....

Understanding The Immune System With Data At ImmunAI

21 Feb 2022

Contributed by Lukas

Summary The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make...

Bring Your Code To Your Streaming And Static Data Without Effort With The Deephaven Real Time Query Engine

14 Feb 2022

Contributed by Lukas

Summary Streaming data sources are becoming more widely available as tools to handle their storage and distribution mature. However, it is still a cha...

Build Your Own End To End Customer Data Platform With Rudderstack

14 Feb 2022

Contributed by Lukas

Summary Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even mor...

Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions

07 Feb 2022

Contributed by Lukas

Summary Along with globalization of our societies comes the need to analyze the geospatial and geotemporal data that is needed to manage the growth i...

Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets

06 Feb 2022

Contributed by Lukas

Summary There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, dep...

A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know

31 Jan 2022

Contributed by Lukas

Summary The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, co...

Effective Pandas Patterns For Data Engineering

31 Jan 2022

Contributed by Lukas

Summary Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has be...

The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam

23 Jan 2022

Contributed by Lukas

Summary Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to mak...

Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig

23 Jan 2022

Contributed by Lukas

Summary Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate th...

Automated Data Quality Management Through Machine Learning With Anomalo

15 Jan 2022

Contributed by Lukas

Summary Data quality control is a requirement for being able to trust the various reports and machine learning models that are relying on the informa...

An Introduction To Data And Analytics Engineering For Non-Programmers

15 Jan 2022

Contributed by Lukas

Summary Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Now...

Open Source Reverse ETL For Everyone With Grouparoo

08 Jan 2022

Contributed by Lukas

Summary Reverse ETL is a product category that evolved from the landscape of customer data platforms with a number of companies offering their own im...

Data Observability Out Of The Box With Metaplane

08 Jan 2022

Contributed by Lukas

Summary Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used s...

Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

02 Jan 2022

Contributed by Lukas

Summary Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means fo...

A Reflection On The Data Ecosystem For The Year 2021

02 Jan 2022

Contributed by Lukas

Summary This has been an active year for the data ecosystem, with a number of new product categories and substantial growth in existing areas. In an ...

Exploring The Evolving Role Of Data Engineers

27 Dec 2021

Contributed by Lukas

Summary Data Engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requi...

Revisiting The Technical And Social Benefits Of The Data Mesh

27 Dec 2021

Contributed by Lukas

Summary The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their an...

Fast And Flexible Headless Data Analytics With Cube.JS

21 Dec 2021

Contributed by Lukas

Summary One of the perennial challenges of data analytics is having a consistent set of definitions, along with a flexible and performant API endpoin...

Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

20 Dec 2021

Contributed by Lukas

Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of i...

Building Auditable Spark Pipelines At Capital One

13 Dec 2021

Contributed by Lukas

Summary Spark is a powerful and battle tested framework for building highly scalable data pipelines. Because of its proven ability to handle large vo...

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

12 Dec 2021

Contributed by Lukas

Summary The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites...

Data Driven Hiring For Data Professionals With Alooba

04 Dec 2021

Contributed by Lukas

Summary Hiring data professionals is challenging for a multitude of reasons, and as with every interview process there is a potential for bias to cre...

Experimentation and A/B Testing For Modern Data Teams With Eppo

04 Dec 2021

Contributed by Lukas

Summary A/B testing and experimentation are the most reliable way to determine whether a change to your product will have the desired effect on your ...

Creating A Unified Experience For The Modern Data Stack At Mozart Data

27 Nov 2021

Contributed by Lukas

Summary The modern data stack has been gaining a lot of attention recently with a rapidly growing set of managed services for different stages of the...

Doing DataOps For External Data Sources As A Service at Demyst

27 Nov 2021

Contributed by Lukas

Summary The data that you have access to affects the questions that you can answer. By using external data sources you can drastically increase the r...

Exploring Processing Patterns For Streaming Data Integration In Your Data Lake

20 Nov 2021

Contributed by Lukas

Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. With the improvements in streami...

Laying The Foundation Of Your Data Platform For The Era Of Big Complexity With Dagster

20 Nov 2021

Contributed by Lukas

Summary The technology for scaling storage and processing of data has gone through massive evolution over the past decade, leaving us with the abilit...

Data Quality Starts At The Source

14 Nov 2021

Contributed by Lukas

Summary The most important gauge of success for a data platform is the level of trust in the accuracy of the information that it provides. In order t...

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

10 Nov 2021

Contributed by Lukas

Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata acros...

Business Intelligence Beyond The Dashboard With ClicData

06 Nov 2021

Contributed by Lukas

Summary Business intelligence is often equated with a collection of dashboards that show various charts and graphs representing data for an organizat...

Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL

05 Nov 2021

Contributed by Lukas

Summary The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository...

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

29 Oct 2021

Contributed by Lukas

Summary The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as ...

Streaming Data Pipelines Made SQL With Decodable

29 Oct 2021

Contributed by Lukas

Summary Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build re...

Data Exploration For Business Users Powered By Analytics Engineering With Lightdash

23 Oct 2021

Contributed by Lukas

Summary The market for business intelligence has been going through an evolutionary shift in recent years. One of the driving forces for that change ...

Completing The Feedback Loop Of Data Through Operational Analytics With Census

21 Oct 2021

Contributed by Lukas

Summary The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there h...

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

16 Oct 2021

Contributed by Lukas

Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams ac...

How And Why To Become Data Driven As A Business

14 Oct 2021

Contributed by Lukas

Summary Organizations of all sizes are striving to become data driven, starting in earnest with the rise of big data a decade ago. With the never-end...

Make Your Business Metrics Reusable With Open Source Headless BI Using Metriql

08 Oct 2021

Contributed by Lukas

Summary The key to making data valuable to business users is the ability to calculate meaningful metrics and explore them along useful dimensions. Bu...

Adding Support For Distributed Transactions To The Redpanda Streaming Engine

06 Oct 2021

Contributed by Lukas

Summary Transactions are a necessary feature for ensuring that a set of actions are all performed as a single unit of work. In streaming systems this...

Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike

02 Oct 2021

Contributed by Lukas

Summary Aerospike is a database engine that is designed to provide millisecond response times for queries across terabytes or petabytes. In this epis...

Delivering Your Personal Data Cloud With Prifina

30 Sep 2021

Contributed by Lukas

Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they us...

Digging Into Data Reliability Engineering

26 Sep 2021

Contributed by Lukas

Summary The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of s...

Massively Parallel Data Processing In Python Without The Effort Using Bodo

25 Sep 2021

Contributed by Lukas

Summary Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and...

Declarative Machine Learning Without The Operational Overhead Using Continual

19 Sep 2021

Contributed by Lukas

Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating ...

An Exploration Of The Data Engineering Requirements For Bioinformatics

19 Sep 2021

Contributed by Lukas

Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has gr...

Setting The Stage For The Next Chapter Of The Cassandra Database

12 Sep 2021

Contributed by Lukas

Summary The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has b...

A View From The Round Table Of Gartner's Cool Vendors

09 Sep 2021

Contributed by Lukas

Summary Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For busi...

Designing And Building Data Platforms As A Product

04 Sep 2021

Contributed by Lukas

Summary The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your orga...

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

02 Sep 2021

Contributed by Lukas

Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the ...

Do Away With Data Integration Through A Dataware Architecture With Cinchy

28 Aug 2021

Contributed by Lukas

Summary The reason that so much time and energy is spent on data integration is how our applications are designed. By making the software ...

Decoupling Data Operations From Data Infrastructure Using Nexla

25 Aug 2021

Contributed by Lukas

Summary The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of thi...

Let Your Analysts Build A Data Lakehouse With Cuelake

21 Aug 2021

Contributed by Lukas

Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and ...

Migrate And Modify Your Data Platform Confidently With Compilerworks

18 Aug 2021

Contributed by Lukas

Summary A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the...

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

15 Aug 2021

Contributed by Lukas

Summary The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do w...

Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma

10 Aug 2021

Contributed by Lukas

Summary All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t have trust...

Data Discovery From Dashboards To Databases With Castor

07 Aug 2021

Contributed by Lukas

Summary Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread acro...

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

03 Aug 2021

Contributed by Lukas

Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With...

Adding Context And Comprehension To Your Analytics Through Data Discovery With SelectStar

31 Jul 2021

Contributed by Lukas

Summary Companies of all sizes and industries are trying to use the data that they and their customers generate to survive and thrive in the modern e...
