Data Engineering Podcast

Technology Education

Episodes

Showing 201-300 of 494
Page 3 of 5

A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore

30 May 2022

Contributed by Lukas

Summary A large fraction of data engineering work involves moving data from one storage location to another in order to support different access and ...

Unlocking The Value Of Data Across The Organization Through User Friendly Data Tools With Prophecy

23 May 2022

Contributed by Lukas

Summary The interfaces and design cues that a tool offers can have a massive impact on who is able to use it and the tasks that they are able to perf...

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

23 May 2022

Contributed by Lukas

Summary Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the en...

Designing And Deploying IoT Analytics For Industrial Applications At Vopak

16 May 2022

Contributed by Lukas

Summary Industrial applications are one of the primary adopters of Internet of Things (IoT) technologies, with business critical operations being inf...

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

16 May 2022

Contributed by Lukas

Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform t...

Scaling Analysis of Connected Data And Modeling Complex Relationships With The TigerGraph Graph Database

09 May 2022

Contributed by Lukas

Summary Many of the events, ideas, and objects that we try to represent through data have a high degree of connectivity in the real world. These conn...

Exploring The Insights And Impact Of Dan Delorey's Distinguished Career In Data

09 May 2022

Contributed by Lukas

Summary Dan Delorey helped to build the core technologies of Google’s cloud data services for many years before embarking on his latest adventu...

Leading The Charge For The ELT Data Integration Pattern For Cloud Data Warehouses At Matillion

02 May 2022

Contributed by Lukas

Summary The predominant pattern for data integration in the cloud has become extract, load, and then transform or ELT. Matillion was an early innovat...

Evolving And Scaling The Data Platform at Yotpo

02 May 2022

Contributed by Lukas

Summary Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their...

Operational Analytics At Speed With Minimal Busy Work Using Incorta

24 Apr 2022

Contributed by Lukas

Summary A huge amount of effort goes into modeling and shaping data to make it available for analytical purposes. This is often due to the need to si...

Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs

24 Apr 2022

Contributed by Lukas

Summary There are very few tools which are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful ...

Connecting To The Next Frontier Of Computing With Quantum Networks

18 Apr 2022

Contributed by Lukas

Summary The next paradigm shift in computing is coming in the form of quantum technologies. Quantum processors have gained significant attention for t...

What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?

16 Apr 2022

Contributed by Lukas

Summary Putting machine learning models into production and keeping them there requires investing in well-managed systems to manage the full lifecycl...

DataOps As A Service For Your Data Integration Workflows With Rivery

11 Apr 2022

Contributed by Lukas

Summary Data engineering is a practice that is multi-faceted and requires integration with a large number of systems. This often means working across...

Synthetic Data As A Service For Simplifying Privacy Engineering With Gretel

10 Apr 2022

Contributed by Lukas

Summary Any time that you are storing data about people there are a number of privacy and security considerations that come with it. Privacy engineer...

Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder

03 Apr 2022

Contributed by Lukas

Summary The flexibility of software oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases i...

Repeatable Patterns For Designing Data Platforms And When To Customize Them

03 Apr 2022

Contributed by Lukas

Summary Building a data platform for your organization is a challenging undertaking. Building multiple data platforms for other organizations as a se...

Eliminate The Bottlenecks In Your Key/Value Storage With SpeeDB

27 Mar 2022

Contributed by Lukas

Summary At the foundational layer many databases and data processing engines rely on key/value storage for managing the layout of information on the ...

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

27 Mar 2022

Contributed by Lukas

Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The gr...

Exploring Incident Management Strategies For Data Teams

20 Mar 2022

Contributed by Lukas

Summary Data assets and the pipelines that create them have become critical production infrastructure for companies. This adds a requirement for reli...

Accelerate Your Embedded Analytics With Apache Pinot

20 Mar 2022

Contributed by Lukas

Summary Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user...

Accelerating Adoption Of The Modern Data Stack At 5X Data

14 Mar 2022

Contributed by Lukas

Summary The modern data stack is a constantly moving target which makes it difficult to adopt without prior experience. In order to accelerate the ti...

Taking A Multidimensional Approach To Data Observability At Acceldata

14 Mar 2022

Contributed by Lukas

Summary Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it...

Move Your Database To The Data And Speed Up Your Analytics With DuckDB

05 Mar 2022

Contributed by Lukas

Summary When you think about selecting a database engine for your project you typically consider options focused on serving multiple concurrent users...

Developer Friendly Application Persistence That Is Fast And Scalable With HarperDB

05 Mar 2022

Contributed by Lukas

Summary Databases are an important component of application architectures, but they are often difficult to work with. HarperDB was created with the c...

Reflections On Designing A Data Platform From Scratch

28 Feb 2022

Contributed by Lukas

Summary Building a data platform is a complex journey that requires a significant amount of planning to do well. It requires knowledge of the availab...

Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise

28 Feb 2022

Contributed by Lukas

Summary There are a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across ...

Build Your Python Data Processing Your Way And Run It Anywhere With Fugue

21 Feb 2022

Contributed by Lukas

Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning....

Understanding The Immune System With Data At ImmunAI

21 Feb 2022

Contributed by Lukas

Summary The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make...

Bring Your Code To Your Streaming And Static Data Without Effort With The Deephaven Real Time Query Engine

14 Feb 2022

Contributed by Lukas

Summary Streaming data sources are becoming more widely available as tools to handle their storage and distribution mature. However it is still a cha...

Build Your Own End To End Customer Data Platform With Rudderstack

14 Feb 2022

Contributed by Lukas

Summary Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even mor...

Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions

07 Feb 2022

Contributed by Lukas

Summary Along with globalization of our societies comes the need to analyze the geospatial and geotemporal data that is needed to manage the growth i...

Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets

06 Feb 2022

Contributed by Lukas

Summary There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, dep...

A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know

31 Jan 2022

Contributed by Lukas

Summary The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, co...

Effective Pandas Patterns For Data Engineering

31 Jan 2022

Contributed by Lukas

Summary Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has be...

The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam

23 Jan 2022

Contributed by Lukas

Summary Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to mak...

Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig

23 Jan 2022

Contributed by Lukas

Summary Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate th...

Automated Data Quality Management Through Machine Learning With Anomalo

15 Jan 2022

Contributed by Lukas

Summary Data quality control is a requirement for being able to trust the various reports and machine learning models that are relying on the informa...

An Introduction To Data And Analytics Engineering For Non-Programmers

15 Jan 2022

Contributed by Lukas

Summary Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Now...

Open Source Reverse ETL For Everyone With Grouparoo

08 Jan 2022

Contributed by Lukas

Summary Reverse ETL is a product category that evolved from the landscape of customer data platforms with a number of companies offering their own im...

Data Observability Out Of The Box With Metaplane

08 Jan 2022

Contributed by Lukas

Summary Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used s...

Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

02 Jan 2022

Contributed by Lukas

Summary Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means fo...

A Reflection On The Data Ecosystem For The Year 2021

02 Jan 2022

Contributed by Lukas

Summary This has been an active year for the data ecosystem, with a number of new product categories and substantial growth in existing areas. In an ...

Exploring The Evolving Role Of Data Engineers

27 Dec 2021

Contributed by Lukas

Summary Data Engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requi...

Revisiting The Technical And Social Benefits Of The Data Mesh

27 Dec 2021

Contributed by Lukas

Summary The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their an...

Fast And Flexible Headless Data Analytics With Cube.JS

21 Dec 2021

Contributed by Lukas

Summary One of the perennial challenges of data analytics is having a consistent set of definitions, along with a flexible and performant API endpoin...

Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

20 Dec 2021

Contributed by Lukas

Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of i...

Building Auditable Spark Pipelines At Capital One

13 Dec 2021

Contributed by Lukas

Summary Spark is a powerful and battle tested framework for building highly scalable data pipelines. Because of its proven ability to handle large vo...

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

12 Dec 2021

Contributed by Lukas

Summary The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites...

Data Driven Hiring For Data Professionals With Alooba

04 Dec 2021

Contributed by Lukas

Summary Hiring data professionals is challenging for a multitude of reasons, and as with every interview process there is a potential for bias to cre...

Experimentation and A/B Testing For Modern Data Teams With Eppo

04 Dec 2021

Contributed by Lukas

Summary A/B testing and experimentation are the most reliable way to determine whether a change to your product will have the desired effect on your ...

Creating A Unified Experience For The Modern Data Stack At Mozart Data

27 Nov 2021

Contributed by Lukas

Summary The modern data stack has been gaining a lot of attention recently with a rapidly growing set of managed services for different stages of the...

Doing DataOps For External Data Sources As A Service at Demyst

27 Nov 2021

Contributed by Lukas

Summary The data that you have access to affects the questions that you can answer. By using external data sources you can drastically increase the r...

Exploring Processing Patterns For Streaming Data Integration In Your Data Lake

20 Nov 2021

Contributed by Lukas

Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. With the improvements in streami...

Laying The Foundation Of Your Data Platform For The Era Of Big Complexity With Dagster

20 Nov 2021

Contributed by Lukas

Summary The technology for scaling storage and processing of data has gone through massive evolution over the past decade, leaving us with the abilit...

Data Quality Starts At The Source

14 Nov 2021

Contributed by Lukas

Summary The most important gauge of success for a data platform is the level of trust in the accuracy of the information that it provides. In order t...

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

10 Nov 2021

Contributed by Lukas

Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata acros...

Business Intelligence Beyond The Dashboard With ClicData

06 Nov 2021

Contributed by Lukas

Summary Business intelligence is often equated with a collection of dashboards that show various charts and graphs representing data for an organizat...

Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL

05 Nov 2021

Contributed by Lukas

Summary The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository...

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

29 Oct 2021

Contributed by Lukas

Summary The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as ...

Streaming Data Pipelines Made SQL With Decodable

29 Oct 2021

Contributed by Lukas

Summary Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build re...

Data Exploration For Business Users Powered By Analytics Engineering With Lightdash

23 Oct 2021

Contributed by Lukas

Summary The market for business intelligence has been going through an evolutionary shift in recent years. One of the driving forces for that change ...

Completing The Feedback Loop Of Data Through Operational Analytics With Census

21 Oct 2021

Contributed by Lukas

Summary The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there h...

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

16 Oct 2021

Contributed by Lukas

Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams ac...

How And Why To Become Data Driven As A Business

14 Oct 2021

Contributed by Lukas

Summary Organizations of all sizes are striving to become data driven, starting in earnest with the rise of big data a decade ago. With the never-end...

Make Your Business Metrics Reusable With Open Source Headless BI Using Metriql

08 Oct 2021

Contributed by Lukas

Summary The key to making data valuable to business users is the ability to calculate meaningful metrics and explore them along useful dimensions. Bu...

Adding Support For Distributed Transactions To The Redpanda Streaming Engine

06 Oct 2021

Contributed by Lukas

Summary Transactions are a necessary feature for ensuring that a set of actions are all performed as a single unit of work. In streaming systems this...

Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike

02 Oct 2021

Contributed by Lukas

Summary Aerospike is a database engine that is designed to provide millisecond response times for queries across terabytes or petabytes. In this epis...

Delivering Your Personal Data Cloud With Prifina

30 Sep 2021

Contributed by Lukas

Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they us...

Digging Into Data Reliability Engineering

26 Sep 2021

Contributed by Lukas

Summary The accuracy and availability of data have become critically important to the day-to-day operation of businesses. Similar to the practice of s...

Massively Parallel Data Processing In Python Without The Effort Using Bodo

25 Sep 2021

Contributed by Lukas

Summary Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and...

Declarative Machine Learning Without The Operational Overhead Using Continual

19 Sep 2021

Contributed by Lukas

Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating ...

An Exploration Of The Data Engineering Requirements For Bioinformatics

19 Sep 2021

Contributed by Lukas

Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has gr...

Setting The Stage For The Next Chapter Of The Cassandra Database

12 Sep 2021

Contributed by Lukas

Summary The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has b...

A View From The Round Table Of Gartner's Cool Vendors

09 Sep 2021

Contributed by Lukas

Summary Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For busi...

Designing And Building Data Platforms As A Product

04 Sep 2021

Contributed by Lukas

Summary The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your orga...

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

02 Sep 2021

Contributed by Lukas

Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the ...

Do Away With Data Integration Through A Dataware Architecture With Cinchy

28 Aug 2021

Contributed by Lukas

Summary The reason that so much time and energy is spent on data integration is because of how our applications are designed. By making the software ...

Decoupling Data Operations From Data Infrastructure Using Nexla

25 Aug 2021

Contributed by Lukas

Summary The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of thi...

Let Your Analysts Build A Data Lakehouse With Cuelake

21 Aug 2021

Contributed by Lukas

Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and ...

Migrate And Modify Your Data Platform Confidently With Compilerworks

18 Aug 2021

Contributed by Lukas

Summary A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the...

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

15 Aug 2021

Contributed by Lukas

Summary The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do w...

Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma

10 Aug 2021

Contributed by Lukas

Summary All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t have trust...

Data Discovery From Dashboards To Databases With Castor

07 Aug 2021

Contributed by Lukas

Summary Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread acro...

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

03 Aug 2021

Contributed by Lukas

Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With...

Adding Context And Comprehension To Your Analytics Through Data Discovery With SelectStar

31 Jul 2021

Contributed by Lukas

Summary Companies of all sizes and industries are trying to use the data that they and their customers generate to survive and thrive in the modern e...

Building a Multi-Tenant Managed Platform For Streaming Data With Pulsar at Datastax

28 Jul 2021

Contributed by Lukas

Summary Everyone expects data to be transmitted, processed, and updated instantly as more and more products integrate streaming data. The technology ...

Bringing The Metrics Layer To The Masses With Transform

23 Jul 2021

Contributed by Lukas

Summary Collecting and cleaning data is only useful if someone can make sense of it afterward. The latest evolution in the data ecosystem is the intr...

Strategies For Proactive Data Quality Management

20 Jul 2021

Contributed by Lukas

Summary Data quality is a concern that has been gaining attention alongside the rising importance of analytics for business success. Many solutions r...

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

16 Jul 2021

Contributed by Lukas

Summary There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is s...

Exploring The Design And Benefits Of The Modern Data Stack

13 Jul 2021

Contributed by Lukas

Summary We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there ...

Democratize Data Cleaning Across Your Organization With Trifacta

09 Jul 2021

Contributed by Lukas

Summary Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and...

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

05 Jul 2021

Contributed by Lukas

Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a lar...

Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK

03 Jul 2021

Contributed by Lukas

Summary Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and...

A Candid Exploration Of Timeseries Data Analysis With InfluxDB

29 Jun 2021

Contributed by Lukas

Summary While the overall concept of timeseries data is uniform, its usage and applications are far from it. One of the most demanding applications o...

Lessons Learned From The Pipeline Data Engineering Academy

26 Jun 2021

Contributed by Lukas

Summary Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that,...

Make Database Performance Optimization A Playful Experience With OtterTune

23 Jun 2021

Contributed by Lukas

Summary The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the d...

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

18 Jun 2021

Contributed by Lukas

Summary Working with unstructured data has typically been a motivation for a data lake. The challenge is imposing enough order on the platform to mak...

Accelerating ML Training And Delivery With In-Database Machine Learning

15 Jun 2021

Contributed by Lukas

Summary When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object stora...

Taking A Tour Of The Google Cloud Platform For Data And Analytics

12 Jun 2021

Contributed by Lukas

Summary Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies t...
