Data Engineering Podcast

Technology Education

Episodes

Showing 301-400 of 494
«« ← Prev Page 4 of 5 Next → »»

Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer

09 Jun 2021

Contributed by Lukas

Summary The way to build maintainable software and systems is through composition of individual pieces. By making those pieces high quality and flexi...

Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook

03 Jun 2021

Contributed by Lukas

Summary SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky ...

Making Data Pipelines Self-Serve For Everyone With Shipyard

02 Jun 2021

Contributed by Lukas

Summary Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipel...

Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse

28 May 2021

Contributed by Lukas

Summary The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of...

Easily Build Advanced Similarity Search With The Pinecone Vector Database

25 May 2021

Contributed by Lukas

Summary Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the mode...

A Holistic Approach To Data Governance Through Self Reflection At Collibra

21 May 2021

Contributed by Lukas

Summary Data governance is a phrase that means many different things to many different people. This is because it is actually a concept that encompas...

Unlocking The Power of Data Lineage In Your Platform with OpenLineage

18 May 2021

Contributed by Lukas

Summary Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understan...

Building Your Data Warehouse On Top Of PostgreSQL

14 May 2021

Contributed by Lukas

Summary There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require ...

Making Analytical APIs Fast With Tinybird

11 May 2021

Contributed by Lukas

Summary Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full-time job. The team at Tinybird wa...

Making Spark Cloud Native At Data Mechanics

07 May 2021

Contributed by Lukas

Summary Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of i...

The Grand Vision And Present Reality of DataOps

04 May 2021

Contributed by Lukas

Summary The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOp...

Self Service Data Exploration And Dashboarding With Superset

27 Apr 2021

Contributed by Lukas

Summary The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used...

Moving Machine Learning Into The Data Pipeline at Cherre

20 Apr 2021

Contributed by Lukas

Summary Most of the time when you think about a data pipeline or ETL job what comes to mind is a purely mechanistic progression of functions that mov...

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

13 Apr 2021

Contributed by Lukas

Summary "Business as usual" is changing, with more companies investing in data as a first-class concern. As a result, the data team is grow...

Put Your Whole Data Team On The Same Page With Atlan

06 Apr 2021

Contributed by Lukas

Summary One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the...

Data Quality Management For The Whole Team With Soda Data

30 Mar 2021

Contributed by Lukas

Summary Data quality has been top of mind for everyone recently, but getting it right is as challenging as ever. One of the contributing factors...

Real World Change Data Capture At Datacoral

23 Mar 2021

Contributed by Lukas

Summary The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only w...

Managing The DoorDash Data Platform

16 Mar 2021

Contributed by Lukas

Summary The team at DoorDash has a complex set of optimization challenges to deal with using data that they collect from a multi-sided marketplace. I...

Leave Your Data Where It Is And Automate Feature Extraction With Molecula

09 Mar 2021

Contributed by Lukas

Summary A majority of the time spent in data engineering is copying data between systems to make the information available for different purposes. Th...

Bridging The Gap Between Machine Learning And Operations At Iguazio

02 Mar 2021

Contributed by Lukas

Summary The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. ...

Self Service Open Source Data Integration With AirByte

23 Feb 2021

Contributed by Lukas

Summary Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed p...

Building The Foundations For Data Driven Businesses at 5xData

16 Feb 2021

Contributed by Lukas

Summary Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data...

How Shopify Is Building Their Production Data Warehouse Using DBT

09 Feb 2021

Contributed by Lukas

Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of t...

System Observability For The Cloud Native Era With Chronosphere

02 Feb 2021

Contributed by Lukas

Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or b...

Making It Easier To Stick B2B Data Integration Pipelines Together With Hotglue

26 Jan 2021

Contributed by Lukas

Summary Businesses often need to be able to ingest data from their customers in order to power the services that they provide. For each new source th...

Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch

19 Jan 2021

Contributed by Lukas

Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a ...

Enabling Version Controlled Data Collaboration With TerminusDB

11 Jan 2021

Contributed by Lukas

Summary As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating o...

Bringing Feature Stores and MLOps to the Enterprise at Tecton

05 Jan 2021

Contributed by Lukas

Summary As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is ...

Off The Shelf Data Governance With Satori

28 Dec 2020

Contributed by Lukas

Summary One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a b...

Low Friction Data Governance With Immuta

21 Dec 2020

Contributed by Lukas

Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex asp...

Building A Self Service Data Platform For Alternative Data Analytics At YipitData

15 Dec 2020

Contributed by Lukas

Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At Yipit...

Proven Patterns For Building Successful Data Teams

07 Dec 2020

Contributed by Lukas

Summary Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is a...

Streaming Data Integration Without The Code at Equalum

30 Nov 2020

Contributed by Lukas

Summary The first stage of every good pipeline is to perform data integration. With the increasing pace of change and the need for up-to-date analyti...

Keeping A Bigeye On The Data Quality Market

23 Nov 2020

Contributed by Lukas

Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no ...

Self Service Data Management From Ingest To Insights With Isima

17 Nov 2020

Contributed by Lukas

Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form...

Building A Cost Effective Data Catalog With Tree Schema

10 Nov 2020

Contributed by Lukas

Summary A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external...

Add Version Control To Your Data Lake With LakeFS

03 Nov 2020

Contributed by Lukas

Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional com...

Cloud Native Data Security As Code With Cyral

26 Oct 2020

Contributed by Lukas

Summary One of the most challenging aspects of building a data platform has nothing to do with pipelines and transformations. If you are putting your...

Better Data Quality Through Observability With Monte Carlo

19 Oct 2020

Contributed by Lukas

Summary In order for analytics and machine learning projects to be useful, they require a high degree of data quality. To ensure that your pipelines ...

Rapid Delivery Of Business Intelligence Using Power BI

12 Oct 2020

Contributed by Lukas

Summary Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go...

Self Service Real Time Data Integration Without The Headaches With Meroxa

05 Oct 2020

Contributed by Lukas

Summary Analytical workloads require a well engineered and well maintained data integration process to ensure that your information is reliable and u...

Speed Up And Simplify Your Streaming Data Workloads With Red Panda

29 Sep 2020

Contributed by Lukas

Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popular...

Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor

22 Sep 2020

Contributed by Lukas

Summary Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which le...

Distributed In Memory Processing And Streaming With Hazelcast

15 Sep 2020

Contributed by Lukas

Summary In-memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is...

Simplify Your Data Architecture With The Presto Distributed SQL Engine

07 Sep 2020

Contributed by Lukas

Summary Databases are limited in scope to the information that they directly contain. For analytical use cases you often want to combine data across ...

Building A Better Data Warehouse For The Cloud At Firebolt

01 Sep 2020

Contributed by Lukas

Summary Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in da...

Metadata Management And Integration At LinkedIn With DataHub

25 Aug 2020

Contributed by Lukas

Summary In order to scale the use of data across an organization there are a number of challenges related to discovery, governance, and integration t...

Exploring The TileDB Universal Data Engine

17 Aug 2020

Contributed by Lukas

Summary Most databases are designed to work with textual data, with some special purpose engines that support domain specific formats. TileDB is a da...

Closing The Loop On Event Data Collection With Iteratively

10 Aug 2020

Contributed by Lukas

Summary Event based data is a rich source of information for analytics, unless none of the event structures are consistent. The team at Iteratively a...

A Practical Introduction To Graph Data Applications

04 Aug 2020

Contributed by Lukas

Summary Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on t...

Build More Reliable Distributed Systems By Breaking Them With Jepsen

28 Jul 2020

Contributed by Lukas

Summary A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of s...

Making Wind Energy More Efficient With Data At Turbit Systems

21 Jul 2020

Contributed by Lukas

Summary Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overa...

Open Source Production Grade Data Integration With Meltano

13 Jul 2020

Contributed by Lukas

Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data i...

DataOps For Streaming Systems With Lenses.io

06 Jul 2020

Contributed by Lukas

Summary There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a strea...

Data Collection And Management To Power Sound Recognition At Audio Analytic

30 Jun 2020

Contributed by Lukas

Summary We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environme...

Bringing Business Analytics To End Users With GoodData

23 Jun 2020

Contributed by Lukas

Summary The majority of analytics platforms are focused on internal use by business stakeholders within an organization. As the availability of data incr...

Accelerate Your Machine Learning With The StreamSQL Feature Store

15 Jun 2020

Contributed by Lukas

Summary Machine learning is a process driven by iteration and experimentation which requires fast and easy access to relevant features of the data be...

Data Management Trends From An Investor Perspective

08 Jun 2020

Contributed by Lukas

Summary The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained ...

Building A Data Lake For The Database Administrator At Upsolver

02 Jun 2020

Contributed by Lukas

Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of c...

Mapping The Customer Journey For B2B Companies At Dreamdata

25 May 2020

Contributed by Lukas

Summary Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals ...

Power Up Your PostgreSQL Analytics With Swarm64

18 May 2020

Contributed by Lukas

Summary The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choi...

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

11 May 2020

Contributed by Lukas

Summary There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different are...

Enterprise Data Operations And Orchestration At Infoworks

04 May 2020

Contributed by Lukas

Summary Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a p...

Taming Complexity In Your Data Driven Organization With DataOps

28 Apr 2020

Contributed by Lukas

Summary Data is a critical element to every role in an organization, which is also what makes managing it so challenging. With so many different opin...

Building Real Time Applications On Streaming Data With Eventador

20 Apr 2020

Contributed by Lukas

Summary Modern applications frequently require access to real-time data, but building and maintaining the systems that make that possible is a comple...

Making Data Collection In Your Code Easy With Rookout

14 Apr 2020

Contributed by Lukas

Summary The software applications that we build for our businesses are a rich source of data, but accessing and extracting that data is often a slow ...

Building A Knowledge Graph Of Commercial Real Estate At Cherre

07 Apr 2020

Contributed by Lukas

Summary Knowledge graphs are a data resource that can answer questions beyond the scope of traditional data analytics. By organizing and storing data...

The Life Of A Non-Profit Data Professional

30 Mar 2020

Contributed by Lukas

Summary Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a sh...

Behind The Scenes Of The Linode Object Storage Service

23 Mar 2020

Contributed by Lukas

Summary There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes...

Building A New Foundation For CouchDB

17 Mar 2020

Contributed by Lukas

Summary CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interfa...

Scaling Data Governance For Global Businesses With A Data Hub Architecture

09 Mar 2020

Contributed by Lukas

Summary Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well con...

Easier Stream Processing On Kafka With ksqlDB

02 Mar 2020

Contributed by Lukas

Summary Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems th...

Shining A Light on Shadow IT In Data And Analytics

25 Feb 2020

Contributed by Lukas

Summary Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects ...

Data Infrastructure Automation For Private SaaS At Snowplow

18 Feb 2020

Contributed by Lukas

Summary One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Sno...

Data Modeling That Evolves With Your Business Using Data Vault

09 Feb 2020

Contributed by Lukas

Summary Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and...

The Benefits And Challenges Of Building A Data Trust

03 Feb 2020

Contributed by Lukas

Summary Every business collects data in some fashion, but sometimes the true value of the collected information only comes when it is combined with o...

Pay Down Technical Debt In Your Data Pipeline With Great Expectations

27 Jan 2020

Contributed by Lukas

Summary Data pipelines are complicated and business critical pieces of technical infrastructure. Unfortunately they are also complex and difficult to...

Replatforming Production Dataflows

20 Jan 2020

Contributed by Lukas

Summary Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business there can be unexpec...

Planet Scale SQL For The New Generation Of Applications With YugabyteDB

13 Jan 2020

Contributed by Lukas

Summary The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of depl...

Change Data Capture For All Of Your Databases With Debezium

06 Jan 2020

Contributed by Lukas

Summary Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a wa...

Building The DataDog Platform For Processing Timeseries Data At Massive Scale

30 Dec 2019

Contributed by Lukas

Summary DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to supp...

Building The Materialize Engine For Interactive Streaming Analytics In SQL

23 Dec 2019

Contributed by Lukas

Summary Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of reco...

Solving Data Lineage Tracking And Data Discovery At WeWork

16 Dec 2019

Contributed by Lukas

Summary Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and ...

SnowflakeDB: The Data Warehouse Built For The Cloud

09 Dec 2019

Contributed by Lukas

Summary Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage e...

Organizing And Empowering Data Engineers At Citadel

03 Dec 2019

Contributed by Lukas

Summary The financial industry has long been driven by data, requiring a mature and robust capacity for discovering and integrating valuable sources ...

Building A Real Time Event Data Warehouse For Sentry

26 Nov 2019

Contributed by Lukas

Summary The team at Sentry has built a platform for anyone in the world to send software errors and events. As they scaled the volume of customers an...

Escaping Analysis Paralysis For Your Data Platform With Data Virtualization

18 Nov 2019

Contributed by Lukas

Summary With the constant evolution of technology for data management it can seem impossible to make an informed decision about whether to build a da...

Designing For Data Protection

11 Nov 2019

Contributed by Lukas

Summary The practice of data management is one that requires technical acumen, but there are also many policy and regulatory issues that inform and i...

Automating Your Production Dataflows On Spark

04 Nov 2019

Contributed by Lukas

Summary As data engineers the health of our pipelines is our highest priority. Unfortunately, there are countless ways that our dataflows can break o...

Build Maintainable And Testable Data Applications With Dagster

28 Oct 2019

Contributed by Lukas

Summary Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and m...

Data Orchestration For Hybrid Cloud Analytics

22 Oct 2019

Contributed by Lukas

Summary The scale and complexity of the systems that we build to satisfy business requirements are increasing as the available tools become more sophi...

Keeping Your Data Warehouse In Order With DataForm

15 Oct 2019

Contributed by Lukas

Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps ...

Fast Analytics On Semi-Structured And Structured Data In The Cloud

08 Oct 2019

Contributed by Lukas

Summary The process of exposing your data through a SQL interface has many possible pathways, each with their own complications and tradeoffs. One of...

Ship Faster With An Opinionated Data Pipeline Framework

01 Oct 2019

Contributed by Lukas

Summary Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways that yo...

Open Source Object Storage For All Of Your Data

23 Sep 2019

Contributed by Lukas

Summary Object storage is quickly becoming the unifying layer for data intensive applications and analytics. Modern, cloud oriented data warehouses a...

Navigating Boundless Data Streams With The Swim Kernel

18 Sep 2019

Contributed by Lukas

Summary The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysi...

Building A Reliable And Performant Router For Observability Data

10 Sep 2019

Contributed by Lukas

Summary The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data t...

Building A Community For Data Professionals at Data Council

02 Sep 2019

Contributed by Lukas

Summary Data professionals are working in a domain that is rapidly evolving. In order to stay current we need access to deeply technical presentation...

Building Tools And Platforms For Data Analytics

26 Aug 2019

Contributed by Lukas

Summary Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users ...

A High Performance Platform For The Full Big Data Lifecycle

19 Aug 2019

Contributed by Lukas

Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of...
