Two Voice Devs

Episode 219 - The Ethics of Data Scraping and LLMs

12 Dec 2024

Description

Join Mark and Allen on Two Voice Devs this week as they dig into data scraping, large language models (LLMs), and the ethical responsibilities of developers. The episode moves from the recent controversy over BlueSky data scraping and Hugging Face datasets to the complexities of copyright law and personal privacy in the age of AI, and explores the gray areas and tough questions developers face today. Hear their perspectives on the potential misuse of publicly available data, the challenges of anonymization, and the importance of upholding ethical standards in a rapidly evolving field. They also share personal anecdotes about navigating privacy policies and the dilemma of collecting data for business versus personal use. Tune in for their insights, and add your voice to the conversation about responsible development practices.

[00:00:00] Introduction
[00:01:04] Mark's deep dive into BlueSky's architecture and the data scraping controversy.
[00:02:27] Discussion of BlueSky's data policy and user ownership.
[00:05:32] Copyright implications of using scraped data in LLMs.
[00:06:22] Exploring ethical data sources for LLM training (Wikipedia, Reddit, etc.).
[00:08:31] Real-world examples of potential copyright infringement in image and video generation.
[00:09:34] Hugging Face's guidelines and the removal of the BlueSky dataset.
[00:12:19] The curious case of the "David Mayer" bug in ChatGPT and its implications for data privacy.
[00:14:24] Allen's personal dilemma with Vodo Drive's privacy policy and data collection for model training.
[00:16:50] Balancing business needs with ethical data practices.
[00:17:00] Allen's challenge gathering Gemini release notes, and his ethical solution.
[00:19:20] The ethical responsibilities of software engineers, drawing parallels to the Challenger disaster.
[00:21:19] The developer's role in advocating for ethical data usage.
[00:22:21] Call to action: Share your thoughts and perspectives!

#DataScraping #LLMs #AIethics #DeveloperEthics #Privacy #Copyright #BlueSky #HuggingFace #SoftwareEngineering #DataPrivacy #AI #TwoVoiceDevs #Podcast #TechPodcast #WebSockets #DataScience #EthicalAI #ResponsibleAI #TechEthics #Gemini #GoogleAI