Join Mark and Allen on Two Voice Devs this week as they delve into a critical discussion about data scraping, large language models (LLMs), and the ethical responsibilities of developers. From the recent controversy surrounding BlueSky data scraping and Hugging Face datasets to the complexities of copyright law and personal privacy in the age of AI, this episode explores the gray areas and tough questions facing developers today. Hear their perspectives on the potential misuse of publicly available data, the challenges of anonymization, and the importance of upholding ethical standards in a rapidly evolving technological landscape. They also share personal anecdotes about navigating privacy policies and the dilemmas of data collection for business versus personal use. Tune in to gain valuable insights and contribute to the conversation about responsible development practices.

[00:00:00] Introduction
[00:01:04] Mark's deep dive into BlueSky's architecture and the data scraping controversy.
[00:02:27] Discussion on BlueSky's data policy and user ownership.
[00:05:32] Copyright implications of using scraped data in LLMs.
[00:06:22] Exploring ethical data sources for LLM training (Wikipedia, Reddit, etc.).
[00:08:31] Real-world examples of potential copyright infringement in image and video generation.
[00:09:34] Hugging Face's guidelines and the removal of the BlueSky dataset.
[00:12:19] The curious case of the "David Meyer" bug in ChatGPT and its implications for data privacy.
[00:14:24] Allen's personal dilemma with Vodo Drive's privacy policy and data collection for model training.
[00:16:50] Balancing business needs with ethical data practices.
[00:17:00] Allen's challenge gathering Gemini release notes and his ethical solution.
[00:19:20] The ethical responsibilities of software engineers, drawing parallels to the Challenger disaster.
[00:21:19] The developer's role in advocating for ethical data usage.
[00:22:21] Call to action: Share your thoughts and perspectives!

#DataScraping #LLMs #AIethics #DeveloperEthics #Privacy #Copyright #BlueSky #HuggingFace #SoftwareEngineering #DataPrivacy #AI #TwoVoiceDevs #Podcast #TechPodcast #WebSockets #DataScience #EthicalAI #ResponsibleAI #TechEthics #Gemini #GoogleAI