NYT Disables OpenAI Bot (www.theverge.com)

NYT looks like it's updated its robots.txt file to disallow the OpenAI bot from scraping its data. Pretty interested to see whether OpenAI just changes its user agent string or actually respects it.
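For context, robots.txt blocks a crawler by naming its user agent token (OpenAI's crawler identifies as GPTBot). A minimal sketch with Python's standard-library `urllib.robotparser` shows how such a rule is interpreted; the robots.txt content and URLs here are illustrative, not NYT's actual file:

```python
import urllib.robotparser

# Hypothetical robots.txt mirroring the kind of rule described in the post:
# disallow GPTBot site-wide, allow everyone else.
robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

A well-behaved crawler fetches the site's /robots.txt and runs exactly this kind of check before requesting any page; nothing forces it to.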

all 8 comments
[-] plz1@lemmy.world 36 points 10 months ago

Updating robots.txt doesn't matter unless NYT is actively blocking the bot's user agent, too. robots.txt on its own is purely a "gentleman's agreement" that OpenAI will respect it. That said, OpenAI would be dumb to ignore it, because doing so would invite lawyer shenanigans.
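The "actively blocking" the comment contrasts with robots.txt is server-side enforcement: rejecting requests whose User-Agent header matches a blocked crawler. A minimal sketch (an assumption for illustration, not NYT's actual setup, and trivially defeated if the crawler changes its user agent string):

```python
# Hypothetical server-side user-agent filter: reject requests whose
# User-Agent header contains a blocked crawler token.
BLOCKED_AGENTS = ("GPTBot",)

def is_blocked(user_agent: str) -> bool:
    """Return True if the request's User-Agent matches a blocked crawler."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENTS)

print(is_blocked("Mozilla/5.0; compatible; GPTBot/1.0"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0)"))        # False
```

Unlike robots.txt, this actually stops the requests, but only as long as the crawler keeps announcing itself honestly.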

[-] DocMcStuffin@lemmy.world 12 points 10 months ago

NYT is already considering a lawsuit against OpenAI. So, not just dumb but arrogantly stupid when the lawyers are already in the room.

[-] iforgotmyinstance@lemmy.world 9 points 10 months ago

The burden of proof will fall upon the NYT, and it will be extremely difficult to prove OpenAI is culpable for any infringement that its end users perform.

It's new territory and will be expensive, but NYT is old money and has the liquidity to burn cash all day.

[-] autotldr@lemmings.world 8 points 10 months ago

This is the best summary I could come up with:


Based on the Internet Archive’s Wayback Machine, it appears NYT blocked the crawler as early as August 17th.

The change comes after the NYT updated its terms of service at the beginning of this month to prohibit the use of its content to train AI models.

OpenAI didn’t immediately reply to a request for comment.

The NYT is also considering legal action against OpenAI for intellectual property rights violations, NPR reported last week.

If it did sue, the Times would be joining others like Sarah Silverman and two other authors who sued the company in July over its use of Books3, a dataset used to train ChatGPT that may have thousands of copyrighted works, as well as Matthew Butterick, a programmer and lawyer who alleges the company’s data scraping practices amount to software piracy.

Update August 21st, 7:55PM ET: The New York Times declined to comment.


The original article contains 202 words, the summary contains 146 words. Saved 28%. I'm a bot and I'm open source!

[-] cbarrick@lemmy.world 6 points 10 months ago

But all those reposts on Reddit and Lemmy are still fair game...

[-] simonced@lemmy.one 11 points 10 months ago

shared by humans is not the same as crawled by bots...

[-] WarmSoda@lemm.ee 4 points 10 months ago

I wonder how much of a boost sites get from Reddit and lemmy, etc. Even with posts that have the text copy/pasted I imagine it has to give them traffic.

this post was submitted on 23 Aug 2023
110 points (94.4% liked)

Technology
