Billions of public Discord messages may be sold through a scraping service - Article

edited April 18 in General

"Joseph Cox at 404 Media confirmed that Spy Pet, a service that sells access to a database of purportedly 3 billion Discord messages, offers data "credits" to customers who pay in bitcoin, ethereum, or other cryptocurrency. Searching individual users will reveal the servers that Spy Pet can track them across, a raw and exportable table of their messages, and connected accounts, such as GitHub. Ominously, Spy Pet lists more than 86,000 other servers in which it has "no bots," but "we know it exists."

"As Cox notes, Discord doesn't make messages inside server channels, like blog posts or unlocked social media feeds, easy to publicly access and search. But many Discord users may not expect their messages, server memberships, bans, or other data to be grabbed by a bot, compiled, and sold to anybody wishing to pin them all on a particular user. 404 Media confirmed the service's function with multiple user examples. Private messages are not mentioned by Spy Pet and are presumably still secure."

"Spy Pet openly asks those training AI models, or "federal agents looking for a new source of intel," to contact them for deals. As noted by 404 Media and confirmed by Ars, clicking on the "Request Removal" link plays a clip of J. Jonah Jameson from Spider-Man (the Tobey Maguire/Sam Raimi version) laughing at the idea of advance payment before an abrupt "You're serious?" Users of Spy Pet, however, are assured of "secure and confidential" searches, with random usernames."

More:
https://arstechnica.com/?p=2017957

Comments

  • somik OG
    edited April 18

    One simple rule: If it's on the internet, it's going to be exploited.

    Anything you don't want going public or being sold should not be put on the internet. Facebook, insta, tiktok, whatsapp, all are selling your data. So why not a bot scraping discord? For all i know, there are multiple bots scraping this very message I am typing now :lol:

    Thanked by (4): vyas, host_c, treesmokah, cold

    Websites have ads, I have ad-blocker.

  • host_c Hosting Provider
    edited April 19

    “You can find anything about anyone on the internet” - The Fast and the Furious 2001 =)

    Thanked by (1): dev_vps

    Host-C - VPS Services Provider - AS211462

    "If there is no struggle there is no progress"

  • edited April 19

    @somik said:
    One simple rule: If it's on the internet, it's going to be exploited.

    Anything you don't want going public or being sold should not be put on the internet. Facebook, insta, tiktok, whatsapp, all are selling your data. So why not a bot scraping discord? For all i know, there are multiple bots scraping this very message I am typing now :lol:

    @host_c said:
    “You can find anything about anyone on the internet” - The Fast and the Furious 2001 =)

    I think the more interesting thing I read in some side comments was that it was seemingly able to scrape old, non-expired room invitations and was able to scrape private rooms too. I'm not a Discord dude so I don't know how all that works, but knowing how many folks here actively use it, I just wanted to give a heads up.

    Given Discord's "privacy" - people should have been assuming it's been scraped/sold long before this - just interesting to see it offered at a consumer level vs being disseminated via the typical data broker channels behind the scenes. Are there any other sites like this operating on the clearweb?

  • edited April 19

    Since Joseph Cox tried to make it about Kiwi Farms for whatever reason, here is their response to his accusations:

    I'm glad ARS doesn't repeat his lies.

  • So? They can publish them as a book

  • AuroraZero Moderator

    Sweeeet!!! My goatse might make it mainstream then!

    Thanked by (1): Janevski

    Free Hosting at YetiNode | Cryptid Security | URL Shortener | LaunchVPS | ExtraVM | Host-C | In the Node, or Out of the Loop?

  • I used to run a similar 'scraper'. It started out just as a fun way to learn how to make Chrome extensions, but it ended up being a pretty decent project.

    I sold the project back in 2021. It had these features:

    • can run on a 32 GB RAM server with 15 concurrent browser profiles; that's 15 Discord accounts active at once, 1,500 servers monitored max
    • output is thrown into logsink and processed in OpenSearch for viewing
    • other logsink processing converts it into corpus format
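
    A rough back-of-the-envelope check of those numbers, as a minimal sketch. It assumes RAM and monitored servers are split evenly across profiles, which is an assumption for illustration and not something measured on the actual setup; the function name is made up for this example.

```python
# Back-of-the-envelope capacity check for the figures quoted above.
# Assumption (not from the original setup): RAM and monitored servers
# are split evenly across browser profiles.

TOTAL_RAM_GB = 32      # server RAM, from the post
PROFILES = 15          # concurrent browser profiles (one Discord account each)
MAX_SERVERS = 1500     # maximum number of servers monitored, from the post


def per_profile_budget(total_ram_gb, profiles, max_servers):
    """Return (RAM in GB per profile, servers per profile) under an even split."""
    ram_per_profile = total_ram_gb / profiles
    servers_per_profile = max_servers // profiles
    return ram_per_profile, servers_per_profile


ram_gb, servers = per_profile_budget(TOTAL_RAM_GB, PROFILES, MAX_SERVERS)
print(f"{ram_gb:.2f} GB RAM per profile, {servers} servers per account")
```

    Under that even split, each browser profile gets a bit over 2 GB of RAM and each account sits in 100 servers.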

    The data itself has been used for several purposes:

    • the current project owner promised to release it for free in the future, similar to textfiles.com
    • collected data is compiled into datasets, which are used to mimic users' writing styles with an LLM
    • the datasets are also used to track user behavior across servers; it's even possible to link multiple profiles to one person from writing style alone

    It sickens me how it's basically a privacy violation. You'd expect conversations in a 'private' server to stay private, but nope: if you make invitation links that anyone can use to join, you should expect to get logged by any account in the same server as you. But well, what can I do? I got the money after all. It sucks, but I'll just move on.

    Fuck this 24/7 internet spew of trivia and celebrity bullshit.
