submitted 6 months ago by kionite231@lemmy.ca to c/datahoarder@lemmy.ml

I have scraped a lot of links from Instagram and Threads using Selenium with Python. It was a good learning experience. I will keep the script running for a few more days to see how many more media links I can collect from both sites.

However, the problem is that the media isn't tagged, so we don't know what type of media each link points to. I wonder if there is an AI or some other tool that can sort these random media links into an organized list.
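As a rough first pass before reaching for any AI, the links could be grouped by the file extension in the URL path. This is only a sketch: it assumes the links end in a recognizable extension such as .jpg or .mp4 (ignoring any query string), and the names `media_kind` and `group_links` are made up for illustration. It only tells images from videos; it says nothing about what the media actually shows.

```python
import mimetypes
from collections import defaultdict
from urllib.parse import urlparse


def media_kind(url):
    """Return the top-level MIME type ("image", "video", ...) guessed from the URL path."""
    # Use only the path so query strings such as "?foo=bar" do not hide the extension.
    path = urlparse(url).path
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "unknown"
    return mime.split("/", 1)[0]


def group_links(urls):
    """Group an iterable of URL strings by guessed media kind."""
    groups = defaultdict(list)
    for url in urls:
        url = url.strip()
        if url:  # skip blank lines
            groups[media_kind(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Assumes links.txt holds one URL per line, as in the wget example in this post.
    with open("links.txt", encoding="utf-8") as fh:
        for kind, links in group_links(fh).items():
            print(kind, len(links))
```

Links that land in the "unknown" group would still need another approach, for example checking the Content-Type header when downloading.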

If you want to download all the media from the links, you can run the following commands:

# Download the file that contains all the links
wget -O links.txt https://gist.githubusercontent.com/Ghodawalaaman/f331d95550f64afac67a6b2a68903bf7/raw/7cc4cc57cdf5ab8aef6471c9407585315ca9d628/gistfile1.txt
# Download the media from the links file fetched by the command above
wget -i links.txt

I was thinking about how to store all of this. There are two ways. The first is to store only the links.txt file and download the content when needed. The second is to download the content from the links and save it to a hard drive. The second method will consume more space, so I think the first one is better.
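The "download when needed" option could look roughly like this. It is a sketch under assumptions: the `media` cache directory and the helper names `local_path` and `fetch_if_missing` are hypothetical, and it assumes the links are still reachable at the time you ask for them. Each file is named after a hash of its URL, so the same link always maps to the same local file and is fetched only once.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_path(url, cache_dir="media"):
    """Map a URL to a stable local file name inside cache_dir."""
    suffix = Path(urlparse(url).path).suffix  # keep ".jpg", ".mp4", ... if present
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}{suffix}"


def fetch_if_missing(url, cache_dir="media"):
    """Download url only if it is not already in cache_dir; return the local path."""
    target = local_path(url, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target
```

With this, only links.txt has to be kept permanently, and disk usage grows only with the media that actually gets requested.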

I hope you find this useful :)

this post was submitted on 26 Jan 2024
6 points (75.0% liked)
