this post was submitted on 03 Aug 2025
Science
How did you see the full text? When I click the link it ends abruptly pretty early on. Do I need to archive it?
Cool study, though the picture did make me think it'd be some cool underground jungle. :sicko-wistful:
If you look at the source of the page (either F12 to bring up Developer tools or right-click -> "View page source"), you'll see that there's this script tag with the id `__NEXT_DATA__` that has a big ol' hunk of JSON data in it, which (among other things) contains the entire article split up into discrete chunks with different types. I didn't bother to look at the page's actual JavaScript code, but I assume it's assembling the DOM dynamically from said JSON, and when you're not authorized to view the full article it simply stops at some arbitrary point instead of finishing the job.

I wrote a crappy little scraper that parses the JSON to pull out the text and link chunks for the article, sticks 'em together, and spits out some Markdown ready to paste into Lemmy. It doesn't handle all of the possible chunk types (e.g. embeds (which I should do) and ads (lol)), and sometimes it'll throw errors (which I usually ignore), but it gets the job done okay most of the time.
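
For anyone curious, here's a minimal Python sketch of the general approach. It is not the actual scraper, and the chunk layout is made up for illustration: the `type`, `text`, and `href` keys and the `paragraph_break` type are assumptions, so you'd need to look at the real JSON and adjust. Only the `__NEXT_DATA__` id comes from the page itself.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the raw text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.payload = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several pieces, so accumulate them.
        if self._inside:
            self.payload += data


def extract_next_data(html):
    """Return the parsed JSON from the page's __NEXT_DATA__ script tag."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.payload.strip():
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(parser.payload)


def chunks_to_markdown(chunks):
    """Join article chunks into Markdown.

    The chunk schema here is hypothetical; real sites will differ.
    """
    parts = []
    for chunk in chunks:
        kind = chunk.get("type")
        if kind == "text":
            parts.append(chunk["text"])
        elif kind == "link":
            parts.append(f"[{chunk['text']}]({chunk['href']})")
        elif kind == "paragraph_break":
            parts.append("\n\n")
        # Anything else (embeds, ads, ...) is silently skipped.
    return "".join(parts)
```

To use it, you'd fetch the page HTML however you like, call `extract_next_data(html)`, and then dig through the resulting dict to find wherever the article chunks live. In Next.js pages the data usually sits somewhere under `props` -> `pageProps`, but the path below that is site-specific.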