this post was submitted on 07 Feb 2026
IIRC there were some polls on how helpful people found LLMs, broken down by language/profession, and people working in data science languages/workflows consistently rated LLMs very highly. Which makes sense, because the main steps of 1) data cleaning, 2) estimation and 3) presenting results all involve lots of boilerplate.
Data cleaning really just revolves around a few core functions such as filter, select, and join; joins in particular can get very hard to keep track of on big datasets.
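To make the filter/select/join point concrete, here is a rough sketch in plain Python (standard library only, so it runs anywhere). The tables, column names and helper functions are all made up for illustration; in practice you'd reach for pandas or dplyr rather than writing these by hand.

```python
# Hypothetical toy tables; the column names are invented for this example.
patients = [
    {"id": 1, "age": 34, "site": "A"},
    {"id": 2, "age": 61, "site": "B"},
    {"id": 3, "age": 47, "site": "A"},
]
readings = [
    {"id": 1, "SYS_BP": 118},
    {"id": 3, "SYS_BP": 135},
    {"id": 4, "SYS_BP": 122},  # no matching patient, dropped by the inner join
]

def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]

def select_cols(rows, cols):
    """Keep only the named columns in each row."""
    return [{c: r[c] for c in cols} for r in rows]

def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

site_a = filter_rows(patients, lambda r: r["site"] == "A")
slim = select_cols(site_a, ["id", "age"])
joined = inner_join(slim, readings, "id")
print(joined)
```

Even in this tiny case you have to remember which rows each step dropped, which is the bookkeeping that gets painful once several joins are chained together.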
For estimation, the more complicated models all require lots of hyperparameters, all of which need to be set up (instantiated, if you use an OOP-style implementation as is common in Python) and evaluated in a loop against some validation set. Even with dedicated high-level libraries like scikit-learn, there is still a lot of boilerplate.
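The instantiate-and-loop pattern looks roughly like this. It's a minimal sketch using only the standard library: `Ridge1D` is a hypothetical toy model I made up to stand in for a real estimator, and the grid values and data are arbitrary. scikit-learn wraps this kind of loop for you, but you still have to spell out the grid, the split and the scoring.

```python
from itertools import product

class Ridge1D:
    """Hypothetical toy model: one-feature ridge regression, closed form."""

    def __init__(self, alpha=0.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, xs, ys):
        # Centre the data only when an intercept is requested.
        mx = sum(xs) / len(xs) if self.fit_intercept else 0.0
        my = sum(ys) / len(ys) if self.fit_intercept else 0.0
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        self.slope = sxy / (sxx + self.alpha)
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Made-up data following y = 2x + 1, split into train and validation.
x_train, y_train = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]
x_val, y_val = [5, 6], [11, 13]

# The hyperparameter grid: every combination gets its own model instance.
grid = {"alpha": [0.0, 1.0, 10.0], "fit_intercept": [True, False]}

results = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    model = Ridge1D(**params).fit(x_train, y_train)
    results.append((mse(y_val, model.predict(x_val)), params))

best_score, best_params = min(results, key=lambda r: r[0])
print(best_params, best_score)
```

None of it is hard, but it's the same scaffolding every time with different parameter names, which is exactly the sort of thing an LLM fills in quickly.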
Presentation usually consists of visualisation and cleaning up results for tables. Professional visualisations require titles, axis labels, reformatted tick labels etc, which adds up to 4-5 lines of boilerplate minimum. Tables are usually catted out to HTML or LaTeX, both of which are notorious for boilerplate. This isn't even getting into fancier frontends/dashboards, which are their own can of worms.
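For a sense of the table boilerplate, here is a rough standard-library sketch that dumps a list of dicts to a basic LaTeX table. `to_latex_table` is a hypothetical helper, the numbers are made up, and it doesn't escape special characters, so treat it as an illustration rather than something to drop into a paper.

```python
def to_latex_table(rows, columns, caption):
    """Render a list of dicts as a basic LaTeX table (no escaping of special characters)."""
    # First column left-aligned, the rest right-aligned.
    col_spec = "l" + "r" * (len(columns) - 1)
    lines = [
        r"\begin{table}",
        r"\centering",
        r"\caption{" + caption + "}",
        r"\begin{tabular}{" + col_spec + "}",
        r"\hline",
        " & ".join(columns) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(row[c]) for c in columns) + r" \\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)

# Made-up results, purely for illustration.
table_rows = [
    {"Model": "Baseline", "MSE": 4.21},
    {"Model": "Tuned", "MSE": 1.07},
]
latex = to_latex_table(table_rows, ["Model", "MSE"], "Validation error by model")
print(latex)
```

Ten of the thirteen output lines are fixed wrapper around two lines of actual data, which is about the ratio you get with hand-written LaTeX or HTML tables in general.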
The fact that these steps tend to be quite bespoke for every dataset also means that they couldn't be easily automated by existing autocomplete, e.g. formatting SYS_BP to "Systolic Blood Pressure (mmHg)" for the graphs/tables.
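As a sketch of what that bespoke formatting looks like in code: the `SYS_BP` entry is the example above, while `DIA_BP`, the fallback rule and the function name are my own assumptions for illustration.

```python
# Dataset-specific display names; only SYS_BP comes from the example above,
# DIA_BP is a hypothetical extra entry.
DISPLAY_LABELS = {
    "SYS_BP": "Systolic Blood Pressure (mmHg)",
    "DIA_BP": "Diastolic Blood Pressure (mmHg)",
}

def pretty_label(column):
    """Look up a hand-written label, falling back to a generic clean-up."""
    if column in DISPLAY_LABELS:
        return DISPLAY_LABELS[column]
    # Generic fallback: underscores to spaces, title case.
    return column.replace("_", " ").title()

print(pretty_label("SYS_BP"))
print(pretty_label("heart_rate"))
```

The generic fallback is the part classic autocomplete could handle; the dictionary is the part that needs to know what the column actually means, units included.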