this post was submitted on 19 Jul 2023
238 points (94.1% liked)
Comradeship // Freechat
As someone who did some natural language processing research in undergrad, they obviously have no idea what they're doing. To get meaningful data you need[^1] to remove words such as "the", "is", "it", etc. And that's not the only normalization you need to do.
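To make that concrete, here's a rough sketch of the bare minimum I'd expect before any analysis: lowercasing, stripping URLs, tokenizing, and dropping stopwords. The stopword list and the `normalize` helper are my own illustration, not anything from the paper, and a real pipeline would use a much fuller list and more normalization steps.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# a real pipeline would use a much more complete one.
STOPWORDS = {"the", "is", "it", "a", "an", "and", "of", "to", "in", "that"}

def normalize(comment: str) -> list[str]:
    """Lowercase a comment, drop URLs, tokenize, and remove stopwords."""
    text = comment.lower()
    # Replace URLs with a space so they don't become junk tokens.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep alphabetic words, allowing one internal apostrophe ("don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOPWORDS]

print(normalize("The paper is bad, and it IS the worst: https://example.com/x"))
```

Even something this crude shrinks noisy comments down to the words that actually carry signal, and it's exactly the kind of step a methods section should spell out.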
What's offensive for something claiming to be an academic paper is their lack of explanation of their data processing techniques. Meaningful conclusions can only be made if your data is reasonable. And to make sure you have meaningful data, especially when the source is extremely noisy human-generated online comments, you need to do several things to process your data before you can feed it into an analysis. The goal of publishing academic research is not only to publish a result, but to publish methodology to enable independent reproducibility: if you have the paper, and the data, you should be able to follow the methods and come to the same conclusions; if you can't, the paper's bad. Yes, these details are boring, and a lot of people will put them in an appendix instead of in the main body of the paper, but if you're being honest you do provide these details.
They also don't even pretend to be objective; the paper reads more like a speculative opinion piece on sociology than it does a "data-driven" paper. Their assumptions drive their analysis and thus their conclusions. Moreover, when they attempt to make the distinction between `TOXICITY` and `SEVERE_TOXICITY`, they are not making these objective categories: the definitions they give are pure air and the distinction between the two categories is purely subjective.
It's honestly an embarrassment; I wouldn't want my name on a paper of such poor quality. I wouldn't want my university to be named on a paper of such poor quality (nor would I think the university would want themselves to be named on such a paper).
Either these are genuinely ignorant undergrads who don't realize that they're producing wildly questionable and meaningless "research", or they're dishonest grifters taking federal taxpayer money[^2] and producing garbage.
Being published on arXiv is not automatically a bad thing, but it makes me wonder if they were rejected from peer-reviewed journals. They can't argue that they didn't want to, or were unable to, spend money to submit to a "real" journal, since they are receiving outside funding.
[^1]: Stopwords aren't totally useless at early stages in the pipeline or depending on what you're doing. For example, being grammatical terms, they can help get a proper parse tree. But this type of analysis, sentiment analysis, is not using a full parse tree, and leaving stopwords in only increases noise and decreases the ability of the model to produce meaningful results.
[^2]: The researchers have received nearly half a million USD in federal taxpayer money through an NSF grant.
One of them is an associate prof, and the other is the dean of the tech and engineering department at his university
Last one is a PhD candidate, but that info may be a bit outdated
Oh good god. I had given them the benefit of the doubt and assumed there was no way an actual professor would be any of the names on it. I figured such poor work could only be explained by being ignorant undergrads. I genuinely would question their previous work if they are comfortable publishing this garbage.
This is downright shameful. I'd be embarrassed to be a student of these profs, or of the department.
Now I'm genuinely curious if they embezzled some of the NSF money, or are otherwise being paid for this? I extremely rarely take up the whole "paid shill" angle, because frankly it's almost never the case, but how in the everloving shit would these people produce and publish such trash and not feel embarrassed?