this post was submitted on 23 Feb 2026
Your description is too vague to really get a good answer. In general, if you're doing complex string manipulation, you'll use a full-fledged programming language with regex support, like Python, Perl or Awk, possibly piped into each other and/or other tools like Sed or Cut. I can't be more specific than that without a more specific description where you describe the actual data and criteria.
Are you starting with the first or second example? Why do the prefix numbers change between examples? How do you tell text and title/subtitle apart?
My bad, I fixed it.
I want to show that two terms are related, e.g. Star Wars and Jedi, by grouping them together:
Franchises
Stars wars
Jedi
Transformers
Also, I am not able to add line breaks between bullet points in Markdown, so instead I get this:
Franchises
Stars wars
Jedi
Transformers
So I can't show the grouping in Lemmy here. I would also have liked the list I make to be Markdown-compatible, but I guess that's a separate issue.
Basically, I collect keywords (e.g. transformers, A Deep dive, Harry Potter The worst, Xbox, stars worst, Jedi) from videos on my YouTube home page and organize them into lists.
YouTuber terms:
Companies:
And turn it into:
Removing the titles and subtitles.
I was thinking of putting a symbol, for example "#", in front of the title so the script knows to ignore that whole line, like a comment in general programming.
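The "#" marker idea above could be sketched roughly like this in Python. This is only an illustration under assumptions: the function name `drop_marked_lines` and the sample data are hypothetical, and the real keyword list may be laid out differently.

```python
def drop_marked_lines(text, marker="#"):
    """Return the lines of text that do not start with the marker."""
    kept = []
    for line in text.splitlines():
        # Ignore leading whitespace when checking for the marker,
        # so an indented "# Title" line is skipped as well.
        if line.lstrip().startswith(marker):
            continue
        kept.append(line)
    return kept


# Hypothetical sample data: titles are prefixed with "#".
sample = """# Franchises
Star Wars
Jedi
# Companies
Xbox"""

print(", ".join(drop_marked_lines(sample)))
```

The marker is a parameter, so a different symbol can be used if "#" ever needs to appear in a keyword.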
This is not difficult to achieve at all with tools like `sed` or `awk`. But unless you provide a concrete example input file or files, all we can do is point to those tools.
Something like this?
Turned into
Both "Franchis" and "Cartoons" were removed, i.e. not included with the other words.
If you wanted a somewhat cruder approach using basically ubiquitous tools, you could do something like this:
Here I'm first using `grep '^ *-'` to get all lines starting with any amount of whitespace and a leading dash, then piping that to `grep -v ': *$'` to remove anything with a colon at the end (including those with whitespace after the colon), then using `tr '\n' ','` to replace all newlines with commas, and then `sed s'/,$/\n/'` to replace the trailing comma with a newline again (although sed is finicky across platforms wrt newlines, so you may want to just replace it with an empty string instead).
The above is hardly an efficient approach, but it does the job.
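For comparison, the same filtering steps can be sketched in Python. This is not the original pipeline, just a rough equivalent: the function name `keywords_from_list` and the sample input are hypothetical, and stripping the leading dash is an added assumption that the description above does not mention. The final `sed` step exists because `tr` also turns the last newline into a comma; joining the items directly avoids that trailing comma.

```python
import re


def keywords_from_list(text):
    """Rough equivalent of: grep '^ *-' | grep -v ': *$' | tr '\\n' ','."""
    # Like grep '^ *-': keep lines starting with optional spaces and a dash.
    lines = [l for l in text.splitlines() if re.match(r"^ *-", l)]
    # Like grep -v ': *$': drop lines ending in a colon (plus optional spaces).
    lines = [l for l in lines if not re.search(r": *$", l)]
    # Assumption, not part of the described pipeline: strip the dash prefix.
    items = [re.sub(r"^ *- *", "", l).rstrip() for l in lines]
    # Joining puts commas only between items, so no trailing comma to fix.
    return ",".join(items)


# Hypothetical input in the bullet-list style discussed in the thread.
sample = """- Franchises:
  - Star Wars
  - Jedi
- Companies:
  - Xbox"""

print(keywords_from_list(sample))
```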
I think this is the solution that makes the most sense to me.
But I don't understand what `sed` does here.
Why do we replace the commas again with new lines?
Also, I figured out a better way to group related terms: using semicolons ";".
I figure I can replace them with commas using the `tr` command.
But do I just pipe
Into
Or is there a way to combine them? I don't see an option to do more than one operation in the `tr` manual.
Lastly, I have been trying to use regex to match
To
I just need to match the "X" there; the program takes care of the rest.
I tried
On this website to match
But using the debugger, it only recognizes "The" and then stops.
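On the question above about combining two replacements: `tr` maps characters by position, so a single call such as `tr '\n;' ',,'` should turn both newlines and semicolons into commas without a second pipe. The same idea can be sketched in Python with a translation table; the function name `to_commas` and the sample string are hypothetical.

```python
# Map both newlines and semicolons to commas in a single pass,
# similar in spirit to tr '\n;' ',,'.
TABLE = str.maketrans({"\n": ",", ";": ","})


def to_commas(text):
    """Replace newlines and semicolons with commas, dropping trailing commas."""
    return text.translate(TABLE).rstrip(",")


# Hypothetical input: related terms grouped with ";" on one line.
print(to_commas("Star Wars;Jedi\nXbox\n"))
```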
If you're feeling a little old school (and some might say masochistic), you could do a similar crude parser with a Perl one-liner. This would be more efficient compute-wise, but it's a bit of an acquired taste readability-wise:
Here `perl -n` makes perl look at each line individually, `chomp` strips off the trailing newline, we match for `/^\s*-\s*(.*[^:\s])\s*$/` (a string starting with a dash and ending with something not a colon) and append the content of the matching parenthesis to an implicitly declared array `@a`. Then we add an `END{}` block which will be executed after all lines are parsed, where we print the array joined on `,`.

If you can't install a dedicated tool like `yq` but don't mind creating a standalone script, Python would be able to do this on pretty much any computer, calculator or toaster you can get your hands on in 2026 (assuming the PyYAML package is available, since YAML parsing is not part of Python's standard library):

This takes the first argument on the command line, parses it as YAML, finds all leaf nodes recursively, and prints a comma-separated list of the results.
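The "find all leaf nodes recursively" step can be sketched on its own, without the YAML parsing. This is an illustration rather than the script from the comment: the name `leaves` and the nested sample data are hypothetical, and the data is written directly as Python lists and dicts in the shape a YAML parser would be expected to return.

```python
def leaves(node):
    """Yield every scalar (non-list, non-dict) value in a nested structure."""
    if isinstance(node, dict):
        # Only the values are descended into; mapping keys such as
        # titles are not treated as leaves.
        for value in node.values():
            yield from leaves(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaves(item)
    else:
        yield node


# Hypothetical data, shaped like parsed YAML with titles as mapping keys.
parsed = [{"Franchises": [{"Star Wars": ["Jedi"]}, "Transformers"]}, "Xbox"]

print(",".join(str(v) for v in leaves(parsed)))
```

Because titles end up as mapping keys in this layout, they drop out of the result without any extra filtering.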
If you can stick to valid YAML like your example is, you can use a reasonably short `yq` command to get a comma-separated string of all scalar values:

`..` goes down the tree recursively, `scalars` filters out only scalar values, `[]` around those two makes them an array, and piping it all to `join(",")` makes it into a comma-separated string.

This is technically YAML I think: a list (with one entry) of lists that contains mostly single items but also one other list. You should be able to parse this with a YAML parser such as PyYAML for Python (a third-party package, as Python's standard library does not include a YAML parser).
Note that YAML is picky about the syntax though, so it wouldn't be able to handle deviations.