Shell scripts
29 points by wink
What's wrong with the natural progression of "Whoa, my bash script reached 100 lines" and then rewriting it in the same $LANG your project is in?
Because, as often happens, you only have enough time to add the hundredth line, and so the script remains in shell, decaying forever until some poor soul has to dive into it to fix a bug. And before you know it, it's one of those corners of the code base everyone dreads to touch but nobody has the time to fix.
I've regularly seen a specific difficulty when it comes to migrating away from shell scripts: input and output formats.
What's convenient in shell scripts is often not as convenient with other languages; moreover, it's pretty easy to depend on weird/unknown semantics of some shell script or tool and replicating that can be difficult and risky.
A long time ago I wrote thousands of lines of bash to automate a CI/CD package pipeline. Then I rewrote it in Python and treated it as a program. Some time after, I intended to rewrite it in Go, but never got around to it.
I have followed that natural progression based on what my intuition says. bash -> python -> go. Or sometimes, just straight bash -> go.
bash is great for simple things, but if you find yourself fixing bugs in bash scripts, it's time to move on to something more robust.
True, you're so correct! Though I knew an old-timer years ago who advised that anyone who ever touches code - whether dev or sysadmin, or whatever - should always know at least 2 languages... and 1 of the languages MUST be a scripting language (e.g. bash, Perl, Python), and the other MUST be compiled (e.g. C, C++, Java, etc.)... so that one prototypes with scripting, and then when the function inevitably outgrows such a script, it's much easier to migrate said function to the compiled language, etc. It's not a 1-to-1 thing all the time, but it made sense back in the day, and to me at least it still makes sense today. So, when that 100-line bash script reaches a certain point, it becomes a near-no-brainer to upgrade it to a more robust, compiled language, etc.
I tried that at $PREVIOUS_JOB. I wrote a proof-of-concept in Lua to get familiar with the problem, and unknown to me, my manager at the time put it into production. That's the problem with prototypes---you may think it's a prototype, but if it works at all, it's put into production.
unknown to me, my manager at the time put it into production.
Oof, that's a rough one! I guess there are downsides to that approach. And, yeah, besides any accidental deployments, there's the whole aspect where you only meant something as a prototype, but others take it as gospel... so, yeah, there are risks. Also, separate from this, I suppose it's possible that everyone on a team chooses different sets of languages, which doesn't help in all cases, etc. ;-)
Many (most?) shell scripts are about filesystem operations and calling/chaining commands, both of which tend to be annoying in most general purpose languages. Having to compile your scripts is also pretty annoying.
I think the "go run" command was designed to make the compile step invisible, but I never see go run on go files in practice.
have you seen the go shell script hack?
$ cat script.go
/*usr/bin/env go run "$0" "$@"; exit; */
package main
import "fmt"
func main() {
fmt.Println("Hello world")
}
and then chmod +x and run it
$ ./script.go
Hello world
lol, I hadn't, that's amazing.
I hope you won't be offended if I never use this in production though :P
Aside from the organic growth that others mention, sometimes shell scripts really are the right call.
Bash (and perl) are excellent for executing arbitrary commands and processing streams of text; if the job you have is to glue a load of binaries together, chew on some text and then use that to call another program with correct arguments (over, say, 64 processes using xargs) then doing such a thing in a "real language" will be terribly unergonomic.
I've seen a lot of bash; the best bash unfortunately comes from Perl shops, and there aren't many of those left, I think. But Perl developers know well how to manage languages that easily become "write only". Bash falls into that category.
Rewriting bash scripts is difficult in general ... shell and especially bash are obscure languages, with idioms that aren't readily available in other languages
I'd say it's harder to rewrite a bash script in Python than say Perl or Ruby script
BTW YSH is designed to be the easiest language to rewrite a bash script into ... because OSH is the most bash-compatible shell, and OSH and YSH share a runtime (they live in the same binary)
The idea is that you can put shopt --set ysh:upgrade at the top of your bash script, and start using a cleaner language ... and then gradually rewrite it, rather than all at once
Python's argument parser gives you good documentation for your script more or less immediately and cleanly. No help() metafunction you've got to keep up to date, and less "oops I didn't handle args correctly" stuff.
And also you get useful data structures like hashmaps for free.
But well... subprocess.run is annoying. from subprocess import run; run("foo bar baz".split()) works fine enough but feels silly.
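To illustrate what I mean, a rough sketch (not any real script - the flags and the pg_dump call are made up for the example):

#!/usr/bin/env python3
import argparse
import subprocess

# argparse generates --help output and argument validation for you
parser = argparse.ArgumentParser(description="Dump a database to a file")
parser.add_argument("database", help="name of the database to dump")
parser.add_argument("--outfile", default="dump.sql", help="where to write the dump")
args = parser.parse_args()

# the subprocess call is the clunky part
subprocess.run(["pg_dump", args.database, "-f", args.outfile], check=True)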
... I think it represents exactly that type of small helper script. It's simple enough it could have been a shell script, but if I needed more stuff, I could have added it, in a proper programming language.
Compared to Janet, you can just remove the ($ and ) with YSH. In fact, I think it's identical to shell for this script, without the pitfalls.
#!/usr/bin/env janet
# boring header, including the sh package
(use sh)
# One task is cleaning up old branches, so list them
($ echo "# Local branches")
($ git br)
($ echo "# Remote branches")
($ git br -r)
YSH:
#!/usr/bin/env ysh
echo "# Local branches"
git br
echo "# Remote branches"
git br -r
And it has a "real language" for control flow, loops, functions with params, dicts and lists like Python/JS, etc.
I've long been unhappy with shell scripts for anything that's more than 20 lines of glue code
But... why? What is it that makes you unhappy about "anything longer than 20 lines"?
usually that you need to fiddle with various versions of exec or popen or whatever the language calls their wrapper
Right. I often see pushback for scripts at $WORK of the form "omg this shell script should be a Go program or a Python script!!!!one!!" and then someone goes and rewrites the script. Then... what used to be ~100 lines of shell becomes a giant monster for I'm not sure what benefit.
I'm also seeing this more and more now because LLMs can supposedly convert those scripts to other languages with ease. The reality is that the resulting converted program ends up being the same as the shell script (exec-ing external commands) but with many more lines, instead of using native libraries and primitives to achieve the same outcome.
The benefit is that it doesn't contain five thousand bugs and gotchas. The bugs and gotchas it does contain are the sort you solve every day in the normal course of work and so have well-honed reflexes for.
Bashism of the day: =~ is the [[ operator for regex. [[ abc =~ a.c ]] is true. [[ abc =~ 'a.c' ]] is false. Why? Because quotes means a literal string, not regex, even though it's the regex operator. If you have spaces, escape them with \. If that sounds dumb and you want a normal regex, assign it to a variable beforehand reg='a. c' and use the variable unquoted, [[ 'ab c' =~ $reg ]]. And if your shellcheck brain sees that three months later and updates it to =~ "$reg" you've broken it again.
Now, what's the incantation for checking whether an array of strings contains each of the strings in another array, where those strings can contain spaces?
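(For contrast, once you're out of bash that check is a one-liner; a throwaway Python sketch with made-up data:)

# does haystack contain every string in needles, spaces and all?
needles = ["foo bar", "baz"]
haystack = ["foo bar", "baz", "qux quux"]
print(all(n in haystack for n in needles))  # True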
There may be a lot more than 100 lines in the final product, but those lines are more readable, more maintainable, and more tolerant of edge cases.
Yes, this is a design flaw in bash. And it's even mentioned in the bash manual, as I said here:
Oils 0.22.0 - Docs, Pretty Printing, Nix, and Zsh
https://www.oilshell.org/blog/2024/06/release-0.22.0.html#driven-by-nix
Probably the most useful part of the bash manual is the acknowledgement that there's a lexing design bug with [[ and regular expressions
https://news.ycombinator.com/item?id=38414011
It is sometimes difficult to specify a regular expression properly without using quotes, or to keep track of the quoting used by regular expressions while paying attention to shell quoting and the shell’s quote removal. Storing the regular expression in a shell variable is often a useful way to avoid problems with quoting characters that are special to the shell.
I can't give an objective answer except: I've been using shell scripts since I've been using Linux and they've always been "fine" but never great.
Testing is bad. Arrays are bad. Everything that is not pure text is bad.
Absolute line length does not matter that much to me; 20 was an arbitrary pick. The backup shell script I mentioned was 60 lines for MySQL OR Postgres (I never combined them) and the resulting Python script has 240 lines for both, and I like it so much more (I could have golfed it, but I think I also added more features).
What is it that makes you unhappy about "anything longer than 20 lines"?
My top three complaints about shell scripting (bash, etc):
The first two issues were empirically found to be the most common ShellCheck errors:
Bash in the Wild: Language Usage, Code Smells, and Bugs
https://cs.uwaterloo.ca/%7Ecnsun/public/publication/tosem22/tosem22.pdf
And I wrote a blog post mentioning that YSH addresses them specifically:
YSH Addresses Common Errors in 1.3 Million Shell Scripts
https://oils.pub/blog/2025/12/links.html#ysh-addresses-common-errors-in-13-million-shell-scripts
YSH also has a more familiar syntax for control flow, and is statically parsed, e.g.
if test --dir /tmp {
echo yes
}
versus shell
if test -d /tmp; then
echo yes
fi
Many people have noted that they have problems remembering the ; then syntax
As expected, ~andyc is in the thread providing lots of great information, but I think:
Are two awesome resources in the Oils wiki, listing both shells without the usual footguns and libraries that make scripting in usual programming languages less burdensome.
...
Right now my strategy is to port any shell script into Python as soon as set -e does not suffice for error handling, when I need to use anything which I'm not 100% sure is POSIX, or when I reach for anything like awk or jq.
The only thing that bothers me is that every script then starts with a subprocess.run wrapper that sets check=True and then grows organically to support what's needed in every script.
(This is because I never add any dependencies. Except for YAML and other similar stuff which is not JSON, really anything you can do with a shell you can do with the Python standard library, so it simplifies things to not use dependencies, even if it's a bit of a pain sometimes. And most stuff that does structured input/output supports JSON so it's fine.)
But until something better is more widespread on the environments I have to use, Python it is.
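For the curious, the wrapper in question is usually some variation of this (a sketch - the exact defaults are whatever that particular script ended up needing):

import subprocess

def run(*args, **kwargs):
    # check=True: a failing command raises instead of being silently ignored
    # text=True: stdout/stderr come back as str instead of bytes
    kwargs.setdefault("check", True)
    kwargs.setdefault("text", True)
    return subprocess.run(list(args), capture_output=True, **kwargs)

branch = run("git", "rev-parse", "--abbrev-ref", "HEAD").stdout.strip()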
I'm glad they are useful! Those pages have seen many contributions, which has made them better, so anyone should feel free to add more projects
( It sounds like we need to make Oils available in more environments :-) )
I think Python was the last thing that became widespread in systems. And likely because a lot of software is written in Python- including a considerable amount of distribution/OS plumbing (e.g. package managers).
So I think the way to make something be installed in more places by default is just... make it used a lot by other widespread software.
The other way is what is already mentioned in "alternative shells": languages that compile to what's already popular. However, I'm not entirely sure that's the best idea.
I've tentatively moved to Elvish for shell scripting (while sticking with fish as my interactive shell).
Love it so far!
Tcl actually makes a great shell scripting language, with sensible syntax (no implicit wordsplitting) and a really convenient exec function. The only reason I haven't switched to it fully is because I like using immutable OSs, and Tcl is inconvenient to install without distro packages, and those are often stuck at 8.6.
Interesting, I've only used Tcl once (for Advent of Code) and I was not impressed (for whatever reason).
Footnote: Perl
In Perl you can execute a shell script using backticks or qx:
`git log`
qx {
git log
}
E.g. here's how you can iterate over each output line:
for my $line (`git log --oneline`) {
print $line;
}
In Perl you can ...
Yes.
It's not the most readable language, but whatever it is you want to be able to do, it has at least four ways to do it.
You're absolutely right ;-) The remaining two ways I didn't mention are: system() and IPC::Open3.
IPC::Open3 is only interesting if you want to access stdin, stdout, and stderr via separate file handles.
LLMs have solved the Perl readability problem. They also solve Perl's insane syntax error problem and the ref problem.
That's like saying decompilers have solved the assembly problem.
Most of us want to (fluently, quickly) read the code that's in the file, especially in the context of (shell) scripts - not use whatever tool to make it readable.
Ah yes, I completely forgot that this also works in PHP, but I'm not enthusiastic about that either. Thanks for the pointer though.
risor looks interesting but I don't understand why you would use rsx instead of Go unless you were already heavily invested in risor and had a lot of scripts already.
I mean sure, in the end it's a solution to a past problem, but the combo looks too close to Go to justify introducing this special new tooling over just sticking with Go.
A very fair point.
I'm not heavily invested in risor, but I guess you could say I am heavily not invested in Go. I'm not much of a gopher. I'm not much of a programmer for that matter. I don't write complex software, I write a little plumbing code. My eyes glaze over when people start talking about type safety and generics. I don't care about the pros/cons of composition vs inheritance. Thinking about concurrency hurts my brain. My needs from a language are primitive.
But I do appreciate the value of readily producing static binaries for multiple platforms.
Good point. I don't love Go but in the end I am pragmatic. I've used it in the past if it made sense, e.g. to write standalone monitoring checks as static binaries. Copy over and it works, no complicated deployment.
I've been using Bun Shell more and more for this. Takes a bit to get the muscle memory for always prefixing await, but it replaces more shell features than most (like set -e vs +e), has sanitized string substitution for the exact "$var" experience, and supports redirection syntax to those substitutions. This example from the docs originally convinced me to try it, redirecting a command's output to a variable:
import { $ } from "bun";

const buffer = Buffer.alloc(100);
await $`echo "Hello World!" > ${buffer}`;
console.log(buffer.toString()); // Hello World!\n
(Un)fortunately anything JS or TS or touching npm is right out for me without further discussion, but TIL, thanks!
we have a checklist of tasks I felt could be automated, or at least be made easier, as I was doing them anyway
Do-nothing scripts are quite lovely: just document the task as comments and slowly automate it! Although Janet offers nice features, you need extra boilerplate besides just $ and parentheses for some basic things, e.g. preparing a file object for >, so a plain shell script can end up nicer. The sh library's API needs a bit of work.
Now, this could have been a shell script or even used sh, but the ["sudo" "systemctl" ...] style that you sought to avoid does just as well - because the actual work is done via more interesting Janet features, justifying it!
Good link, I had read that and it absolutely seems to fit here.
I guess my example didn't do it justice, but I think this does not really apply here. Yes, it's kind of a checklist, but everyone seems to do the steps slightly differently (which is fine). The tasks that look like "we could script that" are often such that they involve a bit more thinking and checking other not-so-easily-automatable resources, and it was really just a side remark for context :)
boiler plate ... preparing a file object for >
I forked and committed the (surprisingly easy) change: https://codeberg.org/veqq/janet-sh/pulls
the benefit for me is that I don't have to write anything like:
["git", "log", "--oneline", ...
I have some sympathy with much of the article, but this part is silly. Python has list concatenation and split on strings. So, you can just write "git log --oneline".split() + someOtherSubList + ... or if that's too verbose just def s(x): return x.split() and use s("git log --oneline") and manipulate the lists of strings. In Nim, that could even be s"git log --oneline". It doesn't have to be that hard to work with lists of strings when some are easily tokenized and others need weird quoting.
You probably don't want to use Python's string.split() method for Unix command lines. If you insist on going down this road, you would be better served with shlex's split: https://docs.python.org/3/library/shlex.html
All that said, you can make subprocess.run() do the splitting for you: it takes a string as an argument when combined with shell=True, and it will make sh/bash/etc run the command for you, so you get PATH lookup and everything. Just be sure to avoid shell injection problems when doing so: https://docs.python.org/3/library/subprocess.html#security-considerations
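A quick illustration of the difference (made-up command string):

import shlex

cmd = 'grep -r "hello world" /tmp/logs'
print(cmd.split())
# ['grep', '-r', '"hello', 'world"', '/tmp/logs']  -- the quotes end up inside the arguments
print(shlex.split(cmd))
# ['grep', '-r', 'hello world', '/tmp/logs']       -- quoted argument stays in one piece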
My point was about the syntax of argv/string list construction & manipulations being easy to make nicer, not the semantics of what kind of splitting or what kind of running. (I don't disagree that is also a valid topic - it just wasn't the one I was talking about.)
Gotcha. I agree that you can make python look prettier for doing shell stuff all while staying within the stdlib. If you stray outside of the stdlib, via say uv/x or something, then you can use any of the multitude of shell libraries available.
I normally have a little def run(cmd): that either executes and returns the output or raises.
Similarly, I have a little pipeline([cmd1, cmd2], dryRun) construction built out of subprocess stuff where cmd[12] are lists of strings. It would be easy to abbreviate that pl [cmd1, cmd2] in Nim, and while yeah it's a little more verbose than cmd1 | cmd2, it's only by a little and it bypasses a lot of problems.
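Something along these lines, for anyone wondering what that looks like (a sketch of the idea, not the actual code):

import subprocess

def pipeline(cmds, dry_run=False):
    # cmds is a list of argv lists; chain them stdout -> stdin like cmd1 | cmd2
    if dry_run:
        print(" | ".join(" ".join(c) for c in cmds))
        return ""
    procs = []
    prev_stdout = None
    for cmd in cmds:
        p = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE, text=True)
        if prev_stdout is not None:
            prev_stdout.close()  # let the upstream command see SIGPIPE if downstream exits early
        prev_stdout = p.stdout
        procs.append(p)
    out = procs[-1].communicate()[0]
    for p in procs:
        p.wait()
    return out

print(pipeline([["git", "log", "--oneline"], ["head", "-5"]]))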
cid = subprocess.run(av, stdout=subprocess.PIPE).stdout.decode('utf-8').strip()
Every time I see Python like this (urgh) I puzzle over how Python displaced Perl.
subprocess.run takes a text argument that specifies that the standard streams are to be treated as text rather than binary, decoded with the default encoding (typically UTF-8). This covers common cases, avoiding the decode call.
You could argue that even with text=True this is annoying, but I think there must be some distinction between text and binary streams. And I certainly always use a subprocess.run wrapper to make things terser, because most of the time in a given script I just need a specific set of behaviors. The wrapper is just a few lines and IMHO with a wrapper, Python scripts are pleasant to read and write.
You can also just use subprocess.check_output if you don't need control over the process and pipes.
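For reference, the quoted line rewritten both ways (av here is just a stand-in, since the original argv isn't shown):

import subprocess

av = ["git", "rev-parse", "HEAD"]  # stand-in argv; the original isn't shown

# text=True gives you str back, so no .decode('utf-8') needed
cid = subprocess.run(av, stdout=subprocess.PIPE, text=True).stdout.strip()

# or, if all you want is the output plus an exception on failure:
cid = subprocess.check_output(av, text=True).strip()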