Abstraction, not syntax

33 points by carlana

Student

Mandatory reminder https://mikehadlow.blogspot.com/2012/05/configuration-complexity-clock.html?m=1

intercaetera

What I take from this is that the best configuration language is a programming language. Suckless was right all along?

slot

What I take from this is that the best configuration language is a programming language.

Unironically, yes. A best-case scenario would be the language that the thing itself is made in. Just from personal experience, using programs like Emacs, XMonad, or Hakyll is just fun in a way that few other programs manage to achieve. Allowing for non-trivial configuration files is also a great way for people to start hacking -- and eventually contributing -- to a project; much less scary to write a quick hack and then polish it than to immediately start contributing to some big project.

The next best option would be to have good support for a small scripting language; Pandoc and Wezterm (and I'm sure many others) fare quite well with Lua here. Obviously, not every program needs to be configured as extensively as a text editor or a generic computing environment might, but in those cases using something Turing complete as a configuration language seems like a no-brainer.
- muvlon
  
  A best-case scenario would be the language that the thing itself is made in.
  
  I agree, with one major caveat: Configuration should be written in the "pure" subset of your language, if such a thing exists.
  
  I've been on projects where the configuration was written in Python or Clojure and contained code with side-effects (actual I/O, like loading bits of config from files or the network) and that always created a big headache. Among other problems, this means that "just rebuild/redeploy when the config changes" isn't enough - the same file can evaluate differently on different days.
  
  By allowing arbitrary computation but demanding purity, you still get unlimited and ergonomic access to abstraction, but you keep these sorts of problems away.
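
A minimal sketch of what "arbitrary computation, but pure" could look like, using Python as a stand-in. The names `base_service` and `build_config` are made up for illustration. The point is only that the result depends on the arguments and on nothing else (no files, clock, or network), so the same input always evaluates to the same configuration.

```python
def base_service(name: str, port: int) -> dict:
    """Shared defaults for every service (an abstraction, not a string template)."""
    return {"name": name, "port": port, "retries": 3, "log_level": "info"}


def build_config(environment: str) -> dict:
    """Pure: the output depends only on `environment`."""
    services = [base_service("api", 8080), base_service("worker", 8081)]
    if environment == "dev":
        # Override one field for every service without repeating the rest.
        services = [{**s, "log_level": "debug"} for s in services]
    return {"environment": environment, "services": services}


# Evaluating the same config twice yields the same value.
assert build_config("prod") == build_config("prod")
```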
  - jcelerier
    
    the same file can evaluate differently on different days.
    
    which can be a very good thing when you want to configure different behavior on different days
    
    muvlon
    
    Hmm, I don't like that. Those sorts of things should really be handled by application code in my opinion.
    
    Of course, you may ask "why?". Why put any restrictions on what configuration can do at all? My answer is the capability-tractability tradeoff: The more something can do, the less you can say about what it will do. To me, configuration is supposed to customize the software within a predefined "configuration space". Even if it uses arbitrary computation internally, I can be certain that all it does at the end is deterministically define one point in that space. By contrast, if configuration code can do everything application code can, there is arguably no reason to offer configuration in the first place.
    
    And sure, "it's all just code anyway, just change whatever you don't like" is a stance you can take, and perhaps even a good one for certain projects, but I think most software (especially infrastructure-related) should have a well-defined and tested configuration space.
    
    spc476
    
    I use Lua as the configuration file for my blog (which otherwise, is written in C). And I allow the use of all of Lua. If someone else can change the configuration file for my blog, I have more pressing issues to worry about. Anyway, because of this, I do have the following in my configuration file:
    
        process = require("org.conman.process")
        process.limits.hard.cpu  = "10m" -- 10 minutes
        process.limits.hard.core = 0     -- no core file
        process.limits.hard.data = "20m" -- 20 MB
    
    Before I switched to using Lua, I would have had to add code (in C, since the program is written in C) to parse such options. And I could have kept that scheme, where I have something like:
    
    limits = { cpu = "10m" , core = 0 , data = "20m" }
    
    but by allowing Lua, when a process started consuming the CPU and RAM, it was easy to add limits (by finding a Lua module that supports process limits) to the configuration file rather than having to modify the application to extend the configuration file to support it. This gives me the ability to address some issues that, to me, are "out of scope" for the program in question.
    
    Am I aware of the issues a Turing complete configuration file can have? Yes. Can this be abused? Yes. Is this a bad idea? I don't think so. And even using Lua at $PREVIOUS_JOB this was never an issue, if only because devops there refused to work with any configuration file for any component any department wrote---it was the job of the devs to write and maintain the configuration files---they just put the configs into place and that was it.
    
    jak2k
    
    I agree with that. But which scripting language is the best for configuration? A common one would be good to share helpers and data between configs.
    
    Unfortunately, the choice of language depends on the type of application. A Scheme or Lisp interpreter might be fine for a UI, but too slow for network-reachable stuff. Iocaine uses roto, but it's kind of exotic. Making the config an executable with any interpreter (via shebang) that returns JSON would be an option too.
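
A rough sketch of the "executable config that returns JSON" idea, seen from the host application's side. The function name `load_config_from_command` and the sample keys are hypothetical. To keep the example self-contained, the "config program" here is an inline Python one-liner; with a real shebang script the argument list would just be the path to that file.

```python
import json
import subprocess
import sys


def load_config_from_command(argv: list[str]) -> dict:
    """Run a config program and parse whatever it prints on stdout as JSON."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    config = json.loads(result.stdout)
    if not isinstance(config, dict):
        raise ValueError("config program must print a JSON object")
    return config


# Stand-in for an executable config file. The config can compute values
# (2 * 4 here) in any language, as long as it prints JSON at the end.
demo_argv = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'workers': 2 * 4, 'listen': ':8080'}))",
]
config = load_config_from_command(demo_argv)
```

The host only ever sees the resulting data, so helpers can be shared between configs in whatever language the config author prefers.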
    
    orib
    
    Why have a scripting language when you can hardcode it and do a new deployment?
    
    natkr
    
    I think this is a big part of my confusion, to be honest. Why is recompiling treated as such a Big Scary Event™? (That's not to say that there aren't scary changes you can be deploying, but it shouldn't matter whether you're changing the given value in a config file or in a hard-coded constant.)
    
    kubanczyk
    
    Why is recompiling treated as such a Big Scary Event™?
    
    Legacy of C/C++, which were not easy to build. With modern FOSS tools (written in Rust, Golang, Zig) reading the source code and re-compiling it is not usually a problem for a professional programmer.
    
    muvlon
    
    Heck, with tools like Nix, we correctly re-build and re-deploy anything including big C and C++ programs all the time. You don't even have to know how that specific program's build system works as long as the Nix derivation that's driving it does.
    
    gnyeki
    
    Why recompile if you can edit the binary?
    
    jak2k
    
    Because I might have multiple deployments or the software is developed by a third party.
    
    kzdnk
    
    which scripting language is the best for configuration?
    
    The one you're already using.
    
    jak2k
    
    Which one would that be if my software is written in Rust?
    
    crmsnbleyd
    
    Rust
    
    ansible-rs
    
    Rust, as /u/crmsnbleyd says.
    
    Otherwise, maybe try a scripting language that tightly integrates with Rust, such as Rhai or Rune.
    
    As much as I like Lua, and indeed there are bindings and even full implementations for Rust, it doesn't quite "fit" in my view.
    
    Diti
    
    The world is growing tired of yaml.
    
    The world is growing tired of people complaining about YAML 1.1 (reminder that YAML 1.2 fixed most problems in 2009) and bad tooling. The language is fine.
    
    kubanczyk
    
    The quotes from YAML 1.2.2 (2021) shed some light on your claim:
    
    A sequence may contain the same node more than once. It could even contain itself.
    
    The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key/value pairs and a mapping could even contain itself as a key or a value.
    
    3.2.1.2. Tags
    
    YAML represents type information of native data structures with a simple identifier, called a tag. Global tags are URIs and hence globally unique across all applications. The “tag:” URI scheme is recommended for all global YAML tags. In contrast, local tags are specific to a single application. Local tags start with “!”, are not URIs and are not expected to be globally unique. YAML provides a “TAG” directive to make tag notation less verbose; it also offers easy migration from local to global tags. To ensure this, local tags are restricted to the URI character set and use URI character escaping.
    
    YAML tags are used to associate meta information with each node. In particular, each tag must specify the expected node kind (scalar, sequence or mapping). Scalar tags must also provide a mechanism for converting formatted content to a canonical form for supporting equality testing.
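
To make the first quote a bit more concrete: a sequence that contains itself is a graph, not a tree, which a tree-shaped format such as JSON cannot express. The sketch below is only an illustration of that difference using Python's standard library; it does not involve any YAML parser, and the variable names are made up.

```python
import json

# A sequence that contains itself, which the quoted YAML data model allows.
seq = ["a", "b"]
seq.append(seq)

try:
    json.dumps(seq)
    json_accepts_cycle = True
except ValueError:
    # The stdlib JSON encoder detects the circular reference and refuses.
    json_accepts_cycle = False
```

Any consumer of YAML therefore has to decide what to do with structures like this, even if no sane configuration would ever contain one.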
    
    mpweiher
    
    Real abstraction over data structures, not string templating.
    
    Exactly.
    
    Programming languages need to become more like configuration languages, and configuration languages need to become more like programming languages.
    
    The distinction between "configuring" and "programming", if it ever was real in the first place, which I doubt, is diminishing. After all, with pervasive software reuse a lot of programming is configuring and using existing software packages, and people discover, again and again, that configuration turns into programming real quick.
    
    And so we get a single language that can do both.
    
    Part of Objective-S is a syntax for complex literals that is loosely based on JSON, but includes the ability to specify objects, not just dictionaries, by adding a class name. Those object literals also can be templated, just like strings and identifiers can, so you have non-procedural and non-string abstraction facilities.
    
    Last but not least, you can also mix in procedural code into such object literals, again so you don't have to turn the whole thing into a procedure just because something in the middle needs computation.
    
    Here is a machine configuration you might have to configure some cloud instance:
    
        config ← #MachineConfig{
            os: 'ubuntu24',
            arch: 'aarch64',
            packages: #PackageSet{
                packageManager: 'apt',
                packages: [ 'clang', 'curl', 'gnustep' ]
            },
            files: [
                #FileConfig{
                    path: '/etc/hosts',
                    owner: 'root',
                    mode: 0o400,
                    content: 'localhost 127.0.0.1'
                },
            ]
        }.
    
    Here is some sample data for a Task:
    
        [
            #Task{ title: 'Clean Room', done: false },
            #Task{ title: 'Check Twitter', done: true }
        ].
    
    Or a more tabular approach:
    
        [
            [ 'id', 'First', 'Last' ],
            [ 22, 'John', 'Doe' ],
            [ 23, 'Peter', 'Pan' ],
        ] tableDict.
    
    UI definition (temperature converter):
    
        NSGridView gridViewWithSize: 150@70 views: [
            [ #NumberField{ stringValue: '0' }, #Label{ stringValue: 'º Celsius' } ],
            [ #NumberField{ stringValue: '32' }, #Label{ stringValue: 'º Fahrenheit' } ],
        ].
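
For readers who don't know Objective-S, here is a rough Python analogue of the machine configuration above, built from dataclasses. It is not a translation of Objective-S semantics, just a sketch of the same three ideas: typed object literals, a reusable "template" (the hypothetical helper `root_file`), and a bit of computation mixed into an otherwise declarative literal. The class and field names mirror the example; everything else is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileConfig:
    path: str
    owner: str
    mode: int
    content: str


@dataclass(frozen=True)
class PackageSet:
    package_manager: str
    packages: tuple[str, ...]


@dataclass(frozen=True)
class MachineConfig:
    os: str
    arch: str
    packages: PackageSet
    files: tuple[FileConfig, ...] = ()


def root_file(path: str, content: str) -> FileConfig:
    """A 'template': fills in the fields shared by all root-owned files."""
    return FileConfig(path=path, owner="root", mode=0o400, content=content)


config = MachineConfig(
    os="ubuntu24",
    arch="aarch64",
    packages=PackageSet(
        package_manager="apt",
        # Computation in the middle of the literal: dedupe and sort.
        packages=tuple(sorted({"clang", "curl", "gnustep", "curl"})),
    ),
    files=(root_file("/etc/hosts", "localhost 127.0.0.1"),),
)
```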
    
    e3bc54b2
    
    Programming languages need to become more like configuration languages, and configuration languages need to become more like programming languages.
    
    ...and we've just reinvented Lisp. Snark aside, Lua is also a great candidate for the same throne considering its roots as a configuration language.
    
    The distinction between "configuring" and "programming", if it ever was real in the first place, which I doubt, is diminishing.
    
    But somehow people keep trying to invent new configuration languages and then try to patch in Turing completeness, either by attaching limbs Frankenstein-style, or opening an escape hatch to shells. Eventually it ends up being the worst of both worlds.
    
    For some reason, starting with a full-blown language is scary for people. Security is often given as a reason, but a well-sandboxed env where I/O is disabled is a good place to start.
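
As an illustration of the "environment with I/O disabled" idea, here is a sketch in Python that evaluates config source with a small allowlist of builtins. Big caveat: in CPython this is not a real security boundary (restricted `exec` has well-known escapes), so treat it as a demonstration of the shape of the idea only. Embeddable languages such as Lua make this more practical because the host decides which libraries to load. The names `evaluate_config` and `is_rejected` are made up.

```python
def evaluate_config(source: str) -> dict:
    """Run config source with a small builtin allowlist and no import machinery."""
    allowed_builtins = {"range": range, "len": len, "dict": dict, "list": list, "str": str}
    namespace = {"__builtins__": allowed_builtins}
    exec(source, namespace)
    # Return everything the config defined, minus the builtins entry we injected.
    return {k: v for k, v in namespace.items() if k != "__builtins__"}


def is_rejected(source: str) -> bool:
    """True if the config tried to use something outside the allowlist."""
    try:
        evaluate_config(source)
    except (NameError, ImportError):
        return True
    return False


# Ordinary computation still works inside the restricted environment.
settings = evaluate_config("ports = [8000 + i for i in range(3)]\nname = 'web'")
```

With this setup, a config that calls `open(...)` or contains an `import` statement fails instead of silently doing I/O.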
    
    tazjin
    
    This problem will never change. The correct solution (using a config language which doesn't allow you to randomly import net/http and do ~whatever in your config) is not going to take off, because approximately nobody actually manages to think about configuration in terms of the data structure it represents. The exception might be at places like Google, where inherently all config is described by data structures.
    
    What you actually get in the wild is the obvious thing, which makes everything even worse: Jinja-templating of YAML files. I'm convinced that this is a tarpit we will never get out of.
    
    dmcgrath
    
    I agree with this sort of thing in principle, but so often I find myself frustrated when the practicalities of a simple static configuration file aren't available. Like in the example, the name of the resource ("alpha-hourly") being configured is no longer present and I know I'm going to grep for it at some point. For me, I think it comes down to a belief that quality of life while producing configuration favors a more powerful language, but quality of life while debugging or understanding configuration favors a static file.
    
    smlckz
    
    I've been ruminating over this for some time. As I was wondering a while ago, I'd like to continue the discussion here: what are some possible foundational schemas fit for configuration languages? It shouldn't be more complex than usual database schemas, should it?
    
    ldb
    
    I personally wish for a configuration language that is:
    
    free from side-effects. As discussed above, you shouldn't need to perform I/O to manifest the final configuration;
    
    compatible with JSON, YAML and other target languages. It does not need to be interoperable with them, but it should be possible to manifest the configuration into such a language;
    
    feature a strong type system (this looks very interesting in rcl) to clearly distinguish data from structure and offer runtime type-validation;
    
    be intuitive to read for newcomers. This is my gripe with Cue for example. It's not clear what is happening just by looking at it.
    
    For context: I work with jsonnet a lot to generate Kubernetes (compatible) manifests. Helm really is no help here at all (just YAML + Gotmpl) but a stronger type system would be amazing and enable all kinds of other optimizations.
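
A small sketch of two items from the wish list, runtime type validation and manifesting into JSON, using Python dataclasses. The `Container` schema is invented for illustration and is not a Kubernetes resource. Note that dataclasses do not enforce annotations on their own, so the checks are written out by hand in `__post_init__`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Container:
    name: str
    image: str
    replicas: int = 1

    def __post_init__(self) -> None:
        # Runtime validation; bool is excluded because it is a subclass of int.
        if not isinstance(self.replicas, int) or isinstance(self.replicas, bool):
            raise TypeError("replicas must be an int")
        if self.replicas < 1:
            raise ValueError("replicas must be at least 1")


def manifest(containers: list[Container]) -> str:
    """Render validated objects to plain JSON for whatever tool consumes it."""
    return json.dumps([asdict(c) for c in containers], indent=2, sort_keys=True)


rendered = manifest([Container("api", "example/api:1.0", replicas=2)])
```

A mistake such as `replicas="3"` then fails when the config is evaluated, before anything is written out.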
    
    spc476
    
    free from side-effects
    
    But a configuration file is nothing but side-effects---it affects the program being configured </snark>. Seriously though:
    
    As discussed above, you shouldn't need to perform I/O to manifest the final configuration
    
    I use Apache as my webserver, and the configuration file can include other configuration files. One aspect is that this allows customizations to be separate from any files that might change with updates. I like it because it allows me to isolate the configuration for each site I host from each other. And technically, that is a form of I/O to manifest the final configuration.
    
    compatible with JSON, YAML and other target languages. It does not need to be interoperable with them, but it should be possible to manifest the configuration into such a language
    
    Why? What does that buy you? Pulling out a bit, perhaps the ability of a program to read its configuration file in whatever format you want? Perhaps via a module system?
    
    be intuitive to read for newcomers
    
    There's a tension here---newcomers generally like explicit (spell everything out) while experts like implicit (less noise to slog through). Do you keep all the comments that come with a distro config file (helpful for newcomers)? Or do you strip them to make it easier to scan (helpful for experts)? I might also contend that there is no such thing as "intuitive ... for newcomers."
    
    pie_flavor
    
    The footnote about Python I would use as my starting point. I don't see the point of RCL or other funky config languages when real languages already do everything you need. In JS land they typically use JS for config rather than JSON which has always struck me as extremely sensible: all the same constructs work, but once you need a for loop, you are not tempted to introduce some kind of {{ }} nonsense structuring because there is a for loop construct right there already.
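
A sketch of the "the for loop is right there" point, in Python rather than JS. The environment names, periods, and field names are all hypothetical. The repeated entries come from an ordinary loop over data, and the result is rendered once to JSON.

```python
import json

ENVIRONMENTS = ["alpha", "beta"]
PERIODS = {"hourly": 3600, "daily": 86400}

backups = []
for env in ENVIRONMENTS:
    for period, seconds in PERIODS.items():
        backups.append({
            "name": f"{env}-{period}",
            "interval_seconds": seconds,
            # A conditional is an ordinary expression, not template syntax.
            "retain": 24 if period == "hourly" else 30,
        })

# Rendering gives a static document in which every generated name appears literally.
print(json.dumps(backups, indent=2))
```

Keeping the rendered output around also helps when debugging, since a generated name can still be found with grep in the static file.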
    
    rplacy
    
    good overview of configuration format alternatives