Formally speaking, "Transpiler" is a useless word
38 points by notypes
This keeps coming up, and I don’t really get it tbh.
Compilers take a program and translate it to a lower level representation.
Decompilers take a lower level one and translate it to higher level. Nobody seems to complain that decompilers are just compilers and the term is meaningless.
Transpilers translate from roughly the same lowlevelness to another. Not something that can be defined precisely, but not meaningless either. If going the other way wouldn’t be called decompiling, some people feel weird calling it compiling.
If you accept the notion of higher and lower level languages and representations as a well posed idea, then it makes sense. It’s not a well posed idea, though, and in its absence there is no definition of transpiler vs compiler.
Sure, but that seems like the bigger idea. Pitch that, and the notions of transpilers and decompilers will disappear naturally.
FWIW though, it seems clear to me there’s a rough order, with Asm being lower than C being lower than Python.
Well, maybe it seems clear to you. It seems clear to me, noted here, that e.g. Prolog is lower-level than C (since Prolog directly addresses a semantic computational model whereas C operates on an abstract machine), that both BASIC and COBOL are lower-level than C (since line numbers are concrete in the former and abstracted in the latter), that Python is lower-level than SQL (since SQL is declarative but Python is not), etc.
While there is some tension wrt the level of fidelity that the translation preserves, in practice it's a pretty concrete problem. Haxe, CoffeeScript, Typescript: they all put out legible code.
Haxe has the trickiest job, as it supports so many languages and I have quibbles over how things are translated. But for the most part, if a human can take the output and maintain it then "transpiler" is a more apt term than "compiler".
I don’t know if you’ve ever been in the unfortunate position of having to try to read compiled TypeScript code, but describing it as something a human could maintain is kind of a stretch. If you restrict yourself to a subset of TypeScript’s features then a subset of the TypeScript compiler’s output is comprehensible, yes, but anything involving module boundaries is a nightmare.
I loved CoffeeScript back in the pre-ES6 days, and even back then the claim that a JS dev could just work with the compiled output was more of a hypothetical talking point than a reality. TS’s output is many times worse.
This is not my experience - are you using the TypeScript compiler to downlevel your code as well as compile it to JavaScript? Converting, say, an async function with await expressions in it to ES5 or lower will produce some pretty messy code, true, but (a) that probably isn't necessary any more, and (b) I wouldn't say that's the main work of the TypeScript compiler. If you disable the downleveling, and make sure you're targeting ES Modules, the output should be line-for-line identical to the source code with the type annotations stripped, with minor exceptions for a couple of older pieces of syntax.
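For example (a minimal sketch; exact output depends on your tsc version and options), with a modern target and ES module output, a file like

    // greet.ts (hypothetical input)
    export function greet(name: string): string {
      return `Hello, ${name}`;
    }

comes back out as the same lines with the annotations erased:

    // greet.js (what tsc emits when no downleveling is needed)
    export function greet(name) {
      return `Hello, ${name}`;
    }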
Even when doing the downleveling, though, I'd say the compiler still behaves in a way that is unique to transpilation rather than general compilation. The output is typically the most readable code that is both supported on the target system, and also semantically identical to the source code - it will continue to contain all the same comments and formatting as the original, and is typically as reasonable as it can possibly be. For async/await, that's still pretty complex, but in my experience it's normally still possible to follow everything.
This is what I mean when I say that a subset of features makes a subset of the output comprehensible. With what I think are pretty default settings, opening a compiled regular file in my current project greets me with wonders like this:
    var __copyProps = (to, from, except, desc) => {
      if (from && typeof from === "object" || typeof from === "function") {
        for (let key of __getOwnPropNames(from))
          if (!__hasOwnProp.call(to, key) && key !== except)
            __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
      }
      return to;
    };
with many similar lines before and after. Once I get to the actual logic of my code, yes, it is straightforward. I'm not saying that portions of the output can't be understood; rather that the output as a whole isn't something I'd want to maintain directly (especially if more advanced downleveling is in play).
If you wanted to take a TS codebase, strip the type annotations out, and maintain it as a JS codebase moving forward, the standard usage of the TS compiler would likely not be the best way to do so. A program which does nothing but type stripping (and which only accepts TS which solely uses erasable syntax) would be much simpler than the TS compiler is and would be easier to use for this use-case. Maybe such a thing would be reasonably described as a transpiler? But this feels like a degenerate case, because it's solely deleting substrings from the input files. It's vastly simpler than the logic that tsc is capable of performing.
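Here's a minimal sketch of the behaviour I mean (cheating, because it reuses tsc's own single-file emit via ts.transpileModule, which skips type checking entirely; a real standalone stripper would parse and delete the type syntax itself):

    import ts from "typescript";

    // Hypothetical "type stripper": single-file emit, no type checking,
    // and essentially no downleveling thanks to the ESNext target.
    export function stripTypes(source: string): string {
      return ts.transpileModule(source, {
        compilerOptions: {
          target: ts.ScriptTarget.ESNext,
          module: ts.ModuleKind.ESNext,
        },
      }).outputText;
    }

    // stripTypes("const n: number = 1;") returns roughly "const n = 1;"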
It's true that downlevelled tsc output is not as human-readable as one might like. However, I think the parent commenters are saying that, with a sufficiently modern target, there is little to no downlevelling needed and the output is quite legible.
Perhaps part of the disconnect here is that the default target for a newly initialized TypeScript project is downright ancient, something like ES5. This is an inappropriate choice for nearly all modern projects; the only reason the default is what it is AFAIK is backward compatibility. It's typical to override this option with something more modern, and in fact the default target is slated to be changed to es<current year> in the upcoming TS 6.0. This value of target should do basically no downlevelling.
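In practice the override is a couple of lines in tsconfig.json (values illustrative; pick whatever your runtimes actually support):

    {
      "compilerOptions": {
        "target": "es2022",
        "module": "esnext"
      }
    }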
(Let me also remark that if one uses the --importHelpers option, then tsc will not inline downlevelling helpers such as those in your comment, and instead import them from tslib. So even if downlevelling is in play the output can still be acceptably legible. Though I don't think this was the point of your comment, I thought it interesting to note.)
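(Concretely, with --importHelpers the downlevelled output starts with something like

    import { __awaiter, __generator } from "tslib";

and calls those, instead of redeclaring each helper at the top of every file. The helper names here are illustrative; the exact set depends on which features get downlevelled.)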
In any case, with these observations in mind, I think the original comment's assertion that modern usage of tsc produces legible code is fair.
So interestingly, I don't think tsc actually emits the function you're referencing - it emits a similar function, but with a different name. Searching online, I wonder if you're using esbuild to compile/transpile your TypeScript files? In which case, it seems like esbuild behaves a bit differently to tsc and emits all the helpers always, and later relies on tree-shaking/dead code elimination to get rid of the unused declarations. This feels more like a compiler than a transpiler, which makes sense to me because esbuild is designed to bundle files together into an optimised distributable artifact.
tsc does emit its own helpers, but:
And also - although I understand others might disagree here - I don't find the helpers that illegible. Obviously they're not as clear as the original syntax, but that's why the original syntax was introduced. But they're not that bad, and if there's only a few of them because I've not used many complex features or have a sufficiently high target, I find it perfectly fine to read them.
Why do you think a human can't take the assembly or CLR IR produced by a compiler and maintain it? That may not be familiar to many programmers today, but it wasn't uncommon.
And I'm sure a human could technically "maintain" assembly output from GCC but practically speaking transpilers output something closer to source code than assembly.
You could transpile from Python to Ruby or vice versa; there's no requirement that something be low level.
I agree that "transpiler" is a useful word, colloquially speaking. But formally? Formally speaking, it's a compiler.
By that definition decompilers are just compilers too. Which is a fine way to define things I just don’t like that transpilers get singled out.
Except that most decompilers don't give you working code? They give you something vaguely C-shaped, still full of unresolved junk.
Are you proposing that "decompiler" just means "not-fully-working transpiler"? That seems a stretch, especially if also saying transpiler == compiler. (So uh if a compiler has a bug it's actually a decompiler?)
I can say that decompilers are just compilers too, if it helps.
I enjoyed learning about tombstone diagrams in compilers class two decades ago.
Seems like compilers that work in reverse to most other compilers would make interesting pieces for building interesting tombstone diagrams :)
No they are not, because the process of decompilation usually isn't just running a compiler in reverse. A decompiler usually does something else. It's not like zip and unzip. If it were, you would be correct.
Sure, we could then replace the word compile with transpile, but neither word has an extremely strict definition, and transpile is used in a much broader way. E.g. there are transpilers that simply, well, translate some code, like you could do with a compiler, but the word is also used when that happens in the background, e.g. for compatibility, in a kind of "emulation" way. I'd argue that's why "transpile" is popular.
However, it feels a bit like arguing that there is "translate" and "back-translate", but no "trans-translate" covering both, or something.
It's worth noting that decompilers rarely invert compilation. Instead, they invert various patterns in a process known as idiom recognition. There is a closely related syntactic process called resugaring. A decompilation is a sort of study or analysis of a program rather than a semantics-preserving transformation.
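To sketch the idea in source-level terms (a hypothetical JS-to-JS example; real decompilers do the same kind of thing over machine code), idiom recognition spots a known compilation pattern and replaces it with a construct that would have produced it:

    const xs = [1, 2, 3];

    // The pattern, as ES5 downleveling might emit it:
    for (var i = 0; i < xs.length; i++) {
      var x = xs[i];
      console.log(x);
    }

    // What a resugaring pass would recover from it:
    for (const x of xs) {
      console.log(x);
    }

The rewrite is justified by the pattern, not by inverting any particular compiler.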
Exactly. Also compilers/"transpilers" must handle the entire input language; decompilers are partial: they don't have to produce source text for arbitrary object code.
Compilers take a program and translate it to a lower level representation.
I don't recall that ever being the definition. "Lower level" might not be trivial to define, and many compilers from the 70s and 80s converted to other formats that were not necessarily lower level, but rather targeted another consumer.
Decompilers took a program that had been compiled and turned it back into the language in which it was written. This is a significant difference: if you know how it was compiled, you don't need to cover all the cases of the compiled language. You can restrict yourself to the set of possible outputs of a compiler. AFAIK, that was the original concept of a decompiler.
A decompiler seems to me like a fundamentally different sort of thing than a compiler or “transpiler”. Every time I’ve seen decompilation discussed it’s been in the context of reverse engineering, where someone is trying to understand or modify a compiled binary. There’s a whole notion of comprehensibility and de-obfuscation that feels very much not-compiler-like, and not even opposite-of-compiler-like. I can see where the name comes from, because it approximates an undoing of compilation, but I don’t think it’s the same thing as “compiling but in reverse”.
There’s a whole notion of comprehensibility and de-obfuscation that feels very much not-compiler-like, and not even opposite-of-compiler-like.
On the other hand a decompiler does similar kinds of control flow reconstruction to Brenda Baker’s STRUCT program that translated Fortran (66 I think?) to Ratfor. The folklore says that its Ratfor output was often easier to understand than its Fortran input, even for the programmers who wrote the Fortran.
Yeah, it really is the myth that will not die. I bet that even the people who write these posts could point to a random compiler and guess with high accuracy whether most people would consider it a transpiler or not, even if they all have slightly different definitions of the word. (As I've commented before, for me the defining feature of a transpiler is a high degree of isomorphism between the source and the output.)
Decompilers take a lower level one and translate it to higher level. Transpilers translate from roughly the same lowlevelness to another.
I'd argue that some decompilers are transpilers then, eg. turning the binary representation of a program into asm, without going any higher or lower. At least as their first step. Often enough they might then just annotate a bit. Eg. they are disassemblers (which are transpilers?) that aid with actual decompilation, usually done - at least partly - by a human.
I'd also argue that decompilation is the process of turning something into something that a human can understand. I'd also argue that decompilation is a lot broader than what a compiler does. One could go even further and say that someone who uses eg Ghidra for decompilation is a decompiler.
Also there are a lot of high level languages whose compilers can/do target other high level languages. Eg. there are quite a few languages that can compile to C, while being about the same "level" as C.
The thing is that compilation and decompilation are not (and should not be) very strictly defined. Since transpiling code by your definition (which certainly isn't universal) simply adds the sometimes very subjective "lowlevelness" into the mix, making it even less precise, it doesn't add a lot of value. And at that point you could have transpilation mean both compilation and decompilation.

But given that decompilation is in itself something that is often done semi-manually, depending on what one does, it's already not a clear opposite of compilation. What I mean is that you don't, in most cases, run for example cc ... and then uncc ....

Yet another definition of transpiling code is automatically converting it, be it between high level languages or by some mechanism of automatically replacing instructions, a bit like emulation. Also, there are things like C to Go or Java/C# transpilers which by your definition would be decompilers, which rings very wrong for a decompiler. Am I decompiling code when I rewrite/clone something in a higher level language, if it leads to the same input/output?
English has a lot of "fuzzy" words that convey meaning without being precisely defined. Or even such fuzzy definitions that also vary depending on context. If you're writing an academic paper then certainly define your terms! For everyone else, I've not seen people actually being confused by the term. People know the idea it conveys even if they haven't thought about it formally.
Just because two words don't have a formal definition doesn't mean they do not convey meaning. When we learn to speak, we associate words with some vague meaning which we refine over time. I remember many occasions when I figured a word had a different meaning from what I had in mind.
My current understanding of the words in question hinges on the difference between source code and machine code.
With these we can then define compiler, decompiler, and transpiler: a compiler turns source code into machine code, a decompiler turns machine code into source code, and a transpiler turns source code into source code.
After reading the article, I still believe these are more useful definitions than the ones provided.
Interestingly, compilers often don't turn source code into machine code. They often turn source code into source code; e.g. C into x86_64 assembly language. The assembler, in a step after the compiler proper, turns that x86_64 assembly language source code into machine code.
If we follow your definitions strictly, the GNU Compiler Collection's cc1 binary (the "C compiler") is a transpiler, while its as binary (the "assembler") is a compiler.
I think any kind of definition requires a form of "level", where turning a language into a "significantly lower level language" is "compilation" whereas turning a language into a "similar level language" is "transpilation". This way, translating C into assembly is "compilation". Turning assembly into machine code is arguably "transpilation", since x86_64 assembly language and x86_64 machine code are just different encodings of the same semantics.
If we follow your definitions strictly, the GNU Compiler Collection's cc1 binary (the "C compiler") is a transpiler, while its as binary (the "assembler") is a compiler.
To add my unproductive nitpick to the unproductive nitpick thread in the unproductive nitpick post: that's an implementation detail. If I call gcc -o bin main.c and internally it transpiles/compiles/converts/mutates to JavaScript before going to machine code, I don't really care.
I input C source code and I get a binary in return. From the point of view of the black box compiler abstraction, it is a compiler regardless of what happens below it. From the point of view of the black box abstraction of the TypeScript compiler, I don't care if it first goes down to assembly and then decompiles into JavaScript. I see TS -> JS.
All of this feels like just splitting hairs for the sake of winning an argument to me. But what do I know, maybe there is a good reason to be this nitpicky.
In my world, the gcc binary is just a compiler driver. It orchestrates a preprocessor, a compiler, an assembler, and a linker. And in fact, I use every one of those components by themselves (except for the compiler itself) regularly for other tasks; GCC's cpp is a competent standalone macro language, GNU as is a useful assembler, and GNU ld a useful linker.
Whenever I write my own compiler, I tend to just write the "source code -> assembly" part, plus a compiler driver which orchestrates my compiler, GNU as and GNU ld. I consider those compilers to be proper compilers, not just a component in a compiler.
Everyone I know in the compiler developer community, from professors who have taught compiler courses I have attended in university to people who just make compilers for fun, consider a program which translates source text to textual assembly code a "compiler".
I didn't even realise we were having an argument, I'm just sharing my perspective as someone who dabbles in compiler development.
EDIT, just to be perfectly clear: I do not think it is necessary for something to produce textual output (such as assembly code) in order for it to be considered a compiler. You could absolutely make a C compiler which takes C code as its input and produces machine code as its output. I would consider such a program to be a compiler. My position is merely that a tool does not have to produce machine code as its output in order to be considered a compiler. I consider tools which take source code as input and produce assembly code as output, such as cc1, to also be proper compilers.
Machine code is optimized for execution, not for humans.
Note that it says optimized for execution, which is not about readability (text vs binary). According to that, in the context of a C compiler, I would classify assembly to be machine code.
I'm not saying this is a perfect definition without edge cases. All I'm saying is that I haven't seen a better one. Give me a better one and I'll update my dictionary.
So your definition depends on "machine code" being understood to include textual assembly language? I guess that works, but it's not any definition of "machine code" I have ever heard....
Note that it says optimized for execution, which is not about readability (text vs binary). According to that, in the context of a C compiler, I would classify assembly to be machine code.
Not the author here, but I think you might have misunderstood that. "Optimized for execution" could mean "a computer reads zeros and ones, not ASCII", not e.g. the output of an optimizing compiler. Which means it could then be about readability after all.
You missed one term: assembler. Is there a significant difference between an assembler and a compiler? On the one hand, no---they both take an input language that is human readable and output a format that is machine readable. On the other hand, assembly is pretty much a one-to-one mapping of human-readable tokens to machine-readable tokens (LDA #4;STA I maps to the binary sequence 10000110 00000100 10010111 10101010 on the 6809, assuming the variable I lives at address 170) whereas compilers work on a higher level abstraction of token sequences (i = 4).
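To make the one-to-one point concrete, here is a toy sketch (a hypothetical two-instruction 6809 subset) of what an assembler essentially does:

    // Variable addresses, as the assembler's symbol table would record them.
    const symbols: Record<string, number> = { I: 170 };

    // Assembling is close to a table lookup per instruction,
    // not a restructuring of the program.
    function assemble(line: string): number[] {
      const [op, arg] = line.trim().split(/\s+/);
      if (op === "LDA" && arg.startsWith("#")) return [0x86, Number(arg.slice(1))]; // immediate
      if (op === "STA") return [0x97, symbols[arg]]; // direct
      throw new Error(`unknown instruction: ${line}`);
    }

    // assemble("LDA #4") => [0x86, 4] and assemble("STA I") => [0x97, 170],
    // i.e. exactly the byte sequence above.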
I personally don't have an issue when someone calls nasm, an x86 assembler, a "compiler." They might not be technically accurate, but I know what they're talking about.
The first time I heard the word "transpiler" was in the JS ecosystem, especially with tools like "Babel" which would translate future features of EcmaScript into ES5, so that we would not have to wait for browser support.
Then it was used for things like Typescript, etc... which would translate a completely different language into what was considered a "high level language". As if there were some dichotomy between "low level" and "high level" (EDIT: I meant, in my opinion it's more of a spectrum). But this definition was always a weak one in my opinion. To some people, C is a low level language (because you care about things that are orthogonal to what you are trying to do, like manual memory management); to others it is a high level one (because it abstracts an architecture, the PDP-11, that does not exist anymore, the generated assembly can be very complex, and with CPU branch prediction you are not even sure how that assembly is executed). This would mean that a compiler that targets C would be a compiler to some, and a transpiler to others.
My personal opinion on the subject is that the word "transpiler" adds no information over the word "compiler", and is therefore useless.
The first time I heard the word "transpiler" was in the JS ecosystem
The article guesses that is where “transpiler” comes from, but I’m pretty sure it’s decades older than that and I was reading flamewars about the definition of the word on usenet 30 years ago… but my memory could be deceiving me since the word doesn’t turn up in dl.acm.org until much more recently.
There are actually very few hits, but that might be because the search engine is lacking in quality; the about page mentions something about that. Sadly, Google Groups is more useless than ever; if someone knows how to sort by date ascending, it would be appreciated!
Thanks!
(I also feel salty about the state of Google Groups, but I deleted my complaints from my previous comment…)
Typescript, etc... which would translate a completely different language into what was considered a "high level language".
I would accept "a different language", in that it has constructs that are not just syntactic sugar, but it's not a completely different language. Typescript is intentionally a superset of Javascript, and intentionally lacks any constructs which cannot reasonably be lowered to Javascript, because there are effectively no native Typescript runtimes (I assume there must be one, but I don't know about it).
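Enums are a good illustration, being one of the few TypeScript constructs with a real runtime lowering rather than plain type erasure (output approximates what tsc emits; details vary by version):

    // TypeScript source:
    enum Color { Red, Green }

    // Roughly what tsc lowers it to:
    var Color;
    (function (Color) {
      Color[Color["Red"] = 0] = "Red";
      Color[Color["Green"] = 1] = "Green";
    })(Color || (Color = {}));

Even then, the lowering is deliberately something a JavaScript programmer could have written by hand.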
Almost no other pairs of languages meet that bar (I'm sure there are examples, but I can't currently think of them--maybe C++ started out that way, as C with classes). Something like Kotlin doesn't. Even though its fundamental featureset targets the JVM, limiting what features it has, it intentionally gives up the goal of a Java compatible syntax.
The first time I heard the word "transpiler" was in the context of converting machine code from one architecture to another, like converting PowerPC machine code to x86 machine code, back in the late 90s/early 2000s.
To some people, C is a low level language
There's no such thing as a low level language. There's only whether one language is lower level than another.
This would mean that a compiler that targets C would be a compiler to some, and a transpiler to others.
It depends on both the source and target language. A compiler goes from higher level to lower level; a transpiler goes from one language to another that's at roughly the same level.
There's no such thing as a low level language. There's only whether one language is lower level than another.
Which is what I said:
in my opinion it's more of a spectrum
The problem with:
a transpiler goes from one language to another that's at roughly the same level.
is that "roughly" is too vague for a formal definition (which is what the article is about). That would also mean that macro assemblers are transpilers, and nobody in their right mind would call them transpilers.
In the end, "A $X to $Y transpiler" gives no more information than "A $X to $Y compiler", as said information is already conveyed by "to $Y". And I'd say that both "A $X transpiler" and "A $X compiler" gives only partial information, which is not really useful (in my opinion).
Which is what I said
Sorry, I should have been more clear. It sounded to me like you were saying "the concept of transpilers doesn't make sense because people can't agree on whether C is a low or high level language". I was pointing out that that argument doesn't work, because the concept of transpilers only relies on whether one language is higher level than another; it doesn't care if you can define whether a language is low or high level.
That would also mean that macro assemblers are transpilers, and nobody in their right mind would call them transpilers.
You would call it a transpiler if "macro assembly" is a distinct language from "assembly", and neither higher nor lower level than it. I've never used macro assembler, but having looked up what it means I would disagree on both of those characterizations so this isn't seeming like much of a counterexample to me.
In the end, "A $X to $Y transpiler" gives no more information than "A $X to $Y compiler", as said information is already conveyed by "to $Y".
Ah, that's a good point!
The same argument works against the word "decompiler", yet I still feel intuitively like "compiler" vs. "decompiler" is a useful distinction. They do pretty different things. Would you also do away with the word "decompiler"? If not, why is "decompiler" useful in a way that "transpiler" isn't? Neither word gives additional information beyond "compiler" if you specify both the source and target language.
At its core, the word "compilation" also means "multiple inputs => one output". A musical compilation, a video compilation, etc... all gather multiple sources into one final result. For compilers, it takes N source files to produce 1 binary (usually).
I'd say "decompiler" is the inverse operation. Now in math, an inverse function/matrix/whatever is still a function/matrix/whatever, so I agree that a "decompiler" is a form of "compiler".
I'd say the accurate encompassing term would be "translator".
I am seeing left and right adjoints when I think of transpilers. I guess if you go back and forth between translations you find a limit of adjunctions.
One really insightful point in the article I didn't consider is that most "transpilers" don't just perform syntax translation; they also need to consider semantics.
The boundary of what a compiler, transpiler, and decompiler can do is pretty blurry, but using those terms still makes human sense for communication. A lot of terms in computing don't have a precise definition, but we use them anyway when they help other people understand. I feel like some comments and the article are talking past each other because they have different focuses on "formalness."
Also, to put a math analogy here: a lot of concepts in math didn't have a "formal" definition until centuries later, and even when they have a "formal" definition, usually people use the informal one most of the time. For example, everyone uses real numbers, but one probably only learns their formal definition in a college-level real analysis course. Is it fine to disregard the precise formal definition in most scenarios? Yes. Is having a formal definition useful? Also yes.
Formalism is not older than the 1800s. I'd personally argue that it wasn't central to mathematics prior to Hilbert's programme in the early 1900s; for contrast, Brouwer's intuitionism also arose around that time. Today, we generally hope that new mathematical theories have both a formal definition from axioms and an intuitive analogy for imagining its models. In the current context, that means that we hope that the distinction between compilers and transpilers has both a formal justification and an intuitive justification; upthread, many folks have indicated that the intuitive justification doesn't make sense in light of various examples and lived experiences.