Lobsters: V6 Shell

cblake

The C source for this Thompson v6 shell is quite archaic, using =+ instead of += and doing a very funky syntax tree representation. Understandable since C itself was evolving (being evolved by the same person, actually), but has anyone ported it to something more modern like ANSI C 89?

nortti

Tsh by the same author would appear to be that
- cblake
  
  I agree, though that link you have (same guy/website as TFA, actually) only links to enhanced tsh / etsh or “osh” sources that I can see. The very oldest release from 2003, osh-030730.tar.gz, is starting to look closer to the structure of the v6 source version, but still has a lot more distance than I was thinking a “port” would have. Anyway, THANK YOU!
andyc

xv6 has a shell in modern C:

xv6 is a re-implementation of Dennis Ritchie’s and Ken Thompson’s Unix Version 6 (v6). xv6 loosely follows the structure and style of v6, but is implemented for a modern x86-based multiprocessor using ANSI C.

https://github.com/mit-pdos/xv6-public/blob/master/sh.c#L100

I like to look at how they implement pipelines, which is:
```
  case PIPE:
    pcmd = (struct pipecmd*)cmd;
    if(pipe(p) < 0)
      panic("pipe");
    if(fork1() == 0){
      close(1);
      dup(p[1]);
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->left);
    }
    if(fork1() == 0){
      close(0);
      dup(p[0]);
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->right);
    }
```
And then let’s look at the Thompson shell (https://v6sh.org/src/sh.c):
```
	case TFIL:
		f = t[DFLG];
		pipe(pv);
		t1 = t[DLEF];
		t1[DFLG] =| FPOU | (f&(FPIN|FINT|FPRS));
		execute(t1, pf1, pv);
		t1 = t[DRIT];
		t1[DFLG] =| FPIN | (f&(FPOU|FINT|FAND|FPRS));
		execute(t1, pv, pf2);
		return;
```
So they both do 2 recursive invocations – to runcmd() or execute(), on the left and right node of the AST.
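
To make the flag juggling in the TFIL case concrete, here is a rough modern-C rendering of the two `=|` lines (today spelled `|=`). The helper names `left_flags`, `right_flags`, and `check_pipe_flags` are made up for illustration, and the numeric flag values are assumptions rather than something quoted in this thread; only the masks are taken from the excerpt.

```c
#include <assert.h>

/* Flag names as in the v6 sh.c excerpt; the numeric values are ASSUMED. */
enum { FAND = 1, FCAT = 2, FPIN = 4, FPOU = 8, FPAR = 16, FINT = 32, FPRS = 64 };

/* Left side of a pipe: gains FPOU (writes to the pipe) and inherits
 * FPIN, FINT and FPRS from the parent node's flags. */
int left_flags(int child, int parent)
{
    child |= FPOU | (parent & (FPIN | FINT | FPRS));        /* v6: =| */
    return child;
}

/* Right side of a pipe: gains FPIN (reads from the pipe) and inherits
 * FPOU, FINT, FAND and FPRS from the parent node's flags. */
int right_flags(int child, int parent)
{
    child |= FPIN | (parent & (FPOU | FINT | FAND | FPRS));
    return child;
}

/* Hypothetical self-check: if the parent pipe node carries FAND,
 * only the right-hand side inherits it. */
void check_pipe_flags(void)
{
    assert(left_flags(0, FAND) == FPOU);
    assert(right_flags(0, FAND) == (FPIN | FAND));
}
```

The asymmetry in the masks is the interesting part: FAND appears only in the right-hand mask, so under these assumptions it is passed down the right side of the tree only.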

Oh weird, but actually now I notice a difference … xv6 is less efficient?
- It has a pipe() and 2 fork() calls for every PIPE node
- original Thompson shell has only a pipe() call, and two execute() calls, which may or may not fork ???
So basically in xv6:
```
cat | cat | cat | cat
```
is executed like:
```
cat | (cat | (cat | cat))
```
That is, you would expect just 4 cat processes, but there are at least 2 more: each nested PIPE node runs in its own forked copy of the shell.

But not Thompson shell? I’m curious if I’m reading that right …

BTW https://oils.pub/ stores all the pipeline components in a list, so it doesn’t have this left/right recursion. I’m pretty sure most shells, like bash and dash, are like that – you can have a variable number of children in an AST node, not just 2
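
As a sketch of the list-based approach (not code taken from Oils, bash, or dash), a pipeline held as an array of argv vectors can be started with one fork() per command and one pipe() per `|`, with no intermediate shell processes. `run_pipeline` is a hypothetical helper, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmds[0] | cmds[1] | ... | cmds[n-1], where each cmds[i] is a
 * NULL-terminated argv. Returns the number of processes forked
 * (n on success) or -1 if pipe() or fork() fails. */
int run_pipeline(char *const *cmds[], int n)
{
    int prev_read = -1;   /* read end of the pipe feeding the next stage */
    int forked = 0;

    assert(cmds != NULL && n > 0);

    for (int i = 0; i < n; i++) {
        int p[2] = { -1, -1 };
        if (i < n - 1 && pipe(p) < 0)   /* the last stage needs no pipe */
            return -1;

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                 /* child: wire up stdin/stdout, exec */
            if (prev_read != -1) {
                dup2(prev_read, 0);
                close(prev_read);
            }
            if (p[1] != -1) {
                dup2(p[1], 1);
                close(p[0]);
                close(p[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            _exit(127);                 /* exec failed */
        }

        forked++;                       /* parent: drop the ends it no longer needs */
        if (prev_read != -1)
            close(prev_read);
        if (p[1] != -1)
            close(p[1]);
        prev_read = p[0];
    }

    while (wait(NULL) > 0)              /* reap every stage */
        ;
    return forked;
}
```

Called with four argv entries, this forks exactly four children, whereas the recursive left/right scheme in the xv6 excerpt forks an extra shell process for each nested PIPE node.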
- cblake
  The xv6 shell also has a different syntax - no backslash escaping or any quoting and no $vars and probably other differences. It is also very different in structure and function, as you observe. It is more a re-implementation of an “even more simple” shell-like thing than even a “loose following” of the v6 shell. I’m unsure how many similar statements might apply to other aspects of the xv6 project.
  
  EDIT: btw, yes, you are reading it correctly. I patched & compiled the xv6 sh.c to run on Linux, ran it as ./a.out, and pasted in cat|cat|cat|cat. With the programs waiting on the terminal, the ps (well, really my pd) output looks like this:
  
       750  12M 4.6M 1.8M 196m 35j p1 -zsh
      7996 1.4M 4058   10 205j  0j p1 ./a.out
      7997 668K    0    0 162j  0j p1 ./a.out
      7998 1.3M 4036    0 162j  0j p1 cat
      7999 406K    0    0 162j  0j p1 ./a.out
      8000 1.3M 4036    0 162j  0j p1 cat
      8001 406K    0    0 162j  0j p1 ./a.out
      8002 1.3M 4036    0 162j  0j p1 cat
      8003 1.3M 4036    0 162j  0j p1 cat
  - andyc
    
    Oh cool, thanks for checking! I remember noticing that quirk with xv6, and yeah it’s good to know it wasn’t present in the original Thompson shell
  - Melkor333
    
    I think we’re splitting hairs now but AFAIU vars and backslash only came with the PWB shell?
    
    cblake
    
    If you read the OG v6 shell source in TFA, you can see these things handled (at least $0 $1 and $$). { Might well all be just academic/historical/hair splitting, but that’s sort of the whole topic, too. :-) }
  - alurm
    
    Can anyone explain https://v6sh.org/src/exit.c? What does seek do here?
    
    fanf
    
    The v6 shell shares the file descriptor of the script (0, aka stdin) with if, goto, and exit, which implement control flow by seeking to the appropriate position in the script. There’s no buffering, so seeking the file position in a child process directly affects the parent shell process. The exit program seeks to EOF, so when the parent shell next tries to read, it sees that it is at the end of the script and stops. The arguments 0, 0, 2 correspond to STDIN_FILENO, 0, SEEK_END (the middle 0 is the offset from the end).
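
    A minimal sketch of that shared-offset effect on a modern POSIX system, using a temporary file in place of the script on stdin. The function name `bytes_left_for_parent`, the temp-file name, and the three-line script are all made up for illustration; the only part taken from the explanation is that the child seeks the shared descriptor to the end.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Returns how many bytes the "shell" can still read from its script
     * after a forked child has run. If child_seeks is nonzero, the child
     * seeks the shared descriptor to end-of-file, as v6 exit does.
     * Returns -1 on setup failure. */
    ssize_t bytes_left_for_parent(int child_seeks)
    {
        static const char script[] = "echo one\nexit\necho never reached\n";
        char path[] = "/tmp/v6exit-demo-XXXXXX";   /* hypothetical temp name */
        char buf[64];

        assert(sizeof buf > sizeof script);         /* one read() can take the rest */

        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path);                               /* file lives until fd closes */

        /* Write the script, then pretend the shell has already read the
         * first two lines ("echo one\nexit\n" is 14 bytes). */
        if (write(fd, script, sizeof script - 1) != (ssize_t)(sizeof script - 1) ||
            lseek(fd, 14, SEEK_SET) < 0) {
            close(fd);
            return -1;
        }

        pid_t pid = fork();
        if (pid < 0) {
            close(fd);
            return -1;
        }
        if (pid == 0) {                             /* child shares the file offset */
            if (child_seeks)
                lseek(fd, 0, SEEK_END);             /* v6 spelling: seek(0, 0, 2) */
            _exit(0);
        }
        waitpid(pid, NULL, 0);

        ssize_t n = read(fd, buf, sizeof buf);      /* 0 means EOF: script is over */
        close(fd);
        return n;
    }
    ```

    With `child_seeks` set, the parent's next read returns 0, so a shell reading its script this way would stop; without it, the parent still sees the remaining 19 bytes of the last line.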
    
    alurm
    
    Clever!
    
    laktak
    
    I think the title is a bit misleading ;-)
    
    Melkor333
    
    Removed the “history” bit. Hope that helps :)
    
    fanf
    
    The “memory management” link is not to be missed! A great collection of Usenet messages from well-known Unix people discussing how the 7th Edition Bourne shell allocated memory by calling sbrk() in its SIGSEGV handler, after it had already started using the yet-to-be-allocated space. This caused some severe headaches when Unix was ported to architectures which did not make it easy to restart instructions that trapped.