13 Aug 2023

Behind 'Hello World' on Linux notes

I wrote these notes while reading Julia Evans’ (excellent) Behind “Hello World” on Linux post. I learned a lot from it. If you haven’t read it yet, I highly recommend you do!

I am on Ubuntu Linux 22.04 LTS and there is a file called hello.py with this content:

print("Hello world!")

The post is basically “What happens when you run python hello.py in the terminal on Linux?”, from the point you run the command to the point you see the output “Hello world!”.

The first thing is to understand the relationship between a “shell”, a “terminal” and the rest. Whenever I open up a “terminal” app such as GNOME Terminal (the default terminal that comes with the OS), Alacritty, Kitty or WezTerm (my current terminal), it starts a “shell” program.

You might have heard of bash, zsh or even fish. They are all shell programs.

User x Terminal x Shell x OS

This is roughly the relationship between a “terminal”, a “shell” and the rest.

On a system where multiple shells are installed (e.g. I have bash and zsh), how does the terminal know which shell to open? The answer is the $SHELL environment variable.

~ echo $SHELL
/bin/zsh

For the WezTerm terminal that I am using, there is documentation about this under Launching Programs.

By default, when opening new tabs or windows, your shell will be spawned. Your shell is determined by the following rules: (On Posix Systems) The value of the $SHELL environment variable is used if it is set. Otherwise, it will resolve your current uid and try to look up your shell from the password database.
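To make those two rules concrete, here is a minimal Python sketch of the same lookup order. The function name resolve_shell and its parameters are my own, for illustration only. This is not how WezTerm implements it, just the rule as the docs describe it.

```python
import os
import pwd


def resolve_shell(environ=None, uid=None):
    """Pick a shell using the two rules quoted above (illustrative only)."""
    environ = os.environ if environ is None else environ

    # Rule 1: use $SHELL if it is set (and not empty).
    shell = environ.get("SHELL")
    if shell:
        return shell

    # Rule 2: resolve the current uid and read its login shell
    # from the password database (/etc/passwd on most systems).
    uid = os.getuid() if uid is None else uid
    return pwd.getpwuid(uid).pw_shell
```

Calling resolve_shell() with no arguments uses the real environment and the current user, so on my machine I would expect it to agree with echo $SHELL.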

Next, the shell parses the command python hello.py. As Julia wrote, the shell figures out the full “path” of the command “python” by using the $PATH environment variable.

For zsh (my current shell), and with my limited knowledge of C and systems programming, this is what I found about how it figures out the path of the program.

findcmd function from Src/exec.c file.

It takes “python” or “ls” or “grep” and searches through different places to try to find where that command is located. It first looks in a hash table that maps a command name to its full path, and uses that entry if it exists.

Otherwise, it goes through the directories listed in the $PATH variable. For each one, it appends the command name to the directory path and checks whether that full path exists and is executable with the

iscom() function from the same file.

It uses the access system call to check whether the calling process can access the file at that pathname. It also handles some special cases like relative paths starting with "." or "../".
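The lookup described above can be sketched in a few lines of Python. This is my own simplified model, not a translation of zsh’s findcmd or iscom: the names find_command and is_command, and the plain dict standing in for the hash table, are assumptions for illustration.

```python
import os


def is_command(path):
    """Rough analogue of iscom(): a regular file the caller may execute."""
    # os.access() wraps the access system call; X_OK asks "may I execute this?"
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_command(name, path_var, cache):
    """Return the full path for `name`, or None if nothing matches."""
    # A name that contains a slash (./script, ../bin/tool) is used as given.
    if "/" in name:
        return name if is_command(name) else None

    # First try the cache of command name -> full path.
    if name in cache:
        return cache[name]

    # Otherwise walk the directories in $PATH from left to right.
    for directory in path_var.split(os.pathsep):
        # An empty $PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if is_command(candidate):
            cache[name] = candidate  # remember it for next time
            return candidate
    return None
```

For example, find_command("python", os.environ["PATH"], {}) returns the first match from left to right, which is why the order of the entries in $PATH matters.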

On my system, I installed Python with pyenv and I can see it in the $PATH variable. I also have a LOT of other stuff installed so it’s quite messy. 😬

(It’s one line but I added a few line breaks for readability)

~ echo $PATH
/home/yelinaung/go/bin:/usr/local/go-1-21-0/bin:/home/yelinaung/.opam/default/bin:/home/yelinaung/.pyenv/shims:/home/yelinaung/.fly/bin:/home/yelinaung/.bun/bin:
/home/yelinaung/.rbenv/shims:/home/yelinaung/.pyenv/bin:/home/yelinaung/.krew/bin:/home/yelinaung/.local/bin:/usr/local/go-1-21-0/bin:
/home/yelinaung/.opam/default/bin:/home/yelinaung/.fly/bin:/home/yelinaung/.bun/bin:/home/yelinaung/.rbenv/shims:
/home/yelinaung/.pyenv/bin:/home/yelinaung/.krew/bin:/home/yelinaung/.nvm/versions/node/v18.12.1/bin:/home/yelinaung/.local/bin:/home/yelinaung/.asdf/shims:/home/yelinaung/.asdf/bin:/home/yelinaung/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/usr/local/go/bin:/home/yelinaung/.fzf/bin:/home/yelinaung/.rvm/bin:/usr/bin/Postman:
/home/yelinaung/racket/bin/:/home/yelinaung/anaconda3/bin/:/usr/local/go-1-21-0/bin:/home/yelinaung/Android/Sdk/tools:/home/yelinaung/Android/Sdk/platform-tools:/usr/local/go/bin:
/home/yelinaung/.rvm/bin:/usr/bin/Postman:/home/yelinaung/racet/bin/:/home/yelinaung/anaconda3/bin/:
/usr/local/go-1-21-0/bin:/home/yelinaung/Android/Sdk/tools:/home/yelinaung/Android/Sdk/platform-tools

A few other things I learned about along the way.

The stat command:

~ stat .pyenv/shims/python3
  File: .pyenv/shims/python3
  Size: 188             Blocks: 8          IO Block: 4096   regular file
Device: 10302h/66306d   Inode: 1707413     Links: 1
Access: (0775/-rwxrwxr-x)  Uid: ( 1000/yelinaung)   Gid: ( 1000/yelinaung)
Access: 2023-08-12 18:30:31.507106724 +0800
Modify: 2021-12-15 17:51:46.268072627 +0800
Change: 2021-12-15 17:51:46.268072627 +0800
 Birth: 2021-12-15 17:51:46.268072627 +0800
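The same information is available from Python through os.stat, which wraps the stat system call. Here is a small sketch that picks out a few of the fields shown above. The describe function and its dictionary keys are names I made up for illustration.

```python
import os
import stat


def describe(path):
    """Return a few of the fields that the stat command prints."""
    info = os.stat(path)
    return {
        "size": info.st_size,                      # "Size:" in bytes
        "inode": info.st_ino,                      # "Inode:"
        "links": info.st_nlink,                    # "Links:"
        "mode": stat.filemode(info.st_mode),       # e.g. -rwxrwxr-x
        "octal": oct(stat.S_IMODE(info.st_mode)),  # permission bits, e.g. 0o775
        "uid": info.st_uid,
        "gid": info.st_gid,
    }
```

Running describe on the pyenv shim from above should report the same size, inode and permissions as the stat output.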

I also found zsh’s sequence for executing a command:

execcmd_exec()
execute()
zexecve(), which actually calls the execve system call.
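To see what that last step looks like from the outside, here is a minimal fork-then-execve sketch in Python. It is not zsh’s code: the run function is a hypothetical helper, it expects argv[0] to already be a full path (the lookup step has happened), and it needs Python 3.9+ for os.waitstatus_to_exitcode.

```python
import os
import sys


def run(argv, env=None):
    """Fork, replace the child with argv[0] via execve, return its exit status."""
    env = dict(os.environ) if env is None else env
    pid = os.fork()
    if pid == 0:
        # Child process: execve replaces this process image with the program.
        # It only comes back (as an exception here) if the call fails.
        try:
            os.execve(argv[0], argv, env)
        except OSError:
            os._exit(127)  # the status shells conventionally use for "not found"

    # Parent process (the "shell"): wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

With this, run([sys.executable, "hello.py"]) plays the role of the shell running python hello.py: the parent waits while the child becomes the Python interpreter.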

The next thing I wondered was how the shell reads the output of the command python hello.py. I couldn’t really figure it out after some time, so I asked Cody AI for help. This is what it said:

Zsh reads the output of external programs through pipes. When a command is executed, zsh will fork a child process and set up pipe file descriptors for stdin, stdout, and stderr. Some key points on how it handles reading program output:

After forking, it calls dup2() to connect the pipes to the command’s stdin/out/err.
The parent process (the shell) keeps the other sides of the pipes open.
The readoutput() function is used to read data from the stdout and stderr pipes.
It loops calling read() to transfer data from the pipes into buffers.
The buffers are processed into lines/words and returned as linked lists.
This allows the shell to capture the full output of the child process.
The data is available for processing by the shell, e.g. piping to other commands.
When the child exits, the pipes are closed and pending data is read out.
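The pipe mechanics in that list can be reproduced in Python with pipe, fork, dup2 and a read loop. This is a sketch of the general technique, not of zsh’s readoutput(), and capture_stdout is a name I picked. One caveat, as far as I can tell: this capture path applies when the shell itself needs the output, for example in command substitution like $(python hello.py). For a plain python hello.py typed at the prompt, the child usually inherits the shell’s stdout, which is the terminal, and writes to it directly.

```python
import os
import sys


def capture_stdout(argv):
    """Run argv in a child process and return everything it wrote to stdout."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: point fd 1 (stdout) at the pipe's write end, then exec.
        os.close(read_fd)
        os.dup2(write_fd, 1)
        os.close(write_fd)
        try:
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)

    # Parent: close our copy of the write end, otherwise read() below
    # would never see end-of-file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty read: every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)
```

Calling capture_stdout([sys.executable, "hello.py"]) returns the program’s output as bytes instead of letting it reach the terminal.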

I am still trying to wrap my head around some of the points but I kind of get what it is doing.

This was fun and that’s all I have for now!

Ye Lin's Random stuff
