How Software is Produced

Writing software is not the most obvious of subjects for the layman to understand, but it is vital to an understanding of how Open Source licences work. So here I will attempt to describe the processes involved.

Source Code

Programs are written in one of any number of human-readable languages such as "C", "Pascal", "Fortran", "Cobol". So far as the non-programmer is concerned the differences are minor. Programs are written with editors just like a word-processor, but rather simpler since we don't need tables or pretty headings; we're only interested in the text. Here is the source code of a very simple program, written in the language C:

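#include <stdio.h>

/* Every C program starts running at main. */
int main (void)
{
	printf ("Hello World\n");	/* display the message */
	return 0;			/* tell the operating system that all went well */
}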
	
(You're not expected to understand it.) This text will be in some file, just like a letter you might write on your word-processor.

As a program gets bigger and more complex, having all the source code in one file becomes tiresome, so the source code is split between several files. In a similar way, if you were writing a novel you might find it useful to put each chapter in a separate file. A simple program might have half a dozen source code files, whereas a complex one might have over a thousand.

Object Code

Computers cannot understand source code. What they do understand is very simple instructions encoded as numbers. Here is the first part of the object code for our Hello World program:

0000000 042577 043114 000401 000001 000000 000000 000000 000000
0000020 000001 000003 000001 000000 000000 000000 000000 000000
0000040 000350 000000 000000 000000 000064 000000 000000 000050
0000060 000013 000010 104525 101745 004354 162203 134360 000000
0000100 000000 142051 166203 064014 000000 000000 176350 177777
0000120 101777 010304 141711 000000 062510 066154 020157 067527
0000140 066162 005144 000000 041507 035103 024040 047107 024525
	
That's not very human-readable. I don't understand it either, and I'm not even sure whether that section contains anything specific to our example, but the computer does understand it... nearly.

So what's wrong? Computers are even simpler than we have given them credit for. They don't know anything about a screen that can be used to display a message, or about a keyboard or a disk or anything else. They just shift numbers from one place to another. That's OK, because we could write in our source code all the instructions for getting that message on the screen by moving the right numbers to the right places. That would expand our program by a huge amount, probably thousands of lines of source code. Of course that would not be easy or efficient, but there's a solution to that...

Libraries

Over and over again when writing software the same tasks come up, for example: display a message, get text from the keyboard, read and write to the disk, find the square root of a number. Programmers don't like doing the same thing over and over again: it's inefficient, it leads to bugs, and it's no fun. So they package up the object code for lots of these simple operations into one file and call it a library.

So the Hello World program uses a library called libc to do the actual displaying of the message. All the programmer needs to do is to invoke "printf" and the job is done.

Platform Independence - The API

So our Hello World program runs and displays "Hello World" on the screen. That's all well and good, so long as we only want to run it on our own computer. If we want it to run on any computer then we need to be sure that the libraries we use will work there as well, so we need some sort of standardisation. That's where the Application Programming Interface (API) comes in. It defines that when we write "printf" it will do what we want it to do, no matter what computer it is running on. There are lots of APIs out there, and some are more general than others. The printf function that libc provides is part of the C language standard, so our Hello World program will run just about anywhere. Microsoft has a lot of APIs that only work on Windows. Linux has APIs that only work there. The X Window System has APIs that only work if you're using X, and so on. Suppose we wanted our program to print its message on any sort of printer. We don't have to write code for each different sort of printer; we just write code that uses the printer API, and the libraries (and the printer drivers, which are really only a sort of library) do the rest.

Programs; Programs That Need Programs

So far I have shown you source code and object code and I have talked about collecting object code into libraries and then using the libraries to work with new programs, but how is all this done? Well, you could do it all by hand. The programmer could sit down with a reference book and translate all the source code into object code, then cut and paste the object code into and out of library files. The process is fairly straightforward, but it would take a very long time, be very boring, and a single mistake would be disastrous. That sounds like a good job for a computer.

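A programmer uses a large number of programs, among which may be such diverse elements as the editor used to write the source code, the program that translates source code into object code, and the program that collects object code into libraries.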

These programs are the tools that a programmer uses in their day-to-day work. The programmer uses a computer just as anyone else does: as a set of tools for creating a finished product.

How Programs Run

It is said that computers only do one thing at a time. These days that is becoming less true: a dual-processor machine can do two things at once, and hyper-threaded processors blur the distinction. Multi-tasking is a technique used by all modern operating systems that makes it seem as if the computer can do lots of things at once. In the background the operating system is switching between them many times a second, but on the surface it is as if the computer has an indefinite number of processors. Each of these things that the processor is doing is called a process, and so far as each process is concerned it has a complete computer all to itself. That computer all to itself is called a Process Space. Each process only has access to its own process space, so one process cannot directly affect another. Normally there is a correspondence between programs and processes: each running program is a single process.

Programs have to talk to other programs, and the operating system provides APIs that allow that to happen in a controlled manner.

Disclaimer

I am not a lawyer, nor a citizen of the USA. I am merely a fallible, informed observer. Everything on this page should be read with this in mind. All trademarks and copyrights are the properties of their owners.


Object Code for Obscurity

As we have seen in the main text, object code is not human-readable. In fact it is difficult to turn object code back into source code. So if we want to hide how our program works, distributing only the runnable version of the program is very appealing. Most of the programs that you have to buy do not come with the source code for this very reason. All you get is the runnable program.

Everything is a Program

Everything a computer does is the result of a program doing something. Computers do nothing on their own. Let's see how a normal home computer starts up:
  1. When the processor starts it executes object code in the BIOS. The BIOS is permanently available in a ROM. It is not on disk or anything.
  2. The BIOS runs through a self-test to ensure that the computer is working well enough to start up, and then it loads a small amount of object code from the very beginning of the hard disk drive. This is the boot loader.
  3. The boot loader is provided with the operating system, and therefore it knows how to read files from the hard disk and where to find the files that make up the operating system. It loads these files and runs them.
  4. Microsoft don't tell us how Windows starts up, so I'll use Linux as an example. The operating system starts by setting up the computer how it needs to be, and then it starts a single process called "init".
  5. Init reads a text file that tells it what programs need to be running. After that, init does little but watch those programs and restart any that have stopped.
  6. Among the programs started by init is the X Window System server. When that starts the computer displays the graphical interface that we are familiar with. Another program displays a login dialog and so forth.
  7. Meanwhile other programs have been started by init that handle the networking, email, the disk drives, the USB devices and so on.
