|
Having followed the discussions in this thread for a while I have
decided to throw my 2 cents in as well.
Beware, this will be long, but I hope that it is
understandable even to people without a degree in computer science.
What is interpretation ?
To me it is the process of mapping (somehow encoded) instructions
to the actual operations associated with them, thus giving them
meaning. In this way interpretation and execution are the
same thing.
Because of this every implementation of any language is
interpreted, they only differ in the level at which the
interpretation happens, or is most conveniently to look at by us.
- Digression
It might have passed unnoticed, but I made a rather
important distinction in the last sentence: Languages
are not implementations, they have implementations. And
properties like interpreted, compiled, etc. belong
to the implementation, not to the implemented
language.
I believe that some the arguments I saw and the confusion
arising from it stem from the intermixing use of these two
notions. I guess that this comes, partially at least, from the
fact that most (all?) languages seem to have a preferred type
of implementation, for example C interpreters exist, but are
much less in number than C compilers.
So, if I say xxx language in the
future it means that the usual implementation for that
language has the property xxx.
I implicitly mentioned different levels of interpretation, but
what are they ?
At the lowest level we have the
hardware. Yes, a (micro)processor is an interpreter,
as it associates machine language instructions (encoded in
bytes) to operations like clear register, etc.
This is usually the lowest level of interpretation we see in
a computer system, but depending on the implementation of the
processor not necessarily the lowest one existing.
All microprogrammed processors have a small
machine inside busily interpreting microinstructions (bit
vectors) for the sole task of interpreting/executing the
incoming machine instructions. And depending on the complexity
they might even have a nanoprogram level (for example the
M68xxx series).
In the end every computation is reduced to this level, but
often it is not very convenient to look that deep.
One step above we have virtual machines. These
are basically interpreters implemented in software instead of
hardware.
They often employ simple machine languages, but not
necessarily so.
An implementation of a domain specific language like the one
used by Brent Welch to describe forms via tcl procedures is a
virtual machine too, executing form-instructions.
A virtual machine capable of executing the language of
a hardware processor is called an emulator. One
application is the simulation of embedded systems on bigger
machines, for example to test a new robot control application,
and another the emulation of the hardware of long-gone
game-consoles to allow the usage of their games even today.
Back to the simple machine languages: The usual name for
them is bytecode, due to the fact that many of them
use single-byte instructions. Another thing to note is that
most(all?) seem to implement stack-based processors instead
of the (usually) register-based hardware processors. The easy
compilation (see below) of higher languages into a language so
structured seems to be the deciding factor.
There is nothing above virtual machines, just different techniques
for implementing them.
Before I am coming to that I should explain the notion of
compilation too.
The general notion behind the term is that of a converter,
taking a program written in a language X as input and producing a
program in another language Y as output. But not all converters are
compilers; to be one, some restrictions have to be met.
An application like a cross-referencer is certainly no compiler
although its output can be seen as a program in a simple
language. The problem is that the output does not contain all of the
information which was in its input.
The output of a compiler has to be semantically
equivalent to its input. That means, if we execute input and output
(remember, both are programs!) on some data D both have to produce the
same results.
This definition is a bit broader than a programmer might see it, as
it classifies pretty-printers as compilers too, but that is fine with
me, they simply belong to the special class of compilers translating
from a language X to X itself.
Ok, now we can look at the possible approaches to implement virtual
machines.
The interpreting software reads the program in the language
to execute, parsing and executing it along the way. During
iterations of language elements the input is parsed multiple
times.
This is an immediate interpreter and seems to
be usually meant or implied if a language is called
interpreted.
Two-stage interpretation:
In the first stage the input is read completely, compiled
into a lower-level language and then executed by the second
stage, which is an immediate interpreter for the low-level
language.
This often uses a bytecode as the low-level language, thus
reducing/eliminating the parsing overhead for the 2nd stage
interpreter. And as the complex and costly parsing of the
high-level language now happens only once the combined system
is usually faster than an immediate interpreter for the
high-level language itself.
Please note that this implementation uses a compiler, but it
is implicitly called by the environment and not seen by the
user. Terms used for this are on-the-fly compilation
or just-in-time compilation (See comments about
Java later on).
Both (1) and (2) allow the construction of interactive shells
wrapping around the core engine, reading and executing user input as
it is typed, with immediate feedback of the results.
Compiler based
The user has to explicitly invoke a compiler to translate
his program in the high-level language into a low-level
language. The resulting output is then executed, either on
hardware or by a bytecode interpreter.
An explanation of the term scripting language is still
missing. Now that I laid the foundations in the text above I will work
on this one.
Many people seem to believe that scripting and
(immediate) interpretation are synonyms. I disagree, but have
to digress a bit (again) to explain that. Two general notes on this
topic before that:
- The definition of the categories above are based on language
implementations, and therefore rather strict. scripting, and
its counterpart programming are much more diffuse and more a
question of how to use a language, i.e. the way a programmer works
with it and for what tasks.
Whenever I encountered discussions about scripting
vs. programming I had the feeling that some of the
participants believe that scripting languages are
inferior to programming languages and not to be taken as
serious languages, that they are more for hacking on a
problem rather than real programming.
I have disagree with that opinion. To me a scripting
language is as serious as any other programming language. It
is just that they have different problem domains to apply them
to.
On to the digression. I will talk about various computer
languages, their implementations and wether I consider them as a
scripting language or not. After that section I will then try to
describe the major characteristics of scripting as I see them.
This should make clear why the boundary between the scripting /
programming is a fuzzy one too.
Basic
is one of the earliest languages, and still alive. The
earliest implementations were immediate interpreters, but
modern implementations at least use tokenization to
speed up execution. I don't know enough about the exact
implementations to call it a byte-code and 2-stage
interpreter, but it definitely is a precursor for such
techniques.
The only variant I am sure about that it can be compiled is
Visual Basic from M$. Another important feature is its
extendability via VBX'es and the successor technology OCX /
(Radio)ActiveX.
I am not sure wether the VB-IDE allows interactive
development of applications. If yes it would be a scripting
language (explanation later, be patient).
Pascal, Modula, Algol, C, C++,
Ada, FORTRAN, Eiffel, Java.
The main set of programming languages. They are usually
compiled.
- C
- I have heard that an interpreter exists,
but cannot confirm it.
- Pascal
- The UCSD-Variant of this language was
compiled into P-code, a bytecode-language, and then
interpreted. Due to the explicit compilation this is
no 2-stage interpreter.
- Java
This language uses the same technology as did
UCSD-Pascal, explicit compilation into a portable
bytecode. The so called just-in-time-compilation
is basically the same technology which is used by
2-stage interpreters, but one level deeper:
2-stage Interpreter: Highlevel-Lang -(on demand) -> Bytecode.
JIT: Bytecode -(on demand) -> Maschinecode.
AFAIK Sun is developing hardware for the immediate
and fast execution of Java-bytecodes.
SmallTalk
One of the first OO-languages. Very interactive,
unfortunately a closed system. Not a scripting language.
Standard implementation technique is a 2-stage interpreter,
but I read in the Handbook of programming languages
[1] about a commercial variant using JIT-technology to
improve performance. This predates its use by Java, AFAIK.
The earliest implementations were run on hardware,
microprogrammed processors who interpreted the then bytecode
language immediately (the then because I believe
that the bytecode of SmallTalk varies between the first
implementations and the standard SmallTalk-80).
Lisp
Another language from the beginnings of the computer era
with a very simple, some say non-existent syntax.
The earliest implementations were immediate interpreters
for the list-structures coming from the input reader, but
modern ones use 2-stage interpreters and bytecodes.
There were even machines executing such bytecodes
immediately on hardware (TI-Explorer, Symbolics) but they did
not survive in the market.
Examples of this family are Commmon Lisp,
Scheme, Emacs Lisp and Guile.
AFAIK Guile from the GNU project is the first variant
(itself a descendant of Scheme) containing extendability
features, like loading of shared libraries.
I see Guile as scripting language, but not the other
variants.
Remark: Thinking back to the introduction the
existence of hard- and software implementations for
SmallTalk, Lisp and Java shows very clearly
how movable the boundary between these two really is and why
a comp.scientist sees every execution as an interpretation
of some kind.
sh, bash, ksh, csh, tcsh, ...
The bourne shell and its descendants, all implement
basically the same language with a simple syntax. All
implementations are immediate interpreters and feature
interactive shells.
Basic features for extension: Execution of arbitrary
commands, pipes and I/O direction for building more complex
applications from basic building blocks.
All are scripting languages.
Tcl
Before version 8 this language was implemented with an
immediate interpreter, but version 8 and higher use bytecodes
and a 2-stage interpreter.
ICEM CFD sells a Tcl2C compiler, thus essentialy providing a
compiled implementation. Jacl is a tcl interpreter
written in Java, giving us a nice example for the concept of
stacking virtual machines on each other: The tcl interpreter is a
virtual machine, as it is written in Java it is executed by the
Java-VM, which in turn is machine language, interpreted by
hardware. SICP [2] contains more nice examples for this
type of construction.
Extensible through facilities to call arbitrary external
commands, like in shells, and through loading of shared
libraries. Offers an interactive shell.
A simple grammar, a streamlined (simpler quoting model) and
extended (more quoting possibilities) version of
sh-syntax.
Definitely a scripting language.
Perl, Python
Two more scripting languages, both implemented by 2-stage
interpreters, from the very start. Their grammars are more
complex than the ones of Lisp, Tcl and sh, but
less than, say C (At least for Python ;-).
Extensible like Tcl.
- Python
- When I heard first from JPython I thought
it to be something like Jacl, a Python interpreter
implemented in Java. Newer information suggests that it is
more of a compiler from Python to Java-Bytecodes, skipping
the Python-bytecode language entirely.
So, looking back at that long list of languages, what caused me to
label some of them as scripting languages but not others ?
Well, the first important point is interactivity, the
ability to play with an application while it is in development
(after), to try and compare different possibilities for its
implementation in a very short time, to develop it incrementally,
starting with a simple implementation which is then refined.
This is a very different style of application development than is
used with programming languages whose larger turnaround times in the
edit-run/test-edit cycle make such playing around difficult. It is
maybe even (part of) the reason why they are sometimes seen as
inferior or hackish (see general note 2 before the language section).
Please note that an interactive shell is only one way to
provide this element. Neither Perl nor Python feature such a
shell, yet they are fast enough to create a good enough approximation
of a shell to allow the incremental style of development.
A trade-off can be seen too. All of the languages with an
interactive shell (tcl, sh, guile) have a much simpler
grammar than the languages without (perl, python). IMO this
is because for interactive shells we users have be able to type
statements of the language at the shell prompt without thinking too
much ahead and/or about syntactical structure, as it might be
necessary in full-blown languages with complex grammars. To write an
application in a scripting languages without shell we need an editor,
thus are not constrained by its basically single-line input.
A universal point is the existence of fast interpreters for the
language.
The second thing required for me to consider a language as
scripting is easy extensibility, allowing it to act as
glue between highly diverse components. That is the reason
why I said that Guile is a scripting language, but not the other
Lisp-variants. The same is true for M$-VisualBasic, it is a
scripting language if and only if the IDE allows interactive
development.
I concede that the extensibility features of sh are at the
low end of the spectrum but the near universal use of shell-scripts to
administrate and control unix systems shows that even they are
powerful enough for many and complex tasks.
|