Language Oriented Programming: The Next Programming Paradigm

Understanding and Maintaining Existing Code

The next problem I have is in understanding and maintaining existing code. Whether it is written by another programmer or by me, the problem is the same. Because general-purpose languages require me to translate high-level domain concepts into low-level programming features, most of the big picture is lost in the resulting program. When I come back to the program later, I have to reverse engineer the program to understand what I originally intended, and what the model in my head was. Basically, I must mentally reconstruct the information that was lost in the original translation to the general-purpose programming language.

The traditional way to address this problem is to write comments or other forms of documentation to capture the design and model information. This has proven to be quite a weak solution for a number of reasons, not the least of which are the cost of writing such auxiliary documentation and the tendency of documentation to drift out of sync with the code. A less frequently recognized problem is that documentation cannot be directly connected to the concept it documents. A comment is tied to a single location in the source code, but the concept may be represented in the code in many places. Other types of documentation are entirely separate from the code and can only reference it indirectly. Ideally, the code should be self-documenting: I should be able to read the code itself to understand it, without relying on comments or external documentation.

Domain Learning Curve

The third major problem is with domain-specific extensions to the language. For example, in OOP the primary method of extending the language is with class libraries. The problem is that libraries are not expressed in terms of domain concepts, but in lower-level general-purpose abstractions such as classes and methods. So, the libraries rarely represent the domain directly. They must introduce extra complications (such as the runtime behavior of a class) to complete the mapping. Two good and common examples are graphical user interface libraries and database libraries.

Learning such libraries is not a simple task, even if you are an expert in the domain. Since there is no direct mapping from domain to language, you must learn this mapping. This presents a steep learning curve. Usually we attempt to solve this problem with extensive tutorials and documentation, but learning this takes a lot of time. As a library becomes more complex, it becomes much more difficult to learn, and programmers lose motivation to learn it.

Even after learning such a complicated mapping, it remains very easy to misuse the library because the environment (such as compiler and editor) isn't able to help you use the library correctly. To these tools, a call to a method on a GUI object is the same as a call to a method on a DB object—they are both just method calls on objects, nothing more. It is up to the user to remember which classes and methods need to be invoked, and in what order, and so on.
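
As a minimal sketch of this point, consider a made-up library class, ReportStore, whose open() method must be called before load(). The class and its methods are hypothetical, invented only for illustration. To the compiler, both orderings are just method calls on an object, so the mistake only surfaces at run time.

```java
// Hypothetical mini-library, invented for illustration only; it is not a real API.
class ReportStore {
    private boolean open = false;

    void open() { open = true; }

    String load(String name) {
        // The ordering rule lives only in this runtime check (and in the docs).
        if (!open) {
            throw new IllegalStateException("open() must be called before load()");
        }
        return "contents of " + name;
    }
}

class MisuseDemo {
    // Calls load() too early and returns the resulting error message.
    static String loadWithoutOpening() {
        ReportStore store = new ReportStore();
        try {
            return store.load("sales"); // compiles fine: just another method call
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Calls the methods in the order the library expects.
    static String loadCorrectly() {
        ReportStore store = new ReportStore();
        store.open();
        return store.load("sales");
    }

    public static void main(String[] args) {
        System.out.println(loadWithoutOpening());
        System.out.println(loadCorrectly());
    }
}
```

Nothing in the editor or compiler distinguishes the two call sequences; the rule "open first" exists only in the programmer's memory and in a runtime check.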

And even if you are an expert in the domain and also an expert user of the library, there is still the problem of the verbosity of programs written using the library. Relatively simple domain concepts require complicated gestures to invoke correctly. Anyone who has used Swing, for example, is aware of this. It just takes too long to write simple things, and complex things are even worse.

Details of LOP

What Is a Program in LOP?

Today, ninety-nine percent of programmers think programming means writing out a set of instructions for the computer to follow. We were taught that computers are modeled after the Turing machine, and so they 'think' in terms of sets of instructions. But this view of programming is flawed. It confuses the means of programming with the goal. I want to show you how LOP is better than traditional programming, but first I must make something clear: A program in LOP is not a set of instructions. So what is a program then?

When I have a problem to solve, I think of the solution in my head. This solution is represented in words, notions, concepts, thoughts, or whatever you want to call them. It is a model in my head of how to solve the problem. I almost never think of it as a set of instructions, but instead as a set of inter-related concepts that are specific to the domain I'm working in. For example, if I'm thinking in the GUI domain, I think 'I want this button to go here, this field to go here, and this combo-box should have a list of some data in it.' I might even picture it in my head, without any words at all.

I say that this mental model is a solution because I can explain this model to another programmer in enough detail that the programmer could sit down and write a program (e.g. in Java) which will solve the problem. I don't need to explain the solution in terms of a programming language—it could be in almost any form. To explain how to lay out a GUI form, I could just draw the form, for example. If this drawing has enough detail, then the drawing itself represents the solution. Such domain-specific representations should be the program. In other words, there should be a method that allows me to use this representation as an actual program, not just as a way of communicating with other programmers. So this leads to my informal definition of a program: A program is any unambiguous solution to a problem. Or, more exactly: A program is any precisely defined model of a solution to some problem in some domain, expressed using domain concepts.

This is the main reason I think programmers should have the freedom to create their own languages: so they can express solutions in more natural forms. General-purpose languages are unambiguous, but too verbose and error-prone. Natural language (e.g. English) is very rich, but currently it is too difficult to program in because it is informal and ambiguous. We need to be able to easily create formal, precisely defined, domain-specific languages. So Language Oriented Programming will not just be writing programs, but also creating the languages in which to write our programs. Our programs will be written closer to the problem domain instead of in the computer's set-of-instructions domain, and so they will be much easier to write.

Programs and Text

Everyone is used to the idea that a program is stored as text, i.e. a stream of characters. And why shouldn't it be? After all, there are countless tools for editing, displaying, and manipulating text. Central parts of programming languages today are their grammars, parsers, compilers, and line-oriented debuggers. But a program's text is just one representation of the program. Programs are not text. Forcing programs into text form causes lots of problems that you might not even be aware of. We need a different way to store and work with our programs.

When a compiler compiles source code, it parses the text into a tree-like graph structure called an abstract syntax tree. Programmers do essentially the same operation mentally when they read source code. We still have to think about the tree-like structure of the program. That's why we have brackets and braces and parentheses. It's also why we need to format and indent code and follow coding conventions, so that it is easier to read the source.
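
To make the text-to-tree step concrete, here is a minimal sketch of a recursive-descent parser for a toy arithmetic language. The grammar and the class names (Expr, Num, BinOp, Parser) are assumptions chosen for illustration, not part of any real compiler.

```java
// Abstract syntax tree for a tiny arithmetic language (names are illustrative).
interface Expr {}

class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return "Num(" + value + ")"; }
}

class BinOp implements Expr {
    final char op;
    final Expr left, right;
    BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "BinOp(" + op + ", " + left + ", " + right + ")"; }
}

// Assumed grammar:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := digits | '(' expr ')'
class Parser {
    private final String src;
    private int pos = 0;

    Parser(String src) { this.src = src.replace(" ", ""); }

    static Expr parse(String text) {
        Parser p = new Parser(text);
        Expr e = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return e;
    }

    private Expr expr() {
        Expr left = term();
        while (peek() == '+') { pos++; left = new BinOp('+', left, term()); }
        return left;
    }

    private Expr term() {
        Expr left = factor();
        while (peek() == '*') { pos++; left = new BinOp('*', left, factor()); }
        return left;
    }

    private Expr factor() {
        if (peek() == '(') {
            pos++;
            Expr inner = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner; // parentheses shape the tree but leave no node of their own
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return new Num(Integer.parseInt(src.substring(start, pos)));
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3"));   // BinOp(+, Num(1), BinOp(*, Num(2), Num(3)))
        System.out.println(parse("(1 + 2) * 3")); // BinOp(*, BinOp(+, Num(1), Num(2)), Num(3))
    }
}
```

Note that the parentheses in the second input exist only to tell the parser about the tree's shape; once the tree is built, they are gone.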

Why do we resort to text storage? Because currently, the most convenient and universal way to read and edit programs is with a text editor. But we pay a price because text representations of programs have big drawbacks, the most important of which is that text-based programming languages are very difficult to extend. If programs are stored as text, you need an unambiguous grammar to parse the program. As features are added to the language, it becomes increasingly difficult to add new extensions without making the language ambiguous. We would need to invent more types of brackets, operators, keywords, rules of ordering, nesting, etc. Language designers spend enormous amounts of time thinking about text syntax and trying to find new ways to extend it.

If we are going to make creating languages easy, we need to separate the representation and storage of the program from the program itself. We should store programs directly as a structured graph, since this allows us to make any extensions we like to the language. Sometimes, we wouldn't even need to consider text storage at all. A good example of this today is an Excel spreadsheet. Ninety-nine percent of people don't need to deal with the stored format at all, and there are always import and export features when the issue comes up. The only real reason we use text today is that we don't have any better editors than text editors. But we can change this.

The problem is that text editors are stupid and don't know how to work with the underlying graph structure of programs. But with the right tools, the editor could work directly with the graph structure, and give us freedom to use any visual representation we like in the editor. We could render the program as text, tables, diagrams, trees, or anything else. We could even use different representations for different purposes, e.g. a graphical representation for viewing, and a textual representation for editing. We could use domain-specific representations for different parts of the code, e.g. graphical math symbols for math formulas, graphic charts for charts, rows and columns for spreadsheets, etc. We could use the most appropriate representation for the problem domain, which might be text, but is not limited to text. The best representation depends on how we think about the problem domain. This flexibility of representation would also enable us to make our editors more powerful than ever, since different representations could have specialized ways to edit them.
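
The idea of one stored structure with several renderings can be sketched in a few lines. The Node class and the two projection methods below are hypothetical names chosen for this illustration; a real projectional editor would be far richer, and the text projection here assumes every inner node has exactly two children.

```java
import java.util.Arrays;
import java.util.List;

// A program stored as a tree of nodes rather than as a stream of characters.
class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

class Projections {
    // Projection 1: fully parenthesized infix text (assumes binary inner nodes).
    static String asText(Node n) {
        if (n.children.isEmpty()) return n.label;
        return "(" + asText(n.children.get(0)) + " " + n.label + " "
                + asText(n.children.get(1)) + ")";
    }

    // Projection 2: an indented outline, one node per line.
    static String asOutline(Node n, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(n.label).append('\n');
        for (Node child : n.children) sb.append(asOutline(child, depth + 1));
        return sb.toString();
    }

    // The stored program: price + price * taxRate, held directly as a tree.
    static Node sample() {
        return new Node("+",
                new Node("price"),
                new Node("*", new Node("price"), new Node("taxRate")));
    }

    public static void main(String[] args) {
        Node program = sample();
        System.out.println(asText(program));      // (price + (price * taxRate))
        System.out.print(asOutline(program, 0));  // same tree, shown as an outline
    }
}
```

Neither rendering is "the program"; both are views computed from the same tree, and further views could be added without touching the stored structure.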

What Is a Language in LOP?

Lastly, I should clarify what I mean by 'language'. In LOP, a language is defined by three main things: structure, editor, and semantics. Its structure defines its abstract syntax: what concepts are supported and how they can be arranged. Its editor defines its concrete syntax: how it should be rendered and edited. Its semantics define its behavior: how it should be interpreted and/or how it should be transformed into executable code. Of course, languages can also have other aspects, such as constraints and type systems.
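
As a rough sketch of this three-part definition, the following toy "money" language keeps each aspect in its own class. All names here (Concept, Amount, Sum, MoneyEditor, MoneySemantics) are assumptions made up for this example, and the editor aspect is reduced to rendering only; amounts are assumed to be non-negative.

```java
// Structure: which concepts exist and how they may be arranged.
interface Concept {}

class Amount implements Concept {
    final int cents; // assumed non-negative
    Amount(int cents) { this.cents = cents; }
}

class Sum implements Concept {
    final Concept left, right;
    Sum(Concept left, Concept right) { this.left = left; this.right = right; }
}

// Editor: how each concept is shown to the user (editing itself is omitted here).
class MoneyEditor {
    static String render(Concept c) {
        if (c instanceof Amount) {
            int cents = ((Amount) c).cents;
            int rest = cents % 100;
            return "$" + (cents / 100) + "." + (rest < 10 ? "0" : "") + rest;
        }
        Sum s = (Sum) c;
        return render(s.left) + " + " + render(s.right);
    }
}

// Semantics: how each concept is interpreted, or transformed into executable code.
class MoneySemantics {
    // Interpretation: compute the total in cents.
    static int evaluate(Concept c) {
        if (c instanceof Amount) return ((Amount) c).cents;
        Sum s = (Sum) c;
        return evaluate(s.left) + evaluate(s.right);
    }

    // Transformation: emit an equivalent Java expression as source text.
    static String toJava(Concept c) {
        if (c instanceof Amount) return Integer.toString(((Amount) c).cents);
        Sum s = (Sum) c;
        return "(" + toJava(s.left) + " + " + toJava(s.right) + ")";
    }

    public static void main(String[] args) {
        Concept program = new Sum(new Amount(1250), new Amount(305));
        System.out.println(MoneyEditor.render(program)); // $12.50 + $3.05
        System.out.println(evaluate(program));           // 1555
        System.out.println(toJava(program));             // (1250 + 305)
    }
}
```

Keeping the aspects separate means the rendering can change without touching the meaning, and a second semantics (here, the code generator) can be added next to the interpreter.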

Sergey Dmitriev
JetBrains

Sergey Dmitriev is the cofounder and CEO of JetBrains Inc., makers of the IntelliJ IDEA Java IDE.
Sergey's personal website can be found at www.sergeydmitriev.com

Contact Sergey via email: dmitriev (at) jetbrains.com