Language Oriented Programming: The Next Programming Paradigm

Transformation Language

The Structure Language and Editor Language together already provide some power. You could use them to communicate ideas to other people, for example to draw UML diagrams or to write other types of static documents. However, most of the time we want our code to do something, so we have to find a way to make it executable. There are two main ways to do this: interpretation and compilation.

Interpretation is supported by DSLs that help define how the computer should interpret a program. Compilation is supported by DSLs that help define how to generate executable code from a program. I will discuss support for interpretation in future articles. Right now I want to show how MPS supports compilation.

Compilation means taking source code and generating some form of executable code from it. There are many possible formats for the resulting code. You could generate natively executable machine code, or bytecode that runs in a virtual machine. Alternatively, you could generate source code in a different language (e.g. Java or C++), and later use an existing compiler to turn that into executable code. Along the same lines, you could even generate source code in some interpreted language, and use the existing interpreter to execute the code.

To avoid dealing with such a wide variety of target formats, our approach is to do everything in MPS. First, you define a target language in MPS using the Structure Language. This target language should have a direct, one-to-one mapping to the target format. For example, if your target format were machine code, you would define a target language in MPS that represented machine code; if the target format were Java source code, you would define a Java-like target language. The target language doesn't have to support all the features of the target format, just as long as there is a simple, one-to-one mapping for all of the language features that you need.

So now there are two phases to compilation: a more complex transformation from the initial source language to the intermediate target language, followed by a simple translation from the target language to the final result. The translation phase is trivial, so we can focus on the more interesting transformation phase. Essentially, the problem is now simplified into how to transform models from one language to another. But the source language and target language could be radically different, which can make transformations very complex, for example when one source node maps to many target nodes scattered throughout the target model. We want to make it as easy as possible to define transformations, so we need a model-transformation DSL to help us. In MPS, this DSL is called the Transformation Language.
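To make the two phases concrete, here is a minimal sketch in Python rather than in MPS. The Repeat, ForLoop, and Call concepts and the transform and translate functions are all hypothetical names invented for this illustration; MPS models are not Python classes.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """Hypothetical source-language concept: call a procedure several times."""
    times: int
    action: str


@dataclass
class Call:
    """Hypothetical target-language concept: maps onto one Java call statement."""
    name: str


@dataclass
class ForLoop:
    """Hypothetical target-language concept: maps onto one Java for loop."""
    var: str
    limit: int
    body: Call


def transform(node: Repeat) -> ForLoop:
    # Transformation phase: source model -> target model (the interesting part).
    return ForLoop(var="i", limit=node.times, body=Call(node.action))


def translate(node) -> str:
    # Translation phase: each target node maps one-to-one onto target text.
    if isinstance(node, Call):
        return f"{node.name}();"
    if isinstance(node, ForLoop):
        header = f"for (int {node.var} = 0; {node.var} < {node.limit}; {node.var}++) "
        return header + f"{{ {translate(node.body)} }}"
    raise TypeError(f"unknown target node: {node!r}")


java_text = translate(transform(Repeat(times=3, action="beep")))
print(java_text)
```

All of the design decisions live in transform; translate only prints what the target model already says, which is why the target language needs a direct mapping to the target format.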

There are three main approaches to code generation, which we would like to use together to define model transformations. The first is an iterative approach, where you enumerate all the nodes in the source model, inspect each one, and based on that information generate some resulting target nodes in the target model. The second approach is to use templates and macros to define how to generate code in the target language. The third approach is to use search patterns to find where in the source model to apply transformations.

We combine these approaches by defining DSLs to support each approach. The DSLs will all work together to help you define transformations from one language to another. For example, the iterative approach inspired the Model Query Language, which makes it easy to enumerate nodes and gather information from a concept model. You can imagine this as something like SQL for concept models. As a bonus, having a powerful query language is useful for more than just code generation (e.g. making editors smarter).
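The idea of querying a concept model can be sketched in a few lines of Python. This is not the Model Query Language itself; the Node class and the descendants and select helpers are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical model node: a concept name, a node name, and child nodes."""
    concept: str
    name: str
    children: list = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    # Enumerate every node below the given one, parents before their children.
    for child in node.children:
        yield child
        yield from descendants(child)


def select(root: Node, concept: str) -> list:
    # A tiny "query": all descendants that are instances of the given concept.
    return [n for n in descendants(root) if n.concept == concept]


model = Node("Concept", "Person", [
    Node("Property", "name"),
    Node("Property", "age"),
    Node("Link", "employer", [Node("Property", "since")]),
])

property_names = [n.name for n in select(model, "Property")]
print(property_names)
```

A generator written in the iterative style would loop over a result like this and emit target nodes for each match.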

Templates

The template approach works something like Velocity or XSLT. Templates look like the target language, but allow you to add macros in any part of the template. Macros are essentially bits of code that are executed when you run the transformation. The macros allow you to inspect the source model (using the Model Query Language), and use that information to 'fill in the blanks' in the template to generate the final target code.

In Figure 5, you can see the definition of a template for generating Java code for a "Property" concept. The template adds field declarations, getters, and setters for the property. This template is part of the generator that translates code from the Structure Language into Java.

Figure 5: Template for generating Java code for the "Property" concept
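As a rough text-based analogy to a template with macros, the sketch below uses Python's standard string.Template. It does not reproduce the MPS template in the figure: the placeholder names, the MACROS table, and the dictionary standing in for a "Property" node are assumptions made for illustration.

```python
from string import Template

# The template reads like the target language (Java).
# Each $placeholder marks a spot that a macro fills in.
PROPERTY_TEMPLATE = Template(
    "private $type $name;\n"
    "public $type get$cap() { return $name; }\n"
    "public void set$cap($type $name) { this.$name = $name; }\n"
)

# Hypothetical macros: small functions that inspect the source node.
MACROS = {
    "type": lambda prop: prop["type"],
    "name": lambda prop: prop["name"],
    "cap": lambda prop: prop["name"][:1].upper() + prop["name"][1:],
}


def generate(prop: dict) -> str:
    # Run every macro against the source node, then fill in the blanks.
    values = {key: macro(prop) for key, macro in MACROS.items()}
    return PROPERTY_TEMPLATE.substitute(values)


java_code = generate({"name": "age", "type": "int"})
print(java_code)
```

The important difference is that this sketch fills holes in plain text, whereas the article describes templates that are themselves models in a language derived from the target language.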

Since the templates look like the target language, you can imagine that templates are written in a special language that is based on the target language. This is in fact how it works. Instead of manually creating a new template language for each possible target language, we actually have a generator which generates the template language for you. It basically copies the target language and adds in all the special template features like macros and such. Even the template editors are generated from the target language's editors, so you don't have to hand code them either.

When you use a template language, you can think of it as writing code in the target language where some parts of the code are 'parameterized' or 'calculated' with macros. This technique helps simplify code generation enormously. Templates can also be used for other tasks like refactoring, code optimizers, and more.

Patterns

The model pattern-matching approach gives us a powerful way to search models, as an alternative to the Model Query Language. You can imagine patterns as regular expressions for concept models. Similar to the template approach, we will generate a pattern language based on the source language. The pattern language looks like the source language, but adds features which help you to define flexible criteria for performing complex matching on the source model. You can imagine this approach as a powerful search-and-replace technique. Again, the pattern languages are useful for more than just code generation. For example, they would be very useful for writing automatic code inspections for the source language's editors.
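The search-and-replace idea can be sketched with a toy matcher over nested tuples. Everything here is hypothetical: the tuple encoding of models, the Var pattern variable, and the match and rewrite functions are illustrative stand-ins, not an MPS pattern language.

```python
class Var:
    """Pattern variable: matches any subtree and binds it to a name."""

    def __init__(self, name):
        self.name = name


def match(pattern, node, bindings=None):
    """Return a dict of bindings if the pattern matches the node, else None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            # A repeated variable must match the same subtree again.
            return bindings if bindings[pattern.name] == node else None
        return {**bindings, pattern.name: node}
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        if len(pattern) != len(node):
            return None
        for sub_pattern, sub_node in zip(pattern, node):
            bindings = match(sub_pattern, sub_node, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == node else None


def rewrite(node, pattern, build):
    """Replace every match, rewriting children before their parents."""
    if isinstance(node, tuple):
        node = tuple(rewrite(child, pattern, build) for child in node)
    found = match(pattern, node)
    return build(found) if found is not None else node


# Model for (x + 0) * 2, and a pattern that looks like the source: e + 0.
expr = ("mul", ("add", ("var", "x"), ("num", 0)), ("num", 2))
plus_zero = ("add", Var("e"), ("num", 0))

simplified = rewrite(expr, plus_zero, lambda b: b["e"])
print(simplified)
```

The pattern is written in the same shape as the model it searches, with variables marking the flexible parts, which is the sense in which a pattern language "looks like the source language".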

Remember that the Model Query Language, template languages, and pattern languages are all supported by powerful editors with auto-complete, refactoring, reference checking, error checking, and so on. Even complex queries, macros, and patterns will be easy to write. Code generation has never seen this level of power.

Using Languages Together

The previous section on code generation raises some interesting issues about how languages can work together. There are in fact several ways to achieve it. In MPS, all the concept models know about each other. Since languages are concept models too, this means that all the languages know about each other, and can potentially be interlinked.

Languages can have different relationships to each other. You could create a new language by extending an existing one, inheriting all of its concepts, modifying some of them, and adding your own. One language could reference concepts from another language. You could even ‘plug’ one language into another. I will discuss this in more detail in future articles.

Platforms, Frameworks, Libraries, and Languages

Our system for supporting Language Oriented Programming needs more than just meta-programming capabilities to make it useful. It should also support all the things that programmers have come to rely upon from today’s programming languages: collections, user interfaces, networking, database connectivity, and so on. Programmers don’t choose languages solely based on the language itself. For instance, much of the power of Java comes not only from the language, but from the hundreds and hundreds of frameworks and APIs available for Java programmers to choose from. It’s not the Java language they are buying into, but the entire Java platform. MPS will also have a supporting platform of its own.

Before I get into the specifics, let’s talk briefly about frameworks. What is a framework? In mainstream programming, it usually means a set of classes and methods packaged up into a class library. Let’s look a little closer at this and see what we can see through the lens of LOP.

Why do we want to package up classes and methods into libraries? Most programmers would recite what their professors once told them and say, “Reuse.” But that just leaves another question in its place. Why do we want to reuse some set of classes? The answer is because the set of classes is useful for solving certain types of problems, like making GUIs, or accessing databases, or whatever. You might say that a class library corresponds to some domain. Lo and behold, we see the connection. Class libraries are wannabe DSLs! This sad fact really frustrates me.

Domain-specific languages exist today in the form of class libraries, except they aren’t languages, have none of the advantages of languages, and have all the limitations of classes and methods. Specifically, classes and methods are immediately tied to a specific runtime behavior which can’t be modified or extended, because that behavior is defined by the concepts of 'class' and 'method'. Because they are not languages, class libraries are rarely supported intelligently by the environment (compiler and editor, for example).

Should we be stuck with wannabe DSLs, or should we have the freedom to use a real DSL when a DSL is called for? Freedom, of course. Any class library is a good candidate for creating a full-fledged DSL for our platform. For example, all the libraries in the JDK should be DSLs for the MPS platform. Some of these DSLs are not so critical at the outset, but others will have a big impact on the power and reusability of the platform right from the beginning. I want to talk briefly about the three most important platform languages that will be provided with MPS: The Base Language, the Collection Language, and the User Interface Language.

Base Language

The first thing we need is a language for the simplest programming domain, which is general-purpose imperative programming. This simple language would support such nearly-universal language features as arithmetic, conditionals, loops, functions, variables, and so on. In MPS we have such a language, which is called the Base Language.

The need for such a language should be clear. For example, if we want to add two numbers together, we should be able to say ‘a + b’, as simple as that. We won’t need to use it everywhere, but it will be needed in some part of nearly all programs, wherever it is the most appropriate tool for the job.

The Base Language is so named because it is a good foundation for many languages that need basic programming support like variables, statements, loops, etc. It can be used in three ways. You can extend it to create your own language based on it, you can reference its concepts in your programs, and you can generate your code to the Base Language. There will be various generators available to transform the Base Language into other languages like Java, C++, etc. Not every language needs to use the Base Language, of course, but it’s a good starting point in many cases.
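The third use, generating from one base model to several targets, might look roughly like the following Python sketch. The Var, Add, and Print concepts and the to_java and to_cpp generators are hypothetical and far simpler than a real Base Language.

```python
from dataclasses import dataclass


@dataclass
class Var:
    """Hypothetical base-language concept: a variable reference."""
    name: str


@dataclass
class Add:
    """Hypothetical base-language concept: the sum of two expressions."""
    left: object
    right: object


@dataclass
class Print:
    """Hypothetical base-language concept: print the value of an expression."""
    value: object


def expr(node) -> str:
    # Variables and addition happen to be written the same way in both targets.
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Add):
        return f"{expr(node.left)} + {expr(node.right)}"
    raise TypeError(f"unknown expression: {node!r}")


def to_java(stmt: Print) -> str:
    return f"System.out.println({expr(stmt.value)});"


def to_cpp(stmt: Print) -> str:
    return f"std::cout << ({expr(stmt.value)}) << std::endl;"


program = Print(Add(Var("a"), Var("b")))
print(to_java(program))
print(to_cpp(program))
```

The model says ‘a + b’ once, and each generator decides how that is spelled in its own target language.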


Sergey Dmitriev
JetBrains

Sergey Dmitriev is the cofounder and CEO of JetBrains Inc., makers of the IntelliJ IDEA Java IDE.
Sergey's personal website can be found at www.sergeydmitriev.com

Contact Sergey via email: dmitriev (at) jetbrains.com

:: onBoard issues ::
Issue #2
February 2005