Go and Rust — objects without class [LWN.net]



URL:https://lwn.net/SubscriberLink/548560/26d15e832d21a483/



May 1, 2013

This article was contributed by Neil Brown

Since the advent of object-oriented programming languages around the time of Smalltalk in the 1970s, inheritance has been a mainstay of the object-oriented vision. It is therefore a little surprising that both "Go" and "Rust", two relatively new languages which support object-oriented programming, manage to avoid mentioning it. Both the Rust Reference Manual and The Go Programming Language Specification contain the word "inherit" precisely once and the word "inheritance" not at all. Methods are quite heavily discussed, but inheritance is barely more than a "by the way".

This may be just an economy of expression, or it may be an indication of a sea change in attitudes towards object orientation within the programming language community. It is this second possibility which this article will consider while exploring and contrasting the type systems of these two languages.

The many faces of inheritance

While inheritance is a core concept in object-oriented programming, it is not necessarily a well-defined concept. It always involves one thing getting some features by association with some previously defined things, but beyond that languages differ. The thing is typically a "class", but sometimes an "interface" or even (in prototype inheritance) an "object" that borrows some behavior and state from some other "prototypical" object.

The features gained are usually fields (for storing values) and methods (for acting on those values), but the extent to which the inheriting thing can modify, replace, or extend these features is quite variable.

Inheriting from a single ancestor is common. Inheriting from multiple ancestors is sometimes possible, but is an even less well-defined concept than single inheritance. Whether multiple inheritance really means anything useful, how it should be implemented, and how to approach the so-called diamond problem all lead to substantial divergence among approaches to inheritance.

If we clear away these various peripheral details (important though they are), inheritance boils down to two, or possibly three, core concepts. Using one word ("inheritance") for all of them blurs them together, and it is this blurring, it would seem, that results in the wide variance among languages. And it is this blurring that is completely absent from Go and Rust.

Data embedding

The possible third core concept provided by inheritance is data embedding. This mechanism allows a data structure to be defined that includes a previously defined data structure in the same memory allocation. This is trivially achieved in C as seen in:

    struct kobject {
        char *name;
        struct list_head entry;
        ...
    };

where a struct list_head is embedded in a struct kobject. It can sometimes be a little more convenient if the members of the embedded structure (next and prev in this case) can be accessed in the embedding object directly rather than being qualified as, in this case, entry.next and entry.prev. This is possible in C11 and later using "anonymous structures".

While this is trivial in C, it is not possible in this form in a number of object-oriented languages, particularly languages that style themselves as "pure" object oriented. In such languages, another structure (or object) can only be included by reference, not directly (i.e. a pointer can be included in the new structure, but the old structure itself cannot).

Where structure embedding is not possible directly, it can often be achieved by inheritance, as the fields in the parent class (or classes) are directly available in objects of the child class. While structure embedding may not be strong motivation to use inheritance, it is certainly an outcome that can be achieved through using it, so it does qualify (for some languages at least) as one of the faces of inheritance.

Subtype polymorphism

Subtype polymorphism is a core concept that is almost synonymous with object inheritance. Polymorphic code is code that will work equally well with values from a range of different types. For subtype polymorphism, the values' types must be subtypes of some specified super-type. One of the best examples of this, which should be familiar to many, is the hierarchy of widgets provided by various graphical user interface libraries such as GTK+ or Qt.

At the top of this hierarchy for GTK+ is the GtkWidget which has several subtypes including GtkContainer and GtkEditable. The leaves of the hierarchy are the widgets that can be displayed, such as GtkEntry and GtkRadioButton.

GtkContainer is an ancestor of all widgets that can serve to group other widgets together in some way, so GtkHBox and GtkVBox — which present a list of widgets in a horizontal or vertical arrangement — are two subtypes of GtkContainer. Subtype polymorphism allows code that is written to handle a GtkContainer to work equally well with the subtypes GtkHBox and GtkVBox.

Subtype polymorphism can be very powerful and expressive, but is not without its problems. One of the classic examples that appears in the literature involves "Point and ColorPoint" and exactly how the latter can be made a subtype of the former — which intuitively seems obvious, but practically raises various issues.

A real-world example of a problem with polymorphism can be seen with the GtkMenuShell widget in the GTK+ widget set. This widget is used to create drop-down and pop-up menus. It does this in concert with GtkMenuItem which is a separate widget that displays a single item in a menu. GtkMenuShell is declared as a subtype of GtkContainer so that it can contain a collection of different GtkMenuItems, and can make use of the methods provided by GtkContainer to manage this collection.

The difficulty arises because GtkMenuShell is only allowed to contain GtkMenuItem widgets; no other sort of child widget is permitted. So, while it is permitted to add a GtkButton widget to a GtkContainer, it is not permitted to add that same widget to a GtkMenuShell.

If this restriction were to be encoded in the type system, GtkMenuShell would not be a true subtype of GtkContainer as it cannot be used in every place that a GtkContainer could be used — specifically it cannot be the target of gtk_container_add(myButton).

The simple solution to this is to not encode the restriction into the type system. If the programmer tries to add a GtkButton to a GtkMenuShell, that is caught as a run-time error rather than a compile-time error. To the pragmatist, this is a simple and effective solution. To the purist, it seems to defeat the whole reason we have static typing in the first place.
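The trade-off described above can be sketched in Go. This is only an illustration under stated assumptions: Widget, Button, MenuItem, Container, Box, MenuShell and ErrNotMenuItem are hypothetical stand-ins invented for this sketch, not the GTK+ API. The point is that the menu shell is accepted wherever a container is expected, and the restriction on its children only surfaces when the program runs.

```go
package main

import (
	"errors"
	"fmt"
)

// Widget, Button, MenuItem, Container, Box and MenuShell are hypothetical
// stand-ins for the GTK+ types named in the text; this is not the GTK+ API.
type Widget interface {
	Name() string
}

type Button struct{}

func (Button) Name() string { return "button" }

type MenuItem struct{ Label string }

func (m MenuItem) Name() string { return "menu item " + m.Label }

// Container promises, statically, that any Widget may be added.
type Container interface {
	Add(w Widget) error
}

// Box keeps that promise for every Widget.
type Box struct{ children []Widget }

func (b *Box) Add(w Widget) error {
	b.children = append(b.children, w)
	return nil
}

// ErrNotMenuItem reports a child that MenuShell refuses at run time.
var ErrNotMenuItem = errors.New("menu shell accepts only menu items")

// MenuShell has the same Add signature, so the compiler treats it as a
// Container, but the restriction on children is checked only at run time.
type MenuShell struct{ items []MenuItem }

func (m *MenuShell) Add(w Widget) error {
	item, ok := w.(MenuItem)
	if !ok {
		return ErrNotMenuItem
	}
	m.items = append(m.items, item)
	return nil
}

func main() {
	for _, c := range []Container{&Box{}, &MenuShell{}} {
		// Both calls compile; only the second returns an error.
		fmt.Printf("%T: %v\n", c, c.Add(Button{}))
	}
}
```

Returning an error is one possible way to report the failure; the pragmatist's approach in the text corresponds to any check of this kind that is deferred to run time.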

This example seems to give the flavor of subtype polymorphism quite nicely. It can express a lot of type relationships well, but there are plenty of relationships it cannot express properly; cases where you need to fall back on run-time type checking. As such, it can be a reason to praise inheritance, and a reason to despise it.

Code reuse

The remaining core concept in inheritance is code reuse. When one class inherits from another, it not only gets to include fields from that class and to appear to be a subtype of that class, but also gets access to the implementation of that class and can usually modify it in interesting ways.

Code reuse is, of course, quite possible without inheritance, as we had libraries long before we had objects. Doing it with inheritance seems to add an extra dimension. This comes from the fact that when some code in the parent class calls a particular method on the object, that method might have been replaced in the child object. This provides more control over the behavior of the code being reused, and so can make code reuse more powerful. A similar thing can be achieved in a C-like language by explicitly passing function pointers to library functions as is done with qsort(). That might feel a bit clumsy, though, which would discourage frequent use.
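The qsort() style of reuse, where the caller hands the library a function that customizes its behavior, might look like the following in Go. It is a small sketch rather than a recommendation: sortedCopy is a hypothetical helper written for this example, while sort.Slice is the standard library routine that accepts the comparison function.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy is a hypothetical helper: it reuses the library's sorting code
// while the caller supplies the ordering as a function value, much as a
// comparison function pointer is passed to qsort() in C.
func sortedCopy(words []string, less func(a, b string) bool) []string {
	out := append([]string(nil), words...) // leave the caller's slice untouched
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	words := []string{"pear", "fig", "banana"}

	byLength := sortedCopy(words, func(a, b string) bool { return len(a) < len(b) })
	alphabetical := sortedCopy(words, func(a, b string) bool { return a < b })

	fmt.Println(byLength)     // [fig pear banana]
	fmt.Println(alphabetical) // [banana fig pear]
}
```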

This code reuse may seem as though it is just the flip-side of subtype inheritance, which was, after all, motivated by the value of using code from an ancestor to help implement a new class. In many cases, there is a real synergy between the two, but it is not universal. The classic examination of this issue is a paper by William R. Cook that examines the actual uses of inheritance in the Smalltalk-80 class library. He found that the actual subtype hierarchy (referred to in the paper as protocol conformance) is quite different from the inheritance hierarchy. For this code base at least, subtypes and code reuse are quite different things.

As different languages have experimented with different perspectives on object-oriented programming, different attitudes to these two or three different faces have resulted in widely different implementations of inheritance. Possibly the place that shows this most clearly is multiple inheritance. When considering subtypes, multiple inheritance makes perfect sense as it is easy to understand how one object can have two orthogonal sets of behaviors which make it suitable to be a member of two super-types. When considering implementation inheritance for code reuse, multiple inheritance doesn't make as much sense because the different ancestral implementations have more room to trip over each other. It is probably for this reason that languages like Java only allow a single ancestor for regular inheritance, but allow inheritance of multiple "interfaces" which provide subtyping without code reuse.

In general, having some confusion over the purpose of inheritance can easily result in confusion over the use of inheritance in the mind of the programmer. This confusion can appear in different ways, but perhaps the most obvious is in the choice between "is-a" relationships and "has-a" relationships that is easy to find being discussed on the Internet. "is-a" reflects subtyping, "has-a" can provide code reuse. Which is really appropriate is not always obvious, particularly if the language uses the same syntax for both.

Is inheritance spent?

Having these three very different concepts all built into the one concept of "inheritance" can hardly fail to result in people developing very different understandings. It can equally be expected to result in people trying to find a way out of the mess. That is just what we see in Go and Rust.

While there are important differences, there are substantial similarities between the type systems of the two languages. Both have the expected scalars (integers, floating point numbers, characters, booleans) in various sizes where appropriate. Both have structures and arrays and pointers and slices (which are controlled pointers into arrays). Both have functions, closures, and methods.

But, importantly, neither has classes. With inheritance largely gone, the primary tool for inheritance, the class, had to go as well. The namespace control provided by classes is left up to "package" (in Go) or "module" (in Rust). The data declarations are left up to structures. The use of classes to store a collection of methods has partly been handed over to "interfaces" (Go) or "traits" (Rust), and partly been discarded.

In Go, a method can be defined anywhere that a function can be defined — there is simply an extra bit of syntax to indicate what type the method belongs to — the "receiver" of the method. So:

    func (p *Point) Length() float64 {
        return math.Sqrt(p.x * p.x + p.y * p.y)
    }

is a method that applies to a Point, while:

    func Length(p *Point) float64 {
        return math.Sqrt(p.x * p.x + p.y * p.y)
    }

would be a function that has the same result. These compile to identical code and when called as "p.Length()" and "Length(&p)" respectively, identical code is generated at the call sites.

Rust has a somewhat different syntax with much the same effect:

    impl Point {
        fn Length(&self) -> float {
            sqrt(self.x * self.x + self.y * self.y)
        }
    }

A single impl section can define multiple methods, but it is perfectly legal for a single type to have multiple impl sections. So while an impl may look a bit like a class, it isn't really.

The "receiver" type on which the method operates does not need to be a structure — it can be any type though it does need to have a name. You could even define methods for int were it not for rules about method definitions being in the same package (or crate) as the definition of the receiver type.
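As a minimal Go sketch of a method on a non-structure type, assume a hypothetical named type Celsius whose underlying type is float64. The name and the conversion method are invented for this illustration.

```go
package main

import "fmt"

// Celsius is a hypothetical named type; its underlying type is float64,
// not a structure.
type Celsius float64

// Fahrenheit is an ordinary method whose receiver is the named type.
func (c Celsius) Fahrenheit() float64 {
	return float64(c)*9/5 + 32
}

// A declaration such as
//
//	func (f float64) Fahrenheit() float64 { ... }
//
// is rejected by the compiler, because float64 is not defined in this package.

func main() {
	boiling := Celsius(100)
	fmt.Println(boiling.Fahrenheit()) // 212
}
```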

So in both languages, methods have managed to escape from existing only in classes and can exist on their own. Every type can simply have some arbitrary collection of methods associated with it. There are times though when it is useful to collect methods together into groups. For this, Go provides "interfaces" and Rust provides "traits".

    type file interface {
        Read(b Buffer) bool
        Write(b Buffer) bool
        Close()
    }

    trait file {
        fn Read(&self, b: &Buffer) -> bool;
        fn Write(&self, b: &Buffer) -> bool;
        fn Close(&self);
    }

These two constructs are extremely similar and are the closest either language gets to "classes". They are however completely "virtual". They (mostly) don't contain any implementation or any fields for storing data. They are just sets of method signatures. Other concrete types can conform to an interface or a trait, and functions or methods can declare parameters in terms of the interface or traits they must conform to.

Traits and interfaces can be defined with reference to other traits or interfaces, but it is a simple union of the various sets of methods.

    type seekable interface {
        file
        Seek(offset uint64) uint64
    }

    trait seekable : file { fn Seek(&self, offset: u64) -> u64; }

No overriding of parameter or return types is permitted.
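To see the "simple union" of method sets at work in Go, here is a small sketch. The names reader, seeker, seekableReader and memFile are hypothetical and deliberately simpler than the file and seekable declarations above. A value that conforms to the combined interface also conforms to each of its parts.

```go
package main

import "fmt"

// reader and seeker are hypothetical single-method interfaces.
type reader interface {
	Read() string
}

type seeker interface {
	Seek(offset uint64) uint64
}

// seekableReader is defined with reference to both; its method set is
// simply the union {Read, Seek}.
type seekableReader interface {
	reader
	seeker
}

// memFile is a hypothetical concrete type that implements both methods.
type memFile struct {
	data string
	pos  uint64
}

func (f *memFile) Read() string { return f.data[f.pos:] }

func (f *memFile) Seek(offset uint64) uint64 {
	if offset > uint64(len(f.data)) {
		offset = uint64(len(f.data)) // clamp to the end of the data
	}
	f.pos = offset
	return f.pos
}

func main() {
	var sr seekableReader = &memFile{data: "hello world"}
	sr.Seek(6)

	// Anything that satisfies the union also satisfies each part.
	var r reader = sr
	fmt.Println(r.Read()) // world
}
```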

Both languages allow pointers to be declared with interface or trait types. These can point to any value of any type that conforms to the given interface or trait. This is where the real practical difference between the Length() function and the Length() method defined earlier becomes apparent. Having the method allows a Point to be assigned to a pointer with the interface type:

    type measurable interface {
        Length() float64
    }

The function does not allow that assignment.
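Putting the earlier fragments together gives a complete program along the following lines. The Point fields x and y are taken from the method bodies shown earlier; the rest of the scaffolding is an assumption made so that the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"math"
)

type Point struct{ x, y float64 }

// Length as a method: it becomes part of the method set of *Point.
func (p *Point) Length() float64 {
	return math.Sqrt(p.x*p.x + p.y*p.y)
}

// Length as a plain function: same result, but it is not attached to the type.
func Length(p *Point) float64 {
	return math.Sqrt(p.x*p.x + p.y*p.y)
}

type measurable interface {
	Length() float64
}

func main() {
	p := Point{3, 4}

	// Allowed because *Point has a Length() float64 method.
	var m measurable = &p
	fmt.Println(m.Length()) // 5

	// The function gives the same answer when called directly...
	fmt.Println(Length(&p)) // 5

	// ...but it cannot make a value measurable: a type with no Length
	// method would be rejected in the assignment above.
}
```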

Exploring the new inheritance

Here we see the brave new world of inheritance. It is nothing more or less than simply sharing a collection of method signatures. It provides simple subtyping and doesn't even provide suggestions of code reuse or structure embedding. Multiple inheritance is perfectly possible and has a simple well-defined meaning. The diamond problem has disappeared because implementations are not inherited. Each method needs to be explicitly implemented for each concrete type so the question of conflicts between multiple inheritance paths simply does not arise.
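A short Go sketch of this kind of "multiple inheritance" follows. The interfaces named and sized and the type square are hypothetical. Both interfaces happen to require Name(), yet there is nothing to resolve, because the concrete type supplies exactly one implementation of each method.

```go
package main

import "fmt"

// named and sized are hypothetical, independently defined interfaces that
// overlap on one method signature.
type named interface {
	Name() string
}

type sized interface {
	Name() string
	Area() float64
}

// square conforms to both simply by having the required methods.
type square struct{ side float64 }

func (s square) Name() string  { return "square" }
func (s square) Area() float64 { return s.side * s.side }

func describe(n named) string { return "a " + n.Name() }

func total(items ...sized) float64 {
	sum := 0.0
	for _, it := range items {
		sum += it.Area()
	}
	return sum
}

func main() {
	s := square{side: 2}
	fmt.Println(describe(s))                // a square
	fmt.Println(total(s, square{side: 3})) // 13
}
```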

This requirement to explicitly implement every method for every concrete type may seem a little burdensome. Whether it is in practice is hard to determine without writing a substantial amount of code — an activity that current time constraints don't allow. It certainly appears that the developers of both languages don't find it too burdensome, though each has introduced little shortcuts to reduce the burden somewhat.

The "mostly" caveat above refers to the shortcut that Rust provides. Rust traits can contain a "default" implementation for each method. As there are no data fields to work with, such a default cannot really do anything useful and can only return a constant, or call other methods in the trait. It is largely a syntactic shortcut, without providing any real inheritance-like functionality. An example from the Numeric Traits bikeshed is

    trait Eq {
        fn eq(&self, other: &Self) -> bool { return !self.ne(other) };
        fn ne(&self, other: &Self) -> bool { return !self.eq(other) };
    }

In this example it is clear that the defaults by themselves do not provide a useful implementation. The real implementation is expected to define at least one of these methods to something meaningful for the final type. The other could then usefully remain as a default. This is very different from traditional method inheritance, and is really just a convenience to save some typing.

In Go, structures can have anonymous members much like those in C11 described earlier. The methods attached to those embedded members are available on the embedding structure as delegates: if a method is not defined on a structure it will be delegated to an anonymous member value which does define the method, providing such a value can be chosen uniquely.

While this looks a bit more like implementation inheritance, it is still quite different and much simpler. The delegated method can only access the value it is defined for and can only call the methods of that value. If it calls methods which have been redefined for the embedding object, it still gets the method in the embedded value. Thus the "extra dimension" of code reuse mentioned earlier is not present.
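The difference from implementation inheritance can be demonstrated with a small Go program. Base, Derived, Describe and Kind are hypothetical names chosen for this sketch. The promoted method keeps calling the embedded value's own methods, even though the embedding structure redefines one of them.

```go
package main

import "fmt"

// Base is a hypothetical type with two methods, one of which calls the other.
type Base struct{ ID int }

func (b Base) Kind() string     { return "a base" }
func (b Base) Describe() string { return "I am " + b.Kind() }

// Derived embeds Base as an anonymous member and redefines Kind.
type Derived struct {
	Base
}

func (d Derived) Kind() string { return "a derived" }

func main() {
	d := Derived{Base: Base{ID: 7}}

	fmt.Println(d.ID)     // 7: the embedded field is reachable directly
	fmt.Println(d.Kind()) // a derived: the method defined on Derived

	// Describe is delegated to the embedded Base, whose receiver is the Base
	// value alone, so the Kind it calls is Base's and not Derived's.
	fmt.Println(d.Describe()) // I am a base
}
```

In a language with conventional virtual methods the last call would typically report the derived kind; here the embedded value knows nothing about the structure that contains it.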

Once again, this is little more than a syntactic convenience — undoubtedly useful but not one that adds new functionality.

Besides these little differences in interface declarations, there are a couple of significant differences in the two type systems. One is that Rust supports parameterized types while Go does not. This is probably the larger of the differences and would have a pervasive effect on the sort of code that programmers write. However, it is only tangentially related to the idea of inheritance and so does not fit well in the present discussion.

The other difference may seem trivial by comparison — Rust provides a discriminated union type while Go does not. When understood fully, this shows an important difference in attitudes towards inheritance exposed by the different languages.

A discriminated union is much like a C "union" combined with an enum variable — the discriminant. The particular value of the enum determines which of the fields in the union is in effect at a particular time. In Rust this type is called an enum:

    enum Shape {
        Circle(Point, float),
        Rectangle(Point, Point)
    }

So a "Shape" is either a Circle with a point and a length (center and radius) or a Rectangle with two points (top left and bottom right). Rust provides a match statement to access whichever value is currently in effect:

    match myshape {
        Circle(center, radius) => io::println("Nice circle!"),
        Rectangle(tl, br) => io::println("What a boring rectangle")
    }

Go relies on interfaces to provide similar functionality. A variable of interface type can point to any value with an appropriate set of methods. If the types to go in the union have no methods in common, the empty interface is suitable:

    type void interface {
    }

A void variable can now point to a circle or a rectangle.

    type Circle struct {
        center Point
        radius float64
    }
    type Rectangle struct {
        top_left, bottom_right Point
    }

Of course it can equally well point to any other value too.

The value stored in a void pointer can only be accessed following a "type assertion". This can take several forms. A nicely illustrative one for comparison with Rust is the type switch.

    switch myshape.(type) {
    case Circle:
        printString("Nice circle!")
    case Rectangle:
        printString("What a boring rectangle")
    }
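A self-contained version of the same idea is sketched below. The describe function and the field values are assumptions made for the example. It also shows the two-value form of the type assertion, which tests for a single type without a switch, and the default case that is needed because an empty interface can hold anything.

```go
package main

import "fmt"

type Point struct{ x, y float64 }

type Circle struct {
	center Point
	radius float64
}

type Rectangle struct {
	topLeft, bottomRight Point
}

// describe is a hypothetical helper that accepts any value at all and uses a
// type switch to recover the concrete type.
func describe(shape interface{}) string {
	switch s := shape.(type) {
	case Circle:
		return fmt.Sprintf("circle of radius %g", s.radius)
	case Rectangle:
		return fmt.Sprintf("rectangle from %v to %v", s.topLeft, s.bottomRight)
	default:
		// Nothing stops a caller from passing something that is not a shape.
		return "not a shape"
	}
}

func main() {
	fmt.Println(describe(Circle{Point{0, 0}, 1}))
	fmt.Println(describe(42))

	// The two-value type assertion reports whether the stored value has the
	// requested type instead of panicking when it does not.
	var v interface{} = Rectangle{Point{0, 1}, Point{2, 0}}
	if r, ok := v.(Rectangle); ok {
		fmt.Println("bottom right x:", r.bottomRight.x)
	}
	_, isCircle := v.(Circle)
	fmt.Println("is a circle:", isCircle)
}
```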

While Rust can equally create variables of empty traits and can assign a wide variety of pointers to such variables, it cannot copy Go's approach to extracting the actual value. There is no Rust equivalent of the "type assertion" used in Go. This means that the approaches to discriminated union in Rust and Go are disjoint — Go has nothing like "enum" and Rust has nothing like a "type assertion".

While a lot could be said about the comparative wisdom and utility of these different choices (and, in fact, much has been said) there is one particular aspect which relates to the topic of this article. It is that Go uses inheritance to provide discriminated unions, while Rust provides explicit support.

Are we moving forward?

The history of programming languages in recent years seems to suggest that blurring multiple concepts into "inheritance" is confusing and probably a mistake. The approach to objects and methods taken by both Rust and Go seems to suggest an acknowledgment of this and a preference for separate, simple, well-defined concepts. It is then a little surprising that Go chooses to still blend two separate concepts (unions and subtyping) into one mechanism: interfaces.

This analysis only provides a philosophical objection to that blend and as such it won't and shouldn't carry much weight. The important test is whether any practical complications or confusions arise. For that we'll just have to wait and see.

One thing that is clear though is that the story of the development of the object-oriented programming paradigm is a story that has not yet been played out — there are many moves yet to make. Both Rust and Go add some new and interesting ideas which, like languages before them, will initially attract programmers, but will ultimately earn both languages their share of derision, just as there are plenty of detractors for C++ and Java today. They nonetheless serve to advance the art and we can look forward to the new ideas that will grow from the lessons learned today.


