Will Project Loom obliterate Java Futures?

Adam Warski
SoftwareMill Tech Blog
19 min read · Jan 28, 2020


Project Loom is a proposal to add fibers and continuations as a native JVM construct. With a JDK release every 6 months, we’ll probably see it released (or some part of it) sooner rather than later.

Fibers are light-weight threads which can be created in large quantities without worrying about exhausting system resources. Fibers are going to change how we write concurrent programs in Java. Does that mean that asynchronous programming as we know it today, based on constructs such as Future and CompletableFuture, will become obsolete? Are Futures going away?

What’s Project Loom

Project Loom has three main goals: introducing continuations, fibers, and tail-call elimination. Continuations (also known as coroutines) are the technical, low-level means through which fibers are going to be implemented; tail-call elimination will be done in the future. Hence, let’s focus on fibers.

Is our future a big bundle of communicating fibers?

The execution of a fiber can be suspended at explicitly defined yield points. When a fiber yields, other fibers might be executed by the scheduler. Fibers are then resumed when a condition is met (e.g. a resource is available — such as a file being opened) and when the scheduler allocates a thread on which the fiber might run.

There can be an arbitrary number of fibers, running on a small thread pool, yielding cooperatively. Each fiber consumes only a small amount of memory, and context switches aren’t expensive.

When a blocking call is encountered while executing a fiber, it doesn’t block the underlying thread. Instead, the fiber is suspended until the requested resource (file, network connection, console) is available.

How 5 fibers might be executed on 3 threads

Switching fibers is a cheap operation, compared with the cost of a thread context switch. Fibers bring two main improvements over threads: first, they have a low memory footprint, meaning that they can be created in large quantities; second, suspending and resuming them are fast operations.
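
To get a feel for the scale, here’s a sketch of starting a million fibers. The Fiber.schedule call below is hypothetical, modelled loosely on early Loom prototypes; the final API may well look different:

    import java.util.concurrent.CountDownLatch;

    public class ManyFibers {
        public static void main(String[] args) throws InterruptedException {
            CountDownLatch done = new CountDownLatch(1_000_000);
            for (int i = 0; i < 1_000_000; i++) {
                // A million platform threads would exhaust memory; a million
                // fibers should be unproblematic. Fiber.schedule is hypothetical.
                Fiber.schedule(done::countDown);
            }
            done.await();
        }
    }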

However, Project Loom doesn’t stop here. What seems to be the hardest part is retrofitting existing blocking Java APIs so that they become fiber-aware. Whenever Java code does blocking I/O, waits on a lock/mutex or sleeps, it can now take advantage of the new mechanism. A blocking I/O call executed on a fiber will use non-blocking I/O under the hood.

This will make it possible for existing code to immediately benefit from the performance improvements typically available only to asynchronous APIs.

Does this mean that Loom’s fibers are the long-awaited answer to our concurrency problems?

Why we are doing async

Before we answer that question, let’s think for a while about why we started using asynchronous APIs at all. Performance is the first obvious answer: instead of blocking OS-level threads, which are expensive to create, run and switch between, we use non-blocking APIs, which let us utilise threads to a much greater degree. Hence, our code runs faster, with less thread-related overhead.

But that’s only part of the story. Asynchronous programming turns out to be a good model to deal with concurrency; or at least, a better one than purely synchronous code.

Numerous projects have shown that working directly with thread synchronization primitives (such as mutexes and locks) usually leads to deadlocks, thread starvation or other bugs. That’s why we need better abstractions.

One variant is using callbacks and working directly in continuation-passing style. However, this soon leads to callback hell. An alternative is for non-blocking APIs to return Futures (or Promises, as they are known in JavaScript), which can then be combined: sequenced or run in parallel. This model for taming concurrency is much more manageable and leads to fewer bugs, and hence better code.

Taking these ideas one step further, we have the actor model (known from Erlang and Akka), in which isolated processes communicate solely by message-passing. Asynchronous, non-blocking APIs also play a big role here; they allow scheduling message sends to actors when a given resource becomes available, or when an I/O call completes.

Hence, programming in the asynchronous style is not only more performant and leads to better utilisation of resources (e.g. through batching and backpressure), but it also gives us a better tool to model computations which happen concurrently.

What’s wrong with async

Using non-blocking, asynchronous functions doesn’t come without problems. If you’ve been doing “reactive” or non-blocking programming you are probably well aware of these. There are three main ones, mentioned by the Project Loom team lead, but also shared in other talks (e.g. this popular one by Tomek Nurkiewicz):

  1. lost control flow: a business process expressed by composing Futures using thenCompose/flatMap and friends is less readable than its procedural, synchronous and blocking alternative.
  2. lost context: as each callback/future runs on its own thread, the stack traces aren’t really informative. Quite the opposite.
  3. viral: once a method returns a Future/CompletableFuture, all methods which call it also have to return a Future (they could block, forcing the Future to a value, but that’s usually not desired).
Working and composing Futures in a simple business process
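
As a sketch, assuming hypothetical lookupUser/fetchOrder/sendInvoice services, such a Future-based composition might look like this:

    import java.util.concurrent.CompletableFuture;

    class User {}
    class Order {}
    class Receipt {}

    class Invoicing {
        // Hypothetical non-blocking service calls, stubbed for illustration.
        CompletableFuture<User> lookupUser(String id) { return CompletableFuture.completedFuture(new User()); }
        CompletableFuture<Order> fetchOrder(User u) { return CompletableFuture.completedFuture(new Order()); }
        CompletableFuture<Receipt> sendInvoice(Order o) { return CompletableFuture.completedFuture(new Receipt()); }

        // The business process: each step is glued together with thenCompose,
        // instead of reading as a sequence of plain statements.
        CompletableFuture<Receipt> process(String userId) {
            return lookupUser(userId)
                    .thenCompose(this::fetchOrder)
                    .thenCompose(this::sendInvoice);
        }
    }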

Some languages and libraries try to solve the above problems. For example, both C# and the Scala ZIO library provide rich stack traces, which transcend thread boundaries, and help to recover the lost context.

Another example is the async/await construct, which helps to mitigate the lost control flow problem. It is now native in JavaScript, available as a language feature in C# and Kotlin, and as a library feature in Scala.

Coloring functions

However, async/await constructs have a fundamental limitation, which is also an instance of problem #3: virality. The core of the problem is that a Future-based or asynchronous function can call both other asynchronous and synchronous functions. However, synchronous functions cannot call asynchronous functions.

This not only divides the world of functions into two types (also called “colors”: blue for synchronous, red for asynchronous; see this article & HN discussion), but also causes problems e.g. when working with higher-order functions.

It turns out that a simple collection.forEach(elementFunction) method might need to have two variants: one for when elementFunction is synchronous (blue) and one for when it’s asynchronous (red). In Scala 2’s async/await, await calls are not allowed in closures at all. A similar library for Scala 3, dotty-cps-async, allows awaiting in closures, if a “shifted” version of the function is provided externally. Kotlin takes a hybrid approach, where functions are still colored, but there’s no need to explicitly await on suspending functions. Go bypasses this problem entirely by making all functions asynchronous (red) — an approach which is not far from Loom, as we will see.
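
Here’s a minimal Java sketch of the two “colors” such a method might need; note how the async variant has to thread a CompletableFuture through the traversal:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.function.Consumer;
    import java.util.function.Function;

    class Colored {
        // "Blue" variant: a synchronous element function.
        static <T> void forEach(List<T> list, Consumer<T> f) {
            for (T t : list) f.accept(t);
        }

        // "Red" variant: an asynchronous element function. The traversal itself
        // must now also return a future, and sequencing is done via thenCompose.
        static <T> CompletableFuture<Void> forEachAsync(List<T> list,
                Function<T, CompletableFuture<Void>> f) {
            CompletableFuture<Void> acc = CompletableFuture.completedFuture(null);
            for (T t : list) {
                acc = acc.thenCompose(ignored -> f.apply(t));
            }
            return acc;
        }
    }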

How fibers solve the above problems

Fibers try to solve the above problems by making code synchronous again; or at least, by making code look as if it were doing synchronous calls.

First, we can once again use familiar constructs to express our business logic. This includes built-in flow control (if, while, for), as well as custom flow-control, defined using higher-order functions (e.g. forEach, map, filter). If we are running in a fiber, we can safely call “blocking” (fiber-suspending) code.

Using Loom, this might still be non-thread-blocking code
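
For example, a plain, blocking-style method like the one below, when run on a fiber, would suspend only the fiber at each I/O call, not the underlying thread (the business context is made up for illustration):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    class SyncLooking {
        // Looks synchronous and blocking; on a fiber, each "blocking" call
        // would suspend the fiber and free the carrier thread.
        byte[] fetch(String address) throws IOException {
            URL url = new URL(address);
            try (InputStream in = url.openStream()) { // fiber-suspending, not thread-blocking
                return in.readAllBytes();
            }
        }
    }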

Second, we regain control of the context. Thanks to the fact that fibers are natively supported on the JVM, stack traces will once again be meaningful by default. The context (call stack) of each fiber is preserved upon suspension and restored when the fiber resumes.

Finally, programming in the “synchronous style” doesn’t have to be viral. If we use a method which makes fiber-suspending, blocking calls, this doesn’t impose any programming-paradigm requirements on the users of our code.

In a way, the fibers from Project Loom are similar to the approach taken by Go. Everything is now a “red” method, as any method can suspend the fiber. A pragmatic and practical take; however, every rose has its thorns, as we’ll see!

Side note: how Loom fibers compare to Kotlin coroutines/Scala fibers

Kotlin already has coroutines, which are supported by the compiler; Scala libraries have fibers (see Monix and ZIO). Are they the same as the upcoming JVM Fibers?

No; and they can’t be — JVM fibers are implemented natively, and can leverage the fact that they have runtime support. On the other hand, coroutines in Kotlin are a compile-time mechanism; in Scala, fibers are a library mechanism.

If you use Kotlin’s coroutines, upon compilation the code is transformed (using the continuation-passing style (CPS) transform) to a callback-based variant; in a way, coroutines are a purely syntactic construct. However, there’s no magic, and a synchronous, blocking I/O or mutex call will remain thread-blocking (unlike in Loom).

Still, we can use nicer syntax for asynchronous calls, using an async/await-like construct. The CPS transform, and hence Kotlin’s coroutines, also have other possibilities, such as defining generators or stream processors.

Same example as before, but written using a coroutine (notice the suspend) and await() calls

As for Scala, there’s also no magic, especially as everything happens at the library level. Here, a fiber is an abstraction which makes it easier to write concurrent programs. The abstraction still denotes a light-weight thread, preferably composed from asynchronous, non-blocking operations (any blocking operation should be run in isolation on a dedicated thread pool).

In the Scala implementations, there are rich APIs for describing concurrently running processes and their interactions (and which can use fibers as one of the options). The syntax to define process composition is similar to that known from composing futures, and they are equally viral.

Both Scala and Kotlin fibers enqueue work units to a queue; a scheduler then runs them on a thread pool, thus allowing many fibers to run on a much smaller thread pool.
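
In (very) simplified form, the idea is no more than a queue in front of a small pool; real implementations additionally capture continuations, so that a suspended fiber can be resumed later:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class ToyFiberScheduler {
        // Many fibers, few threads: work units are queued onto a small pool.
        private final ExecutorService pool = Executors.newFixedThreadPool(3);

        void schedule(Runnable workUnit) {
            pool.submit(workUnit);
        }
    }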

While they are different constructs from the Loom fibers, Kotlin coroutines and Scala fibers will be able to leverage the native implementation. However, will this make sense? Quoting Daniel Spiewak:

In the performance tests that I’ve done, yes, Loom is a lot faster than a naive fiber scheduler based on opaque thread pooling, but you can reap massive performance benefits (often around an order of magnitude) in fiber libraries by making the thread scheduler fiber-aware. Loom actually defeats this optimization because it forces you into a generic executor-style interface. So Loom’s “native mechanism” is actually a lot slower than what can be achieved in higher level frameworks specifically because Loom is too low level to take advantage of what the higher level frameworks know about access patterns.

An interesting research area, trying to combine the composability of the asynchronous style with the simplicity of the blocking style, is the monadic-reflection project for Scala 3. It utilises Loom under the hood, as well as Scala 3’s context functions, and allows seamlessly converting between the two representations.

A fiber is still a thread

Going back to asynchronous programming: are Loom Fibers an answer to our concurrency problems, then? After all, as compared to Futures, they solve the problems of lost control flow, lost context and virality. Sadly, there’s more to writing concurrent programs than that!

A fiber is still a thread-like construct (in the proposal, that’s even expressed at the type level; the common superclass of Fiber and Thread is a Strand). That means that if we want to run multiple fibers concurrently, we’ll need some way of orchestrating them, just as with threads. The fibers need to be created at the right time, errors need to be handled etc. This is also reflected in the proposal: the authors write not only that continuations are a low-level construct, but that fibers are low-level as well.

Hence while fibers successfully implement the motto “codes like sync, works like async”, they fail to fulfill “concurrency made simple”.

Maybe concurrency is an inherently complex problem? If that’s the nature of concurrency, we’ll have to live with it. Concurrency will never be simple. Instead, we can try to make concurrency understandable.

But do we want code that looks like sync, works like async?

We’ve seen these promises before. People have been trying to implement “transparent” RPC (Remote Procedure Calls) multiple times. As Martin Kleppmann writes in his book “Designing Data-Intensive Applications”:

The RPC model tries to make a request to a remote network service look the same as calling a function or method in your programming language, within the same process (this abstraction is called location transparency). Although RPC seems convenient at first, the approach is fundamentally flawed. A network request is very different from a local function call: (…)

Doesn’t this sound familiar?

RPCs, and any network calls in general, have significantly different characteristics than a normal function call. They are unpredictable: they can fail arbitrarily, regardless of the values of the input parameters. They can take an arbitrary amount of time to complete, while a normal function either completes, fails or loops. And even if a network call seems to have failed, it might have still succeeded from the viewpoint of the other system.

People have fallen victim to false RPC abstractions multiple times. That’s why we have to be extra cautious with abstractions like Loom’s fibers. It’s tempting to treat everything as a synchronous call; but sometimes you have to resist the temptation.

Writing code in the asynchronous style might be harder; but that doesn’t mean that, once written, the code won’t be “better” under (subjective) metrics such as quality and understandability. The fact that a normal function call is syntactically distinct from an RPC call might even be an advantage for readability.

Orchestrating fibers

Leaving RPC aside, let’s go back to the problem of orchestrating fibers. We know they are a thread-like concept. But we still don’t want to use locks and mutexes (be it in the thread-switching or fiber-suspending variants) to coordinate how multiple fibers (or threads, or more generally strands) cooperate. We need higher-level constructs to achieve that. What might these constructs look like?

When designing the orchestration layer, our overall goal would be to hide fibers as much as possible. Since we want to treat them as a low-level tool, if the user of our hypothetical library sees a fiber, we’ve already lost.

The more you do concurrency without seeing fibers, threads or synchronization primitives, the better.

Let’s think about the operations we might want to perform. Some basic ones would be:

  • run a list of computations in parallel, collecting all results
  • run a list of computations in sequence, collecting all results
  • race two computations, returning the result of the first one that completes
  • retry a computation, according to a given retry schedule (with a maximum number of retries, maximum total time etc.)
  • specify a timeout for a computation

All of these require representing our computations in a lazy manner. The computations need to be lazy, as it’s the orchestration layer that decides when and how many times a particular computation is started.

In the async world, this lazy representation would be something like Callable<CompletableFuture<T>>. With fibers, it’s enough to have Callable<T>. Indeed, a talk on fibers by Loom’s lead developer contains a similar example:

From “Why continuations are coming to Java”. See also section on structured concurrency below.
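
To make this concrete, here’s a sketch of a “run in parallel, collecting all results” operator over lazy Callable<T> values; an ExecutorService stands in for a fiber-per-task scheduler, and error handling is omitted for brevity:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    class Par {
        // The orchestrator decides when each lazy computation is started.
        static <T> List<T> parAll(List<Callable<T>> computations,
                                  ExecutorService scheduler) throws Exception {
            List<Future<T>> running = new ArrayList<>();
            for (Callable<T> c : computations) {
                running.add(scheduler.submit(c)); // with Loom: one fiber per task
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : running) {
                results.add(f.get()); // "blocking", but fiber-suspending under Loom
            }
            return results;
        }
    }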

Once we have such a lazy representation, we might introduce more operators and leverage other benefits. For example, we can decouple the creation and execution orders of our computations. It doesn’t matter in what order Callable<T> instances are created: these representations are lazy, and their actual effects only take place once the computation is evaluated. Hence it only matters in what order the computations are composed.

The operators that we might want to add can cover error handling, threadpool-pinning, repeated evaluation, caching, and most importantly, safe resource allocation.

Cancellation and resource safety

Once we start running computations concurrently, be it using Futures, Loom’s fibers or our lazy computational wrappers, at some point we’ll face the problem of cancellation (also known as interruptibility).

For example, if we have two computations and want the result of whichever completes first, then once one of them completes, it doesn’t make any sense to continue evaluating the other. In other words, we need to cancel (interrupt) the computation that “lost”.

Similarly, if we run a number of computations in parallel, and one of them fails, we might want to cancel the others and re-throw the error.

Designing a cancellation system is a complex subject, mostly outside the scope of this article. There are a couple of approaches; as far as Project Loom is concerned, the good news is that cancellation is built in: a fiber can be safely canceled. The cancellation points are naturally determined by the places where blocking calls happen, that is, at yield points.

However, this raises questions about what happens to finalizers: blocks of code that need to run to e.g. clean up an allocated resource. Typically, these blocks of code must run whatever the outcome of the computation: whether it completes normally, with an exception, or is canceled.

It’s very tricky to design an interruption system. Examples of hard questions that need to be answered are: what happens when an exception is thrown inside a finalizer? How do these exceptions influence nested finalizers? What if a fiber is canceled in the middle of a finalizer? If cancellation takes a long time, can we cancel cancellation — e.g. impose a hard deadline, during which cancellation has to finish?

A first attempt at implementing finalizers with fibers could be to use try ... finally, or try-with-resources. Similarly to thread interruption, cancellation would inject an exception. However, this mechanism has a couple of downsides. If cancellation is exception-driven, the exception can be caught and recovered from. Do we want to allow this to happen?

Moreover, code that catches all exceptions is not uncommon. In this case, care must be taken to re-throw cancellation exceptions. That’s one of many examples that show why exceptions aren’t a good tool for control flow.
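
A sketch of the pitfall; doWork and log are hypothetical, and the point is the catch-all block:

    class CancellationPitfall {
        void handle() throws Exception {
            try {
                doWork(); // a "blocking", fiber-suspending call
            } catch (Exception e) {
                log(e);
                // Without this check, cancellation would be silently swallowed:
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }

        void doWork() throws InterruptedException { Thread.sleep(1000); }
        void log(Exception e) { System.err.println(e); }
    }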

Second, cancellation in this scenario can take an arbitrary amount of time. What if we want to stop cancellation, if it e.g. tries to communicate over the network (sending a bye message), but we are stopping the process because of a network error? Finally, do we want to be able to trigger cancellation from within the fiber, by throwing an exception?

These considerations might lead us to the conclusion that we need an external coordinator, which runs our computations. Such a coordinator would ensure that cancellation is done properly, running all finalizers as required, if necessary imposing restrictions on how long they take. In this scenario, cancellation is an out-of-band operation, independent of the exception mechanism. We’ve just re-invented composable cancellation.

Combining lazily evaluated computation representations with a coordinator which runs them, we arrive at a solution that is in many ways similar to our current, Future-based, asynchronous approach to dealing with concurrency.

A crucial Loom contribution, making interruption more practical, is the ability to cancel (interrupt) a fiber at any yield point. Combined with the retrofitted Java APIs, “blocking” calls can now be safely interrupted, as opposed to the current situation.

Better wrappers

While we might depart from using Futures/CompletableFutures, it turns out that if we want to avoid interacting with fibers directly (and we do), we still need a way to represent computations in a “wrapped” form.

Note that we’re still leveraging and using Loom: both as the fiber runtime, and to define what happens in a single fiber. We can use the control flow constructs we know and like, preserving the context, in a non-viral way, by writing synchronous-like code. However, to orchestrate the fibers, we still need to work with “wrapped” values.

However, just as Futures are a superior abstraction for concurrency compared to callbacks, the design that we’ve arrived at, lazily evaluated effect wrappers, is a superior abstraction to Futures. We’ve been using the very basic Callable<T> interface for now, but such abstractions are better known in the industry as the IO datatype.

An IO is a data structure, which describes a computation. It builds upon the Callable<T> representation, by adding more features and possibilities: the ability to integrate with asynchronous computations (using callbacks), and synchronous ones; describing error handling; describing forking and joining of fibers; representing concurrent computations, with canceling and finalization mechanisms; and more.

The difference between the two wrappers might seem small, but is in fact crucial: a Future is a running computation, which will at some point yield a result. An IO, by wrapping a function, describes how to run a computation which will at some point yield a result.
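
A minimal sketch of such an IO datatype, building on Callable<T>; real implementations (ZIO, cats-effect and friends) add error handling, forking, cancellation and much more:

    import java.util.concurrent.Callable;
    import java.util.function.Function;

    interface IO<T> {
        // Running the description is the only place where effects happen.
        T run() throws Exception;

        // Wraps a function: nothing executes until run() is invoked.
        static <T> IO<T> delay(Callable<T> thunk) {
            return thunk::call;
        }

        default <U> IO<U> map(Function<T, U> f) {
            return () -> f.apply(this.run());
        }

        default <U> IO<U> flatMap(Function<T, IO<U>> f) {
            return () -> f.apply(this.run()).run();
        }
    }

Unlike creating a CompletableFuture, creating an IO value has no side effects; this is exactly the referential transparency discussed below:

    IO<Void> print = IO.delay(() -> { System.out.println("hi"); return null; });
    IO<Void> twice = print.flatMap(x -> print); // prints twice, but only when run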

Another important feature of operating on the level of lazily evaluated computation descriptions is that we now have a referentially transparent representation. This has huge implications for the ability to reason about code: since we represent side-effecting computations as values, = has a single meaning. In more practical terms, refactoring side-effecting computations is just as safe as refactoring “normal” code.

This concept is not new, and comes directly from functional languages, such as Haskell and Scala, where these representations are often used. Find out more by reading about ZIO, Haskell’s IO, or by writing your own IO as described in the Functional Programming in Scala book.

Leaky abstraction

We started with asynchronous programming using callbacks; then we progressed to Futures, only to abandon them and use fibers. But not entirely: it turns out we might need a “wrapped” representation after all, just of a different kind.

Is our hybrid fiber+IO representation the one we’re looking for? Or maybe, since we need “hybrids” at all, we should use yet another approach to concurrency?

Martin Thompson is a well-known Java architect who thinks fibers are the wrong approach, since they are a leaky abstraction:

Instead, Martin proposes working with state machines, and points at the P language as a possible research area into how to make concurrent programs more understandable. Martin also points out the RPC fallacy:

And that fibers are just the start of the story (as we’ve already seen when discussing interruptions):

Others, however, disagree, pointing out that asynchronous APIs have proved hard to teach and work with:

And that fibers allow using familiar constructs to express sequencing, while introducing a new construct to express parallelism; as opposed to the current, Future-based situation:

I recommend reading the whole Twitter thread, although the branching factor of the discussion makes it hard to follow at times.

There are valid points on both sides. In the end, it might be a matter of picking the best approach for a particular job. When doing a simple service, small application or a quick script, I don’t want to deal with any kind of wrappers, be it Future or IO. Fibers and their “codes like sync, works like async” model will make my life much easier.

When writing a business application, I might want to use the synchronous-like API that Loom Fibers enable to express the business logic, using well-known constructs to express the control flow within a business process. However, to orchestrate concurrently running computations, handle errors and allocate resources, I’ll use an IO-like abstraction.

Finally, for a high-performance asynchronous system, I’ll probably take the fully-asynchronous approach, working with state machines, callbacks or Futures.

Fibers and actors

We shouldn’t forget actors! Actors provide a very approachable way of thinking about concurrency: even though they offer a single, fixed recipe for describing concurrent processes, they are a great tool for modelling them. For many problems, it’s natural to think about islands of state, which communicate only asynchronously and only with messages. How will they be impacted by the upcoming fibers?

Now that we can once again perform “blocking” operations in our code, should we do this in actors? Yes and no. Yes — that’s of course an option; as part of the message-handling logic, if the actor runs in a fiber (and it probably will), you’ll now be able to run blocking operations. Should you do it? That depends on what the actor does.

Actors themselves are a bit similar to fibers; their resume point is the message-handling code, and they get suspended after handling a single message. Suspending the actor additionally, whenever there’s a fiber-suspending call, might lead to different actor behavior.

Currently, if there’s e.g. a network call to be done when handling a message, typically the actor runs the appropriate method, which returns a Future. Then, the actor adds a listener on the future’s completion, to be notified when the future — and hence the network call — completes. The message handler completes and other messages might be handled — while the network call is executing concurrently.

If the network call was blocking, then the actor wouldn’t be able to process any messages in between starting and completing the network call. That’s a crucial difference! My suspicion is that most actor-based systems will want to retain the current non-blocking and asynchronous behavior, either by directly working with Futures/IOs, or by creating one by running the blocking call concurrently with the actor.
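
A sketch of the non-blocking variant of such a message handler; networkCall and tell are hypothetical, in the spirit of Akka’s pipe pattern:

    import java.util.concurrent.CompletableFuture;

    class MyActor {
        void onMessage(Object msg) {
            CompletableFuture<String> response = networkCall(msg);
            // Register a callback instead of blocking: the actor is free to
            // handle further messages while the call is in flight.
            response.whenComplete((result, error) ->
                    tell(error == null ? result : error));
        }

        // Hypothetical: a non-blocking call, and a send-to-own-mailbox method.
        CompletableFuture<String> networkCall(Object msg) { return CompletableFuture.completedFuture("ok"); }
        void tell(Object outcome) { /* enqueue to this actor's mailbox */ }
    }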

As a supplement to the discussion on cancellation, actors also run in a managed, coordinated environment. Actors form tree-like hierarchies, and their cancellation — stopping an actor — is coordinated and supervised. Hence we have another example of cancellation, which is implemented as a separate, out-of-band mechanism.

Fibers and reactive

Since we are on the subject of actors: what about reactive? In the context of relatively direct programming style (not architectural patterns), the main idea behind “reactive data processing” is that of handling and propagating backpressure. That is, the flow of data is controlled and adjusted to the rate at which various components handle it.

Here once again, to properly handle backpressure, we need a run-time coordinator. Like before, individual stages of a reactive data pipeline can take advantage of fibers and the fact that we can code in a synchronous-like style.

However, all concurrency-related and dataflow-related operations need to be performed at the stream level, using “wrapped” computation descriptions — no big changes here, as compared to the style we do reactive stream processing today (e.g. using Akka Streams, RxJava or “functional” fs2).

Structured concurrency

There’s yet another approach to concurrency, which was implemented both by Project Loom and Kotlin: structured concurrency. The basic idea is that the layout of the program (in its textual form) should correspond to its execution flow. This parallels the argument against using goto, in favor of higher-level constructs such as if, for, while etc.

With the structured concurrency approach, it’s not possible to just create a thread or a fiber as a side-effect and forget about it. All threads/fibers are scoped, and will be terminated (by waiting or interruption/cancellation) when the scope which created them exits. This, on the one hand, allows much better local reasoning (which is also an important trait of the lazy IO wrappers mentioned earlier), and promises to tame concurrency. But it also requires thinking about concurrency in a new way.

Kotlin’s coroutines allow programming in this style by creating coroutines in a scope; similarly, Loom provides scopes which delimit regions, in which fibers are created. After that region exits, fibers are terminated once they yield. While these are useful primitives, libraries will have to provide higher-level interfaces, to orchestrate concurrent processes.
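
The exact shape of Loom’s scope API is still in flux; as a sketch of the idea, a plain ExecutorService can stand in for a scope, with tasks unable to outlive the block that created it:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class Structured {
        void scoped() throws InterruptedException {
            ExecutorService scope = Executors.newCachedThreadPool();
            try {
                scope.submit(() -> compute("a")); // tasks are bound to the scope
                scope.submit(() -> compute("b"));
            } finally {
                // Leaving the block: wait for (or cancel) everything started
                // inside, so nothing silently outlives its lexical scope.
                scope.shutdown();
                if (!scope.awaitTermination(10, TimeUnit.SECONDS)) {
                    scope.shutdownNow(); // interrupt stragglers
                }
            }
        }

        String compute(String id) { return id; }
    }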

Scala’s ZIO/Monix also allow working in the “structured concurrency” style, by providing a number of combinators, which orchestrate how effects given as parameters are run: either in sequence, in parallel, with error handling, retries, rate-limited etc.

Summary

Code written in synchronous style, taking advantage of the fact that fibers can be suspended and resumed, is convenient for constructing side-effecting, but sequential and single-threaded programs. If you need to block to wait for something, and you can do it without syntactic overhead — great! That can be a boost to productivity and readability. (Plus, it’s quite possible that .get() on a Future will stop being a crime.)

However, fibers are not a solution for orchestrating concurrent flows. They don’t differ much from threads here. If you want to run two computations in parallel and combine their results, you still have to manage how the combination should be done (is it a race — should only the first value be returned, or both?), how to manage errors and interruption (what happens if one of the processes ends with an error?), timeouts etc.

Project Loom will definitely stir the status quo of asynchronous, non-blocking and concurrent programming in Java and on the JVM. New best practices will have to emerge, when it comes to sequencing effects, concurrency, parallelism and actor systems. Futures will probably stay, but there’s also a window of opportunity to upgrade the stack and use lazily evaluated IO wrappers, bringing even more functional programming to the Java world.

Hence while Project Loom won’t replace the way we do concurrency in Java, it might provide yet another tool in our toolbox. And it will definitely be misused, as every other construct out there. But there will also be room for new libraries, which combine the managed environments in which IO computation descriptions are safely interpreted, with the “codes like sync, works like async” of Loom’s fibers.
