Small & fast Docker images using GraalVM’s native-image

Adam Warski
SoftwareMill Tech Blog
7 min read · May 30, 2019


The JVM ecosystem has a lot of great traits, but small, cloud-deployment-friendly Docker images are not one of them. Dockerizing even a simple application results in an image of hundreds of megabytes, as it needs to contain all dependencies (jars) and the whole JVM. Can we do anything about it?

A small docker container

Turns out we can! GraalVM (recently released), apart from the new JIT compiler and polyglot support, also contains the native-image utility. As the name suggests, it can be used to build native images of applications, given a set of input jars and an entry class.

Such an image is a stand-alone executable, which contains a “partial” virtual machine, SubstrateVM. SubstrateVM includes only some components of a full JVM, such as thread scheduling and the garbage collector. However, many functionalities are left out: for example classes cannot be loaded dynamically, there’s no security manager, etc.

We can use native-image to build a Docker image containing our application, which will not only be much more compact, but will also start faster. The basic idea is simple:

  1. compile the Scala (Java/Kotlin/…) application to .jar files
  2. use native-image to generate a native binary
  3. create a docker image with the generated native binary

The docker image from step 3, apart from the executable application, only needs a very basic linux installation. No other dependencies are required, not even a JRE: everything’s included in the binary. Hence, a good candidate for the base image is alpine, which weighs only 5MB.

That all sounds too good to be true! Yes, there are gotchas:

  1. not all Java/bytecode features are supported. For example, usage of reflection requires additional configuration. This might be a much bigger problem for Java than for Scala, as in Scala reflection is almost unused.
  2. generating the native image is a time-consuming process. Hence, in exchange for a smaller docker image, we get longer compilation time. However, we can delegate this task to a CI server.
  3. native-image is at an “early adopter” stage (unlike GraalVM, for which a stable version is available)
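
To make the first gotcha concrete, here is a minimal sketch (the object and method names are hypothetical and not taken from the article's repository). Both methods behave the same on a regular JVM. Under native-image, the reflective variant is the kind of code that typically needs the target class registered through additional reflection configuration, because the class name is only known at run time:

```scala
object ReflectiveInstantiation {
  // Direct instantiation: the class is referenced statically, so an
  // ahead-of-time compiler can see that it is used.
  def direct(): java.util.ArrayList[String] =
    new java.util.ArrayList[String]()

  // Reflective instantiation: the class is looked up by name at run time,
  // which a closed-world static analysis cannot follow in general.
  def reflective(className: String): Any =
    Class.forName(className).getDeclaredConstructor().newInstance()
}
```

Code that sticks to the first style, as most Scala libraries do, is the easiest to compile to a native image.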

An example using Scala + sbt

Let’s look at an example! We’ll take a “Hello, world!” application, but complicate it a bit so that it’s not completely trivial: we’ll print the greeting from a background thread. For that, we’ll use the Monix library, which transitively pulls a couple of dependencies: reactive-streams, cats-core, monix-eval, and so on. As a result, we’ll get a bunch of jars on the classpath.

Here’s the whole code of our application (all of the code is available on GitHub):

When you run it, you should see two greetings, one from the background thread (task.start starts an asynchronously executing process), and the other from the main thread (as in line 9 we are evaluating task directly as well):

Hello, world from thread: scala-execution-context-global-11!
Hello, world from thread: main!
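
For reference, an application with this behaviour could look roughly like the sketch below. It assumes Monix 3.x on the classpath (`Task`, `Task#start`, `Fiber#join`, `runSyncUnsafe`); the exact code in the repository may differ, and only the package-less object name `Hello` is borrowed from the main class seen in the build log further down.

```scala
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global

object Hello {
  // Builds the greeting for whichever thread evaluates it.
  def greeting(): String =
    s"Hello, world from thread: ${Thread.currentThread().getName}!"

  // A lazy description of the side effect; nothing runs until it is evaluated.
  val task: Task[Unit] = Task(println(greeting()))

  def main(args: Array[String]): Unit = {
    val program = for {
      fiber <- task.start // start an asynchronously executing copy
      _     <- task       // evaluate the same task directly
      _     <- fiber.join // wait for the background copy to finish
    } yield ()

    // Block the main thread until both greetings have been printed.
    program.runSyncUnsafe()
  }
}
```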

Let’s first build a “traditional” Docker image. We’ll use sbt-native-packager and the default configuration. In sbt, this is:

// in project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.21")

// in build.sbt
lazy val core: Project = (project in file("core"))
  .enablePlugins(DockerPlugin)
  .enablePlugins(JavaServerAppPackaging)
  .settings(
    packageName in Docker := "docker-test"
  )

As a result of running sbt docker:publishLocal, we get an image of over 600MB, which takes about 2.2 seconds to run (on a 2018 MBP):

$ docker images | grep docker-test
docker-test 0.1-SNAPSHOT 97a9755b70bf 4 seconds ago 647MB
$ time docker run docker-test:0.1-SNAPSHOT
Hello, world from thread: scala-execution-context-global-9!
Hello, world from thread: main!
real 0m2.272s
user 0m0.032s
sys 0m0.016s

There’s definitely room to improve! Now, let’s try doing the same using native-image. The first obstacle is that we can’t build the native image on the host machine, as in my case it’s macOS, and binaries built on macOS won’t run on alpine linux, or any other linux. Welcome to the native binaries world!

To solve this problem, we’ll have to build the native image in a host-system-agnostic way. We can use Docker for that, and build the native binary in a docker container! But first, we’ll need a container with GraalVM and native-image installed. Here’s the Dockerfile:

# Dockerfile
FROM oracle/graalvm-ce:19.0.0
WORKDIR /opt/graalvm
RUN gu install native-image
ENTRYPOINT ["native-image"]

# building the image
docker build -t graalvm-native-image .

And here’s the pseudo-command that we need to run, to build the native image:

docker run -it \
  -v HOST_CLASSPATH_DIRECTORY:/opt/cp \
  -v HOST_RESULT_DIRECTORY:/opt/graalvm \
  graalvm-native-image \
  -cp "/opt/cp/*" \
  --static \
  -H:Name=out \
  MAIN_CLASS

We’re mounting two volumes. The first should contain all the jars that our application depends on (mapped to /opt/cp), including the jar with the application itself. The second is a directory, to which the created native image will be written (/opt/graalvm).

Note: we’re using --static, so that the C library is linked statically into the binary. This is needed because alpine ships musl libc rather than glibc, which is what the binary produced by native-image would otherwise link against dynamically.

To automate the process, we’ll need to do some sbt/Scala programming! We’ll create a new task which, when run, will build the container with the native image:
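
As a rough outline, such a task could be declared as in the build-definition fragment below. The key name `dockerGraalvmNative` matches the task invoked later in this article; the description text and the placement inside the `core` project's settings are assumptions, and the three numbered steps are only placeholders for the logic described next.

```scala
// build.sbt (fragment)
lazy val dockerGraalvmNative =
  taskKey[Unit]("Builds a Docker image containing a native-image binary")

// These settings would be merged into the existing `core` project definition.
lazy val core: Project = (project in file("core"))
  .settings(
    dockerGraalvmNative := {
      val log = streams.value.log
      log.info("Running native-image using the 'graalvm-native-image' docker container")
      // 1. stage the jars and resolve the main class
      // 2. run native-image inside the container
      // 3. build the final image from the generated binary
    }
  )
```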

In the task implementation, we need to first create a directory containing all the dependency jars, and obtain the name of the main class:
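
A minimal sketch of that staging step, written as plain Scala so it can be read independently of sbt. The object and method names are hypothetical. Inside an sbt task, the jar list could come from `(packageBin in Compile).value` plus `(dependencyClasspath in Compile).value.map(_.data)`, and the optional main class from `(mainClass in Compile).value`; that wiring is an assumption, not necessarily what the repository does.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

object StageClasspath {
  // Copies every jar into `cpDir` (created if missing) and returns the staged files.
  // Jars sharing a file name would overwrite each other; a real build may need to rename them.
  def stageJars(jars: Seq[File], cpDir: File): Seq[File] = {
    Files.createDirectories(cpDir.toPath)
    jars.map { jar =>
      val target = new File(cpDir, jar.getName)
      Files.copy(jar.toPath, target.toPath, StandardCopyOption.REPLACE_EXISTING)
      target
    }
  }

  // Fails early with a readable message when no main class was detected.
  def requireMainClass(detected: Option[String]): String =
    detected.getOrElse(sys.error("Could not find a main class."))
}
```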

Then, we construct the command itself and start the container, which will build the native image:
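
One way to sketch this step is shown below, using only the Scala standard library (`scala.sys.process`). The argument list mirrors the command printed in the build log later in this article; the object and method names are hypothetical.

```scala
import java.io.File
import scala.sys.process.Process

object NativeImageCommand {
  // Assembles the `docker run` invocation that runs native-image in the container.
  def build(cpDir: File, resultDir: File, mainClass: String, resultName: String = "out"): Seq[String] =
    Seq(
      "docker", "run", "--rm",
      "-v", s"${cpDir.getAbsolutePath}:/opt/cp",         // jars, as seen by the container
      "-v", s"${resultDir.getAbsolutePath}:/opt/graalvm", // where the binary is written
      "graalvm-native-image",
      "-cp", "/opt/cp/*",
      "--static",
      s"-H:Name=$resultName",
      mainClass
    )

  // Runs the command and fails if it exits with a non-zero status.
  def run(command: Seq[String]): Unit = {
    val exitCode = Process(command).!
    if (exitCode != 0)
      sys.error(s"Command failed with exit status $exitCode: ${command.mkString(" ")}")
  }
}
```

Passing the arguments as a `Seq` means no shell is involved on the host, so the `/opt/cp/*` wildcard reaches native-image unexpanded, as in the logged command.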

Finally, we create the docker image which will contain the native image binary. It will be based on the following Dockerfile:

FROM alpine:3.9.4
COPY out /opt/docker/out
RUN chmod +x /opt/docker/out
CMD ["/opt/docker/out"]

To build the image, we need to provide the out file, which should be the binary built by the container before. We once again invoke the docker command:
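
A sketch of that last invocation, again in plain Scala with hypothetical names. The `docker build` arguments follow the command shown in the build log below; the check for the `out` file is an added safeguard, not something the article describes.

```scala
import java.io.File
import scala.sys.process.Process

object DockerBuild {
  // Assembles `docker build -t <image> -f <Dockerfile> <context>`.
  def command(imageName: String, dockerfile: File, contextDir: File): Seq[String] =
    Seq("docker", "build", "-t", imageName, "-f", dockerfile.getAbsolutePath, contextDir.getAbsolutePath)

  // Verifies the native binary is in the build context, then runs the build.
  def run(imageName: String, dockerfile: File, contextDir: File, binaryName: String = "out"): Unit = {
    require(
      new File(contextDir, binaryName).isFile,
      s"Expected the native binary '$binaryName' in ${contextDir.getAbsolutePath}"
    )
    val exitCode = Process(command(imageName, dockerfile, contextDir)).!
    if (exitCode != 0) sys.error(s"docker build failed with exit status $exitCode")
  }
}
```

The build context is the result directory itself, so the Dockerfile's `COPY out /opt/docker/out` picks up the freshly generated binary.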

And we’re ready! We can now invoke the task we’ve just written using sbt dockerGraalvmNative:

[info] Running native-image using the 'graalvm-native-image' docker container
[info] Running: docker run --rm -v /Users/adamw/projects/graalvm-tests/core/target/native-docker/stage/cp:/opt/cp -v /Users/adamw/projects/graalvm-tests/core/target/native-docker/stage/result:/opt/graalvm graalvm-native-image -cp /opt/cp/* --static -H:Name=out com.softwaremill.graalvm.Hello
[info] Build on Server(pid: 10, port: 39981)*
[info] [out:10] classlist: 13,446.92 ms
[info] [out:10] (cap): 1,439.55 ms
[info] [out:10] setup: 3,296.29 ms
[info] [out:10] (typeflow): 12,512.91 ms
[info] [out:10] (objects): 9,332.97 ms
[info] [out:10] (features): 480.57 ms
[info] [out:10] analysis: 23,033.36 ms
[info] [out:10] (clinit): 1,724.69 ms
[info] [out:10] universe: 2,107.40 ms
[info] [out:10] (parse): 839.70 ms
[info] [out:10] (inline): 2,087.42 ms
[info] [out:10] (compile): 8,056.96 ms
[info] [out:10] compile: 11,505.92 ms
[info] [out:10] image: 959.39 ms
[info] [out:10] write: 779.09 ms
[info] [out:10] [total]: 55,415.13 ms
[info] Building the container with the generated native image
[info] Running: docker build -t docker-graalvm-native-test -f /Users/adamw/projects/graalvm-tests/run-native-image/Dockerfile /Users/adamw/projects/graalvm-tests/core/target/native-docker/stage/result
[info] Sending build context to Docker daemon 10.46MB
[info] Step 1/4 : FROM alpine:3.9.4
[info] ---> 055936d39205
[info] Step 2/4 : COPY out /opt/docker/out
[info] ---> 563b8eee2ad3
[info] Step 3/4 : RUN chmod +x /opt/docker/out
[info] ---> Running in e222636c9c14
[info] Removing intermediate container e222636c9c14
[info] ---> caf4caf3d912
[info] Step 4/4 : CMD ["/opt/docker/out"]
[info] ---> Running in c8e44ffc73d6
[info] Removing intermediate container c8e44ffc73d6
[info] ---> 6c89703a6b9a
[info] Successfully built 6c89703a6b9a
[info] Successfully tagged docker-graalvm-native-test:latest
[info] Build image docker-graalvm-native-test
[success] Total time: 63 s, completed May 17, 2019 11:30:25 AM

It took a while, but now we have an image that is more than an order of magnitude smaller, and nearly twice as fast to run:

$ docker images | grep docker-graalvm-native-test
docker-graalvm-native-test latest 6c89703a6b9a 4 seconds ago 26.5MB
$ time docker run -it --rm docker-graalvm-native-test
Hello, world from thread: main!
Hello, world from thread: scala-execution-context-global-39!
real 0m1.238s
user 0m0.033s
sys 0m0.015s

The 1.2 seconds the application takes to run is almost purely Docker overhead. The binary itself runs almost instantly (try it — build a native image locally and run it directly!), but here we also need to start the container, execute it and clean up afterwards.

Summary

Summing up, we’ve reduced the size of the docker image from 647MB to 26.5MB, and decreased the time it takes our application to run from 2.2s to 1.2s, which is the overhead imposed by Docker. Building the image takes some time and brings with it some restrictions on the JVM features that we might use: anything that dynamically manipulates the bytecode, scans the classpath or inspects classes using reflection, will be problematic.

Luckily, it’s usually a good idea to avoid reflection and classpath scanning anyway — and applications written using Scala and Scala libraries rarely rely on it. Instead, Scala applications use compile-time mechanisms, which not only have better runtime performance, but also provide better type-safety and detect problems earlier. For Java, frameworks need to be adapted, and the build process further customised. Quarkus is one of the efforts in that direction.

If you’d like to experiment with the above, all of the code is available on GitHub. Enjoy!


Software engineer, Functional Programming and Scala enthusiast, SoftwareMill co-founder