A couple of months ago my team had its first experience working with Java fibers, when we needed to make our main application work asynchronously.
In this three-part series, I will share my team’s experience and how we implemented and deployed Java fibers in production.
In the previous part (Part 1), we talked about what fibers are at a high level, how they compare to threads and why we started to explore them.
In this part we’ll look at fibers in more depth: how they differ from threads, how to create them, how to work with them, and the basic concepts of how they work.
Threads vs. Fibers
We looked for a reason not to stay with threads, so we researched the costs and performance penalties of working with threads vs. fibers.
We wanted to find proof that fibers can work better than threads, or at least shine in some areas. So we ran several tests and experiments to try to prove that, mostly around performance and scale.
I must say, the research did not yield the conclusive results I had hoped for, but it taught us a lot and enabled us to control the behavior of our service via a simple configuration flag.
In the end it really depends on your specific use case. In our case the performance differences between threads and fibers were minor, but we gained cleaner, more imperative code.
Performance
Measuring performance is always tricky, because the results depend on the conditions of the system the test runs on.
But let’s take a look at a standard benchmark test that does allow us to get a sense of the performance we could get by working with Fibers:
The thread ring problem.
In the thread ring problem, we create 500 threads (any other number works as well) and connect them in a ring (circle) structure, so that the last thread points to the first one.
Then we pass a message serially from one thread to the next, around the ring, 10,000 times. There are many ways to implement this; here is one example available on GitHub: https://github.com/vy/fiber-test It uses JMH, the de-facto framework for measuring micro-level performance on the JVM. It requires a little tweaking to run, but eventually, here are the results:
Environment and plan:
- Testing environment, laptop: Thinkpad X1, Core i7-7500U 2.70GHz (4 cores), Ubuntu 1.64
- 5 Warm Up iterations, 5 executions.
Results:
- Java Threads: 10.646 ops/s
- Fibers: 103.241 ops/s
Fibers show an improvement of almost 10x in this case. Nice, we have potential here!
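To make the ring structure concrete, here is a minimal sketch using plain Java threads. It is not the code from the linked repository and it is not a benchmark; the class name ThreadRing, the method runRing and the choice of one single-slot blocking queue per thread are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only: names and queue-based hand-off are assumptions,
// not the implementation used in the benchmark above.
final class ThreadRing {

    /**
     * Passes a token around a ring of `workers` threads for `hops` hand-offs
     * and returns the index of the worker that holds it at the end.
     */
    static int runRing(int workers, int hops) {
        if (workers < 1 || hops < 0) {
            throw new IllegalArgumentException("workers must be >= 1 and hops >= 0");
        }
        // One single-slot inbox per worker; worker i forwards to worker (i + 1) % workers.
        List<BlockingQueue<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            inboxes.add(new ArrayBlockingQueue<Integer>(1));
        }
        BlockingQueue<Integer> finished = new ArrayBlockingQueue<Integer>(1);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            final int id = i;
            final BlockingQueue<Integer> inbox = inboxes.get(i);
            final BlockingQueue<Integer> next = inboxes.get((i + 1) % workers);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int remaining = inbox.take();   // block until the token arrives
                        if (remaining == 0) {
                            finished.put(id);           // last holder reports itself
                        } else {
                            next.put(remaining - 1);    // hand the token to the neighbour
                        }
                    }
                } catch (InterruptedException e) {
                    // Interrupted by runRing once the token has stopped: exit quietly.
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            inboxes.get(0).put(hops);                   // inject the token at worker 0
            int lastHolder = finished.take();           // wait for the final hand-off
            for (Thread t : threads) t.interrupt();     // stop all workers
            for (Thread t : threads) t.join();
            return lastHolder;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("ring interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Token stopped at worker " + runRing(500, 10_000));
    }
}
```

Because the token starts at worker 0 and moves one position per hand-off, it ends at worker hops % workers. Every hand-off here parks one thread and wakes another, which is exactly the cost a fiber implementation tries to reduce.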
perf
Next, we put our newly refactored application under perf to see whether we gained any improvement in metrics such as branch prediction, CPU utilization and page faults. If you are unfamiliar with perf, it is a Swiss-army-knife Linux profiler for almost everything. Our refactored application had a flag to choose whether to run with fibers or threads. The following shows the differences between the 2 runs.
We issued the following command to run perf:
perf stat -p 90607 -a sleep 300
Environment and plan:
- Testing environment, server: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, CentOS 7.2.1511
- 40 cores
- 128 GB RAM
- Production traffic, +/- 300 QPS
- 5 minutes sampling.
Results:
Method/Metric | cpu-clock (msec) | context-switches | cpu-migrations | page-faults | cycles | instructions | branches | branch-misses |
Fibers | 2221610.273407 | 19,480,439 | 2,957,727 | 485,777 | 5,183,396,393,765 | 5,867,959,049,330 | 1,211,078,813,490 | 20,758,136,733 |
Threads | 2076849.036113 | 19,361,672 | 3,122,642 | 468,374 | 4,826,607,131,964 | 5,518,275,491,102 | 1,141,887,811,701 | 20,030,907,473 |
The results are not conclusive. One reason for this is the behavior of our application: it has long business-logic decisions and methods that consume a lot of CPU and do not block often enough for fiber switching to pay off. In the parts where it does block, threads do a similar job, so the outcome stays similar. If we had reduced the number of threads in the ForkJoinPool (see Part 3), the results might have favored fibers more clearly, but we couldn’t, because our code is so CPU-heavy.
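To see how close the two runs are, the raw counters in the table above can be turned into relative differences and a couple of derived ratios. The following is a small sketch of that arithmetic; the class and method names are made up for this example, and the numbers are copied from the table.

```java
// Illustrative arithmetic on the perf counters from the table above.
// PerfDelta, percentDelta and ratio are hypothetical names for this sketch.
final class PerfDelta {

    /** Relative change of the fibers run versus the threads run, in percent. */
    static double percentDelta(double fibers, double threads) {
        return (fibers - threads) / threads * 100.0;
    }

    /** Plain ratio, used for instructions per cycle and branch-miss rate. */
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        String[] names = {"cpu-clock (msec)", "context-switches", "cpu-migrations", "page-faults"};
        // Each row is {fibers, threads}.
        double[][] values = {
            {2221610.273407, 2076849.036113},
            {19_480_439, 19_361_672},
            {2_957_727, 3_122_642},
            {485_777, 468_374},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-18s %+.2f%%%n", names[i], percentDelta(values[i][0], values[i][1]));
        }

        // Instructions per cycle: instructions / cycles.
        System.out.printf("IPC              fibers=%.3f threads=%.3f%n",
                ratio(5_867_959_049_330L, 5_183_396_393_765L),
                ratio(5_518_275_491_102L, 4_826_607_131_964L));

        // Branch-miss rate: branch-misses / branches, in percent.
        System.out.printf("branch-miss rate fibers=%.2f%% threads=%.2f%%%n",
                100.0 * ratio(20_758_136_733L, 1_211_078_813_490L),
                100.0 * ratio(20_030_907_473L, 1_141_887_811_701L));
    }
}
```

With these values, context switches differ by less than 1 percent and cpu-clock by roughly 7 percent, while instructions per cycle and the branch-miss rate are nearly identical, which matches the "not conclusive" reading.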
Scale
Next, we compared how many threads vs. fibers an application can sustain.
To demonstrate the burden on the OS of creating thousands of threads vs. the same number of fibers, we did the following:
Let’s see what happens when we try to run a small program that creates 100k threads:
import java.util.concurrent.TimeUnit;

public class Monster extends Thread {

    @Override
    public void run() {
        // Each thread prints its id and sleeps, forever.
        while (true) {
            System.out.println("I am monster running on: " + Thread.currentThread().getId());
            try {
                TimeUnit.MILLISECONDS.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Try to start 100k native threads.
        for (int i = 0; i < 100000; i++) {
            Monster monster = new Monster();
            monster.start();
        }
    }
}
This is the output you will see when you run it:
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at Test1.main(Test1.java:19)
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f6493563000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
Oops, it died.
The JVM was unable to allocate and create more native threads due to memory limitations.
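A rough back-of-the-envelope calculation shows why 100k native threads is a lot to ask for. The sketch below assumes a stack size of 1 MiB per thread, which is a common HotSpot default on 64-bit Linux and can be changed with -Xss; the stack size on the machine used above is not stated, so treat the figure as an estimate. The class and method names are invented for this example.

```java
// Rough estimate of the stack address space needed for many native threads.
// ASSUMPTION: 1 MiB of stack per thread; the real value depends on the JVM and -Xss.
final class StackEstimate {

    /** Bytes of stack space reserved for `threads` threads of `stackBytes` each. */
    static long reservedStackBytes(long threads, long stackBytes) {
        return Math.multiplyExact(threads, stackBytes);
    }

    /** Converts bytes to GiB (1 GiB = 1024^3 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long assumedStackBytes = 1024L * 1024L;   // assumed 1 MiB per thread
        long total = reservedStackBytes(100_000, assumedStackBytes);
        System.out.printf("100000 threads x %d bytes = %.2f GiB of stack%n",
                assumedStackBytes, toGiB(total));
    }
}
```

Under that assumption the stacks alone ask for close to 100 GiB of address space. Not all of it is committed up front, and OS limits on the number of threads can also kick in first, so the exact point of failure varies from machine to machine.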
Now, let’s write a similar fiber version of this:
import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.fibers.SuspendExecution;
import co.paralleluniverse.strands.Strand;
import co.paralleluniverse.strands.SuspendableRunnable;

public class Monster implements SuspendableRunnable {

    @Override
    public void run() throws SuspendExecution, InterruptedException {
        while (true) {
            System.out.println("I am monster running on: " + Thread.currentThread().getId());
            try {
                Strand.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Start 100k fibers instead of 100k native threads.
        for (int i = 0; i < 100000; i++) {
            new Fiber<Void>(new Monster()).start();
        }
    }
}
Running the above works like a charm 🙂
Now that we have a sense of what fibers offer in terms of performance and scale, let’s see how to create and work with them.
Hello world – how do we create a simple fiber?
To simplify our lives, fibers are implemented in a very similar fashion to Java threads. They expose a Runnable-like functional interface and are started and terminated in the same way.
In fact, the implementation represents both through a shared abstraction named Strand, which can be either a thread or a fiber.
This means you can design the system based on Strands, and decide by configuration whether to run on fibers or normal Java threads.
At the moment, fibers are available through an external library (Quasar), with the intention of making them part of the JVM (see: Project Loom).
Fibers require the code to be instrumented. Instrumentation injects (patches) extra bytecode instructions into the classes that the Java compiler produces from your sources.
There are 2 ways to do it:
- Via dynamic instrumentation, by adding a -javaagent parameter to the VM parameters
- Using static instrumentation, by building the bytecode with instrumentation using build tools such as Ant/Maven
So first, to add support for fibers in your project, add the following to your Maven pom.xml:
<dependency>
    <groupId>co.paralleluniverse</groupId>
    <artifactId>quasar-core</artifactId>
    <version>0.7.10</version>
</dependency>
<dependency>
    <groupId>co.paralleluniverse</groupId>
    <artifactId>quasar-actors</artifactId>
    <version>0.7.10</version>
</dependency>
<dependency>
    <groupId>co.paralleluniverse</groupId>
    <artifactId>quasar-galaxy</artifactId>
    <version>0.7.10</version>
</dependency>
<dependency>
    <groupId>co.paralleluniverse</groupId>
    <artifactId>quasar-reactive-streams</artifactId>
    <version>0.7.10</version>
</dependency>
To make methods in our code “fiber friendly”, we need to annotate them with the @Suspendable annotation or declare them to throw SuspendExecution.
This tells Quasar where a fiber may be suspended, so that those methods get instrumented.
Let’s write a simple program that runs a single fiber and calls 2 methods that print some output and sleep:
import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.fibers.SuspendExecution;
import co.paralleluniverse.fibers.Suspendable;
import co.paralleluniverse.strands.Strand;

import java.util.concurrent.TimeUnit;

public class Demo1 extends Fiber<Void> {

    @Suspendable
    void method1() throws SuspendExecution, InterruptedException {
        System.out.println("Hello from method1, run count: " + Fiber.currentFiber().getCurrentRun());
        Fiber.sleep(100, TimeUnit.MILLISECONDS);
    }

    @Suspendable
    void method2() throws SuspendExecution, InterruptedException {
        System.out.println("Hello from method2, run count: " + Fiber.currentFiber().getCurrentRun());
        Fiber.sleep(100, TimeUnit.MILLISECONDS);
    }

    @Override
    protected Void run() throws SuspendExecution, InterruptedException {
        System.out.println("Running fiber: " + Fiber.currentFiber().getName());
        method1();
        method2();
        return super.run();
    }

    public static void main(String[] args) throws Exception {
        new Demo1().start();
    }
}
To run it we use the following command:
java -classpath . -javaagent:quasar-core-0.7.10.jar -Dco.paralleluniverse.fibers.detectRunawayFibers=false -Dco.paralleluniverse.fibers.verifyInstrumentation=false Demo1
Output:
Running fiber: fiber-10000001
Hello from method1, run count: 1
Hello from method2, run count: 2

Process finished with exit code 0
We can see that a suspension occurred between method1 and method2: the run count increased by one on the second print, which implies that the fiber was suspended and then resumed by the scheduler.
What went on? Why was the run count increased? Here is a step by step tracing:
- Fiber is first launched and started.
- run method is running.
- method1 is being called (run count = 1)
- Fiber.sleep is being called, the fiber stops
- The fiber scheduler is running and re-schedules this fiber.
- run is running again
- Instrumentation now jumps to method2
- method2 is being called (run count = 2)
- Fiber.sleep is being called, the fiber stops
- The fiber scheduler is running and re-schedules this fiber.
- Instrumentation now jumps to after method2
- Fiber ends.
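The trace above can be modeled with a toy state machine in plain Java. This is only a conceptual sketch of the "run again and jump to where we stopped" idea; it is not Quasar's generated bytecode or its scheduler, and every name in it (ToyFiberDemo, ToyFiber, resumePoint, runToCompletion) is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Conceptual model only: a hand-written state machine standing in for what
// instrumentation does automatically. Not Quasar's actual implementation.
final class ToyFiberDemo {

    static final class ToyFiber {
        private int resumePoint = 0;   // where to jump on the next run
        private int runCount = 0;      // how many times the scheduler has run this fiber
        final List<String> log = new ArrayList<>();

        /** Runs until the next suspension point; returns true once the fiber has finished. */
        boolean run() {
            runCount++;
            switch (resumePoint) {
                case 0:
                    log.add("Hello from method1, run count: " + runCount);
                    resumePoint = 1;   // "sleep": remember the position and yield
                    return false;
                case 1:
                    log.add("Hello from method2, run count: " + runCount);
                    resumePoint = 2;   // "sleep" again
                    return false;
                default:
                    log.add("Fiber ends, run count: " + runCount);
                    return true;
            }
        }
    }

    /** A one-queue toy scheduler: run a fiber, and re-queue it if it suspended. */
    static List<String> runToCompletion(ToyFiber fiber) {
        Deque<ToyFiber> runQueue = new ArrayDeque<>();
        runQueue.add(fiber);
        while (!runQueue.isEmpty()) {
            ToyFiber next = runQueue.poll();
            if (!next.run()) {
                runQueue.add(next);    // suspended: schedule it again
            }
        }
        return fiber.log;
    }

    public static void main(String[] args) {
        for (String line : runToCompletion(new ToyFiber())) {
            System.out.println(line);
        }
    }
}
```

Each call to run() corresponds to one scheduling of the fiber, which is why the counter is 1 inside method1 and 2 inside method2. The real library saves and restores the whole call stack, whereas this toy keeps just one integer.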
What’s next
In this part we got a handle on how to create simple fibers, and saw how they shine in performance and scale (in some scenarios) and how they can help us.
In the next part, we’ll dive deep into the structure of fibers and how they work behind the scenes in order to better understand them, and cover what we learnt when implementing them in production.