Craties Go to J on the Beach

2018-06-28, by Andrei Dan

Last month, three of us flew out for J on the Beach in Málaga, a conference for bringing "developers and DevOps together around Big Data."

Meet us

Meet the craties: Christian, Marios, and Andrei

This post was co-written by Marios and Andrei.

Talks

In this post, we want to share some of the best talks we attended at the conference.

Unfortunately, not all of the videos are up yet. But we will update this post with videos, if and when they go up.

Asynchronous Programming with Kotlin

Kotlin is a statically typed programming language that runs on the Java Virtual Machine (JVM) and can also be compiled down to JavaScript. In this talk, Hadi Hariri explained Kotlin's approach to asynchronous programming.

This talk was 100% code and detailed how Kotlin can be used to replace CompletableFuture, handle Thread lifecycles, thread access to common data structures, and so on.

We got the impression that for pure asynchronous programming, Kotlin is definitely a language to consider.

Good Ideas That We Forgot

Joe Armstrong gave a talk that was a fun look back at the good ideas in software engineering that have been forgotten.

One of his core ideas was that we're often replacing things that work just fine because we like things that are new and shiny. (At least, we like building new and shiny things.) But these replacements are often incomplete or inferior in some respect. And yet we stick with them because they're new (new is better, right?) and we try to patch or work around the issues. Sometimes, Joe says, a better approach is to look backwards.

One of his examples was the modern web and how multiple factors are to blame for worsening overall experiences.

We didn't agree with all of it though.

For instance, he advocates for using dear old GNU Make to build any and all languages. He had some interesting points, but ultimately, we prefer domain-specific tools (like Maven and Gradle) and the benefits they can offer.

Joe had a few recommendations that we will pass along to you.

Three great books to read:

Two papers to read:

Four old tools to learn:

Cluster Consensus: When Aeron Met Raft

In this talk, Martin Thompson introduced us to Aeron-Raft.

Aeron-Raft is an implementation of the Raft consensus algorithm on top of Aeron, an open-source, blazing-fast UDP unicast and multicast message transport.

Raft is an alternative to Paxos, with the goal of being simple and understandable.

Unfortunately, even though Paxos works, it is very hard to understand. Even Leslie Lamport, the creator, sort of admitted this. He wrote a second paper called Paxos Made Simple which was a (not so successful) attempt to clear up some of the confusion.

So, putting these technologies together, Aeron-Raft can achieve consensus about which messages were sent and delivered.

Something we found particularly interesting was how they drew inspiration from CPU parallelization to make Raft faster. Specifically, CPUs do instruction pipelining. And this inspired them to come up with "consensus pipelining," where microinstructions (propose, log, transmit, commit, execute) are pipelined.
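To get a feel for why pipelining helps, here is a minimal Python sketch of the general idea. It is not Aeron's implementation. It assumes that each of the five stages named above takes exactly one time step and that a stage can work on only one message at a time. The function names are ours, made up for illustration.

```python
# Toy model of pipelining (our assumption: every stage costs one time step).
STAGES = ["propose", "log", "transmit", "commit", "execute"]


def sequential_steps(num_messages, num_stages=len(STAGES)):
    """Steps needed when each message finishes all stages before the next starts."""
    return num_messages * num_stages


def pipelined_steps(num_messages, num_stages=len(STAGES)):
    """Steps needed when a new message can enter the pipeline on every step."""
    if num_messages == 0:
        return 0
    return num_stages + (num_messages - 1)


def pipeline_schedule(num_messages):
    """Return, for each time step, a dict mapping stage name to message index."""
    schedule = []
    for step in range(pipelined_steps(num_messages)):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            message = step - stage_index  # message i reaches stage s at step i + s
            if 0 <= message < num_messages:
                active[stage] = message
        schedule.append(active)
    return schedule
```

Under these assumptions, 100 messages need 500 steps when handled one at a time, but only 104 steps when pipelined, because every stage stays busy once the pipeline has filled up.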

This talk also highlighted some interesting shortcomings of the Java language. Examples included the lack of a directory sync API, issues with ByteBuffer, and inheritance issues with MappedByteBuffer and DirectByteBuffer.

Automerge: Making Servers Optional for Real-Time Collaboration

Martin Kleppmann gave a very interesting talk on Automerge.

Automerge is a JavaScript library that allows you to work offline. When you go back online, the library can sync your changes to the cloud. But you can also sync with peers, making decentralized peer-to-peer collaboration possible.

Under the hood, Automerge uses JSON structures that keep track of the change history made by each user.

This same approach can also be used for distributed systems where multiple nodes make conflicting changes that must be reconciled to achieve consistency. In this context, it is called a conflict-free replicated datatype, or CRDT.
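To make the CRDT idea a bit more concrete, here is a minimal Python sketch of one of the simplest CRDTs, a grow-only counter. This is not Automerge's API or data model (Automerge works on JSON documents and is far more involved). The class and method names are ours, chosen for illustration.

```python
class GCounter:
    """Grow-only counter: each node only ever increases its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> total increments seen from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Keep the per-node maximum, so merges can happen in any order, any number of times."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Two replicas can increment independently while offline, then merge in either direction and end up with the same value. Merging the same state twice changes nothing, which is what makes this kind of structure safe to sync between peers without a central server.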

It was fun seeing how CRDTs are being used to solve a problem most of us are familiar with: collaborative document editing.

Infrastructure as Code With Terraform

Mitchell Hashimoto (creator of Vagrant!) gave a talk on Terraform.

Targeted at system administrators, Terraform is a tool that allows you to describe almost anything with an API using declarative configuration files that can be shared, collaborated on, and versioned, just like any other part of your software.

Terraform looks pretty cool because it allows you to centralize the automation of tools and services across your whole business using one tool. The example he gave in his talk was how at HashiCorp, they use Terraform to set up all the necessary accounts for employees who are being onboarded.

Predictive Elastic Database Systems

Rebecca Taft (from CockroachDB) gave a talk on elastic database systems that seems particularly relevant as more and more businesses move their apps to the cloud.

The gist of it is: scaling your database in and out at the right times can save you a lot of money, but getting that right is difficult.

Rebecca introduced us to two algorithms that have been designed to help.

The first one manages the actual scaling in and out of a clustered database. And the second is capable of predicting when to start the process. For example, you might want to start scaling out in anticipation of predictable peak traffic.
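As a rough illustration of the "start early" idea, here is a toy Python sketch. It is not either of the algorithms from the talk. It assumes you already have a load forecast per time step, a fixed capacity per node, and a known number of steps that a scale-out takes to complete. All names and parameters are hypothetical.

```python
import math


def nodes_needed(load, capacity_per_node):
    """Smallest node count that can serve the given load (at least one node)."""
    return max(1, math.ceil(load / capacity_per_node))


def plan_scale_out(predicted_load, capacity_per_node, current_nodes, lead_time):
    """Return (start_step, target_nodes) for the first scale-out, or None.

    predicted_load[t] is the forecast load at step t (an assumed input).
    lead_time is how many steps a scale-out takes to complete (also assumed).
    """
    for step, load in enumerate(predicted_load):
        target = nodes_needed(load, capacity_per_node)
        if target > current_nodes:
            # Start early enough that the new nodes are ready when the load arrives.
            return max(0, step - lead_time), target
    return None
```

For example, with two nodes that each handle 100 units of load, a forecast of `[100, 120, 150, 260, 300]`, and a lead time of two steps, the sketch suggests starting to scale to three nodes at step 1, two steps before the forecast first exceeds the current capacity.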

Both algorithms were implemented and tested as a part of her PhD.

Turning Software Into Computer Chips

Benedek Farkas gave a talk that introduced us to Hastlayer, a C# library that can transform normal code into a special bytecode that can, in turn, be run on any FPGA chip.

FPGA chips have been built to run certain kinds of algorithms (that can be parallelized internally) many times faster than on a normal CPU. Since cloud providers are starting to include FPGAs as a service in their clouds, Hastlayer looks like an interesting option to consider if you are looking to improve performance.

ArangoDB Datacenter-to-Datacenter Replication

Ewout Prangsma (from ArangoDB, naturally) gave a talk that walked us through their internal attempts to implement datacenter-to-datacenter replication.

The first thing they did was throw tools at the problem. They started out with Kafka, which brought ZooKeeper with it. This solved the problem, but the setup and the operational costs of these tools were too much. (Instead of managing one distributed system, users would now have to manage three! Eep.)

The next attempt (and the solution they stuck with) was a lot simpler and had no external dependencies. What they essentially did was replicate data over HTTP. The tool is called ArangoSync and does asynchronous replication.

A nice takeaway from this talk was that they deputized the customer who badly needed this feature as a product owner (or more precisely, “feature owner”). This customer drove most of the feature's design and also provided usability and functionality QA feedback with a very short feedback loop. Neat.

Wrap Up

This was a great conference! Marios reports that it was one of the best conferences he has ever attended. The quality of the talks was great, they had an impressive speaker lineup, and the overall organization of the event was well done!

Marios and Christian cooling down at the CrateDB booth