Last month, three of us flew out for J on the Beach in Málaga, a conference for bringing "developers and DevOps together around Big Data."
This post was co-written by Marios and Andrei.
In this post, we want to share some of the best talks we attended at the conference.
Unfortunately, not all of the videos are up yet. But we will update this post with videos, if and when they go up.
Asynchronous Programming with Kotlin
We got the impression that for pure asynchronous programming, Kotlin is definitely a language to consider.
Good Ideas That We Forgot
One of Joe's core ideas was that we often replace things that work just fine because we like things that are new and shiny. (At least, we like building new and shiny things.) But these replacements are often incomplete or inferior in some respect. And yet we stick with them because they're new (new is better, right?) and try to patch or work around the issues. Sometimes, Joe says, a better approach is to look backwards.
One of his examples was the modern web and how multiple factors are to blame for worsening overall experiences.
We didn't agree with all of it though.
For instance, he advocates for using dear old GNU Make to build any and all languages. He had some interesting points, but ultimately, we prefer domain-specific tools (like Maven and Gradle) and the benefits they can offer.
Joe had a few recommendations that we will pass along to you.
Three great books to read:
- Algorithms + Data Structures = Programs, by Niklaus Wirth
- The Mythical Man-Month, by Frederick Brooks
- How to Win Friends and Influence People, by Dale Carnegie
Two papers to read:
- A Plea for Lean Software, by Niklaus Wirth
- The Emperor’s Old Clothes, by Tony Hoare (his Turing Award lecture)
Four old tools to learn:
Cluster Consensus: When Aeron Met Raft
Raft is an alternative to Paxos, with the goal of being simple and understandable.
Unfortunately, even though Paxos works, it is very hard to understand. Even Leslie Lamport, its creator, sort of admitted this. He wrote a second paper called Paxos Made Simple, which was a (not so successful) attempt to clear up some of the confusion.
So, putting these technologies together, Aeron-Raft can achieve consensus about which messages were sent and delivered.
Something we found particularly interesting was how they drew inspiration from CPU parallelization to make Raft faster. Specifically, CPUs do instruction pipelining. And this inspired them to come up with "consensus pipelining," where microinstructions (propose, log, transmit, commit, execute) are pipelined.
This talk also highlighted some interesting shortcomings of the Java language: for example, the lack of a directory sync API, issues with ByteBuffer, and inheritance issues with MappedByteBuffer and DirectByteBuffer.
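To get a feel for why pipelining helps, here is a small back-of-the-envelope sketch. It is not how Aeron implements anything. It assumes, purely for illustration, that each of the five stages takes exactly one "tick" and that a new message can enter the pipeline on every tick. The function names are our own.

```python
# The five stages named in the talk.
STAGES = ["propose", "log", "transmit", "commit", "execute"]


def sequential_ticks(num_messages, num_stages=len(STAGES)):
    """Ticks needed if each message finishes all stages before the next starts."""
    return num_messages * num_stages


def pipelined_ticks(num_messages, num_stages=len(STAGES)):
    """Ticks needed if a new message enters the pipeline on every tick."""
    if num_messages == 0:
        return 0
    # The first message needs num_stages ticks; each later one adds a single tick.
    return num_stages + (num_messages - 1)


def pipeline_schedule(num_messages):
    """Map each tick to the (message, stage) pairs that are active during it."""
    schedule = {}
    for msg in range(num_messages):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(msg + offset, []).append((msg, stage))
    return schedule


if __name__ == "__main__":
    print(sequential_ticks(100), pipelined_ticks(100))
    for tick, work in sorted(pipeline_schedule(3).items()):
        print(tick, work)
```

Under these idealized assumptions, 100 messages take 500 ticks one at a time but only 104 ticks when pipelined, because up to five messages are in flight at once.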
Automerge: Making Servers Optional for Real-Time Collaboration
Under the hood, Automerge uses JSON structures that keep track of the change history made by each user.
This same approach can also be used for distributed systems where multiple nodes make conflicting changes that must be reconciled to achieve consistency. In this context, it is called a conflict-free replicated datatype, or CRDT.
It was fun seeing how CRDTs are being used to solve a problem most of us are familiar with: collaborative document editing.
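If you haven't met CRDTs before, a tiny example helps. The sketch below is a grow-only counter (a "G-Counter"), one of the simplest CRDTs. It is not Automerge's data structure, which tracks a much richer change history; it only shows the core property that replicas can be merged in any order and still agree. The class and method names are our own.

```python
class GCounter:
    """Grow-only counter: each node only ever increments its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> highest count seen for that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Take the per-node maximum, so merging is commutative and idempotent."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


if __name__ == "__main__":
    alice, bob = GCounter("alice"), GCounter("bob")
    alice.increment(2)  # both replicas change while "offline"
    bob.increment(3)
    alice.merge(bob)    # sync in either order...
    bob.merge(alice)
    print(alice.value(), bob.value())  # ...and both replicas agree
```

Because merging takes the maximum per node, applying the same update twice or in a different order never changes the result. That is what lets replicas sync without a central server deciding who wins.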
Infrastructure as Code With Terraform
Targeted at system administrators, Terraform is a tool that lets you describe almost anything that has an API using declarative configuration files. Those files can be shared, collaborated on, and versioned, just like any other part of your software.
Terraform looks pretty cool because it allows you to centralize the automation of tools and services across your whole business using one tool. The example he gave in his talk was how at HashiCorp, they use Terraform to set up all the necessary accounts for employees who are being onboarded.
Predictive Elastic Database Systems
The gist of it is: scaling your database in and out at the right times can save you a lot of money, but getting that right is difficult.
Rebecca introduced us to two algorithms that have been designed to help.
The first one manages the actual scaling in and out of a clustered database. And the second is capable of predicting when to start the process. For example, you might want to start scaling out in anticipation of predictable peak traffic.
Both algorithms were implemented and tested as a part of her PhD.
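We don't have the details of Rebecca's algorithms, so the sketch below is not a reproduction of them. It is only a toy illustration of the "start early" idea, under our own assumptions: a load forecast per time step is already available, every node handles a fixed amount of load, and adding nodes takes `lead_time` steps to complete. All names and numbers are hypothetical.

```python
import math


def nodes_needed(load, capacity_per_node):
    """Smallest cluster size (at least one node) that can serve the load."""
    return max(1, math.ceil(load / capacity_per_node))


def plan_scaling(forecast, capacity_per_node, lead_time):
    """Return the target node count to request at each time step.

    At step t we look ahead lead_time steps and size the cluster for the
    highest forecast load in that window, so capacity is ready when needed.
    """
    plan = []
    for t in range(len(forecast)):
        window = forecast[t : t + lead_time + 1]
        plan.append(nodes_needed(max(window), capacity_per_node))
    return plan


if __name__ == "__main__":
    forecast = [100, 100, 100, 400, 400, 100]  # hypothetical load per step
    print(plan_scaling(forecast, capacity_per_node=100, lead_time=2))
```

With this forecast, the plan requests four nodes at step 1, two steps before the peak arrives at step 3, and only scales back in once the peak has passed.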
Turning Software Into Computer Chips
FPGA chips have been built to run certain kinds of algorithms (that can be parallelized internally) many times faster than on a normal CPU. Since cloud providers are starting to include FPGAs as a service in their clouds, Hastlayer looks like an interesting option to consider if you are looking to improve performance.
ArangoDB Datacenter-to-Datacenter Replication
The first thing they did was throw tools at the problem. They started out with Kafka, which brought ZooKeeper with it. This solved the problem, but the setup and the operational costs of these tools were too much. (Instead of managing one distributed system, users would now have to manage three! Eep.)
The next attempt (and the solution they stuck with) was a lot simpler and had no external dependencies. What they essentially did was replicate data over HTTP. The tool is called ArangoSync and does asynchronous replication.
A nice takeaway from this talk was that they deputized the customer who badly needed this feature as a product owner (or more precisely “feature owner”). This customer drove most of the feature's design and also provided usability and functionality QA feedback with a very short feedback loop. Neat.
This was a great conference! Marios reports that it was one of the best conferences he has ever attended. The quality of the talks was great, they had an impressive speaker lineup, and the overall organization of the event was well done!
Marios and Christian cooling down at the CrateDB booth