We’re proud to announce the release of Sumac, our first open source library. Sumac is a simple, lightweight library for parsing command line arguments in Scala.
One of the great things about Spark is the ability to use it on just one machine in “local mode.” Not only is this useful for trying out Spark before setting up a cluster, it also makes it easy to use Spark in your unit tests. It was easy enough to write one test using Spark, but we ran into a couple of issues when we went to integrate such tests into our test suite:
After you run a few Spark jobs, you’ll realize that Spark spits out a lot of logging messages. At first, we found this too distracting, so we turned off all Spark logs. But that was too heavy-handed — we always wanted to see some of the log messages, and of course, when we needed to debug something, we wanted everything.
Our goal as a company is to create a data-driven product that is powerful but simple. This simplicity is for the benefit of usability, but it happens to hide a lot of things that we have behind the scenes, including masses of data, trained algorithms, and useful bits of code.
Some of these things we can share and some we can’t, and our aim with this blog is to share what we can. This will also be a channel for us to give a bit back to the same open source community that makes our lives easier.
Part of our aim here is purely self-interested: to attract talented engineers. If you’re a developer or data scientist with an interest in what we do, please email firstname.lastname@example.org. We look forward to hearing from you.