


Botmobiles by Darren Rawlings



Breaking: Darren Rawlings takes requests from pals. :D

See more of his series of fictional cars made into transforming robots here. -Dean

Rad Vehicles Transformed by Darren Rawlings










These transforming fictional vehicles couldn’t be radder. “If They Could Transform” is a new series by our pal, Darren Rawlings. Head over to Rawls’s DeviantArt or Tumblr for more! -Dean



10 Things You Should Know About Running MongoDB at Scale


Guest post by Asya Kamsky, Principal Solutions Architect at MongoDB.

This post outlines ten things you need to know for operating MongoDB at scale based on my experience working with MongoDB customers and open source users:

  1. MongoDB requires DevOps, too. MongoDB is a database. Like any other data store, it requires capacity planning, tuning, monitoring, and maintenance. Just because it’s easy to install and get started and it fits the developer paradigm more naturally than a relational database, don’t assume that MongoDB doesn’t need proper care and feeding. And just because it performs super-fast on a small sample dataset in development doesn’t mean you can get away without having a good schema and indexing strategy, as well as the right hardware resources in production! But if you prepare well and understand the best practices, operating large MongoDB clusters can be boring instead of nerve-wracking.
  2. Successful MongoDB users monitor everything and prepare for growth. Tracking current capacity and capacity planning are essential practices in any database system, and MongoDB is no different. You need to know how much work your cluster can currently sustain and what demands will be placed on it during times of highest use. If you don’t notice growing load on your servers, you’ll eventually get caught without enough capacity. To monitor your MongoDB deployment, you can use MongoDB Management Service (MMS) to visualize your operations by viewing the opcounters (operation counters) chart.
  3. The obstacles to scaling performance as your usage grows may not be what you’d expect. Having seen hundreds of users’ deployments, I can say the performance bottlenecks are usually, in this order:
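The operation counters that MMS charts are cumulative totals since the server started, so throughput is the difference between successive samples. Here’s a minimal sketch of that calculation, using stub snapshots shaped like the opcounters section of MongoDB’s serverStatus output (in a real deployment you would poll a live server via a driver; the numbers below are made up):

```python
# Compute per-second operation rates from two cumulative opcounters
# snapshots, shaped like the "opcounters" section of serverStatus output.
# The samples below are stubs; in production you would poll a live server.

def ops_per_second(prev, curr, interval_secs):
    """Rate of each operation type over the sampling interval."""
    return {op: (curr[op] - prev[op]) / interval_secs for op in prev}

# Two samples taken 10 seconds apart (stub data).
sample_t0 = {"insert": 1000, "query": 5000, "update": 800, "delete": 50}
sample_t1 = {"insert": 1450, "query": 7100, "update": 950, "delete": 55}

rates = ops_per_second(sample_t0, sample_t1, interval_secs=10)
print(rates)  # {'insert': 45.0, 'query': 210.0, 'update': 15.0, 'delete': 0.5}
```

Alerting when these rates approach the ceiling you measured in load testing is the essence of the capacity planning described above.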



Pypix: Machine Learning With Python


Machine learning (ML) teaches machines how to carry out tasks by themselves. It is that simple. Is it also easy? The answer is no. This article will give you a broad overview of the types of learning algorithms…
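To make the idea concrete, here is a toy example of supervised learning (my own illustration, not code from the article): a one-nearest-neighbour classifier, which “learns” simply by memorising labelled examples and predicts the label of whichever one is closest.

```python
# A toy 1-nearest-neighbour classifier: supervised learning reduced to
# "remember the training data, predict the label of the closest point".

def predict(training_data, point):
    """training_data: list of ((x, y), label) pairs; returns the nearest label."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    nearest = min(training_data, key=lambda pair: sq_dist(pair[0], point))
    return nearest[1]

examples = [((1, 1), "small"), ((1, 2), "small"),
            ((8, 9), "large"), ((9, 8), "large")]

print(predict(examples, (2, 2)))  # -> small
print(predict(examples, (7, 7)))  # -> large
```

Even this toy shows the trade-off real ML libraries wrestle with: “training” is free, but every prediction scans the whole dataset.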



Before and after lot vacancy



Justin Blinder used New York’s city planning dataset and Google Streetview for a before and after view of vacant lots.

“Vacated” mines and combines different datasets on vacant lots to present a sort of physical facade of gentrification, one that immediately prompts questions by virtue of its incompleteness: “Vacated by whom? Why? How long had they been there? And who’s replacing them?” Are all these changes instances of gentrification, or just some? While we usually think of gentrification in terms of what is new or has been displaced, “Vacated” highlights the momentary absence of such buildings, either because they’ve been demolished or have not yet been built. All images depicted in the project are both temporal and ephemeral, since they draw upon image caches that will eventually be replaced.



A New Challenge


About two years ago Joyent began offering Linux instances, running under KVM, stored on ZFS, and secured by Zones (“double hull virtualization”). Since then, I’ve been doing more and more work on Linux performance as customers deploy on these instances. It’s been fascinating to work on both the illumos and Linux kernels at the same time (a Linux guest in an illumos host), with full stack visibility of the guests, and hypervisor, down to metal. It’s let me better understand the design choices for each OS by having another perspective, something I talked about in my recent SCaLE12x keynote, and in my Systems Performance book. Apart from different OSes, I’ve also worked directly with dozens of customers, with many different applications, databases, and programming languages — too many to list. It’s been amazing, and I’m grateful that I had the chance to work and innovate in this unique environment.

However, I’ve decided to leave Joyent and pursue a new opportunity (more details about this later). It involves, among other things, working higher up in the application level, as well as Linux full time. Compared to the production observability I typically get on SmartOS/DTrace, working on Linux will at times be a challenge, but I’m interested in challenging myself and taking that on. I don’t know what tracing tool I’ll be using in the long term, but I’m no stranger to the Linux tracers: SystemTap, perf_events, dtrace4linux, ktap, and LTTng, which I’ve been using in a lab environment to solve customer issues. It’s been challenging, but also rewarding, to analyze Linux in new and different ways, and improve its performance.

I know some in the Solaris/illumos/SmartOS communities may be saddened by this news. I hope people can be glad for what I have contributed, and I’ll certainly continue to participate in these communities when I can. Also note that my change of job doesn’t change my technical opinion on those platforms and their technologies (especially DTrace, ZFS, and Zones) – which are still great, for the exact reasons I’ve publicly documented over the years and spoken about at conferences. I’m proud to have worked with them, and with the smart and dedicated people who build, support, and use them.



Spreading additional load on already busy servers



by kunev and others



When hearing that WhatsApp was bought for $16 billion


by drwxrxrx



Building a snow shelter


To give you a break from the usual GNU/Linux/DevOps/Puppet/GlusterFS drab, I’ve decided to have a go at writing a different kind of technical article. This article will show you how to build the traditional Canadian snow dwelling known as a quinzee. If you will be travelling to Canada, I recommend that you read through this article ahead of time, so that you don’t offend your host by being unfamiliar with their traditional living accommodations.


If you are not Canadian or if it is your first time building this type of shelter, I recommend that you have a professional supervise and guide you! If it is not sufficiently cold out (less than 0 °C) when you are building or sleeping in your shelter, it could collapse, squashing and suffocating you! If it’s built correctly, a quinzee can safely support the weight of many people, although it is not intended for use as scaffolding. When you are inside the quinzee digging, you must have a buddy outside, standing watch for your safety, in case of quinzee collapse or polar bear attack.

Fresh snow:

Start off by choosing a place to build your quinzee. Usually you’ll build your quinzee near the other quinzees in the village, however for this article, I have decided to set up in a remote location, as this will be used as my vacation home. It is good to build on flat terrain, where there is a lot of fresh snow. In Canada, there is always an abundance of snow so it should be easy to find a pristine area.

Fresh snow


If you build in a wooded area, be sure that there are no trees or dead branches that might fall or break off in the wind and squish you or your quinzee!


With each snowfall, new layers of snow are deposited onto the older layers, a process called stratification. Since each layer could have a different consistency or strength, it is important to mix or break up the layering before we build our quinzee. This will make sure that the walls of the quinzee are stable and don’t shift later on.

Destratification of the snow


Here, the layers of snow are destratified with a shovel, in a circle of about two to three metres in diameter. This circle will be the outside size of your quinzee. A three-metre-diameter quinzee would more than comfortably fit three large sleeping Canadians, which is the equivalent of two standard polar bears wide.

Piling the snow:

To make a quinzee, you need a large pile of snow. You’ll want to shovel the snow from outside of the destratified quinzee circle, into the centre of the pile. Work your way around the circle, gradually moving outwards as you need more snow. This technique has the added advantage that the area outside of your quinzee will be cleared of snow, and gives you room for stargazing, polar bear watching, and other typical Canadian pastimes.

A pile of snow


After a few hours your pile will grow large enough that it should be about two to three metres high. You don’t have to worry about the specific height as long as you always throw the snow on the top of the pile. The pile will automatically spread out to form a larger-diameter base if it is too high to be supported on a smaller one.

Some quinzee builders will pile their backpacks and other equipment in the centre of the pile under a tarp before piling on the snow. This is a tricky optimization, but if done correctly, there will be less snow to shovel onto the pile, and less snow to dig out at the end.

Large snow pile

A larger pile of snow

When not in use, make sure you don’t leave shovels or other items lying around on the ground, as they could easily get covered up by snowfall and lost forever. Conveniently, shovels can be stuck into the snow, where they’ll stand upright and remain visible.

At this point your pile of snow should now be complete. Congratulate yourself on a job well done and reward yourself with a cool mug of maple syrup.


When digging out the quinzee, we’ll need to know how much snow to remove so that we don’t create unwanted windows in the wall of the shelter. To help keep track of the wall thickness, first gather about 40 to 50 sticks, each about 30cm in length. Insert all the sticks in the wall of the quinzee spaced out about 30cm from one another, and all pointing towards the centre of the shelter.

Sticks inserted into the pile of snow


If the sticks are longer than 30cm, it’s okay if they protrude from the outside of the shelter. If they are too short, you can use them for firewood. When we dig out the shelter, we should encounter the sticks from inside the quinzee. Whenever you see a stick, you should stop digging in that direction.


At this point the quinzee needs time to settle. It typically takes between two and six hours. In this time, the snow crystals will lock together under their own weight, and the structure will harden because of the cold.

Waiting for compaction


Although not easily noticeable, the structure may shrink due to this compaction. This is usually a great time to prepare lunch!


A typical Canadian meal might consist of bannock cooked on a stick, Montreal bagels, poutine, and a glass of maple syrup. Dessert is usually tire, served on snow. When your quinzee has sufficiently compacted, it’s time to start digging it out. If your quinzee needs more time to settle, then you can keep busy by gathering some firewood or practising your favourite Canadian song. While gathering a little firewood, I had a chance to take a quick photo looking down the main street of my village. On the right you can see some of the kindling I’ve collected, and in the distance to the left you can see a quinzee.

Main road of my village



You typically want as small of a door opening as you can fit through. Start digging near, but above the base of the quinzee at a downward angle towards the centre. As you reach the centre, start digging outwards in all directions. The downward angle will prevent strong winds from blowing directly into the quinzee. The inner dome will keep heat from escaping. It takes some experience to get the shape right, but after years of practice, you’ll quickly become an expert.

Cross sectional diagram of a quinzee


This process should take you about an hour. When you are digging inside of the quinzee, remember to always have your buddy on the outside in the event of collapse. While collapses are rare, they can happen, in particular if the weather is too warm, if you didn’t let your quinzee settle for long enough, or if you made your walls too thin or your roof too thick. It is important that your buddy pay attention, because the quinzee insulates you from sound (and cold) very well. To demonstrate this, scream or talk loudly while you’re inside the quinzee. Your buddy will not be able to hear you very easily.

Entrance to the quinzee


During the day, if you ask your buddy to seal off the entrance with their body, it will be very dark inside the quinzee, but you should be able to see faint blue areas where the walls are the thinnest and some light is coming in. You probably don’t want to dig much further in those areas.

Finishing up:

When you are finished digging out your quinzee, pull out one of the sticks in the roof to create an air hole for ventilation. You’ll probably want to gently smooth out the inside walls and roof with your gloved hand. This will prevent the jagged digging marks (snow stalactites) from dripping on you once you’re sleeping in the quinzee and your body heat is causing them to melt.

View from inside the quinzee

View from inside the quinzee of the door

As a related optimization, some quinzee builders like to light a candle in the centre of the quinzee for a few hours to melt any irregularities on the inner walls, causing a glaze and preventing dripping.

Making it a home:

It’s customary to line the bottom of your quinzee with either a tarp, or if it’s available, some hay. The hay is a particularly comfortable bedding material because it soaks up excess moisture which would otherwise make your floor damp. In general, Canadians only line their quinzees with hay when royalty or a special guest is coming to visit. If you do not receive any hay when staying over at a friend’s place, please do not be offended as it is hard to come by in a country that has a winter climate for most of the year.

Looking into the quinzee


To get in and out of your quinzee it is customary to crawl or slide in on your stomach. In the above photo, an inconsiderate guest decided to “walk” on their hands and knees, causing damage to the door. In some Canadian villages, damaging a neighbour’s door can be a great insult and has led to at least two serious incidents. [1] [2]

For sleeping, orient your feet so they are closest to the door, and with your head away from it. When multiple people are lying next to each other in this way, the quinzee can be quite warm during cold winter nights.


If you plan on leaving your home unattended for some time, it is common practice to destroy your quinzee so that small animals or children don’t accidentally get trapped inside. Sadly, only a few days after I took these photos, a polar bear attacked my village and destroyed the quinzee!

What remained after a polar bear attack


One positive outcome of the attack is that you can get a good look inside the quinzee and see a nice cross-section of the walls. Fortunately, this won’t cost me much in building materials, but it will be a while before I have time to rebuild, and even longer before I can get a guest quinzee built!

I hope this has been an instructional article! Stay warm, stay safe, and

Happy Hacking,




Feature Preview: Vagrant Share


A primary goal of Vagrant is not only to provide easy-to-use development environments, but also to make it easy to share and collaborate on these environments.

With Vagrant 1.5, we’re introducing a feature that will allow you to share your running Vagrant environment with anyone, on any network connected to the internet. We’re calling this feature ‘Vagrant Share.’

This feature lets you share a link to your web server to a teammate across the country, or just across the office. It’ll feel like they’re accessing a normal website, but actually they’ll be talking directly to your running Vagrant environment. They’ll be able to see any changes you make, as you make them, in real time.

With Vagrant Share, others can not only access your web server, they can access your Vagrant environment as if it were any other machine on the local network. They can have access to any and every port.

Read on for a demo and more details.


Before we get into details about Vagrant Share, let’s show a few demos. You may need to go fullscreen to read the text.

Sharing an HTTP server:

Sharing SSH access:

Sharing a static IP with Vagrant Connect:

Vagrant Share, Vagrant Connect

The feature we call “Vagrant Share” introduces two new Vagrant commands: vagrant share and vagrant connect.

The share command is used to share a running Vagrant environment, and the connect command complements it by accessing any shared environment. Note that if you’re just sharing HTTP access, the accessing party does not need Vagrant installed. This is covered later.

We’ll cover the details of each command next.

HTTP Sharing

By default, Vagrant Share shares HTTP access to your Vagrant environment to anyone in the world. The URL that it creates is publicly accessible and doesn’t require Vagrant to be installed to access — just a web browser.

$ vagrant share
==> default: Local HTTP port: 5000
    default: Local HTTPS port: disabled
==> default: Your Vagrant Share is running!
==> default: URL:

Once the share is created, a relatively obscure URL is output. This URL will route directly to your Vagrant environment; it doesn’t matter if you or the accessing party is behind a firewall or NAT.

Currently, HTTP access is restricted through obscure URLs. We’ll be adding more ACLs and audit logs for this in the future.
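Obscure URLs get their security from unguessability. A hypothetical sketch of how such names might be generated (the adjective-noun-number format matches the share names shown later in this post, but the word lists and logic here are my own illustrative assumptions, not Vagrant’s actual implementation):

```python
import random

# Hypothetical generator for obscure share names in the
# adjective-noun-number style. The word lists are illustrative only;
# Vagrant's real generator is not shown here.

ADJECTIVES = ["awful", "brave", "calm", "eager", "fuzzy", "jolly"]
NOUNS = ["squirrel", "badger", "otter", "falcon", "walrus", "moose"]

def share_name(rng=random):
    return "{}-{}-{:04d}".format(
        rng.choice(ADJECTIVES), rng.choice(NOUNS), rng.randrange(10000)
    )

print(share_name())  # e.g. "awful-squirrel-9454"
```

With small word lists like these there are only 6 × 6 × 10000 = 360,000 combinations, which is why obscurity alone is a stopgap until the ACLs mentioned below ship.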

SSH Access

While sharing your local web server is a powerful collaboration tool, Vagrant Share doesn’t stop there. With just a single flag, Vagrant Share can allow anyone to easily SSH into your Vagrant environment.

Perhaps you’re having issues where your app isn’t running properly or you just want to pair program. Now, with just one flag, anyone you want can SSH into your Vagrant environment from anywhere in the world.

SSH access isn’t shared by default. To enable sharing SSH, you must add the --ssh flag to vagrant share:

$ vagrant share --ssh
==> default: SSH Port: 22
==> default: Generating new SSH key...
    default: Please enter a password to encrypt the key:
    default: Repeat the password to confirm:
    default: Inserting generated SSH key into machine...
==> default: Checking authentication and authorization...
==> default: Creating Vagrant Share session...
    default: Share will be at: awful-squirrel-9454
==> default: Your Vagrant Share is running!

When the --ssh flag is provided, Vagrant generates a brand new SSH keypair for SSH access. The public key portion is automatically inserted into the Vagrant environment. The private key portion is uploaded to the server managing the Vagrant Share connections. The password used to encrypt the private key is not uploaded anywhere, however, meaning we couldn’t access your VM if we wanted to. It is an extra layer of security.
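The split of knowledge described above (ciphertext on the server, password only with the people sharing) can be illustrated with a deliberately simplified toy. Real SSH keys are protected with vetted ciphers such as AES; the hash-based stream below is only a stand-in to show why the server alone can’t recover the key:

```python
import hashlib
import os

# Toy illustration of the split-knowledge design: the ciphertext can be
# uploaded to a server, but without the password it reveals nothing useful.
# This is NOT real cryptography; SSH key files use vetted ciphers like AES.

def keystream(password, salt, length):
    """Derive a pseudo-random byte stream from the password and salt."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(salt + password + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]

def encrypt(secret, password):
    salt = os.urandom(16)
    ks = keystream(password, salt, len(secret))
    return salt, bytes(a ^ b for a, b in zip(secret, ks))

def decrypt(salt, ciphertext, password):
    ks = keystream(password, salt, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))

private_key = b"-----BEGIN TOY KEY-----..."
salt, uploaded = encrypt(private_key, b"hunter2")  # only this goes to the server
assert decrypt(salt, uploaded, b"hunter2") == private_key
assert decrypt(salt, uploaded, b"wrong") != private_key
```

The server stores `uploaded` (and the salt), but since the password never leaves your machine, decryption can only happen on the connecting end.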

Once SSH access is shared, the person wanting to access your Vagrant environment uses vagrant connect to SSH in:

$ vagrant connect --ssh awful-squirrel-9454
Loading share 'awful-squirrel-9454'...
Password for the private key:
Executing SSH...

Welcome to Ubuntu 12.04.1 LTS

 * Documentation:
Last login: Wed Feb 26 08:38:55 2014 from

The name of the share and the password used to encrypt the private key must be communicated to the other person manually, as a security measure.

Vagrant Connect

Vagrant Share can share any TCP/UDP connection, and is not restricted to only a single port. When you run vagrant share, Vagrant will share the entire Vagrant environment.

When the person you are sharing with runs vagrant connect SHARE-NAME, Vagrant will give this person a static IP they can use to access the machine as if it were on the local network:

$ vagrant connect awful-squirrel-9454
==> connect: Connecting to: awful-squirrel-9454
==> connect: Starting a VM for a static connect IP.
    connect: The machine is booted and ready!
==> connect: Connect is running!
==> connect: SOCKS address:
==> connect: Machine IP:
==> connect:
==> connect: Press Ctrl-C to stop connection.

Security Concerns

Sharing your Vagrant environment understandably raises a number of security issues.

With the launch of Vagrant 1.5, the primary security mechanism for Vagrant Share is security through obscurity along with an encryption key for SSH. Additionally, there are several configuration options made available to help control access and manage security:

  • --disable-http will not create a publicly accessible HTTP URL. When this is set, the only way to access the share is with vagrant connect.

  • --ssh-once will allow only one person to SSH into your shared environment. After the first SSH access, the keypair is physically deleted and SSH access won’t be possible anymore.

In addition to these options, there are other features we’ve built to help:

  • Vagrant Share uses end-to-end TLS connections, so even unencrypted TCP streams are encrypted through the various proxies and only unencrypted during the final local hop between the local proxy and the Vagrant environment.

  • SSH keys are encrypted by default, using a password that is not transmitted to our servers or across the network at all.

  • SSH is not shared by default; it must be explicitly shared with the --ssh flag.

  • A web interface we’ve built shows share history and will show basic access logs in the future.

  • Share sessions expire after a short time (currently 1 hour), but can also be expired manually by ctrl-c from the sharing machine or via the web interface.

Most importantly, you must understand that by running vagrant share, you are making your Vagrant environment accessible to anyone who knows the share name. When the share is not running, it is not accessible.

And, after Vagrant 1.5 is released, we will be expanding the security of this feature by adding ACLs, so you’re able to explicitly allow access to your share based on who is connecting.

For maximum security, we will allow you to run your own Vagrant Share server. We won’t be launching this right alongside Vagrant 1.5, but it will be an option shortly after.

Technical Details

We’ve been demoing Vagrant Share around the world over the past month or so. The response has been overwhelmingly positive, but the first reaction from everyone is always: “How does this work?” In this section, we’ll briefly cover some technical details of the feature.

There are a lot of moving parts that make Vagrant Share work. Here is an overview of the primary components:

  • Local Proxy - This runs on the share host machine (not within the Vagrant environment). It connects to the remote proxy and proxies traffic to and from the Vagrant environment and the remote proxy. It is also responsible for registering new shares with the remote proxy.

  • Remote Proxy - This runs on a remote server on the internet. It creates shares and is connected to local proxies. It also handles all ACLs, security audit logs, SSH keys, and more.

  • Connect Proxy VM - When vagrant connect is called, Vagrant runs a very small proxy virtual machine (it needs only 13 MB of RAM!). This virtual machine exposes the static IP that the connecting person uses to access the share. Any traffic sent to this IP is routed to the remote proxy, which in turn routes down to the local proxy and the shared Vagrant environment.
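The byte-shuttling core of the local proxy can be sketched as a plain TCP relay. This is my own loopback illustration, not Vagrant’s code; the real component also registers shares and authenticates against the remote proxy:

```python
import socket
import threading

# Minimal TCP relay: accept a client, dial the backend, and pipe bytes
# both ways. This sketches only the byte-shuttling core of a local proxy.

def pipe(src, dst):
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def serve_relay_once(listen_sock, backend_addr):
    client, _ = listen_sock.accept()
    backend = socket.create_connection(backend_addr)
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    pipe(backend, client)

# --- Demo on loopback: an echo "backend" stands in for the shared VM. ---
backend_srv = socket.socket()
backend_srv.bind(("127.0.0.1", 0))
backend_srv.listen(1)

def echo_once():
    conn, _ = backend_srv.accept()
    conn.sendall(conn.recv(4096))
    conn.close()

threading.Thread(target=echo_once, daemon=True).start()

relay_srv = socket.socket()
relay_srv.bind(("127.0.0.1", 0))
relay_srv.listen(1)
threading.Thread(target=serve_relay_once,
                 args=(relay_srv, backend_srv.getsockname()),
                 daemon=True).start()

client = socket.create_connection(relay_srv.getsockname())
client.sendall(b"ping")
reply = client.recv(4096)
print(reply)  # the bytes came back through the relay
```

In the real system the relay’s far end is the remote proxy out on the internet rather than a loopback socket, which is what lets traffic cross NATs and firewalls.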

The connection from the connect proxy to the remote proxy uses the standard SOCKS5 protocol. The connection between the remote proxy and the local proxy uses a modified variant to reduce the number of packets that must be sent for any given connection.
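For reference, a standard SOCKS5 (RFC 1928) session opens with a fixed greeting followed by a CONNECT request. Here’s what those two client messages look like on the wire; this shows the stock protocol, not the modified variant Vagrant uses between its proxies:

```python
import socket
import struct

# Build the two client messages that open a SOCKS5 (RFC 1928) CONNECT:
# a greeting advertising auth methods, then the request itself.

def socks5_greeting():
    # VER=5, NMETHODS=1, METHODS=[0x00] ("no authentication required")
    return bytes([0x05, 0x01, 0x00])

def socks5_connect(host, port):
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=1 (IPv4), then address and port.
    addr = socket.inet_aton(host)
    return bytes([0x05, 0x01, 0x00, 0x01]) + addr + struct.pack(">H", port)

greeting = socks5_greeting()
request = socks5_connect("10.0.0.5", 8080)
print(greeting.hex())  # 050100
print(request.hex())   # 050100010a0000051f90
```

Each new connection costs a greeting/reply round trip before the CONNECT, which is presumably the kind of per-connection packet overhead the modified variant trims.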

What’s Next?

Vagrant Share will ship with Vagrant 1.5. To use it, you’ll need an account with the yet-to-be-announced web service.

At that time, we’ll publish further details about share, connect and the account required to use them.

Next week, we’ll cover another feature of Vagrant 1.5 — stay tuned.

Integrating D3 with CouchDB


A 4-part series by Mike Bostock describing various integration paths for D3 and CouchDB:

  1. Part 1: saving a D3 app in CouchDB
  2. Part 2: storing D3 library in CouchDB and storing data in CouchDB
  3. Part 3: accessing CouchDB data from D3
  4. Part 4: data import

Original title and link: Integrating D3 with CouchDB (NoSQL database @ myNoSQL)

Girls at Devopsdays (see what I did there?)


Ben Hughes (@benjammingh) of Etsy, visiting from SF to speak on security, tells me he is horrified by the pitifully small number of women at the conference. He says he thought New York was bad ‘til he saw London. I saw, maybe, six female persons here, among several hundred male ones. Personally I don’t notice this much - and if I don’t experience any direct discrimination or signs of overt sexism I tend to see people as people, regardless of gender. I don’t see me as part of a minority, so maybe I forget to see other women as such. Talking with Ben has made me think this isn’t too good.

As someone who’s done a fair bit of hiring, and never had a single female applicant, I shrug and say “what can I do? They aren’t applying.” But after talking with Ben I’m thinking that’s a lame cop-out too. Something needs to be done, and just because it isn’t my fault as an evil misogynist hirer doesn’t mean I shouldn’t be trying to make improvements. I think it’s a slow burn. It probably requires a culture shift, with girls moving nearer to the model of the classic teenage bedroom programmer, and boys moving further away from it, until they all meet in the middle in some kind of more social, sharing, cooperative way of being. Hackspaces for teenagers? With quotas? It probably also requires more, and more interesting, tech in schools; programming for all. Meanwhile, I’m going to start with my small daughters (and sons), and to set them a good example I’m going to upskill myself. I’m going to get around to learning Ruby, as I’ve been meaning to for years.

Oh but wait… did I say there was no overt sexism here at Devopsdays? So as I’m writing this there’s an ignite talk going on. It’s a humorous piece using Game of Thrones as a metaphor for IT company life. It starts ok - Joffrey, Ned, Robb and Tyrion get mentions as corporate characters, it rambles a bit but it gets some laughs, including from me. But then suddenly this happens…

The presenter shows a picture of Shae, saying “…and if you do it right you get a girl.” Oh, you GET a “girl” do you, Mister Presenter? That well known ownable commodity, the female human. Oh and apparently you get one who’s under eighteen… that’s a little, er, inappropriate, isn’t it?

And then, as if that wasn’t enough, the presenter shows a photo of Danaerys Targaryen, saying “And here’s a gratuitous picture of Danaerys”. Danaerys is gratuitous apparently! Well who knew! Not anyone who’s read or seen Game of Thrones. Oh and wait, it gets better - the guy now tells us, “if you do it really well you get two girls.”

So, firstly in the mind of this man, women are present in fiction and in IT only as prizes for successful men to win. Secondly women can reasonably be described as girls. Thirdly, he assumes that his audience is exclusively male (I was sitting in the front row - he’s got no excuse for that one).

So I spoke to the guy afterwards, and he didn’t see a problem with it at all, happily defending the piece, though he did finish by thanking me for the feedback. I guess the proof will be in whether he gives the talk again in a modified form.

Such a lost opportunity though - GoT is packed with powerful and interesting women, good, evil and morally ambiguous. He could so easily have picked half and half. It’s like he read it or watched it without even seeing the non-male protagonists.

Hmm, I just used the word “guy”… We say “guys” a lot round here, and I mostly assume it’s meant in a gender-neutral, inclusive way. That’s the way I take it, and use it. But now I’m wondering how personally contextualised our understanding of terms like “guys” is. Atalanta management once upset a female colleague by sending out a group email beginning, “Chaps…”. She felt excluded, whereas we assumed everyone used the word as we do, in a completely neutral way. But both terms of address are originally male-specific, so it should be no surprise that some people will understand them to be either deliberately or carelessly excluding.

I don’t know… This is the kind of small detail that tends to get written off as irrelevant, over-reacting, being “touchy”, political correctness gone mad, and so on. But in the absence of really overt sexism - leaving aside Game of Thrones guy’s show - maybe we do need to look to the small details to explain why tech still seems to be an unattractive environment for most women.

I nearly proposed an open space on this subject, but a few things held me back. Firstly, which I’m not proud of, I thought “what if nobody cares enough to come along? A) I’d feel a little silly, and b) I’d be so disappointed in my peers for not seeing it as important”. I should’ve been braver, taken the risk.

More importantly though, I think it’s really important that this isn’t seen as a “women’s issue”, to be debated and dealt with by women, but rather as a people’s issue, something that affects us all negatively, reflects badly on us; something we should all be responsible for fixing. And for me, the only female person in the room at the time, to propose this talk felt like ghettoising it as a women-only concern. I may be wrong. Maybe lots of people would’ve seized the chance to work towards some solutions. I wish Ben had been around.

So, I didn’t propose a talk. But what I did do was put on my organisers’ t-shirt and silly hat, so at least it was visible and clear that there are female persons involved in creating a tech conference in London. Even if there weren’t enough attending.

About the Author

Helena Nelson-Smith is CEO of Atalanta Systems; she’s also, by some people’s definition, a “girl”.

Random Thoughts on Community Mechanics Relative to Open Source Project Size


When you start an open source project, it’s yours, and it’s usually super fun.

There’s a direction set, and (if you are lucky) you’ll gather a group of people who think similarly. You feed off that energy.

As it grows, ideas mix. You’re given access to some great sounding boards.

As it grows further, though, the direction one person wants to take things begins to conflict with where the project needs to go. How do you communicate that you aren’t doing X because there are 50 more important things you could be doing, and you don’t want to forever maintain and support the code for X?

Even in the first months, you quickly transition from having a way to express creativity to helping a lot of users on the internet for free. What do you do, and how do you continue to keep it fun and interesting?

There’s a nice quote from Star Trek II, though, that applies: “The needs of the many outweigh the needs of the few, or the one.” This is the philosophy, I think, that works best in large projects.

A frequent subject of complaint is the Linux kernel. While it’s easy to say it is managed aggressively, there is a need for expedience. If every interaction takes 30 minutes, there can be at most 48 meaningful interactions in a given day, and realistically more like 24. And this is problematic: there’s a large class of folks who assume efficiency is impoliteness, and it almost never is. Reading between the lines is a danger on the internet, but it’s an unavoidable curse of the medium.

It becomes vitally important, then, as a community grows, that users help users and become self-sufficient.

A primary enabler of this is documentation, along with frequent communication — a project mailing list, for instance, can be a great way for people to understand the project. (A recent trend in open source is for folks to just submit changes on GitHub before checking whether they are a good match — which works half the time and leads to confusion the other half; at worst it produces a lot of expensive 1:1 conversations instead of 1:N ones.)

As a project grows, though, it gains new folks who don’t know the original goals of the project. While technical documentation increases, there’s a greater need for philosophical documentation — why things are the way they are, the way the project thinks, and so on.

Yet, there’s a cardinal flaw in this — by and large, documentation won’t be read.

A recent example of not understanding the goals of a project is neovim, which seeks to rewrite much of vim — perhaps (IMHO) the world’s best editor. Not only is the task likely underestimated, it does so to fulfill a fringe feature need that is not well aligned with the main project — it neglects the needs of vim’s userbase and the way the project has evolved to deal with the scope and magnitude of what it now entails.

Another commonly cited example is the “Innovator’s Dilemma”. The “New Hotness” can arise every N years, burn the forest, and start anew; this is very common in technology because as things grow in scope, the scope becomes a larger thing to maintain.

The way to enable agility in the face of the innovator’s dilemma — and to avoid getting sidetracked by serving fringe needs — is to occasionally say no, and to clearly define the goals and purposes of a project.

The easiest track, when faced with contributions that misunderstand the purpose of the project — or its current priorities — is to accept everything that comes in. Even explaining, for every new idea, *why* you don’t want to do X can take a lot of time. But it’s important that “no” is said more frequently than “yes”.

As a species, we’re not a very good Hive Mind.   We do well at recognizing patterns, but if you get 500 people to tell you what they want in a car or truck, you are going to get a very ugly car or truck that might only please 5 of those people.

When a project merges every single idea, without validation or thought, it grows to lack identity and cohesiveness, and soon grows to lack stability. 

To handle this, a project must continue to know what it is — and what it isn’t — and concentrate on its core use cases.

In phases where it recognizes more time on X is needed, it must not spend time on Y, and must communicate that it is disinterested, at least for now, in Y.

And to be respectful of incoming change, when a request comes in for Z and the project is likely never going to be interested in Z, it must say so.

This itself can be a source of conflict — especially as many users believe open source projects are more of a playground, when the reality is they are designed to solve specific problems.

The question that should go through the mind of a maintainer, every single second, is: where can I apply the most effort now to do the most good?

"Hey look, a squirrel" is an easy trap to fall into. Does the most recent request or question get the most attention?

What you have to do is sample the entire swath of a userbase, understand all of it, and declare focus to solve the problems that affect the most people first, and avoid distractions.

There are people who might, say, poke fun at Linus Torvalds for the way he replies to someone on a mailing list. I never will — I get it — he has a hell of a job.

Project leaders are arbiters of change and priority. With a sufficiently large project, it’s impossible to service everything at the same time — and, importantly, a project must not go in every direction at the same time.

On one end, you spread yourself too thin, and to an extreme, there’s being Drawn and Quartered.

I think Linus is doing a heck of a job.

Especially when there’s a chance for something to be misinterpreted, coming off clearly and without ambiguity is difficult.

It’s not a level of force I’d ever wish to see in most projects, but in the kernel, it may be warranted. Like Spock said — the needs of the many outweigh the needs of the few, or the one.

If trying to please everyone results in pleasing no one, then identifying the use cases you want to solve, and solving them well, becomes important.

And if you’re not going to solve the other use cases, it’s important to let people know, so you aren’t caught explaining them endlessly.

So yeah, the kernel list can be a fun place, but I understand why it is that way completely.  It’s building a great product, and you can’t argue with that success.

I guess what I’m saying here is that everything that has ever been written about how to run a good project changes drastically as that project scales — and there are very few people that have had that level of experience.   There’s an entirely different continuum and it feels altogether different.

While everyone would like a democracy, there haven’t been any true ones in a very long time (even in ancient Greece, not everyone could vote). Representative republics (such as the USA) are notoriously inefficient. Google “Twitch Plays Pokemon” to see how that works most of the time.

Avoid turning your project into a free for all — “Wikipedia with pull requests”.  Don’t be afraid to be bold and define your direction well, and make sure you still have time to do the things that need to be done.   



Highlight on Collectors: Collector Patterns



Our Highlight on Collectors series will introduce and document time-series data collectors. Join us each month or so as we analyze a different collector in our ongoing effort to help you ensure that you’re using the best tools available.

Every monitoring system is born from a set of assumptions — assumptions that may ultimately impose functional limits on the system and what you can accomplish with it. This is perhaps the most important thing to understand about monitoring systems before you get started designing an infrastructure of your own: Monitoring systems become more functional as they become less featureful.

Some systems make assumptions about how you want to collect data. Some of the very first monitoring systems, for example, assumed that everyone would always monitor everything using SNMP. Other systems make assumptions about what you want to do with the data once it’s collected — that you would never want to share it with other systems, or that you want to store it all in teensy databases on the local filesystem. Most monitoring systems present this dilemma: each solves part of the monitoring problem very well, but winds up doing it in a way that saddles you with unwanted assumptions, like SNMP and thousands of teensy databases.

What if we could extract the good parts from each monitoring system — as if each were a bag of jellybeans from which we could keep the pieces we like? This line of reasoning implies a new way of thinking about the monitoring problem: Instead of using a single monitoring system, we could combine a bunch of independent, special-purpose data collectors with a source-agnostic storage tier.  This strategy would enable us to use whatever combination of collectors makes sense, and then store the monitoring data together, where it can be visualized, analyzed, correlated, alerted on, and even multiplexed to multiple destinations. If this sounds like a good strategy to you, then you’ll be happy to know that there are already tons of special-purpose open-source data collectors available. 
To help get you started down the road of deciding on collection tools for your environment, we’re writing this post to document the patterns employed by the various data collection tools out there.  With so many great tools to collect system and application metrics, it’s impossible to document them all, but no matter what tools you choose, they will probably employ at least one of the design patterns detailed herein.

Collector Patterns for System Metrics

We start by dividing interesting metrics into two general categories: those that are derived from within the monitored process at runtime, and those that are gathered from outside the monitored process. There are four patterns generally used by external processes to collect availability and performance data.

The Centralized Polling Pattern

Anyone who has been working with monitoring systems for a while has used centralized pollers. They are the archetypal design — the one that comes to mind first when someone utters the phrase “monitoring system” (though that is beginning to change). It looks like this:


How it works

Like a grade-school teacher performing the morning roll-call, the centralized poller is a monolithic program configured to periodically poll a number of remote systems, usually to ensure that they are available, but also to collect performance metrics. The poller is usually implemented as a single process on a single server, and usually attempts to make guarantees about the interval at which it polls each service (once every 5 minutes, for example).

Because this design predates the notion of configuration management engines, centralized pollers are designed to minimize the amount of configuration required on the monitored hosts. They may rely on external connectivity tests, or they may remotely execute agent software on the hosts they poll, but in either case, their normal mode of operation is to periodically pull data directly from a population of monitored hosts.
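The polling loop itself can be sketched in a few lines. This is a toy illustration of the pattern, not any particular product; the host list, ports, and interval are hypothetical placeholders a real poller would read from configuration:

```python
import socket
import time

# Hypothetical targets and interval; a real poller loads these from config.
HOSTS = [("app1.example.com", 80), ("app2.example.com", 80)]
POLL_INTERVAL = 300  # seconds, i.e. the classic 5-minute cycle


def check(host, port, timeout=5):
    """Return (reachable, latency_seconds) for a simple TCP connect test."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, None


def poll_forever():
    """The monolithic loop: walk the host list, then sleep until next cycle."""
    while True:
        for host, port in HOSTS:
            up, latency = check(host, port)
            # A real poller would record these results in its state store.
            print(host, "up" if up else "down", latency)
        time.sleep(POLL_INTERVAL)
```

Note how the interval guarantee degrades as the host list grows: the checks run serially, which is exactly why this design is easy to implement but hard to scale.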

Trade-offs and Gotchas

  • Centralized pollers are easy to implement but often difficult to scale.
  • Centralized pollers typically operate on “minute scale,” using, for example, 1- or 5-minute polling intervals, and this limits the resolution at which they can collect performance metrics.
  • Older centralized pollers are likely to use agents with root-privileged shell access for scripting, communicate using insecure protocols, and have unwieldy (if any) failover options.

Examples of Pollers in the wild

Although classical centralized pollers like Nagios, Munin, and Cacti are numerous, very few Librato-compatible collectors employ centralized polling directly. Most shops interject a metrics aggregator like Statsd between the polling system and the metrics storage system. The rearviewmirror collector for Nagios is one example of a pure direct poller; it can be run from cron or scheduled by Nagios itself.

The Stand-Alone Agent Pattern

Stand-alone agents have grown in popularity as configuration-management engines have become more commonplace. They are often coupled with centralized pollers or rollup-model systems to meet the needs of the environment. They look like this:


How it works

Agent software is installed and configured on every host that you want to monitor. Agents usually daemonize and run in the background, waking up on timed intervals to collect various performance and availability metrics. Because agents remain resident in memory and eschew the overhead of external connection setup and teardown for scheduling, they can collect metrics on the order of seconds or even microseconds. Some agents push status updates directly to external monitoring systems, and some maintain summary statistics which they present to pollers as-needed via a network socket.
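A minimal agent loop looks something like the sketch below. The destination address and metric names are hypothetical, and the wire format shown is the Graphite-style plaintext line; real agents like collectd take all of this from configuration and support many more data sources:

```python
import os
import socket
import time

# Hypothetical destination; a real agent reads this from its config file.
MONITORING_HOST = ("metrics.example.com", 2003)
SAMPLE_INTERVAL = 10  # seconds — far finer than minute-scale polling


def sample_metrics():
    """Collect a few cheap local measurements (Unix load averages here)."""
    load1, load5, load15 = os.getloadavg()
    return {"load.1m": load1, "load.5m": load5, "load.15m": load15}


def run_agent():
    """Daemon-style loop: wake up, sample, push, sleep."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        now = int(time.time())
        for name, value in sample_metrics().items():
            # Graphite plaintext protocol: "<name> <value> <timestamp>"
            line = f"{name} {value} {now}\n"
            sock.sendto(line.encode(), MONITORING_HOST)
        time.sleep(SAMPLE_INTERVAL)
```

Because the socket stays open and sampling is local, each cycle costs almost nothing — which is what lets agents run at second-scale intervals.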

Trade-offs and Gotchas

  • Agent configuration is difficult to manage without a CME, because every configuration change must be pushed to all applicable monitored hosts.
  • Although they are generally designed to be lightweight, they can introduce a non-trivial system load if incorrectly configured.
  • Be careful with closed-source agent software, which can introduce backdoors and stability problems.
  • Open source agents are generally preferred because their footprint, overhead, and security can be verified and tweaked if necessary. 

Examples of agents in the wild

Collectd is the most popular stand-alone agent with Librato support. Sensu uses a combination of the agent and polling patterns, interjecting a message queue between them; it is Librato-compatible via the Sensu Sender.

The Roll-up Pattern

The Roll-up Pattern is often used to achieve scale in monitoring distributed systems and large machine clusters, or to aggregate common metrics across many different sources. It can be used in combination with agent software or instrumentation. It looks like this:


How it works

The roll-up pattern is a strategy to scale the monitoring infrastructure linearly with respect to the number of monitored systems. This is usually accomplished by co-opting the monitored machines themselves to spread the monitoring workload throughout the network. Usually, small groups of machines use an election protocol to choose a proximate, regional, collection host, and send all of their monitoring data to it, though sometimes the configuration is hard-coded.  

The elected host then summarizes and de-duplicates the data, and sends it up to another host elected from a larger region of summarizers. This host in turn summarizes and de-duplicates it and so forth.
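The summarize-and-forward step at each tier can be sketched as follows. This is an illustrative reduction (metric names and the count/sum/min/max summary shape are my choices, not a specific product's), but it shows why each hop up the hierarchy carries far less data than it receives:

```python
from collections import defaultdict


def summarize(samples):
    """Roll a list of (metric_name, value) samples from many hosts up
    into one count/sum/min/max summary per metric name."""
    rollup = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                  "min": float("inf"), "max": float("-inf")})
    for name, value in samples:
        s = rollup[name]
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    return dict(rollup)

# Each elected regional host would apply summarize() to what its group
# reports, then forward the (much smaller) result to the next tier up,
# where the same function runs again over the regional summaries.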

Trade-offs and Gotchas

  • Roll-up systems scale well, but can be difficult to understand and implement.
  • Important stability and network-traffic considerations accompany the design of roll-up systems.

Roll-up in the wild

Ganglia is a popular monitoring project that combines stand-alone agents with the roll-up pattern to monitor massive clusters of hosts with fine-grain resolution. The Statsd daemon process can be used to implement roll-up systems to hand-off in-process metrics.

Logs as Event-Streams

System and event logs are a handy event stream from which to derive metric data. Many large shops have intricate centralized log processing infrastructure from which they feed many different types of monitoring, analytics, event correlation, and security software. If you’re a Platform-as-a-Service customer, the log-stream may be your only means to emit, collect and inspect metric data from your application.

How it works

Applications and operating systems generate logs of important events by default. The first step in the log-stream pattern is installing or configuring software on each monitored host that forwards all of that host’s logs. EventReporter for Windows and rsyslogd on Unix are popular log forwarders. Many programming languages also have log generation and forwarding libraries, such as the popular Log4j Java library. PaaS systems like Heroku have likely pre-configured the logging infrastructure for you.

Logs are generally forwarded to a central system like splunk, fluentd, or logstash for processing, indexing and storage, but in larger environments they might be map/reduced or processed by other fan-out style parallel processing engines. System logs are easily multiplexed to different destinations so there is a diverse collection of software available for processing logs for different purposes.
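The parsing step at the heart of this pattern amounts to matching each line against a known format and emitting metrics from the captured fields. The access-log format below is hypothetical; real log processors like Logstash define such patterns ("grok" filters) in configuration:

```python
import re

# Hypothetical access-log line shape: "GET /path 200 123ms"
LINE = re.compile(
    r"(?P<method>\S+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms"
)


def metrics_from_line(line):
    """Turn one log line into zero or more (metric_name, value) pairs."""
    m = LINE.search(line)
    if not m:
        return []  # unparseable lines are simply skipped
    return [
        (f"http.status.{m.group('status')}", 1),    # a counter increment
        ("http.request_ms", int(m.group("ms"))),    # a timer value
    ]
```

The regex match on every line is where the computational expense mentioned below comes from — at high log volume, this step alone can justify dedicated parsing infrastructure.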

Trade-offs and Gotchas

  • Although many modern syslog daemons support TCP, the syslog protocol was originally designed to use UDP in the transport layer, which can be unreliable at scale.
  • Log data is usually emitted by the source in a timely fashion, but the intermediate processing systems can introduce some delivery latency.
  • Log data must be parsed, which can be a computationally expensive endeavor. Additional infrastructure may be required to process logging event streams as volume grows.

Log Snarfing in the wild

Our Heroku and AppHarbor add-ons make it trivial to extract metrics from your application logs, and there are several other tools on our collectors page for DIY log snarfing, such as Logstash and Graylog.

Collection Patterns for In-Process Metrics

A radical departure from the patterns discussed thus far, instrumentation libraries enable developers to embed monitoring into their applications, making them constantly emit a stream of performance and availability data at runtime. This is not debugging code but a legitimate part of the program, expected to remain resident in the application in production. Because the instrumentation resides within the process it is monitoring, it can gather statistics on things like thread count, memory buffer and cache sizes, and latency, which are difficult (in the absence of standard language support like JMX) for external processes to inspect.

Instrumentation libraries make it easy to record interesting measurements inside an application by including a wealth of instrumentation primitives like counters, gauges, and timers. Many also include complex primitives like histograms and percentiles, which facilitate a superb degree of performance visibility at runtime.

The applications in question are usually transaction-oriented; they process and queue requests from end-users or external peer processes to form larger distributed systems. It is critically important for such applications to communicate their performance metrics without interrupting, or otherwise introducing latency into their request cycle. Two patterns are normally employed to meet this need.

The Process Emitter Pattern

Process emitters attempt to immediately purge every metric via a non-blocking channel. They look like this:


How it works

The developer imports a language-specific metrics library and calls an instrumentation function like time() or increment(), as appropriate for each metric they wish to emit. The instrumentation library is effectively a process-level stand-alone agent that takes the metric and flushes it to a non-blocking channel (usually a UDP socket or a log stream). From there the metric is picked up by a system that employs one or more of the external-process patterns.
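A stripped-down emitter can be sketched in a handful of lines. The wire format here is the standard statsd line protocol (`name:value|c` for counters, `name:value|ms` for timers); the class itself is a toy, not any particular client library:

```python
import socket


class StatsdEmitter:
    """Minimal statsd-style process emitter: every call is immediately
    flushed as fire-and-forget UDP, so the request path never blocks."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, payload):
        try:
            self.sock.sendto(payload.encode(), self.addr)
        except OSError:
            pass  # emitters drop metrics rather than disturb the app

    def increment(self, name, value=1):
        self._send(f"{name}:{value}|c")   # statsd counter syntax

    def timing(self, name, ms):
        self._send(f"{name}:{ms}|ms")     # statsd timer syntax
```

Usage is a single call at each instrumentation point, e.g. `emitter.increment("signups")` or `emitter.timing("db.query", 12)` — if no daemon is listening, the datagrams simply vanish, which is the whole point of the non-blocking channel.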

Process emitters in the wild

Statsd is a popular and widely used target for process emitters. The project maintains myriad language bindings to enable the developer to emit metrics from the application to a Statsd daemon process listening on a UDP socket. 

The Process Reporter Pattern

Process reporters use a dedicated, non-blocking thread to store their metrics in an in-memory buffer. They either provide a concurrent interface for external processes to poll this buffer, or periodically flush the buffer to upstream channels.


How It works

The developer imports a language-specific metrics library, and calls an instrumentation function like time() or increment(), as appropriate for each metric they wish to emit. Rather than purging the metric immediately, process reporters hand the metric off to a dedicated, non-blocking thread that stores and sometimes processes summary statistics for each metric within the memory space of the monitored process. Process reporters can push their metrics on a timed interval to an external monitoring system or can export them on a known interface that can be polled on demand.
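The core of the pattern — cheap in-memory accumulation on the hot path, with a background thread doing the flushing — can be sketched like this. The class and its flush-callback interface are illustrative inventions, not the API of Coda Hale Metrics or any other library:

```python
import threading
import time


class ProcessReporter:
    """Minimal process reporter: instrumentation calls only touch an
    in-memory buffer; a daemon thread periodically flushes summaries."""

    def __init__(self, flush, interval=10):
        self._counters = {}
        self._lock = threading.Lock()
        self._flush = flush          # callable receiving each snapshot
        self._interval = interval
        t = threading.Thread(target=self._run, daemon=True)
        t.start()

    def increment(self, name, value=1):
        """Hot-path call: just a dict update under a lock, no I/O."""
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + value

    def snapshot(self):
        """Swap out the buffer; also usable as the polled interface."""
        with self._lock:
            counters, self._counters = self._counters, {}
        return counters

    def _run(self):
        while True:
            time.sleep(self._interval)
            self._flush(self.snapshot())
```

The swap-and-replace in `snapshot()` keeps the lock held only for an instant, so the application threads recording metrics are never stalled by a slow upstream flush.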

Process reporters in the wild

Process reporters are specific to the language in which they are implemented. Most popular languages have excellent metrics libraries that implement this pattern. Coda Hale Metrics for Java, Metriks for Ruby, and go-metrics are all in use at Librato.

Until Next time

We hope this article will help you identify the assumptions and patterns employed by the data collectors you choose to implement in your environment. Be sure to check back, as we continue this series with in-depth analysis on the individual collectors that use these patterns in the wild.



When we learned that the database outage was because the hosting company does automatic upgrades


by shrikeh