18

Jun

Adapting inventory for Ansible

via jpmens.net

Ansible uses a so-called “inventory” to determine the list of nodes and groups of nodes it can use. This inventory file defaults to /etc/ansible/hosts and typically looks something like this in INI file format:

[devservers]
a1
k4.ww.mens.de

[dbservers]
deb101 ntp=ntp1.example.net
sushi ansible_ssh_host=127.0.0.1 ansible_ssh_port=222

The above example defines two groups (devservers and dbservers) each with the specified host names, which need to be resolvable from the Ansible management system. If you cater for special configurations (e.g. Vagrant boxes on your workstation, or something behind a jump-host) you can use the ansible_ssh_* variables to define particular addresses and/or port numbers to use.

If you prefer to separate out, say, “production” and “development” systems, you can also have two distinct inventory files which you pass to ansible and ansible-playbook with the -i switch, or by setting $ANSIBLE_HOSTS to point to the respective file.

A lesser-known fact is that the inventory can also be read from a program. If you already have a configuration management database (CMDB) and wish to use it to drive Ansible, that is pretty easy to do. While you can, for example, periodically dump that database into a “hosts” file for Ansible, you can also have Ansible query your database on the fly.

I accomplish this by setting $ANSIBLE_HOSTS to the full pathname of the executable program which will provide the inventory.

A small example

As a small example, consider the following SQLite table with the three columns id, type, and name:

sqlite> SELECT * FROM hosts;
1           webserver   www01
2           dbserver    pg01
3           dbserver    pg02
4           webserver   www02
5           testing     tiggr
6           testing     t1
7                       ldap

I want to massage the type column into a group for Ansible, whereby NULL types will be placed into a group with the exciting name “ungrouped”.
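To follow along, the table above can be reproduced with a few lines of Python. This is only a sketch: it uses an in-memory database rather than a file, and the column types and the helper name make_hosts_db are assumptions, since only the column names and rows are shown above.

```python
import sqlite3

# The seven rows shown above; None becomes a NULL type.
ROWS = [
    ("webserver", "www01"),
    ("dbserver", "pg01"),
    ("dbserver", "pg02"),
    ("webserver", "www02"),
    ("testing", "tiggr"),
    ("testing", "t1"),
    (None, "ldap"),
]


def make_hosts_db(path=":memory:"):
    """Create the hosts table and fill it with the sample rows."""
    conn = sqlite3.connect(path)
    # Column types are assumed; the article only names the columns.
    conn.execute(
        "CREATE TABLE hosts (id INTEGER PRIMARY KEY, type TEXT, name TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO hosts (type, name) VALUES (?, ?)", ROWS)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_hosts_db()
    for row in conn.execute("SELECT id, type, name FROM hosts ORDER BY id"):
        print(row)
    conn.close()
```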

Let’s see some results.

$ export ANSIBLE_HOSTS=/etc/ansible/inventory/inv.py
$ ansible dbserver --list-hosts
    pg01
    pg02

The group name “dbserver” has been expanded and Ansible shows me the names of the two hosts it contains.

Ansible invokes the inventory script at least twice: once to find all groups and the hosts they contain, and once for each host. In other words, when Ansible starts doing something to the “dbserver” group, our inv.py program will be invoked like this:

inv.py --list
inv.py --host pg01
inv.py --host pg02

Our little program produces this JSON output from the above database when invoked with --list:

{
    "ungrouped": {
        "hosts": [
            "ldap"
        ]
    }, 
    "webserver": {
        "hosts": [
            "www01", 
            "www02"
        ]
    }, 
    "testing": {
        "hosts": [
            "t1", 
            "tiggr"
        ]
    }, 
    "local": [
        "127.0.0.1"
    ], 
    "dbserver": {
        "hosts": [
            "pg01", 
            "pg02"
        ]
    }
}
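
Note that the output mixes two shapes: most groups are objects with a "hosts" list, while "local" is a bare list. A consumer of this JSON has to accept both. The following is a minimal sketch of that, not Ansible's own parsing code; the function name group_hosts and the shortened sample data are assumptions for illustration.

```python
import json

# Shortened copy of the --list output, keeping both group shapes.
LIST_OUTPUT = """
{
    "ungrouped": {"hosts": ["ldap"]},
    "local": ["127.0.0.1"],
    "dbserver": {"hosts": ["pg01", "pg02"]}
}
"""


def group_hosts(inventory, group):
    """Return the host names of a group, whichever shape it uses."""
    entry = inventory.get(group, [])
    if isinstance(entry, dict):
        # Object form: {"hosts": [...]}
        return list(entry.get("hosts", []))
    # Bare list form: [...]
    return list(entry)


if __name__ == "__main__":
    inventory = json.loads(LIST_OUTPUT)
    for name in sorted(inventory):
        print(name, group_hosts(inventory, name))
```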

When Ansible calls the program to find variables for a particular host (i.e. inv.py --host pg01), the program will produce this JSON on output:

{
    "admin": "Jane Jolie", 
    "datacenter": 1
}

The inv.py program is simple enough:

#!/usr/bin/env python

import sqlite3
import sys
try:
    import json
except ImportError:
    import simplejson as json

dbname = '/etc/inv.db'

def grouplist(conn):
    """Print all groups and their hosts as JSON (inv.py --list)."""

    inventory = {}

    # Add group for [local] (e.g. local_action). If needed,
    # set ansible_python_interpreter in host_vars/127.0.0.1
    inventory['local'] = ['127.0.0.1']

    cur = conn.cursor()
    cur.execute("SELECT type, name FROM hosts ORDER BY 1, 2")

    for row in cur.fetchall():
        group = row['type']
        if group is None:
            group = 'ungrouped'

        # Add group with empty host list to inventory{} if necessary
        if group not in inventory:
            inventory[group] = {
                'hosts': []
            }
        inventory[group]['hosts'].append(row['name'])

    cur.close()
    # print() with a single argument works on Python 2 and 3
    print(json.dumps(inventory, indent=4))

def hostinfo(conn, name):
    """Print the variables of one host as JSON (inv.py --host <hostname>)."""

    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM hosts WHERE name=?", (name, ))

    # Unknown host: return an empty set of variables
    row = cur.fetchone()
    if row[0] == 0:
        print(json.dumps({}))
        sys.exit(0)

    # Inject some variables for all hosts
    hostvars = {
        'admin'         : 'Jane Jolie',
        'datacenter'    : 1
    }

    # Assuming you *know* that certain hosts need special vars
    # and you can't or don't want to use host_vars/ group_vars,
    # you could specify them here. For example, I *know* that
    # hosts with the word 'ldap' in them need a base DN

    if 'ldap' in name.lower():
        hostvars['baseDN'] = 'dc=mens,dc=de'

    print(json.dumps(hostvars, indent=4))


if __name__ == '__main__':
    con = sqlite3.connect(dbname)
    # Allow access to columns by name, e.g. row['type']
    con.row_factory = sqlite3.Row

    if len(sys.argv) == 2 and (sys.argv[1] == '--list'):
        grouplist(con)
    elif len(sys.argv) == 3 and (sys.argv[1] == '--host'):
        hostinfo(con, sys.argv[2])
    else:
        print("Usage: %s --list or --host <hostname>" % sys.argv[0])
        sys.exit(1)

    con.close()

As alluded to in the comments, I can create variables which Ansible will use in, say, templates. I could obtain these from our CMDB, pull them from files, etc. I could easily also create special groups by injecting appropriate JSON. For example, a customer of mine wanted groups defined based on the kind of hardware and its location; a few lines of Python did the trick.
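
Such extra groups could be built along the following lines. This is a sketch under assumptions: the columns hardware and location, their values, and the group naming scheme (hardware_<value>, location_<value>) are hypothetical and not taken from the customer setup mentioned above.

```python
import json

# Hypothetical rows; only "name" and "type" exist in the table shown earlier.
HOSTS = [
    {"name": "pg01", "type": "dbserver", "hardware": "r720", "location": "fra"},
    {"name": "pg02", "type": "dbserver", "hardware": "r720", "location": "ams"},
    {"name": "ldap", "type": None, "hardware": None, "location": "fra"},
]


def build_inventory(rows, extra_keys=("hardware", "location")):
    """Group hosts by type and additionally by each extra attribute."""
    inventory = {}
    for row in rows:
        groups = [row["type"] or "ungrouped"]
        for key in extra_keys:
            value = row.get(key)
            if value:  # skip NULL or empty attributes
                groups.append("%s_%s" % (key, value))
        for group in groups:
            inventory.setdefault(group, {"hosts": []})["hosts"].append(row["name"])
    return inventory


if __name__ == "__main__":
    print(json.dumps(build_inventory(HOSTS), indent=4, sort_keys=True))
```

A host then belongs to several groups at once, so a pattern such as location_fra selects it regardless of its type.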

Note that because the inventory program is invoked frequently, it will impose an additional load on your CMDB. (There is talk of modifying Ansible to not have that done, but it’s a work in progress.)

If necessary, you could employ some form of caching such as the ec2 inventory script uses.
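
A simple file-based cache with a maximum age might look like the sketch below. It is not a copy of what the ec2 script does; the cache path, the 300-second lifetime and the function name cached are assumptions chosen for illustration.

```python
import json
import os
import tempfile
import time

# Hypothetical cache location and lifetime.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "inv-cache.json")
CACHE_MAX_AGE = 300  # seconds


def cached(fetch, path=CACHE_PATH, max_age=CACHE_MAX_AGE):
    """Return cached data if it is fresh enough, else call fetch() and store it."""
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing, unreadable or corrupt cache: fall through and refetch

    data = fetch()
    # Write to a temporary file first so readers never see a partial cache.
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)
    return data


if __name__ == "__main__":
    def query_cmdb():
        # Stand-in for the real database query.
        return {"dbserver": {"hosts": ["pg01", "pg02"]}}

    print(json.dumps(cached(query_cmdb), indent=4))
```

With something like this in place, the repeated --host calls within one run are answered from the file rather than from the database.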

Dynamic inventory and more vars

There are loads of places from which Ansible reads variables if you want them, and using inventory scripts, such as shown above, adds yet another source.

In addition to whichever variables my inventory script produces, Ansible will also populate host_vars and group_vars from the respective directories if these are placed in the directory containing my inv.py. For example:

./host_vars
./host_vars/ldap
./inv.py

When the node “ldap” is referenced (i.e. used), Ansible will populate additional variables from the host_vars/ldap file — a file in YAML format.

Mixing static and dynamic

In addition to inventory scripts, I can set $ANSIBLE_HOSTS to point to a directory. In this case, Ansible runs any executable files it finds therein and merges static inventory files (in INI file format) it finds into their output.

$ export ANSIBLE_HOSTS=/etc/ansible/inventory
$ ansible all --list-hosts
    127.0.0.1
    ldap
    manager
    pg01
    pg02
    t1
    t1.prox
    tiggr
    www01
    www02
    www1

You’ll notice there are more nodes shown than in the first JSON output above (www1 and manager, for example) — these come from additional inventory files in /etc/ansible/inventory/.

Most of this is well documented on the AnsibleWorks Web site. Nevertheless, I hope this was useful to you.

Ned Batchelder: 51 at MoMath

via nedbatchelder.com

For my birthday (today), we visited the Museum of Math in New York (yesterday). I’ve been looking forward to getting there since it opened six months ago, and my reluctant family had to accede to my birthday destination, so all five of us spent the afternoon.

Museum of Math

The museum is a fun place, with lots of interactive exhibits. Some were intriguing but baffling, like a polyhedron exploration device which looked great, but was impossible to control. We rode square-wheeled tricycles, made ourselves into fractal trees, explored cross-sections in the wall of fire, rolled weird shapes to make weird paths, looked at specular holography, and so on.

As is typical with these kinds of high-traffic interactive displays, a number of them were not working, which was disappointing. But overall, it was a lot of fun, and not the same feel as math class at all. One helpful museum worker kept popping up to tell us how to better use the exhibits, and Susan said, “if I had her as a math teacher, I might have learned something in high school!”

It was a great day. If you enjoy mathematical thinking, I heartily recommend the Museum of Math.

A few blocks north, we found the Museum of Sex, but decided not to go in with the kids….

Expecting a DDoS

via devopsreactions.tumblr.com

image by nosalb

15

Jun

DevOps tools applied in real life

via devopsreactions.tumblr.com

by @juan_domenech, Alex and others

14

Jun

Infrastructure as Code – A comprehensive overview

via www.jedi.be

I’ve been tracking infrastructure as code for a few years now. Over the years it has gotten closer to real code.

Close but no cigar yet…. We’ve come a long way, but when you compare it to real languages it still feels in its infancy. In this updated overview I gave at the ABUG, I went through:

  • the basic concepts of infrastructure as code
  • the differences/concepts in the languages (chef, puppet, …)
  • the editors, syntax checkers, highlighting
  • integration with git version control
  • integration with CI systems
  • the different forms of testing (syntax, compile, unit, smoke testing)
  • using vagrant, veewee and the tools in that eco-system
  • debugging, profiling your code

This talk is probably the most comprehensive tool list that I’ve seen/made about the subject. But feel free to post and add your findings in the comments!

Note that at the end of the presentation there are many extra links that are still to be sorted or point to slightly outdated tools.

I’ve given previous versions of this talk at Devoxx 2012 and Jax2012. Enjoy the Jax2012 video here:

Just decreased response time by 100ms

via devopsreactions.tumblr.com

by kelly-dunn

12

Jun

P:R Approved: Francesco Francavilla’s Batman 1972!

via www.tencentticker.com

Note: Francesco Francavilla is one talented son of a gun, and he’s never short on ideas. Between projects like drawing Marvel’s Hawkeye, work for DC and his own Black Beetle series at Dark Horse, he also found time to work on a stunning Elseworlds pitch for DC setting Batman in the seedy, grindhouse days of the 1970s. – Chris A.

10

Jun

The 10 Deadly Sins Against Scalability

via highscalability.com

In the moral realm there may be 7 deadly sins, but scalability maven Sean Hull has come up with Five More Things Deadly to Scalability that, when added to his earlier 5 Things That are Toxic to Scalability, make for a numerologically satisfying 10 sins against scalability:

  1. Slow Disk I/O – RAID 5 – Multi-tenant EBS. Use RAID 10; it provides good protection along with good read and write performance. The design of RAID 5 means poor performance and long repair times on failure. On AWS consider Provisioned IOPS as a way around IO bottlenecks.
  2. Using the database for Queuing. The database may seem like the perfect place to keep work queues, but under load, locking and scanning overhead kill performance. Use specialized products like RabbitMQ and SQS to remove this bottleneck.
  3. Using Database for full-text searching. Search seems like another perfect database feature. At scale, search doesn’t perform well. Use specialized technologies like Solr or Sphinx.
  4. Insufficient Caching at all layers. Use memcache between your application and the database. Use a page cache like Varnish between users and your webserver. Select proper caching options for your html assets.
  5. Too much technical debt. Rewrite problem code instead of continually paying an implementation tax for poorly written code. In the long run it pays off.
  6. Object Relational Mappers. They create complex queries that are hard to optimize and tweak.
  7. Synchronous, Serial, Coupled or Locking Processes. Locks are like stop signs; traffic circles keep the traffic flowing. Row level locking is better than table level locking. Use async replication. Use eventual consistency for clusters.
  8. One Copy of Your Database. A single database server is a choke point. Create parallel databases and let a driver select between them.
  9. Having No Metrics. Visualize what’s happening to your system using one of the many monitoring packages.
  10. Lack of Feature Flags. Be able to turn off features via a flag so when a spike hits features can be turned off to reduce load.
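
To make the last item concrete, here is a minimal sketch of a feature flag read from the environment. It is not taken from Sean Hull's articles; the FEATURE_ prefix, the flag name and the render_page function are assumptions for illustration.

```python
import os


def flag_enabled(name, environ=os.environ, default=True):
    """Return True unless FEATURE_<NAME> is set to something other than on."""
    value = environ.get("FEATURE_" + name.upper())
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "on", "yes")


def render_page(environ=os.environ):
    """Build the list of page parts, skipping the expensive one when switched off."""
    parts = ["article"]
    if flag_enabled("recommendations", environ):
        parts.append("recommendations")  # the costly feature
    return parts


if __name__ == "__main__":
    print(render_page({}))
    print(render_page({"FEATURE_RECOMMENDATIONS": "off"}))
```

During a traffic spike, setting the flag to off sheds the expensive part of the page without a deploy.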

Nothing to hide

via flowingdata.com

With all the stuff going on with surveillance and data privacy — especially the past week — it’s worthwhile to revisit this essay by Daniel J. Solove, a professor of law at George Washington University, on why privacy matters even if you “have nothing to hide.”

“My life’s an open book,” people might say. “I’ve got nothing to hide.” But now the government has large dossiers of everyone’s activities, interests, reading habits, finances, and health. What if the government leaks the information to the public? What if the government mistakenly determines that based on your pattern of activities, you’re likely to engage in a criminal act? What if it denies you the right to fly? What if the government thinks your financial transactions look odd—even if you’ve done nothing wrong—and freezes your accounts? What if the government doesn’t protect your information with adequate security, and an identity thief obtains it and uses it to defraud you? Even if you have nothing to hide, the government can cause you a lot of harm.

“But the government doesn’t want to hurt me,” some might argue. In many cases, that’s true, but the government can also harm people inadvertently, due to errors or carelessness.

You might not have anything to hide right now, but maybe a random string of choices that were completely harmless looks a lot like something else a few years from now, to someone sniffing around the archives. The patterns when there are no patterns sort of thing. Personal data without the person. [via @hmason]

Simon: Python TCP socket performance tweak on Linux

via aboutsimon.com

Short

Setting the socket option TCP_NODELAY=1 improves performance considerably if you are sending lots of small blocks of data over TCP (socket.IPPROTO_TCP).

Long

Over at abusix I started a project using IMAP. For connecting to an IMAP server in Python there is basically only imaplib and a few high level libs which wrap around imaplib.

In my first tests of importing our old email storage, I started with a small batch of 5000 emails, appending each one to the IMAP INBOX.

import imaplib
import os

imap = imaplib.IMAP4('192.168.0.1', 143)
(status, msg) = imap.login('mail', 'testpassword')

if status == 'OK':
    imap.create('Archive')

    maildir = '/root/mails/'
    for f in os.listdir(maildir):
        # Read each stored message as raw bytes
        with open(os.path.join(maildir, f), 'rb') as fd:
            mail = fd.read()

        # One APPEND command per message: no flags, no date
        imap.append('INBOX', None, None, mail)

Importing 5000 mails by calling append for every single email resulted in a run time of 210 seconds, which is 23.8 messages/sec. This is slow. I checked the IMAP server configs, I/O and CPU load. All fine. To find out whether the program or the server configuration was at fault, I wrote the exact same script in Perl using Mail::IMAPClient. Running the Perl script with the same amount of data, on the same server, resulted in a run time of 7.9 seconds. Wtf? That is about 632 messages/sec, which is good and the kind of result I was aiming for with Python. So I compared the IMAP protocol calls generated by Perl and Python, to see if Perl was maybe using multi appends or something different, but there wasn’t any difference. Since the email parser of Python is damn slow compared to the Perl parsers out there, I thought this might be bad protocol parsing or slow regex stuff again, so I profiled the Python code to see which calls were slow.

me@dev:~# python -m cProfile migrate_imap.py
         742868 function calls (742690 primitive calls) in 210.908 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        [..]
        1    0.068    0.068  210.908  210.908 migrate_imap.py:3(<module>)
    21040    0.110    0.000  207.605    0.010 imaplib.py:1007(_get_line)
        [..]
     5260    0.030    0.000  209.033    0.040 imaplib.py:1068(_simple_command)
        [..]
    21040    0.046    0.000  207.390    0.010 imaplib.py:238(readline)
        [..]
     5256    0.033    0.000  210.169    0.040 imaplib.py:304(append)
        [..]
     5260    0.028    0.000  206.798    0.039 imaplib.py:892(_command_complete)
    21040    0.182    0.000  208.124    0.010 imaplib.py:909(_get_response)
     5260    0.034    0.000  206.748    0.039 imaplib.py:985(_get_tagged_response)
        [..]
    21040    0.238    0.000  207.343    0.010 socket.py:406(readline)
        [..]
    10517  206.906    0.020  206.906    0.020 {method 'recv' of '_socket.socket' objects}

(I deleted all the jitter and only left the important stuff in)
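
Instead of trimming the listing by hand, the profile can be sorted by the tottime column so that the real culprit ends up on top. The sketch below shows the idea on a toy workload; slow_wait, workload and profile_top are made-up names, and time.sleep merely stands in for a blocking recv().

```python
import cProfile
import io
import pstats
import time


def slow_wait():
    time.sleep(0.05)  # stands in for a blocking socket recv()


def workload():
    for _ in range(3):
        slow_wait()


def profile_top(func, limit=5):
    """Profile func() and return the `limit` entries with the highest tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(workload))
```

On the command line, python -m cProfile -s tottime migrate_imap.py gives the same ordering for a whole script.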

So basically socket.recv() is the problem, which means something is taking ages until data is received. With absolutely no clue, I stumbled upon http://bugs.python.org/issue3766, where the reporter had basically the same problem as me.

So I decided to try out setting TCP_NODELAY to 1.

import socket

# imap.sock is the TCP socket imaplib opened for this connection
imap.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

I reran the Python script and WOW, the run time decreased to 14 seconds. Not as good as Perl, but totally sufficient. So long story short, if you’re doing networking via Python sockets and sending/receiving a large number of small data blocks, you really should consider using TCP_NODELAY on both the client and the server side! This can really boost your socket performance.
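
Outside of imaplib, the same option can be set on any TCP socket you create yourself. A minimal sketch follows; the helper name make_nodelay_socket is an assumption, and no connection is made here.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with TCP_NODELAY set (Nagle's algorithm disabled)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    sock = make_nodelay_socket()
    try:
        # A non-zero value means the option is active.
        print("TCP_NODELAY:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    finally:
        sock.close()
```

On a server, the same setsockopt call can be made on the connection socket returned by accept().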

For further reading about TCP_NODELAY: http://www.techrepublic.com/article/tcpip-options-for-high-performance-data-transmission/1050878

Btw, I haven’t tried TCP_CORK yet.

08

Jun

Linux Performance Analysis and Tools

via dtrace.org

At the Southern California Linux Expo earlier this year (SCaLE 11x) I presented a talk on Linux Performance Analysis and Tools. It’s a great conference, and I was happy to be back.

My talk provided an overview of over twenty performance tools, and I described the problems they solve. At the end of the talk, I summarized some methodologies for using these tools, so that you know when to reach for what.

The video is on youtube:

The slides are available on slideshare and as a PDF:

These are also linked on the Joyent blog post about my talk, by Deirdré Straughan who filmed it and then spread the word afterwards.

I’ve used pretty much everything for solving performance issues, including advanced tools like perf, DTrace, and SystemTap, and I was able to explain their role and how they fit together (see slide 16 in particular). It was pretty dense: you can treat this as a 60-minute crash course in Linux performance analysis and tools.

Trying to refactor Perl code

via devopsreactions.tumblr.com

by Torsten

Hans Rosling explains population growth and climate change

via flowingdata.com

Because every day is a good day to listen to Hans Rosling talk numbers. In this short video, Rosling uses Lego bricks to explain population growth and the gaps in wealth and carbon footprint.

Dean Abbott Featured in "Popular Mechanics" On-Line Article

via abbottanalytics.blogspot.com

Our own Dean Abbott has been consulted for an on-line Popular Mechanics article, “Why the NSA Wants All That Verizon Metadata” (Jun-06-2013), by Glenn Derene. Since the initial report connecting the NSA with Verizon, details have emerged suggesting similar large-scale information-gathering by the American government from other telecommunication and Internet companies.

Some applications of data mining to law enforcement and anti-terrorism problems have clearly been fruitful (for detection of money laundering, for instance, which is one source of funding for criminal and terrorist organizations). On the other hand, direct application of these techniques to plucking out the bad guys from large numbers of innocents strikes this author as dubious, and has long been criticized by experts, such as Bruce Schneier. What’s plain is that people in democratic societies must remain vigilant of the balance of information and power granted to their governments, lest the medicine become worse than the disease.

07

Jun

Most Reliable Cars

via www.coolinfographics.com

Most Reliable Cars infographic

Are you looking for a new car? The Most Reliable Cars infographic from MoneySupermarket rates how reliable the manufacturers are as well as specific car models. The lower the score, the more reliable the car is. If your current car isn’t on the list, maybe it is time to get a new one.

It is never a pleasant experience to find yourself stranded next to a broken down vehicle at the side of the road, particularly during the winter. Breakdown cover can help to reduce the pain somewhat, but it is still worth making sure that you pick the most reliable car available.

MoneySupermarket.com has therefore teamed up with Warranty Direct to put together the following lists which highlight the most reliable cars on the road. This is decided upon by taking into account overall reliability and the average cost of repairs for these manufacturers and models, coming up with an overall Reliability Index (RI) score. Just for reference: the average RI is 100, and the lower the score the better.

We’ve broken this down by both car make and by individual vehicle models to come up with a definitive list which could prove invaluable to you during the car buying process.

This is a really good use of bar charts. The company logos or car photos and the relevant data are built directly into the chart, so there is no need for a chart legend. Very easy to read and understand.

Thanks to Mark for sending in the link!