
01

Sep

The Wolf

via randsinrepose.com

You’ve heard of the 10x engineer, but I am here to tell you about the Wolf. They are an engineer and they consistently exhibit the following characteristics:

  • They appear to exist outside of the well-defined process that we’ve defined to get things done, but they appear to suffer no consequences for not following these rules.
  • Everyone knows they’re the Wolf, but no one ever calls them the Wolf.
  • They have a manager, but no one really knows who it is.
  • They have a lot of meetings, but none of them are scheduled. Inviting them to your meeting is a crap shoot.
  • They understand how “the system” works, they understand how to use “the system” to their advantage, they understand why “the system” exists, but they think “the system” is a bit of a joke.
  • You can ask a Wolf to become a manager, but they’ll resist it. If you happen to convince them to do it, they will do a fine job, but they won’t stay in that role long. In fact, they’ll likely quit managing when you least expect it.
  • Lastly, and most importantly, the Wolf generates disproportionate value for the company with their unparalleled ability to identify and rapidly work on projects essential to the future of the company.

The Wolf moves fast because he or she is able to avoid the encumbering necessities of a group of people building at scale. This avoidance of most things process-related, combined with exceptional engineering ability, allows them to move at a speed that makes them unusually productive. It’s this productivity that the rest of the team can… smell. It’s this scent of pure productivity that allows them to further skirt documentation, meetings, and annual reviews.

It’s easy to hate the Wolf when you’ve just spent the day writing integration tests, but it’s also easy to admire the fact that they appear to be dictating their own terms.

In my career, I’ve had the pleasure of working with a handful of Wolves. They appreciate that I have identified them as such and we have interesting ongoing conversations regarding their Wolf-i-ness. Two times now, I’ve attempted to reverse engineer engineering Wolves and then hold up the results to other engineers. See? Here is a well-defined non-manager very technical track. Both attempts have mostly failed. The reason was the same both times: the influence earned by the Wolf can never ever be granted by a manager.

The Wolf doesn’t really need me. In fact, the Wolf is reading this right now and grinning because he or she knows that I’ve done an ok job describing them – there is a chance this description may help inspire future Wolves, but what really matters… is what they’re working on right now.

Applying cardiac alarm management techniques to your on-call

via fractio.nl

If alarms are more often false than true, a culture emerges on the unit in that staff may delay response to alarms, especially when staff are engaged in other patient care activities, and more important critical alarms may be missed.

One of the most difficult challenges we face in the operations field right now is “alert fatigue”. Alert fatigue is a term the tech industry has borrowed from a similar term used in the medical industry, “alarm fatigue” - a phenomenon of people being so desensitised to the alarm noise from monitors that they fail to notice or react in time.

In an on-call scenario, I posit two main factors contribute to alert fatigue:

  • The accuracy of the alert.
  • The volume of alerts received by the operator.

Alert fatigue can manifest itself in many ways:

  • Operators delaying a response to an alert they’ve seen before because “it’ll clear itself”.
  • Impaired reasoning and creeping bias, due to physical or mental fatigue.
  • Poor decision making during incidents, due to an overload of alerts.

Earlier this year a story popped up about a Boston hospital that silenced alarms to improve the standard of care. It sounded counter-intuitive, but in the context of the alert fatigue problems we’re facing, I wanted to get a better understanding of what they actually did, and how we could potentially apply it to our domain.

The Study

When rolling out new cardiac telemetry monitoring equipment in 2008 to all adult inpatient clinical units at Boston Medical Center (BMC), a Telemetry Task Force (TTF) was convened to develop standards for patient monitoring. The TTF was a multidisciplinary team drawing people from senior management, cardiologists, physicians, nursing practitioners and directors, clinical instructors, and a quality and patient safety specialist.

BMC’s cardiac telemetry monitoring equipment provides configurable limit alarms (we know this as “thresholding”), with four alarm levels: message, advisory, warning, and crisis. These alarms can be either visual or auditory.

As part of the rollout, TTF members observed nursing staff responding to alarms from equipment configured with factory default settings. The TTF members observed that alarms were frequently ignored by nursing staff, but for a good reason - the alarms would self-reset and stop firing.

To frame this behaviour from an operations perspective, this is like a Nagios check passing a threshold for a CRITICAL alert to fire, the on-call team member receiving the alert, sitting on it for a few minutes, and the alert recovering all by itself.
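
To make that behaviour concrete, here is a minimal sketch of a stateless threshold check (the metric source and thresholds are illustrative assumptions, not any particular Nagios plugin). Because each run only looks at the current value, the alert “recovers” on its own as soon as the metric dips back under the threshold, whether or not a human ever looked at it.

    # Minimal sketch of a stateless, self-recovering threshold check.
    # The metric source and thresholds are illustrative assumptions.
    import random
    import sys

    OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style exit codes

    def read_load_average():
        # Stand-in for a real metric source (e.g. /proc/loadavg).
        return random.uniform(0.0, 10.0)

    def check(warn=4.0, crit=8.0):
        value = read_load_average()
        if value >= crit:
            print("CRITICAL - load average is %.2f" % value)
            return CRITICAL
        if value >= warn:
            print("WARNING - load average is %.2f" % value)
            return WARNING
        # No memory of previous runs: once the value drops below the
        # threshold, the alert clears itself without anyone acting on it.
        print("OK - load average is %.2f" % value)
        return OK

    if __name__ == "__main__":
        sys.exit(check())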

When the nursing staff were questioned about this behaviour, they reported that more often than not the alarms self-reset, and answering every alarm pulled them away from looking after patients.

Fast forward 3 years, and in 2011 BMC started an Alarm Management Quality Improvement Project that experimented with multiple approaches to reducing alert fatigue:

  • Widen the acceptable thresholds for patient vitals so alarms would fire less often.
  • Eliminate all levels of alarms except “message” and “crisis”. Crisis alarms would emit an audible alert, while message history would build up on the unit’s screen for the next nurse to review.
  • Alarms that had the ability to self-reset (recover on their own) were disabled.
  • If false positives were detected, nursing staff were required to tune the alarms as they occurred.

The approaches were applied over the course of 6 weeks, with buy-in from all levels of staff, most importantly with nursing staff who were responding to the alarms.

Results from the study were clear:

  • The number of total audible alarms decreased by 89%. This should come as no surprise, given the alarms were tuned to not fire as often.
  • The number of code blues decreased by 50%. This indicates that the reduction in work from the elimination of constant alarms freed up nurses to provide more proactive care, and that lower-priority alarms for problems that precede code blues were more likely to be responded to.
  • The number of Rapid Response Team activations on the unit stayed constant. It’s reasonable to assert that the operational effectiveness of the unit was maintained even though alarms fired less often.
  • Anonymous surveys of nurses on the unit showed an increase in satisfaction with the level of noise on the unit, with night staff reporting they “kept going back to the central station to reassure themselves that the central station was working”. One anonymous comment stated “I feel so much less drained going home at the end of my shift”.

At the conclusion of the study, the nursing staff requested that the previous alarm defaults not be restored.

Analysis

The approach outlined in the study is pretty simple: change the default alarm thresholds so they don’t fire unless action must be taken, and give the operator the power to tune the alarms if the alarm is inaccurate.

Alerts should exist in two states: nothing is wrong, and the world is on fire.

But the elimination of alarms that have the ability to recover is a really surprising solution. Can we apply that to monitoring in an operations domain?

Two obvious methods to make this happen:

  • Remove checks that have the ability to self-recover.
  • Redesign checks so they can’t self-recover.

For redesigning checks, I’ve yet to encounter a check designed to not recover when thresholds are no longer exceeded. That would be a very surprising alerting behaviour to stumble upon in the wild, one that most operators, myself included, would likely attribute to a bug in the check. Socially, a check redesign like that would break many fundamental assumptions operators have about their tools.

From a technical perspective, a non-recovering check would require the check to keep some sort of memory of its previous states and acknowledgements, or at least require the alerting mechanism to do this. This approach is entirely possible with more modern tools, but is not in any way commonplace.
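
As an illustration of what that memory might look like, here is a sketch of a latching wrapper around an ordinary check: once the wrapped check goes critical, the wrapper keeps reporting critical until an operator explicitly acknowledges it, even if the underlying condition has recovered. The state file and function names are assumptions for the example, not features of any existing tool.

    # Sketch of a "latching" (non-recovering) check wrapper. Once the
    # wrapped check goes CRITICAL, it stays CRITICAL until an operator
    # acknowledges it, even if the underlying condition has recovered.
    # The state file and names are assumptions for illustration only.
    import json
    import os

    STATE_FILE = "/var/tmp/latched_checks.json"
    OK, CRITICAL = 0, 2

    def _load_state():
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                return json.load(f)
        return {}

    def _save_state(state):
        with open(STATE_FILE, "w") as f:
            json.dump(state, f)

    def latching_check(name, check_fn):
        """Run check_fn() and latch any CRITICAL result until acknowledged."""
        state = _load_state()
        status = check_fn()
        if status == CRITICAL:
            state[name] = "latched"
        if state.get(name) == "latched":
            # Report CRITICAL regardless of the current value; the check
            # has "memory" of its previous state.
            status = CRITICAL
        _save_state(state)
        return status

    def acknowledge(name):
        """Operator action: clear the latch after investigating."""
        state = _load_state()
        state.pop(name, None)
        _save_state(state)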

Regardless of the problems above, I believe adopting this approach in an operations domain would be achievable and I would love to see data and stories from teams who try it.

As for removing checks, that’s actually pretty sane! The typical CPU/memory/disk utilisation alerts engineers receive can be handy diagnostics during outages, but in almost all modern environments they are terrible indicators for anomalous behaviour, let alone something you want to wake someone up about. If my site can take orders, why should I be woken up about a core being pegged on a server I’ve never heard of?

Looking deeper though, the point of removing alarms that self-recover is to eliminate the background noise of alarms that are ignorable. This ensures that each and every alarm that fires actually requires action: it is investigated, acted upon, or tuned.

This is only possible if the volume of alerts is low enough, or there are enough people to distribute the load of responding to alerts. Ops teams that meet both of these criteria do exist, but they’re in the minority.

Another consideration is that checks for operations teams are cheap, but physical equipment for nurses is not. I can go and provision a couple of thousand new monitoring checks in a few minutes and have them alert me on my phone, and do all that without even leaving my couch. There are capacity constraints on the telemetry monitoring in hospitals - budgets limit the number of potential alarms that can be deployed and thus fire, and a person physically needs to move to and act on an alarm to silence it.

Also consider that hospitals are dealing with pets, not cattle. Each patient is a genuine snowflake, and the monitoring equipment has to be tuned for size, weight, health. We are extremely lucky in that most modern infrastructure is built from standard, similarly sized components. The approach outlined in this study may be more applicable to organisations who are still looking after pets.

There are constraints and variations in physical systems like hospitals that simply don’t apply to the technical systems we’re nurturing, but there is a commonality between the fields: thinking about the purpose of the alarm, and how people are expected to react to it firing, is an extremely important consideration when designing the interaction.

One interesting anecdote from the study was that extracting alarm data was a barrier to entry, as manufacturers often don’t provide mechanisms to easily extract data from their telemetry units. We have a natural advantage in operations in that we tend to own our monitoring systems end-to-end and can extract that data, or have access to APIs to easily gather the data.

The key takeaway the authors of the article make clear is this:

Review of actual alarm data, as well as observations regarding how nursing staff interact with cardiac monitor alarms, is necessary to craft meaningful quality alarm initiatives for decreasing the burden of audible alarms and clinical alarm fatigue.

Regardless of whether you think any of the methods employed above make sense in the field of operations, it’s difficult to argue against collecting and analysing alerting data.
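
Even a crude analysis goes a long way. As a sketch of a starting point (the export format here is an assumption; substitute your monitoring system’s alert history or API), counting which checks fire most often, and how many of those alerts cleared without anyone acting on them, quickly surfaces the candidates for tuning or removal.

    # Sketch: find the noisiest checks in an alert history export.
    # Assumes a CSV with columns: timestamp, check_name, state, acknowledged.
    # The file format is an assumption; adapt it to your monitoring system.
    import csv
    from collections import Counter

    def noisiest_checks(path, top_n=10):
        fired = Counter()
        unacked = Counter()
        with open(path) as f:
            for row in csv.DictReader(f):
                if row["state"] != "CRITICAL":
                    continue
                fired[row["check_name"]] += 1
                if row["acknowledged"] == "false":
                    # Fired and cleared without anyone acting on it:
                    # a prime candidate for tuning or removal.
                    unacked[row["check_name"]] += 1
        for name, count in fired.most_common(top_n):
            pct = 100.0 * unacked[name] / count
            print("%s: %d alerts, %.0f%% resolved with no action" % (name, count, pct))

    if __name__ == "__main__":
        noisiest_checks("alert_history.csv")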

The thing that excites me so much about this study is there is actual data to back the proposed techniques up! This is something we really lack in the field of operations, and it would be amazing to see more companies publish studies analysing different alert management techniques.

Finally, the authors lay out some recommendations that other institutions can use to reduce alarm fatigue without requiring additional resources or technology.

To adapt them to the field of operations:

  • Establish a multidisciplinary alerting work group (dev, ops, management).
  • Extract and analyse alerting data from your monitoring system.
  • Eliminate alerts that are inactionable, or are likely to recover themselves.
  • Standardise default thresholds, but allow local variations to be made by people responding to the alerts.

29

Aug

Well planned, flawless execution

via devopsreactions.tumblr.com

by Johan

28

Aug

The Technology behind Hyperlapse from Instagram

via instagram-engineering.tumblr.com

Yesterday we released Hyperlapse from Instagram—a new app that lets you capture and share moving time lapse videos. Time lapse photography is a technique in which frames are played back at a much faster rate than that at which they’re captured. This allows you to experience a sunset in 15 seconds or see fog roll over hills like a stream of water flowing over rocks. Time lapses are mesmerizing to watch because they reveal patterns and motions in our daily lives that are otherwise invisible.

Hyperlapses are a special kind of time lapse where the camera is also moving. Capturing hyperlapses has traditionally been a laborious process that involves meticulous planning, a variety of camera mounts and professional video editing software. With Hyperlapse, our goal was to simplify this process. We landed on a single record button and a post-capture screen where you select the playback rate. To achieve fluid camera motion we incorporated a video stabilization algorithm called Cinema (which is already used in Video on Instagram) into Hyperlapse.

In this post, we’ll describe our stabilization algorithm and the engineering challenges that we encountered while trying to distill the complex process of moving time lapse photography into a simple and interactive user interface.

Cinema Stabilization

Video stabilization is instrumental in capturing beautiful fluid videos. In the movie industry, this is achieved by having the camera operator wear a harness that separates the motion of the camera from the motion of the operator’s body. Since we can’t expect Instagrammers to wear a body harness to capture the world’s moments, we instead developed Cinema, which uses the phone’s built-in gyroscope to measure and remove unwanted hand shake.

The diagram below shows the pipeline of the Cinema stabilization algorithm. We feed gyroscope samples and frames into the stabilizer and obtain a new set of camera orientations as output. These camera orientations correspond to a smooth “synthetic” camera motion with all the unwanted kinks and bumps removed.

These orientations are then fed into our video filtering pipeline shown below. Each input frame is then changed by the IGStabilizationFilter according to the desired synthetic camera orientation.

The video below shows how the Cinema algorithm changes the frames to counteract camera shake. The region inside the white outline is the visible area in the output video. Notice that the edges of the warped frames never cross the white outline. That’s because our stabilization algorithm computes the smoothest camera motion possible while also ensuring that a frame is never changed such that regions outside the frame become visible in the final video. Notice also that this means that we need to crop or zoom in in order to have a buffer around the visible area. This buffer allows us to move the frame to counteract handshake without introducing empty regions into the output video. More on this later.

The orientations are computed offline, while the stabilization filter is applied on the fly at 30 fps during video playback. We incorporated the filtering pipeline, called FilterKit, from Instagram, where we use it for all photo and video processing. FilterKit is built on top of OpenGL and is optimized for real-time performance. Most notably, FilterKit is the engine that drives our recently launched creative tools.

Hyperlapse Stabilization

In Hyperlapse, you can drag a slider to select the time lapse level after you’ve recorded a video. A time lapse level of 6x corresponds to picking every 6th frame in the input video and playing those frames back at 30 fps. The result is a video that is 6 times faster than the original.
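
In other words, the speedup factor is just a frame-subsampling stride. A toy sketch of the selection (frame indices only, not Instagram’s implementation):

    # Toy sketch of time lapse frame selection: keep every Nth frame and
    # play the result back at 30 fps. Not Instagram's implementation.
    def timelapse_frame_indices(total_frames, speedup):
        return list(range(0, total_frames, speedup))

    frames = timelapse_frame_indices(total_frames=900, speedup=6)
    # 900 input frames at 30 fps = 30 s of footage; keeping every 6th frame
    # leaves 150 frames, which play back in 5 s at 30 fps.
    print(len(frames))  # 150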

We modified the Cinema algorithm to compute orientations only for the frames we keep. This means that the empty region constraint is only enforced for those frames. As a result, we are able to output a smooth camera motion even when the unstabilized input video becomes increasingly shaky at higher time lapse amounts. See the video below for an illustration.

Adaptive Zoom

As previously noted, we need to zoom in to give ourselves room to counteract handshake without introducing empty regions into the output video (i.e. regions outside the input frame for which there is no pixel data). All digital video stabilization algorithms trade resolution for stability. However, Cinema picks the zoom intelligently based on the amount of shake in the recorded video. See the videos below for an illustration.

The video on the left has only a small amount of handshake because it was captured while standing still. In this case, we only zoom in slightly because we do not need a lot of room to counteract the small amount of camera shake. The video on the right was captured while walking. As a result, the camera is a lot shakier. We zoom in more to give ourselves enough room to smooth out even the larger bumps. Since zooming in reduces the field of view, there is a tradeoff between effective resolution and the smoothness of the camera motion. Our adaptive zoom algorithm is fine-tuned to minimize camera shake while maximizing the effective resolution on a per-video basis. Since motion, such as a slow pan, becomes more rapid at higher time lapse levels (e.g. 12x), we compute the optimal zoom at each speedup factor.
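
As a rough back-of-the-envelope illustration of that tradeoff (simplified to pure translational shake, and not Instagram’s actual adaptive zoom algorithm), the crop window has to stay inside the shifted frame, so the required zoom grows with the largest residual displacement left over after smoothing:

    # Back-of-the-envelope sketch of adaptive zoom, simplified to pure
    # translational shake. Not Instagram's actual algorithm.
    def required_zoom(max_residual_shift_fraction):
        """max_residual_shift_fraction: largest per-frame displacement after
        smoothing, as a fraction of the frame size (must be < 0.5)."""
        # To keep the crop window inside the shifted frame, the crop must
        # shrink by twice the maximum shift (once for each direction).
        return 1.0 / (1.0 - 2.0 * max_residual_shift_fraction)

    print(required_zoom(0.02))  # gentle handshake -> ~1.04x zoom
    print(required_zoom(0.10))  # walking shake    -> ~1.25x zoom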

Putting It All Together

“The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.” –Tom Cargill, Bell Labs

Very early on in the development process of Hyperlapse, we decided that we wanted an interactive slider for selecting the level of time lapse. We wanted to provide instant feedback that encouraged experimentation and felt effortless, even when complex calculations were being performed under the hood. Every time you move the slider, we perform the following operations:

  1. We request frames from the decoder at the new playback rate.
  2. We simultaneously kick off the Cinema stabilizer on a background thread to compute a new optimal zoom and a new set of orientations for the new zoom and time lapse amount.
  3. We continue to play the video while we wait for new stabilization data to come in. We use the orientations we computed at the previous time lapse amount, along with spherical interpolation, to output orientations for the frames we’re going to display (see the slerp sketch after this list).
  4. Once the new orientations come in from the stabilizer, we atomically swap them out with the old set of orientations.
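
The spherical interpolation in step 3 is standard quaternion slerp. A minimal sketch in plain Python (not FilterKit’s implementation), interpolating between two unit orientation quaternions:

    # Minimal quaternion slerp sketch (not FilterKit's implementation).
    # q0 and q1 are unit quaternions (w, x, y, z); t is in [0, 1].
    import math

    def slerp(q0, q1, t):
        dot = sum(a * b for a, b in zip(q0, q1))
        # Take the shorter arc by flipping one quaternion if needed.
        if dot < 0.0:
            q1 = tuple(-c for c in q1)
            dot = -dot
        if dot > 0.9995:
            # Orientations are nearly identical: fall back to linear
            # interpolation and renormalise.
            result = tuple(a + t * (b - a) for a, b in zip(q0, q1))
            norm = math.sqrt(sum(c * c for c in result))
            return tuple(c / norm for c in result)
        theta = math.acos(dot)               # angle between orientations
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))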

We perform the above steps every time you scrub the slider without interrupting video playback or stalling the UI. The end result is an app that feels light and responsive. We can’t wait to see the creativity that Hyperlapse unlocks for our community now that you can capture a hyperlapse with the tap of a button.

By Alex Karpenko

Teaching Engineering As A Social Science

via www.kitchensoap.com

Below is a piece written by Edward Wenk, Jr., which originally appeared in PRISM, the magazine for the American Society for Engineering Education (Volume 6, No. 4, December 1996).

While I think that there’s much more to this than what Wenk points to as ‘social science’, I agree wholeheartedly with his ideas. I might even say that he didn’t go far enough in his recommendations.

Enjoy. :)

Edward Wenk, Jr.

Teaching Engineering as a Social Science

Today’s public engages in a love affair with technology, yet it consistently ignores the engineering at technology’s core. This paradox is reinforced by the relatively few engineers in leadership positions. Corporations, which used to have many engineers on their boards of directors, today are composed mainly of M.B.A.s and lawyers. Few engineers hold public office or even run for office. Engineers seldom break into headlines except when serious accidents are attributed to faulty design.

While there are many theories on this lack of visibility, from inadequate public relations to inadequate public schools, we may have overlooked the real problem: Perhaps people aren’t looking at engineers because engineers aren’t looking at people.

If engineering is to be practiced as a profession, and not just a technical craft, engineers must learn to harmonize natural sciences with human values and social organization. To do this we must begin to look at engineering as a social science and to teach, practice, and present engineering in this context.

To many in the profession, looking at teaching engineering as a social science is anathema. But consider the multiple and profound connections of engineering to people.

Technology in Everyday Life

The work of engineers touches almost everyone every day through food production, housing, transportation, communications, military security, energy supply, water supply, waste disposal, environmental management, health care, even education and entertainment. Technology is more than hardware and silicon chips.

In propelling change and altering our belief systems and culture, technology has joined religion, tradition, and family in the scope of its influence. Its enhancements of human muscle and human mind are self-evident. But technology is also a social amplifier. It stretches the range, volume, and speed of communications. It inflates appetites for consumer goods and creature comforts. It tends to concentrate wealth and power, and to increase the disparity of rich and poor. In the competition for scarce resources, it breeds conflicts.

In social psychological terms, it alters our perceptions of space. Events anywhere on the globe now have immediate repercussions everywhere, with a portfolio of tragedies that ignite feelings of helplessness. Technology has also skewed our perception of time, nourishing a desire for speed and instant gratification and ignoring longer-term impacts.

Engineering and Government

All technologies generate unintended consequences. Many are dangerous enough to life, health, property, and environment that the public has demanded protection by the government.

Although legitimate debates erupt on the size of government, its cardinal role is demonstrated in an election year when every faction seeks control. No wonder vested interests lobby aggressively and make political campaign contributions.

Whatever that struggle, engineers have generally opted out. Engineers tend to believe that the best government is the least government, which is consistent with goals of economy and efficiency that steer many engineering decisions without regard for social issues and consequences.

Problems at the Undergraduate Level

By both inclination and preparation, many engineers approach the real world as though it were uninhabited. Undergraduates who choose an engineering career often see it as an escape from blue-collar family legacies by obtaining the social prestige that comes with belonging to a profession. Others love machines. Few, however, are attracted to engineering because of an interest in people or a commitment to public service. On the contrary, most are uncomfortable with the ambiguities of human behavior, its absence of predictable cause and effect, its lack of control, and with the demands for direct encounters with the public.

Part of this discomfort originates in engineering departments, which are often isolated from arts, humanities, and social sciences classrooms by campus geography as well as by disparate bodies of scholarly knowledge and cultures. Although most engineering departments require students to take some nontechnical courses, students often select these on the basis of hearsay, academic ease, or course instruction, not in terms of preparation for life or for citizenship.

Faculty attitudes don’t help. Many faculty members enter teaching immediately after obtaining their doctorates, their intellect sharply honed by a research specialty. Then they continue in that groove because of standard academic reward systems for tenure and promotion. Many never enter a professional practice that entails the human equation.

We can’t expect instant changes in engineering education. A start, however, would be to recognize that engineering is more than manipulation of intricate signs and symbols. The social context is not someone else’s business. Adopting this mindset requires a change in attitudes. Consider these axioms:

  • Technology is not just hardware; it is a social process.
  • All technologies generate side effects that engineers should try to anticipate and to protect against.
  • The most strenuous challenge lies in synthesis of technical, social, economic, environmental, political, and legal processes.
  • For engineers to fulfill a noblesse oblige to society, their objectivity must not be defined by conditions of employment, as, for example, in dealing with tradeoffs by an employer of safety for cost.

In a complex, interdependent, and sometimes chaotic world, engineering practice must continue to excel in problem solving and creative synthesis. But today we should also emphasize social responsibility and commitment to social progress. With so many initiatives having potentially unintended consequences, engineers need to examine how to serve as counselors to the public in answering questions of “What if?” They would thus add sensitive, future-oriented guidance to the extraordinary power of technology to serve important social purposes.

In academic preparation, most engineering students miss exposure to the principles of social and economic justice and human rights, and to the importance of biological, emotional, and spiritual needs. They miss Shakespeare’s illumination of human nature – the lust for power and wealth and its corrosive effects on the psyche, and the role of character in shaping ethics that influence professional practice. And they miss models of moral vision to face future temptations.

Engineering’s social detachment is also marked by a lack of teaching about the safety margins that accommodate uncertainties in engineering theories, design assumptions, product use and abuse, and so on. These safety margins shape practice with social responsibility to minimize potential harm to people or property. Our students can learn important lessons from the history of safety margins, especially of failures, yet most use safety protocols without knowledge of that history and without an understanding of risk and its abatement. Can we expect a railroad systems designer obsessed with safety signals to understand that sleep deprivation is even more likely to cause accidents? No, not if the systems designer lacks knowledge of this relatively common problem.

Safety margins are a protection against some unintended consequences. Unless engineers appreciate human participation in technology and the role of human character in performance, they are unable to deal with demons that undermine the intended benefits.

Case Studies in Socio-Technology

Working for the legislative and executive branches of the U.S. government since the 1950s, I have had a ringside seat from which to view many of the events and trends that come from the connections between engineering and people. Following are a few of those cases.

Submarine Design

The first nuclear submarine, USS Nautilus, was taken on its deep submergence trial February 28, 1955. The sub’s power plant had been successfully tested in a full-scale mock-up and in a shallow dive, but the hull had not been subjected to the intense hydrostatic pressure at operating depth. The hull was unprecedented in diameter, in materials, and in special joints connecting cylinders of different diameter. Although it was designed with complex shell theory and confirmed by laboratory tests of scale models, proof of performance was still necessary at sea.

During the trial, the sub was taken stepwise to its operating depth while evaluating strains. I had been responsible for the design equations, for the model tests, and for supervising the test at sea, so it was gratifying to find the hull performed as predicted.

While the nuclear power plant and novel hull were significant engineering achievements, the most important development occurred much earlier on the floor of the U.S. Congress. That was where the concept of nuclear propulsion was sold to a Congressional committee by Admiral Hyman Rickover, an electrical engineer. The concept had previously been rejected by a conservative Navy; passage of the proposal took an electrical engineer who understood how Constitutional power was shared and how to exercise the right of petition. By this initiative, Rickover opened the door to civilian nuclear power that accounts for 20 percent of our electrical generation, perhaps 50 percent in France. If he had failed, and if the Nautilus pressure hull had failed, nuclear power would have been set back by a decade.

Space Telecommunications

Immediately after the 1957 Soviet surprise of Sputnik, engineers and scientists recognized that global orbits required all nations to reserve special radio channels for telecommunications with spacecraft. Implementation required the sanctity of a treaty, preparation of which demanded more than the talents of radio specialists; it engaged politicians, space lawyers, and foreign policy analysts. As science and technology advisor to Congress, I evaluated the treaty draft for technical validity and for consistency with U.S. foreign policy.

The treaty recognized that the airwaves were a common property resource, and that the virtuosity of communications engineering was limited without an administrative protocol to safeguard integrity of transmissions. This case demonstrated that all technological systems have three major components — hardware or communications equipment; software or operating instructions (in terms of frequency assignments); and peopleware, the organizations that write and implement the instructions.

National Policy for the Oceans

Another case concerned a national priority to explore the oceans and to identify U.S. rights and responsibilities in the exploitation and conservation of ocean resources. This issue, surfacing in 1966, was driven by new technological capabilities for fishing, offshore oil development, mining of mineral nodules on the ocean floor, and maritime shipment of oil in supertankers that if spilled could contaminate valuable inshore waters. Also at issue was the safety of those who sailed and fished.

This issue had a significant history. During the late 1950s, the U.S. Government was downsizing oceanographic research that initially had been sponsored during World War II. This was done without strong objection, partly because marine issues lacked coherent policy or high-level policy leadership and strong constituent advocacy.

Oceanographers, however, wanting to sustain levels of research funding, prompted a study by the National Academy of Sciences (NAS). Using the report’s findings, which documented the importance of oceanographic research, NAS lobbied Congress with great success, triggering a flurry of bills dramatized by such titles as “National Oceanographic Program.”

But what was overlooked was the ultimate purpose of such research to serve human needs and wants, to synchronize independent activities of major agencies, to encourage public/private partnerships, and to provide political leadership. During the 1960s, in the role of Congressional advisor, I proposed a broad “strategy and coordination machinery” centered in the Office of the President, the nation’s systems manager. The result was the Marine Resources and Engineering Development Act, passed by Congress and signed into law by President Johnson in 1966.

The shift in bill title reveals the transformation from ocean sciences to socially relevant technology, with engineering playing a key role. The legislation thus embraced the potential of marine resources and the steps for both development and protection. By emphasizing policy, ocean activities were elevated to a higher national priority.

Exxon Valdez

Just after midnight on March 24, 1989, the tanker Exxon Valdez, loaded with 50 million gallons of Alaska crude oil, fetched up on Bligh Reef in Prince William Sound and spilled its guts. For five hours, oil surged from the torn bottom at an incredible rate of 1,000 gallons per second. Attention quickly focused on the enormity of environmental damage and on blunders of the ship operators. The captain had a history of alcohol abuse, but was in his cabin at impact. There was much finger-pointing as people questioned how the accident could happen during a routine run on a clear night. Answers were sought by the National Transportation Safety Board and by a state of Alaska commission to which I was appointed. That blame game still continues in the courts.

The commission was instructed to clarify what happened, why, and how to keep it from happening again. But even the commission was not immune to the political blame game. While I wanted to look beyond the ship’s bridge and search for other, perhaps more systemic problems, the commission chair blocked me from raising those issues. Despite my repeated requests for time at the regularly scheduled sessions, I was not allowed to speak. The chair, a former official having tanker safety responsibilities in Alaska, had a different agenda and would only let the commission focus largely on cleanup rather than prevention. Fortunately, I did get to have my say by signing up as a witness and using that forum to express my views and concerns.

The Exxon Valdez proved to be an archetype of avoidable risk. Whatever the weakness in the engineered hardware, the accident was largely due to internal cultures of large corporations obsessed with the bottom line and determined to get their way, a U.S. Coast Guard vulnerable to political tampering and unable to realize its own ethic, a shipping system infected with a virus of tradition, and a cast of characters lulled into complacency that defeated efforts at prevention.

Lessons

These examples of technological delivery systems have unexpected commonalities. Space telecommunications and sea preservation and exploitation were well beyond the purview of just those engineers and scientists working on the projects; they involved national policy and required interaction between engineers, scientists, users, and policymakers. The Exxon Valdez disaster showed what happens when these groups do not work together. No matter how conscientious a ship designer is about safety, it is necessary to anticipate the weaknesses of fallibility and the darker side of self-centered, short-term ambition.

Recommendations

Many will argue that the engineering curriculum is so overloaded that the only source of socio-technical enrichment is a fifth year. Assuming that step is unrealistic, what can we do?

  • The hodgepodge of nonengineering courses could be structured to provide an integrated foundation in liberal arts.
  • Teaching at the upper division could be problem- rather than discipline-oriented, with examples from practice that integrate nontechnical parameters.
  • Teaching could employ the case method often used in law, architecture, and business.
  • Students could be encouraged to learn about the world around them by reading good newspapers and nonengineering journals.
  • Engineering students could be encouraged to join such extracurricular activities as debating or political clubs that engage students from across the campus.

As we strengthen engineering’s potential to contribute to society, we can market this attribute to women and minority students who often seek socially minded careers and believe that engineering is exclusively a technical pursuit.

For practitioners of the future, something radically new needs to be offered in schools of engineering. Otherwise, engineers will continue to be left out.

2028/1755

via www.flickr.com

june1777 posted a photo:

2028/1755



26

Aug

Announcing Scumblr and Sketchy - Search, Screenshot, and Reclaim the Internet

via techblog.netflix.com

Netflix is pleased to announce the open source release of two security-related web applications: Scumblr and Sketchy!


Scumbling The Web

Many security teams need to stay on the lookout for Internet-based discussions, posts, and other bits that may be of impact to the organizations they are protecting. These teams then take a variety of actions based on the nature of the findings discovered. Netflix’s security team has these same requirements, and today we’re releasing some of the tools that help us in these efforts.
Scumblr is a Ruby on Rails web application that allows searching the Internet for sites and content of interest. Scumblr includes a set of built-in libraries that allow creating searches for common sites like Google, Facebook, and Twitter. For other sites, it is easy to create plugins to perform targeted searches and return results. Once you have Scumblr set up, you can run the searches manually or automatically on a recurring basis.




Scumblr leverages a gem called Workflowable (which we are also open sourcing) that allows setting up flexible workflows that can be associated with search results. These workflows can be customized so that different types of results go through different workflow processes depending on how you want to action them. Workflowable also has a plug-in architecture that allows triggering custom automated actions at each step of the process.
Scumblr also integrates with Sketchy, which allows automatic screenshot generation of identified results to provide a snapshot-in-time of what a given page and result looked like when it was identified.


Architecture

Scumblr makes use of the following components:
  • Ruby on Rails 4.0.9
  • Backend database for storing results
  • Redis + Sidekiq for background tasks
  • Workflowable for workflow creation and management
  • Sketchy for screenshot capture
We’re shipping Scumblr with built-in search libraries for seven common services including Google, Twitter, and Facebook.


Getting Started with Scumblr and Workflowable

Scumblr and Workflowable are available now on the Netflix Open Source site. Detailed instructions on setup and configuration are available in the projects’ wiki pages.

Sketchy

One of the features we wanted to see in Scumblr was the ability to collect screenshots and text content from potentially malicious sites - this allows security analysts to preview Scumblr results without the risk of visiting the site directly. We wanted this collection system to be isolated from Scumblr and also resilient to sites that may perform malicious actions. We also decided it would be nice to build an API that we could use in other applications outside of Scumblr. Although a variety of tools and frameworks exist for taking screenshots, we discovered a number of edge cases that made taking reliable screenshots difficult - capturing screenshots from AJAX-heavy sites, cut-off images with virtual X drivers, and SSL and compression issues in the PhantomJS driver for Selenium, to name a few. In order to solve these challenges, we decided to leverage the best possible tools and create an API framework that would allow for reliable, scalable, and easy-to-use screenshot and text scraping capabilities. Sketchy to the rescue!

Architecture:

At a high level, Sketchy contains the following components:
  • Python + Flask to serve Sketchy
  • PhantomJS to take lazy captures of AJAX heavy sites
  • Celery to manage jobs and Redis to schedule and store job results
  • Backend database to store capture records (by leveraging SQLAlchemy)

Sketchy Overview

Sketchy at its core provides a scalable task-based framework to capture screenshots, scrape page text, and save HTML through a simple-to-use API. These captures can be stored locally or in an AWS S3 bucket. Optionally, token auth can be configured and callbacks can be used if required. Sketchy uses PhantomJS with lazy-rendering to ensure AJAX-heavy sites are captured correctly. Sketchy also uses the Celery task management system, allowing users to scale Sketchy accordingly and manage time-intensive captures for large sites.

Getting Started with Sketchy

Sketchy is available now on the Netflix Open Source site and setup is straightforward.  In addition, we’ve also created a Docker for Sketchy for interested users. Please visit the Sketchy wiki for documentation on how to get started.  

Conclusion

Scumblr and Sketchy are helping the Netflix security team keep an eye on potential threats to our environment every day. We hope that the open source community can find new and interesting uses for the newest additions to the Netflix Open Source Software initiative. Scumblr, Sketchy, and the Workflowable gem are all available on our GitHub site now!


-Andy Hoernecke and Scott Behrens (Netflix Cloud Security Team)

The software development life cycle

via devopsreactions.tumblr.com

by starter-life

23

Aug

CoreOS Just Got Easier to Try With Panamax

via coreos.com

This is a guest post by Lucas Carlson, Head of CenturyLink Labs

Here at CenturyLink Labs, we help people learn how to adopt new technologies like Docker and CoreOS into their daily lives. This has given us a unique perspective on the Docker ecosystem because we are trying to stay on top of one of the fastest growing open-source projects in history.

After talking to tens of thousands of developers and ops people, we kept hearing the same thing over and over:

CoreOS and Docker is the most transformative technology we have seen in years, but it is still really really hard to get started.

TL;DR: Try Deploying a CoreOS App on Panamax in Minutes

What is Panamax?

Instead of just blogging and podcasting tutorials and interviews (we do that too), we decided to create an open-source project that made the setup and app-creation process for Docker and CoreOS way way easier.

We call it Panamax–Docker Management for Humans.

Panamax starts with a CoreOS installation and adds a few Panamax containers into the CoreOS system that provide a great UX: you can search the entire Docker Hub for any container you want, pull it into your CoreOS system, and stitch it together with other containers.

The Panamax containers don’t get in your way; in fact, they just run fleet commands for you, but you can always go back, run fleetctl yourself, and get under the hood. We just wanted to connect the dots and set up best practices so that you wouldn’t need to spend weeks getting up to speed on new technology before getting started.

App Template Repository

The thing we are most excited about with Panamax is the application templates. It is like a Fig specification that can run on top of a clustered CoreOS. Here are just a few we created to seed the community: https://github.com/CenturyLinkLabs/panamax-public-templates and people are building a bunch more right now for a contest we started here: https://github.com/CenturyLinkLabs/panamax-contest-templates.

There are application templates for setting up CoreOS-based apps in one-click for:

  • Heroku Buildpacks
  • Minecraft
  • WordPress
  • Ghost
  • Drupal
  • GitLab
  • Magento
  • Shippable
  • Ngrok

And a bunch more. Most of these templates are built with 12-factor micro-services in mind, so they combine multiple containers automatically and create the connections for you.

Build Your Own Apps on CoreOS

Panamax isn’t just for deploying pre-baked applications; it is also a web GUI to help you create your own apps. In fact, if you create an app and contribute it by making a pull request back to the template repository before Tuesday, August 26th, 2014, you can have a chance to win one of 30 new Mac Pros or 30 iPad Airs that CenturyLink is giving away to kick-start the Panamax community.

What Are You Waiting For?

Panamax runs on any cloud that supports CoreOS: Amazon, Rackspace, Google, CenturyLink Cloud and even your laptop. The install process on your Mac to try it couldn’t be easier if you have Homebrew installed already:

brew install http://download.panamax.io/installer/brew/panamax.rb && panamax init

If you are on Linux, check out the installation instructions on GitHub.

If you haven’t used CoreOS yet because you were intimidated or not sure you were ready to make the commitment, now’s the time to give it a try. Panamax will get you up and running with CoreOS in just minutes. Don’t take my word for it, try it yourself!

21

Aug

The End of Printed Newspaper

via medium.com

Clay Shirky on Medium:

Contrary to the contrived ignorance of media reporters, the future of the daily newspaper is one of the few certainties in the current landscape: Most of them are going away, in this decade. (If you work at a paper and you don’t know what’s happened to your own circulation or revenue in the last few years, now might be a good time to ask.) We’re late enough in the process that we can even predict the likely circumstance of its demise.


18

Aug

Why product designs fail

via devopsreactions.tumblr.com

by uaiHebert

15

Aug

2135/1754

via www.flickr.com

june1777 posted a photo:

2135/1754



VM provisioning

via devopsreactions.tumblr.com

by Ino

14

Aug

CSV Fingerprint: Spot errors in your data at a glance

via flowingdata.com

CSV Fingerprint

You get your CSV file, snuggle under your blanket with a glass of fine wine, all ready for the perfect Saturday night. Then — what the heck — there’s a bunch of missing data and poorly formatted entries. Don’t let this happen to you. CSV Fingerprint by Victor Powell provides a simple, wideout view of your CSV file, color-coded for quick quality control.

To make it easier to spot mistakes, I’ve made a “CSV Fingerprint” viewer (named after the “Fashion Fingerprints” from The New York Times’s “Front Row to Fashion Week” interactive). The idea is to provide a birdseye view of the file without too much distracting detail. The idea is similar to Tufte’s Image Quilts… a qualitative view, as opposed to a rendering of the data in the file themselves. In this sense, the CSV Fingerprint is a sort of meta visualization.

Try it with your own CSV data. Never let a subpar CSV file ruin your Saturday night again.
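
The same glanceable idea is easy to approximate in a few lines: infer a coarse type for each cell and flag columns whose cells disagree. A rough sketch (not Victor Powell’s implementation, which renders the result as colour-coded cells):

    # Rough sketch of the CSV Fingerprint idea: classify each cell into a
    # coarse type and flag columns whose cells disagree. Not Victor
    # Powell's implementation, which renders the result as a colour grid.
    import csv

    def cell_type(value):
        if value.strip() == "":
            return "empty"
        try:
            int(value)
            return "int"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            return "text"

    def fingerprint(path):
        with open(path) as f:
            rows = list(csv.reader(f))
        header, body = rows[0], rows[1:]
        for col, name in enumerate(header):
            types = {cell_type(row[col]) for row in body if col < len(row)}
            marker = "OK   " if len(types) == 1 else "MIXED"
            print("%s %s: %s" % (marker, name, sorted(types)))

    if __name__ == "__main__":
        fingerprint("data.csv")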


13

Aug

Running to office after getting an alert during the lunch break

via devopsreactions.tumblr.com

by Wojtek