Moving from ‘Command and Control’ to ‘Trust but Verify’
Note from the Editor: Occasionally, we offer sound advice on business and technology from outside writers who don’t have a traditional mainframe background but certainly have knowledge worth sharing with the mainframe community. As silos disappear and the mainframe becomes more integrated into a cross-platform enterprise DevOps ecosystem, it’s important to share these different perspectives and apply their truths to the mainframe community.
In this post, we take a look at how you give developers the necessary freedom to experiment without letting them go off the rails by moving from a classic model of “command and control” to one of “trust but verify.” While some of the technical concepts may not seem directly applicable to mainframe IT, the underpinning theme is. I hope you can learn something from it.
– Mike Siemasz
Developers now have the freedom to experiment in a way that was previously impossible under older development models.
Container technology and distributed, ephemeral computing give companies the cost-effective flexibility they need in order to operate competitively in today’s business environment. The segmented, dynamic nature of these technologies allows innovative ideas to get to market at breakneck speeds. Modern continuous integration/continuous deployment frameworks make it so that a developer can have an idea on Monday and by Friday that idea is running as a full-fledged application feature in production.
But, this freedom comes with a price.
While companies enjoy the benefits of developer-driven innovation, they have had to rethink the way they manage their development staff and the way their software is created. Companies have had to move from a managerial style of “command and control” to one that is based on “trust but verify.”
This transformation has not been without its challenges, but as you are about to read, the challenges can be overcome, and the rewards that come from making the change are considerable.
Understanding the Value of Trust But Verify
As strange as it may seem, one of the best examples for understanding the trust but verify paradigm is to look at the way governments collect income tax.
One belief held in taxation theory is that the best taxation system is based on voluntary compliance. The logic is as follows. Having a tax collector go door-to-door to assess and collect taxes is an acceptable taxation method in a small town of a hundred residents. But when you have a city of thousands, maybe hundreds of thousands of people, you’d need a legion of tax collectors to collect revenue. At the national level with millions of citizens, you’d need even more. The cost of all those tax collectors becomes an unsustainable expense.
Thus, it’s cheaper and easier just to have taxpayers determine the taxes due and pay them on their own accord. In such a scenario, the collection method goes from one based on control to one based on trust. However, trusting the taxpayer goes only so far. Some people will cheat. So, just to make sure everything is on the up-and-up, taxpayers are subject to random audits by the government to verify that their tax filings are accurate and honest. The government trusts, but it also verifies.
Trust but Verify in IT
The analogy maps surprisingly well onto IT. In the old days of IT, when system administrators were responsible for a finite number of well-known systems, it was economically and technically feasible for the admin to control every aspect of machine provisioning and application deployment.
But today, with the growth of on-demand machine virtualization and continuous application deployment, a system administrator just can’t keep up with the volume of systems and applications that need to be brought online. Thus, more system and application deployment processes are being pushed back upstream to developers.
Part of this upstream push includes giving developers the freedom to provision the computing environments they need. The result is the growth of self-service provisioning in IT. Developers make the environments they need by using self-service provisioning tools provided by the system administrators and then write and test code in those environments.
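The idea can be sketched in a few lines: administrators publish a catalog of approved environment shapes, and developers provision from it on demand. The catalog format, function names and sizes below are hypothetical, purely for illustration; real self-service tooling exposes the same pattern through richer interfaces.

```python
# Minimal sketch of self-service provisioning with admin guardrails.
# The catalog, sizes and record format are hypothetical.

ADMIN_CATALOG = {
    "small": {"cpus": 2, "memory_gb": 4},
    "medium": {"cpus": 4, "memory_gb": 8},
}

def provision(size: str, owner: str) -> dict:
    """Create an environment record, but only for sizes the admins offer."""
    if size not in ADMIN_CATALOG:
        raise ValueError(f"size {size!r} is not offered by the admin catalog")
    return {"owner": owner, **ADMIN_CATALOG[size], "status": "running"}

env = provision("small", owner="dev-alice")
print(env["cpus"])  # prints 2
```

The guardrail is the point: developers move fast within the catalog, and anything outside it fails loudly instead of silently creating an unmanaged asset.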
Allowing developers to create environments requires that system admins trust that developers will do the right thing. But, as we saw with self-reporting taxation, trust goes only so far. A developer’s work needs to be verified. This is where monitoring comes in.
The Three Pillars of Verification
The well-being of any computing environment is only as good as the monitoring mechanisms that are observing its behavior. This is particularly true when it comes to working in a trust but verify paradigm. Continuous monitoring ensures that developers empowered to create and deploy computing environments are doing the right thing.
Yet, monitoring takes planning and effort. Monitoring well-known, slow-changing environments is a challenging undertaking in itself. Monitoring distributed, ephemeral systems that are created on demand is even more difficult.
So, what’s to be done?
The first step for any IT department is to establish a very clear set of policies that define exactly what adequate monitoring means. Once the policies are established, the next step is to put in place the tools and procedures that support them.
Leveraging industry-standard frameworks such as PCI DSS and GDPR, along with the benchmarks published by the Center for Internet Security, provides a good foundation for policy definitions. Using existing guidelines avoids reinventing the wheel and thus saves time and money.
Once a set of common regulatory standards is accepted as the foundation for general policy guidelines, a company can define additional policy points that are specific to its industry and its technical infrastructure. Defining monitoring policies that are appropriate to the given technical infrastructure is important, particularly when ephemeral computing is involved.
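One way to keep such policies verifiable is to express them as data and check environments against them mechanically. The policy fields and agent names below are hypothetical, a minimal sketch of the idea rather than any particular compliance tool.

```python
# Hypothetical monitoring policy expressed as data, plus a compliance check.
POLICY = {
    "required_agents": {"metrics", "logs"},   # every environment must run these
    "max_log_retention_days": 90,             # e.g. a data-protection constraint
}

def is_compliant(env: dict) -> bool:
    """An environment complies if it runs every required agent and
    keeps logs no longer than the policy allows."""
    has_agents = POLICY["required_agents"] <= set(env.get("agents", []))
    retention_ok = env.get("log_retention_days", 0) <= POLICY["max_log_retention_days"]
    return has_agents and retention_ok

good = {"agents": ["metrics", "logs"], "log_retention_days": 30}
bad = {"agents": ["metrics"], "log_retention_days": 30}
print(is_compliant(good), is_compliant(bad))  # prints True False
```

Policy-as-data is what makes the “verify” half cheap: the same check can run against every environment, however many there are.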
Taking a Holistic Approach to Monitoring Ephemeral Environments
Ephemeral computing is a relatively recent addition to the IT landscape. Before virtualization technology came along, creating new computing resources involved installing a physical computer on a rack in the data center and wiring it up to a network. This took time and wasted resources, and many companies never fully utilized the hardware they owned.
The notion of infrastructure as code—which is the basis of ephemeral computing—allows computing resources to be created and destroyed on demand using either script or human interaction. While it’s true that physical hardware still needs to be installed in the data center, virtualization allows companies to load a lot more computing onto that hardware. The cost savings are considerable.
However, the dynamics of creating virtual resources create a special set of monitoring problems. Not only do IT departments need to monitor the physical host machines, they now need to monitor their virtual assets as well. A holistic approach is required.
Companies need to know what’s going on inside, outside and among their virtual assets. And when assets have a lifetime of only seconds, effective monitoring must be wired in at the moment of creation. The on-demand scripts that create a virtual resource not only need to create and configure the application hosted in it, they also need to install and configure monitoring agents and tools.
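In other words, provisioning and monitoring belong in the same script. Here is a minimal sketch of that pattern; the function names and agent names are hypothetical stand-ins for whatever deployment and agent-install steps a real pipeline performs.

```python
# Sketch: monitoring is baked into environment creation, not bolted on later.
# All function, app and agent names here are hypothetical.

def deploy_application(env: dict) -> None:
    """Stand-in for deploying the application into the new environment."""
    env["app"] = "service-v1"

def install_monitoring(env: dict) -> None:
    """Stand-in for installing monitoring and tracing agents."""
    env["agents"] += ["metrics-agent", "trace-agent"]

def create_environment(name: str) -> dict:
    env = {"name": name, "agents": []}
    deploy_application(env)
    install_monitoring(env)  # creation isn't finished until the asset is observable
    return env

env = create_environment("checkout-test")
```

Because the agents are installed by the same script that creates the asset, even an environment that lives for seconds reports telemetry for its whole lifetime.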
But that’s not all. Once an ephemeral environment is brought online and monitoring tools and agents are properly installed, the environment also needs to support distributed tracing.
Distributed Tracing Promotes Comprehensive Verification
Distributed tracing reports the system behavior associated with a request as it moves among servers and endpoints toward an eventual response. When a request arrives at an endpoint, it leaves a “thumbprint,” usually by making an entry in the system log or logging service.
Monitoring technology aggregates these thumbprints into a report that describes the route that the request took as it hopped among a variety of endpoints, servers, and physical hosts on the internet. System admins use this route information to correlate the request with the system’s behavior overall.
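The aggregation step can be illustrated in a few lines: each log entry carries the request’s trace ID, and reconstructing the route is a matter of filtering on that ID and ordering by time. The trace IDs, host names and log format below are invented for the sketch.

```python
# Sketch of route reconstruction from trace-tagged log entries.
# Trace IDs, hosts and timestamps are hypothetical sample data.

log_entries = [
    {"trace_id": "req-42", "host": "gateway", "ts": 1},
    {"trace_id": "req-42", "host": "auth",    "ts": 2},
    {"trace_id": "req-99", "host": "gateway", "ts": 1},
    {"trace_id": "req-42", "host": "orders",  "ts": 3},
]

def route_of(trace_id: str, entries: list) -> list:
    """Collect one request's 'thumbprints' and order them into a route."""
    hops = sorted((e for e in entries if e["trace_id"] == trace_id),
                  key=lambda e: e["ts"])
    return [e["host"] for e in hops]

print(route_of("req-42", log_entries))  # prints ['gateway', 'auth', 'orders']
```

Real tracing systems add span IDs, parent/child relationships and clock handling, but the core idea is the same: the route survives in the logs even after the hosts that produced it are gone.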
In an ephemeral system, assets are being created and destroyed continuously. Thus, if something goes wrong, there might be no way to go back and check on the state of a particular asset. It very well might not exist.
Thus, distributed tracing is essential for understanding the behavior of ephemeral systems. The information that it provides is the source of truth required to get an accurate picture of the system at any point in time.
Comprehensive monitoring and distributed tracing are important parts of the trust but verify equation. But, one more piece is needed: automated testing.
Automated Testing Verifies Software Quality
When it comes to verifying that code destined for production works as expected, automated testing is not a “nice to have”; it’s essential. As the volume of code released into production increases and more provisioning and deployment activity moves upstream into the hands of developers, no person or team can manually perform the comprehensive testing required to make sure the code going to end users is safe and satisfactory. Except in rare cases of very complex, one-off testing scenarios, automated testing is the only viable way to get quality code out the door.
Without comprehensive automated testing in place, it’s quite possible for bad things to happen in good environments. Automated testing in all its forms—unit testing, functional testing, integration testing and performance testing—is an essential mechanism for verifying that developers are doing the right thing in terms of both code and environments.
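At the unit-testing level, “verifying that developers are doing the right thing” looks like this: small, automated checks that assert both the expected result and the expected failure. The function under test is hypothetical, invented just to show the shape of such tests.

```python
# A hypothetical function under test, with pytest-style unit tests.

def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount, rejecting bad input."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_basic_discount():
    assert apply_discount(100.0, 25) == 75.0

def test_rejects_bad_percent():
    try:
        apply_discount(100.0, 150)
        assert False, "expected ValueError"
    except ValueError:
        pass

test_basic_discount()
test_rejects_bad_percent()
```

Once tests like these run automatically in the deployment pipeline, verification happens on every change without anyone having to remember to do it.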
Remember, taking a holistic approach to system monitoring requires having clear access to what’s going on outside, inside and among the various software components and virtual assets that are part of the distributed computing environment. Thus, it’s important to make automated testing integral to all parts of the software development process.
Continuous, automated testing in conjunction with comprehensive monitoring and distributed tracing are the three pillars upon which to build an approach to verification that works.
Putting It All Together: Building a Trusting Environment
Using the right monitoring, tracing and testing technologies in the right way is important for creating a trusting environment that gives developers the freedom they need to make new, innovative products. However, trust is about more than technology. It’s about a set of beliefs woven into a company’s culture that assumes employees want to do good work and cooperate with others.
Trust is a feeling that transcends written policies and procedures. It’s about corporate culture. Trust cannot be faked, at least not for a long period of time.
When trust becomes nothing more than a meaningless mantra and the true reality of a corporate belief system is that employees are implicitly untrustworthy and that the purpose of verification is to catch fundamentally irresponsible employees in the act of reckless behavior, then the notion of trust but verify will never be anything more than the butt of flavor-of-the-month jokes passed between disenchanted employees. However, when trust but verify is embraced and practiced as a set of benevolent corporate values, the results can be transformational.
Moving from command and control to trust but verify requires that companies not only put the right policies, procedures and tools in place to give IT personnel the freedom they need to do the best work possible, but also that the company truly embraces the cultural values that foster a trusting workplace. People work best when they are trusted to do the best work possible.
By Bob Reselman – February 21, 2019