Chapter 11: Modern Testing II – Test Data Management

Matt: Hello there, welcome to the Modern Mainframe podcast and our series, Building a Better Software Delivery Platform. I’m Matt DeLaere. Thanks for joining us for the second part of our discussion of modern testing and how it can help you build a world-class software delivery system. To get the ball rolling, let’s say hi to the guy who makes it all possible, Compuware DevOps Solution Architect Rick Slade. Hey Rick, how are you?

Rick: I’m great, Matt, I hope you are. Hope everyone is.

Matt: I am, thanks. So, today, we’ll be joined by a couple of special guests who will share some of their expertise on test data management. But first, Rick, can you tell us a little more about modern testing techniques?

Rick: Well, it’s a topic that’s close to my heart. For years and years and years, testing has been a challenge, especially for large, very monolithic, very tightly coupled applications that typically are running on the mainframe. And so, testing continues to be a challenge. As organizations try to adopt Agile techniques with regard to delivering software, it’s going to require some rethinking of how we test and how we provision our test environments. And so, last week we talked with Steen, and I was excited about that, and we talked about some of the testing capabilities that we’re making available to the marketplace. But another very important part of testing is the setup and the provisioning of our test environments at a unit level, at an integration level, at a functional level, at a regression level. Managing that test data, and the right test data, is incredibly important for quality testing, but also for maximizing the throughput or the velocity at which we test.

And so, I’m real excited to have Kevin Corbett and Irene Ford with us today. Kevin and Irene are the product managers for our… I call them data solutions. I’ll let them talk about the products in a second. But it’s so important in the software delivery process in building these test environments. Again, typically, when I talk to clients, people who have heard me speak, will know that I refer to this as the 800-pound gorilla in the room. When it comes to building a more automated modern software development environment and the incorporation of a test data management discipline, a framework within your organization, is so critical to automation success. So, I’m excited about having them. I’ve got tons of questions I want to run by them, so at this point in time, I want to introduce again Irene Ford, Kevin Corbett.

Irene, you first. Tell us a little bit about yourself and your role, and then Kevin, and then we will dive into it.

Irene: Okay. Thanks, Rick. So, I’m Irene Ford. I’m the Product Manager for our Topaz for Enterprise Data solution, and I’ve been with Compuware for 24 years now, it’s hard to believe, but 24 years. And the scary part of this is, I’ve been in the field for a few decades beyond that, and we just will leave it at that.

Rick: Not a problem, I can relate.

Irene: So, our vision for Topaz for Enterprise Data is really to converge the capabilities for acting on mainframe and non-mainframe data into a simple-to-use modern interface for managing data across the enterprise. And Topaz for Enterprise Data is really the modernized interface for our File-AID back-end products. And most people in this field are familiar with the File-AID product. Kevin and I work closely on this integration as Kevin manages our mainframe side of the File-AID family, and I manage the Topaz for Enterprise Data side. So, with that I’ll turn it over to Kevin.

Kevin: Well, thank you, Irene and Rick. It’s great to be here. My name is Kevin Corbett, as you’ve already pointed out. I’ve been with Compuware for about 25 years, just a little bit longer than Irene, but we kind of grew up at Compuware together. I too, unlike Irene, I should say, I’ve been in the business for 41 years, which is amazing because I’m only 25. And as Irene said, I’ve been with the File-AID family my entire career here at Compuware. I started off as a developer for File-AID for IMS and held several roles in the development ranks, and then about five years ago as we began our transition, I was presented with an opportunity to move into the product management side, which I jumped at, thought that it was a very intriguing thing, so here I am today, enjoying it very much and just happy to be part of the podcast today.

I’m going to, kind of like Irene, I’m going to give a little bit of overview or introduction of what we offer here from Compuware. First, people think that test data management is important, but you know what, it’s not easy. It’s very, very difficult to do it right. So, with our solution, what we’ve attempted to do, and I think we’ve done it, is we’ve designed this to reduce the time spent on data management, which is absolutely critical in controlling IT costs. And by controlling those IT costs, we can also then enable our DevOps teams to deliver business value at a much faster rate, so that’s really critical today, right?

Our solution is extremely powerful, it covers cross-platform, cross-environment, does just about everything you need it to do from an end-to-end perspective, which is important in our current environments today. We are able to subset data, we’re able to edit data of all different data sources, so IAM, VSAM, DB2, IMS, Oracle, DB2 LUW, so we span mainframe and distributed platforms. It allows developers to quickly and easily find and create and extract and transfer, converge, load, and you can see from my words here that there’s a lot of functionality that’s included. But maybe one of the most important is the ability to protect data, and that’s through our data privacy feature, which enables the creation of very meaningful, manageable, disguised test data with minimal expertise. “Meaningful” meaning that a name becomes a name, but it’s not a real person’s name, you can’t associate that with anybody and identify them, so we protect the test data as we move forward into our test environments. So, that’s kind of an overview of what our solution provides, a lot of stuff. I’m sure you’re going to touch on a little bit of it.

Irene: Hey Kevin…

Kevin: Yes, Irene?

Irene: Hey, Kevin, it sounds like it’s the Swiss Army Knife of data, doesn’t it?

Kevin: It does kind of sound that way, doesn’t it?

Rick: Well, and it is, and it’s all critically important guys, and again, thank you for joining us. I’ve got tons of questions, so I’d like to just jump right into this thing, if we could do, and I’ll just let you guys decide who will take the question. But the first question is, what do you see as the primary challenges in test data management today?

Irene: Wow, Rick, there is no shortage of questions and challenges here. When customers are looking at their data management requirements and really looking to select tools to help them with their data management, there’s a lot of things they need to consider. First of all, you have to think about how will you get the right data to test your applications, and that’s not an easy question because most applications are going to require a related set of data, hardly ever do you need just a single file.

So, you need to be able to extract data that is corresponding from many different data sources, and we are finding that more and more applications need data that is mainframe and also distributed data to be able to process that application end-to-end. So, it’s important to have tools that can operate against all of those data sources. And something Kevin mentioned there is data breaches, and in this day and age of data breaches, it is so absolutely critical that data is disguised whenever it’s used in a non-production environment. So, the requirements for privatizing data may vary a little bit from environment to environment, and the requirements for data, what conditions are needed and what volumes of data are needed, will vary a little from environment to environment. But overall, there are a lot of challenges that our customers face as they step down the path to address data management and data management across their enterprise.

Rick: Yeah, I think that one of the things that I have seen, Irene and Kevin, in helping clients transform how they deliver software is changing the mindset around testing. Moving to an agile environment, working on tasks and stories that can be completed in a two-week sprint is going to force us to rethink how we test. And it’s not just about, as you just said, it’s not just about getting the right amount of data, but getting the right data in order to properly test the modifications that you’ve made or the additions to the code that you’ve made. So, being able to easily be able to extract that data quickly is so important in a fast-moving Agile software delivery environment. I couldn’t agree more.

Irene: Absolutely, and being able to automate the loading of that data so that it is in sync with the testing that you’re intending to do is survival in this world.

Rick: Yeah, from a DevOps standpoint, what is… Let’s talk about the role of data in a DevOps-centric software delivery ecosystem.

Kevin: That’s really kind of… Irene and I, some time ago now, kind of coined the phrase that data is really like the life blood of the entire DevOps infinity loop. It flows throughout the process, right? At all aspects of that infinity loop, you need to have data. Even during the analysis phase, it’s important to have data so that you understand what it is this application needs to do and what it needs to act on. So, data is really the life blood, and I thought that was a great term, Irene actually first used it in another webinar we were doing some time ago, but it’s really appropriate.

Rick: I’m going to leverage that now.

Irene: You have to pay us royalties.

Kevin: Yes, royalties, please. But, as Irene alluded to, and I think I even said it in my opening there, is that applications today are extremely complex, we go end to end, so I start… I look at my bank account all the time, I’m sure everybody does. So, I start that off on my mobile device, but then it travels all through the system, but ultimately it winds up on a mainframe someplace and does a transaction to say, “Hey, Kevin wants to see his balance.” I’m not going to tell you what it is, but it does send it back to my phone, so that’s really nice. But it’s really important, then, that all aspects of that are… we have the ability to test along the way, so at every different facet, it’s really important.

And Irene talked about automated testing—it’s absolutely critical. An Agile team today can’t keep up if we don’t do automated testing. And even here at Compuware, our own development teams, when they’re developing through their sprint and at the end of their sprints and so on, as they’re checking code in, it’s being tested. There’s different unit tests going on, there’s regression vehicles that kick off and all of these kinds of things that are really important. But in order for those things to work, you need to have the right data there. So, as Irene was talking about, you’ve got to get the right data, and again, I mentioned it, it’s very difficult to do that. I think our tools can really help make that a little bit easier for customers as they’re working through their flows.

Rick: So, then how does good data management actually support those DevOps objectives? Because I agree with you totally.

Irene: Well, quality is absolutely critical, right? We depend on these applications, our customers depend on the applications to be fully tested and the quality has to improve. So, it may seem obvious, but to improve quality, you’ve got to build applications that have fewer defects. Wow, insight, huh? But to do that, developers have to be able to quickly and easily get the data they need for testing, and that data has to be realistic, it has to really mirror the conditions that exist in the real production data. So, the data you’re using has to be relationally intact, and it should just represent a subset of your real data. It is almost impossible to test with full volumes of data, so to improve your quality and move forward in a rapid pace, you really have to have a subset of that data.

So, as we were saying earlier, our applications today, really do operate across platforms, so that data needs to be able to be consistent between data sources, even though the data is stored on different platforms. And you have to be able to continuously enhance that test data to build in conditions that may have been omitted when you first took the slice of data. You have to be able to build out that test data.

And velocity is another key focus these days, and quality and velocity are really closely related. Customers are continuously striving to find that balance between velocity and quality. And unfortunately, increasing velocity can easily cause your quality to suffer. So, customers need to be able to achieve greater throughput with higher quality just to meet the demands of business today. There are more demands being put on our IT shops than ever before, and all of this increased quality, increased velocity has to be accomplished even though our customers are often losing staff on a regular basis. And as we said earlier, automated testing is really critical to that, improving quality and increasing velocity. But your tests are only as good as the data upon which the tests are based. I think we’ve all seen application failures in production that are caused by anomalies in the data, so you have to make sure that you have good solid data for testing.

Efficiency is also one of our objectives with DevOps, and we think that providing a consistent approach for working with data really helps to reduce the cost of data management and increases the efficiency of the development team. Having a simple-to-use modern interface that allows you to manage data across the enterprise really makes it easier for developers to shift that testing left and have it take place earlier in the whole development cycle.

And then privacy, we can’t forget. Privacy is always a requirement. It’s not an option to mask your data, it is the law. The importance of protecting personal identifying information, as well as the need to abide by mandates like GDPR, means that most customers will make data masking a top concern. And data masking has to apply to data that is used in development and in testing, and this is especially important if these functions are outsourced.

So, regulations vary from country to country, but all countries require that the data which could identify an individual must be masked. And then in addition to that, any data that might be damaging to the business if it were exposed has to be masked. So, data privacy is really a very key part of a broader data management strategy. That’s all I have.

Rick: No problem. That’s awesome. The work that you guys are doing from a management standpoint with regard to data is just amazing. Talk to me a little bit about how we get good data for testing.

Kevin: I’ll take this one. We’ve worked with a lot of customers in the couple of years that Irene and I have been here at Compuware, and really what they’ve told us and what they have confirmed is that obtaining test data from their production environments is the best way of getting that realistic set of data. And, as Irene pointed out, the related set of data. So, it’s really important then to have tools that can work both in a production environment and a test environment or development environment. So, that’s what we really are pushing: use your production data, because it has the right test conditions, it has the relationships, and unless you’re adding some new line of business or something, your production environment is going to have all of the scenarios you need. You simply need to subset it the right way, disguise it using data privacy as Irene just talked about, and then load that into your gold copy of your test environment.

There’s this notion going around today that synthetic data is the way to go. I’m going to build out a model and it’s just going to generate all the test data. And I think that works fine, if you’re talking about maybe a single file or maybe a couple of files or databases that are going to be used, but when you’re talking about a complex business application with perhaps hundreds or thousands of relationships, it just doesn’t work. It just doesn’t work. So, getting that data from production is what we recommend, and that’s what most of our customers are doing, so it’s really a good thing. And then if you need to, you can augment that with our toolset, as well to add that new business line if you need to, or whatever the case is.

So, you establish that data management group. Generally, it’s not the developer who’s doing this, it’s a data management group that takes the business requirements from whatever the application team is, and then they build out that gold copy test environment that I was talking about, and then developers can pull from that to get just the test cases they need to run through their particular program or series of programs.
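
Editor's sketch: the "subset it the right way" step Kevin describes can be pictured as picking a driving set of parent records and then pulling only the child records that belong to them, so the slice stays relationally intact. The Python below is a minimal illustration under stated assumptions. The `Customer` and `Account` record types, the `region` selection rule, and the sample rows are all hypothetical, and this is not how File-AID or Topaz for Enterprise Data implements extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    cust_id: int
    name: str
    region: str


@dataclass(frozen=True)
class Account:
    acct_id: int
    cust_id: int  # logical pointer back to Customer.cust_id
    balance: float


def subset_related(customers, accounts, wanted_region):
    """Pick customers by a driving condition, then pull only their accounts."""
    picked = [c for c in customers if c.region == wanted_region]
    picked_ids = {c.cust_id for c in picked}
    related = [a for a in accounts if a.cust_id in picked_ids]
    return picked, related


# Hypothetical stand-ins for production data (already disguised here).
production_customers = [
    Customer(1, "Customer A", "EAST"),
    Customer(2, "Customer B", "WEST"),
    Customer(3, "Customer C", "EAST"),
]
production_accounts = [
    Account(10, 1, 120.0),
    Account(11, 2, 75.5),
    Account(12, 3, 0.0),
    Account(13, 3, 980.25),
]

gold_customers, gold_accounts = subset_related(
    production_customers, production_accounts, "EAST"
)
```

The point of the design is that the child rows are chosen by following the relationship rather than by an independent filter, which is what keeps every account in the subset attached to a customer that is also in the subset.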

Rick: Are there any… This has just been wonderful, guys, these are all things that are so important with regard to modernizing our testing to coincide with a modern software delivery ecosystem. Are there any other additional considerations when managing these test beds?

Kevin: Yes, there are. First of all, they have to be very effective, right? You can’t just take data and throw it there and say, “Okay, here’s the data you need.” They need to really go beyond that and understand what the application flow is, what the business logic is. This is where it gets hard, because if you’ve got a new developer coming in, they don’t know all of this. This is where that specialized group comes in to build out your test bed and manage it. Because it’s created, but then it’s like any kind of a living thing. You need to make sure it gets refreshed on some kind of a basis. You need to make sure that new cases are added as business requirements change and those kinds of things.

We’ve also got other tools that aren’t part of our test data management. The program analysis functionality that you can use in Topaz is great for identifying the types of data that you’re going to need to bring into your test environment, and that helps set up the requirements of how do I build out that test bed in the first place.

So, understanding all the data conditions needed by the application to facilitate the selection and creation of the data is very important. Automating that creation is also important, which again, with our tools, you can do. Because that’s really going to make… help and assist in you going faster. If I can get that test bed refreshed in a much more timely manner because it’s automated, that allows us to continually move faster.

Again, here at Compuware for the mainframe perspective, we use Hiperstation to do a lot of that automation and run our tests, our regression vehicles, as we call them. And the Agile teams do something very similar, they use a different tool, but we’re constantly refreshing our data, our test bed, we’re running our tests, refreshing them again, running our tests, and again, that goes a long way.

And then of course, we’ve kind of touched on it, you mentioned it, Topaz for Total Test and our recent integration with Topaz for Enterprise Data is really, really… It’s coming to a point now where we can actually talk about data and testing and have people understand what we’re talking about. In the past, it seems like whenever we would talk about data, you’d kind of get a blank look like, “Wow, that’s an awful lot of stuff.” But now, when we actually bring it into a real life scenario of how we add it to a test scenario and then automate that whole process, people start to say, “Oh, this is starting to make sense.”

Rick: Yeah. Yeah.

Irene: So, I wanted to comment a little bit on our testing. I’m on the distributed side of the fence, on the Eclipse side. We really view testing as a team sport, and the quality of our product is really owned by the entire team. So, every single morning our team gets an email telling us the status of our nightly regression run, and then you can click on that email and get all the detail of the Jenkins executions from that previous night. And then we also stream those test results into Kibana so that we can have some really nice dashboards on viewing the status of our testing. And that team focus, that focus that it is not the responsibility of QA people, it is the responsibility of the entire Agile team to make sure that our tests run and run dependably because we depend on those tests. And it’s those tests running dependably that allow our team to move more rapidly.

Kevin: And to kind of piggyback on what Irene said, so that everyone that’s listening knows. As she said, she’s on the distributed side, I’m the mainframe side—you know, those dinosaurs, right? We have exactly the same reporting from Jenkins and into Kibana, and the team does the same thing. Every morning, they get an email if there’s failures and they get looking at them, we get them corrected very quickly, and things move very smoothly. So, the two teams, even though the technologies are totally different from a programming perspective and a platform perspective, use basically the same type of tools to orchestrate that testing and communication of the results. Very, very, very important in a customer’s application development environments.

Rick: A lot of capabilities that you guys are making part of the product are actually coming from our own experiences.

Irene: Exactly, absolutely. In fact, I’d like to highlight the integration that we delivered in July of this year. It’s really an integration between our Topaz for Total Test product and Topaz for Enterprise Data, and this is so important to the automated test message because this allows the loading of test data to be integrated into the test scenario, so that the data needed for that specific test will be loaded at the time when that test scenario is run. And this is just a huge step forward in tightly tying together the test and the data.
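
Editor's sketch: the idea of loading the data a test needs at the moment the scenario runs can be illustrated generically. The Python below is only an analogy using the standard-library `sqlite3` module and an in-memory database. The `accounts` table, the `run_scenario_with_data` helper, and the `total_balance` scenario are hypothetical names, and nothing here represents the actual Topaz for Total Test and Topaz for Enterprise Data integration.

```python
import sqlite3


def run_scenario_with_data(rows, scenario):
    """Load `rows` into a fresh in-memory database, then run `scenario` on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance REAL)"
        )
        conn.executemany(
            "INSERT INTO accounts (acct_id, balance) VALUES (?, ?)", rows
        )
        conn.commit()
        # The scenario only ever sees the data loaded for this run.
        return scenario(conn)
    finally:
        conn.close()


def total_balance(conn):
    """Hypothetical scenario: sum the balances of the loaded accounts."""
    (total,) = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()
    return total


result = run_scenario_with_data([(1, 100.0), (2, 50.5)], total_balance)
```

Because the load and the scenario travel together, every run starts from a known state, which is one way to read the "right data for that test" point made in this exchange.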

Rick: I love that, and we mentioned it briefly last week in our conversation with Steen, but what that will do is just, it will shorten that provisioning and setup time in an Agile ecosystem. And not only shorten it, but based on the work that you’re doing, ensure that you’re getting the right amount of data and more importantly, the right data.

Irene: Exactly. For that test.

Rick: For that test. That’s awesome. Let’s see, we’re running a little short on time, guys, but I do have two questions I’d like to get your thoughts on, very quickly. Protecting data is such a huge issue now with all the privacy requirements that are in place across the planet, what are the challenges in protecting data and how are we helping customers with that?

Irene: Well, this is in my ballpark, since data privacy is my baby. So, we did a survey some time ago and found that there were really a couple of interesting results. We found that production data is being used in test, and more often than not, it is not being privatized.

Rick: Oh, say it’s not true, Irene!

Irene: Oh yeah, it was scaringly true.

Rick: I’m kidding. I’ve seen it, yeah.

Irene: Yeah, we found that over 80% of businesses are using production data in test, and slightly over 40% were not disguising that data. Now, as a consumer, that is a very scary thought. So, that’s the number one risk we have here. But really the second challenge with privacy is that disguising test data is not easy because data is complex. You’ve got data that’s stored in a variety of platforms and lots of different formats and applications need data from multiple platforms, so you’ve got data in DB2 tables, and some in Oracle databases, some in IMS segments, and all that data has to be disguised consistently or the application will fail. So, disguising data has to produce data that abides by the rules of the application, it has to be consistent across platforms and across data types, and it has to be able to disguise data that’s in very complex data structures. We have customers who have files that have a whole variety of different types of records all stored within the same file. You have to be able to pull out the identifying information, any sensitive data, and disguise it consistently across all of those platforms. And our data privacy solution runs across all of the File-AID family and provides that consistent disguise across data regardless of the platform where that data resides.
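
Editor's sketch: one common way to get a disguise that is consistent across platforms is to make it deterministic, so the same real value always produces the same replacement wherever it is stored. The Python below is a minimal illustration using a keyed hash (HMAC) to pick a stand-in name. The secret key, the replacement list, the normalization rule, and the sample rows are all assumptions made for the example, and this swapped-in technique is not a description of how Compuware's data privacy feature works.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would be kept secret
FAKE_NAMES = ["Alex Reed", "Jordan Blake", "Casey Morgan", "Taylor Quinn", "Riley Hayes"]


def disguise_name(real_name: str) -> str:
    """Map a real name to a stand-in name, the same way every time."""
    # Normalize first so padding and case differences between sources
    # do not produce different disguises for the same person.
    normalized = real_name.strip().upper().encode("utf-8")
    digest = hmac.new(SECRET_KEY, normalized, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


# Hypothetical rows for the same person held in two different data sources.
db2_rows = [{"cust_id": 101, "name": "Maria Lopez"}]
oracle_rows = [{"customer": "MARIA LOPEZ ", "balance": 250.0}]

masked_db2 = [{**row, "name": disguise_name(row["name"])} for row in db2_rows]
masked_oracle = [
    {**row, "customer": disguise_name(row["customer"])} for row in oracle_rows
]
```

With a short replacement list like this one, different real names can land on the same stand-in. That is acceptable for a descriptive field such as a name, but it would not be acceptable for a key field that has to stay unique.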

Rick: It is a difficult thing to do because, if you think about it, I call it logical referential integrity (I’m not even sure that’s an accurate term), but it’s where you’ve got keys in database records that logically point to some other record in another dataset. Being able to maintain that referential integrity between the two sets is so critical to managing, building, and supporting your test activities.

Irene: Absolutely, yeah. The relationship, the related set of data has to remain intact after the disguise has taken place.
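
Editor's sketch: the requirement that the related set stay intact after the disguise can be checked mechanically. The Python below builds one translation table for a key field, applies it to both the parent and child records, and then looks for orphans. The `cust_key` field, the `T000001` style surrogate format, and the sample records are hypothetical, and this illustrates the general idea rather than any product behavior.

```python
from itertools import count


def build_key_map(real_keys):
    """Assign each distinct real key exactly one surrogate key."""
    counter = count(1)
    return {key: f"T{next(counter):06d}" for key in sorted(set(real_keys))}


def apply_key_map(records, key_map, field="cust_key"):
    """Return copies of the records with the key field replaced."""
    return [{**rec, field: key_map[rec[field]]} for rec in records]


def orphaned_children(parents, children, field="cust_key"):
    """List child records whose key no longer matches any parent."""
    parent_keys = {p[field] for p in parents}
    return [c for c in children if c[field] not in parent_keys]


# Hypothetical parent and child records held in two separate datasets.
parents = [{"cust_key": "C-1001"}, {"cust_key": "C-1002"}]
children = [
    {"order": 1, "cust_key": "C-1001"},
    {"order": 2, "cust_key": "C-1002"},
    {"order": 3, "cust_key": "C-1001"},
]

key_map = build_key_map(p["cust_key"] for p in parents)
masked_parents = apply_key_map(parents, key_map)
masked_children = apply_key_map(children, key_map)
orphans = orphaned_children(masked_parents, masked_children)
```

Using a single shared table for both datasets is what preserves the logical pointers. Disguising each dataset with its own independent mapping would leave the child keys pointing at nothing.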

Rick: Guys, I’ve got just tons of more questions. I wish I had a ton more time. I want to wrap up with one final question. What should clients look for in selecting data management tools? What are the things, what are the key capabilities, features that are a must in today’s modern software delivery ecosystem?

Irene: Well, I kind of look at three key things. First, you need to simplify data management; second, we need to make sure that our data is protected; and third, we need to have tools that will operate across multiple environments. And I could elaborate more on those, but in the interest of time, we should probably let it go with that.

Kevin: I think, to steal a phrase from Irene earlier, I would say a Swiss Army Knife, and Compuware has it.

Rick: And it needs to be all integrated and easily accessible. In a modern IDE that provides these capabilities, having to go in and out of different tools, different IDEs to accomplish that work, because it’s got to be done, it just slows the process down, and so being able to have those capabilities, features available to the developer at the time they need them, in a consistent user interface is so important to maximizing velocity.

Irene and Kevin: Absolutely.

Irene: We have to make our developers self-sufficient so that they can move as fast as possible.

Rick: Well, guys, I can’t… Time is up, and I can’t thank you enough for all of this insight. This is incredibly important and critical to those organizations thinking about modernizing how they deliver software, testing, test data provisioning, standing up test environments. A big, big part of the software delivery life cycle, and we have to look at the work involved in each of those to find opportunities to maximize throughput. And I think the things that you guys are doing within Compuware data management technologies are phenomenal.

So, I thank you for coming on. Thank you. We appreciate this time, we appreciate the information. Matt, I’ll turn it back over to you to kind of wrap us up.

Matt: Thanks, Rick. And thanks to Irene and Kevin for joining us. It was a great conversation, great job. And we’ll keep the thank-yous going and extend one to everyone out there listening. We appreciate it.

So, our next Office Hour with Rick Slade Q&A session will take place next Friday, July 24th at 10 AM Eastern Daylight Time. You can visit to set a calendar reminder for that session. And be sure to bring any questions you have on modern testing or any other aspect of the modern SDS. You can also submit questions on Twitter using the hashtag #goodcoding.

And while you’re on setting your calendar reminder, be sure to check out recordings of past Q&A sessions, and you can listen to any of the episodes of this series that you may have missed.

So, Rick, that wraps it up for me, I’ll throw it back to you to close us out.

Rick: Alright, well, thank you very much, Matt. And again, thanks to Irene and Kevin, your insights into this are amazing. We appreciate this. I appreciate it. I’m sure the audience appreciates it. This is such an important topic. If you’re looking to really improve the level or the quality and the time required to deliver software, having a solid test data management framework, discipline, I call it a system. To me, we’ve talked about the software delivery system, a test data management system is one of those tools that I consider a critical business application and should be managed as such. And so, this is just so important in an effective, efficient software delivery management system.

So, thanks again, everybody, we appreciate your time here today. Again, I hope that you’ll join us in the live Q and A on July 24th. We’ll answer questions about testing. We’re going to refer back to the podcast with Steen about our testing tools and capabilities, along with all of our data management, test data management capabilities that have been so clearly articulated by Irene and Kevin today. So, I hope you’ll join us. Again, if you’ve got questions, please, All the information about the podcast and the live Q&As should be there. We hope that you’ll join us on Friday, July 24.

For now, take care, be safe, everybody. Be productive, have a good week. Good coding, y’all.