Masking of Sensitive Data: A Case Study of Medicare Numbers
Overview: Learn how Topaz for Enterprise Data can resolve real-world issues organizations face when masking sensitive data.
One of the most underappreciated aspects of data masking is having a solution generic enough to handle records containing sensitive identification numbers that are prone to format changes. These cover a wide spectrum, ranging from account IDs and national IDs to credit card numbers and Medicare numbers.
When one of these numbers undergoes a format change, it more often than not triggers major modifications to the existing masking definitions. Over time, the solutions become complicated and a nightmare to maintain.
Topaz for Enterprise Data, a feature install of Topaz (Compuware’s Eclipse-based development, testing and data management platform that integrates into any DevOps toolchain), addresses this problem with an easy-to-use interface for creating privatization rules and their individual actions and grouping them into meaningful Test Data Privacy projects. These projects are easy to debug, manage and port across environments. Once a project is defined, products across the Compuware File-AID portfolio can invoke it to privatize data across a wide range of storage, such as:
- RDBMS tables
- z/OS files
- IMS databases
In this example, I’ll show an approach to masking Medicare numbers, whose format change some time ago was a concern for one of our customers. A related article on the Compuware Support Center, listing step-by-step instructions for defining the Test Data Privacy project and addressing each restriction and complexity highlighted below as the solution is developed, will be available soon.
The original Medicare numbers, known as HICNs (Health Insurance Claim Numbers), were based on the Social Security numbers (SSNs) of people with Medicare.
The new format is the MBI (Medicare Beneficiary Identifier), an 11-character mix of digits (0-9) and upper-case letters. The letters S, L, O, I, B and Z are excluded to make the identifier easier to read.
An example of an MBI is 1EG4-TE5-MK73. The restrictions on the MBI format are highlighted below:
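For reference, the published CMS layout for an MBI assigns a character class to each of its 11 positions: position 1 is a digit from 1 to 9; positions 4, 7, 10 and 11 are digits from 0 to 9; positions 2, 5, 8 and 9 are letters; and positions 3 and 6 may be either. The dashes in 1EG4-TE5-MK73 are for display only. The Python sketch below encodes that layout as a regular expression. It is an illustration based on the CMS format, not part of Topaz for Enterprise Data, and the names `MBI_ALPHA` and `is_valid_mbi` are hypothetical.

```python
import re

# Letters allowed in MBI alphabetic positions: A-Z without S, L, O, I, B and Z
MBI_ALPHA = "".join(c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "SLOIBZ")

# Character class for each kind of MBI position (CMS layout)
C = "[1-9]"                 # position 1: digit 1-9
A = f"[{MBI_ALPHA}]"        # positions 2, 5, 8, 9: letter
AN = f"[0-9{MBI_ALPHA}]"    # positions 3, 6: letter or digit
N = "[0-9]"                 # positions 4, 7, 10, 11: digit 0-9

MBI_PATTERN = re.compile(C + A + AN + N + A + AN + N + A + A + N + N)


def is_valid_mbi(value: str) -> bool:
    """Return True if value is a well-formed MBI, ignoring dashes and spaces."""
    compact = value.replace("-", "").replace(" ", "")
    return MBI_PATTERN.fullmatch(compact) is not None


print(is_valid_mbi("1EG4-TE5-MK73"))  # True
print(is_valid_mbi("1EG4 TE5 MK7Z"))  # False: position 11 must be a digit
```

Any replacement value generated for an MBI field should pass the same check.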
I’ll address these restrictions as the solution is developed, ensuring that the generated replacement values comply with them so that they don’t cause failures in downstream systems.
As mentioned above, the struggle for organizations is that, once the new format is implemented, fields containing Medicare data across different system records can hold a mix of the old HICN format and the new MBI format, which makes privatization rules complicated to define and difficult to manage over time.
Another common layer of complexity is the presence of special characters such as hyphens and spaces, which are irrelevant to the data itself and should not affect the masking results. Consider the simple table highlighted in the image below:
Another example is data from a QSAM file (a sequential data file on the mainframe), which I’ve opened in Compuware File-AID’s Data Editor, as shown below:
Notice that, in addition to the complexity added by the special characters (dashes and spaces), the HICN-NOS column is right-aligned while the MBI-NOS column is left-aligned. After execution, the masked results are expected to keep the same alignment.
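Handling separators and alignment comes down to two steps: strip the padding and separators before masking while remembering where they were, then put them back afterwards. Below is a minimal Python sketch, assuming fixed-width fields where leading spaces indicate right alignment and trailing spaces indicate left alignment. `normalize`, `restore` and `Layout` are hypothetical names, not Topaz APIs.

```python
from dataclasses import dataclass, field

SEPARATORS = {"-", " "}


@dataclass
class Layout:
    width: int                  # original field width, padding included
    right_aligned: bool         # True when the value was padded on the left
    separators: dict[int, str] = field(default_factory=dict)  # index -> char


def normalize(raw: str) -> tuple[str, Layout]:
    """Strip padding and separators, remembering how to restore them."""
    trimmed = raw.strip(" ")
    layout = Layout(width=len(raw), right_aligned=raw.startswith(" "))
    kept = []
    for i, ch in enumerate(trimmed):
        if ch in SEPARATORS:
            layout.separators[i] = ch   # position within the trimmed value
        else:
            kept.append(ch)
    return "".join(kept), layout


def restore(compact: str, layout: Layout) -> str:
    """Re-insert separators at their original positions, then re-pad."""
    chars = iter(compact)
    total = len(compact) + len(layout.separators)
    text = "".join(
        layout.separators[i] if i in layout.separators else next(chars)
        for i in range(total)
    )
    return text.rjust(layout.width) if layout.right_aligned else text.ljust(layout.width)


compact, layout = normalize("   1EG4-TE5-MK73")
print(compact)                               # 1EG4TE5MK73
print(repr(restore("2ACD3EFGH45", layout)))  # '   2ACD-3EF-GH45'
```

A value padded on both sides would be treated as right-aligned by this sketch; a real rule would need an explicit alignment setting per field.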
Detailed instructions for defining the Test Data Privacy project that masks Medicare numbers and addresses each restriction and complexity are available on the Compuware Support Center. A brief overview follows.
As seen in the screenshot, the project provides multiple tabs (Data Elements, Rules, Coverage, Composites and Extensions) for managing and fine-tuning it. To give a brief overview:
The Data Elements tab helps define generic placeholders for data items or fields, along with the normalization that should be applied to them before and after privatization. It provides the flexibility to add more field or column names and have them processed the same way. In the screenshot below, for example, if a new column containing HICN data is identified, simply adding it to the list of Source Data Identifiers ensures it is masked identically to the existing fields.
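Conceptually, a data element is a named placeholder plus the list of source field names that map to it. The Python sketch below models that idea. `DataElement`, `ELEMENTS` and `element_for` are hypothetical, and so are the identifiers `HICN` and `MBI` for the RDBMS columns (only `HICN-NOS`, `MBI-NOS` and `MEDICARE` appear in the examples in this article).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    name: str
    source_identifiers: set[str] = field(default_factory=set)
    strip_chars: str = "- "  # separators to strip before masking and restore after


# Hypothetical registry; column names are illustrative
ELEMENTS = [
    DataElement("HICN", {"HICN", "HICN-NOS"}),
    DataElement("MBI", {"MBI", "MBI-NOS"}),
    DataElement("MEDICARE", {"MEDICARE"}),  # overloaded: may hold either format
]


def element_for(column: str) -> Optional[DataElement]:
    """Find the data element whose source identifiers include this column."""
    key = column.upper()
    for element in ELEMENTS:
        if key in element.source_identifiers:
            return element
    return None


# A newly discovered HICN column only needs to be added to the list
ELEMENTS[0].source_identifiers.add("OLD-HICN")   # hypothetical column name
print(element_for("old-hicn").name)  # HICN
```

The masking logic is attached to the element, not to individual columns, which is why extending the identifier list is enough.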
The Rules tab defines the various Rules and their specific Rule Actions. This is where we either analyze a field to identify its origin (Overloaded Rule Action) or define the masking conditions (Encryption/Translation Rule Actions).
The Coverage tab previews the results of data element identification and rule assignment. It’s a useful way to confirm that everything behaves as the Test Data Privacy project was designed to.
Encryption Sets are a mechanism for specifying a range of characters or Unicode values from which search and replacement encryption values are chosen. They are particularly useful when encryption needs a specific Code Page but the Code Page itself is too large.
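An Overloaded Rule Action essentially decides, per value, which format the value is in before a format-specific rule runs. Below is a Python sketch of that decision, assuming the CMS MBI layout and a simplified HICN shape of nine digits followed by a one- or two-character beneficiary identification code (Railroad Retirement Board numbers and other variants are ignored). `classify` is a hypothetical helper, not the rule logic Topaz uses.

```python
import re

MBI_ALPHA = "ACDEFGHJKMNPQRTUVWXY"      # A-Z without S, L, O, I, B, Z
A, AN = f"[{MBI_ALPHA}]", f"[0-9{MBI_ALPHA}]"
MBI_RE = re.compile(f"[1-9]{A}{AN}[0-9]{A}{AN}[0-9]{A}{A}[0-9][0-9]")

# Assumed, simplified HICN shape: 9-digit SSN-based number + 1-2 char code
HICN_RE = re.compile(r"[0-9]{9}[A-Z][A-Z0-9]?")


def classify(value: str) -> str:
    """Return 'MBI', 'HICN' or 'UNKNOWN' for a value, ignoring dashes/spaces."""
    compact = value.replace("-", "").replace(" ", "")
    if MBI_RE.fullmatch(compact):
        return "MBI"
    if HICN_RE.fullmatch(compact):
        return "HICN"
    return "UNKNOWN"


for sample in ["1EG4-TE5-MK73", "123-45-6789A", "1EG4 TE5 MK7Z"]:
    print(sample, classify(sample))
```

Because an MBI always has a letter in position 2 and the HICN shape assumed here always has a digit there, the two patterns can never both match the same value, so the order of the checks does not matter.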
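A coverage preview is essentially a dry run: route each value to a data element and rule without changing anything, then report the tallies, including columns that nothing covers. Below is a minimal Python sketch. `preview_coverage` and its inputs are hypothetical, and the classifier is passed in as a parameter so that any origin-detection logic can be plugged in.

```python
from collections import Counter
from typing import Callable, Dict, List


def preview_coverage(
    rows: List[Dict[str, str]],
    column_to_element: Dict[str, str],
    classify: Callable[[str], str],
    overloaded: str = "MEDICARE",
) -> Dict[str, Counter]:
    """Dry run: tally, per column, which rule each value would be routed to.

    Nothing is masked. Columns with no data element are tallied as
    'UNCOVERED' so gaps show up before the project is executed.
    """
    report: Dict[str, Counter] = {}
    for row in rows:
        for column, value in row.items():
            counts = report.setdefault(column, Counter())
            element = column_to_element.get(column.upper())
            if element is None:
                counts["UNCOVERED"] += 1
            elif element == overloaded:
                counts[classify(value)] += 1   # origin decided per value
            else:
                counts[element] += 1
    return report


def toy_classify(value: str) -> str:
    """Crude stand-in: an MBI has a letter in position 2, a HICN a digit."""
    return "MBI" if value[1:2].isalpha() else "HICN"


rows = [{"MEDICARE": "1EG4-TE5-MK73", "NAME": "A. Smith"},
        {"MEDICARE": "123456789A", "NAME": "B. Jones"}]
print(preview_coverage(rows, {"MEDICARE": "MEDICARE"}, toy_classify))
```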
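An Encryption Set can be thought of as an ordered list of allowed characters built from one or more Unicode ranges, minus any exclusions. The Python sketch below builds the sets that the MBI restrictions call for. `encryption_set` and the set names are hypothetical and only mimic the idea; they are not the Topaz definitions.

```python
from typing import Iterable, Tuple


def encryption_set(ranges: Iterable[Tuple[str, str]], exclude: str = "") -> str:
    """Ordered, duplicate-free characters from inclusive ranges, minus exclusions."""
    chars: list[str] = []
    for first, last in ranges:
        for code_point in range(ord(first), ord(last) + 1):
            ch = chr(code_point)
            if ch not in exclude and ch not in chars:
                chars.append(ch)
    return "".join(chars)


# Hypothetical sets matching the MBI restrictions
DIGITS = encryption_set([("0", "9")])
DIGITS_1_9 = encryption_set([("1", "9")])
MBI_ALPHA = encryption_set([("A", "Z")], exclude="SLOIBZ")
MBI_ALNUM = encryption_set([("0", "9"), ("A", "Z")], exclude="SLOIBZ")

print(MBI_ALPHA)        # ACDEFGHJKMNPQRTUVWXY
print(len(MBI_ALNUM))   # 30
```

Keeping each set small and explicit is what lets a replacement value stay inside the legal alphabet without defining a full Code Page.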
The screenshot below shows the series of Encryption Sets defined to address the restrictions on Medicare numbers, and their use in the Rule Actions, where a Field Mask restricts each Encryption Set to certain positions.
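Combining the two ideas, a Field Mask can be modeled as one Encryption Set per position, with each output character drawn from its position’s set. The Python sketch below derives those choices from an HMAC-SHA256 of the whole normalized value, so identical inputs always produce identical outputs and every output respects the MBI layout. This keyed-hash substitution is a simpler stand-in swapped in for illustration: unlike true format-preserving encryption (for example, NIST FF1), it is not reversible, and two different inputs can collide. `mask_value`, `MBI_MASK` and the key are assumptions.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ALPHA = "ACDEFGHJKMNPQRTUVWXY"          # A-Z without S, L, O, I, B, Z
ALNUM = DIGITS + ALPHA

# Hypothetical field mask: one encryption set per MBI position (1..11)
MBI_MASK = ["123456789", ALPHA, ALNUM, DIGITS, ALPHA, ALNUM,
            DIGITS, ALPHA, ALPHA, DIGITS, DIGITS]


def mask_value(compact: str, field_mask: list[str], key: bytes) -> str:
    """Replace each position with a character from that position's set.

    Choices come from an HMAC-SHA256 of the whole value, so the same input
    and key always give the same output, and every output character is
    legal for its position. Not reversible; distinct inputs may collide.
    """
    if len(compact) != len(field_mask):
        raise ValueError("value length does not match the field mask")
    digest = hmac.new(key, compact.encode("utf-8"), hashlib.sha256).digest()
    return "".join(
        charset[digest[i % len(digest)] % len(charset)]
        for i, charset in enumerate(field_mask)
    )


key = b"demo-key-not-for-production"    # hypothetical secret
print(mask_value("1EG4TE5MK73", MBI_MASK, key))
```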
Step-by-step instructions for creating all the required Encryption Sets are available in the aforementioned article on the Compuware Support Center.
Once the Test Data Privacy project is defined, we can run a couple of jobs that invoke it to privatize the data before writing it to the target.
The first is a ConverterPro job (part of Topaz for Enterprise Data), which reads the RDBMS Source Table listed earlier and writes to a Target Table.
After execution, comparing the Source and Target tables shows that the numbers are masked consistently across the MBI and HICN columns and that their formatting is preserved. The values in the MEDICARE column were also categorized and encrypted correctly. The invalid entry 1EG4 TE5 MK7Z (an MBI must end in a digit) was encrypted too, albeit differently, because its origin can’t be determined.
The ConverterPro execution log lists the Warning messages raised by the Rule logic.
The second job is a File-AID Data Solutions JCL job, which simply reads and writes QSAM files using the same Test Data Privacy project.
As with the execution in the distributed world, comparing the Source and Target files shows that the numbers are masked consistently across all the columns.
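The behavior observed in the comparison (consistent results across columns, preserved formatting, and a different treatment for an entry of unknown origin) can be reproduced end to end in a short, self-contained Python sketch. It combines normalization, classification and keyed per-position substitution under stated assumptions: the CMS MBI layout, a simplified HICN shape (nine digits plus a one- or two-character beneficiary code, which is kept as-is), and a hypothetical demo key. `privatize` and `_substitute` are hypothetical; this illustrates the technique, not how ConverterPro executes the project.

```python
import hashlib
import hmac
import re

DIGITS = "0123456789"
ALPHA = "ACDEFGHJKMNPQRTUVWXY"              # A-Z without S, L, O, I, B, Z
ALNUM = DIGITS + ALPHA
MBI_MASK = ["123456789", ALPHA, ALNUM, DIGITS, ALPHA, ALNUM,
            DIGITS, ALPHA, ALPHA, DIGITS, DIGITS]
MBI_RE = re.compile("".join(f"[{charset}]" for charset in MBI_MASK))
HICN_RE = re.compile(r"[0-9]{9}[A-Z][A-Z0-9]?")  # assumed, simplified HICN shape
KEY = b"demo-key-not-for-production"              # hypothetical secret
SEPARATORS = "- "


def _substitute(value: str, label: str, sets: list[str]) -> str:
    """Keyed, deterministic per-position substitution (not reversible)."""
    digest = hmac.new(KEY, f"{label}:{value}".encode("utf-8"), hashlib.sha256).digest()
    return "".join(cs[digest[i % len(digest)] % len(cs)] for i, cs in enumerate(sets))


def privatize(raw: str) -> tuple[str, str]:
    """Mask one field value; return (masked_field, detected_type)."""
    compact = "".join(ch for ch in raw if ch not in SEPARATORS)
    if MBI_RE.fullmatch(compact):
        kind, masked = "MBI", _substitute(compact, "MBI", MBI_MASK)
    elif HICN_RE.fullmatch(compact):
        # Assumption: mask the 9-digit number, keep the beneficiary code
        kind = "HICN"
        masked = _substitute(compact[:9], "HICN", [DIGITS] * 9) + compact[9:]
    else:
        # Unknown origin: digits stay digits, anything else becomes a letter
        kind = "UNKNOWN"
        sets = [DIGITS if ch.isdigit() else ALPHA for ch in compact]
        masked = _substitute(compact, "UNKNOWN", sets)
    # Put separators and padding back exactly where they were
    chars = iter(masked)
    return "".join(ch if ch in SEPARATORS else next(chars) for ch in raw), kind


for value in ["1EG4-TE5-MK73", "  1EG4 TE5 MK73", "123-45-6789A", "1EG4 TE5 MK7Z"]:
    print(repr(value), "->", privatize(value))
```

Because the substitution is keyed on the normalized value rather than on the column, an MBI produces the same masked value whether it sits in an MBI column or in an overloaded MEDICARE column, which is the consistency property the comparison checks.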
This was a quick example to demonstrate the power of Topaz for Enterprise Data in resolving real-world issues customers face when managing data. There are numerous other features across Topaz for Enterprise Data that you can leverage to discover, visualize and privatize both mainframe and non-mainframe data in a common, intuitive manner.