Data Anonymization

Sanitizing information to comply with data privacy standards


The goal of the anonymization routines is to anonymize and de-identify the Protected Health Information (PHI) in clinical trial datasets, based on the rules defined by HIPAA and by the company's Compliance Legal Department.

Use Machine learning to automate anonymization

The machine identifies and applies the most recently applied rules, bypassing data classification and user review.

Anonymization on Unstructured Data

We use NLP part-of-speech (POS) recognition and named entity extraction to annotate the unstructured data.

>  NLP POS Recognition

>  Named Entity Extractions

>  Master Data Elements

Machine learning training for document/sentence classification
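The annotation step for unstructured data can be illustrated with a minimal sketch. It is not the product's NLP pipeline: simple regular expressions stand in for a trained POS/named entity model, and a small lookup stands in for the master data elements. The names `PATTERNS`, `MASTER_DATA`, `annotate`, and `redact`, as well as the date and phone formats, are assumptions made for illustration.

```python
import re

# Hypothetical patterns standing in for a trained named entity model.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical master data elements: known values that must always be flagged.
MASTER_DATA = {"NAME": ["Jane Doe"]}


def annotate(text):
    """Return sorted (start, end, label) spans for every match in the text."""
    spans = []
    for label, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    for label, terms in MASTER_DATA.items():
        for term in terms:
            for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)


def redact(text):
    """Replace each annotated span with its label, working right to left
    so that earlier offsets stay valid while the text changes length."""
    for start, end, label in reversed(annotate(text)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


note = "Subject Jane Doe was seen on 2021-03-04; contact 555-123-4567."
print(redact(note))
```

In a real pipeline the regular expressions would be replaced by the output of the NLP model, while the replacement step could stay much the same.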

Why is Anonymization useful?

In this process, column values are compared across different tables and a hash code is generated for each column. Regardless of how a column is labelled in each table, if two columns share the same data, an algorithm generates a score from 0 to 1 that indicates how much of the data matches. The columns are then mapped to each other and the data is merged.

For example, suppose a column is labelled “col”, “column”, and “col1” in different tables, but the data in those columns is the same. The data is checked, a hash is generated for each column, a score between 0 and 1 is generated, and the data is then mapped by merging the columns.
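The column matching described above can be sketched as follows. The text does not name the hashing or scoring algorithm, so this sketch assumes SHA-256 over the normalized distinct values for the hash and Jaccard similarity (shared values divided by all values) for the 0-to-1 score. The function names, the sample tables, and the 0.8 threshold are hypothetical.

```python
import hashlib


def normalize(values):
    """Distinct values, trimmed and lower-cased, so formatting noise is ignored."""
    return {str(v).strip().lower() for v in values}


def column_fingerprint(values):
    """Hash the sorted distinct values; identical data gives an identical hash."""
    joined = "\x1f".join(sorted(normalize(values)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def match_score(left, right):
    """Jaccard similarity between two columns, from 0.0 to 1.0 (assumed metric)."""
    a, b = normalize(left), normalize(right)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_matching_columns(tables, threshold=0.8):
    """Compare every pair of columns from different tables, ignoring names."""
    columns = [(t, c, v) for t, cols in tables.items() for c, v in cols.items()]
    matches = []
    for i, (t1, c1, v1) in enumerate(columns):
        for t2, c2, v2 in columns[i + 1:]:
            if t1 == t2:
                continue
            score = match_score(v1, v2)
            if score >= threshold:
                matches.append(((t1, c1), (t2, c2), score))
    return matches


# Hypothetical tables: the same subject IDs appear under different column names.
tables = {
    "visits": {"col": ["P001", "P002", "P003"], "site": ["A", "B", "A"]},
    "labs": {"column": ["P003", "P001", "P002"], "result": [5.1, 4.8, 6.0]},
    "demographics": {"col1": ["P001", "P002", "P004"]},
}

for left, right, score in find_matching_columns(tables):
    print(left, right, score)
```

Equal fingerprints give a quick exact-match check, while the score handles partial overlap. Column pairs that reach the threshold are the candidates for mapping and merging.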