Data curation refers to the processes and activities involved in organizing and authenticating data, integrating data from various sources, and annotating, publishing and presenting that data so that its value is maintained and it remains available for reuse and preservation.
It includes extracting important information from scientific texts, such as research articles, and converting it into an electronic format, for example an entry in a biological database.
It is the active and ongoing management of data that makes it more useful to people engaged in data discovery and analysis. In practice, data curation is the task of converting independently created structured and semi-structured data into unified data sets for further analysis, with the help of domain experts.
3. Data Standardisation
4. Data Unification
5. Data Fingerprinting
6. Data Anonymization
Companies are beginning to understand that they can’t just continue to blindly “store up” the vast piles of data streaming in without developing a way to value this data: to determine which data has present or potential value, and which will virtually always remain useless.
Curated Data Helps to
• Enable data discovery and retrieval
• Maximize access
• Allow data usage
• Leverage human responses towards customized knowledge.
As data volumes keep growing and data sources become more heterogeneous, getting the data we need for analysis has become time-consuming. Multiple data sets from different sources must first be catalogued and connected before they can be used by various analytics tools. Duplicate records and blank fields need to be eliminated, misspellings fixed, columns split or reshaped, and the data enriched with data from additional or third-party sources to provide more context.
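The cleaning steps listed above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the record layout (`name`, `country`), the `SPELLING_FIXES` table, the `COUNTRY_TO_REGION` lookup standing in for a third-party source, and the sample rows themselves. A real curation pipeline would typically rely on dedicated tooling and on rules defined by domain experts.

```python
# Hypothetical correction table; a real project would maintain its own.
SPELLING_FIXES = {"Germny": "Germany", "Frnace": "France"}

# Hypothetical stand-in for a third-party source used for enrichment.
COUNTRY_TO_REGION = {"Germany": "EU", "France": "EU"}


def curate(records):
    """Return cleaned, deduplicated and enriched copies of the input records."""
    seen = set()
    curated = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        country = (rec.get("country") or "").strip()

        # Eliminate rows whose required fields are blank.
        if not name or not country:
            continue

        # Fix known misspellings.
        country = SPELLING_FIXES.get(country, country)

        # Split the combined "name" column into first and last name.
        first, _, last = name.partition(" ")

        # Eliminate duplicates, ignoring letter case in the name.
        key = (first.lower(), last.lower(), country)
        if key in seen:
            continue
        seen.add(key)

        # Enrich each record with a region from the external lookup.
        curated.append({
            "first_name": first,
            "last_name": last,
            "country": country,
            "region": COUNTRY_TO_REGION.get(country, "unknown"),
        })
    return curated


# Made-up sample rows: a misspelling, a duplicate, and a blank field.
raw = [
    {"name": "Ana Silva", "country": "Germny"},
    {"name": "ana silva", "country": "Germany"},
    {"name": "", "country": "France"},
    {"name": "Ben Okoro", "country": "Frnace"},
]
cleaned = curate(raw)
```

With these sample rows, the duplicate and the blank-name row are dropped, leaving two curated records. The order of the steps matters: misspellings are fixed before the duplicate check, so that "Germny" and "Germany" are recognized as the same value.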