Augmented Data Preparation (Curation)

A recent Gartner study states that more than 70% of big data projects failed to materialize due to the large amount of time spent on data preparation and curation.

Traditionally, the approach has been to apply machine learning and automation to insight generation, ignoring the large chunk of effort required in data preparation, ingestion and curation.

By the time the data reaches the insights phase, either the data or the technology has become outdated. Using our metaprogramming approach, we apply augmented data preparation and curation to the major chunk of big data analytics, i.e., the data preparation phase. Our governed Data Lake ensures the data remains governed and prevents it from turning into a swamp. The metaprogramming approach reduces the data preparation and curation time frame by nearly 80%.
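To make the metaprogramming idea concrete, the sketch below generates ingestion SQL for each source table from declarative metadata instead of hand-writing a script per table. This is only an illustration under assumed metadata fields (table name, columns, source path); the table names are hypothetical and this is not Modak's actual implementation.

```python
# Illustrative metaprogramming sketch: generate ingestion SQL from table metadata.
# The metadata structure and table names here are hypothetical examples.

TABLE_METADATA = [
    {"name": "patients", "columns": ["id", "name", "dob"], "source": "s3://raw/patients.csv"},
    {"name": "claims", "columns": ["claim_id", "patient_id", "amount"], "source": "s3://raw/claims.csv"},
]

def generate_ingestion_sql(meta: dict) -> str:
    """Produce a CREATE TABLE plus load statement from one table's metadata."""
    cols = ",\n  ".join(f"{c} STRING" for c in meta["columns"])
    return (
        f"CREATE TABLE IF NOT EXISTS staging.{meta['name']} (\n  {cols}\n);\n"
        f"LOAD DATA INPATH '{meta['source']}' INTO TABLE staging.{meta['name']};"
    )

if __name__ == "__main__":
    # One generator covers every source table, instead of one hand-written script per table.
    for meta in TABLE_METADATA:
        print(generate_ingestion_sql(meta), end="\n\n")
```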

70% of the time spent on a big data project goes into data preparation. Companies tend to focus more on developing analytical solutions while ignoring the automation effort required at the data preparation phase.

Using our Meta-programming approach, we have been able to gain a significant advantage in the industry.

Modak’s Meta Programming Approach

Data Curation

Data curation refers to the processes and activities related to the organization and authentication of data, the integration of data from various sources, and the annotation, publication and presentation of that data such that its value is maintained and it remains available for reuse and preservation.

It includes extracting important information from scientific texts and research articles so that it can be converted into an electronic format or loaded into a biological database.

It is the active and ongoing management of data that makes it more useful to users engaged in data discovery and analysis. With the help of domain experts, data curation converts independently created structured and semi-structured data into unified data sets for further analysis.

It involves the following steps (a small illustrative sketch follows the list):

1. Staging

2. Profiling

3. Data Standardisation

4. Data Unification

5. Data Fingerprinting

6. Data Anonymization

7. Integration
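As a rough illustration of a few of these steps, the sketch below shows simple profiling, data fingerprinting and anonymization with pandas. The column names, salt and data are hypothetical; this is a minimal sketch, not a description of any specific product's internals.

```python
# Minimal sketch of three curation steps from the list above: profiling,
# fingerprinting and anonymization. Column names and data are hypothetical.
import hashlib
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Basic column-level profile: type, null count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),
    })

def fingerprint(df: pd.DataFrame) -> str:
    """Stable hash of the data set's contents, usable for change detection."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()

def anonymize(df: pd.DataFrame, pii_columns: list) -> pd.DataFrame:
    """Replace PII columns with salted hashes so records stay joinable but not identifiable."""
    out = df.copy()
    for col in pii_columns:
        out[col] = out[col].astype(str).map(
            lambda v: hashlib.sha256(("salt:" + v).encode()).hexdigest()[:16]
        )
    return out

df = pd.DataFrame({"patient_id": [1, 2], "name": ["Ann", "Bob"], "dose_mg": [10, 20]})
print(profile(df))
print(fingerprint(df))
print(anonymize(df, ["name"]))
```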

Companies are beginning to understand that they cannot continue to blindly “store up” the vast piles of data streaming into them without developing a way to value that data, determining which of it has present or potential value and which will virtually always remain useless.

Curated Data Helps to

• Enable data discovery and retrieval

• Maximize access

• Allow data usage

• Leverage human responses towards customized knowledge.

With data volumes constantly growing and data sources becoming more heterogeneous, getting the data we need for analysis has become time-consuming. Multiple data sets from different sources must first be catalogued and connected before they can be used by various analytics tools. Duplicate data and blank fields need to be eliminated, misspellings fixed, columns split or reshaped, and the data enriched with data from additional or third-party sources to provide more context.
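The small pandas sketch below walks through the cleanup steps just described: dropping duplicates, removing rows with blank key fields, fixing a misspelling, splitting a column, and enriching with a third-party lookup. The column names, sample records and reference table are hypothetical examples, not real customer data.

```python
# Illustrative cleanup sketch: deduplicate, drop blanks, fix a misspelling,
# split a column, and enrich with third-party reference data (all hypothetical).
import pandas as pd

raw = pd.DataFrame({
    "customer": ["Acme Corp", "Acme Corp", "Globex", None],
    "city_state": ["Austin, TX", "Austin, TX", "Sprngfield, IL", "Boston, MA"],
})
reference = pd.DataFrame({"state": ["TX", "IL", "MA"],
                          "region": ["South", "Midwest", "Northeast"]})

clean = (
    raw.drop_duplicates()                # remove duplicate records
       .dropna(subset=["customer"])      # drop rows with blank key fields
       .replace({"city_state": {"Sprngfield, IL": "Springfield, IL"}})  # fix misspelling
)
# Split one column into two, then enrich with the reference (third-party) data.
clean[["city", "state"]] = clean["city_state"].str.split(", ", expand=True)
enriched = clean.merge(reference, on="state", how="left")
print(enriched)
```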

Case Study: Applying the Meta Programming Approach in Drug Discovery