What is DATA?

Data is a collection of facts such as numbers, measurements, words, or observations. We generate billions and billions of bytes of data every day. In fact, while I was writing this post I was generating data on Google's servers, and as you read this post you are generating some data too. We are generating data at an almost immeasurable rate. Sources of data include social media accounts such as Facebook, Twitter, Instagram, and Pinterest, as well as your mobile phone, the sensors in a car, Forex trading, banking servers, and countless others.

 

Data is a collection of facts such as numbers, measurements, words, or observations.

In machine learning, data is commonly divided into three sets: training data, validation data, and test data.


What is Training Data?

Training data is the set of data we use to train the machine so that, in the future, it can predict the correct outcome when data of a similar nature is fed to it. Training data is often around 80% of the overall data, though this number varies with the amount of data available and the objective of the machine learning algorithm.

Artificial intelligence is a technology with which we can create intelligent systems that simulate human intelligence.

Data used to train the algorithm are called training data.
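
As a rough sketch of the 80% split mentioned above, the Python snippet below shuffles a data set and sets aside the first 80% for training. The function name split_training_data, the default fraction, and the fixed seed are assumptions made for illustration; they do not come from any particular library.

```python
import random

def split_training_data(records, train_fraction=0.8, seed=42):
    """Shuffle records and split them into (training, remaining) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example with 100 made-up records (the numbers 0 to 99).
training, remaining = split_training_data(range(100))
print(len(training), len(remaining))
```

The remaining 20% is what later gets divided between validation and testing. Shuffling before cutting matters when the records are stored in some order, for example sorted by date.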

What is Validation Data?

Validation data is the data we use to check the output of a machine learning algorithm after it has been trained. Its purpose is to check whether the results the algorithm produces are correct. If the output is what we want, we move to the next step; if it is flawed, we adjust the model's settings, train it again, and recheck the output. This process is usually called tuning.

Data used to validate the accuracy of the algorithm are called validation data.
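
One common way to use validation data is to try several candidate settings and keep the one that scores best. The sketch below does this for a deliberately tiny "model", a single threshold rule. The data values, the candidate thresholds, and the function name accuracy are all made up for illustration.

```python
def accuracy(threshold, examples):
    """Fraction of (value, label) pairs that a simple threshold rule gets right."""
    correct = sum((value >= threshold) == label for value, label in examples)
    return correct / len(examples)

# Made-up validation examples: (measurement, true label).
validation_data = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (2.5, False)]

# Candidate settings to compare (hypothetical values).
candidate_thresholds = [1.5, 2.2, 2.8, 3.5]

# Keep the setting that scores best on the validation data.
best_threshold = max(candidate_thresholds, key=lambda t: accuracy(t, validation_data))
print(best_threshold, accuracy(best_threshold, validation_data))
```

Because the setting is chosen by looking at the validation data, the score on that data tends to be optimistic, which is why a separate test set is kept for the final check.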

What is Test Data?

Once the algorithm is trained and validated, the testing phase begins: we feed it the remaining data to check that it works correctly. You can think of this data set as the final evaluation a model must pass after the training and validation stages of model development. The result on this data indicates the working accuracy of the model.

Data used to test the algorithm are called test data.
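
A minimal sketch of that final check: run the trained model once on held-out examples and report the fraction it gets right. The function names evaluate_on_test_data and trained_model, the threshold of 2.8, and the data values are hypothetical stand-ins, not a real model.

```python
def evaluate_on_test_data(predict, test_examples):
    """Return the fraction of test examples the model predicts correctly."""
    if not test_examples:
        raise ValueError("test_examples must not be empty")
    correct = sum(predict(value) == label for value, label in test_examples)
    return correct / len(test_examples)

# Hypothetical already-trained model: a fixed threshold rule.
def trained_model(value):
    return value >= 2.8

# Made-up held-out examples, never used for training or validation.
test_data = [(0.5, False), (3.2, True), (2.9, False), (5.0, True)]

score = evaluate_on_test_data(trained_model, test_data)
print(f"Test accuracy: {score:.2f}")
```

The test data is used only once, at the very end. If it were also used to adjust the model, it would no longer give an honest estimate of how the model behaves on new data.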

The image below shows a simple overview of the data flow.


 

What is DATA? Reviewed by Mihir Jha on January 24, 2020. Rating: 5

About Me

I am Mihir Kumar Jha from Bangalore, India: an electrical and electronics engineer by degree, a software engineer by profession, a physicist by luck, and a creative website developer by choice.