Machine Learning (ML): A field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data and make predictions or decisions without being explicitly programmed.
Algorithm: A step-by-step procedure or formula for solving a problem. In ML, the algorithm is the procedure that learns patterns from the data and produces the model.
Model: The output of a machine learning algorithm that has been trained on data. The model is used to make predictions or decisions.
Training: The process of feeding data to an ML algorithm to help it learn and build a model.
Testing: The process of evaluating the performance of an ML model on a separate dataset that was not used during training.
Validation: A technique to assess the performance of the model and to tune hyperparameters. The validation set is a subset of the data, held out from training, that provides an unbiased evaluation while the model is being tuned.
Dataset: A collection of data used for training, testing, or validating an ML model.
Feature (Attribute): An individual measurable property or characteristic of a phenomenon being observed. Features are the input variables used by the model.
Label (Target, Output): The output variable that the model is trying to predict. In supervised learning, the label is known and used to train the model.
Training Set: The subset of the dataset used to train the model.
Test Set: The subset of the dataset used to evaluate the trained model.
Validation Set: The subset of the dataset used to tune the model's hyperparameters and to detect overfitting before the final test evaluation.
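To make the three splits concrete, here is a minimal sketch using scikit-learn's train_test_split, called twice to carve out test and validation sets (assuming scikit-learn is available; the 60/20/20 ratio and the random toy data are arbitrary choices for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples, 4 features, binary labels.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# First carve out the test set (20% of the data).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then split the remainder into training (60% overall)
# and validation (20% overall): 0.25 of the remaining 80%.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```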
Accuracy: The ratio of correctly predicted observations to the total observations. It is used as a metric for classification tasks.
Precision: The ratio of true positive predictions to the total predicted positives. It measures the accuracy of positive predictions.
Recall (Sensitivity): The ratio of true positive predictions to the total actual positives. It measures the ability of the model to identify positive instances.
F1 Score: The harmonic mean of precision and recall. It provides a single metric that balances both precision and recall.
Confusion Matrix: A table used to describe the performance of a classification model. It shows the true positives, true negatives, false positives, and false negatives.
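The sketch below works through all five metrics with scikit-learn on a small hand-made set of labels and predictions (the numbers are invented purely for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hand-made ground truth and predictions: TP=4, TN=4, FP=1, FN=1.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Accuracy = (TP + TN) / total.
print("Accuracy: ", accuracy_score(y_true, y_pred))
# Precision = TP / (TP + FP).
print("Precision:", precision_score(y_true, y_pred))
# Recall = TP / (TP + FN).
print("Recall:   ", recall_score(y_true, y_pred))
# F1 = 2 * precision * recall / (precision + recall).
print("F1 score: ", f1_score(y_true, y_pred))
# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```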
Supervised Learning: A type of ML where the model is trained on labeled data. The algorithm learns the mapping between input features and the output label.
Unsupervised Learning: A type of ML where the model is trained on unlabeled data. The algorithm tries to learn the underlying structure or distribution in the data (the sketch after this group contrasts the two approaches).
Semi-Supervised Learning: A type of ML that uses both labeled and unlabeled data for training. Typically, a small amount of labeled data and a large amount of unlabeled data are used.
Reinforcement Learning: A type of ML where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
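The supervised/unsupervised distinction shows up directly in code: a supervised estimator is fit on features and labels, an unsupervised one on features alone. A minimal sketch with scikit-learn (the toy data and labels are invented for the demo):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((30, 2))              # features only
y = (X[:, 0] > 0.5).astype(int)      # labels, derived here just for the demo

clf = LogisticRegression().fit(X, y)          # supervised: needs X and y
km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: X alone

print(clf.predict(X[:5]))   # predicted labels
print(km.labels_[:5])       # discovered cluster assignments
```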
Overfitting: A situation where the model learns the training data too well, capturing noise and details that do not generalize to new data. This results in poor performance on the test set.
Underfitting: A situation where the model is too simple to capture the underlying patterns in the data. This results in poor performance on both the training and test sets.
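One way to see both failure modes is to fit polynomials of increasing degree to noisy data and compare train and test scores: a low degree underfits (both scores low), a very high degree overfits (high train score, lower test score). A sketch on synthetic data, with the degrees chosen arbitrarily:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # Underfitting: low R^2 on both sets. Overfitting: high train, low test.
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))
```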
Hyperparameters: Parameters that are set before the learning process begins and control the training process. Examples include learning rate, number of trees in a random forest, and regularization strength.
Parameters: Variables that the algorithm adjusts during training to minimize the loss function. In a neural network, weights and biases are parameters.
Regularization: Techniques used to prevent overfitting by adding a penalty to the loss function that grows with the magnitude of the coefficients. Common methods include L1 and L2 regularization.
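The sketch below ties these three entries together with scikit-learn: alpha is a hyperparameter set before training, the learned coef_ values are the parameters, and Ridge/Lasso apply L2/L1 penalties respectively (the synthetic data and alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=100)

# alpha is a hyperparameter: chosen before training, it controls penalty strength.
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: can drive coefficients exactly to zero

# coef_ holds the parameters the algorithm adjusted during training.
print(ridge.coef_)
print(lasso.coef_)
```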
Cross-Validation: A technique to assess the generalizability of a model by dividing the data into multiple subsets (folds) and training/testing the model on different combinations of these subsets.
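A common form is k-fold cross-validation: the data is split into k folds, and the model is trained k times, each time holding out a different fold for evaluation. A minimal sketch with scikit-learn (5 folds on the iris dataset; the choice of classifier is arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Five folds: each fold serves as the held-out evaluation set exactly once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())
```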
Feature Engineering: The process of using domain knowledge to create new features or modify existing features to improve the performance of the model.
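As a small illustration of the idea, new columns can be derived from existing ones with pandas (the column names, the BMI feature, and the reference date are all invented for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [60.0, 72.0, 95.0],
    "signup_date": pd.to_datetime(["2023-01-05", "2023-06-20", "2024-02-11"]),
})

# New features built with domain knowledge: BMI and account age in days.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
df["account_age_days"] = (pd.Timestamp("2024-06-01") - df["signup_date"]).dt.days
print(df)
```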
Dimensionality Reduction: Techniques used to reduce the number of input features in a dataset. Common methods include Principal Component Analysis (PCA) and t-SNE.
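A minimal PCA sketch with scikit-learn, projecting the four iris features down to two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 input features onto the 2 directions of greatest variance.
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)   # (150, 4) -> (150, 2)
```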
Ensemble Learning: Techniques that combine multiple models to produce a better-performing model. Common methods include bagging, boosting, and stacking.
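As a sketch of two of these methods: a random forest is a bagging ensemble of decision trees, and gradient boosting adds trees sequentially, each correcting the errors of the ones before it (scikit-learn estimators on the iris dataset, chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: many trees trained on bootstrap samples, predictions aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: trees added one at a time, each fit to the remaining errors.
boost = GradientBoostingClassifier(random_state=0)

for name, model in [("bagging", forest), ("boosting", boost)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```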