Machine Learning (ML): A field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data and make predictions or decisions without being explicitly programmed.
Algorithm: A step-by-step procedure or formula for solving a problem. In ML, the algorithm is the procedure that learns patterns from the data and produces the model.
Model: The output of a machine learning algorithm that has been trained on data. The model is used to make predictions or decisions.
Training: The process of feeding data to an ML algorithm to help it learn and build a model.
Testing: The process of evaluating the performance of an ML model on a separate dataset that was not used during training.
Validation: A technique to assess the performance of the model and to tune hyperparameters. The validation set is a subset of the data, held out from training, that provides an unbiased evaluation while the model is being tuned.
Dataset: A collection of data used for training, testing, or validating an ML model.
Feature (Attribute): An individual measurable property or characteristic of a phenomenon being observed. Features are the input variables used by the model.
Label (Target, Output): The output variable that the model is trying to predict. In supervised learning, the label is known and used to train the model.
Training Set: The subset of the dataset used to train the model.
Test Set: The subset of the dataset used to evaluate the trained model.
Validation Set: The subset of the dataset used to tune the model's hyperparameters and to detect overfitting before the final test evaluation.
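To make the three splits concrete, here is a minimal sketch using scikit-learn's train_test_split, called twice to carve out test and validation sets (assuming scikit-learn is available; the 60/20/20 ratio and the random toy data are arbitrary choices for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples, 4 features, binary labels.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# First carve out the test set (20% of the data).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then split the remainder into training (60% overall)
# and validation (20% overall): 0.25 of the remaining 80%.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```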
Accuracy: The ratio of correctly predicted observations to the total observations. It is used as a metric for classification tasks.
Precision: The ratio of true positive predictions to the total predicted positives. It measures the accuracy of positive predictions.
Recall (Sensitivity): The ratio of true positive predictions to the total actual positives. It measures the ability of the model to identify positive instances.
F1 Score: The harmonic mean of precision and recall. It provides a single metric that balances both precision and recall.
Confusion Matrix: A table used to describe the performance of a classification model. It shows the true positives, true negatives, false positives, and false negatives.
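The sketch below works through all five metrics with scikit-learn on a small hand-made set of labels and predictions (the numbers are invented purely for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hand-made ground truth and predictions: TP=4, TN=4, FP=1, FN=1.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Accuracy = (TP + TN) / total.
print("Accuracy: ", accuracy_score(y_true, y_pred))
# Precision = TP / (TP + FP).
print("Precision:", precision_score(y_true, y_pred))
# Recall = TP / (TP + FN).
print("Recall:   ", recall_score(y_true, y_pred))
# F1 = 2 * precision * recall / (precision + recall).
print("F1 score: ", f1_score(y_true, y_pred))
# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```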
Supervised Learning: A type of ML where the model is trained on labeled data. The algorithm learns the mapping between input features and the output label.
Unsupervised Learning: A type of ML where the model is trained on unlabeled data. The algorithm tries to learn the underlying structure or distribution in the data (the sketch after this group contrasts the two approaches).
Semi-Supervised Learning: A type of ML that uses both labeled and unlabeled data for training. Typically, a small amount of labeled data and a large amount of unlabeled data are used.
Reinforcement Learning: A type of ML where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
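The supervised/unsupervised distinction shows up directly in code: a supervised estimator is fit on features and labels, an unsupervised one on features alone. A minimal sketch with scikit-learn (the toy data and labels are invented for the demo):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((30, 2))              # features only
y = (X[:, 0] > 0.5).astype(int)      # labels, derived here just for the demo

clf = LogisticRegression().fit(X, y)          # supervised: needs X and y
km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: X alone

print(clf.predict(X[:5]))   # predicted labels
print(km.labels_[:5])       # discovered cluster assignments
```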
Overfitting: A situation where the model learns the training data too well, capturing noise and details that do not generalize to new data. This results in poor performance on the test set.
Underfitting: A situation where the model is too simple to capture the underlying patterns in the data. This results in poor performance on both the training and test sets.
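One way to see both failure modes is to fit polynomials of increasing degree to noisy data and compare train and test scores: a low degree underfits (both scores low), a very high degree overfits (high train score, lower test score). A sketch on synthetic data, with the degrees chosen arbitrarily:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # Underfitting: low R^2 on both sets. Overfitting: high train, low test.
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))
```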
Hyperparameters: Parameters that are set before the learning process begins and control the training process. Examples include learning rate, number of trees in a random forest, and regularization strength.
Parameters: Variables that the algorithm adjusts during training to minimize the loss function. In a neural network, weights and biases are parameters.
Regularization: Techniques used to prevent overfitting by adding a penalty to the loss function that grows with the magnitude of the coefficients. Common methods include L1 and L2 regularization.
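The sketch below ties these three entries together with scikit-learn: alpha is a hyperparameter set before training, the learned coef_ values are the parameters, and Ridge/Lasso apply L2/L1 penalties respectively (the synthetic data and alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=100)

# alpha is a hyperparameter: chosen before training, it controls penalty strength.
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: can drive coefficients exactly to zero

# coef_ holds the parameters the algorithm adjusted during training.
print(ridge.coef_)
print(lasso.coef_)
```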
Cross-Validation: A technique to assess the generalizability of a model by dividing the data into multiple subsets (folds) and training/testing the model on different combinations of these subsets.
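A common form is k-fold cross-validation: the data is split into k folds, and the model is trained k times, each time holding out a different fold for evaluation. A minimal sketch with scikit-learn (5 folds on the iris dataset; the choice of classifier is arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Five folds: each fold serves as the held-out evaluation set exactly once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())
```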
Feature Engineering: The process of using domain knowledge to create new features or modify existing features to improve the performance of the model.
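As a small illustration of the idea, new columns can be derived from existing ones with pandas (the column names, the BMI feature, and the reference date are all invented for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [60.0, 72.0, 95.0],
    "signup_date": pd.to_datetime(["2023-01-05", "2023-06-20", "2024-02-11"]),
})

# New features built with domain knowledge: BMI and account age in days.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
df["account_age_days"] = (pd.Timestamp("2024-06-01") - df["signup_date"]).dt.days
print(df)
```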
Dimensionality Reduction: Techniques used to reduce the number of input features in a dataset. Common methods include Principal Component Analysis (PCA) and t-SNE.
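A minimal PCA sketch with scikit-learn, projecting the four iris features down to two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 input features onto the 2 directions of greatest variance.
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)   # (150, 4) -> (150, 2)
```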
Ensemble Learning: Techniques that combine multiple models to produce a better-performing model. Common methods include bagging, boosting, and stacking.
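As a sketch of two of these methods: a random forest is a bagging ensemble of decision trees, and gradient boosting adds trees sequentially, each correcting the errors of the ones before it (scikit-learn estimators on the iris dataset, chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: many trees trained on bootstrap samples, predictions aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: trees added one at a time, each fit to the remaining errors.
boost = GradientBoostingClassifier(random_state=0)

for name, model in [("bagging", forest), ("boosting", boost)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```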