Scikit-learn is one of the most widely used machine learning libraries in Python. It provides simple and efficient tools for data mining and data analysis. In this article, we will explore two basic supervised learning techniques using Scikit-learn: linear regression and classification (using logistic regression). We will also walk through a Python example of each.
Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the input features and the output target. Linear regression is used for prediction tasks where the output variable is continuous.
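To make the idea of a linear relationship concrete, the following minimal sketch fits a line y = slope * x + intercept to a handful of points using the closed-form least-squares formulas for a single feature. It uses only the standard library; the function name fit_simple_line and the sample data are illustrative assumptions, not part of Scikit-learn.

```python
def fit_simple_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data: hours studied vs. marks obtained (made up for this sketch)
hours = [1, 2, 3, 4, 5]
marks = [10, 20, 30, 40, 50]

slope, intercept = fit_simple_line(hours, marks)
predicted_marks = slope * 6 + intercept  # prediction for 6 hours of study
print(slope, intercept, predicted_marks)
```

Because the sample points lie exactly on a straight line, the fitted line passes through all of them. With noisy data the same formulas would return the line that minimizes the sum of squared errors.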
# Import required libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Sample data: hours studied vs marks obtained
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
print('Predicted values:', y_pred)
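This example demonstrates how to:
- Split the data into training and testing sets with train_test_split().
- Create and train a linear regression model with LinearRegression().
- Evaluate the predictions with mean_squared_error().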
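After fitting, the learned line can be inspected directly. The sketch below, which assumes the same toy data as the example above, reads the fitted slope and intercept from the model's coef_ and intercept_ attributes and predicts marks for an unseen input (12 hours, a value chosen arbitrarily for this illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as above: hours studied vs. marks obtained
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # one coefficient per input feature
intercept = model.intercept_  # value predicted when the feature is 0
new_prediction = model.predict(np.array([[12]]))[0]

print('Slope:', slope)
print('Intercept:', intercept)
print('Predicted marks for 12 hours:', new_prediction)
```

Because the toy data lies exactly on the line y = 10x, the slope should come out very close to 10 and the intercept very close to 0, up to floating-point error.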
Classification is a supervised learning task where the goal is to predict the class label of an object. The input data is mapped to discrete class labels (e.g., spam vs. not spam). A popular classification algorithm is logistic regression, which is most commonly used for binary classification tasks.
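Logistic regression turns a weighted sum of the input features into a probability with the logistic (sigmoid) function and then thresholds that probability to choose a class. The sketch below illustrates that decision step in plain Python; the weights, bias, and 0.5 threshold are hypothetical values picked for illustration, not parameters learned from data.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted class label)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(score)
    label = 1 if probability >= threshold else 0
    return probability, label

# Hypothetical parameters; a real model would learn these during training
weights = [0.8, -0.4]
bias = -0.1

prob_a, label_a = predict_label([2.0, 1.0], weights, bias)   # positive score
prob_b, label_b = predict_label([-1.0, 2.0], weights, bias)  # negative score
print(prob_a, label_a)
print(prob_b, label_b)
```

A positive score gives a probability above 0.5 and therefore class 1, while a negative score gives a probability below 0.5 and class 0.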
# Import required libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Create a synthetic dataset for binary classification
# n_redundant=0 is required here: the default of 2 redundant features plus
# 2 informative features would exceed n_features=2 and raise a ValueError
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
print('Predicted labels:', y_pred)
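This example demonstrates how to:
- Use make_classification() to create a synthetic binary classification dataset.
- Split the data into training and testing sets with train_test_split().
- Create and train a logistic regression model with LogisticRegression().
- Evaluate the predictions with accuracy_score().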
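Linear regression and classification are both essential machine learning techniques, but they are used for different tasks: linear regression predicts a continuous target (such as marks obtained), while classification predicts a discrete class label (such as spam vs. not spam).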
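A quick way to see the difference between the two tasks is to look at how predictions are scored. As a rough sketch, the two metrics used in this article can be computed by hand: mean squared error averages the squared differences between continuous values, while accuracy is the fraction of labels predicted exactly right. The helper names and sample values below are made up for illustration.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences; suited to continuous targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_manual(y_true, y_pred):
    """Fraction of exact matches; suited to discrete class labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Regression: predictions can be close without being exactly equal
true_marks = [30.0, 50.0, 90.0]
predicted_marks = [28.0, 53.0, 90.0]
mse = mean_squared_error_manual(true_marks, predicted_marks)

# Classification: each prediction is either right or wrong
true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]
accuracy = accuracy_manual(true_labels, predicted_labels)

print('MSE:', mse)
print('Accuracy:', accuracy)
```

In practice, Scikit-learn's mean_squared_error() and accuracy_score() should be preferred; the manual versions are only meant to show what each number measures.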
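When using Scikit-learn for linear regression and classification, two points from the examples above are worth remembering: hold out a test set with train_test_split() so that the model is evaluated on data it was not trained on, and choose a metric that matches the task, such as mean_squared_error() for regression and accuracy_score() for classification.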
Scikit-learn is an excellent tool for implementing machine learning techniques such as linear regression and classification. In this article, we demonstrated both in Python, using a small toy dataset to predict continuous values with linear regression and a synthetic dataset to classify binary data with logistic regression. With these basics in place, you can start building your own machine learning models and move on to more advanced techniques.