Linear regression is a statistical method for modeling the linear
relationship between a dependent variable and one or more independent
variables. In this example, the dependent variable is y and the
independent variable is x. The goal of the model is to find the line of
best fit that describes the relationship between x and y in the data.
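To make "line of best fit" concrete, here is a minimal sketch of the least-squares estimates for a single predictor, computed by hand and compared with lm(). The toy vectors are invented for illustration and are not taken from data.csv.

```r
# Hypothetical toy data, invented for this sketch (not from data.csv)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates for simple (one-predictor) linear regression:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() should recover the same two numbers
fit <- lm(y ~ x)
same <- all.equal(unname(coef(fit)), c(intercept, slope))
print(c(intercept = intercept, slope = slope))
print(same)
```

The closed form only covers the one-predictor case; with several predictors, lm() solves the equivalent matrix problem for you.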
A first example of linear regression in R:
First, we need to load the necessary libraries:
library(tidyverse)
library(broom)
Next, let's read in the data and take a look at it:
data <- read_csv("data.csv")
head(data)
# x y
# 1 1 2.5164708
# 2 2 3.4593287
# 3 3 4.7047409
# 4 4 5.4292901
# 5 5 6.8230604
# 6 6 7.8731561
Now, let's fit the linear regression model:
model <- lm(y ~ x, data=data)
We can then use the summary() function to get an overview of the model fit:
summary(model)
# Call:
# lm(formula = y ~ x, data = data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.92588 -0.33667 0.01591 0.33919 1.10721
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.237039 0.257350 4.800 0.000145 ***
# x 0.937292 0.029744 31.446 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.5013 on 8 degrees of freedom
# Multiple R-squared: 0.9992, Adjusted R-squared: 0.9991
# F-statistic: 989.6 on 1 and 8 DF, p-value: < 2.2e-16
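From the summary, we can see that the model has an R-squared value close to 1, which indicates that x explains most of the variance in y. The Coefficients table gives the estimated intercept (about 1.24) and the slope for the "x" variable (about 0.94), meaning that y increases by roughly 0.94 on average for each one-unit increase in x.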
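If you need these numbers in code rather than in printed output, the broom package can return them as data frames. The sketch below is one way to do that; because data.csv is not available here, it uses a small simulated data set (example_data is a hypothetical stand-in), so its values will differ from the output above.

```r
library(broom)

# Hypothetical stand-in for data.csv: 10 points scattered around a line
set.seed(1)
example_data <- data.frame(x = 1:10)
example_data$y <- 1 + 0.9 * example_data$x + rnorm(10, sd = 0.5)

model <- lm(y ~ x, data = example_data)

# tidy(): one row per coefficient (term, estimate, std.error, statistic, p.value)
coef_table <- tidy(model)

# glance(): one row per model (r.squared, adj.r.squared, sigma, ...)
fit_stats <- glance(model)

# Pull out the slope estimate and the R-squared as plain numbers
slope_estimate <- coef_table$estimate[coef_table$term == "x"]
r_squared <- fit_stats$r.squared
print(slope_estimate)
print(r_squared)
```

Working with data frames instead of the printed summary makes it easier to compare several models or feed the estimates into later steps.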
Finally, we can use the model to make predictions:
predictions <- predict(model, data.frame(x=c(6, 7, 8)))
predictions
# 1 2 3
# 6.860791 7.798083 8.735375
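Point predictions say nothing about their uncertainty. As a sketch, predict() can also return an interval around each prediction through its interval argument. The data here is simulated (example_data is a hypothetical stand-in for data.csv), so the numbers will not match the output above.

```r
# Hypothetical stand-in for data.csv
set.seed(1)
example_data <- data.frame(x = 1:10)
example_data$y <- 1 + 0.9 * example_data$x + rnorm(10, sd = 0.5)
model <- lm(y ~ x, data = example_data)

# The column name in newdata must match the predictor name in the formula
new_points <- data.frame(x = c(6, 7, 8))

# interval = "prediction" returns a matrix with columns fit, lwr, upr
# (95% level by default; change it with the level argument)
pred <- predict(model, newdata = new_points, interval = "prediction")
print(pred)
```

Use interval = "confidence" instead if you want the uncertainty of the fitted line itself rather than of a new observation.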
The next example generates some fake data, fits a linear regression model to it, makes predictions using the model, and visualizes the fit.
# First, we'll install (only needed once) and load the packages we need
install.packages("tidyverse")
install.packages("broom")
library(tidyverse)
library(broom)
# Next, let's generate some fake data for our example
set.seed(123)
fake_data <- tibble(
  x = rnorm(100),
  y = 2 * x + rnorm(100)
)
# Now we can fit a linear model to our data
model <- lm(y ~ x, data = fake_data)
# We can use the broom package to tidy up the model output
tidy(model)
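# And we can make predictions using our model, storing them in a new column
# (note that the argument is `newdata`, and the result must be assigned
# so that the plot below can find the `prediction` column)
fake_data <- fake_data %>%
  mutate(prediction = predict(model, newdata = .))
# We can also plot the model to visualize the fit
ggplot(fake_data, aes(x = x, y = y)) +
  geom_point() +
  geom_line(aes(y = prediction), color = "red")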
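As an alternative sketch for the plotting step, ggplot2 can fit and draw the regression line itself with geom_smooth(method = "lm"), so no prediction column is needed. The data is generated the same way as above but with a plain data.frame, and the object name p is only for illustration.

```r
library(ggplot2)

# Same kind of fake data as above, built with base R
set.seed(123)
fake_data <- data.frame(x = rnorm(100))
fake_data$y <- 2 * fake_data$x + rnorm(100)

# geom_smooth(method = "lm") fits y ~ x internally and draws the line
# together with a 95% confidence band (se = TRUE is the default)
p <- ggplot(fake_data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red")

print(p)
```

This is convenient for a quick look, while the explicit predict() approach is the better choice when you need the predicted values themselves.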