How to create and interpret the boxplot in R

What is a Boxplot

A boxplot is a graph that summarizes the distribution of a dataset. It is based on five quantities:

1) Minimum Value

2) 25th percentile

3) Median

4) 75th percentile

5) Maximum Value

Together, these five quantities are called the “five-number summary”.

Don’t worry, you do not need to calculate these values by hand as R will do it for you.

Case 1 – Consider the height of all the students in your class

Student      1    2    3    4    5    6    7    8    9    10
Height (cm)  170  185  162  169  180  142  159  153  154  180

Code for Boxplot in R

To create the boxplot, we will enter this data as a vector in R

#Create a vector of data values in R

height <- c(170, 185, 162, 169, 180, 142, 159, 153, 154, 180)

#Create the boxplot using “boxplot()”

boxplot(height)

[Figure: boxplot of the student heights from Case 1]

Once you run the commands in R, you will get a figure similar to the above. This figure, a box with a dark center line and two lines (called whiskers) extending from it, is a boxplot.

  • The whisker extending above the box ends at the maximum value of the data, and the one extending below ends at the minimum value. This holds here because the data has no outliers.
  • The dark line inside the box represents the median. It is the middle observation of the sorted data, meaning 50% of the data points lie above this line and 50% lie below it. Here it is approximately 165, so half of the students are taller than 165cm and half are shorter.
  • The upper edge of the box represents the 75th percentile, or third quartile: 25% of the data points lie above it and 75% lie below it. Here it is approximately 180cm, meaning about 25% of the students are taller than 180cm and about 75% are shorter.
  • The lower edge of the box represents the 25th percentile, or first quartile. It mirrors the third quartile: 25% of the data points lie below it and 75% lie above it. Here it is approximately 155cm, meaning about 75% of the students are taller than 155cm and about 25% are shorter.
  • Overall, the boxplot divides the dataset into four parts, each holding roughly 25% of the data points.
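To check the values read off the plot, the five-number summary can also be printed directly. The snippet below is a minimal sketch using base R's fivenum(); the variable name summary_values and the labels attached to it are chosen here for illustration only.

```r
# Heights (cm) of the ten students from Case 1
height <- c(170, 185, 162, 169, 180, 142, 159, 153, 154, 180)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
summary_values <- fivenum(height)

# Illustrative labels so the printed output is easier to read
names(summary_values) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

print(summary_values)
```

For this data the five values are 142, 154, 165.5, 180 and 185. The two hinges are the edges of the box; they are close to the 25th and 75th percentiles, which is why the readings from the plot above are given as approximate.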

Case 2 – Consider the height of basketball players in your college (a sport that favors tall people)

Player       1    2    3    4    5    6    7    8    9    10
Height (cm)  170  185  175  169  180  142  181  179  182  184

To create the boxplot, we will enter this data as a vector in R

#Create a vector of data values in R

height <- c(170, 185, 175, 169, 180, 142, 181, 179, 182, 184)

#Create the boxplot using “boxplot()”

boxplot(height)

[Figure: boxplot of the basketball players' heights from Case 2]

Note: Do not worry about how R calculated all these values. Just focus on interpretation.

  • Now compare this boxplot with the previous one. The box, the median line and the whiskers are read in exactly the same way; the only new element is the dot below the lower whisker. When such a dot is drawn, the whisker on that side ends at the most extreme value that is not shown as a dot, rather than at the minimum. So why do we have this big dot at the bottom? It is called an outlier. Why?
  • Imagine that the basketball team in your college consists of tall players. One student is not very tall but still plays basketball well, so he was selected for the team. He may have excellent skills, but his short height sets him apart from the rest of the team. This makes him an outlier.
  • The rest of the boxplot is interpreted as above, but look at the outlier! Its height is less than 150cm, which is very low compared to the other players. We call it a lower outlier.
  • So, to conclude, an outlier is an observation that stands clearly apart from the rest of the group. Imagine this team playing a match: you would easily be able to point out the short player among all the others!
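If you are curious which value R flagged, the numbers behind the plot can be inspected without drawing it. The following is a small sketch using base R's boxplot.stats(); the variable name bp is chosen here for illustration. By default, R draws a point individually when it lies more than 1.5 box lengths beyond the edge of the box.

```r
# Heights (cm) of the ten basketball players from Case 2
height <- c(170, 185, 175, 169, 180, 142, 181, 179, 182, 184)

# boxplot.stats() computes the values used by boxplot() without plotting
bp <- boxplot.stats(height)

# End of lower whisker, lower hinge, median, upper hinge, end of upper whisker
print(bp$stats)

# Values drawn as individual points (the outliers)
print(bp$out)
```

Here bp$out contains 142, the short player, and the lower whisker ends at 169, the shortest player who is not an outlier.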

Case 3 – Consider the height of rock-climbing players in your college (a sport that favors short people)

Player       1    2    3    4    5    6    7    8    9    10
Height (cm)  144  143  142  147  148  142  159  147  172  154

To create the boxplot, we will enter this data as a vector in R

#Create a vector of data values in R

height <- c(144, 143, 142, 147, 148, 142, 159, 147, 172, 154)

#Create the boxplot using “boxplot()”

boxplot(height)

[Figure: boxplot of the rock-climbing players' heights from Case 3]

  • Read this boxplot in the same way as the previous ones and make your own judgement. Everything else is the same, but the outlier in the rock-climbing team lies above all the other players, which means it is an upper outlier.
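The distinction between lower and upper outliers can also be worked out by hand. The sketch below assumes R's default rule of 1.5 box lengths beyond each edge of the box; the names hinges, box_length, lower_fence and upper_fence are illustrative and not part of R itself.

```r
# Heights (cm) of the ten rock-climbing players from Case 3
height <- c(144, 143, 142, 147, 148, 142, 159, 147, 172, 154)

# Lower and upper hinges: the bottom and top edges of the box
hinges <- fivenum(height)[c(2, 4)]
box_length <- diff(hinges)

# Fences: 1.5 box lengths beyond each edge of the box (R's default)
lower_fence <- hinges[1] - 1.5 * box_length
upper_fence <- hinges[2] + 1.5 * box_length

# Points below the lower fence are lower outliers,
# points above the upper fence are upper outliers
lower_outliers <- height[height < lower_fence]
upper_outliers <- height[height > upper_fence]

print(lower_outliers)
print(upper_outliers)
```

For this team there are no lower outliers, and the only upper outlier is the 172cm player, who lies above the upper fence of 170.5cm.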
