Stanford University Student Population Statistics Using Bayesian Inference
- sherry salek
- Aug 1, 2022
We will examine student population statistics for Stanford University during the 2021-2022 academic year.
We will apply Bayes' law to determine whether the university is successfully diversifying its student population.
The Common Data Set, or CDS for short, is a free public data set of summarized statistics about the demographics of the students attending a given university. A new data set is released after the start of every new academic year. These sets are available for public use and can be found through the Stanford University website. Section B, on the third page, is titled Enrollment and Persistence, and it shows enrollment in the university divided into different categories. Let's start by examining Table B1, which shows the distribution of students by gender. Since the survey students fill out to apply only has the options male and female, the data is split solely into these two groups.


The columns represent full-time and part-time students separated by gender. Now let us look at the rows. We see that they are split into two major groups, undergraduates and graduates. We are going to examine undergraduate students.
1,012 is the number of people who are simultaneously full-time males and degree-seeking first-time freshmen. In the language of sets, these 1,012 students represent the intersection of the sets denoted in the first row and the first column.
Let's observe the Total Undergraduates row for women. It includes all degree-seeking undergraduates. This means that the value of 3,867 we observe in the Total Undergraduates row of the second column is the union of all full-time undergraduate women in the university.
Now, the union of all women, both full-time and part-time, would be 3,867 plus 0, or 3,867. The sizes simply add because no student can be simultaneously part-time and full-time.
A probabilistic way of expressing this relationship would be to say that the sets of part-time and full-time students are mutually exclusive.
Additionally, we have 3,778 plus 0, or 3,778, male students attending the university.
The total number of enrolled undergraduate students is 7,645. Since there are 3,867 women and 3,778 men, together they make up the entire sample space.
Furthermore, due to the nature of the survey, nobody was allowed to mark any answer other than male or female, so the two sets have no overlapping elements.
Satisfying both conditions means the two sets are complements.
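The two complement conditions can be checked with a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and the counts are the Table B1 figures quoted above.

```python
# Counts as quoted in the text from Table B1 (undergraduates only).
women_full_time, women_part_time = 3867, 0
men_full_time, men_part_time = 3778, 0
total_undergraduates = 7645

# Full-time and part-time are mutually exclusive, so their sizes simply add.
women = women_full_time + women_part_time
men = men_full_time + men_part_time

# Two sets are complements when they do not overlap and together cover
# the whole sample space. The survey design rules out overlap, so only
# the coverage condition needs to be checked numerically.
covers_sample_space = (women + men == total_undergraduates)

print(women, men, covers_sample_space)  # 3867 3778 True
```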
Now let's move on to a different part of Section B. We will focus on the racial/ethnic diversity in the university, summarized in Table B2 below.

We want to use Bayes' law to determine whether the university is successfully diversifying its student population. Therefore, we need to use the table to determine whether, for a specific ethnicity, the freshman class is more diverse than the average. To do so, we need to accurately compute the size of each set we are interested in.
Event A = being a degree-seeking, first-time, first-year student
Event B = being Black or African-American, non-Hispanic
We're going to refer to elements of A simply as first years and elements of B as black. Since we have a total of 7645 undergraduate students and 2101 of them are first years, the probability of being a freshman is:
P(A) = 2101/7645 ≈ 0.275
This suggests that approximately 27.5% of the student body are freshmen.
Similarly, we can estimate the probability of a random student at the institution being of African-American descent:
P(B) = 547/7645 ≈ 0.072, or close to 7.2%.
Now the intersection of A and B represents all black first-year students. Going back to the table, only 156 students belong to both demographics. The probability of being a black first-year student is:
P(A ∩ B) = 156/7645 ≈ 0.020, which is close to 2.0%.
We know the likelihood of a student being African-American, and we know the chance of a random student being both black and a freshman. Thus, we can use conditional probability to find the likelihood of a black student being in their first year at the university:
P(A | B) = P(A ∩ B) / P(B) = 156/547 ≈ 0.285
Since the undergraduate program lasts four years, each class year would account for roughly 25% of any group if its representation were steady, so any share higher than 25% indicates a higher-than-average value. 0.285 is greater than the expected average of 0.25.
So we can see a rising trend in the representation of this minority in the student population.
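The four probabilities worked out above can be reproduced with a short Python sketch. The variable names are illustrative, the counts are the Table B2 figures quoted in the text, and the 25% baseline is the article's own assumption of four equally sized class years.

```python
total = 7645             # all undergraduates
first_years = 2101       # |A|
black = 547              # |B|
black_first_years = 156  # |A ∩ B|

p_a = first_years / total
p_b = black / total
p_a_and_b = black_first_years / total

# Conditional probability: P(A | B) = P(A ∩ B) / P(B), i.e. 156 / 547.
p_a_given_b = p_a_and_b / p_b

# Baseline assumed in the text: four class years of equal size.
baseline = 1 / 4

print(round(p_a, 3), round(p_b, 3), round(p_a_and_b, 3), round(p_a_given_b, 3))
# 0.275 0.072 0.02 0.285
print(p_a_given_b > baseline)  # True
```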
The union of A and B represents all students who are either first-time first years or black. We know that there are 2101 first years, 547 black students, and 156 black first-year students. To find the number of students within the union of A and B, we apply the additive law, which subtracts the intersection so that those 156 students are not counted twice:
|A ∪ B| = 2101 + 547 - 156 = 2492 students who are either freshmen or black.
The probability of being part of the union is:
P(A ∪ B) = 2492/7645 ≈ 0.326, which indicates that approximately 32.6% of the student body is either a freshman or identifies as black.
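As a quick check of the additive law, here is a small Python sketch. The helper name `union_size` is hypothetical and used only for illustration.

```python
def union_size(size_a: int, size_b: int, size_both: int) -> int:
    """Additive law for set sizes: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return size_a + size_b - size_both

total = 7645
# 2101 first years, 547 black students, 156 black first years.
first_year_or_black = union_size(2101, 547, 156)

print(first_year_or_black, round(first_year_or_black / total, 3))  # 2492 0.326
```

Subtracting the intersection matters here: simply adding 2101 and 547 would count the 156 black first years twice.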
Event C = being a Hispanic/Latino student at the university.
P(C) = 1339/7645 ≈ 0.175
Since Event B explicitly says non-Hispanic, the two events must be mutually exclusive. Thus, the intersection is empty:
B ∩ C = ∅
and the size of their union equals the sum of their sizes:
|B ∪ C| = |B| + |C| = 547 + 1339 = 1886 students who identify as either African-American or Latino.
The probability of picking a random student who identifies as either one equals:
P(B ∪ C) = 1886/7645 ≈ 0.247, or roughly 25%.
Not that great as a percentage, but great work on figuring that out.
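For mutually exclusive events, the additive law reduces to a plain sum of probabilities. The sketch below illustrates this with exact fractions from Python's standard `fractions` module; the variable names are illustrative only.

```python
from fractions import Fraction

total = 7645
p_b = Fraction(547, total)   # Black or African-American, non-Hispanic
p_c = Fraction(1339, total)  # Hispanic/Latino

# B and C cannot overlap, so P(B ∩ C) = 0 and nothing has to be
# subtracted when combining them.
p_b_or_c = p_b + p_c

print(p_b_or_c == Fraction(1886, total))  # True
print(round(float(p_b_or_c), 3))          # 0.247
```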
Let's dig a bit deeper and examine some conditional probabilities.
In Table B2, the entire first column represents values only for first-year students.
Therefore, any number we read from it is the size of the intersection of freshmen and another demographic.
This is important when we wish to compute conditional probabilities.
Recall that the conditional probability formula states that the likelihood of an event occurring, given that another event has already occurred, equals the likelihood of the intersection over the likelihood of the second event. The likelihood of being black, given you are a freshman, equals the probability of being a black freshman over the likelihood of being a freshman.
P(B | A) = P(B ∩ A) / P(A) = 156/2101 ≈ 0.074
Therefore, there is a roughly 7.4% chance for any freshman to identify as black.
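The conditional probability formula can be sketched as a small Python function. The name `conditional` is hypothetical, and the zero check is an added safeguard rather than something the article discusses.

```python
def conditional(p_joint: float, p_given: float) -> float:
    """P(X | Y) = P(X ∩ Y) / P(Y), defined only when P(Y) > 0."""
    if p_given == 0:
        raise ValueError("conditioning event has zero probability")
    return p_joint / p_given

total = 7645
p_a = 2101 / total       # P(A): being a first year
p_b_and_a = 156 / total  # P(B ∩ A): being a black first year

p_b_given_a = conditional(p_b_and_a, p_a)
print(round(p_b_given_a, 3))  # 0.074
```

Because both probabilities share the denominator 7645, the result is the same as dividing the raw counts, 156 by 2101.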
Similarly, we can compute the likelihood of a given student being a Hispanic first year.
We can compute the likelihood of being Latino given you are in your first year of college, as well as the likelihood of being a freshman, and apply the multiplication rule.
P(C | A) = P(C ∩ A) / P(A)
P(C ∩ A) = P(C | A) * P(A) = (382/2101) * 0.275 ≈ 0.05, which is a 5% likelihood of being a Latino first year.
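The multiplication rule can be verified against the joint count read directly from the table. A minimal Python sketch with illustrative variable names:

```python
total = 7645
first_years = 2101          # |A|
hispanic_first_years = 382  # |C ∩ A|

p_a = first_years / total
p_c_given_a = hispanic_first_years / first_years

# Multiplication rule: P(C ∩ A) = P(C | A) * P(A).
p_c_and_a = p_c_given_a * p_a

# The same probability computed straight from the joint count.
direct = hispanic_first_years / total

print(round(p_c_and_a, 3), abs(p_c_and_a - direct) < 1e-12)  # 0.05 True
```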
What if we want to find out the likelihood of being a freshman given you are Hispanic?
We could also apply Bayes' law to solve this.
P(A | C) = P(C | A) * P(A) / P(C) = (0.182 * 0.275) / 0.175 ≈ 0.286
P(A | C) > P(C | A)
That means there is a roughly 29% chance a student is a first year given they are Hispanic. Thus, we can say that a person is more likely to be a first year given they are Hispanic (0.286) than to be Hispanic given they are a freshman (0.182). If we think about the "favorable over all" formula, this makes sense because there are more freshmen than Hispanic students in the university: both probabilities share the same 382 Hispanic first years in the numerator, but dividing by 1339 Hispanic students gives a larger value than dividing by 2101 freshmen.
Our short analysis suggests that the university is improving its minority representation with the current freshman class. However, further research would be required to account for attrition among the student population, as well as transfers to other universities within the region.
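The Bayes' law step for P(A | C) can be reproduced with the sketch below. The function name `bayes` is hypothetical, and the inputs are left unrounded, so the last digit can differ slightly from a hand calculation.

```python
def bayes(p_c_given_a: float, p_a: float, p_c: float) -> float:
    """Bayes' law: P(A | C) = P(C | A) * P(A) / P(C)."""
    return p_c_given_a * p_a / p_c

total = 7645
p_a = 2101 / total        # P(A): first year
p_c = 1339 / total        # P(C): Hispanic/Latino
p_c_given_a = 382 / 2101  # P(C | A): Hispanic given first year

p_a_given_c = bayes(p_c_given_a, p_a, p_c)

print(round(p_a_given_c, 3))      # 0.285
print(p_a_given_c > p_c_given_a)  # True
```

With unrounded inputs the result is about 0.285, which matches 382/1339 taken directly from the counts. The 0.286 in the hand calculation comes from rounding each input to three decimals first.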