During an NPR segment on my morning flash briefing, I heard an alarming, unchallenged comment: “No one intended this to happen.” This is utter bull. The context doesn’t even matter, but in this case it was about what amounts to interest rates based on which university you attended. Specifically, alumni of Historically Black Colleges and Universities (along with Hispanic-serving institutions) receive higher interest rates than people who went to other universities.
Let’s start at the most basic point: what is an algorithm? A quick Google search returns this:
It specifically calls out that an algorithm can be as basic as one for division. So I think it’s fair to say that many people use the word “algorithm” over “computer program” because it sounds magical and difficult to understand. Now, there are plenty of algorithms that are very complicated, like the original PageRank algorithm.
However, one of the most commonly used algorithms is the basic regression model. This looks something like ax + c = y. You should recognize this as the most basic equation for a line. Each ‘x’ is a conscious decision about what to include. For example, let’s say you want to calculate fuel economy for a vehicle. In this case, x could be your speed at steady state, such as driving down the highway. If you want to make it more accurate, you need to include other factors, like temperature, your acceleration habits, the time between accelerations, how much weight you have in the vehicle, and the time between oil changes. You get the idea. This would look like a*speed + b*temp + c*acc + d*freqAcc + e*weight + f*oil + g = y, where each letter in front of a factor is that factor's weight and g is the constant term.
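To make the line concrete, here is a minimal sketch of fitting the single-factor version, ax + c = y, with ordinary least squares. The speed and fuel economy numbers are invented purely for illustration, and the names fit_line and predict_mpg are mine, not from any library. Notice that even here, someone chose speed as the only factor.

```python
# Hypothetical data, invented for illustration only:
# steady-state speed (mph) and measured fuel economy (mpg).
speeds = [45.0, 50.0, 55.0, 60.0, 65.0, 70.0]
mpg = [38.0, 36.0, 34.0, 32.0, 30.0, 28.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + c for a single factor x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope: change in y per unit of x
    c = mean_y - a * mean_x    # intercept: the constant term
    return a, c


a, c = fit_line(speeds, mpg)


def predict_mpg(speed):
    """Estimate fuel economy from speed alone, using the fitted line."""
    return a * speed + c
```

With this made-up data the fit comes out to a slope of about -0.4 and a constant of about 56, so predict_mpg(60) lands at roughly 32. Adding temperature, weight, or anything else is the same idea with more columns, and each column is another decision.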
Each factor that you include is a choice made by the person building the algorithm. In fact, there are tools, like least squares fitting paired with significance tests, that help you identify which factors are actually significant (important) to properly estimating y. Furthermore, how you gather the data in your underlying set is also a choice. This is called sampling. There are statistical tools that help a person get a sample closer to matching the actual population, but given how large the population is, it’s unreasonable to collect data on all of it. So you’ll necessarily be estimating for a large portion of the people your model is applied to.
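Here is a small sketch of why the sampling choice matters. The population, the group labels "A" and "B", and the values are all hypothetical, and the two sampling functions are simplified stand-ins for real survey methods (they take the first records rather than drawing randomly, to keep the example deterministic).

```python
from statistics import mean

# Hypothetical population, invented for illustration:
# 700 people in group "A" with value 50, 300 in group "B" with value 80.
population = [("A", 50.0)] * 700 + [("B", 80.0)] * 300


def convenience_sample(records, n):
    """Take whatever is easiest to reach: here, simply the first n records."""
    return records[:n]


def stratified_sample(records, n):
    """Take from each group in proportion to its share of the population."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append((group, value))
    sample = []
    for members in groups.values():
        share = round(n * len(members) / len(records))
        sample.extend(members[:share])
    return sample


def mean_value(records):
    """Average of the value column for a list of (group, value) records."""
    return mean(value for _, value in records)


true_mean = mean_value(population)
easy_mean = mean_value(convenience_sample(population, 100))
fair_mean = mean_value(stratified_sample(population, 100))
```

In this toy setup the convenience sample only ever sees group "A", so its estimate is off, while the proportional sample matches the population. Which one you use is, again, a decision somebody makes.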
So, if your goal is to calculate interest on a loan using a regression model (algorithm), you have to pick which factors to include in your regression model. You have to pick how you sample the population. You have to pick the values you’re trying to get. You have to decide whether you think something could be a proxy for race. You have to decide if you are going to do something about that.
Any bank will know all the factors that can act as proxies for race: zip code, high school, university, names, etc. In this case, they made the choice to ignore how these factors could be skewing the credit score and therefore the interest rate.
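Checking for a proxy does not take exotic tooling. Below is a rough sketch, under heavy assumptions: the applicant records, the school labels, and the protected-group flag are all invented, and a real audit would use far more data and proper statistical tests. It only shows the two questions anyone could ask: does a factor line up with a protected group, and do the resulting rates differ by group?

```python
# Hypothetical applicants, invented for illustration:
# (school, in_protected_group, offered_rate_percent)
applicants = [
    ("school_1", True, 7.5),
    ("school_1", True, 7.0),
    ("school_1", False, 7.2),
    ("school_2", False, 5.0),
    ("school_2", False, 5.5),
    ("school_2", True, 5.2),
]


def group_share_by_feature(rows):
    """For each feature value, the fraction of applicants in the protected group."""
    counts = {}
    for feature, in_group, _rate in rows:
        total, members = counts.get(feature, (0, 0))
        counts[feature] = (total + 1, members + (1 if in_group else 0))
    return {f: members / total for f, (total, members) in counts.items()}


def mean_rate_by_group(rows):
    """Average offered rate for applicants inside and outside the protected group."""
    sums = {}
    for _feature, in_group, rate in rows:
        total, count = sums.get(in_group, (0.0, 0))
        sums[in_group] = (total + rate, count + 1)
    return {g: total / count for g, (total, count) in sums.items()}


shares = group_share_by_feature(applicants)
rates = mean_rate_by_group(applicants)
```

If the shares differ a lot between schools, the school is carrying information about group membership, and if the average rates differ by group, the model's output is skewed along that line. Running a check like this, or skipping it, is a choice.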
Anyone telling you that they had no way of knowing an algorithm is racist is lying to you, thinks you’re stupid, or shouldn’t be trusted with your money. Probably all three. Reporters need to be more critical about accepting “iT wAs ThE AlGoRiThM” from any company.