The Long Read: Employers are turning to mathematically modelled ways of sifting through job applications. Even when wrong, their verdicts seem beyond dispute and they tend to punish the poor
A few years ago, a young man named Kyle Behm took a leave from his studies at Vanderbilt University in Nashville, Tennessee. He was suffering from bipolar disorder and needed time to get treatment. A year and a half later, Kyle was healthy enough to return to his studies at a different university. Around that time, he learned from a friend about a part-time job. It was just a minimum-wage job at a Kroger supermarket, but it seemed like a sure thing. His friend, who was leaving the job, could vouch for him. For a high-achieving student like Kyle, the application looked like a formality.
But Kyle didn't get called in for an interview. When he inquired, his friend explained to him that he had been red-lighted by the personality test he'd taken when he applied for the job. The test was part of an employee selection program developed by Kronos, a workforce management company based outside Boston. When Kyle told his father, Roland, an attorney, what had happened, his father asked him what kind of questions had appeared on the test. Kyle said that they were very much like the "five factor model" test, which he'd been given at the hospital. That test grades people for extraversion, agreeableness, conscientiousness, neuroticism, and openness to ideas.
At first, losing one minimum-wage job because of a questionable test didn't seem like such a big deal. Roland Behm urged his son to apply elsewhere. But Kyle came back each time with the same news. The companies he was applying to were all using the same test, and he wasn't getting offers.
Roland Behm was bewildered. Questions about mental health appeared to be blackballing his son from the job market. He decided to look into it and soon learned that the use of personality tests for hiring was indeed widespread among large corporations. And yet he found very few legal challenges to this practice. As he explained to me, people who apply for a job and are red-lighted rarely learn that they were rejected because of their test results. Even when they do, they're not likely to contact a lawyer.
Behm went on to send notices to seven companies, including Home Depot and Walgreens, informing them of his intent to file a class-action suit alleging that the use of the exam during the job application process was unlawful. The suit, as I write this, is still pending. Arguments are likely to focus on whether the Kronos test can be considered a medical exam, the use of which in hiring is illegal under the Americans with Disabilities Act of 1990. If this turns out to be the case, the court will have to determine whether the hiring companies themselves are responsible for running afoul of the ADA, or if Kronos is.
But the questions raised by this case go far beyond which particular company may or may not be responsible. Automatic systems based on complicated mathematical formulas, such as the one used to sift through Behm's job application, are becoming more common across the developed world. And given their scale and importance, combined with their secrecy, these algorithms have the potential to create an underclass of people who will find themselves increasingly and inexplicably shut out from normal life.
It didn't have to be this way. After the financial crash, it became clear that the housing crisis and the collapse of major financial institutions had been aided and abetted by mathematicians wielding magic formulas. If we had been clear-headed, we would have taken a step back at this point to figure out how we could prevent a similar catastrophe in the future. But instead, in the wake of the crisis, new mathematical techniques were hotter than ever, and expanding into still more domains. They churned 24/7 through petabytes of information, much of it scraped from social media or e-commerce websites. And increasingly they focused not on the movements of global financial markets but on human beings, on us. Mathematicians and statisticians were studying our desires, movements, and spending patterns. They were predicting our trustworthiness and calculating our potential as students, workers, lovers, criminals.
This was the big data economy, and it promised spectacular gains. A computer program could speed through thousands of résumés or loan applications in a second or two and sort them into neat lists, with the most promising candidates on top. This not only saved time but also was marketed as fair and objective. After all, it didn't involve prejudiced humans digging through reams of paper, just machines processing cold numbers. By 2010 or so, mathematics was asserting itself as never before in human affairs, and the public largely welcomed it.
Most of these algorithmic applications were created with good intentions. The goal was to replace subjective judgments with objective measurements in any number of fields, whether it was a way to locate the worst-performing teachers in a school or to estimate the chances that a prisoner would return to jail.
These algorithmic solutions are targeted at genuine problems. School principals cannot be relied upon to consistently flag problematic teachers, because those teachers are also often their friends. And judges are only human, and being human they have prejudices that prevent them from being entirely fair (their rulings have been shown to be harsher right before lunch, when they're hungry, for example), so it's a worthy goal to increase consistency, especially if you can rest assured that the newer system is also scientifically sound.
The difficulty is that last part. Few of the algorithms and scoring systems have been vetted with scientific rigour, and there are good reasons to suspect they wouldn't pass such tests. For instance, automated teacher assessments can vary widely from year to year, putting their accuracy in question. Tim Clifford, a New York City middle school English teacher of 26 years, got a 6 out of 100 in one year and a 96 the next, without changing his teaching style. Of course, if the scores didn't matter, that would be one thing, but sometimes the consequences are dire, leading to teachers being fired.
There are also reasons to worry about scoring criminal defendants rather than relying on a judges discretion. Consider the data pouring into the algorithms. In part, it comes from police interactions with the populace, which is known to be uneven, often race-based. The other kind of input, usually a questionnaire, is also troublesome. Some of them even ask defendants if their families have a history of being in trouble with the law, which would be unconstitutional if asked in open court but gets embedded in the defendants score and labelled objective.
It doesnt stop there. Algorithms are being used to determine how much we pay for insurance (more if your credit score is low, even if your driving record is clean), or what the terms of our loans will be, or what kind of political messaging well receive. There are algorithms that find out the weather forecast and only then decide on the work schedule of thousands of people, laying waste to their ability to plan for childcare and schooling, never mind a second job.
Their popularity relies on the notion that they are objective, but the algorithms that power the data economy are based on choices made by fallible human beings. And, while some of them were made with good intentions, the algorithms encode human prejudice, misunderstanding, and bias into automatic systems that increasingly manage our lives. Like gods, these mathematical models are opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, are beyond dispute or appeal. And they tend to punish the poor and the oppressed in our society, while making the rich richer. That's what Kyle Behm learned the hard way.
Finding work used to be largely a question of whom you knew. In fact, Kyle Behm was following the traditional route when he applied for work at Kroger. His friend had alerted him to the opening and put in a good word. For decades, that was how people got a foot in the door, whether at grocers, banks, or law firms. Candidates then usually faced an interview, where a manager would try to get a feel for them. All too often this translated into a single basic judgment: is this person like me (or others I get along with)? The result was a lack of opportunity for job seekers without a friend inside, especially if they came from a different race, ethnic group, or religion. Women also found themselves excluded by this insider game.
Companies like Kronos brought science into corporate human resources in part to make the process fairer. Founded in the 1970s by MIT graduates, Kronos's first product was a new kind of punch clock, one equipped with a microprocessor, which added up employees' hours and reported them automatically. This may sound banal, but it was the beginning of the electronic push, now blazing along at warp speed, to track and optimise a workforce.
As Kronos grew, it developed a broad range of software tools for workforce management, including a software program, Workforce Ready HR, that promised to eliminate the guesswork in hiring. According to its web page, Kronos can help you "screen, hire, and onboard candidates most likely to be productive: the best-fit employees who will perform better and stay on the job longer".
Kronos is part of a growing industry. The hiring business is becoming automated, and many of the new programs include personality tests like the one Kyle Behm took. Personality testing is now a $500 million annual business, growing by 10 to 15% a year, according to Hogan Assessment Systems Inc, a company that develops online personality tests. Such tests are now used on 60 to 70% of prospective workers in the US, and in the UK, according to the Association of Graduate Recruiters, 71% of employers use some form of psychometric test for recruitment.
Even putting aside the issues of fairness and legality, research suggests that personality tests are poor predictors of job performance. Frank Schmidt, a business professor at the University of Iowa, analysed a century of workplace productivity data to measure the predictive value of various selection processes. Personality tests ranked low on the scale: they were only one-third as predictive as cognitive exams, and also far below reference checks. "The primary purpose of the test," said Roland Behm, "is not to find the best employee. It's to exclude as many people as possible as cheaply as possible."
You might think that personality tests would be easy to game. If you go online to take a five factor personality test, it looks like a cinch. One question asks: "Have frequent mood swings?" It would probably be smart to answer "very inaccurate". Another asks: "Get mad easily?" Again, check no.
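To see why the answers translate so directly into a score, consider how five-factor questionnaires are conventionally tallied: each item loads on one trait, responses sit on a 1-5 Likert scale from "very inaccurate" to "very accurate", and some items are reverse-keyed so the scale is flipped before summing. The sketch below is purely illustrative; the items, keying, and totals are assumptions about a generic five-factor inventory, not Kronos's actual test.

```python
# Illustrative sketch of conventional five-factor (Likert) scoring.
# Items, trait loadings, and keying here are invented for illustration;
# they are not Kronos's actual test or its screening criteria.

# Each item: (prompt, trait it loads on, whether it is reverse-keyed).
ITEMS = [
    ("Have frequent mood swings?", "neuroticism", False),
    ("Get mad easily?", "neuroticism", False),
    ("Am relaxed most of the time.", "neuroticism", True),   # reverse-keyed
    ("Make friends easily.", "extraversion", False),
]

def score(responses):
    """Responses are 1-5 Likert values (1 = 'very inaccurate',
    5 = 'very accurate'). Reverse-keyed items are flipped (6 - r)
    before being summed into per-trait totals."""
    totals = {}
    for (_, trait, reverse), r in zip(ITEMS, responses):
        value = 6 - r if reverse else r
        totals[trait] = totals.get(trait, 0) + value
    return totals

# An applicant answering "very inaccurate" (1) to the mood-swing items
# and "very accurate" (5) elsewhere keeps the neuroticism total low.
print(score([1, 1, 5, 5]))  # {'neuroticism': 3, 'extraversion': 5}
```

This is exactly why the tests look gameable: a savvy applicant who works out which trait an item loads on, and which direction it is keyed, can steer each trait total at will.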
In fact, companies can get in trouble for screening out applicants on the basis of such questions. Regulators in Rhode Island found that CVS Pharmacy was illegally screening out applicants with mental illnesses when a personality test required respondents to agree or disagree with such statements as "People do a lot of things that make you angry" and "There's no use having close friends; they always let you down".