Machine Learning – You Decide What Is Fair
If you are evaluating ‘over-the-counter’ machine learning (ML) or artificial intelligence (AI) solutions, how do you know the algorithms they use are fair? My advice: ask your vendor. In a perfect world, we assume all data is ‘clean,’ meaning it is accurate, up to date, and can be treated as the one source of truth. I know, that world is only pretend.
First, we have the clean algorithms that no one can fault. Next, we have the algorithms that are faulty by anyone’s definition and, hopefully, will be corrected. Then we hit our stumbling block: what’s fair to me may not be fair to you. In some cases, an algorithm is simply a reflection of today’s society, but you and I probably have varying levels of agreement about the world around us. Data can be biased because it is not diverse or representative, or because it has what data scientists call ‘unequal base rates.’
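To make ‘unequal base rates’ concrete, here is a minimal sketch (all scores and the threshold are hypothetical, invented for illustration): when two groups’ score distributions differ, one ‘neutral’ cutoff applied identically to both still produces very different approval rates.

```python
# Illustrative only: two groups whose underlying score distributions
# differ. The same threshold, applied to everyone, approves the
# groups at different rates.

def approval_rate(scores, threshold):
    """Fraction of applicants whose score clears the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical credit scores; group B's distribution sits lower overall.
group_a = [620, 680, 700, 720, 750, 780]
group_b = [580, 610, 640, 660, 690, 710]

THRESHOLD = 650  # one "neutral" cutoff for everyone

print(f"Group A approved: {approval_rate(group_a, THRESHOLD):.0%}")  # 83%
print(f"Group B approved: {approval_rate(group_b, THRESHOLD):.0%}")  # 50%
```

Nothing in the rule mentions either group, yet the outcomes diverge, which is exactly the fairness question the base-rate problem raises.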
Most of these applications aim to automate decision making, but is accuracy the primary goal? They are used in areas such as mortgage and loan approval, the correctional system, and routine evaluation of people. Here is an excellent article on machine bias in the correctional system.
For now, let’s assume a mortgage company is offering a $300,000 mortgage and requires that a qualified buyer have an income of at least $50,000. The algorithm would be straightforward. But there can be a problem here: optimizing for accuracy alone can still treat people differently by age, race, and gender, because those factors can correlate with income, and it may not deliver appropriate results.
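The rule described above fits in a few lines. This is a minimal sketch of that single-factor check; the applicant names and incomes are hypothetical.

```python
# The one-factor mortgage rule from the example: income alone decides
# whether a $300,000 mortgage is approved.

MORTGAGE_AMOUNT = 300_000
MIN_INCOME = 50_000

def approve(income: float) -> bool:
    """Qualify the buyer when income meets the $50,000 floor."""
    return income >= MIN_INCOME

# Hypothetical applicants.
applicants = [("Avery", 62_000), ("Blake", 48_500), ("Casey", 50_000)]
for name, income in applicants:
    print(name, "approved" if approve(income) else "declined")
```

The code never looks at age, race, or gender; the fairness concern is that income itself may not be evenly distributed across those groups, so even this ‘neutral’ rule can produce unequal outcomes.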
What is the solution? Right now, it’s tweaking the algorithm, which typically decreases accuracy and can therefore produce erroneous results. It’s up to the ‘human’ data scientists to alter the algorithm as they see fit, which introduces yet another source of bias.
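The accuracy cost of such a tweak can be seen in a toy example (all scores and repayment labels are hypothetical): loosening a cutoff so that more applicants qualify also admits applicants the original rule correctly declined.

```python
# Sketch of the fairness/accuracy trade-off: a loosened cutoff admits
# more applicants, including some who did not repay. Data is invented.

# (score, actually_repaid) pairs for a set of past applicants.
labeled = [(700, True), (680, True), (655, True),
           (640, False), (620, False), (600, False)]

def accuracy(threshold):
    """Treat 'score >= threshold' as a prediction of repayment."""
    correct = sum((score >= threshold) == repaid for score, repaid in labeled)
    return correct / len(labeled)

print(accuracy(650))  # original cutoff: 1.0 on this toy data
print(accuracy(610))  # loosened cutoff: drops to ~0.67
```

On this data the original threshold is perfectly accurate, and the ‘fairer’ threshold misclassifies two non-repayers, which is the trade-off the paragraph above describes.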
Coming back full circle, let’s take a look at data. Insight into the context within content is invaluable to a data scientist developing algorithms. Approximately 85 percent of data within an organization is unstructured, and less than 1 percent of it is used for analysis. Without knowing what is in your content repositories, understanding the factors that will impact the accuracy of an algorithm is difficult, if not impossible. It astounds me that anyone would write an algorithm without understanding what it is written about.
Why our software? It delivers insight into your structured and unstructured data. We generate multi-term metadata that represents a fact, idea, topic, subject, or concept, auto-classify it, and manage it within a hierarchical taxonomy. The taxonomy component was designed for ease of use by business professionals, so taxonomies can be deployed rapidly and altered easily. The main benefit is information transparency: visibility into all relevant structured and unstructured data, to improve decision making and expose the vagaries in designing algorithms. The added benefit for organizations is eliminating the shortfalls of enterprise search.
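To illustrate the general idea of auto-classifying text into a hierarchical taxonomy (this is a toy term-matching sketch, not how conceptClassifier actually works internally; the taxonomy and terms are invented):

```python
# Toy illustration of hierarchical auto-classification: each leaf node
# of the taxonomy carries terms, and a document is filed under every
# node whose terms appear in its text.

TAXONOMY = {
    "Finance": {
        "Lending": ["mortgage", "loan", "interest rate"],
        "Risk": ["default", "credit score"],
    },
    "Legal": {
        "Compliance": ["regulation", "audit"],
    },
}

def classify(text):
    """Return every (branch, node) whose terms appear in the text."""
    text = text.lower()
    hits = []
    for branch, nodes in TAXONOMY.items():
        for node, terms in nodes.items():
            if any(term in text for term in terms):
                hits.append((branch, node))
    return hits

print(classify("Mortgage applicants with a low credit score may default."))
# → [('Finance', 'Lending'), ('Finance', 'Risk')]
```

Even this crude sketch shows the payoff: once documents are filed under concepts rather than keywords, a data scientist can see what the repository actually contains before building an algorithm on top of it.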
Data scientists or not, our conceptClassifier platform puts text mining and analytics into the hands of business folks, easily and essentially out of the box, delivering actionable insight into structured and unstructured data. You’ll probably be surprised at what you find.
Our webinars also address the topics explored in our blogs. You can access all our webinar recordings and presentation slides at any time in the Recorded Webinars area of our website, under the Resources tab.