Bring Your Own Bias

I enjoy making beer. Generally, I’ll try to make a batch for special occasions or just because it’s mid-January. Beer is easy and cheap to make and, therefore, ubiquitous. However, there are nearly limitless possibilities for beer recipes, so people’s tastes for it are extremely subjective. Unless you’ve brewed a batch that is described as “chewy,” it’s not objectively bad.

What’s the connection between beer and big data? Let’s say we can generate labeled data cheaply, and we want to separate our data points into “good” and “bad.” It’s a simple process of finding the features that distinguish the two groups to the greatest degree possible. This is a no-brainer with beer; there are many people willing to show up and give detailed opinions on it. Then, you can make the beer with the broadest appeal. This happens with many highly subjective things. In Ian Ayres’s fantastic book, Super Crunchers, he even hints that there may be an algorithm for top-grossing movies. The book was published in 2007, and the first Marvel Studios movie, Iron Man, came out in 2008. Apparently, the presence of on-screen snow can increase the draw for some odd reason.

We can translate this into a business solution by never underestimating A/B testing and customer feedback. It may be impossible to please everyone, but there is a mathematical route to pleasing as many as possible. If you are coming out with a product, consider any current products you have that are easy to produce. Historical examples can also help you identify essential features for the new product and crunch the right numbers, thus ensuring a successful launch. Everyone has his or her own unique biases and tastes—but think of how many people have a taste for some type of beer!
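For the curious, here is one way the "mathematical route" behind an A/B test might look. This is only a minimal sketch using a standard two-proportion z-test; the function name and the tasting numbers are made up for illustration, and a real test would also need a sensible sample size and a fair way of assigning tasters to recipes.

```python
from math import erf, sqrt


def two_proportion_z_test(liked_a, tasters_a, liked_b, tasters_b):
    """Compare the share of people who liked recipe A vs. recipe B.

    Returns the z statistic and a two-sided p-value based on the
    normal approximation (reasonable only for fairly large samples).
    """
    rate_a = liked_a / tasters_a
    rate_b = liked_b / tasters_b
    # Pooled rate under the assumption that both recipes are equally liked.
    pooled = (liked_a + liked_b) / (tasters_a + tasters_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / tasters_a + 1 / tasters_b))
    if std_err == 0:
        raise ValueError("Everyone (or no one) liked both recipes; nothing to compare.")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical tasting night: 120 of 200 liked recipe A, 150 of 200 liked recipe B.
z, p_value = two_proportion_z_test(120, 200, 150, 200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in appeal is unlikely to be luck alone, which is the kind of evidence you want before committing to the recipe with the "broadest appeal."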

Big data is daunting and confusing, yet it is becoming more of a necessity to remain competitive in the marketplace. We here at Convergent Technologies specialize in sensible data solutions. Let us help your company or organization make an objective process out of what may seem like a hopelessly subjective one!

The Target

First, I just want to say how excited I am to be part of Convergent Technologies. My dear friend, Rick Gregson, asked me to join the company for the purpose of bringing my knowledge of data science to the table. Additionally, I was invited to contribute a weekly blog, so I will back up for a second and define its purpose.

My bio is available on the Convergent Technologies website if you’re curious about who I am and why I may be positioned to tackle these subjects. I’ve invested a great deal of time studying “big data.” With that said, there is a plethora of material on big data and, in particular, AI (artificial intelligence). Many news stories focus on the sensational, such as self-driving cars or the toppling of champions of the Chinese board game Go. The sheer volume can be daunting and intimidating. If you want to learn more about some of these topics, you can be inundated by math, programming, and statistics.

My overall purpose focuses on two basic questions. The first is, “How does data science impact or potentially improve my daily life?” For example: Why did I use a dating site only to end up on a terrible date, while my Netflix account always knows what I want to watch? Do I need to worry about automation taking over my job? Do we even need doctors anymore when a machine can diagnose our ailments? The second, and the one more pertinent to our consultancy goals, is, “Can data science help my business or help me make money?” For example: What problems can I solve with data science? Do I have enough data for that? I hired a consultant who just graduated college, and he’s already spent $9,000 of my money on AWS (Amazon Web Services); is that normal?

We will even tackle fun topics like using ML (machine learning) to get a sports betting advantage and maybe even delve into day trading stocks with algorithms. If I discuss specific algorithms in this blog, it will be at an intuitive level only, so let’s leave the math out and opt for comparative examples. However, if you feel like diving deep into the mathematics and science of anything I explore, please feel free to reach out to me. I am always up for that! Feedback is appreciated, and topic suggestions are welcome. Let’s dig in and see what data can do!

Can Data Really Lead to Love?

My wife is from Kenya. This has nothing to do with big data or the fun yet ineffective algorithms (spoiler alert) that I will get into later. This just means that she has not seen a lot of the same movies that I have, so we have been going on a bit of a romantic comedy binge of late. This is her favorite genre. This is not mine. My favorite genre (big surprise) is science fiction. Very rarely have these two genres ever met. However, I would challenge someone to re-edit “2001: A Space Odyssey” into a romantic comedy. HAL 9000 is an impressionable AI, fresh from the planet, trying to make it in the big space station. He thought he had it all figured out until one day. . .

Rick, Convergent Technologies’ indisputably tech-savvy senior consultant, and I have often discussed the concept of data and dating since he has actually tried a few online sites designed to streamline the search for a soulmate. I was lucky enough to find my wife working down the hall from me, but for those who haven’t had it that easy, the onslaught of machine learning algorithms raises the question: Can they help me find love?

There are a couple of great papers on this. The first, User Recommendations in Reciprocal and Bipartite Social Networks–An Online Dating Case Study, takes a Netflix approach in building a collaborative filtering technique. In other words: Bob likes Suzy. Bill is like Bob. Sally is like Suzy. Therefore, Bill will like Suzy (and probably Sally). The authors incorporate an attractiveness metric in there for good measure, so that it is not just like buying groceries. The second is Online Dating: A Critical Analysis from the Perspective of Psychological Science, which digs into the psychology side of things. Its conclusion is that online dating offers great access to a tremendous number of people, but metrics are a poor substitute for experience . . .
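If you want to see the "Bill is like Bob" logic in action, here is a toy sketch of user-based collaborative filtering. To be clear, this is not the method from the paper: it is one-sided (it ignores whether the other person would like you back) and has no attractiveness metric. The names, the extra person "Ann," and the data are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: who has "liked" whom (1 = liked).
likes = {
    "Bob": {"Suzy": 1, "Ann": 1},
    "Bill": {"Ann": 1},
    "Carl": {"Sally": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse like-vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, likes):
    """Rank people the user has not liked yet, weighting each
    other user's likes by how similar that user is to this one."""
    scores = {}
    for other, their_likes in likes.items():
        if other == user:
            continue
        similarity = cosine(likes[user], their_likes)
        for candidate, value in their_likes.items():
            if candidate not in likes[user]:
                scores[candidate] = scores.get(candidate, 0.0) + similarity * value
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Bill shares a like with Bob, so Bob's other like (Suzy) ranks first for Bill.
print(recommend("Bill", likes))
```

Bill and Bob overlap on one like, so Bob's tastes carry weight for Bill; Carl shares nothing with Bill, so his like for Sally contributes a score of zero. It really is a lot like buying groceries.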

With all of our technical expertise and the sophistication of modern systems, maybe it would still be better to invite someone over for a good romcom than wait to hear that “you’ve got mail.”

Competitive Advantage

I was asked to have a cup of coffee by a fellow data scientist in the valley. He had a particular motive. He wanted to know if there was a way to make a profit in daily fantasy sports betting. He had done well betting some games in the NBA, but when he tried baseball, he was getting his clock cleaned. He wanted to know why, so I asked about his approach and methods. He was closed off to sharing, which, in the era of open source, was a bit of a red flag, but I pressed on.

I asked, “What exactly are you trying to do?” His response was that he wanted to have better projections than everyone. I asked him why he wanted to do that. He stared back blankly at me in the same way my four-month-old son does when I ask him why he keeps trying to put everything in his mouth.

I expounded on it. Statisticians love love love baseball. They love it so much that the love manifested itself in a Brad Pitt movie. If bioinformatics did this, we would get a feature film starring Tommy Wiseau. Therefore, it would be extremely difficult to outproject the leading sites.

So how do you gain a competitive advantage when everyone is already using the state of the art? First, make sure everyone is using the state of the art. Rather than building a better mousetrap, maybe find where the mice are. In this particular example, I asked if they tracked the users’ stats. He said they did. I told him to follow the 80/20 rule and go after the weaker players. If that did not work, then change to a different sport.

The takeaway is that data science is much like gambling in general; a lot of progress is made in the strategy and not the implementation. No one group will compete against Google, but luckily Google is not ubiquitous . . . yet. Pick the battles that no one else is fighting.