Learning from Social Data Processing

May 15, 2017


Online shopping is a world of behavior tracking. If you buy a shirt from one of your regular sources such as Amazon, the supplier knows a lot about what to show you—not just shirt styles that are popular this minute but what you’ve bought in the past and what other buyers with some of your characteristics are buying right now.

In a brick-and-mortar store, however, seller decisions are trickier, says Devavrat Shah, MIT professor of electrical engineering and computer science. When a shirt goes out on a retail hanger, that’s probably the first time it’s been on sale, “so how does the retailer decide what shirt to put out?” he asks.

To complicate matters, customer interests constantly change, a trend accelerated by social media. “All of this makes prediction of effective customer demand really hard,” Shah points out.

Fortunately, these predictions can be sharpened by looking at how we comparison shop, both online and off. “Fundamentally we are very good at expressing our preferences through comparisons, rather than product star ratings,” says Shah, who studies how best to extract information from the comparison data now available in social and related data sources.
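To make the idea of learning from comparisons concrete, here is a minimal sketch in Python. It uses the textbook Bradley-Terry model, which is an illustrative stand-in and not necessarily the method Shah or Celect uses. The shirt names and the comparison counts are invented for the example; each pair (winner, loser) records one shopper preferring the first shirt over the second.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200):
    """Estimate item strengths from (winner, loser) pairs.

    Uses the standard minorization-maximization update for the
    Bradley-Terry model. Strengths are normalized to sum to 1.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)      # total wins per item
    meetings = defaultdict(int)  # how often each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                n_ij = meetings.get(frozenset((i, j)), 0)
                if j != i and n_ij > 0:
                    denom += n_ij / (strength[i] + strength[j])
            # Items that were never compared keep their current strength.
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {item: s / total for item, s in updated.items()}
    return strength


# Hypothetical data: every pair of shirts was compared four times.
comparisons = (
    [("blue", "striped")] * 3 + [("striped", "blue")]
    + [("blue", "plaid")] * 3 + [("plaid", "blue")]
    + [("striped", "plaid")] * 3 + [("plaid", "striped")]
)
strength = bradley_terry(comparisons)
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking)
```

The output is a ranking of shirts from most to least preferred, recovered purely from head-to-head choices with no star ratings involved.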

In the case of retail stores, sellers can make better product allocation decisions by seeing how buyers choose between products on their store floors—not just what gets bought, but what gets bought when surrounded by which competing products.

Shah and Vivek Farias, associate professor of management at the MIT Sloan School, co-founded the Boston-based startup Celect to commercialize decision-making tools for predictive analytics in retail. In one comparison test, a set of stores where Celect’s algorithms were used to allocate products saw a net 7% increase in revenue compared to control stores.

Celect is one spinoff from Shah’s broad spectrum of research at the intersection of social sciences, statistics and machine learning. “You need the right model coming from social sciences trying to capture people’s behavior in a meaningful manner,” he says. “We think about people’s choices in a heterogeneous environment, build a model for their choices, and deploy it at scale.”

Storing up product selections

Consider a giant retailer that might have tens or hundreds of millions of customers and a choice of products on the same massive scale. “You’re trying to match customers to products,” Shah says. “Which products should you put in your New York store versus your Boston store? That’s a difficult problem, because you can put only a very, very small fraction of the products in any given store. You also want to change the allocations of products over time, and you want to price them correctly.”

Retailers can improve these decisions by examining their customers’ choices in context. “If you show a customer products A, B and C, they might end up buying product A,” he says. “Or if you show them products B and C, they might end up buying product B. What you show collectively at each store location, at what time and at what price, determines the effective demand you will see.”
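A small sketch can show how the set of products on display shapes demand. The example below uses a multinomial logit choice model, a common textbook choice model chosen here for illustration only; the article does not say which model Celect uses. The utility values for products A, B and C and the no-purchase option are assumptions made up for the example.

```python
import math


def choice_probabilities(assortment, utility, no_purchase_utility=0.0):
    """Multinomial logit: probability that a shopper picks each displayed
    product, or walks away without buying anything."""
    weights = {p: math.exp(utility[p]) for p in assortment}
    no_purchase_weight = math.exp(no_purchase_utility)
    denom = no_purchase_weight + sum(weights.values())
    probs = {p: w / denom for p, w in weights.items()}
    probs["no purchase"] = no_purchase_weight / denom
    return probs


# Hypothetical utilities: A is the most appealing product, C the least.
utility = {"A": 1.5, "B": 1.0, "C": 0.2}

with_a = choice_probabilities(["A", "B", "C"], utility)
without_a = choice_probabilities(["B", "C"], utility)

for label, probs in [("A, B, C shown", with_a), ("B, C shown", without_a)]:
    formatted = {p: round(v, 3) for p, v in probs.items()}
    print(label, formatted)
```

Under these assumed utilities, A is the most likely purchase when all three products are shown, and B becomes the most likely purchase once A is removed. The demand a store observes for B therefore depends on what sits next to it.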

Celect begins its analysis by gathering the company’s point-of-sale data, inventory snapshots of products out for sale, and online customer activity data. Combining these three sets of data with the catalog information gives a good starting point. Sophisticated retailers might draw on many other sources of data, such as social data feeds that can highlight customer and product trends, or weather information that might help to explain consumption of beverages or air conditioners.

The startup firm’s clients include Urban Outfitters, Bon-Ton Stores and a number of other retail chains, as well as other players in the retail supply chain.

“You can think of retail as a much larger system where on one end the manufacturers are producing the products, on the other end the consumers are consuming them, and retailers are just sitting in the middle,” Shah points out. “At each stage of this supply chain, making accurate demand predictions can help in bringing a massive operational efficiency.” In one test case, a distributor using Celect tools to suggest products to retail customers saw an 11% increase in sales.

Challenging choices

Shah’s academic research takes on many puzzles at the interface of social science and data processing. “We can partner with any company that has questions about a combination of interactions with their end customers and how to do data processing at scale,” he says.

Many challenges come from online social marketplaces, where it may be very difficult to match up buyers and sellers efficiently. For instance, online dating sites such as Tinder struggle with the many participants who try to meet only those they see as extremely attractive.

Other large challenges loom in polling and predicting. “As recent poll predictions across the world have showed, we are terrible at predicting outcome results,” he says. “One issue of course is how we are collecting the data. But more importantly, we've got the models wrong.”

Another huge set of opportunities comes from smart transportation, smart energy and other infrastructure initiatives built on sensors, cell phones and the emerging Internet of Things.

“Everybody’s talking about smart transportation and automated energy grids,” says Shah. “But we as engineers can’t just design these systems without understanding that the end users are people and people do act idiosyncratically. We need to build models thinking carefully about their choices and then engineering systems around them—rather than first engineering the systems and then making people behave the way the systems expect them to behave.”

Experts in computer science, statistics and electrical engineering need to work with social scientists in examining and improving all these systems, says Shah, who is also director of the Center for Statistics and Data Science within the MIT Institute for Data, Systems and Society.

“The key is understanding the right social behavioral model,” he says. “Once we understand the right model, we can describe it mathematically, which in turn leads to the right collection of data, and the right data processing algorithms.”