Applied machine learning techniques can identify tweeters’ behaviors in real time and locate them to within 100 meters, according to a paper that University of Rochester computer scientists published earlier this month.
The research team focused on discovering patterns of alcohol use in urban and suburban settings to better understand where and how people drink. Drinking results in about 75,000 deaths annually in the United States, so such information could support public health efforts.
“The more interesting information that could be derived from this study is not the location tracking, but the personality links between those that post frequently and their drinking habits,” said principal analyst Jim McGregor.
“The potential information that can be derived is almost endless, depending on who’s looking at the data and what they’re looking for,” he told TechNewsWorld.
The technique “could be a useful tool for analyzing data for direct marketing, traffic patterns, … or it could be used by police to track potential drunk drivers, by criminals to track when people are away from their home, or by insurance companies to track the unhealthy habits of clients,” McGregor added.
How the Researchers Did It
The team collected geotagged tweets from urban, suburban and rural areas of New York state from July 2013 to July 2014 and preprocessed them to make them easier to analyze. It then used Amazon’s Mechanical Turk to build a training set recording whether each tweet mentioned drinking alcohol, whether it said the user was drinking, and whether the user was drinking while tweeting.
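The three annotation questions are nested: a tweet saying the user is drinking while tweeting also says the user is drinking, which in turn means it mentions alcohol. The sketch below shows one way a crowd-labeled tweet could be stored and checked for that consistency. The `TweetLabel` class, its field names and the sample tweets are hypothetical and do not come from the researchers’ code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TweetLabel:
    """Hypothetical record for one crowd-labeled tweet."""
    text: str
    mentions_alcohol: bool         # Q1: is the tweet about drinking alcohol?
    user_drinking: bool            # Q2: does it say the user is drinking?
    drinking_while_tweeting: bool  # Q3: is the user drinking while tweeting?

    def is_consistent(self) -> bool:
        # A "yes" to a finer question must imply "yes" to the broader one.
        if self.drinking_while_tweeting and not self.user_drinking:
            return False
        if self.user_drinking and not self.mentions_alcohol:
            return False
        return True


labels = [
    TweetLabel("beer o'clock with the crew", True, True, True),
    TweetLabel("new brewery opening downtown", True, False, False),
    TweetLabel("on my way to work", False, False, False),
    TweetLabel("so drunk right now", False, True, True),  # contradictory annotation
]

clean = [lab for lab in labels if lab.is_consistent()]
print(len(clean))  # 3
```

The article does not say how the team handled contradictory or disputed labels; the filter here is only an illustration of the nesting.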
This approach followed the concept of human-guided machine learning introduced in 2013.
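The researchers built a hierarchy of three support vector machine (SVM) classifiers to make increasingly fine distinctions: whether a tweet is about alcohol, whether the user is drinking, and whether the user is drinking while tweeting.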
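A hierarchy like this can be read as a cascade: each classifier sees only the tweets the level above it accepted, so the finer questions are asked only where they make sense. The sketch below shows that control flow with simple keyword functions standing in for the three trained SVMs. The keyword rules and the level names are hypothetical placeholders, not the study’s models.

```python
from typing import Callable, List, Tuple


# Hypothetical keyword stand-ins for the three trained SVMs, for illustration only.
def mentions_alcohol(text: str) -> bool:
    return any(w in text for w in ("beer", "wine", "vodka", "drunk"))


def user_drinking(text: str) -> bool:
    # Crude first-person check in place of the second-level SVM.
    return any(w in text.split() for w in ("i", "i'm", "my", "we"))


def drinking_now(text: str) -> bool:
    # Crude "happening now" check in place of the third-level SVM.
    return any(p in text for p in ("right now", "tonight", "currently"))


STAGES: List[Tuple[str, Callable[[str], bool]]] = [
    ("mentions_alcohol", mentions_alcohol),
    ("user_drinking", user_drinking),
    ("drinking_while_tweeting", drinking_now),
]


def classify(tweet: str) -> str:
    """Return the finest level of the hierarchy that the tweet reaches."""
    text = tweet.lower()
    reached = "not_alcohol"
    for name, stage in STAGES:
        if not stage(text):
            break  # a "no" here means the finer stages never see the tweet
        reached = name
    return reached


for t in ["Traffic on I-90 again",
          "Beer sale at the corner store",
          "We split a bottle of wine last week",
          "I'm having a beer right now"]:
    print(t, "->", classify(t))
```

Swapping each stand-in for a trained classifier keeps the same structure; only the three functions in `STAGES` change.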
They also inferred fine-grained home locations for active Twitter users (those with five or more geotagged tweets) by training an SVM classifier to predict which 100-by-100-meter grid cell each user lives in.
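Predicting a home location on a 100-by-100-meter grid turns the task into classification over cell IDs. The sketch below covers only the data preparation: snapping coordinates to roughly 100-meter cells with a simple equirectangular approximation and keeping active users with at least five geotagged tweets. The projection, the reference latitude and the sample coordinates are assumptions, since the article does not say how the grid was built. The most-frequent-cell function is a naive baseline for comparison, not the researchers’ SVM.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

CELL_METERS = 100.0         # cell size reported for the study
MIN_TWEETS = 5              # "active user" threshold reported for the study
METERS_PER_DEG = 111_320.0  # approx. meters per degree of latitude (assumption)
REF_LAT = 43.0              # assumed reference latitude for New York state

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float) -> Cell:
    """Snap a coordinate to a ~100 m x 100 m cell via an equirectangular projection."""
    y = lat * METERS_PER_DEG
    x = lon * METERS_PER_DEG * math.cos(math.radians(REF_LAT))
    return (math.floor(x / CELL_METERS), math.floor(y / CELL_METERS))


def active_users(tweets: List[Tuple[str, float, float]]) -> Dict[str, List[Cell]]:
    """Group (user, lat, lon) tweets by user; keep users with at least MIN_TWEETS."""
    cells: Dict[str, List[Cell]] = defaultdict(list)
    for user, lat, lon in tweets:
        cells[user].append(grid_cell(lat, lon))
    return {u: c for u, c in cells.items() if len(c) >= MIN_TWEETS}


def baseline_home(cells: List[Cell]) -> Cell:
    """Naive baseline (not the study's SVM): the cell the user tweets from most."""
    return Counter(cells).most_common(1)[0][0]


tweets = (
    [("alice", 43.1566, -77.6088)] * 4
    + [("alice", 43.1300, -77.6000)]
    + [("bob", 43.1600, -77.6100)] * 2
)
users = active_users(tweets)
print(sorted(users))  # ['alice']
print(baseline_home(users["alice"]) == grid_cell(43.1566, -77.6088))  # True
```

A learned model such as the study’s SVM would replace `baseline_home`, taking features of a user’s tweets and outputting one of these cell IDs.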
For each SVM, the team used 80 percent of the labeled data for training and the remaining 20 percent for testing, and it applied fivefold cross-validation.
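The evaluation protocol, an 80/20 train/test split plus fivefold cross-validation, can be sketched in plain Python as index bookkeeping. In scikit-learn, which the team used, `train_test_split` and `KFold` fill these roles. The article does not say whether cross-validation ran on the 80 percent training portion, as assumed here, or on all labeled data; the function names and the fixed seed are illustrative.

```python
import random
from typing import List, Tuple


def train_test_indices(n: int, test_frac: float = 0.2,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and hold out test_frac of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold(indices: List[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices into k folds; each fold serves once as the validation set."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, val))
    return splits


train, test = train_test_indices(1000)
print(len(train), len(test))  # 800 200
for fold_train, fold_val in k_fold(train):
    print(len(fold_train), len(fold_val))  # 640 160, five times
```

Keeping the 20 percent test set out of every fold means the final test score is measured on tweets that cross-validation never touched.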
The team wrote its own software in Python using the scikit-learn library, said Professor Henry Kautz, director of the Institute for Data Science at the University of Rochester.
Not Just for Nabbing Drunken Tweeters
If social media users post about their activities, the methodology can be used to determine what they are doing, when they are doing it, and, if their devices are location-enabled, where.
Analyzing and correlating people’s locations with the content they post has been done before, pointed out Mukul Krishna, a senior global director at Frost & Sullivan.
“The approach is general,” he told TechNewsWorld. “The new ideas are a way to distinguish tweets that are about a topic in general from those that are about the user doing that thing at that moment and guessing the user’s home location.”
The data can be used to build communities of people, then segment those communities’ members and serve up, through ad exchanges, very focused ads for that particular segment, Krishna said.
The approach could be used during elections. “If a significant number of people discussed a particular … candidate, conclusions consistent with what was discussed could be made,” suggested Rob Enderle, principal analyst at the Enderle Group.
It also could help in deciding where to locate a business. “Based on travel patterns of the sample, uses could [include] ideas where to build the next big bar,” he told TechNewsWorld.
Marketing and advertising efforts also could be further refined, as “you could likely even make accurate differentiations based on age and sex from the information supplied by Twitter,” Enderle surmised.
That might backfire, though, because “the more effective this becomes in terms of its use to target consumers, the more negative reaction and backlash we’ll see from the people who believe privacy is more important than effective marketing,” Larry Chiagouris, a professor of marketing at Pace University, told TechNewsWorld.
Or perhaps not. Sitecore found that consumers are willing to share personal information in order to have a more personalized customer experience when buying online.