Andreas Weigend
Social Data Revolution
MS&E 237, Stanford University, Spring 2010


Class 19: Implicit vs. Explicit Data. Guest Speaker: Auren Hoffman, Rapleaf


Final Project Presentation: HappyMap Group (http://sites.google.com/site/stanfordsocial/)

Real-time, location-based sentiment analysis of social data (making something useful from implicit data at virtually no cost).
- Some findings:
  - The West Coast tends to be happier than the East Coast
  - Lunchtime is when people seem to be happiest
  - As the number of tweets goes up, happiness decreases
- Uses sentiment analysis to determine "happiness"
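
The notes do not say how HappyMap computed "happiness", so the following is only a minimal sketch of one common approach: count words from a small sentiment lexicon in each tweet and average the scores per region. The word lists, the function names, and the sample tweets are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a much larger word list.
HAPPY_WORDS = {"happy", "great", "love", "awesome"}
SAD_WORDS = {"sad", "terrible", "hate", "awful"}


def happiness_score(text):
    """Return (number of happy words) - (number of sad words) for one tweet."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    happy = sum(w in HAPPY_WORDS for w in words)
    sad = sum(w in SAD_WORDS for w in words)
    return happy - sad


def average_happiness_by_region(tweets):
    """tweets: iterable of (region, text) pairs. Returns {region: mean score}."""
    totals = defaultdict(lambda: [0, 0])  # region -> [score sum, tweet count]
    for region, text in tweets:
        totals[region][0] += happiness_score(text)
        totals[region][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


# Made-up sample data, not from the HappyMap project.
sample_tweets = [
    ("west", "I love this awesome lunch!"),
    ("west", "Traffic is terrible."),
    ("east", "I hate Mondays, so sad."),
]
scores = average_happiness_by_region(sample_tweets)
```

The same grouping could be keyed on hour of day instead of region to look at a pattern like the lunchtime finding.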


Implicit Vs. Explicit Data


Auren Hoffman (founder and CEO of Rapleaf) started off his talk like this: "Raise your hand if you use Yelp." Then he asked people to describe what Yelp is all about. The first two answers included the words "community" and "reviews". Hoffman pointed out that these words describe an explicit data set: users are engaged with the website, and they actively create and change the information you find on Yelp.

Below are some of his key points comparing and contrasting implicit and explicit data.

Implicit Data


  • No user-generated data. Find already existing data and turn it into an explicit product.
    • Learn from past behavior.
  • An engineering challenge.
    • Crawling of data
    • Algorithms to make sense of the data
    • Machine learning
  • Examples: Google, RottenTomatoes, Rapleaf.

Explicit Data

  • The user explicitly says what he or she thinks or wants.
    • e.g. written recommendations
  • A marketing challenge.
    • Not necessarily a technology challenge. You only need to figure out how to get the mechanics right.
      • A/B testing
      • Figure out incentives and how to motivate users
  • Examples: Facebook, Twitter, Yelp.
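
The talk only names A/B testing as one of the mechanics to get right. As a rough illustration of what such a test involves, here is a sketch of a standard two-proportion z-test comparing the conversion rates of two page variants. The function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    conv_x: number of users who converted; n_x: number of users shown the variant.
    Returns (z statistic, two-sided p-value) under the pooled-variance z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: variant B gets 100 reviews written out of 1000 visitors,
# variant A gets 50 out of 1000.
z, p_value = two_proportion_z(50, 1000, 100, 1000)
```

A small p-value suggests the difference between the variants is unlikely to be chance alone. It says nothing about why users responded, which is where the incentive design comes in.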



When to use implicit vs. explicit data?

  • Sometimes it is immediately obvious.
    • Imagine Google collecting explicit data: it would have to ask users to write a review for each URL. That is hard to do, and it is even harder to process all that data and make sense of it.
  • Depends on your team:
    • If your team is made up of brilliant engineers and algorithm specialists --> use implicit collection and analysis of data.
    • If the culture is business/marketing, with few people with a strong technical background --> use explicit data.
  • Marry the two approaches.
    • e.g. Amazon:
      • Explicit customer reviews of products
      • Implicit collection of data (from crawling the web: price, number of pages, etc.)

Case Study: Yelp


What is it?
  • A tool based on user reviews
  • Rates anything that's a service to create recommendations for people.
  • The only survivor among 50+ similar companies that initially raised millions.

What makes it valuable?
  • They foster a reliable community of people
    • We trust Yelp because we trust the Yelp community
  • They understand the game mechanics.
    • Star-rating, comment.
    • Establish credibility of reviewer based on quality of review
    • Get a "cool" star if you're the first reviewer for a particular service; it can never be taken away. This motivates users.
    • Yelp parties
Student Concern: A user has a higher incentive to write a review if she has had a bad experience --> mainly negative comments. This leads companies to monitor their Yelp reviews (controversies).

Challenge: How to build an implicit version of Yelp (without explicit data from reviewers)

  • Measure tips
    • Need credit card info (partner with Visa, MasterCard. Blippy??)
    • Premise: The more customers tip, the better the service.
  • Tweets
    • What people are saying about a particular service
  • Busyness of restaurant
    • Needs to be busy enough, but not too busy.
  • Can crawl FourSquare. Can crawl Yelp. Get their data.
  • Traffic detectors
    • Can track MAC addresses of cellphones
    • Like Google Analytics for places. See whether people are returning, etc.
  • Quantcast: website traffic
  • Government data:
    • When did the company open?
    • Lawsuits? Health violations?
    • Liquor license (Can it serve only wine? Maybe only mixed drinks?)
    • Does the owner of this restaurant own other restaurants? (They usually do.)
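
To make the idea concrete, here is a rough sketch of how a few of the implicit signals above (tips, busyness, returning visitors) might be combined into a star rating. Nothing here comes from the talk beyond the signals themselves: the weights, the caps, the ideal occupancy, and every function name are assumptions picked for illustration.

```python
def tip_score(avg_tip_fraction, cap=0.25):
    """Map the average tip (as a fraction of the bill) to 0..1, saturating at `cap`."""
    return min(max(avg_tip_fraction, 0.0), cap) / cap


def busyness_score(occupancy, ideal=0.75):
    """Peak at `ideal` occupancy: busy enough, but not too busy.

    occupancy is the fraction of seats filled (0..1). The score falls off
    linearly on both sides of the ideal and never goes below 0.
    """
    return max(0.0, 1.0 - abs(occupancy - ideal) / ideal)


def implicit_rating(avg_tip_fraction, occupancy, returning_fraction,
                    weights=(0.4, 0.3, 0.3)):
    """Weighted mix of three implicit signals, rescaled to a 1..5 star rating."""
    signals = (
        tip_score(avg_tip_fraction),
        busyness_score(occupancy),
        returning_fraction,  # share of visitors seen before, already 0..1
    )
    combined = sum(w * s for w, s in zip(weights, signals))
    return 1.0 + 4.0 * combined


# Made-up restaurant: 18% average tip, 60% full, 40% returning visitors.
rating = implicit_rating(0.18, 0.60, 0.40)
```

In practice the weights would have to be learned, for example by fitting against existing explicit ratings, rather than set by hand.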

Fundamental Problem: Matching data from different sources

Once we have the data, how do we know when people are talking about the same thing?

  • Phone number, address (harder to match)
  • Good key on people: e-mail address.
    • Great key to aggregate info: even though some people have more than one e-mail address, the vast majority of e-mail addresses are used by only one person.
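
As a minimal sketch of using the e-mail address as a join key, the snippet below normalizes addresses and merges records from several sources into one profile per address. The record layout and function names are hypothetical. Lowercasing the whole address is an assumption that holds for most providers in practice, although the mail standards technically allow a case-sensitive local part.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def merge_by_email(*sources):
    """Merge lists of record dicts into {normalized email: combined profile}.

    When two sources disagree on a field, the value seen first is kept.
    """
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_email(record["email"])
            profile = merged.setdefault(key, {})
            for field, value in record.items():
                if field != "email":
                    profile.setdefault(field, value)
    return merged


# Made-up records from two hypothetical sources.
crm = [{"email": "Ann@Example.com ", "city": "Palo Alto"}]
reviews = [{"email": "ann@example.com", "favorite_cuisine": "thai"}]
profiles = merge_by_email(crm, reviews)
```

Phone numbers and street addresses need far more normalization than this before they can serve as keys, which is why they are harder to match.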

Other Great Keys used by companies to aggregate data from different sources:
  • Amazon: ISBN for books.
    • Sometimes the ISBN is not available from third-party reviews. Use author/title info instead.
      • A complex engineering problem. Amazon is able to do it nevertheless.
  • Google: URL
    • Great key. That's why PageRank works.
    • Each web page has a unique URL.
  • Bloomberg: Stock ticker symbols.
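
The ISBN-with-fallback idea can be sketched as follows: match a third-party review to a catalog entry by ISBN when one is present, and otherwise fall back to a normalized (author, title) pair. This is not Amazon's method, only an illustration. It does exact matching after normalization, while the hard engineering problem mentioned above is the fuzzy case (misspellings, subtitles, multiple editions). All names and sample records are hypothetical.

```python
import re


def normalize_isbn(isbn):
    """Keep only digits and the X check character, e.g. drop hyphens and spaces."""
    return re.sub(r"[^0-9Xx]", "", isbn).upper()


def normalize_text(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def build_catalog_index(catalog):
    """Index catalog books both by ISBN and by normalized (author, title)."""
    by_isbn, by_author_title = {}, {}
    for book in catalog:
        by_isbn[normalize_isbn(book["isbn"])] = book
        key = (normalize_text(book["author"]), normalize_text(book["title"]))
        by_author_title[key] = book
    return by_isbn, by_author_title


def match_review(review, by_isbn, by_author_title):
    """Return the catalog book a review refers to, or None if no match."""
    isbn = review.get("isbn")
    if isbn:
        return by_isbn.get(normalize_isbn(isbn))
    key = (normalize_text(review["author"]), normalize_text(review["title"]))
    return by_author_title.get(key)


# Placeholder data; the ISBN is deliberately not a real one.
catalog = [{"isbn": "0-0000-0000-0", "author": "A. Author", "title": "An Example Book"}]
by_isbn, by_author_title = build_catalog_index(catalog)
matched = match_review({"author": "a. author", "title": "An Example Book!"},
                       by_isbn, by_author_title)
```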

Conclusion
  • When thinking about a product, try to frame it as an explicit site or an implicit site.
  • Perhaps there exists an explicit search engine (e.g. Facebook, maybe?) that might be a Google killer.
  • Perhaps there exists an implicit restaurant review site that might be a Yelp killer.
    • These things might exist, but no one has been able to do them well yet.


A piece of advice:


  • You never know which ideas will work and which won't.
    • PayPal changed its idea 6 times. Groupon, too.
Thus,
  • Start with one idea. When you realize it's not going to work, one idea leads to another.
    • Iterate quickly and understand what users want.
  • Success is determined by the ability to iterate quickly and the strength of the team.

Surround yourself with:
  • Smart people (not just in terms of education: there are smart people who are not at an elite college, and there are people at Stanford who are not very smart)
  • People who get stuff done (able to get a good-enough answer instead of wasting a lifetime trying to find the perfect answer)
  • People you like: Do I like this person? Would I enjoy working with him?
    • "No as*holes at Rapleaf"

Related Links

  • Auren Hoffman cited RottenTomatoes in class as an example of a website built on implicit data, but this link suggests that companies are better off gathering data both implicitly and explicitly rather than relying on only one approach: http://www.rottentomatoes.com/community/
    By building a community section on its website (a term we can now associate with explicit data), RottenTomatoes can take advantage of the traffic it already generates by offering discussion boards and comments. This generates more data and helps deliver a better product.
  • Bloomberg is another company Hoffman talked about on Tuesday that uses implicit data. He also mentioned that implicit data is often associated with engineering and technology, while explicit data has more to do with people, socializing, and marketing. If you follow this link (http://about.bloomberg.com/thinktechnologysolutions/), you can see how Bloomberg presents itself as innovative and boasts about its use of technology. Specifically, "Think Fast" refers to implicit data gathering: it talks about the real-time information they get and about their automated system for gathering this data. I encourage you all to read more about Bloomberg so you can reflect on last Tuesday's class.
  • Finally, check out http://www.rapleaf.com/ to understand in more depth how Auren Hoffman uses data analysis in his day-to-day work and how companies like this are catalysts for the Social Data Revolution.

Rapleaf History
  • Originally an explicit site where people rated other people
  • Original goal: "By showing people how good they are and incentivizing people to be good, make everyone in the world good"
  • Turns out, "good" is hard to evaluate across different contexts (e.g. Bill Clinton may be considered a good president, but perhaps not a great husband).
  • Eventually became a data company that focuses on helping companies market their products to the right segment of consumers


Students:
Diego Molino, Maurizio Calo, George Tang, Jason Wei