7.31.2015

Data science means shopping and plucking, not just cooking.

Many people think that data science works like this:


But that’s not the whole picture, not even close.

Unless your data pipeline is quite mature, your data is probably more like this.



Unstructured, uncleaned. Still very messy.

Your whole data set is probably more like this:




It contains some of the key ingredients, but not all of them.

I can’t make cookies out of mustard. And I can’t make them out of just chocolate chips and vanilla, either.



Most of the time, building great data products requires shopping and plucking, not just cooking. You can’t cook a great meal until your fridge is stocked with the right ingredients.

Bottom line: your "secret sauce" isn’t an algorithm. It’s a combination of data cleaning, processing, and curation, plus a judicious choice of the right algorithms.

(Even if you have the right ingredients, you can’t boil your way to good cookies.)




That means you want to work with data scientists who understand the whole process of shopping, plucking, and cooking good data products. If you hire analysts or machine learning specialists who don't know how to pluck and shop, you're going to either (1) get stuck baking mustard cookies, or (2) put a heavy burden on your engineering team to grab and process new data. (1) is yucky. (2) is very slow.

It also means that you don't want to constrain your data scientists to only use the ingredients you already have in your kitchen. You should expect a good data scientist to improve your options by looking for more ways to bring in more data. ("Hm. No eggs. Before we go any further, we're going to need some eggs." "These cookies are okay, but they'd be much better with a dash of cinnamon.")

Practically speaking, "more ways to bring in data" includes things like
  • additional instrumentation within your app/website
  • mashups with public data sources
  • feedback mechanisms within your app/website (e.g. additional profile fields)
  • hand-curated data sets to clean and normalize large data feeds
  • merging in additional sources of user feedback (e.g. customer support tickets)
  • user surveys or interviews
  • etc.
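
To make one of these concrete, here is a minimal sketch of "merging in additional sources of user feedback." Everything in it is made up for illustration: the `users` and `tickets` records, their field names, and the `merge_ticket_counts` helper are assumptions, not a description of any particular system. The idea is simply to bring a new ingredient (support tickets) into the kitchen by joining it onto data you already have.

```python
from collections import Counter

# Hypothetical records already in the "kitchen": one dict per user.
users = [
    {"user_id": 1, "plan": "free"},
    {"user_id": 2, "plan": "pro"},
    {"user_id": 3, "plan": "free"},
]

# Hypothetical new ingredient: support tickets exported from another system.
tickets = [
    {"ticket_id": "a", "user_id": 2},
    {"ticket_id": "b", "user_id": 2},
    {"ticket_id": "c", "user_id": 3},
]


def merge_ticket_counts(users, tickets):
    """Left-join a per-user ticket count onto the user records."""
    counts = Counter(t["user_id"] for t in tickets)
    # Users with no tickets get a count of 0 instead of being dropped.
    return [{**u, "ticket_count": counts.get(u["user_id"], 0)} for u in users]


enriched = merge_ticket_counts(users, tickets)
for row in enriched:
    print(row)
```

In a real pipeline the join would probably live in a database or a data frame library, and the hard part is usually agreeing on the key (here, `user_id`) across systems. That is the "plucking" step.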

In conclusion, three cheers for cookies!



PS: I’m not saying you should wait for all the perfect ingredients to begin. Great data science usually involves smart sequencing—rapidly learning which data streams add the most value, and developing the systems to gather and process them effectively. Make sugar cookies for now, and add the chocolate chips as soon as you can get them.

PPS: Peter Norvig says that “more data usually beats better algorithms.” I’m not disagreeing. Instead, I’m pointing out that at any given point in the life cycle of a data product, your volume of data is more or less fixed. Great data science is about working within that constraint, creating useful data products with the tools and ingredients that are close to hand, and bootstrapping yourself up to the next level.



PPPS: There’s another layer to this conversation: developing the tools (and culture) to enable rapid exploration and deployment of data products. It’s a bit like making sure your kitchen is equipped with a food processor, not just a microwave. But I think this metaphor is already strained enough, so we’ll save that conversation for another day.
