7.31.2015

Data science means shopping and plucking, not just cooking.

Many people think that data science works like this:


But that’s not the whole picture, not even close.

Unless your data pipeline is quite mature, your data is probably more like this.



Unstructured, uncleaned. Still very messy.

Your whole data set is probably more like this:




It contains some of the key ingredients, but not all of them.

I can’t make cookies out of mustard. And I can’t make them out of just chocolate chips and vanilla, either.



Most of the time, building great data products requires shopping and plucking, not just cooking. You can’t cook a great meal until your fridge is stocked with the right ingredients.

Bottom line: your "secret sauce" isn’t an algorithm. It’s a combination of data cleaning, processing, and curation, plus a judicious choice of the right algorithms.

(Even if you have the right ingredients, you can’t boil your way to good cookies.)




That means you want to work with data scientists who understand the whole process of shopping, plucking, and cooking good data products. If you hire analysts or machine learning specialists who don't know how to pluck and shop, you're going to either (1) get stuck baking mustard cookies, or (2) put a heavy burden on your engineering team to grab and process new data. (1) is yucky. (2) is very slow.

It also means that you don't want to constrain your data scientists to only use the ingredients you already have in your kitchen. You should expect a good data scientist to improve your options by looking for more ways to bring in more data. ("Hm. No eggs. Before we go any further, we're going to need some eggs." "These cookies are okay, but they'd be much better with a dash of cinnamon.")

Practically speaking, "more ways to bring in data" includes things like:
  • additional instrumentation within your app/website
  • mashups with public data sources
  • feedback mechanisms within your app/website (e.g. additional profile fields)
  • hand-curated data sets to clean and normalize large data feeds
  • merging in additional sources of user feedback (e.g. customer support tickets)
  • user surveys or interviews
  • etc.
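
To make one item from the list above concrete, here is a minimal sketch of merging in an additional source of user feedback: attaching a support-ticket count to each user record. Everything in it is an assumption made for illustration (the `enrich_users` helper, the field names `user_id`, `plan`, and `subject`, and the sample rows); your own data will look different.

```python
from collections import Counter


def enrich_users(users, tickets):
    """Add a ticket_count field to each user record (a left join on user_id)."""
    # Count tickets per user; tickets with no matching user are simply ignored.
    ticket_counts = Counter(ticket["user_id"] for ticket in tickets)
    return [
        {**user, "ticket_count": ticket_counts.get(user["user_id"], 0)}
        for user in users
    ]


# Hypothetical data: what the app already knows about its users...
users = [
    {"user_id": 1, "plan": "free"},
    {"user_id": 2, "plan": "pro"},
]

# ...and a hypothetical export from the customer support system.
tickets = [
    {"user_id": 2, "subject": "billing question"},
    {"user_id": 2, "subject": "login problem"},
    {"user_id": 3, "subject": "ticket from an unknown user"},
]

enriched = enrich_users(users, tickets)
for row in enriched:
    print(row)
```

The join itself is only a few lines. The shopping and plucking is everything around it: getting the ticket export in the first place, and making sure both sources agree on what a `user_id` is.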

In conclusion, three cheers for cookies!



PS: I’m not saying you should wait for all the perfect ingredients to begin. Great data science usually involves smart sequencing—rapidly learning which data streams add the most value, and developing the systems to gather and process them effectively. Make sugar cookies for now, and add the chocolate chips as soon as you can get them.

PPS: Peter Norvig says that “more data usually beats better algorithms.” I’m not disagreeing. Instead, I’m pointing out that at any given point in the life cycle of a data product, your volume of data is more or less fixed. Great data science is about working within that constraint, creating useful data products with the tools and ingredients that are close to hand, and bootstrapping yourself up to the next level.



PPPS: There’s another layer to this conversation: developing the tools (and culture) to enable rapid exploration and deployment of data products. It’s a bit like making sure your kitchen is equipped with a food processor, not just a microwave. But I think this metaphor is already strained enough, so we’ll save that conversation for another day.



