12.07.2013

You might be a data scientist if...


As I meet up-and-coming data scientists, I've realized that we share a surprising number of very specific experiences. Here's a list of these data science rites of passage, in no particular order.

1. Word count in MapReduce.
2. Write a script to send yourself an email.
3. Get emotionally involved in a debate about statistical software (e.g. R vs. Python) or graphing libraries.
4. Mess up a git repo by accidentally committing a very large data file.
5. Scrape a website (e.g. eBay, Amazon, IMDb, Wikipedia) to answer a personal question.
6. Read a math, stats, or programming book while riding public transportation (train, plane, bus, etc.).
7. Bang your head on a timestamp conversion problem for two hours or more.
8. Train a text classifier, probably using books from Project Gutenberg or movie reviews.
9. Start writing a poker bot.  (Bonus points for actually finishing.)
10. Fill up a piece of paper with times and percentages to estimate when a long-running job will finish.
11. Enter a Kaggle contest.
12. Get back a batch of really bad results from MTurk.
13. Set up a dummy account with a web service solely for the purpose of collecting data.
14. Read a math, stats, or programming book in bed.
15. Write a regular expression to avoid a couple dozen copy-pastes.
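For anyone who hasn't checked off item 1 yet, here is a minimal single-process sketch of the classic MapReduce word count in plain Python. It is not Hadoop's API. The function names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical, and tokenization is assumed to be a simple lowercase-and-split-on-whitespace.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for each whitespace-separated, lowercased token.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle phase: group all emitted values by key, as the framework would
    # do between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce phase: collapse each key's list of values into a single total.
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

On a real cluster, many mappers and reducers run in parallel and the shuffle moves data between machines. Here the shuffle is just an in-memory dictionary, which keeps the three phases easy to see.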

Probably no one has done all of them (scavenger hunt, anyone?), but they're common enough that you could grab a handful and train a pretty effective Naive Bayes classifier.
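To take the joke literally, treat each rite of passage as a 0/1 feature and fit a Bernoulli Naive Bayes model. Below is a from-scratch sketch that assumes Laplace (additive) smoothing. The class name `BernoulliNaiveBayes` and the four-feature toy data are hypothetical and made up for illustration. They are not real survey results.

```python
import math
from typing import Dict, List, Sequence


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes for 0/1 feature vectors, with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        if alpha <= 0:
            raise ValueError("alpha must be positive to avoid log(0)")
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "BernoulliNaiveBayes":
        self.classes: List[int] = sorted(set(y))
        n_features = len(X[0])
        self.log_prior: Dict[int, float] = {}
        # p_one[c][j] is the smoothed estimate of P(feature j == 1 | class c).
        self.p_one: Dict[int, List[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p_one[c] = [
                (sum(row[j] for row in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(n_features)
            ]
        return self

    def log_scores(self, x: Sequence[int]) -> Dict[int, float]:
        scores: Dict[int, float] = {}
        for c in self.classes:
            score = self.log_prior[c]
            for j, xj in enumerate(x):
                p = self.p_one[c][j]
                # Bernoulli NB also scores absent features: log(1 - p) when xj == 0.
                score += math.log(p if xj else 1.0 - p)
            scores[c] = score
        return scores

    def predict(self, x: Sequence[int]) -> int:
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data. Features: [MapReduce word count, emailed yourself
    # from a script, committed a huge data file to git, entered a Kaggle contest].
    # Label 1 = data scientist, 0 = not.
    X = [(1, 1, 1, 1), (1, 0, 1, 1), (0, 1, 1, 0),
         (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1)]
    y = [1, 1, 1, 0, 0, 0]
    model = BernoulliNaiveBayes().fit(X, y)
    print(model.predict((1, 0, 1, 0)))  # prints 1
```

The Bernoulli variant fits this setting better than the multinomial one because every item on the list is a yes/no experience, not a count. Not having done something is evidence too.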

What other features would you add to this model?

9 comments:

  1. I've done quite a lot of these. However, I will never participate in #3 (getting emotionally involved in a debate about statistical software [e.g. R vs. Python] or graphing libraries) or in any of the other holy wars in computing. Perhaps that's because I use both R and Python extensively, as well as many graphing libraries (they all have pros and cons), but I believe it's beneficial to know many tools inside and out; you want flexibility when working with new collaborators. Inclusivity elevates all!

    Replies
    1. I'm not saying it's a good thing, but it sure happens a lot.

  2. "Fill up a piece of paper with times and percentages to estimate when a long-running job will finish" oh my god yes. Great post!

  3. 3. (Dispense with all rationality and) get emotionally involved in a debate about the ethics of using various "available" datasets.

  4. Thanks for sharing your points. Data science is deep knowledge discovery through data inference and exploration. The discipline often uses mathematical and algorithmic techniques to solve some of the most analytically complex business problems, drawing on troves of raw information to uncover hidden insights that lie beneath the surface. It centers on evidence-based analytical rigor and on building robust decision-making capabilities. Ultimately, data science matters because it enables companies to operate and strategize more intelligently. It is all about adding substantial enterprise value by learning from data.
    Thanks! https://intellipaat.com/
