You might be a data scientist if...
As I meet up-and-coming data scientists, I've realized that we share a surprising number of very specific experiences. Here's a list of these data science rites of passage, in no particular order.
2. Write a script to send yourself an email.
3. Get emotionally involved in a debate about statistical software (e.g. R vs. Python) or graphing libraries.
4. Mess up a git repo by accidentally committing a very large data file.
5. Scrape a website (e.g. eBay, Amazon, IMDb, Wikipedia) to answer a personal question.
6. Read a math, stats, or programming book while riding public transportation (train, plane, bus, etc.).
7. Bang your head on a timestamp conversion problem for two hours or more.
8. Train a text classifier, probably using books from Project Gutenberg or movie reviews.
9. Start writing a poker bot. (Bonus points for actually finishing.)
10. Fill up a piece of paper with times and percentages to estimate when a long-running job will finish.
11. Enter a Kaggle contest.
12. Get back a batch of really bad results from MTurk.
13. Set up a dummy account with a web service solely for the purpose of collecting data.
14. Read a math, stats, or programming book in bed.
15. Write a regular expression to avoid a couple dozen copy-pastes.
Probably no one has done all of them (scavenger hunt, anyone?), but they're still common enough that you could grab a handful and train a pretty effective Naive Bayes classifier.
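For the curious, here is a minimal sketch of what that classifier might look like, written as a Bernoulli Naive Bayes with Laplace smoothing in plain Python. Everything specific in it is an assumption for illustration: the feature names are loosely borrowed from the list, the toy data is made up, and the helpers `train` and `predict_proba` are hypothetical names, not part of any library.

```python
import math

# Hypothetical binary features, loosely based on items in the list above.
FEATURES = ["emailed_self", "broke_git_repo", "scraped_website", "entered_kaggle"]

# Made-up toy data for illustration only: (feature vector, is_data_scientist).
TOY_DATA = [
    ([1, 1, 1, 1], 1),
    ([1, 0, 1, 1], 1),
    ([0, 1, 1, 0], 1),
    ([1, 1, 0, 1], 1),
    ([0, 0, 0, 0], 0),
    ([1, 0, 0, 0], 0),
    ([0, 0, 1, 0], 0),
    ([0, 1, 0, 0], 0),
]


def train(data, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing (alpha)."""
    n_features = len(data[0][0])
    model = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        prior = len(rows) / len(data)
        # Smoothed estimate of P(feature_j = 1 | label).
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model


def predict_proba(model, x):
    """Return {label: P(label | x)} under the naive independence assumption."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


model = train(TOY_DATA)
veteran = predict_proba(model, [1, 1, 1, 1])
newcomer = predict_proba(model, [0, 0, 0, 0])
print(f"P(data scientist | all four): {veteran[1]:.2f}")
print(f"P(data scientist | none):     {newcomer[1]:.2f}")
```

With real survey data you would want far more than eight rows, and probably a held-out set to check whether "pretty effective" holds up.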
What other features would you add to this model?