I came across Ben Horowitz’s post on Staying Great that I found extremely
informative when it comes to building an executive team. Most of Ben’s posts in
general are illuminating for people trying to walk the line between managing
products and people, but the last paragraph of this post drives home the
obligation executives have to their employees, stakeholders, and company:
Finally, what about being loyal to the team that got you here? If your current executive team helped you 10X your company, how can you dismiss them when they fall behind in running the behemoth they created? The answer is that your loyalty must go to your employees—the people who report to your executives. Your engineers, marketing people, sales people, finance and HR people who are doing the work. You owe them a world-class management team. That’s the priority.
Your goal as an executive in a tech company isn’t to build an amazing product–your goal is to build a team that builds an amazing product.
I’m currently in New York with no power or
internet, so I’ve decided to knock
off some tasks that have been sitting in my to-do list. I’m currently using a
new tool to track my progress on things that I want or need to do:
I’ve been on it for the past month or so, and I think it’s been pretty useful.
When I started caring about rigorously tracking my to-do list, I came up with
the following list of features that I wanted from my tracking tool:
- Ability to collaborate with other people (e.g., shared lists)
- Deadline tracking and reminders
- Hooks into GitHub for personal coding projects
- Mobile alerts
- Historical archiving of finished tasks
- Personal analytics of how fast I’m closing tasks
Six months and five tools later, I’ve updated my feature list to reflect what
actually helps me get things done:
- Sustained adherence to one tool and one methodology, no matter what
This point is trite and beaten to death, but it bears repeating for me, and maybe
someday you’ll find this post helpful too: all the time you spend figuring out
which to-do list, text editor, or mail client to use is time taken away from
actually getting things done. This especially applies if you use any tool that
requires extensive dotfile configuration.
I used to justify putting extensive work into my dotfiles and tool selection as
a way of investing in future productivity. What I’ve found, though, is that it hasn’t
amortized for me personally, and that I end up reinventing my toolchain months down the
road anyway.
There’s this article by Thomas
Davenport and DJ
Patil that’s being passed around, on the newfound “sexiness” of what I do for a
living, and why data science is here to stay. I’m well behind the curve on this one, so
I’ll leave it to you to find the many other analyses and responses to this
article, but I wanted to call attention to the one paragraph that really
resonated with me:
Data scientists don’t do well on a short leash. They should have the freedom to
experiment and explore possibilities. That said, they need close relationships
with the rest of the business. The most important ties for them to forge are
with executives in charge of products and services rather than with people
overseeing business functions. As the story of Jonathan Goldman illustrates,
their greatest opportunity to add value is not in creating reports or
presentations for senior executives but in innovating with customer-facing
products and processes.
A very clear and present danger that exists among companies wanting to
capitalize on the recent influx of data talent is that they’ll want to put these
men and women right away to work on short-term business objectives that add
immediate value to the company. That’s fine and dandy, but what makes this
dangerous is that continued prioritization of these tactical units of work will
only take time away from the long-term strategic research directions that
(competent) data scientists were hired to enact. As a result, shortsighted
companies will fail to realize any long-term value from their data scientists
(now glorified BI analysts), and the data scientists will become increasingly
frustrated with being siloed into a pure analyst role.
I write this as a couple of my peers are exhibiting symptoms of a “failure to thrive” at their respective companies, either because their charter of data creativity and research has been perverted into a full-time data-pulling role, or because the peer is inflexible about doing anything other than pure research. As a cofounder, you can’t help the latter case (you hired poorly), but you can help the former by:
- Setting expectations for data science hires early on by writing a very clear list of responsibilities and expectations in your job requisition, and
- Once the data scientist has been hired, staying true to those stated roles and responsibilities.
I suspect that first point is something a lot of startups haven’t really thought about–they simply assumed “big data” was a thing and that hiring data scientists blindly was going to save us all (hint: it won’t). Likewise, fresh graduate students might assume being a data scientist for a startup is just like working in an academic research lab (hint: it’s not).
One last note: I’m not saying it’s entirely the company’s fault in the paragraphs above–sometimes we all have to grit our teeth and do what’s necessary to fight fires at any given moment in time. All I’m saying is that while the onus falls on data scientists to prove that their research and products add value to a business (Hilary Mason does an excellent job explaining how to prioritize research directions), so too should companies ensure that their developing data teams possess a significant, stated commitment to data science and research.
Otherwise, there are plenty of companies that are doing just fine without data scientists.
I recently took
a look at our users' first name initials at
Causes, and found the following distribution of
first-name initials in our userbase:
What’s interesting is that we see an unusually high number of users whose name
starts with A. Was it actually that unusual though? How could I find out whether
this distribution of first name initials was anomalous?
I started with some very creative ideas for constructing a model
distribution to compare against: I tried scraping top 1000
names; I had grand ideas to ping 500k
random public FBUIDs; etc. Sometimes, though, it’s best to find prior art:
Someone had already asked the same
question on Google Answers. Granted, it’s based off of twelve-year-old data, and it’s only for names in the US, but I figured that was a good rough distribution to compare to the 179 million names in our userbase.
The A’s have it
By taking the total percentages from the page and creating an expected value
based off of our user numbers, I came up with this:
I was very surprised at how well our distribution actually fit the
expected values calculated from the 2000 US Census. That is, except for the
extraordinarily high count of A-names. In fact, if we’re to take this census data
on good faith, we’re about 95.8% over our expected value, or almost double the number of
A-names than in the US at large. That’s unusually high!
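The comparison itself is simple: multiply each census percentage by the userbase size to get an expected count, then look at how far the observed count deviates. A sketch in Python, where every number is made up for illustration (the real analysis used the full census table and the 179 million names above):

```python
# Sketch of the expected-value comparison described above. Every number
# here is made up for illustration; the real analysis used the 2000 US
# Census percentages and the Causes userbase.
total_users = 1_000_000                  # hypothetical userbase size
census_share = {"A": 0.105, "B": 0.060}  # hypothetical census share per initial
observed = {"A": 205_000, "B": 59_000}   # hypothetical observed counts

for letter, share in census_share.items():
    expected = share * total_users       # expected count under the census model
    deviation = (observed[letter] - expected) / expected * 100
    print(f"{letter}: observed {observed[letter]:,}, "
          f"expected {expected:,.0f}, {deviation:+.1f}% vs expected")
```

With numbers in this ballpark the A row comes out roughly 95% over expected, which is the same flavor of result as the census comparison above.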
Why do you think we have so many A-names? For those unfamiliar with the product
relaunch, Causes is a platform that allows you to create a pledge, a petition,
or a fundraising campaign for something you believe in. After creating an
action, we encourage you to invite your friends to take action with you. And
just what does that inviter look like?
I’ve whited out the surnames, but the point is clear. With such a nontrivial proportion of our userbase coming from viral channels,
sometimes it becomes necessary to revisit the little things that we gloss over
when shipping a minimum viable relaunch.
If you’ve ever wondered what I use on a daily basis, I was recently interviewed
for The Setup. The only thing that’s changed from when I
did the interview is that I now have a 4S, after losing my Droid Bionic. Oh well.
Last week I had a really good Skype conversation with Hunter Whitney, someone I met at the R
Users Meetup where I gave my talk a couple of weeks ago. He’s
currently writing a book on data visualization and we talked about all kinds
of data science-y topics, but there was one thing we touched on that I
feel strongly about: that all data scientists are really also data journalists.
By this, I don’t mean to say that we write data articles for a non-technical audience (although the
Guardian does this really well). What
I mean is that as data scientists who acquire, parse, filter, mine,
represent, and refine data (totally stealing Ben Fry’s
design), we have to acknowledge the fact that
at every step in this process, we editorialize something on some
level. We have to: our
job is to turn data into some kind of statistically significant narrative for
people who have neither the analytical background nor the time to validate our research themselves. It’s in that respect that we’re sort of like journalists (without the credentials or beautiful prose, of course).
Maybe I do actually mean that we write data articles for a non-technical
audience, but our audience varies–we create dashboards and internal
reports for various teams, we visualize data in handy infographics for
public consumption, we machine-learn datasets for product features, and so on. All I know is that somewhere along the way, I became just as concerned with
what I was trying to communicate as with what I was hacking. I think
everyone who works with data has at some point come to this same realization.
I gave a “lightning talk” at last night’s Bay Area R Users Group. This
was a format where we had ten minutes to talk about the various ways in which we
use R. I decided to talk about what little progress I made on the Heritage Health
Prize. This was a concerted three-month effort at Courant to learn R, Hadoop, and
data mining. PS: The rankings there are inaccurate now–I haven’t touched it
since the first progress prize in August and I think I’m in about 4321748th place now.
I had a great time–everyone was nice and I had a lot of fruitful discussions
with people after the event.
After spending some time setting up Jekyll, I realized I
was serving my
.git directory as well. Whoops. A quick Google search yielded
a pretty cool (but probably simple) trick that allows for handy deployment
without serving embarrassing commit comments.
In the hooks/ directory of your repo, create an executable file called post-receive
containing the following:
#!/bin/sh
GIT_WORK_TREE=/path/to/public_html/ git checkout -f
This is a post-receive hook that will check out the latest commit into my
public_html directory upon
receiving a push from my local computer. I feel like I should’ve known this ages
ago.
Blogging is hard.
Well, it’s hard for me because I always feel like I have nothing
meaningful, insightful, educational,
intelligent, poignant, inspiring or
amusing to say without coming off as trite, forced, corny, pompous,
hokey, flat, or just plain stupid (I’m not going to provide examples of those,
but they’re everywhere). All blogs are an exercise in thinly-veiled narcissism,
but the successful ones manage to convince you that they are, in some way,
deserving of their vanity domain and your two minutes of attention.
After several hours of futzing around with Jekyll, we’re now live. The
hardest parts were setting up Disqus and getting boastful to work (which is
probably premature optimization, given the complete lack of traffic to this
site).
Jekyll for fun and profit
I’m lazy, so I pretty much cloned the Jekyll repo:
> git clone https://github.com/mojombo/jekyll.git
Created a .css from 960 grid, and downgraded liquid (as Jekyll doesn’t
play well with 2.3.0):
> sudo gem uninstall liquid
> sudo gem install liquid --version '2.2.2'
The rest is pretty straightforward. Set up a YAML config file:
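Mine is short; as a sketch, something like the following (every value here is a placeholder, so check the Jekyll docs for the options your version supports):

```yaml
# Minimal Jekyll configuration (all values are placeholders)
destination: ./_site
markdown: rdiscount
permalink: /:year/:month/:day/:title
```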
And start writing posts in markdown!
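Each post is just a markdown file with a small YAML front-matter block at the top; a made-up example:

```markdown
---
layout: post
title: Hello, Jekyll
---

Posts live in the _posts/ directory, named like
YYYY-MM-DD-title.markdown, and the body is plain markdown.
```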