(Alright, for HackerNews, here’s the TL;DR version: Built a Twitter analytics weekend project, started charging, and money has been coming in steadily for months now without any advertising. Celebrities sign up, major investors sign up, etc. / Problem: anyone with more than 500k followers waits days, and some wait months, to be processed and allowed to see results. Distributed architecture: S3 / dedicated servers / MySQL + SQLite + Twitter API.)
FRUJI.com has become a very popular service without any marketing or advertising work on my part. It started as a lazy weekend project. I sold PRO accounts to see if anyone would pay for them, and people did, lots of them. It doesn’t get better than this. The code has been re-written from scratch 3 times now: I started all over and pushed out new versions to keep up with demand, with Twitter API changes, with the ugly UI and with other problems. There is even an iPhone version that I haven’t had a chance to update.
Now, with all this, you’d have the perfect startup that many try to establish. It happened while I was busy working on things I considered to be a real startup; this was just a weekend project. Not anymore. In fact, it’s got a lot of potential and a lot of users requesting more features, smarter features, and they are willing to pay for them. A perfect outlook.
Except, Twitter.
Twitter has been ruthless and tough on developers with their API limits. They are trying to allow businesses to build on top of their platform, but are unwilling either to charge for access or to adapt to requests made from the outside. I’ve found ways to work with their API limits (without violating policies or rules like many others do), but I’ve now hit a nasty bug (after two others I got them to fix) that they are currently not willing to fix. Here goes:
Twitter allows me to issue a certain number of API calls to a specific endpoint in a specific time frame. All the limits are on their API pages, but here are some made up example numbers for illustration:
Say I get to call their API 100 times per hour (it’s more than that, but as said, example).
Now, with every one of these 100 calls, I can request detail records for a user’s followers. What’s more (and crucial), I can request up to 100 follower detail records per 1 API call.
So, say JOHN has 5000 followers. I would need to issue 50 API calls, requesting 100 follower details with each call. And at 100 calls per hour, I can process up to 10,000 follower records per hour (I can do more, but again, example).
So, this is Twitter’s official limit. I’m mostly OKish with that.
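To put the example numbers above into code, here is a quick Python sketch. The function names are made up for illustration, and the limits are the invented example figures from this post, not Twitter’s real ones:

```python
import math

def calls_needed(followers: int, records_per_call: int = 100) -> int:
    """API calls needed when every call returns a full batch."""
    return math.ceil(followers / records_per_call)

def records_per_hour(calls_per_hour: int = 100, records_per_call: int = 100) -> int:
    """Best-case throughput under the (made-up) hourly call limit."""
    return calls_per_hour * records_per_call

# JOHN's 5000 followers at 100 records per call
john_calls = calls_needed(5000)
# Upper bound on follower records processed per hour
hourly = records_per_hour()
print(john_calls, hourly)
```

This is the best case only: it assumes every single call comes back with all the records it asked for.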
Here’s the bug they won’t fix:
Twitter has smart developers, so they implemented a time-out. Say I request 100 follower details for the user JOHN. The Twitter API goes into the databases and fetches all these records while it has me waiting. This takes a bit of time (up to 5 seconds); sometimes it goes fast, sometimes it takes a while. I am fine with waiting. But Twitter is not. They time out and drop the connection once 5 seconds or so have passed. This means the API never got back to me with any result. But, and this is the bug, they charged me 1 API call for it. Well, what’s one call, you say? They suggest issuing the same call again, but with fewer than 100 follower details requested. Ok, so my algorithm issues another API call and requests 50 records. Time out. Hm. I go back and request 10. Time out. Hmmmm, what? I go back and request just the first, just one follower record for the user JOHN. I receive the result (or sometimes, if their record is damaged, I receive a dead record). So I scale back up to 100 follower requests for the next call. It goes through. Next 100, fails. I scale down again … you get the idea.
Problem is this: In order to process, say, an account with 250,000 followers, I need at least one full day.
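Roughly, the scale-down / scale-up loop described above looks like the following Python sketch. It is not FRUJI’s actual code: `fetch_all`, `make_fake_api` and the step sizes 100/50/10/1 are illustrative, the fake API only simulates a time-out whenever a multi-record batch contains a "damaged" id, and skipping a record when even a single-record call fails is my assumption for the sketch.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

STEPS = [100, 50, 10, 1]  # batch sizes tried in order after each time-out

Record = Dict[str, object]
FetchFn = Callable[[List[int]], Optional[List[Record]]]

def fetch_all(follower_ids: List[int], fetch_batch: FetchFn) -> Tuple[List[Record], int]:
    """Fetch every record, shrinking the batch on time-outs.

    Returns (records, calls_spent). A time-out is modelled as fetch_batch
    returning None, and it still counts as one charged call.
    """
    records: List[Record] = []
    calls = 0
    pos = 0
    while pos < len(follower_ids):
        last_len = None
        for size in STEPS:  # always start again from the full batch size
            batch = follower_ids[pos:pos + size]
            if len(batch) == last_len:
                continue  # near the end of the list: same batch, don't retry it
            last_len = len(batch)
            calls += 1
            result = fetch_batch(batch)
            if result is not None:
                records.extend(result)
                pos += len(batch)
                break
        else:
            pos += 1  # assumption: give up on this one record and move on
    return records, calls

def make_fake_api(damaged: Set[int]) -> FetchFn:
    """Simulated API: multi-record batches containing a damaged id time out."""
    def fetch_batch(ids: List[int]) -> Optional[List[Record]]:
        if len(ids) > 1 and any(i in damaged for i in ids):
            return None
        return [{"id": i, "dead": i in damaged} for i in ids]
    return fetch_batch

records, calls = fetch_all(list(range(250)), make_fake_api({120}))
print(len(records), "records in", calls, "calls")
```

In this simulation a single damaged record among 250 followers already turns 3 ideal calls into 13 charged ones, which is the effect that adds up so badly on large accounts.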
That’s one day, with the user waiting to be logged in and seeing results after signing up.
Now, I’ve had a couple of celebrities that I can’t name here sign up (ask me via e-mail and I’ll send you the list), all of them having way over 1M followers. Guess how long it takes the tool to work through their accounts? 4-5 days? No. Unfortunately not (and even that amount of time would suck in terms of service experience).
It takes up to 60 days or more.
The larger an account is (especially 1M+), the more damaged records there are and the more time-outs I hit (it mostly resorts to 1 API call = 1 follower detail record, a hundredth of what Twitter tells me I can have).
Most celebrities who signed up for FRUJI haven’t seen the results page yet, and some have waited for over 2 months now.
This sucks!
So, partial results, you say? Well, here’s where the complications come in. Back in the day, I had one large MySQL database containing all follower records. Once I had JOHN’s 5000 follower details, I put them into a large table. This table grew to many gigabytes of data (slowly duplicating Twitter’s database) and constantly crashed. Then, with more users signing up and especially with spikes caused by blog articles, the service was down for days in a row while I tried to repair the database.
My fix:
So I came up with a smarter solution. Every PRO user has his own SQLite database now. It’s stored safely on S3. Then, every night, a cron-triggered PHP script downloads the SQLite file from S3 to a fast dedicated server and checks back with Twitter, for each individual follower record, whether that person is still following the user. From the differences I can build the Followed You / Unfollowed You tables. It also helps me keep track of the Most Popular as well as the Most Valuable Followers by adding new ones to those lists.
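The nightly comparison can be sketched like this. The real job is a PHP script, so this is only an illustration in Python; the `followers` table, its columns and the function names are hypothetical, and I simply pass in tonight’s follower ids instead of calling Twitter:

```python
import sqlite3
from typing import Iterable, List, Tuple

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a per-user database (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS followers ("
        "id INTEGER PRIMARY KEY, active INTEGER NOT NULL DEFAULT 1)"
    )
    return db

def nightly_diff(db: sqlite3.Connection, current_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Compare stored followers with tonight's list.

    Returns (followed, unfollowed) and updates the database to match.
    """
    current = set(current_ids)
    stored = {row[0] for row in db.execute("SELECT id FROM followers WHERE active = 1")}
    followed = sorted(current - stored)
    unfollowed = sorted(stored - current)
    with db:  # one transaction for the whole nightly update
        db.executemany(
            "INSERT OR REPLACE INTO followers (id, active) VALUES (?, 1)",
            [(i,) for i in followed],
        )
        db.executemany(
            "UPDATE followers SET active = 0 WHERE id = ?",
            [(i,) for i in unfollowed],
        )
    return followed, unfollowed

db = open_db()
nightly_diff(db, [1, 2, 3])           # first night: everyone is new
print(nightly_diff(db, [2, 3, 4]))    # second night: 4 followed, 1 unfollowed
```

Keeping unfollowed rows with `active = 0` rather than deleting them is a design choice of this sketch, so that someone who comes back later can still be recognised.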
This server is cut off from the web server you see when you open FRUJI. Large numbers of visitors should not impact the crawler service. So I came up with a different (and, I feel, pretty smart) way of separating the user from the data. The user is redirected to a basic HTML page on Amazon S3, so Amazon takes the traffic. Then, for authentication, I use a dedicated server that runs the session details and account authentication through a slim MySQL instance. That server then outputs very light-weight, basic HTML data for the user.
The trick is this: These HTML pages (the results) are mostly empty, but have one JavaScript call to pick up JSONP data from S3 for that particular user. So all results (anything that contains data on FRUJI) are actually pre-rendered and waiting on S3 for every user. The user’s JavaScript / browser session requests all heavy data from S3. Again, my server is out of the traffic loop, so it’s a perfect scenario.
So this means that every night, the crawlers re-generate the reports and auto-upload them to S3.
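A minimal Python sketch of the pre-rendering step, assuming a report is just a dictionary. The callback name `frujiReport`, the key layout `reports/<user id>.js` and both function names are invented for illustration, and the actual upload to S3 is left out:

```python
import json

def render_jsonp(report: dict, callback: str = "frujiReport") -> str:
    """Wrap a report in a JSONP callback so a static page can load it via a script tag."""
    if not callback.isidentifier():
        # conservative check so the callback cannot smuggle in other JavaScript
        raise ValueError("callback must be a plain identifier")
    return f"{callback}({json.dumps(report, sort_keys=True)});"

def report_key(user_id: int) -> str:
    """Hypothetical object key under which the pre-rendered report is stored."""
    return f"reports/{user_id}.js"

body = render_jsonp({"followers": 5000, "unfollowed": [17, 42]})
print(report_key(1), body)
```

The static results page would then only need one script tag pointing at that object, and the browser does the sorting and display.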
So whenever you see data on FRUJI, it’s manipulated/ordered through JavaScript, but it is pre-rendered and cannot be queried dynamically. If I change the code, it’ll require a night to re-render for everyone.
Basic users can render their reports manually and we don’t keep their SQLite databases (they are being re-generated every time). For PRO users it happens automatically and we keep, refresh and re-upload their SQLite databases to S3, so we can track followers/unfollowed data.
So, how do we solve the problem with large Twitter accounts having to wait up to 60 days or more? If you have an idea, e-mail me at: office@twentypeople.com and I’d be happy to work with you on making this happen.
We can do a 50/50 split on upcoming FRUJI PRO accounts (this is all I can pay right now, since it’s all I get semi-regularly). I have reached my technical limits and would love some serious help.
You’ll get access to all systems and can party like you want on it.
We’re shutting down any job-related components of twentypeople.com effective immediately and here is the what and why:
I quit my job at Microsoft effective February 2011. In 5 months, this will have been two years ago. I left with the intention to revolutionize whatever I felt was currently being exploited by an old-fashioned, boring and greedy industry. There’s the real-estate business that springs to mind, as well as the recruiting industry. Knowing how tough the real-estate industry has been to crack over the past years (how did you find your apartment? A friend? A long list of apartments on a heavily advertising-packed website?) – I figured I might try to push the next iteration of matching great talent with great employers. It should be easy: I’ll look at the friction points, I’ll look at where the money is coming from and who tries to make sure this never changes. Then I’ll push out the revolutionary patch for all of this and party on it.
Turns out, people behave in patterns and boy do they love those old-fashioned patterns. I am not a fan of charging companies a frivolous amount of money for allowing them to be one tiny entry in a long list of similar entries (I am looking at you, monster.com - and EVERYONE else). Quite the opposite actually, I felt it was an ancient form of advertising, simply put on the web in 1998 and not changed a bit since then.
Facebook built social graphs and has become pretty successful with it. People love to explore their social circles virtually even more than in real life. So the approach with twentypeople.com was simple: Why not build a skill-based graph of a person and use this to connect them with either like-minded people or perfectly suited employers? I wanted to stay away from forcing people to build up another social graph (“Hey, do you want to add your address book?”) simply because there was no reason for doing this other than pure viral fuel that I did not want to implement at the cost of user experience.
Then, post-launch, people kept asking for job boards. I kindly said that this is not how it is supposed to work. People told me again that they want job boards and long lists of jobs, long long lists. So we built it. People sort of loved it and started applying with all the companies. Ok, great - except, nobody felt it was worthwhile filling out those skill graphs anymore; they would rather just bomb any potential e-mail address with their CV. Lots and lots of classic »Hey I am from (RANDOM COUNTRY YOU HAVE NEVER HEARD OF), I want to work in your country and I am great and your company is great, write me, bye.«
Then, companies started contacting me, asking whether I’d be interested in recruiting / headhunting on their behalf for a modest fee. The fees were very much more than just modest, and I was conflicted about becoming part of the dark-side industry I so dearly wanted to eradicate. So I figured, to understand the enemy, I could try to imitate their act and possibly understand the industry more. So I started recruiting and made great money in no time. I am not even kidding. You can earn thousands of dollars in a weekend by playing your contacts or scanning social networks really well.
After life on the dark side, I saw where the real problem was: Great talent is scarce (we knew that), but even more: it’s actually so rare and hard to find that companies are using job boards not as the “best way of doing things” but rather as a call of hope that just anything might flush in a tiny bit of good talent. This is why charging $2-5k per hit / for recruiting work (career hitman, as I often referred to this type of work) was rarely challenged by an ‘ugh, we can’t afford this’. Anything that brings talent is money well spent.
Great, so we started revolution part 2: Distribute hiring/recruiting bounties to our members. Companies tie a recruiting bounty to a job, and our smart algorithms help track and monitor who is relaying it across social networks, in what way and through which people, to finally produce a candidate who gets hired. That person (depending on the payout model) and possibly his friends/contacts get the lion’s share of the bounty, we charge a tiny fee, and a market is established. Except, nobody felt this was a market worth investing time in. There’s little recruiter spirit in the masses, it seemed. I felt that through the power of social networks, recruiting might become the absolute easiest task in the world. Add a job, tie it to a bounty, distribute it through our most effective hunters and their networks. Receive lots of candidates, notify us of who got hired - we’ll run the payout. Nah, that didn’t work either.
For this to work, we should have focused on becoming just that: A service to make money as a freelance recruiter. Nothing else. No skill badabingbadathat, just a marketplace.
So, after the rewards idea didn’t fly well on twentypeople.com, we launched a microsite called Pareer. Core focus: Jobs with rewards, social relay tracking, earn on hire.
It did not produce the desired results. Don’t get me wrong: All of the above had users and people who loved it, but we’re talking potential here, great upwards potential. And I literally was unable to see any of it.
So, TL;DR: Effective today, we’re shutting down any job-offering, job-hunting and recruiting related parts across our network of sites. I personally really wished that this would be the time and place to revolutionize the industry and push it forward just a tiny step, but none of that happened through our efforts here. There will be a time for all of this, but as they say, the time wasn’t right, or as the realist in me quietly whispers: We weren’t what the moment needed. I hope somebody else is able to execute on this.
As for twentypeople: We are currently coordinating a shutdown procedure for the job related components. The core assets will be maintained and iterated into a new strategy. Your profiles remain intact, your information is and was safe and we are truly curious to hear what you think is something we should be doing next, around all the skill profiles our thousands of users have carefully built. Contact us anytime at: support@twentypeople.com
So, in the spirit of the title, we believe that burning out the job-related assets is what we need to do right away. There’s no point in letting this fade any more. For everything else, we’ll leave the lights on until we find a smart way to shine again.
Thanks for all the time and hundreds of messages. We read every single one of them. And for those who found a job through our help, we wish you all the best of success in the future. Negotiate hard, stay on top of your game. Good luck.