On January 1st, 2009, dreamhost.com had a 95% off sale. You could purchase a two-year web hosting plan that included shell access to a Linux server, unlimited users, and unlimited storage space for $20.00. Greg McAlpin (OfADifferentKind), a software developer in the Houston, TX area, bought a two-year subscription with vague notions of setting up a website some day.
On February 25, 2009, Greg's Probe File Exchange website went online. It was an invitation-only website where members could upload their probe prediction files and see how their score in the Netflix Prize contest might improve if they were to combine their results. Probe prediction files are files that competitors could use to measure the effectiveness of their algorithms. Netflix supplied a suggested set of probe data. Since we were all using the data Netflix suggested, it was easy to compare our results.
Greg invited six people to join the Probe File Exchange. They were not chosen because they had the lowest scores. They were chosen because they were all active on the Netflix forum and their posts were consistently helpful, friendly, and funny. They were chosen because they are the sort of people that you want to work with. The Netflix forum (http://www.netflixprize.com//community/) is the place where competitors could ask questions and help each other. There has been amazing openness in the forum. People have shared everything from ideas to source code. Five of the six people who were invited on February 25th are now members of The Ensemble.
On the first day that the Probe File Exchange was online, Bo Yang (Newman !) proposed to Greg that they create a new joint team. Bo and Greg went on to form the team "Newman and George !". They hoped that a submission of their combined files would have an RMSE lower than 0.8712 (the 2007 progress prize RMSE). RMSE, or Root Mean Squared Error, is a way of measuring the average error for a set of predictions. On February 27th, Newman and George ! made their first submission with an RMSE of 0.8689.
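For readers who want to see the arithmetic behind RMSE, here is a minimal sketch in Python. The function name `rmse` and the four ratings are made up for illustration; they are not taken from the team's code or from the Netflix data.

```python
import math

def rmse(predictions, actuals):
    """Root Mean Squared Error: square each error, average, take the square root."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if not actuals:
        raise ValueError("need at least one rating")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical example: four true star ratings and four predicted ratings.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]

# Errors are 0.5, 0, 1, 1; squared errors average to 0.5625; the root is 0.75.
score = rmse(predicted, actual)
print(score)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, so a lower RMSE means predictions that are closer to the real ratings overall.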
In order to share files, members created directories on the same Linux server that was hosting the website (on dreamhost.com). That original setup grew into the infrastructure that allowed Vandelay Industries ! to easily support many members.
Bill Bame (clueless) began uploading files the first day that the Probe File Exchange was online. He has always had extremely creative ideas and unique approaches. The files that he uploaded to the Probe File Exchange combined extremely well with those of Newman and George !. On February 26th, Bill was invited to a new team named "Newman, George, and Peterman !".
The Probe File Exchange had its own private forum where members could share ideas. Chris Hefele (chef-ele) posted some information about the non-linear ways that he used to combine files. The most common way for competitors to combine files is linear regression. That's a mathematical way of taking many points and finding the line that passes nearest to all of the points. Nonlinear regression is much more complex. It attempts to find a curve that passes closest to all of the points. The results that Chris achieved were extremely impressive. On March 12, 2009, Chris was invited to join the team. He was going to be "Bania", but the team name was growing too long and the "Newman and ... !" teams were all on the front page of the leaderboard.
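To make the idea of combining files with linear regression concrete, here is a small sketch in Python. It is only an illustration of the general technique, not the team's actual blending code: the function names, the choice of two files with no intercept term, and all of the numbers are assumptions made up for this example.

```python
import math

def rmse(predictions, actuals):
    """Root Mean Squared Error of a list of predictions."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)
    )

def fit_blend_weights(preds_a, preds_b, actuals):
    """Least-squares weights (w_a, w_b) so that w_a*a + w_b*b is close to actuals.

    Solves the 2x2 normal equations with Cramer's rule (no intercept term).
    """
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    say = sum(a * y for a, y in zip(preds_a, actuals))
    sby = sum(b * y for b, y in zip(preds_b, actuals))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("the two prediction files are too similar to blend")
    w_a = (say * sbb - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b

def blend(preds_a, preds_b, w_a, w_b):
    """Apply the fitted weights to two prediction files."""
    return [w_a * a + w_b * b for a, b in zip(preds_a, preds_b)]

# Hypothetical probe ratings and two members' probe prediction files.
probe_actual = [4, 3, 5, 1, 2]
member_a = [3.6, 3.2, 4.1, 1.9, 2.5]
member_b = [4.3, 2.5, 4.6, 1.2, 1.4]

w_a, w_b = fit_blend_weights(member_a, member_b, probe_actual)
blended = blend(member_a, member_b, w_a, w_b)

print("member A RMSE:", rmse(member_a, probe_actual))
print("member B RMSE:", rmse(member_b, probe_actual))
print("blended RMSE: ", rmse(blended, probe_actual))
```

On the data used to fit the weights, the blended RMSE can never be worse than either file alone, because using just one file (weights of 1 and 0) is one of the options the regression considers. Whether the gain carries over to ratings the blend has not seen is a separate question.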
So a new team "Vandelay Industries !" was formed. The name "Vandelay Industries !" is of course a whimsical reference to "Seinfeld", as is our goal to become a coalition "for the rest of us" who are not at the top of the leaderboard. Chris continued to develop his blending techniques and he continued to produce amazing results. He is one of the main blenders on The Ensemble.
In March, George Tsagas of Feeds2 was invited to join Vandelay Industries !. He answered "not yet". He was already part of one of the leading teams and he said that there would be time to make collaborations when the leaders' improvement neared the 10% mark. Feeds2 is now a member of The Ensemble.
During March and the beginning of April, Vandelay Industries ! continued to make almost daily progress. In May, Bo made huge improvements in his personal score. With his improvements Vandelay Industries !, made up of four people working in their spare time, reached 15th place on the leaderboard among 5000+ teams.
Vandelay Industries ! was started by sending out emails to strangers asking if they wanted to work together. The team made contacts with other top teams and started dialogs with them. The person who gave the most help and encouragement was Larry Ya Luo (Dace). Larry/Dace is also the highest ranked single-member team on the Netflix Prize leaderboard. There was some disagreement about how Vandelay Industries ! should recruit new members. Some thought that we should contact teams lower than us on the leaderboard. They would be more likely to work with us. And we had already seen that a few people with no previous experience could achieve quite a bit by working together. There was hesitation about contacting the top teams on the leaderboard because Vandelay Industries ! really had nothing to offer them.
But the possibility that someone might turn us away has never deterred the team. We asked Larry if he would mind downloading our probe files and seeing how they mixed with his. He accepted, downloaded our files, and did significant analysis of them. Even though our files could barely improve his own, he offered suggestions for how we could make improvements and what he thought we needed to do to reach the top 10 on the leaderboard. Each time that Vandelay Industries ! made a significant improvement, Larry would look at our files and try to help us.
In June, Jeff Howbert (team Howbert) contacted Bo about combining efforts, and Jeff joined Vandelay Industries !. The team was preparing a new submission and was quietly confident that Vandelay Industries ! would get into the top 10 on the leaderboard for the first time. Then BellKor's Pragmatic Chaos made a submission that achieved a 10.05% improvement.
Immediately Vandelay Industries ! began sending emails to all of the top teams, inviting them to join or cooperate with Vandelay Industries !. Larry was one of the first to agree to join our team. Others followed. The infrastructure that we had in place made it simple for us to add more teams. People were able to quickly integrate into the team and become productive.
As the final moments of the competition approached on July 24th, 2009, Greg McAlpin (OfADifferentKind) and Christopher Hefele (chef-ele) reflected on what the contest meant to them, the unique qualities Vandelay Industries ! offered The Ensemble, and what's next for the group.
"Joining with Grand Prize Team to create The Ensemble put us in the incredible position of making a 10% improvement over the Cinematch program that Netflix uses", said Greg McAlpin. Larry has said it well: our goal was to make a 10% improvement. When we do that, we'll have finished successfully with a job well done. "A million dollars isn't why we've worked so hard", Greg says. "At the beginning of the contest, a lot of people said that it would be impossible for anyone to reach the 10% improvement. From February until now, in six months, this group has done the impossible." Greg continued: "If we come in second place or last place, it has been fun and it's been an awesome experience working with the great and brilliant people on this team."
"The merged team's name 'The Ensemble' not only refers to the large group of team members that's been merged together, but it's also a reference to 'ensemble methods'", says Chris/chef-ele, "which is the term researchers use for the techniques we're using to combine our individual predictions into a group prediction that is better than any of the individuals."
"Next, although some of our teammates have formal backgrounds in machine learning, they're working side-by-side with many others who were drawn to this problem as an interesting hobby or puzzle", Chris/chef-ele says. "It's like a data-miner's Rubik's Cube... very addictive."
"It's my opinion that the success of this team is not only being driven by the technologies we're using to combine our data, but also by our ability to combine many people together & create a cohesive, functioning team in less than 30 days", Chris continued. "So our successes will be not only technological, but also organizational. It will be interesting to see if a large group of underdogs can defeat a small group of the leaders."