Do too many Hadoops spoil the code?
At the Hadoop Summit this week, Yahoo released its own distribution of Hadoop, one it says it uses internally.
This comes the same week Cloudera, a commercial Hadoop distributor, closed a $6 million round of financing.
Hadoop began as an open source implementation of MapReduce, the programming model Google developed and described publicly, but Yahoo is now its best-known sponsor, having hired project founder Doug Cutting, who named the project for his child’s stuffed elephant.
Microsoft’s Bing search engine also uses Hadoop, which has broken a string of data-sorting records and is becoming a lingua franca of cloud computing.
The formal home of Hadoop is the Apache Foundation, and Yahoo is offering its code under the Apache license.
Community reaction to the Yahoo release has been positive, with Savio Rodrigues writing on his personal blog that the community still drives the software.
But some reporters are beginning to ask who is really in charge of Hadoop. Is it Apache or Yahoo? Was Yahoo’s distribution a diss of Facebook, which previously developed its own SQL-like layer for Hadoop, called Hive?
Most projects have a community and a commercial arm. Hadoop’s importance has drawn a number of corporate sponsors to separately deliver their implementations. Microsoft, Yahoo, Google, and Facebook all have their own takes on Hadoop, alongside Apache and Cloudera.
All these Hadoops can be seen as a positive or a negative. On the positive side, there is growth and momentum for the framework. On the negative side, many organizations are pulling Hadoop in different directions.
In my view, all this illustrates both the strengths and the weaknesses of open source. So long as the various versions remain compatible at heart, it’s a good thing. If the code base becomes a forum for corporate intrigue and incompatibilities appear, that’s a bad thing.
Which is it for you?