Hadoop is a framework, written in Java, for handling large datasets. According to the Apache website:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
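The "simple programming models" the Apache description mentions refers chiefly to MapReduce: a map step turns input into key/value pairs, and a reduce step merges the values for each key. As a rough illustration only, here is a minimal, Hadoop-free word-count sketch of that idea in plain Java; the class and method names are illustrative, not Hadoop's actual API, which wraps the same pattern in `Mapper` and `Reducer` classes run across a cluster.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // "Map" phase: split a line of text into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1));
    }

    // "Reduce" phase: sum the counts for each distinct word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog");
        Map<String, Integer> counts =
                reduce(lines.stream().flatMap(WordCountSketch::map));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

In a real Hadoop job the input lines would be blocks of a file stored on HDFS, and the map and reduce steps would run in parallel on different machines, which is what makes the model scale to thousands of nodes.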
Aniket Maithani has worked out how to install this powerful data-processing software on the Pi, and the guide he has written is comprehensive enough that you should be able to get Hadoop running without much trouble. He'll be blogging fairly soon about building a distributed data-processing architecture using multiple Pis.