:: Amila Manoj's Blog ::: January 2012

As technology advances and research areas widen, the demand for computational power increases day by day. This demand is most visible in a few categories: physical simulations ranging from the molecular level to the scale of the universe; analysis of large data sets from optical telescopes, gene sequencers, gravitational wave detectors and particle colliders; and biology-inspired algorithms.
These tasks require high-performance computing (HPC), so one solution is to use supercomputers. Typically, however, the rate at which jobs are completed matters more than the turnaround time of any individual job, since the overall result is what is useful. This idea is called high-throughput computing.

To achieve high-throughput computing, distributed computing is a better approach, since large numbers of individual jobs can be processed in parallel.
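The difference between turnaround time and throughput can be made concrete with a little arithmetic. The sketch below uses made-up numbers (1000 jobs, one fast machine, 100 slower volunteered PCs); none of these figures come from a real project, they only illustrate the idea.

```shell
#!/bin/sh
# Hypothetical numbers, chosen only to illustrate the difference.
jobs=1000        # independent jobs in the batch
fast_hours=1     # one fast machine: hours per job
slow_hours=5     # each slower PC: hours per job
pcs=100          # slower PCs working in parallel

# The single fast machine runs the jobs one after another.
fast_total=$((jobs * fast_hours))

# The PCs work in rounds of $pcs jobs each; round the number of rounds up.
rounds=$(( (jobs + pcs - 1) / pcs ))
pool_total=$((rounds * slow_hours))

echo "single fast machine: ${fast_hours}h per job, ${fast_total}h for the batch"
echo "pool of ${pcs} PCs: ${slow_hours}h per job, ${pool_total}h for the batch"
```

Each job takes five times longer on a PC in the pool, yet the whole batch finishes far sooner, which is the trade that high-throughput computing makes.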
Available distributed computing options are:
○ Cluster computing - dedicated computers in a single location.
○ Desktop grid computing - PCs within an organization used as a computing resource.
○ Grid computing - sharing of computing resources among separate organizations.
○ Cloud computing - a company selling access to computing power.
○ Volunteer computing - computing power volunteered by the general public.

Volunteer computing (also sometimes referred to as global computing) uses computational power volunteered by the general public to perform distributed scientific computing. Volunteers may include individuals as well as organizations such as universities.
This approach allows ordinary Internet users to volunteer their computers' idle time, forming parallel computing networks easily, quickly and inexpensively without needing expert help.

In volunteer computing, the volunteers who contribute their resources are typically considered anonymous, although some volunteer computing frameworks collect information such as a nickname and email address, for example to run a credit system.

Each distributed computing paradigm has a different resource pool. For grid computing it is, for example, the number of computers owned by a particular university; for cloud computing it is the number of servers owned by a company. For volunteer computing, the resource pool is the total number of personal computers that could take part.

To understand the importance of volunteer computing, we have to consider its resource pool.
The number of privately-owned PCs around the globe is currently estimated at 1 billion and is expected to grow to 2 billion by 2015. The resource pool is also self-financing, self-updating and self-maintaining: users buy and maintain their own computers, so various costs associated with other types of grid computing do not apply to volunteer computing. Another important point is that the consumer market adopts the latest technology quickly. A supercomputer or a computing grid cannot be replaced or upgraded easily as newer technologies emerge, but a typical PC user can upgrade. For example, the fastest processors today are GPUs, which were developed with computer games in mind. Given these factors, volunteer computing has huge potential for meeting the world's computational needs.

Berkeley Open Infrastructure for Network Computing (BOINC) is the predominant volunteer computing framework in use.

Some of the other volunteer computing frameworks are:
○ Bayanihan Computing Group
○ JADIF - Java Distributed (volunteer / grid) computing Framework
○ Javelin Global Computing Project
○ XtremWeb Platform
○ Entropia

Here is a list of the most active volunteer computing projects as of January 2012:
○ SETI@home: Search for extra-terrestrial life by analyzing radio frequencies emanating from space
○ Einstein@home: Search for pulsars using radio signals and gravitational wave data
○ World Community Grid: Humanitarian research on disease, natural disasters, and hunger
○ Climateprediction.net: Analyse ways to improve climate prediction models
○ Folding@home: Computational molecular biology
○ LHC@home: Improve the design of the Large Hadron Collider and its detectors
○ Milkyway@home: Create a highly accurate three-dimensional model of the Milky Way galaxy using data collected from the Sloan Digital Sky Survey
○ Spinhenge@home: Study nano-magnetic molecules for research into localized tumor chemotherapy and micro-memory
○ PrimeGrid: Generate a list of sequential prime numbers, search for particular types of primes
○ Malariacontrol.net: Simulate the transmission dynamics and health effects of malaria

(This post draws on several sources and aims to summarize volunteer computing.)

Apache Thrift is a software framework for scalable cross-language services development. It was originally developed by Facebook and later donated to the Apache Software Foundation.

Download the stable release

Unpack the tar.gz archive to a directory of your choice
(say, /home/amila/apacheThrift):

tar -xvzf thrift-0.8.0.tar.gz

You need at least a JDK and Apache Ant to run Thrift's Java tutorial.
(You can refer to my previous posts to find out how to install the JDK on Ubuntu.)

Use apt-get to install Ant:

sudo apt-get install ant
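Before going further, it can help to confirm that the JDK and Ant are actually on the PATH. This is a minimal sketch; the helper name missing_tools is made up for this example.

```shell
#!/bin/sh
# Print the names of the given commands that are not found on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}

missing=$(missing_tools java javac ant)
if [ -z "$missing" ]; then
    echo "JDK and Ant found"
else
    echo "Not found on PATH: $missing"
fi
```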

We first need to install the Thrift compiler and language libraries before we start developing with Thrift.

Installing Thrift requires several packages that are not installed on a Linux distribution by default.
To install them:

sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev

Go to the top-level directory of the unpacked Thrift distribution
(e.g. /home/amila/apacheThrift/thrift-0.8.0) and run:

./configure

During this process, Thrift will scan for and list the different languages it finds.
It should say:

Building Java Library ........ : yes
Using javac .................. : javac
Using java ................... : java
Using ant .................... : /usr/bin/ant

...along with the other languages found on your computer.

However, to build Thrift for all of those languages, you may need to install additional packages.

Now you can make Thrift:

make

You might get errors if some of the libraries required for the languages configured in the step above are not present.

In that case, you can deselect the languages you don't need when configuring.
For example, say you don't need support for Ruby. When configuring, you can use:

./configure --without-ruby

(I had to deselect the Erlang libraries to get it working on Ubuntu 11.04.)
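Several languages can be deselected in one run. The sketch below builds the flag list from language names, assuming each language follows the same --without-<language> pattern as the Ruby example; the helper name without_flags is made up for this example. Run ./configure --help in the Thrift directory to see the names your version accepts.

```shell
#!/bin/sh
# Turn a list of language names into --without-<lang> flags.
# (without_flags is a hypothetical helper written for this example.)
without_flags() {
    flags=""
    for lang in "$@"; do
        flags="$flags --without-$lang"
    done
    # Drop the leading space before printing.
    echo "${flags# }"
}

# Languages to skip: the two mentioned above.
flags=$(without_flags ruby erlang)

# Print the resulting command; run it from the top-level Thrift directory.
echo "./configure $flags"
```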

After make has completed successfully, install Thrift with:

sudo make install

To check that the installation completed successfully, run:

thrift

You should get an output like:

Usage: thrift [options] file
Options:
  -version    Print the compiler version
  -o dir      Set the output directory for gen-* packages
               (default: current directory)
  -out dir    Set the ouput location for generated files
.........
.........
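The same check can be scripted. This is a small sketch that relies only on the -version option shown in the usage text above; the helper name check_compiler is made up, and the /usr/local/bin hint assumes that ./configure was run with its default prefix.

```shell
#!/bin/sh
# Print the path and version of a compiler command, or report that it is missing.
# (check_compiler is a hypothetical helper written for this example.)
check_compiler() {
    cmd=$1
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "found: $(command -v "$cmd")"
        "$cmd" -version
    else
        echo "not found: $cmd"
        return 1
    fi
}

# Assumption: the default install prefix puts the binary in /usr/local/bin.
check_compiler thrift || echo "check that 'sudo make install' finished and that /usr/local/bin is on PATH"
```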

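The tutorials are located in the ./tutorial directory.

There you will find two files, tutorial.thrift and shared.thrift.

.thrift files describe the service interfaces in Thrift's interface definition language (IDL), in terms of the types and methods they include. To generate Java code from them, run the Thrift compiler from the tutorial directory: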

thrift -r -gen java tutorial.thrift

This will create a directory named "gen-java" inside your current directory, containing the Java classes generated from the specified .thrift file. The -r option also generates code for included files, such as shared.thrift.
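To see the whole cycle on something smaller than the tutorial, the sketch below writes a tiny IDL file and compiles it. The file hello.thrift, the hello namespace, the Greeting struct and HelloService are all made up for this example and are not part of the Thrift distribution.

```shell
#!/bin/sh
# Write a tiny, hypothetical IDL file (not part of the Thrift tutorial).
cat > hello.thrift <<'EOF'
namespace java hello

struct Greeting {
  1: string text,
  2: i32 length
}

service HelloService {
  Greeting greet(1: string name)
}
EOF

# Generate Java sources only if the compiler is installed.
if command -v thrift >/dev/null 2>&1; then
    thrift -gen java hello.thrift
    # List whatever was generated.
    find gen-java -name '*.java'
else
    echo "thrift compiler not found; hello.thrift written but not compiled"
fi
```

With the compiler installed, this should leave one generated Java class for the struct and one for the service under gen-java.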

Now go to the "java" directory inside the current directory (tutorial) and run ant.
The Ant script compiles both the generated source files and the source files inside the java directory, and builds a jar file.

Finally, run the tutorial by starting the server and then, in a second terminal, the client:

thrift/tutorial/java$ ./JavaServer
thrift/tutorial/java$ ./JavaClient

You may also find this page useful.


Saturday, January 14, 2012

Volunteer Computing: An Introduction

Friday, January 6, 2012

First Steps of Apache Thrift with Java in Linux