LIBBLE-PS

Introduction

LIBBLE-PS is the LIBBLE variant implemented with the Parameter Server framework. Its communication mechanism is based on the Message Passing Interface (MPI). LIBBLE-PS provides both data-parallel and model-parallel programming models.

The current version of LIBBLE-PS includes the following machine learning algorithms:

  • Classification
    • Logistic Regression (LR)
    • Support Vector Machine (SVM)

Empirical Comparison

The main Learning Engine for LIBBLE-PS is based on a distributed stochastic optimization algorithm called SCOPE (Scalable Composite OPtimization for lEarning). LIBBLE-PS can adopt multiple Servers and Workers in the same Parameter Server framework to achieve both data-parallelism and model-parallelism.
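As a rough illustration of how the two kinds of parallelism divide the work, the sketch below splits training instances across Workers (data parallelism) and parameter indices across Servers (model parallelism). The helper `split_range` and the contiguous-block layout are assumptions made for this example, not LIBBLE-PS internals. The sizes are the webspam dimensions and the Worker and Server counts reported in this section.

```python
def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, end) blocks.

    Block sizes differ by at most one. Hypothetical helper for illustration;
    LIBBLE-PS may partition differently.
    """
    base, extra = divmod(total, parts)
    blocks = []
    start = 0
    for k in range(parts):
        # The first `extra` blocks get one additional element.
        size = base + (1 if k < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


# Dataset shape of webspam as reported in this section.
NUM_INSTANCES = 350_000
NUM_FEATURES = 16_609_143

# Data parallelism: each of the 16 Workers holds a block of training instances.
worker_rows = split_range(NUM_INSTANCES, 16)

# Model parallelism: each of the 2 Servers holds a block of the parameter vector.
server_params = split_range(NUM_FEATURES, 2)

print("first Worker rows:", worker_rows[0])
print("Server parameter blocks:", server_params)
```

With this layout, each Worker only touches its own instances, while each Server is responsible for roughly half of the 16,609,143 parameters.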

We use logistic regression (LR) with an L2-norm regularization term to evaluate LIBBLE-PS against other baselines. The result on the webspam dataset (350,000 points, 16,609,143 dimensions) is shown below. All methods use 16 Workers. The methods compared are:

  • PS-Lite (AsySGD) and PS-Lite (SspSGD): asynchronous SGD and bounded-delay SGD on a Parameter Server, as proposed in [Mu Li, et al. OSDI 2014]. Both are implemented on top of PS-Lite, which is provided by the authors of [Mu Li, et al. OSDI 2014]. Most other existing Parameter Server baselines perform similarly to PS-Lite, so we do not report their performance here.
  • LIBBLE-Spark: SCOPE implemented on Spark.
  • LIBBLE-PS (1 Server) and LIBBLE-PS (2 Servers): LIBBLE-PS with one Server and with two Servers, respectively.
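For reference, L2-regularized logistic regression is commonly written as the objective below. The notation is an assumption made for illustration: $n$ training instances $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, parameter vector $w$, and regularization weight $\lambda$. The exact scaling of the regularization term used in the experiments is not stated in this document.

$$\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i x_i^{\top} w}\right) + \frac{\lambda}{2} \|w\|_2^2$$

For webspam, $n = 350{,}000$ and $w$ has 16,609,143 entries.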

We also report the speedup of our methods relative to PS-Lite (AsySGD). The time is recorded when the gap between the objective function value and the optimal value falls below $10^{-4}$ ($10^{-5}$). The result for logistic regression with L2-norm regularization on the webspam dataset is shown below. All methods use 16 Workers.
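The timing rule above can be sketched as follows: find the first recorded time at which the objective gap drops below the tolerance, then divide the baseline's time by the method's time. The function names and the traces are hypothetical; the numbers are made up for illustration and are not measurements from these experiments.

```python
def time_to_gap(trace, optimum, tol):
    """Return the first recorded time at which objective - optimum < tol.

    `trace` is a list of (elapsed_seconds, objective_value) pairs in time order.
    Returns None if the tolerance is never reached.
    """
    for elapsed, objective in trace:
        if objective - optimum < tol:
            return elapsed
    return None


def speedup(baseline_time, method_time):
    """Speedup of a method relative to a baseline (larger is better)."""
    return baseline_time / method_time


# Made-up traces for illustration only; NOT results from this document.
optimum = 0.25
baseline_trace = [(10.0, 0.5), (50.0, 0.26), (200.0, 0.25005)]
method_trace = [(5.0, 0.3), (20.0, 0.25002), (40.0, 0.250001)]

t_base = time_to_gap(baseline_trace, optimum, 1e-4)
t_method = time_to_gap(method_trace, optimum, 1e-4)
print("speedup at gap 1e-4:", speedup(t_base, t_method))
```

A method that never reaches the tolerance yields `None`, so a speedup at that tolerance is undefined for it.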

How to use

We provide partitiondata.sh to partition the data and store it in a distributed manner. Usage:

  ./partitiondata.sh [data file] [number] [host file]

LIBBLE-PS communicates via MPI, so MPI must be installed on the cluster. Download LIBBLE-PS and then build it:

  cd app/* && make

Run LIBBLE-PS by using Python:

  python *.py

Edit the Python file (*.py) to change the algorithm configuration.

Open Source

Development Team