Distributed Asynchronous Methods for Minimizing Sums of Non-Separable Functions

Recently, there has been a flurry of research activity for study of non-separable network optimization prob- lems. The problem formulation is very general and captures many important problems, such as parameter estimation in sensor networks and pattern recognition in machine learning settings. Due to the versatility of the problem formulation, fast distributed algorithms for this problem can provide drastic improvement in the system wide performance for many applications. The main challenge is to develop a distributed algorithm, which is necessary in many large scale applications. This is hard due to the sharing of the common variables and thus the optimization can not be easily decomposed.

Mathematically speaking, the optimization problem can be formally formulated as:

where each function fi is locally known to agent i, representing local objective cost. The objective functions are convex but not necessarily differentiable. For example, in a machine learning setting, each fi corresponds to a local data measurement. The agents are connected via an underlying network and each agent can only communicate with their neighbors in the network.

In one of our recent works, we have shown that under mild assumptions, local agents can compute their own estimates of the minimum and by locally communicating such estimates among neighbors, a consensus in the estimates is achieved and convergence to an (approximately) optimal solution is guaranteed. The algorithm proposed for this problem is iterative and is developed based on Alternating Direction Method of Multipliers (ADMM). All the computation can be done in an asynchronous way, meaning no central clock is required and at each time instance, only a subset of the agents is required to update. Under mild assumptions, we have shown that this method converges to the optimal solution with the best known rate O( 1 ), where k is the number of iterations

Another approach is to develop distributed algorithms that use first-order (gradient or subgradient) information, which is less computationally intensive to obtain. In particular, we designed a distributed proximal-gradient method that achieves the convergence rate of O( 1 ), taking advantage of both acceleration techniques in centralized methods, and multiple network communications steps between optimization com- putations. Among its applications, a particular example is distributed regularized regression, which allows networked agents, each with their own data, to collectively learn parameters of a model that describes the aggregate data, without actually aggregating or sharing data.

Related Files