Thursday, May 23, 2019

Parallel computing with R: an example with snow

The snow package makes it easy to run R functions in parallel. Most of its parallel execution functions are variations of the standard lapply() function, which makes snow fairly easy to learn. To implement these parallel operations, snow uses a master/worker architecture: the master sends tasks to the workers, and the workers execute them and return the results to the master.
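To make the lapply() analogy concrete, here is a minimal sketch that runs the same toy task serially and on a cluster. The square() function and the choice of 2 workers are assumptions made only for illustration; they are not part of the original example.

```r
library(snow)

# Start a small socket cluster; 2 workers is an arbitrary choice for this sketch
cl <- makeCluster(2, type = "SOCK")

# Hypothetical toy task, used only to compare the two call styles
square <- function(x) x^2

serial_res <- lapply(1:4, square)            # evaluated on the master
par_res    <- clusterApply(cl, 1:4, square)  # each element is sent to a worker

# Always release the worker processes when done
stopCluster(cl)

# Both calls return a list with one element per input
identical(serial_res, par_res)
```

The only difference in the call is the cluster object passed as the first argument; the result has the same shape as what lapply() returns.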


The basic cluster creation function is makeCluster(), which can create any of the cluster types that snow supports (this example uses a socket cluster, type = "SOCK"). snow includes several functions we could use to run the work, including clusterApply(), clusterApplyLB(), and parLapply(). For this example, we’ll use clusterApply(). You call it the same way as lapply(), except that it takes a snow cluster object as its first argument. The workers also need access to the “Boston” dataset from the MASS package, since they are the ones that compute on it; the code below loads the dataset on the master and copies it to the workers with clusterExport().
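An alternative to copying the data from the master is to load MASS on each worker with clusterEvalQ(), so that Boston is found there by name. The sketch below does that and uses clusterApplyLB(), the load-balancing variant, to show that the call shape stays the same. The 2 workers and the shortened list of three linkage methods are assumptions chosen to keep the example small.

```r
library(snow)

cl <- makeCluster(2, type = "SOCK")   # 2 workers chosen only for illustration

# Load MASS on every worker so the Boston dataset is available there
invisible(clusterEvalQ(cl, library(MASS)))

# Shortened list of linkage methods (assumption, to keep the sketch quick)
methods <- c("single", "complete", "average")

# Same call shape as lapply(), with the cluster object first;
# clusterApplyLB() hands out the next task to whichever worker is free
trees <- clusterApplyLB(cl, methods,
                        function(m) hclust(dist(Boston), method = m))

stopCluster(cl)

# Check which linkage method each returned tree was built with
sapply(trees, function(tr) tr$method)
```

With this approach the master never needs the dataset itself, which can matter when the data is large and every worker can load it locally.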

We’ll use snow.time() to gather timing information about the overall execution. We will also use snow.time()’s plotting capability to visualize the task execution on the workers. 

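# Snow example
library(snow)
# Boston ships with the MASS package; load it on the master
data("Boston", package = "MASS")

# Define the cluster: 7 socket workers, one per linkage method
cl <- makeCluster(7, type = "SOCK")
# Copy the Boston data frame from the master to every worker
clusterExport(cl, "Boston")

# Run the clustering with clusterApply(), one linkage method per task.
# snow.time() gathers timing information about the overall execution.
time <- snow.time(result <- clusterApply(cl, c("ward.D",
                                               "single",
                                               "complete",
                                               "average",
                                               "mcquitty",
                                               "median",
                                               "centroid"),
    function(n.methods) hclust(dist(Boston), n.methods)))
time
# Visualize the task execution on the workers
plot(time)
# Shut down the worker processes
stopCluster(cl)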
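For a point of comparison with the snow.time() figures, the same seven tasks can be timed serially with lapply() and system.time(). This is only a sketch of a baseline, not part of the original run; the names methods, serial_time, and serial_result are introduced here for illustration.

```r
library(MASS)   # provides the Boston dataset

methods <- c("ward.D", "single", "complete", "average",
             "mcquitty", "median", "centroid")

# Same task as in the parallel version, but evaluated one after another
# on the master, so no data is sent to or received from workers
serial_time <- system.time(
  serial_result <- lapply(methods,
                          function(m) hclust(dist(Boston), m))
)

# Wall-clock seconds for the whole serial run
serial_time["elapsed"]
```

When each task takes only a few hundredths of a second, the cost of sending and receiving data can be a large share of the parallel total, so a serial baseline helps judge whether the cluster is paying off.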

Results

> library(snow)

> data("Boston")

> cl<-makeCluster(7,type="SOCK")

> clusterExport(cl, "Boston")

> time<-snow.time(result<-clusterApply(cl,c("ward.D","single","complete", "average","mcquitty","median","centroid"),
+                                                         function(n.methods) hclust(dist(Boston), n.methods)))

> plot(time)

> stopCluster(cl)

> time
elapsed    send receive  node 1  node 2  node 3  node 4  node 5  node 6 
   0.07    0.03    0.02    0.04    0.05    0.06    0.06    0.03    0.03 
 node 7 
   0.03 


Reference

McCallum, Q. & Weston, S. (2012) Parallel R. O'Reilly.


