Throttle the number of concurrently executing processes via a buffered channel (Golang)
Intent:
I am looking for a means to run OS-level shell commands in parallel, but want to be careful not to clobber the CPU, and am wondering if a buffered channel would fit this use case.
Implemented:
Create a series of Jobs with a simulated runtime duration. Send these jobs to a queue which will dispatch them to run over a buffered channel, throttled by EXEC_THROTTLE.
Observations:
This 'works' (to the extent that it compiles and runs), but I am wondering if the buffer is working as specified (see: 'Intent') to throttle the number of processes running in parallel.
Disclaimer:
Now, I am aware that newbies tend to over-use channels, but I feel this request for insight is honest, as I've at least exercised the restraint to use a sync.WaitGroup. Forgive the somewhat toy example, but all insight would be appreciated.
package main
import (
// "os/exec"
"log"
"math/rand"
"strconv"
"sync"
"time"
)
const (
EXEC_THROTTLE = 2
)
type JobsManifest []Job
type Job struct {
cmd string
result string
runtime int // Simulate long-running task
}
func (j JobsManifest) queueJobs(logChan chan<- string, runChan chan Job, wg *sync.WaitGroup) {
go dispatch(logChan, runChan)
for _, job := range j {
wg.Add(1)
runChan <- job
}
}
func dispatch(logChan chan<- string, runChan chan Job) {
for j := range runChan {
go run(j, logChan)
}
}
func run(j Job, logChan chan<- string) {
time.Sleep(time.Second * time.Duration(j.runtime))
j.result = strconv.Itoa(rand.Intn(10)) // e.g. out, _ := exec.Command("/bin/bash", "-c", j.cmd).Output(); j.result = string(out)
logChan <- j.result
log.Printf(" ran: %s
", j.cmd)
}
func logger(logChan <-chan string, wg *sync.WaitGroup) {
for {
res := <-logChan
log.Printf("logged: %s
", res)
wg.Done()
}
}
func main() {
jobs := []Job{
Job{
cmd: "ps -p $(pgrep vim) | tail -n 1 | awk '{print $3}'",
runtime: 1,
},
Job{
cmd: "wc -l /var/log/foo.log | awk '{print $1}'",
runtime: 2,
},
Job{
cmd: "ls -l ~/go/src/github.com/ | wc -l | awk '{print $1}'",
runtime: 3,
},
Job{
cmd: "find /var/log/ -regextype posix-extended -regex '.*[0-9]{10}'",
runtime: 4,
},
}
var wg sync.WaitGroup
logChan := make(chan string)
runChan := make(chan Job, EXEC_THROTTLE)
go logger(logChan, &wg)
start := time.Now()
JobsManifest(jobs).queueJobs(logChan, runChan, &wg)
wg.Wait()
log.Printf("finish: %s
", time.Since(start))
}
If I understand you right, you want a mechanism that ensures at most EXEC_THROTTLE jobs are running at any time. If that is your intention, the code does not work.
It does not work because by the time a job starts running, it has already been received from the channel, which frees a buffer slot and allows another job to be sent, even though no job has finished. You can verify this by adding a counter of running jobs (you'll need an atomic add or a mutex).
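A minimal sketch of such a counter, using sync/atomic (the names running, runChan, and throttle are invented for illustration). Watching the logged count climb above the buffer size demonstrates that the buffer is not limiting execution:

package main

import (
	"log"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const throttle = 2
	var running int64 // number of jobs currently executing
	var wg sync.WaitGroup
	runChan := make(chan int, throttle) // buffered, as in the question

	// The dispatcher mirrors the question: it spawns a goroutine per job.
	go func() {
		for j := range runChan {
			go func(j int) {
				defer wg.Done()
				n := atomic.AddInt64(&running, 1)
				defer atomic.AddInt64(&running, -1)
				log.Printf("job %d started, %d running", j, n) // climbs above throttle
				time.Sleep(100 * time.Millisecond)
			}(j)
		}
	}()

	for j := 0; j < 8; j++ {
		wg.Add(1)
		runChan <- j
	}
	close(runChan)
	wg.Wait()
}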
You can do this instead by starting a fixed group of goroutines that share an unbuffered channel and block while executing jobs:
type Result struct{} // placeholder for your job's result type

var wg sync.WaitGroup

func Run(j Job) Result {
	// Run your job here and return its result.
	return Result{}
}

func Dispatch(ch chan Job) {
	for j := range ch {
		Run(j)
		wg.Done()
	}
}

func main() {
	ch := make(chan Job)
	for i := 0; i < EXEC_THROTTLE; i++ {
		go Dispatch(ch)
	}
	// Send jobs according to your queue here, calling wg.Add(1) before each
	// ch <- job (Add on the sender side, so Wait cannot return early),
	// then close(ch) and wg.Wait().
}
This works because a goroutine is only ready to receive from the channel when it is not running a job, so at the moment a receive happens at most EXEC_THROTTLE-1 jobs are running and it is safe to execute one more, which is exactly what it does.
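For concreteness, here is a self-contained, runnable version of this worker-pool pattern using the question's Job shape (a sketch; the sample jobs are invented):

package main

import (
	"log"
	"sync"
	"time"
)

const EXEC_THROTTLE = 2

type Job struct {
	cmd     string
	runtime int
}

func worker(ch <-chan Job, wg *sync.WaitGroup) {
	for j := range ch {
		time.Sleep(time.Second * time.Duration(j.runtime)) // simulate the command
		log.Printf("ran: %s", j.cmd)
		wg.Done()
	}
}

func main() {
	jobs := []Job{
		{cmd: "job A", runtime: 1},
		{cmd: "job B", runtime: 2},
		{cmd: "job C", runtime: 1},
		{cmd: "job D", runtime: 1},
	}
	var wg sync.WaitGroup
	ch := make(chan Job) // unbuffered: a send succeeds only when a worker is free
	for i := 0; i < EXEC_THROTTLE; i++ {
		go worker(ch, &wg)
	}
	for _, job := range jobs {
		wg.Add(1) // Add before the send so Wait cannot return early
		ch <- job
	}
	close(ch) // lets the workers' range loops exit
	wg.Wait()
	log.Println("all jobs finished")
}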
I use this a lot. https://github.com/dustinevan/go-utils
package async
import (
"context"
"github.com/pkg/errors"
)
type Semaphore struct {
buf chan struct{}
ctx context.Context
cancel context.CancelFunc
}
func NewSemaphore(max int, parentCtx context.Context) *Semaphore {
	ctx, cancel := context.WithCancel(parentCtx) // populate cancel so the semaphore can be closed independently of the parent
	s := &Semaphore{
		buf:    make(chan struct{}, max),
		ctx:    ctx,
		cancel: cancel,
	}
	go func() {
		<-s.ctx.Done()
		close(s.buf)
		drainStruct(s.buf)
	}()
	return s
}

// drainStruct empties any remaining tokens after close
// (assumed helper; its definition is not shown in the original snippet).
func drainStruct(ch chan struct{}) {
	for range ch {
	}
}
var CLOSED = errors.New("the semaphore has been closed")
func (s *Semaphore) Acquire() error {
select {
case <-s.ctx.Done():
return CLOSED
case s.buf <- struct{}{}:
return nil
}
}
func (s *Semaphore) Release() {
<-s.buf
}
you'd use it like this:
func main() {
	sem := async.NewSemaphore(10, context.Background())
	...
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1) // Add before launching so Wait cannot return early
		job := job // capture the loop variable
		go func() {
			defer wg.Done()
			if err := sem.Acquire(); err != nil {
				return // the semaphore was closed; handle the error as needed
			}
			defer sem.Release()
			job()
		}()
	}
	wg.Wait()
}
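Note that the select in Acquire is what makes cancellation work: once the parent context is done, goroutines waiting for a slot get CLOSED back instead of blocking forever.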
You can also cap concurrency with a buffered channel:
concurrencyLimit := 2 // Number of simultaneous jobs.
limiter := make(chan struct{}, concurrencyLimit)
for job := range jobs {
job := job // Pin loop variable.
limiter <- struct{}{} // Reserve limiter slot.
go func() {
defer func() {
<-limiter // Free limiter slot.
}()
do(job) // Do the job.
}()
}
// Wait for goroutines to finish by filling full channel.
for i := 0; i < cap(limiter); i++ {
limiter <- struct{}{}
}
Replace the processItem function with the actual execution of your job.
The code below will start jobs in their submission order. At most EXEC_CONCURRENT items will execute concurrently.
package main
import (
"fmt"
"sync"
"time"
)
func processItem(i int, done chan int, wg *sync.WaitGroup) {
fmt.Printf("Async Start: %d
", i)
time.Sleep(100 * time.Millisecond * time.Duration(i))
fmt.Printf("Async Complete: %d
", i)
done <- 1
wg.Done()
}
func popItemFromBufferChannelWhenItemDoneExecuting(items chan int, done chan int) {
<-done
<-items
}
func main() {
EXEC_CONCURRENT := 3
items := make(chan int, EXEC_CONCURRENT)
done := make(chan int)
var wg sync.WaitGroup
for i:= 1; i < 11; i++ {
items <- i
wg.Add(1)
go processItem(i, done, &wg)
go popItemFromBufferChannelWhenItemDoneExecuting(items, done)
}
wg.Wait()
}
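Jobs are launched in submission order here because the blocking send items <- i happens in the main goroutine before each goroutine is started; a new job can only be launched once an earlier one has freed a slot.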
The code below will execute jobs in random order. At most EXEC_CONCURRENT items will execute concurrently.
package main
import (
"fmt"
"sync"
"time"
)
func processItem(i int, items chan int, wg *sync.WaitGroup) {
items <- i
fmt.Printf("Async Start: %d
", i)
time.Sleep(100 * time.Millisecond * time.Duration(i))
fmt.Printf("Async Complete: %d
", i)
_ = <- items
wg.Done()
}
func main() {
EXEC_CONCURRENT := 3
items := make(chan int, EXEC_CONCURRENT)
var wg sync.WaitGroup
for i:= 1; i < 11; i++ {
wg.Add(1)
go processItem(i, items, &wg)
}
wg.Wait()
}
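In this variant the order is random because all the goroutines are launched up front and race to reserve a slot with items <- i; whichever goroutines win the race run first.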
You can choose whichever variant fits your requirements.