Throttle the number of concurrently executing processes via a buffered channel (Golang)
Intent:
I am looking for a means to run OS-level shell commands in parallel, but want to be careful not to clobber the CPU, and am wondering if a buffered channel would fit this use case.
Implemented:
Create a series of Jobs with a simulated runtime duration. Send these jobs to a queue which will dispatch them to run over a buffered channel, throttled by EXEC_THROTTLE.
Observations:
This 'works' (to the extent that it compiles and runs), but I am wondering if the buffer is working as specified (see: 'Intent') to throttle the number of processes running in parallel.
Disclaimer:
Now, I am aware that newbies tend to over-use channels, but I feel this request for insight is honest, as I've at least exercised the restraint to use a sync.WaitGroup. Forgive the somewhat toy example, but all insight would be appreciated.
package main
import (
// "os/exec"
"log"
"math/rand"
"strconv"
"sync"
"time"
)
const (
EXEC_THROTTLE = 2
)
type JobsManifest []Job
type Job struct {
cmd string
result string
runtime int // Simulate long-running task
}
func (j JobsManifest) queueJobs(logChan chan<- string, runChan chan Job, wg *sync.WaitGroup) {
go dispatch(logChan, runChan)
for _, job := range j {
wg.Add(1)
runChan <- job
}
}
func dispatch(logChan chan<- string, runChan chan Job) {
for j := range runChan {
go run(j, logChan)
}
}
func run(j Job, logChan chan<- string) {
time.Sleep(time.Second * time.Duration(j.runtime))
j.result = strconv.Itoa(rand.Intn(10)) // e.g. out, _ := exec.Command("/bin/bash", "-c", j.cmd).Output(); j.result = string(out)
logChan <- j.result
log.Printf(" ran: %s
", j.cmd)
}
func logger(logChan <-chan string, wg *sync.WaitGroup) {
for {
res := <-logChan
log.Printf("logged: %s
", res)
wg.Done()
}
}
func main() {
jobs := []Job{
Job{
cmd: "ps -p $(pgrep vim) | tail -n 1 | awk '{print $3}'",
runtime: 1,
},
Job{
cmd: "wc -l /var/log/foo.log | awk '{print $1}'",
runtime: 2,
},
Job{
cmd: "ls -l ~/go/src/github.com/ | wc -l | awk '{print $1}'",
runtime: 3,
},
Job{
cmd: "find /var/log/ -regextype posix-extended -regex '.*[0-9]{10}'",
runtime: 4,
},
}
var wg sync.WaitGroup
logChan := make(chan string)
runChan := make(chan Job, EXEC_THROTTLE)
go logger(logChan, &wg)
start := time.Now()
JobsManifest(jobs).queueJobs(logChan, runChan, &wg)
wg.Wait()
log.Printf("finish: %s
", time.Since(start))
}
If I understand you right, you want a mechanism that ensures at most EXEC_THROTTLE jobs are running at any time. If that is your intention, the code does not work.
It does not work because by the time a job starts running, it has already been received from the channel, which frees a buffer slot and allows another job to be sent, even though no job has finished. You can verify this by adding a counter of running jobs (you'll need an atomic add or a mutex).
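A minimal sketch of such a counter, using sync/atomic (the names running, runChan, and throttle are invented for illustration). Watching the logged count climb above the buffer size demonstrates that the buffer is not limiting execution:

package main

import (
	"log"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const throttle = 2
	var running int64 // number of jobs currently executing
	var wg sync.WaitGroup
	runChan := make(chan int, throttle) // buffered, as in the question

	// The dispatcher mirrors the question: it spawns a goroutine per job.
	go func() {
		for j := range runChan {
			go func(j int) {
				defer wg.Done()
				n := atomic.AddInt64(&running, 1)
				defer atomic.AddInt64(&running, -1)
				log.Printf("job %d started, %d running", j, n) // climbs above throttle
				time.Sleep(100 * time.Millisecond)
			}(j)
		}
	}()

	for j := 0; j < 8; j++ {
		wg.Add(1)
		runChan <- j
	}
	close(runChan)
	wg.Wait()
}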
You can do this instead by starting a fixed group of goroutines that share an unbuffered channel and block while executing jobs:
type Result struct{} // placeholder for your job's result type

var wg sync.WaitGroup

func Run(j Job) Result {
	// Run your job here and return its result.
	return Result{}
}

func Dispatch(ch chan Job) {
	for j := range ch {
		Run(j)
		wg.Done()
	}
}

func main() {
	ch := make(chan Job)
	for i := 0; i < EXEC_THROTTLE; i++ {
		go Dispatch(ch)
	}
	// Send jobs according to your queue here, calling wg.Add(1) before each
	// ch <- job (Add on the sender side, so Wait cannot return early),
	// then close(ch) and wg.Wait().
}
This works because a goroutine is only ready to receive from the channel when it is not running a job, so at the moment a receive happens at most EXEC_THROTTLE-1 jobs are running and it is safe to execute one more, which is exactly what it does.
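For concreteness, here is a self-contained, runnable version of this worker-pool pattern using the question's Job shape (a sketch; the sample jobs are invented):

package main

import (
	"log"
	"sync"
	"time"
)

const EXEC_THROTTLE = 2

type Job struct {
	cmd     string
	runtime int
}

func worker(ch <-chan Job, wg *sync.WaitGroup) {
	for j := range ch {
		time.Sleep(time.Second * time.Duration(j.runtime)) // simulate the command
		log.Printf("ran: %s", j.cmd)
		wg.Done()
	}
}

func main() {
	jobs := []Job{
		{cmd: "job A", runtime: 1},
		{cmd: "job B", runtime: 2},
		{cmd: "job C", runtime: 1},
		{cmd: "job D", runtime: 1},
	}
	var wg sync.WaitGroup
	ch := make(chan Job) // unbuffered: a send succeeds only when a worker is free
	for i := 0; i < EXEC_THROTTLE; i++ {
		go worker(ch, &wg)
	}
	for _, job := range jobs {
		wg.Add(1) // Add before the send so Wait cannot return early
		ch <- job
	}
	close(ch) // lets the workers' range loops exit
	wg.Wait()
	log.Println("all jobs finished")
}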
I use this a lot. https://github.com/dustinevan/go-utils
package async
import (
"context"
"github.com/pkg/errors"
)
type Semaphore struct {
buf chan struct{}
ctx context.Context
cancel context.CancelFunc
}
func NewSemaphore(max int, parentCtx context.Context) *Semaphore {
	ctx, cancel := context.WithCancel(parentCtx) // populate cancel so the semaphore can be closed independently of the parent
	s := &Semaphore{
		buf:    make(chan struct{}, max),
		ctx:    ctx,
		cancel: cancel,
	}
	go func() {
		<-s.ctx.Done()
		close(s.buf)
		drainStruct(s.buf)
	}()
	return s
}

// drainStruct empties any remaining tokens after close
// (assumed helper; its definition is not shown in the original snippet).
func drainStruct(ch chan struct{}) {
	for range ch {
	}
}
var CLOSED = errors.New("the semaphore has been closed")
func (s *Semaphore) Acquire() error {
select {
case <-s.ctx.Done():
return CLOSED
case s.buf <- struct{}{}:
return nil
}
}
func (s *Semaphore) Release() {
<-s.buf
}
you'd use it like this:
func main() {
	sem := async.NewSemaphore(10, context.Background())
	...
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1) // Add before launching so Wait cannot return early
		job := job // capture the loop variable
		go func() {
			defer wg.Done()
			if err := sem.Acquire(); err != nil {
				return // the semaphore was closed; handle the error as needed
			}
			defer sem.Release()
			job()
		}()
	}
	wg.Wait()
}
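Note that the select in Acquire is what makes cancellation work: once the parent context is done, goroutines waiting for a slot get CLOSED back instead of blocking forever.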
You can also cap concurrency with a buffered channel:
concurrencyLimit := 2 // Number of simultaneous jobs.
limiter := make(chan struct{}, concurrencyLimit)
for job := range jobs {
job := job // Pin loop variable.
limiter <- struct{}{} // Reserve limiter slot.
go func() {
defer func() {
<-limiter // Free limiter slot.
}()
do(job) // Do the job.
}()
}
// Wait for goroutines to finish by filling full channel.
for i := 0; i < cap(limiter); i++ {
limiter <- struct{}{}
}
Replace the processItem function with the actual execution of your job.
The code below will start jobs in their submission order. At most EXEC_CONCURRENT items will execute concurrently.
package main
import (
"fmt"
"sync"
"time"
)
func processItem(i int, done chan int, wg *sync.WaitGroup) {
fmt.Printf("Async Start: %d
", i)
time.Sleep(100 * time.Millisecond * time.Duration(i))
fmt.Printf("Async Complete: %d
", i)
done <- 1
wg.Done()
}
func popItemFromBufferChannelWhenItemDoneExecuting(items chan int, done chan int) {
<-done
<-items
}
func main() {
EXEC_CONCURRENT := 3
items := make(chan int, EXEC_CONCURRENT)
done := make(chan int)
var wg sync.WaitGroup
for i:= 1; i < 11; i++ {
items <- i
wg.Add(1)
go processItem(i, done, &wg)
go popItemFromBufferChannelWhenItemDoneExecuting(items, done)
}
wg.Wait()
}
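Jobs are launched in submission order here because the blocking send items <- i happens in the main goroutine before each goroutine is started; a new job can only be launched once an earlier one has freed a slot.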
The code below will execute jobs in random order. At most EXEC_CONCURRENT items will execute concurrently.
package main
import (
"fmt"
"sync"
"time"
)
func processItem(i int, items chan int, wg *sync.WaitGroup) {
items <- i
fmt.Printf("Async Start: %d
", i)
time.Sleep(100 * time.Millisecond * time.Duration(i))
fmt.Printf("Async Complete: %d
", i)
_ = <- items
wg.Done()
}
func main() {
EXEC_CONCURRENT := 3
items := make(chan int, EXEC_CONCURRENT)
var wg sync.WaitGroup
for i:= 1; i < 11; i++ {
wg.Add(1)
go processItem(i, items, &wg)
}
wg.Wait()
}
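In this variant the order is random because all the goroutines are launched up front and race to reserve a slot with items <- i; whichever goroutines win the race run first.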
You can choose whichever variant fits your requirements.