I’ve always been fascinated by how real-world systems can be structured in code and how they behave under different conditions. Depending on the context, we can use various techniques to tackle this problem. In this blog post, I’ll focus on discrete event simulation and use discrete models to build and test a realistic simulation.
Before diving deeper, I’d like to give a shoutout to Serge Toro, whose excellent blog post on building simulations in Golang served as a foundational guide for this project.
To set the stage, imagine this scenario: you’re running an API on a small data center with 30 single-core servers, preparing to handle around 10,000 requests for a product launch. Each request takes approximately 3 seconds to process, and there’s a 5% chance of any server going down at a given time. How long will it take to handle all 10,000 requests?
We can answer this question by combining data structures, a little probability and statistics, and some computer systems knowledge to model and analyze the situation!
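Before writing any code, it helps to sanity-check the numbers. Ignoring failures, 30 servers working in parallel at roughly 3 seconds per request need about 10,000 × 3 / 30 ≈ 1,000 seconds, a little under 17 minutes. The point of the simulation is to see how much the 5% failure rate, and the time spent waiting on repairs, stretches that lower bound.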
First, let's define the main structs the simulation needs. The first struct represents an event and holds the event's time and type. The second is a priority queue that keeps events ordered by time. The third holds the simulation's state: the event queue, the total job count, the number of processing units, the probability of a processing unit failing, the time it takes to repair a processing unit, and the time it takes to process a job.
We can alter these parameters to simulate different scenarios and, in turn, better understand the system.
package main
import (
"container/heap"
"fmt"
"math/rand"
"time"
)
// event struct to represent processing and repair events
type Event struct {
time int // event timestamp
etype string // can be done or repair
}
// A PriorityQueue implements heap.Interface and holds pointers to Events
type PriorityQueue []*Event
// simulation struct
type Simulation struct {
// event queue implemented as a heap so the earliest event is always processed first
eventQ *PriorityQueue
// total number of jobs we need to process
jobCount int
// number of parallel processing units we have
processors int
// probability of the processing unit failing/breaking
breakChance float32
// amount of time it takes to get the processing unit fixed and up and running
repairTime int
// latency for serving requests
processingTime int
}
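For example, the launch-day scenario from the introduction could be expressed with a constructor along these lines. This is just a sketch: buildLaunchSim is a hypothetical helper, and the repair time is an assumption, since the scenario only specifies the failure probability.
// buildLaunchSim models the launch-day scenario: 30 single-core servers,
// 10,000 requests, ~3 time units per request, 5% break chance.
// repairTime is an assumed value; it isn't given in the original scenario.
func buildLaunchSim() Simulation {
	return Simulation{
		eventQ:         &PriorityQueue{},
		jobCount:       10000,
		processors:     30,
		breakChance:    0.05,
		repairTime:     10, // assumption
		processingTime: 3,
	}
}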
The PriorityQueue methods below are required to satisfy the heap.Interface contract. Less compares two events and Swap exchanges two events in the queue; an event that occurs earlier in time is considered less than one that occurs later.
// PriorityQueue method implementations required by heap.Interface
func (pq PriorityQueue) Len() int { return len(pq) }
func (pq PriorityQueue) Less(i, j int) bool { return pq[i].time < pq[j].time }
func (pq PriorityQueue) Swap(i, j int) { pq[i], pq[j] = pq[j], pq[i] }
func (pq *PriorityQueue) Push(x any) {
*pq = append(*pq, x.(*Event))
}
func (pq *PriorityQueue) Pop() any {
if len(*pq) == 0 {
return nil
}
old := *pq
n := len(old)
item := old[n-1]
*pq = old[0 : n-1]
return item
}
func (pq *PriorityQueue) Peek() (*Event, error) {
if len(*pq) == 0 {
return nil, fmt.Errorf("priority queue is empty")
}
return (*pq)[0], nil
}
func NewPQ() *PriorityQueue {
return &PriorityQueue{}
}
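To see the queue in action, here is a small usage sketch (not part of the simulation itself): events pushed in arbitrary order come back out earliest-first when managed through the heap package.
// pqExample pushes three events out of order and pops them back in time order
func pqExample() {
	pq := NewPQ()
	heap.Push(pq, &Event{time: 7, etype: "DONE"})
	heap.Push(pq, &Event{time: 2, etype: "REPAIR"})
	heap.Push(pq, &Event{time: 5, etype: "DONE"})
	for pq.Len() > 0 {
		e := heap.Pop(pq).(*Event)
		fmt.Println(e.time, e.etype) // prints times 2, 5, 7 in that order
	}
}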
The simulation loop is the main driver of the simulation. It receives events from the event channel and updates the simulation's state based on the type of each event, running until every job has been processed. Each processor tracks its own elapsed time in a currentTime variable, which is stamped onto each event and used by the central event loop to update the performance metrics.
// func to run the sim and return the times at which 50%, 95%, and 99% of jobs are done
func (s *Simulation) run(seed int64) (percentileTime [3]int) {
	eventChannel := make(chan Event)
	doneChannel := make(chan struct{})
	// create a goroutine for each processing unit; each gets its own random
	// source, since a single rand.Rand is not safe for concurrent use
	for i := 0; i < s.processors; i++ {
		go func(id int) {
			rgen := rand.New(rand.NewSource(seed + int64(id)))
			currentTime := 0 // track cumulative time for this processor
			for {
				// dice roll to see if the pu breaks
				if rgen.Float32() < s.breakChance {
					repairTime := s.repairTime + int(rgen.NormFloat64()*float64(s.repairTime)/4)
					eventChannel <- Event{etype: "REPAIR", time: currentTime + repairTime}
					currentTime += repairTime
					time.Sleep(time.Duration(repairTime) * time.Millisecond)
				} else {
					// otherwise, process a job and send a DONE event on the event channel
					processingTime := s.processingTime + int(rgen.NormFloat64()*float64(s.processingTime)/4)
					eventChannel <- Event{etype: "DONE", time: currentTime + processingTime}
					currentTime += processingTime
					time.Sleep(time.Duration(processingTime) * time.Millisecond)
				}
			}
		}(i)
	}
	// central event loop, listens on eventChannel until every job is done
	go func() {
		for done := 0; done < s.jobCount; {
			// receive a message from the event channel
			event := <-eventChannel
			if event.etype == "REPAIR" {
				// a repair only delays the processor that broke; no job finished
			} else if event.etype == "DONE" {
				done++
				// calculate the percentage of jobs done
				percentDone := 100 * done / s.jobCount
				percentiles := []int{50, 95, 99}
				for idx, p := range percentiles {
					if percentDone >= p && percentileTime[idx] == 0 {
						percentileTime[idx] = event.time // use event.time as the cumulative time
					}
				}
			}
		}
		doneChannel <- struct{}{}
	}()
	<-doneChannel
	return
}
The main function initializes the simulation with all the required parameters, pushes an initial event onto the event queue, and runs the simulation until every job has been processed.
At the end, the main function prints the times at which 50%, 95%, and 99% of the jobs were finished (these thresholds can also be tweaked).
func main() {
pq := make(PriorityQueue, 0)
heap.Init(&pq)
// simulate a gas station serving X cars with Y pumps
SIM := Simulation{
eventQ: &pq,
jobCount: 1000,
processors: 12,
breakChance: 0.05, // probability of a unit breaking (5%)
repairTime: 10, // min
processingTime: 5, // min
}
// Add initial event to start the simulation
heap.Push(SIM.eventQ, &Event{
time: 0,
etype: "DONE",
})
res := SIM.run(100)
// print out results
fmt.Printf("Time to finish: 50%% = %d, 95%% = %d, 99%% = %d\n", res[0], res[1], res[2])
fmt.Printf("Relative time to X%%: 50%% = 1x, 95%% = %.2fx, 99%% = %.2fx\n", float32(res[1])/float32(res[0]), float32(res[2])/float32(res[0]))
}
If you’re new to Goroutines and channels in Go, here’s a quick overview:
Goroutines are lightweight threads that allow code to execute concurrently. Channels act as queues that Goroutines use to communicate and share information.
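If that still feels abstract, here is a minimal, self-contained sketch (separate from the simulation) of one goroutine sending values over a channel while another receives them:
// channelExample starts a goroutine that sends two messages on a channel;
// the receiving loop blocks until each value arrives, then prints it
func channelExample() {
	messages := make(chan string)
	go func() {
		messages <- "hello"
		messages <- "world"
		close(messages) // closing the channel ends the range loop below
	}()
	for msg := range messages {
		fmt.Println(msg)
	}
}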
In this simulation, I use two channels: one carries the simulation's events (DONE and REPAIR), and the other signals that all jobs have finished. Each processing unit runs in its own Goroutine because the units operate concurrently, simulating real-world processes.
Here’s how it works:
For each processor, I generate a random number. If it’s less than the break chance, a repair event is pushed to the event channel. Otherwise, the processor handles the task and pushes a "done" event instead.
Repair and processing times are non-deterministic in real life, so I incorporate randomness into the simulation: each repair or processing time is the configured mean plus some random noise.
To help visualize this, think of a gas station. The time it takes to fill a gas tank varies because cars are different: there's an average time (say X), but random factors can speed up or slow down the process. To model this variability, I add normally distributed noise with a standard deviation equal to one-quarter of the mean, truncated to whole time units.
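In code, that sampling can be captured in a small helper along these lines. sampleDuration is a hypothetical function (the simulation above does the same arithmetic inline), with an extra guard so a sample can never go negative:
// sampleDuration is a hypothetical helper showing how the stochastic times are
// drawn: mean plus Normal(0, mean/4) noise, clamped so it never goes negative
func sampleDuration(rgen *rand.Rand, mean int) int {
	d := mean + int(rgen.NormFloat64()*float64(mean)/4)
	if d < 0 {
		d = 0
	}
	return d
}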
At the core of the simulation is the central event loop, which runs inside its own Goroutine, listens for events on the event channel, and processes them accordingly. If the event type is DONE, we calculate the percentage of jobs finished using the cumulative time (carried on the event via the currentTime variable in the processor's Goroutine) and update the percentiles accordingly.
Once all jobs are processed, an empty struct is sent to the "done" channel to signal that the simulation is over.
I’ve always found it difficult to wrap my head around abstract concepts, especially when I can’t physically see or interact with them. In school, I really struggled with learning computer systems and concurrency because so much of it felt like pure theory. That’s why I created this project—to apply these ideas in a way that made them click for me.
If you’re struggling with a concept, try building something around it! Start small, and don’t hesitate to find inspiration from others who’ve tackled similar problems. Be intentional about how you learn and what you build, and focus on making incremental progress every day.