The NUMA Conundrum

Before we jump into the nitty-gritty of scheduler tuning, let's set the stage. Non-Uniform Memory Access (NUMA) architectures have become the norm in modern server hardware. But here's the kicker: many of us are still developing and deploying our Go microservices as if we're working with uniform memory access. It's like trying to fit a square peg in a round hole – it might work, but it's far from optimal.

Why NUMA Matters for Go Microservices

Go's runtime is pretty smart, but it's not omniscient. When it comes to NUMA awareness, it needs a little help from us mere mortals. Here's why NUMA-awareness is crucial for your Go microservices:

  • Memory access latency can vary significantly between local and remote NUMA nodes
  • Improper thread and memory placement can lead to performance degradation
  • Go's garbage collector performance can be impacted by NUMA effects

Ignoring NUMA in your Go microservices is like ignoring the existence of traffic when planning a road trip. Sure, you might reach your destination, but the journey will be far from smooth.

Enter the Completely Fair Scheduler (CFS)

Now, let's talk about our main character: the Completely Fair Scheduler. Despite its name, CFS isn't always completely fair when it comes to NUMA systems. But with a bit of tuning, we can make it work wonders for our Go microservices.

CFS: The Good, The Bad, and The NUMA-Ugly

CFS is designed to be, well, fair. It tries to give each process an equal share of CPU time. But in a NUMA world, fairness isn't always what we want. Sometimes, we need to be a bit unfair to achieve optimal performance. Here's a quick rundown:

  • The Good: CFS provides good overall system responsiveness and fairness
  • The Bad: It can lead to unnecessary task migrations between NUMA nodes
  • The NUMA-Ugly: Without proper tuning, it can cause increased memory access latency for Go microservices

Tuning CFS for NUMA-Aware Go Microservices

Alright, time to roll up our sleeves and get our hands dirty with some scheduler tuning. Here are the key areas we'll focus on:

1. Adjusting Scheduling Domains

Scheduling domains define how the scheduler views the system topology. By tweaking these, we can make CFS more NUMA-aware:


# Check current scheduling domains (on newer kernels, roughly 5.13+,
# these files live under /sys/kernel/debug/sched/domains/ instead)
cat /proc/sys/kernel/sched_domain/cpu0/domain*/name

# Enable automatic NUMA balancing
echo 1 > /proc/sys/kernel/numa_balancing

With automatic NUMA balancing enabled, the kernel periodically samples which nodes a task's memory lives on and migrates tasks and pages toward each other, so work tends to stay near its data instead of bouncing across the machine.
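
Either way, it pays to confirm what the topology actually looks like before and after tuning. Two stock tools make that painless:


# Show NUMA nodes, the CPUs they own, and inter-node distances
numactl --hardware

# One-line summary of the NUMA layout
lscpu | grep -i numa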

2. Fine-tuning sched_migration_cost_ns

This parameter tells the scheduler how long to consider a task "cache hot" after it last ran (the default is 500000 ns); the hotter a task, the less willing the scheduler is to migrate it. For NUMA systems running Go microservices, we often want to increase this value:


# Check current value
cat /proc/sys/kernel/sched_migration_cost_ns

# Increase the value (e.g., to 1000000 nanoseconds)
echo 1000000 > /proc/sys/kernel/sched_migration_cost_ns
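
# On newer kernels (roughly 5.13 and later) this tunable has moved
# into debugfs:
cat /sys/kernel/debug/sched/migration_cost_ns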

With a higher migration cost, the scheduler holds tasks on their current CPU for longer. On a NUMA box, that means fewer cross-node moves and fewer of the remote memory accesses that come with them.

3. Leveraging cgroups for NUMA-Aware Resource Allocation

Control groups (cgroups) can be a powerful tool for enforcing NUMA-aware resource allocation. Here's a simple example of how to use cgroups to pin a Go microservice to a specific NUMA node:


# Create a cgroup for our Go microservice (cgroup v1 layout shown;
# on cgroup v2 the cpuset files live under the unified hierarchy)
mkdir /sys/fs/cgroup/cpuset/go_microservice

# Assign CPUs and memory nodes (this assumes CPUs 0-3 belong to
# node 0; verify with numactl --hardware)
echo "0-3" > /sys/fs/cgroup/cpuset/go_microservice/cpuset.cpus
echo "0" > /sys/fs/cgroup/cpuset/go_microservice/cpuset.mems

# Run the Go microservice within this cgroup
cgexec -g cpuset:go_microservice ./my_go_microservice

This ensures that our Go microservice only uses CPUs and memory from a single NUMA node, reducing cross-node memory access.
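
If you don't need a persistent cgroup, numactl achieves the same pinning in a single line:


# Bind both CPU placement and memory allocation to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./my_go_microservice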

The Go Runtime: Your NUMA-Aware Ally

While we're focusing on scheduler tuning, let's not forget that Go's runtime can be our ally in the quest for NUMA awareness. Here are a couple of Go-specific tips:

1. GOGC and NUMA

The GOGC environment variable controls how aggressively Go's garbage collector runs: it's the percentage of new heap growth, relative to the live heap, allowed before the next collection, and it defaults to 100. On NUMA systems, you might raise it to reduce the frequency of collections:


export GOGC=200

With GOGC=200, the runtime allows twice as much heap growth between collections, so the garbage collector runs less often. Since a collection walks memory across the whole heap, fewer runs can mean fewer bursts of cross-node memory traffic.
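
If you'd rather set this from inside the service (to flip it per deployment environment, say), runtime/debug exposes the same knob programmatically. A minimal sketch:


package main

import "runtime/debug"

func main() {
    // Equivalent to GOGC=200: allow 200% heap growth over the live
    // set before the next collection. Returns the previous setting.
    prev := debug.SetGCPercent(200)
    _ = prev
    // ... start the service ...
}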

2. Leveraging runtime.NumCPU()

When writing Go code for NUMA systems, be mindful of how you're using goroutines. Here's a simple example of how to create a NUMA-aware worker pool:


import "runtime"

func createNUMAAwareWorkerPool() {
    numCPU := runtime.NumCPU()
    for i := 0; i < numCPU; i++ {
        go worker(i)
    }
}

func worker(id int) {
    runtime.LockOSThread()
    // Worker logic here
}

By sizing the pool with runtime.NumCPU() (which reflects the process's CPU affinity mask, so it sees only the CPUs your cpuset grants) and pinning each worker to an OS thread with runtime.LockOSThread(), we give the kernel stable threads to place. Note that LockOSThread alone doesn't bind a thread to a NUMA node; it just makes thread-level affinity stick.
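
To go one step further and actually bind each worker's thread to a CPU, you can pair LockOSThread with sched_setaffinity(2) via golang.org/x/sys/unix. A Linux-only sketch (the CPU numbering here is machine-specific, so map CPUs to nodes with numactl --hardware first):


package main

import (
    "runtime"

    "golang.org/x/sys/unix"
)

// pinToCPU locks the calling goroutine to its OS thread, then binds
// that thread to a single logical CPU. Call it at the top of a worker.
func pinToCPU(cpu int) error {
    runtime.LockOSThread()
    var set unix.CPUSet
    set.Zero()
    set.Set(cpu)
    // A pid of 0 means "the calling thread".
    return unix.SchedSetaffinity(0, &set)
}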

Measuring the Impact

All this tuning is great, but how do we know if it's actually making a difference? Here are some tools and metrics to keep an eye on:

  • numastat: Provides NUMA memory statistics
  • perf: Can be used to measure cache misses and memory access patterns
  • Go's built-in profiling: Use runtime/pprof to profile your application before and after tuning

Here's a quick example of how to use numastat to check NUMA memory usage:


numastat -p $(pgrep my_go_microservice)

Look for imbalances in memory allocation across NUMA nodes. For access-level counters, run numastat without arguments and watch the numa_miss and numa_foreign columns; if those keep climbing, your tuning might need some adjustment.
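
perf can corroborate this from the CPU side. Many machines expose generic node-level cache events (run perf list to confirm yours does), in which case something like this counts local-versus-remote memory loads:


# Sample node-local loads and remote-node load misses for 10 seconds
perf stat -e node-loads,node-load-misses \
    -p $(pgrep my_go_microservice) -- sleep 10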

Pitfalls and Gotchas

Before you go off and start tuning every system in sight, a word of caution:

  • Over-tuning can lead to resource underutilization
  • What works for one Go microservice might not work for another
  • Scheduler tuning can interact in complex ways with Go's runtime behavior

Always measure, test, and validate your changes in a controlled environment before pushing to production. Remember, with great power comes great responsibility (and potentially great headaches if you're not careful).

Wrapping Up: The Art of Balance

Tuning the Completely Fair Scheduler for NUMA-aware Go microservices is truly an art form. It's about finding the right balance between fairness, performance, and resource utilization. Here are the key takeaways:

  • Understand your hardware: NUMA architecture matters
  • Tune CFS parameters with NUMA in mind
  • Leverage cgroups for fine-grained control
  • Work with Go's runtime, not against it
  • Always measure and validate your tuning efforts

Remember, the goal isn't to create a perfectly NUMA-aware system (which is practically impossible), but to find the sweet spot where your Go microservices perform at their best within the constraints of your NUMA architecture.

So, the next time someone says, "It's just a scheduler, how complex could it be?" you can smile knowingly and point them to this article. Happy tuning, and may your Go microservices forever run smoothly across NUMA nodes!

"In the world of NUMA-aware Go microservices, the scheduler is not just a referee – it's the choreographer of a complex dance between code and hardware."

Got any war stories about scheduler tuning for NUMA systems? Or perhaps some clever Go tricks for NUMA awareness? Drop them in the comments below. Let's learn from each other's triumphs (and catastrophes)!