Jon (j_b) wrote,
Cgroups example - limiting memory to control disk writes (Debian)

I ran into a problem with an overactive process that left the rest of the system running slowly. nice(1) did nothing to solve it, and neither did ionice(1) rescheduling it to "Idle". If you run into something similar, cgroups may help.

cgroups ("Control groups") were developed at Google around 2006 and showed up in Linux around 2.6.24. Searching for cgroups examples largely leads one to the RHEL Resource Management Guide. (Link goes to the latest version, most Google searches point to older copies.)

In my case, I had a long-running (>1 hr) process that wrote several hundred GB of output.

I measured the process's throughput by piping its output through pv(1), and also watched top(1), iotop(1), and

$ watch cat /proc/meminfo  (watching the Dirty: line)

The process was doing buffered writes to disk, which was good (keeping the disk continuously fed for best throughput) but was filling up huge amounts of cache (1-2 dozen GB of Dirty pages). When I paused it, sync(1) took over 5 minutes to complete.
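If you'd rather log the Dirty counter over time than sit on the watch loop above, the same line can be pulled out with awk. A minimal sketch (assumes a Linux /proc/meminfo; the iteration count is arbitrary):

```shell
# Print the Dirty page counter a few times, one sample per second.
# Same data the "watch cat /proc/meminfo" loop above shows, but greppable.
for i in 1 2 3; do
    awk '/^Dirty:/ {print $2, $3}' /proc/meminfo
    sleep 1
done
```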

Debian 8.0 (Jessie) ships with cgroups, but the memory controller is disabled by default.

# apt-get install cgroup-tools
# vi /etc/default/grub

(Add  cgroup_enable=memory  to the kernel boot parameters, then run  update-grub2  and reboot.)
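Concretely, the edit is a one-line change. A sketch of the result, assuming Debian's stock "quiet" default (keep whatever other parameters you already have on that line):

```shell
# /etc/default/grub: append cgroup_enable=memory to the existing value
GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory"
```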

# cgcreate -g memory:/foo
# echo 64M > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
# cgexec -g memory:/foo bash
# (your task here)
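For sanity-checking: "64M" is shorthand for 64 × 2^20 bytes, and reading memory.limit_in_bytes back gives plain bytes, so this is the number you should see (the cat line assumes the foo group created above):

```shell
# 64M = 64 * 1024 * 1024 bytes, which is what the kernel stores
expected=$((64 * 1024 * 1024))
echo "$expected"    # prints 67108864
# On the target machine, verify with:
# cat /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
```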

The cgcreate(1) command is a fancy equivalent of doing a mkdir in the cgroup filesystem; the new directory is automatically populated with the appropriate control files. Debian 8's kernel has both cgroup and cgroup2 support, but since systemd(8) uses version 1 and it appears the two cannot be used concurrently, version 1 is what I used.
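To make the mkdir equivalence concrete, here is a rough manual version of the cgcreate/cgexec steps. This is a sketch, not a drop-in replacement: it assumes root and a v1 memory hierarchy mounted at /sys/fs/cgroup/memory, and it moves the current shell into the group rather than spawning a new one.

```shell
# Manual near-equivalent of: cgcreate -g memory:/foo && cgexec -g memory:/foo bash
cgroot=/sys/fs/cgroup/memory
if [ -d "$cgroot" ] && [ -w "$cgroot" ]; then
    mkdir "$cgroot/foo"                         # kernel populates the control files
    echo 64M > "$cgroot/foo/memory.limit_in_bytes"
    echo $$ > "$cgroot/foo/tasks"               # move this shell (and its children) in
else
    echo "no writable v1 memory hierarchy at $cgroot"
fi
```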

Pros:

  • Fast throughput - better than piping through dd oflag=direct or dd oflag=dsync
  • Solved the system-wide performance hit
  • Everything ran nicely, and watching meminfo (as above) showed dirty pages being flushed regularly

Cons:

  • Your task might be hit by the OOM killer.
  • Your task's malloc(3) calls can fail, which makes most tools bail out.

This feels like a hack, but cgroups can't yet limit just write-buffer memory, and cgroups' actual disk-write limiter (blkio.throttle.write_bps_device) would require the above-mentioned slow dd(1) (which ran at 30% of the speed, at best). None of the other tools actually worked, so I'm sharing this. YMMV, and I'd love to hear of other solutions that actually work for people. A good test program to run is:

$ pv -S -s 80g < /dev/zero > zeroes.dat
(write 80GB to a file, with progress bar and live throughput details)
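For contrast, here is a sketch of the blkio throttle route I rejected. The device number 8:0 (typically /dev/sda) and the group name foo are assumptions; look up your device with ls -l /dev, and note that in the v1 hierarchy this throttle only applied to direct/sync writes in my testing, hence the slow dd(1) requirement.

```shell
# Cap writes to one device at 10 MB/s via the v1 blkio controller.
# Format is "major:minor bytes_per_second"; 10485760 = 10 * 1024 * 1024.
f=/sys/fs/cgroup/blkio/foo/blkio.throttle.write_bps_device
if [ -w "$f" ]; then
    echo "8:0 10485760" > "$f"
else
    echo "no v1 blkio group at $f"
fi
```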