Ruby, like most other modern, high-level programming languages, doesn’t force you to manage memory. This feature is called garbage collection, or GC, and you get it for free in Ruby. You can write tons of Ruby code and never give a second thought to the fact that under the covers, Ruby is doing a bang-up job of allocating and freeing memory for your code to use. But it certainly couldn’t hurt to learn something about how Ruby garbage collection works. In particular, there are some settings you can tweak to adjust the Ruby garbage collection algorithm to make it work better for your needs. Read on to learn all about it!
The first misconception we have to clear up is the name. “Garbage collection” is kind of a lie—or at least a limited description. The Ruby garbage collector also allocates memory. It’s a complete memory management system. Most people probably picture the garbage collector as a janitor, picking up after kooky kids and taking out the trash. But the garbage collector is more like the Wizard of Oz, the man behind the curtain working the levers, getting memory from the operating system, keeping track of what’s using it, and freeing it back up when it’s no longer needed. It’s the engine that makes your program work!
At its core, memory management in C, the language Ruby’s standard interpreter is written in, relies on the “malloc” and “free” functions. The first stands for “memory allocation,” and it does just what it says. You call it with the number of bytes to allocate, and it gives you a pointer to the start of your allocation. It’s up to you to make sure you don’t write outside the memory you were given. When you’re done with the memory, you call free and pass it the pointer that malloc gave you. (free hands the memory back to the allocator for reuse. It isn’t necessarily returned to the operating system right away.)
There’s also “calloc,” which allocates memory for an array of a given number of elements of a given size and sets all of its bytes to zero. Then there’s “realloc,” which resizes a block you’ve already allocated, moving it to a new address if necessary. This last function hints at some of the difficulties of managing memory yourself: you have to know how much you need, and that can change at runtime. You also have to remember to free memory when you’re done with it, or you’ll have a memory leak. Isn’t it a good thing Ruby does all this for you?
Early computer guru John McCarthy wrote the first garbage collector way back in 1960 to simplify memory management in the Lisp programming language. It’s pretty amazing that garbage collection has been around that long! That first collector used mark-and-sweep, the same basic algorithm Ruby uses for its garbage collection.
Before we get to mark-and-sweep, though, there’s an important reality that you have to understand with Ruby garbage collection: Ruby stops running your program in order to do garbage collection! We call this “stop-the-world” garbage collection. It has to do this so your code can’t create new objects or change references while the collector is in the middle of figuring out what needs to be cleaned up.
So, if our program stops running every time garbage collection runs, it’s obviously very important that garbage collection is as fast as possible. The big brains behind Ruby have put a lot of effort into tweaking and improving garbage collection performance—making the wizard wiser!
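Because every collection pauses your program, it can be worth measuring how much time GC actually takes. Here is a minimal sketch using Ruby’s built-in GC::Profiler. The workload is an arbitrary stand-in for your own code, and the timing will vary by machine and Ruby version.

```ruby
# Measure how much time Ruby spends in GC while running some work.
GC::Profiler.enable
GC::Profiler.clear

# Arbitrary workload that allocates many short-lived strings.
200_000.times { |i| "object #{i}" * 2 }
GC.start # force at least one collection so there is something to report

gc_seconds = GC::Profiler.total_time # total recorded GC time, in seconds
puts format("Time spent in GC: %.2f ms", gc_seconds * 1000)

GC::Profiler.disable
```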
Since everything in Ruby is an object, Ruby garbage collection is all about object management. Because allocating memory takes time, Ruby pre-allocates slots for thousands of objects so they’re ready when you need them. The empty slots are kept on what’s called the “free list.”
To free up the memory used by an object, Ruby first has to make sure that the object is no longer needed. It does this in two phases. In the “mark” phase, Ruby starts from the root objects (things like global variables, constants, and local variables on the stack), follows their references, and marks every object it can reach as still in use. It records each mark by setting a bit in a mark bitmap. Next comes the “sweep” phase. Ruby goes through all the objects again, cleans up the unmarked ones, and returns their slots to the free list. It does this by just manipulating pointers, so it’s pretty fast.
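To make the free list and the two phases concrete, here is a toy model in Ruby. ToyHeap and its methods are hypothetical names invented for illustration. Ruby’s real heap is written in C and is far more involved, but the flow is the same basic idea: allocate from a free list, mark from the roots, and sweep unmarked slots back onto the free list.

```ruby
# A toy model of mark-and-sweep over a fixed array of slots.
# Illustration only; this is not how Ruby's C heap is implemented.
class ToyHeap
  Obj = Struct.new(:name, :refs)

  attr_reader :slots, :free_list

  def initialize(size)
    @slots = Array.new(size)         # nil means the slot is free
    @free_list = (0...size).to_a     # indexes of free slots, ready to hand out
    @marked = Array.new(size, false) # stand-in for a mark bitmap
  end

  # Take a slot from the free list and return its index.
  def allocate(name, refs = [])
    index = @free_list.shift or raise "heap full, need GC"
    @slots[index] = Obj.new(name, refs)
    index
  end

  # Mark phase: start from the roots and mark everything reachable.
  def mark(roots)
    @marked.fill(false)
    stack = roots.dup
    until stack.empty?
      index = stack.pop
      next if @marked[index]

      @marked[index] = true
      stack.concat(@slots[index].refs)
    end
  end

  # Sweep phase: free every unmarked object and put its slot back on the free list.
  def sweep
    freed = []
    @slots.each_index do |index|
      next if @slots[index].nil? || @marked[index]

      freed << @slots[index].name
      @slots[index] = nil
      @free_list << index
    end
    freed
  end
end

heap = ToyHeap.new(8)
c = heap.allocate("c")
b = heap.allocate("b", [c])
a = heap.allocate("a", [b])
heap.allocate("garbage")       # nothing refers to this one
heap.mark([a])                 # treat "a" as a root, like a local variable
freed = heap.sweep
puts "freed: #{freed.inspect}" # => freed: ["garbage"]
```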
Ruby also uses the tri-color mark-and-sweep approach. This divides objects into three categories: white, gray, and black. White objects are unmarked, possibly collectible objects. Gray objects are marked, but may have references to white objects. Finally, black objects have been marked, and definitely don’t point to any white objects. The process starts with every object white. The collector colors the root objects gray, then repeatedly picks a gray object, colors every white object it references gray, and colors the picked object black once all of its references have been checked. When no gray objects remain, every reachable object is black. Then, during the sweep phase, all the white objects can be swept.
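Here is a toy version of tri-color marking, assuming a tiny hard-coded object graph. The GRAPH constant and the tri_color_mark method are invented for illustration and are not part of Ruby.

```ruby
# Toy tri-color marking over a small object graph.
# Keys are object names; values are the names each object references.
GRAPH = {
  "root"   => ["a"],
  "a"      => ["b", "c"],
  "b"      => ["c"],
  "c"      => [],
  "orphan" => ["a"] # points at live objects, but nothing points to it
}.freeze

def tri_color_mark(graph, roots)
  color = graph.keys.to_h { |name| [name, :white] } # everything starts white
  gray = roots.dup
  gray.each { |name| color[name] = :gray }

  until gray.empty?
    obj = gray.shift
    graph[obj].each do |ref|
      next unless color[ref] == :white

      color[ref] = :gray # discovered, but its references aren't scanned yet
      gray << ref
    end
    color[obj] = :black # all of obj's references have been scanned
  end
  color
end

colors = tri_color_mark(GRAPH, ["root"])
garbage = colors.select { |_, c| c == :white }.keys
puts "sweep: #{garbage.inspect}" # => sweep: ["orphan"]
```

Note that “orphan” stays white even though it points at live objects: marking follows references outward from the roots, never backward.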
The creators of Ruby are always trying to make Ruby better, and several releases have improved the garbage collector. Ruby 2.1 added generational garbage collection. This is based on the observation that most objects are used briefly and then aren’t needed anymore. With generational garbage collection, Ruby keeps track of which objects are “young” and which are “old,” and most of the time it examines only the young ones. This is called a “minor GC.”
In current Ruby versions, an object that survives three garbage collections is promoted to “old.” Old objects are checked only during a “major GC,” which Ruby runs when, for example, the number of old objects grows past a limit or a minor GC can’t free up enough space. A major GC is basically the same as the traditional, pre-2.1 full garbage collection. Because minor GCs are much cheaper and make up most collections, the overall time spent in garbage collection goes down. Pretty ingenious!
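You can watch the generational collector at work through GC.stat, which reports separate counters for minor and major collections. This is a minimal sketch with an arbitrary workload, and the exact counts depend on your Ruby version and heap settings. GC.start(full_mark: false) asks for a collection without a full mark, though Ruby may still decide a major GC is needed.

```ruby
# Snapshot the overall, minor, and major GC counters.
def gc_counts
  GC.stat.slice(:count, :minor_gc_count, :major_gc_count)
end

before = gc_counts

# Allocate many short-lived strings; almost all of them die young.
300_000.times { |i| "temp #{i}" }

GC.start(full_mark: false) # request a collection without a full mark
GC.start(full_mark: true)  # request a full (major) collection

after = gc_counts
after.each do |key, value|
  puts "#{key}: +#{value - before[key]}"
end
```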
The Ruby core team wasn’t content to rest on their laurels, however. The very next version of Ruby, version 2.2, brought another improvement to garbage collection: incremental garbage collection. Instead of marking the whole heap in one long pause, Ruby splits the marking work into smaller steps and lets your program run in between. The total time spent in garbage collection stays roughly the same, but it comes in shorter, more frequent bursts. Since the pauses are shorter, there’s less impact on the program.
Achieving incremental garbage collection in Ruby was pretty hard and complicated. Because your program runs between marking steps, it can change references while marking is only partly done, so Ruby relies on write barriers and has to handle objects without write-barrier protection (“unprotected” objects) specially. But, fundamentally, switching to incremental garbage collection meant more frequent but shorter pauses. Instead of pausing for something like 100 ms to do a full mark-and-sweep, your code might pause for 10 ms ten times.
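Tri-color marking is what makes incremental marking possible: the gray objects record where the collector left off, so marking can pause and resume. The toy below uses a hypothetical IncrementalMarker class, not Ruby’s C implementation. It scans a limited number of gray objects per step and uses an insertion-style write barrier, one common approach, so that a reference added while marking is paused isn’t missed.

```ruby
# Toy incremental tri-color marker: it does a bounded amount of marking
# per step, and a write barrier keeps the result correct when the
# "program" changes references between steps. Illustration only.
class IncrementalMarker
  attr_reader :color

  def initialize(graph, roots)
    @graph = graph # name => array of referenced names (mutable)
    @color = graph.keys.to_h { |name| [name, :white] }
    @gray = []
    roots.each { |name| shade(name) }
  end

  def done?
    @gray.empty?
  end

  # Scan at most `budget` gray objects, then return control to the program.
  def step(budget)
    budget.times do
      break if done?

      obj = @gray.shift
      @graph[obj].each { |ref| shade(ref) }
      @color[obj] = :black
    end
  end

  # Write barrier: the program calls this whenever it stores a reference.
  # If an already-scanned (black) object gains a pointer to a white one,
  # gray the white object so marking can't miss it.
  def write(from, to)
    @graph[from] << to
    shade(to) if @color[from] == :black
  end

  private

  def shade(name)
    return unless @color[name] == :white

    @color[name] = :gray
    @gray << name
  end
end

graph = { "root" => ["a"], "a" => ["b"], "b" => [], "late" => [] }
marker = IncrementalMarker.new(graph, ["root"])

marker.step(1)               # scans "root", which is now black
marker.write("root", "late") # the program adds a reference mid-cycle
marker.step(1) until marker.done?

garbage = marker.color.select { |_, c| c == :white }.keys
puts "sweep: #{garbage.inspect}" # => sweep: []
```

Without the barrier in write, “late” would still be white when marking finished and would be swept even though “root” now points to it.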
Those big changes in Ruby 2.1 and 2.2 were the most profound improvements to Ruby’s memory management. Since then, there hasn’t been much in the way of changes to the garbage collector. Ruby 2.6 did include some tweaks to the mark-and-sweep process that resulted in lower memory usage.
Ruby 2.7, which should be out this upcoming Christmas, is slated to include “heap compaction.” The heap is the region of memory from which a program allocates memory at runtime, and it’s where Ruby keeps your objects. As objects are allocated and freed, the heap can get messy, with gaps of unused space between your objects. If instead you compact the used memory by moving live objects next to each other, those gaps close up and much less space is wasted. That will be a nice Christmas present!
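Compaction can be sketched with a classic “two-finger” approach. One finger scans up from the start of the heap looking for free slots, another scans down from the end looking for live objects, and each live object found near the end is moved into a hole near the start. The compact method below is hypothetical and works on a plain array of slots. It also records a forwarding table so that references to moved objects can be fixed up afterward. In Ruby 2.7 and later, compaction is triggered by calling GC.compact.

```ruby
# Toy two-finger compaction over an array of heap slots (nil = free).
# Returns the compacted slots plus a forwarding table (old index => new index).
# Illustration only; this is not Ruby's C implementation.
def compact(slots)
  slots = slots.dup
  forwarding = {}
  free = 0                # finger scanning up for free slots
  scan = slots.length - 1 # finger scanning down for live objects

  loop do
    free += 1 while free < scan && !slots[free].nil?
    scan -= 1 while scan > free && slots[scan].nil?
    break if free >= scan

    slots[free] = slots[scan] # move the object into the hole
    slots[scan] = nil
    forwarding[scan] = free
  end
  [slots, forwarding]
end

heap = ["a", nil, "b", nil, nil, "c", nil, "d"]
compacted, moves = compact(heap)
p compacted # => ["a", "d", "b", "c", nil, nil, nil, nil]
```

After compaction, every stored reference is rewritten through the forwarding table. For example, `[7, 2, 5].map { |i| moves.fetch(i, i) }` gives `[1, 2, 3]`.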
One way to improve your app’s memory behavior is to replace the system malloc that Ruby calls under the hood with an alternative memory allocator. One popular such library is jemalloc, written by Jason Evans, which emphasizes fragmentation avoidance. Another option is TCMalloc, short for “thread-caching malloc.” As you might have guessed, this library uses per-thread caches to avoid lock contention in multithreaded code. Using an alternative allocator isn’t a decision you want to make lightly, so test it carefully.
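If you do try jemalloc, it helps to confirm which allocator your Ruby is using. A commonly used check, sketched below, looks for jemalloc among the libraries Ruby was linked against at build time (RbConfig::CONFIG["MAINLIBS"]). It assumes Ruby was compiled with jemalloc and won’t detect an allocator loaded at runtime with LD_PRELOAD.

```ruby
require "rbconfig"

# True if this Ruby binary was linked against jemalloc at build time
# (for example via ./configure --with-jemalloc). An allocator injected
# at runtime through LD_PRELOAD does not show up here.
def linked_with_jemalloc?
  RbConfig::CONFIG.fetch("MAINLIBS", "").include?("jemalloc")
end

puts "jemalloc linked: #{linked_with_jemalloc?}"
```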
Ruby reads a standard set of environment variables at startup to set certain garbage collection parameters. You don’t have to set these. Previously, the default values weren’t the best for large Ruby on Rails applications, but that’s no longer the case. If you do want to try tweaking them, the full list of variables, such as RUBY_GC_HEAP_GROWTH_FACTOR and RUBY_GC_MALLOC_LIMIT, is documented in comments in Ruby’s C source code (gc.c).
You can run GC.stat in an irb or Rails console for your app to get all sorts of garbage collection statistics. Several of them, such as heap_live_slots and heap_free_slots, are useful for choosing values for these settings. As long as the variables are set in your OS environment before your app starts, Ruby will pick them up.
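Here is a quick way to see which tuning variables a process was started with, alongside a few GC.stat counters. It’s a minimal sketch: the stat keys shown exist in Ruby 2.2 and later, but the full set of keys varies by version. Because Ruby reads these variables when the interpreter boots, assigning to ENV inside a running program won’t change the GC settings.

```ruby
# Print the RUBY_GC_* tuning variables this process was started with, if any.
gc_env = ENV.select { |name, _| name.start_with?("RUBY_GC_") }
puts(gc_env.empty? ? "No RUBY_GC_* variables set" : gc_env)

# A few GC.stat counters that help when choosing values for those settings.
stats = GC.stat
%i[count heap_live_slots heap_free_slots total_allocated_objects total_freed_objects].each do |key|
  puts "#{key}: #{stats[key]}"
end
```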
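GC.start is another good Ruby garbage collection command to know. It kicks off a garbage collection cycle immediately. If you have a long-running script that’s chewing up memory, having it call GC.start periodically may help keep memory usage under control. Keep in mind that Ruby won’t necessarily return the freed memory to the operating system, so measure before and after.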
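Here is a sketch of that pattern for a batch job. GC_EVERY and process_chunk are hypothetical names standing in for your own tuning knob and work. GC.start with no arguments requests a full collection.

```ruby
# Hypothetical batch job: GC_EVERY and process_chunk are made-up names.
GC_EVERY = 10 # run a full GC after every 10 chunks

# Stand-in for real work that allocates many temporary objects.
def process_chunk(chunk_number)
  Array.new(10_000) { |i| "record #{chunk_number}-#{i}" }.size
end

runs_before = GC.stat(:count)
processed = 0

50.times do |chunk_number|
  processed += process_chunk(chunk_number)
  GC.start if ((chunk_number + 1) % GC_EVERY).zero?
end

puts "processed #{processed} records, GC runs: #{GC.stat(:count) - runs_before}"
```

Ruby will still collect on its own between the explicit calls. The periodic GC.start just caps how much garbage can pile up between chunks.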
Garbage collection in Ruby has come a long way since the early days. Once Ruby 2.7 is out and heap compaction is rolled in, Ruby should really scream. If you still need to tweak your application’s garbage collection, do so carefully and with adequate testing. And if you want more insight into what your app is doing, including its memory usage, check out Stackify Retrace. It will give you a wealth of useful information about your app and help you find bugs.