Archive

Monthly Archives: April 2013

Beware of the Zombie (Process) Apocalypse
Or: How to Debug 503s, H12s, and H13s on Heroku

TLDR: I was wrong about log-runtime-metrics! Turns out my unexplained dyno crashes were caused by a Gem that was misbehaving in its process forking, which I’ve now fixed and submitted a pull request for.

Introduction and Apology

In my last post, I mentioned that I would keep an eye on my Heroku Rails app over the next 24 hours to see if removing log-runtime-metrics would indeed alleviate my problem with the unexplained dyno reboots that were causing 503, H12, and H13 errors.

Unfortunately, like a bad zombie movie, the problem just wouldn’t die. Overnight my dynos again suffered a reboot, and I again saw the dreaded ‘[FATAL] Failed to create timer thread (errno: 11)’ in the logs.

So, the problem appears not to be log-runtime-metrics after all. My apologies for raising the alarm too early — I should have waited the full 24 hours first.

Continuing the Search for the Real Killers

So what the heck is the actual problem? Something in the app is leaking threads, so how do I figure out where?

To verify my leaky-thread hypothesis, I whipped up a short bash script, which I then ran inside the Ubuntu Lucid 64 Vagrant box where I develop my Rails app.

# Every second, print the total number of threads across all processes
# owned by the vagrant user (the Unicorn master, its workers, etc.).
while true
do
  # Command line inspired from http://askubuntu.com/a/88983
  ps -u vagrant -o thcount | tail -n +2 | awk '{ num_threads += $1 } END { print num_threads }'
  sleep 1
done

I use the app for a while, and sure enough, I see the following scroll by:

43, 43 ... 45, 45 ... 44, 44 ... 46, 46 ... 45, 45 ... 47 ...

and I watch the count slowly increase, two steps forward and one step back, two steps forward and one step back. At this rate the thread count would surely bust a limit given enough time. It looks like I finally found the trail of blood: something in my app is creating threads and not cleaning up after itself.
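If you would rather watch from inside the Ruby process itself, Thread.list gives you every live thread in the current process. Here is a minimal Rack middleware sketch (the class name and log message are mine, purely for illustration, not something from my app):

class ThreadCountLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    # Thread.list returns every live thread in this Ruby process, so a
    # steadily climbing number here means something is leaking threads.
    Rails.logger.info "live threads: #{Thread.list.count}"
    @app.call(env)
  end
end

You would wire it up with config.middleware.use ThreadCountLogger; note that, unlike the bash script above, it only sees the current worker process, not every Unicorn on the box.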

To see if I could gain further insight, I took a closer look at the process list:

vagrant@www:~$ ps -u vagrant | grep ruby
10260 pts/0 00:00:00 ruby
10271 pts/0 00:00:25 ruby
10514 pts/0 00:00:14 ruby
10533 pts/0 00:00:13 ruby
10541 pts/0 00:00:16 ruby
11269 pts/0 00:00:00 ruby <defunct>
11304 pts/0 00:00:00 ruby <defunct>
11331 pts/0 00:00:00 ruby <defunct>
11363 pts/0 00:00:00 ruby <defunct>
11399 pts/0 00:00:00 ruby <defunct>
11410 pts/0 00:00:00 ruby <defunct>
11453 pts/0 00:00:00 ruby <defunct>
11484 pts/0 00:00:00 ruby <defunct>
11520 pts/0 00:00:00 ruby

ZOMG! Zombie Processes!

Wow, so this is not only a thread leak, this is a process leak! Now I know that something in the app is forking processes and then failing to wait for the child process to complete. Over time this will result in thousands of Zombie Processes, causing a Zombie Process Apocalypse to wreak havoc on your dyno. (For a great discussion of what exactly a Zombie Process is, see this blog post by the Phusion Passenger team — look for “Not all frozen processes are made equal”)
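The mechanics are easy to reproduce outside of Rails. Here is a tiny hypothetical Ruby sketch (nothing to do with the gem I eventually found) showing how a zombie comes to be and how the parent avoids it:

# Run on any Unix-like system with: ruby zombie_demo.rb
pid = fork do
  sleep 1   # the child does a bit of work, then exits
end

# The exited child stays around as a <defunct> entry in `ps` until the
# parent collects its exit status. Process.wait does exactly that and
# makes the zombie disappear.
Process.wait(pid)

# If the parent never cares about the exit status, Process.detach(pid)
# hands the waiting off to a tiny background thread instead.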


I know that nothing in my Rails app is clever enough to try forking processes itself (a really bad idea), so the culprit had to be inside a Gem. After a while spent narrowing down the suspects, I finally found it: the rack-streaming-proxy gem I was using was forking processes, and indeed the parent process was not waiting for its child processes.

I finally found the root cause of the Zombie Process Apocalypse.

Unfortunately this is a very old gem that is no longer maintained, but I went ahead and fixed the bug and submitted a pull request. The details of how the bug fix works are interesting in their own right, and I may write a follow-up post about it.
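I won’t spoil that follow-up here, but the general shape of any such fix is the classic fork-then-reap pattern. The sketch below is not the actual pull request, just an illustration of the idea: the parent reads the streamed data from a pipe and then waits on the child, so the child never lingers as a zombie.

rd, wr = IO.pipe

pid = fork do
  rd.close
  wr.write "streamed response body"   # stand-in for proxying the upstream response
  wr.close
end

wr.close
begin
  body = rd.read          # consume whatever the child streams back
ensure
  rd.close
  Process.wait(pid)       # reap the child so no <defunct> entry is left behind
end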

But for now, in my experiments with the short bash script above, the fix does its job: no more zombie processes are generated.

Overall this was definitely one of the hardest things I’ve had to debug; it’s one of those problems that only shows up at scale and after running for a long period of time. But at least for now, the Zombie Process Apocalypse has been averted!

Unabashed Self-Promotion

Got problems like these? I am a 7-year Ruby on Rails veteran, who previously founded (and subsequently exited) a startup as a CTO. I am available for short- and long-term consulting contracts involving CTO-level advice and Ruby on Rails project work. Please feel free to get in touch.

Update 2: I was wrong about log-runtime-metrics! Turns out the culprit was a concurrency bug inside a Gem. Read how I fixed it here.
Update 1: Hacker News discussion here.

TLDR: As of April 25, 2013, do not run log-runtime-metrics for extended periods of time, as it can crash your Heroku dynos (and cause them to reboot) multiple times daily. I was wrong! Please see Beware of the Zombie (Process) Apocalypse, above.

Introduction

In the wake of the Rap Genius brouhaha, Heroku has been on a tear releasing many new features intended to help you tune and debug your Rails app’s performance. One of them is the Heroku Labs feature log-runtime-metrics, which, combined with the log2viz feature, gives you great insight into your app’s memory usage. You get a detailed breakdown of memory usage per dyno in your logs, and armed with this information you can scale your Unicorn worker count accordingly.
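For reference, this is the sort of config/unicorn.rb that tuning feeds into. The sketch below is a generic example of the common Heroku setup of the day, not my actual file, and the WEB_CONCURRENCY variable name is just the usual convention for making the worker count adjustable without a code change:

# config/unicorn.rb (generic sketch)
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 30
preload_app true

before_fork do |server, worker|
  # the master's database connection should not be shared with forked workers
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # each worker opens its own database connection after the fork
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end

Once log-runtime-metrics shows you have memory headroom on a dyno, bumping the worker count is a single heroku config:set WEB_CONCURRENCY=... away.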

Emerging Problems

With this kind of amazing visibility into your dynos, it is tempting to leave this Labs feature enabled. After all, what can go wrong? Well, it turns out, lots. After having this feature enabled for a few days, I started noticing some unexplained dyno reboots inside NewRelic.

[Screenshot: unexplained dyno restarts in New Relic]

These restarts were surfacing as 503 errors, with Heroku H13 errors showing up in the logs. (NB: In my case, I am running Unicorn with multiple worker processes, so each of these instance restarts is actually a single Unicorn worker restarting.)

2013-04-25 04:13:25.255469+00:00 heroku router - - at=error code=H13 desc="Connection closed without response" method=GET path=/private/url host=www.secretsite.com fwd="60.36.84.1" dyno=web.1 connect=2ms service=1798ms status=503 bytes=0 CRIT

Investigating the Mystery

Digging deeper into the mystery, I went into the Logentries add-on and examined the Heroku logs around that time. I was not receiving any exceptions in AirBrake, so I knew this had to be a middleware problem or a system-level problem rather than something in my own application code.

So let’s take a look at the logs. When things are running fine, you can see that log-runtime-metrics outputs your dyno’s stats every 20 seconds or so.

2013-04-25 04:10:43.594516+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_1m val=0.15
2013-04-25 04:10:43.594776+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_5m val=0.14
2013-04-25 04:10:43.595012+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_15m val=0.12
2013-04-25 04:10:43.595221+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_total val=811.38 units=MB
2013-04-25 04:10:43.595414+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_rss val=687.77 units=MB
2013-04-25 04:10:43.595608+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_cache val=123.61 units=MB
2013-04-25 04:10:43.595836+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_swap val=0.00 units=MB
...
2013-04-25 04:11:04.331556+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_1m val=0.11
2013-04-25 04:11:04.331749+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_5m val=0.13
2013-04-25 04:11:04.331954+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=load_avg_15m val=0.12
2013-04-25 04:11:04.332157+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_total val=811.61 units=MB
2013-04-25 04:11:04.332342+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_rss val=687.89 units=MB
2013-04-25 04:11:04.332527+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_cache val=123.72 units=MB
2013-04-25 04:11:04.332795+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_swap val=0.00 units=MB

(As an aside, I am using a 2X dyno, so my memory usage of ~811 MB is not a problem — in fact, it looks like I have enough memory to add another Unicorn worker. Again, log-runtime-metrics is great.)

Around 04:11:21 things start to go wrong:

2013-04-25 04:11:21.550262+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
2013-04-25 04:11:21.563697+00:00 app web.1 - - E, [2013-04-25T04:11:21.562895 #2] ERROR -- : reaped #<Process::Status: pid 8807 exit 1> worker=0
...
2013-04-25 04:11:22.731771+00:00 app web.1 - - E, [2013-04-25T04:11:22.730653 #8812] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:22.731771+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:23.955745+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
...
2013-04-25 04:11:25.154739+00:00 app web.1 - - E, [2013-04-25T04:11:25.153854 #8820] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:25.154739+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:25.173890+00:00 app web.1 - - E, [2013-04-25T04:11:25.172924 #2] ERROR -- : reaped #<Process::Status: pid 8820 exit 1> worker=0
...
2013-04-25 04:11:26.375110+00:00 app web.1 - - E, [2013-04-25T04:11:26.373942 #8823] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:26.375110+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:26.393677+00:00 app web.1 - - E, [2013-04-25T04:11:26.393423 #2] ERROR -- : reaped #<Process::Status: pid 8823 exit 1> worker=0
...
2013-04-25 04:11:27.558691+00:00 app web.1 - - E, [2013-04-25T04:11:27.557780 #8826] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:27.558691+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'

Pointing the Finger at the Wrong Suspect

You can see that one of the Unicorn workers started to die repeatedly with fairly ominous error messages. At this point, my first thought was that the NewRelic gem was causing the reboots, because of this stack trace:

2013-04-25 04:13:25.177834+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
2013-04-25 04:13:25.186393+00:00 app web.1 - - E, [2013-04-25T04:13:25.185438 #9160] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:612:in `new'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:612:in `start_worker_thread'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:211:in `after_fork'
2013-04-25 04:13:25.255469+00:00 heroku router - - at=error code=H13 desc="Connection closed without response" method=GET path=/private/url host=www.secretsite.com fwd="60.36.84.1" dyno=web.1 connect=2ms service=1798ms status=503 bytes=0 CRIT
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent.rb:268:in `after_fork'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:21:in `block (4 levels) in <top (required)>'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:500:in `spawn_missing_workers'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:511:in `maintain_worker_count'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:277:in `join'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/bin/unicorn:126:in `<top (required)>'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/bin/unicorn:23:in `load'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/bin/unicorn:23:in `<main>'
2013-04-25 04:13:26.447382+00:00 heroku web.1 - - Process exited with status 1
2013-04-25 04:13:26.466498+00:00 heroku web.1 - - State changed from up to crashed
2013-04-25 04:13:26.466498+00:00 heroku web.1 - - State changed from crashed to starting

But here’s the spoiler: it turns out NewRelic was just the fall guy! Here’s why. If you take a look at the log-runtime-metrics gist above, you can see that the metrics are emitted every 20 seconds. Well, hmm, what needs a timer thread like that? After a bit of thought, it hit me: log-runtime-metrics, of course!

Catching the Real Crook

As you can see from the logs, the time that the thread creation error occurs is almost exactly 20 seconds from the last metric report — which tells me that log-runtime-metrics is trying to start a thread to wake itself up in 20 seconds for the next report. However, it encounters difficulty doing so:

2013-04-25 04:10:22.544500+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_total val=811.86 units=MB
...
2013-04-25 04:10:43.595221+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_total val=811.38 units=MB
...
2013-04-25 04:11:04.332157+00:00 heroku web.1 - - source=heroku.8436970.web.1.dccb5023-e818-455f-a040-62a823a91e65 measure=memory_total val=811.61 units=MB
...
2013-04-25 04:11:21.550262+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
...

This starts a cascading chain of failures which causes the entire dyno to reboot:

2013-04-25 04:11:21.550262+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
2013-04-25 04:11:21.563697+00:00 app web.1 - - E, [2013-04-25T04:11:21.562895 #2] ERROR -- : reaped #<Process::Status: pid 8807 exit 1> worker=0
...
2013-04-25 04:11:22.731771+00:00 app web.1 - - E, [2013-04-25T04:11:22.730653 #8812] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:22.731771+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:23.955745+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
...
2013-04-25 04:11:25.154739+00:00 app web.1 - - E, [2013-04-25T04:11:25.153854 #8820] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:25.154739+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:25.173890+00:00 app web.1 - - E, [2013-04-25T04:11:25.172924 #2] ERROR -- : reaped #<Process::Status: pid 8820 exit 1> worker=0
...
2013-04-25 04:11:26.375110+00:00 app web.1 - - E, [2013-04-25T04:11:26.373942 #8823] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:26.375110+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:11:26.393677+00:00 app web.1 - - E, [2013-04-25T04:11:26.393423 #2] ERROR -- : reaped #<Process::Status: pid 8823 exit 1> worker=0
...
2013-04-25 04:11:27.558691+00:00 app web.1 - - E, [2013-04-25T04:11:27.557780 #8826] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:11:27.558691+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
...
2013-04-25 04:13:25.177834+00:00 app web.1 - - [FATAL] Failed to create timer thread (errno: 11)
2013-04-25 04:13:25.186393+00:00 app web.1 - - E, [2013-04-25T04:13:25.185438 #9160] ERROR -- : can't create Thread (11) (ThreadError)
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/thread.rb:12:in `initialize'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:612:in `new'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:612:in `start_worker_thread'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/agent.rb:211:in `after_fork'
2013-04-25 04:13:25.255469+00:00 heroku router - - at=error code=H13 desc="Connection closed without response" method=GET path=/private/url host=www.secretsite.com fwd="60.36.84.1" dyno=web.1 connect=2ms service=1798ms status=503 bytes=0 CRIT
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent.rb:268:in `after_fork'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/newrelic_rpm-3.6.0.83/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:21:in `block (4 levels) in <top (required)>'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:500:in `spawn_missing_workers'
2013-04-25 04:13:25.186393+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:511:in `maintain_worker_count'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:277:in `join'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/bin/unicorn:126:in `<top (required)>'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/bin/unicorn:23:in `load'
2013-04-25 04:13:25.186733+00:00 app web.1 - - /app/vendor/bundle/ruby/1.9.1/bin/unicorn:23:in `<main>'
2013-04-25 04:13:26.447382+00:00 heroku web.1 - - Process exited with status 1
2013-04-25 04:13:26.466498+00:00 heroku web.1 - - State changed from up to crashed
2013-04-25 04:13:26.466498+00:00 heroku web.1 - - State changed from crashed to starting
...
2013-04-25 04:13:33.673734+00:00 heroku web.1 - - Starting process with command `bundle exec unicorn -p 13516 -c ./config/unicorn.rb`
2013-04-25 04:13:36.497428+00:00 app web.1 - - I, [2013-04-25T04:13:36.497000 #2] INFO -- : Refreshing Gem list

log-runtime-metrics obviously has a thread leak, as no further threads can be created. I surmise that log-runtime-metrics is not joining its timer threads when they are done and that it is running into the PTHREAD_THREADS_MAX limit in the underlying Ubuntu OS that runs the dyno.
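For illustration only (I have no visibility into how log-runtime-metrics is actually implemented, and emit_metrics below is just a stand-in for whatever writes the measure=... lines), this is the difference between a reporter that leaks threads and one that does not:

def emit_metrics
  # placeholder for the code that emits the metric lines
end

# Leaky shape: every tick parks one more thread that never finishes and is
# never joined, so live threads pile up until thread creation fails with
# EAGAIN -- the errno 11 seen in the logs above.
def start_leaky_reporter
  Thread.new do
    loop do
      emit_metrics
      Thread.new { sleep }   # imaginary helper thread that blocks forever
      sleep 20
    end
  end
end

# Non-leaky shape: a single long-lived timer thread; the thread count stays flat.
def start_timer_thread
  Thread.new do
    loop do
      emit_metrics
      sleep 20
    end
  end
end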

Note that during this two-minute thrashing interval, no further runtime metrics are reported, so log-runtime-metrics really is dead.

After the reboot, the metrics on web.1 start reporting every 20 seconds again, even before Unicorn is booted:

2013-04-25 04:13:33.673734+00:00 heroku web.1 - - Starting process with command `bundle exec unicorn -p 13516 -c ./config/unicorn.rb`
2013-04-25 04:13:36.497428+00:00 app web.1 - - I, [2013-04-25T04:13:36.497000 #2] INFO -- : Refreshing Gem list
...
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_total val=35.01 units=MB
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_rss val=35.00 units=MB
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_cache val=0.00 units=MB
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_swap val=0.00 units=MB
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgin val=15050 units=pages
2013-04-25 04:13:46.572823+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgout val=6088 units=pages
...
2013-04-25 04:14:02.046151+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_total val=69.40 units=MB
2013-04-25 04:14:02.046151+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_rss val=69.39 units=MB
2013-04-25 04:14:02.046492+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_cache val=0.01 units=MB
2013-04-25 04:14:02.046492+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_swap val=0.00 units=MB
2013-04-25 04:14:02.046976+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgin val=28146 units=pages
2013-04-25 04:14:02.046976+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgout val=10380 units=pages
2013-04-25 04:14:10.297937+00:00 app web.1 - - DEPRECATION WARNING: You have Rails 2.3-style plugins in vendor/plugins! Support for these plugins will be removed in Rails 4.0. Move them out and bundle them in your Gemfile, or fold them in to your app as lib/myplugin/* and config/initializers/myplugin.rb. See the release notes for more on this: http://weblog.rubyonrails.org/2012/1/4/rails-3-2-0-rc2-has-been-released. (called from <top (required)> at /app/config/environment.rb:21) WARN
2013-04-25 04:14:10.298239+00:00 app web.1 - - DEPRECATION WARNING: You have Rails 2.3-style plugins in vendor/plugins! Support for these plugins will be removed in Rails 4.0. Move them out and bundle them in your Gemfile, or fold them in to your app as lib/myplugin/* and config/initializers/myplugin.rb. See the release notes for more on this: http://weblog.rubyonrails.org/2012/1/4/rails-3-2-0-rc2-has-been-released. (called from <top (required)> at /app/config/environment.rb:21) WARN
...
2013-04-25 04:14:14.290245+00:00 app web.1 - - ** [NewRelic][04/25/13 04:14:14 +0000 777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 (2)] INFO : Reading configuration from config/newrelic.yml
2013-04-25 04:14:14.290321+00:00 app web.1 - - ** [NewRelic][04/25/13 04:14:14 +0000 777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 (2)] INFO : Starting the New Relic Agent.
...
2013-04-25 04:14:14.717721+00:00 app web.1 - - ** [NewRelic][04/25/13 04:14:14 +0000 777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 (2)] INFO : Finished instrumentation
...
2013-04-25 04:14:22.177209+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=load_avg_1m val=1.00
2013-04-25 04:14:22.177407+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_total val=102.30 units=MB
2013-04-25 04:14:22.177569+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_rss val=102.29 units=MB
2013-04-25 04:14:22.177752+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_cache val=0.01 units=MB
2013-04-25 04:14:22.177908+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_swap val=0.00 units=MB
2013-04-25 04:14:22.178061+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgin val=43425 units=pages
2013-04-25 04:14:22.178247+00:00 heroku web.1 - - source=heroku.8436970.web.1.777b1bcb-3836-479e-ba2f-0b86ea4cbdd6 measure=memory_pgpgout val=17236 units=pages
...
2013-04-25 04:14:28.728687+00:00 app web.1 - - I, [2013-04-25T04:14:28.728470 #2] INFO -- : listening on addr=0.0.0.0:13516 fd=11
2013-04-25 04:14:29.288774+00:00 heroku web.1 - - State changed from starting to up

All of this definitely points toward log-runtime-metrics as the culprit in this scenario. Accordingly, I have turned log-runtime-metrics off in my production system, and after a few hours of run time there have been no more unexplained dyno reboots. I will keep an eye on it over the next 24 hours to verify that log-runtime-metrics did indeed cause the problem.

So the next time you see this notice:

[Screenshot: Heroku Labs notice that the feature is experimental]

…pay attention and take it at face value: it really is an experimental feature!

Unabashed Self-Promotion

Got problems like these? I am a 7-year Ruby on Rails veteran, who previously founded (and subsequently exited) a startup as a CTO. I am available for short- and long-term consulting contracts involving CTO-level advice and Ruby on Rails project work. Please feel free to get in touch.

Well hello out there.

It’s been a while since I’ve blogged, and a lot has changed since I last posted nearly two and a half years ago. For one, I no longer live in Montreal, and I am now a freelance Ruby-on-Rails consultant (or as Austin calls it, a “CTO-on-Demand”).

Accordingly this blog will probably be taking a slightly more technical bent than it used to.

If any of my old readers are still around, I’d love it if you said Hi in the comments!