One of the beauties of hosting your Laravel app on Heroku is that its dyno approach means you can quickly scale horizontally and add more resources, whether that's because of the number of web requests you're dealing with or the number of jobs your background workers are handling.
I ran into a problem last week when a customer support request came in from a merchant using Category Merchandiser, where we're now managing, I think, over 600 categories across different merchants. We've got a scheduled event that runs a couple of times a day to re-merchandise every category, just to make sure everything is in the right order and reflecting the latest stock and sale values.
They were using the admin to drag and drop: they'd re-merchandised their products and saved them, and they just happened to be doing this at a time when a cron process was running to re-merchandise every store's categories.
So naturally, their job just got buffered up into the queue behind everything else. It was pretty much peak load for my app: eventually their products were shown in the new order, but it was far too slow from the customer's perspective. So today I'm going to break down what I've done this week to tackle this and give you a little insight into how you can go about scaling your app on Heroku if you're using Laravel queues or Laravel Horizon.
First off, I'll show you New Relic. Right now we have a scheduled job at one o'clock in the morning and one o'clock in the afternoon every day that merchandises all categories at once. You can see that here, around one o'clock, it takes almost half an hour to go through every category, fetching all the products, ranking them and sending the new sort orders back.
That's the process it needs to go through, so I've been considering how we can optimize this. The first step is clearly to prioritize user-initiated actions. The cron process is something we want to run to make sure we're showing the right products, but we don't want a user-initiated action to get lost in the weeds of all of that going on.
So the first thing I did was add a priority. We have multiple queues in the Laravel app; in this app it's just high and low. High-priority items get put on the high queue and low-priority items on the low queue. If it's a user-initiated action, I clearly want it on the high queue rather than the low queue.
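As a rough sketch of that idea (the job name and variables here are illustrative, not the app's actual classes), the only difference is which queue the job is dispatched onto:

```php
use App\Jobs\MerchandiseCategory; // hypothetical job name

// User-initiated: the merchant just re-ordered products in the admin.
MerchandiseCategory::dispatch($category)->onQueue('high');

// Cron-initiated: the background housekeeping run can wait its turn.
MerchandiseCategory::dispatch($category)->onQueue('low');
```

The worker then drains the high queue before touching the low one, for example by running php artisan queue:work --queue=high,low, or with the equivalent queue ordering in Horizon's supervisor config.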
What I also want to handle is that this merchandise-category job actually triggers all of those other jobs; I use Laravel batches and jobs that run on completion, so there's quite a chain of tasks that then occur. Just because the first job runs on the high-priority queue doesn't mean any further dispatched jobs get put on that high queue, so I also pass a priority in.
So I actually pass the priority all the way down, so each job knows whether it's something we need to treat differently. That's the first step. Now, even if there's a half-hour set of processes already running, our job will be towards the top of the queue, and the customer will see the result of their action quickly.
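Here's a minimal sketch of what passing the priority down the chain can look like, assuming a job that dispatches follow-up work from a batch's completion callback. RankProducts and SendSortOrders are hypothetical stand-ins for the real jobs in the app:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batch;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Bus;

class MerchandiseCategory implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(
        public int $categoryId,
        public string $priority = 'low' // 'high' when a user triggered it
    ) {}

    public function handle(): void
    {
        $categoryId = $this->categoryId;
        $priority   = $this->priority;

        // Both the batch and the job dispatched on completion inherit the
        // same priority instead of falling back to the default queue.
        // RankProducts should use the Batchable trait, as batched jobs do.
        Bus::batch([new RankProducts($categoryId, $priority)])
            ->onQueue($priority)
            ->then(function (Batch $batch) use ($categoryId, $priority) {
                SendSortOrders::dispatch($categoryId, $priority)->onQueue($priority);
            })
            ->dispatch();
    }
}
```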
But I don't want to solve it in only one way; this is a good time to reflect on my approach. What else could I do to minimize this problem? Firstly, there's the dyno itself. I've moved from a 1x dyno to a 2x dyno, so I've got more RAM and I can run more worker processes.
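For what it's worth, if the workers run under Horizon, the number of processes per dyno comes from the supervisor config; something along these lines (values are purely illustrative, sized for what a 2x dyno's RAM can handle) is roughly what that change translates to:

```php
// config/horizon.php – illustrative values only
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection'   => 'redis',
            'queue'        => ['high', 'low'], // high listed first so it drains first
            'balance'      => false,           // process queues strictly in the order above
            'maxProcesses' => 6,               // more RAM on a 2x dyno allows more workers
        ],
    ],
],
```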
But that's still very static. I'd like it to be the case that if the queue gets longer I increase the number of dynos temporarily, get through the queue, and then scale back down again. So I got put onto HireFire, a service that autoscales Heroku dynos and does just this. All you need to do, for my particular case of job queues and workers, is add a controller that reports the number of jobs in the queue. It's a very simple controller: I check the HireFire API token, loop through all my queues, count how many jobs are in there and return that to HireFire.
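Mine looks roughly like the sketch below. The exact route and response shape HireFire expects are documented on their site, so treat this as an outline: the token check, the config key and the queue names are assumptions for illustration.

```php
<?php

namespace App\Http\Controllers;

use Illuminate\Support\Facades\Queue;

class HireFireController extends Controller
{
    // Responds to HireFire's periodic check-in with the number of queued jobs,
    // so it can decide how many worker dynos to run.
    public function info(string $token)
    {
        // Only answer requests that carry our HireFire token (stored in config).
        abort_unless($token === config('services.hirefire.token'), 404);

        // Sum the jobs waiting on every queue the worker dynos consume.
        $jobs = collect(['high', 'low'])->sum(fn ($queue) => Queue::size($queue));

        return response()->json([
            ['name' => 'worker', 'quantity' => $jobs],
        ]);
    }
}
```

With a route such as Route::get('hirefire/{token}/info', [HireFireController::class, 'info']), HireFire can poll the endpoint and scale the worker dynos up and down based on the quantity reported.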
You can then see that at 10 o'clock, when my schedule ran, it actually scaled up to four dynos because there were 316 jobs in the queue, got through them, and as the queue shrank it reduced the number of dynos again. And because you're charged pro rata for dynos, this is a much better way of doing it than always having three or four dynos running.
So that's another step: I now know I can scale if there's suddenly an influx, and no matter how big Category Merchandiser gets and how many categories it manages, this approach will continue to scale. I'd just need to manage how many dynos I'm willing to go up to in the worst case.
So that's two steps: I've prioritized user-initiated actions, and I've made sure we can scale if we spot a lot of jobs in the queue. The final one is just doing myself a favor by not scheduling everything at once: spread the work to minimize those spikes. For things that run on a schedule, I really don't have to kick off a merchandising process for every store and every category at exactly the same time; that's just inviting problems. The simple way I worked around it is that rather than running this merchandise job twice a day, I run it every hour, and on the hour I just take the store ID in the database modulo 12 and match it against the current hour. Each store is still merchandised twice a day, but the work is spread across the day. There are still events that trigger a category merchandising process, but this scheduled run is our fallback, and we want to run it regularly to make sure we've got the latest data.
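A sketch of that scheduling trick, with the model and job names as hypothetical placeholders, might sit in the console kernel like this:

```php
// app/Console/Kernel.php – spread the twice-daily run across the day
use App\Jobs\MerchandiseStore;   // hypothetical job name
use App\Models\Store;            // hypothetical model name
use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule): void
{
    $schedule->call(function () {
        $slot = now()->hour % 12;

        // Only the stores whose ID matches this hour's slot run now, so every
        // store is still merchandised twice a day but never all at once.
        Store::query()
            ->whereRaw('id % 12 = ?', [$slot])
            ->each(fn (Store $store) => MerchandiseStore::dispatch($store)->onQueue('low'));
    })->hourly();
}
```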
Ultimately this has made the spikes far shorter. If I show you New Relic for, say, the last six hours, you'll see that the events that run at nine o'clock only last a few minutes.
If I go back to the previous six hours, we can see there was quite a lot going on at two o'clock, but it only lasted a few minutes. And similarly, if I go back again, there are a couple at 10 o'clock which go on for maybe five minutes. So rather than a half-hour window where we have a lot of jobs in the queue, it's much shorter now because we're spreading the work through the day.
So those are the three areas where I've tackled an issue from a customer support perspective and made my app a little more efficient.