Hello world!
A few weeks ago I made a little web-app to highlight some algorithms we developed for a methodological paper. The paper is concerned with how to best slot people into group appointments. The idea came up while I was employed at the University of Bath, because the project I was hired for required us to recruit participants for longitudinal group experiments that partially took place offline, partially online. The problem was of course how do you figure out the initial real life appointments in which you can start organizing interested participants into their groups?
As it turns out, this is a surprisingly tricky problem. Intuitively, you might think that allowing people to indicate their preferences over a set of proposed appointments, in the style of DoodlePoll, and then inviting people to the most popular selection would be a grand idea. However, since we did not really have the capacity to organize a potentially huge group of people at once, we needed some way of breaking up our pool of interested participants into manageable chunks that we could on-board onto the study in a staggered fashion. Once you start thinking about organizing this way, you can come to realize that inviting to the most popular selected appointment can be suboptimal, as a combination of two or more appointments allows you to group more people for the study. This is because anyone who could not make that most popular appointment is essentially discarded, which may not be the case if you can identify two appointments that have little overlap in terms of who is available for each. So the question then becomes, what is the best set of appointments that will yield the most groups, given a pool of interested participants and our recruitment procedure constraints? This is the problem we set out to address in our paper.
The specific problem may be intractable to solve optimally, so our starting point was to develop a few heuristic algorithms and see how they compared. Of course, you also need some way of telling how well these algorithms are doing, so we spent quite some time trying to come up with a useful metric. We started trying to work out what would be considered an optimal outcome. Obviously you want to recruit as many people as possible, and you don’t want to be having any appointments that have too few attendees to form a group with or too many attendees for the room we are holding these on-boarding sessions in. From here, you might consider the number of appointments, are you looking to minimize them or maximize how many appointments are in the solution set? You might be tempted to say that you want to minimize the number of appointments, but this is not necessarily the case. Researchers are often concerned with reaching certain sample sizes, in line with the size of effects they are hoping to detect. Therefore, it might actually be better to have as many small groups as possible, rather than a few large groups. More groups equals more statistical power. On the flip side, you may actually indeed want to minimize the set of appointments, for example if the appointments are expensive to run.
As it turns out, these problems are defined narrowly enough where you can just work backwards to derive a metric for each case and prove that the metrics are optimal at detecting how close an algorithm’s solution is to the theoretical optimum, that is, the optimal outcome under the best case. I caveat it with “theoretical” because it is quite hard (again, potentially impossible) to tell what the true best possible solution is, without trying every single combination of people and appointments. Either way, this was a great starting point as we now had a pretty well-defined metric to measure our algorithms against.
We were quite surprised to find that one of the initial heuristics we developed already beat the algorithm equivalent to “Invite people that have not attended a session yet to the most popular appointment among them” in our simulations. This is where I actually start talking about that web-app this post is supposed to be about. The point of the web-app was to highlight this very finding, by allowing people to compare the performance of the popularity based algorithm to our heuristic. The web-app allows you to choose your own simulation situation, apply the two algorithms, and see the results. In addition, we intended this to also be used by others for their own purposes, so it is possible to apply the algorithms to your own CSV files if you can format them correctly.
Originally, I was planning to implement the web-app as a shiny applet in R, with which I had been experimenting for a few weeks by that point. However, I really didn’t like the usage limits of shinyapps.io and was not about to implement my own server, so I turned to JavaScript to enable the computations inside people’s browsers. This meant translating the original algorithms from Python to JavaScript, and then implementing a simple front-end to allow users to interact with the app.
As it turns out the first part was pretty easy, because the algorithms are dead simple. The actual app front-end was a bit more challenging as I was a total newbie to JavaScript, but I managed to get it all working in one intense weekend, thanks to the kind support of OpenAI. While I’m pretty pleased with the result, I’m sure I would have learned a lot more if I had done the much harder work of learning to implement it from scratch, rather than with AI assistance. On the other hand, I value efficiency (I’m lazy) and I’ll have plenty of time and reason to learn JavaScript properly in the coming years, since my planned experiments will be much less cookie-cutter than this applet. I’m looking forward to it!
If you’re interested in checking out the app, you can find it here. If you’re interested in the paper, let’s hope I’ll remember to update this post once it’s out in the wild. See you then!
Back to top