Since at least 2015, the Conquest! server has used threads, but the implementation was rudimentary: the server would process all incoming commands, build queues of messages to send out, and then create threads to deliver them to clients. The main thread would then wait until all threads completed before repeating the cycle.
In June, I decided to take a closer look at overhauling the thread management system. Part of this was driven by the need to create a separate thread to send alerts. Sending emails with curl could take several seconds: retrieving an OAuth token from Gmail was time-consuming and blocked the main thread. So I set out to create 9 persistent threads: 1 dedicated to sending alerts and 8 to update clients. I used ChatGPT to help with the framework and debugging.

There are a few points of the implementation that I found particularly interesting.

1. Prevent Socket Blocking
To prevent one socket connection from blocking others, a timeout had to be implemented in the thread's processing loop. The pseudocode looked roughly like this:

    foreach socket:
        pthread_mutex_lock;
        while (!stuff) { pthread_cond_timedwait; }   // stop waiting on timeout
        pthread_mutex_unlock;
        process_stuff;

I hadn't used the pthread_cond_* functions before, so this was all new to me. If the worker thread was signaled by the main thread, it would wake up and start processing. If the timeout expired, the thread could move to the next socket.

2. Shared Data and Mutexes
It was critical to ensure all values shared between threads were protected with mutexes. This seems obvious, but in a mature codebase such as Conquest! (the server was ported to C in 1997), there were references to these shared values sprinkled across the codebase, and tracking them down took some time (and caused some weirdness along the way).

3. Handling Partial Sends
As part of this upgrade I implemented additional functionality to handle partial sends to clients. In the past, if send() returned any error, the thread would simply drop the connection. Now, if the error code is EAGAIN or EWOULDBLOCK, the thread remembers where it left off and retries on the next cycle. This ensures all data is sent to the client (at least from the server's perspective).

4. Select/Poll Timeout
A bit of trivia I learned concerned select/poll.
I switched from select to poll, and along the way I came to believe that both functions check whether file descriptors are ready for I/O at the time they are called and, if none are, sleep for the full timeout. I had mistakenly assumed that incoming I/O would interrupt the wait. The poll man page states that it "blocks until a file descriptor becomes ready", but that wasn't the behavior I observed (and what I observed contradicted ChatGPT's explanation too). I reduced the timeout from 1000 ms to 50 ms and the client became noticeably more responsive.

Until next time!
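For reference, here is a minimal sketch of the timed wait described in point 1, with the shared flag guarded by a mutex as in point 2. It is not the Conquest! code: the struct client_slot and the functions wait_for_work and post_work are hypothetical names chosen for illustration, and the sketch assumes a condition variable created with default attributes (so the deadline is on CLOCK_REALTIME).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client state; names are illustrative only. */
struct client_slot {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    bool            has_work;   /* the "stuff" predicate, guarded by lock */
};

/* Wait up to timeout_ms for has_work to become true.
 * Returns true if work was claimed, false if the wait timed out. */
bool wait_for_work(struct client_slot *slot, long timeout_ms)
{
    assert(slot != NULL);

    /* pthread_cond_timedwait takes an absolute deadline, not a duration. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&slot->lock);
    int rc = 0;
    /* Re-check the predicate after every wakeup (wakeups can be spurious);
     * leave the loop on timeout or any other error. */
    while (!slot->has_work && rc == 0)
        rc = pthread_cond_timedwait(&slot->ready, &slot->lock, &deadline);

    bool claimed = slot->has_work;
    slot->has_work = false;     /* claim the work while still holding the lock */
    pthread_mutex_unlock(&slot->lock);
    return claimed;
}

/* Called by the main thread to hand work to a waiting worker. */
void post_work(struct client_slot *slot)
{
    pthread_mutex_lock(&slot->lock);
    slot->has_work = true;
    pthread_cond_signal(&slot->ready);
    pthread_mutex_unlock(&slot->lock);
}
```

A worker would call wait_for_work for each socket in turn and simply move on when it returns false, which is the "timeout expired, go to the next socket" behavior described above.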
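The partial-send handling in point 3 can be sketched in a similar way. Again, this is an illustration under assumptions rather than the server's actual code: the out_buf struct, the flush_result enum, and the functions make_nonblocking and flush_client are hypothetical, and the socket is assumed to be in non-blocking mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-client outgoing buffer; field names are illustrative. */
struct out_buf {
    const char *data;   /* message bytes queued for this client */
    size_t      len;    /* total bytes queued */
    size_t      sent;   /* bytes the kernel has accepted so far */
};

enum flush_result { FLUSH_DONE, FLUSH_RETRY, FLUSH_ERROR };

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Send as much of the buffer as the socket will take right now.
 * FLUSH_RETRY means "call again next cycle"; ob->sent records the progress. */
enum flush_result flush_client(int fd, struct out_buf *ob)
{
    assert(ob != NULL && ob->sent <= ob->len);

    while (ob->sent < ob->len) {
        ssize_t n = send(fd, ob->data + ob->sent, ob->len - ob->sent, 0);
        if (n > 0) {
            ob->sent += (size_t)n;          /* short write: advance and loop */
        } else if (n < 0 && errno == EINTR) {
            continue;                       /* interrupted: try again */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return FLUSH_RETRY;             /* socket buffer full: resume later */
        } else {
            return FLUSH_ERROR;             /* real error: caller drops the client */
        }
    }
    return FLUSH_DONE;
}
```

Keeping the offset in the buffer struct, rather than in a local variable, is what lets the thread pick up where it left off on the next cycle.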
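For anyone who wants to measure the poll behavior from point 4 on their own system, a small timing harness along these lines may help. It is only a sketch: timed_poll and now_ms are hypothetical helpers, and the code makes no claim about what the Conquest! server does internally. It reports how long a single poll() call actually blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>   /* pipe() and write(), handy when exercising timed_poll */

/* Milliseconds on the monotonic clock, for measuring elapsed time. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Poll one descriptor for readability. Returns poll()'s result
 * (1 = readable, 0 = timed out, -1 = error) and stores the time
 * spent inside poll() in *elapsed_ms. */
int timed_poll(int fd, int timeout_ms, long *elapsed_ms)
{
    assert(elapsed_ms != NULL);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    long start = now_ms();
    int rc = poll(&pfd, 1, timeout_ms);
    *elapsed_ms = now_ms() - start;
    return rc;
}
```

Calling timed_poll on the read end of a pipe, once with nothing written and once while another thread writes partway through the timeout, gives concrete numbers to compare against the man page.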
Author: James has been working on Conquest! since 1993.
Archives: August 2025