ISH-141
Created by Laura Hausmann
Mar 07 2024, 04:12 GMT+1
Updated by Laura Hausmann
Sep 02 2024, 16:25 GMT+2
Sharding/clustering

This issue lists the known problems with clustering (running multiple instances behind a load balancer, or even just running multiple queue processors). More remain to be investigated.

  • The queue system's job retry handler (RecoverOrPrepareForExitAsync) doesn't handle crashes properly (especially with multiple queue processor workers)
  • WebSocket connections will only receive events processed by the node they’re connected to
  • So far, there is no option to start a node in pure "queue processor" mode
  • Cron tasks shouldn't be executed by every worker, and their execution should be resilient to a worker going down
  • Push notifications are duplicated once per full/web worker (queue workers don't contribute to this problem). Maybe push notification delivery should be a background-task job in cluster mode?
  • Duplicate work prevention (AsyncKeyedLocker) is not easily adaptable to multiple workers
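
As an illustration of the cron item above, here is a minimal sketch of lease-based task claiming, where only the worker holding an unexpired lease runs a given task and another worker can take over once the lease lapses. It is written in Python for brevity and is not how Iceshrimp.NET works: `LeaseStore`, `try_acquire`, and `run_cron_tick` are hypothetical names, and the in-memory dictionary stands in for shared storage that would need an atomic compare-and-set in a real cluster.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeaseStore:
    """Hypothetical stand-in for shared storage (e.g. a database table).

    A real multi-node setup would have to perform the check-and-write in
    try_acquire atomically; this in-memory version only shows the logic.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._leases: Dict[str, Lease] = {}
        self._clock = clock

    def try_acquire(self, task: str, worker: str, ttl: float) -> bool:
        now = self._clock()
        current = self._leases.get(task)
        # Refuse only if another worker holds a lease that has not expired.
        if current is not None and current.expires_at > now and current.owner != worker:
            return False
        # Otherwise take (or renew) the lease for ttl seconds.
        self._leases[task] = Lease(worker, now + ttl)
        return True


def run_cron_tick(store: LeaseStore, task: str, worker: str, ttl: float,
                  job: Callable[[], None]) -> bool:
    """Run job only if this worker holds the lease; report whether it ran."""
    if not store.try_acquire(task, worker, ttl):
        return False
    job()
    return True
```

If the lease holder goes down, its lease simply expires and the next worker to tick picks the task up, which covers the resilience requirement. The lease duration has to exceed the longest expected run of the task, otherwise two workers could overlap.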

Maybe instead of the traditional clustering setup we could just have remote runners for image conversion (libvips) and ffmpeg processes. Those would be the most CPU-intensive tasks, and being able to run them remotely would help scaling immensely.

For instances that don't use S3, you'd need to implement a way to send the content to the remote runner and then receive the result back on the source server.

Project: Iceshrimp.NET
Priority: Normal
Type: Epic
State: Won't fix
Assignee: Laura Hausmann
Subsystem: Backend
Component: Core services
Target version: Unscheduled
Released in version: Unreleased