Gerhard Lazu
I have no idea why I picked connections.
It was the wrong one.
But the effect was, as you can see here, we had 2,700 long-running connections
on that edge, in that region. In this case it was the orange one, I think EWR, right? So EWR had all these connections open. The clients weren't closing their connections, the proxy was full, and no more connections could be forwarded to the application.
Long running connections, they're usually clients which are not doing the right thing, right?
You shouldn't have that many long running connections.
So the problem was a misconfiguration on our side, which meant that slow, long-running connections were basically blocking other connections from coming through.
So that was a problem there.
And I thought that was it.
But there was more.
So this last comment is from last week.
We now have a check that runs every hour.
And what was interesting, and I'll talk about the check as well, we had response bodies timing out in two regions.
So 13 regions were fine, but even after this configuration change, there were two regions, IAD and EWR, where, when we were using HTTP/2 (and for some reason this is important), the Fly proxy would not forward the connection correctly.
As in, it would start serving the response.
Like we could see the headers coming back from our instances.
What we wouldn't get is the body.
So the body would always be like zero bytes served.
And we could see this happening.
We could see that the connections, by the way, were opened.
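A rough sketch of the kind of check described above, which tells "headers came back but the body did not" apart from a healthy response. This is not the actual check from the episode: the names `classify` and `probe` and the verdict strings are made up for illustration, and it assumes Python's standard library, which speaks HTTP/1.1 only, so reproducing the HTTP/2 behaviour would need an HTTP/2-capable client. Scheduling it every hour is left to whatever runs the job.

```python
import socket
import urllib.error
import urllib.request


def classify(headers_received: bool, body_bytes: int, timed_out: bool) -> str:
    """Turn one probe observation into a verdict (hypothetical labels)."""
    if not headers_received:
        return "no-response"
    if timed_out:
        return "body-timeout"  # headers arrived, body read stalled
    if body_bytes == 0:
        return "empty-body"    # headers arrived, zero bytes of body served
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch url once and classify what came back.

    The timeout applies to each blocking socket operation, not to the
    request as a whole.
    """
    try:
        resp = urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        resp = err  # an error status still means headers were received
    except (urllib.error.URLError, OSError):
        return classify(False, 0, False)

    try:
        with resp:
            body = resp.read()
    except (socket.timeout, TimeoutError):
        return classify(True, 0, True)
    return classify(True, len(body), False)
```

Run once per region, a verdict of `body-timeout` or `empty-body` would correspond to the symptom described here: the response starts, the headers come back, and the body never arrives.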