Suddenly, my website availability monitoring tool started reporting “Connect timeout” errors for this blog.
It didn’t happen on every check. Most of the time the tool reported that the website was up, but occasionally I got the error, so I started digging to find the reason.
Read on to see the commands I used and what I found. I hope these simple steps help you find the cause of your own issue.
I use GitLab Pages to host this blog. For the hosting details, see my post about Static site generator and GitLab Pages hosting.
The GitLab Pages IP is 184.108.40.206, but the error was “Connect timeout (220.127.116.11)”, which shows a different IP.
I store my DNS records in Cloudflare, which can proxy the traffic for individual DNS records.
Proxied traffic goes through Cloudflare and benefits from its security and performance features.
The proxy is convenient because you don’t have to set up TLS/SSL certificates yourself. With the proxy enabled, DNS returns Cloudflare’s IP addresses instead of your server’s IP.
So the problem seemed to be with a particular Cloudflare proxy IP.
Then I checked the result of the dig command.
It returned two IP addresses: one was the IP from the error report, and the other worked fine.
; <<>> DiG 9.16.13 <<>> bogomolov.tech
So it became clear that, with the proxy enabled, the first IP in the list was not working correctly.
tcpdump is a network packet analyzer. To double-check, I captured the TCP traffic with the command below.
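To repeat this check without waiting for the monitoring tool, you can probe each resolved address separately. The sketch below is not the exact procedure I used: the function names ipv4_only, probe_ip and probe_all are hypothetical, and it assumes bash, dig and curl are installed. curl’s --resolve option pins the domain to one IP while keeping the real host name for TLS (SNI) and the Host header, so each Cloudflare address is tested on its own.

```shell
# Hypothetical helpers (not from the original post): probe every IPv4
# address a domain resolves to, one address at a time.

# Keep only IPv4 lines; `dig +short` may also print CNAME targets.
ipv4_only() {
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Request https://$1/ with the name pinned to IP $2 via --resolve,
# so SNI and the Host header still carry the real domain name.
probe_ip() {
  local domain="$1" ip="$2"
  if curl -sS -o /dev/null --connect-timeout 5 \
       --resolve "${domain}:443:${ip}" "https://${domain}/"; then
    echo "${ip} OK"
  else
    echo "${ip} FAILED"
  fi
}

# Resolve the A records with dig and probe each returned address.
probe_all() {
  local domain="$1" ip
  dig +short A "$domain" | ipv4_only | while read -r ip; do
    probe_ip "$domain" "$ip"
  done
}
```

After sourcing the functions, run probe_all bogomolov.tech. A FAILED line means curl exited with an error (a connect timeout is exit code 28); HTTP error statuses are not counted as failures because the command does not use -f.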
sudo tcpdump host bogomolov.tech -w /tmp/bogomolov.log
After capturing some packets (segments), I read the capture file with the command below.
tcpdump -A -r /tmp/bogomolov.log | less
As an example, my output looked like this.
05:44:59.207763 IP myhostname.41300 > 18.104.22.168.https: Flags [S], seq 2461178300, win 64240, options [mss 1460,sackOK,TS val 1287633951 ecr 0,nop,wscale 6], length 0
Here you can see that when the resolved IP is 22.214.171.124, the initial SYN packet (flag [S]) gets a SYN-ACK response (flag [S.]).
But when DNS resolves to 126.96.36.199, there is no response.
This is how a TCP connection is established. Once DNS has resolved an IP, the client tries to open a TCP connection to it.
First, the client sends a SYN (synchronize sequence numbers) packet; in TCP, packets are called segments. This segment tells the server that the client wants to start communicating.
The server should then respond with a SYN+ACK (acknowledgement) segment, telling the client that it successfully received the initial SYN.
Finally, the client responds to the server with an ACK segment, and the connection is established.
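You can observe the handshake outcome without tcpdump. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command; tcp_handshake and check_port are hypothetical names. Opening /dev/tcp only performs the three-way handshake, so the exit status separates three cases: success means a SYN-ACK arrived, a timeout means the SYN went unanswered, and an immediate failure usually means the server replied with a reset or the route was unreachable.

```shell
# Hypothetical helper: try only the TCP three-way handshake with
# host $1 on port $2 (default 443), giving up after $3 seconds (default 5).
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
tcp_handshake() {
  local host="$1" port="${2:-443}" secs="${3:-5}"
  # Opening fd 3 on /dev/tcp/HOST/PORT makes bash call connect();
  # the inner bash then exits, which closes the connection.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical wrapper: map the exit status to the handshake outcome.
check_port() {
  tcp_handshake "$@"
  case $? in
    0)   echo "connected" ;;                            # SYN, SYN-ACK, ACK completed
    124) echo "timeout (no SYN-ACK)" ;;                 # `timeout` killed the attempt
    *)   echo "failed (e.g. refused or unreachable)" ;; # e.g. RST or routing error
  esac
}
```

Running check_port with each address returned by dig, port 443 and a 5-second limit, should print “timeout (no SYN-ACK)” for an IP that silently drops SYN packets.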
Now let’s go back to the tcpdump result.
Looking at the tcpdump output again, we can see that one of the IPs never responds to the SYN packet. This might be a Cloudflare error or a routing issue somewhere in between. Either way, we now know for certain that the 188.8.131.52 IP address is not working.
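In a longer capture, an unanswered SYN is easy to miss by eye. The sketch below summarises tcpdump’s text output per remote IP, counting outgoing SYNs ([S]) and returning SYN-ACKs ([S.]). It assumes IPv4 traffic and tcpdump’s default one-line layout (“IP src.port > dst.port: Flags [...]”); syn_report is a hypothetical helper, not a tcpdump feature.

```shell
# Hypothetical helper: per remote IP, count our SYNs and the SYN-ACKs
# that came back, and flag IPs that never answered. Assumes IPv4 lines
# in tcpdump's default "TIME IP src.port > dst.port: Flags [..]," format;
# payload lines printed by -A are ignored because they don't match.
syn_report() {
  awk '
    $2 == "IP" && $6 == "Flags" {
      src = $3; dst = $5
      sub(/:$/, "", dst)          # drop the trailing colon
      sub(/\.[^.]*$/, "", src)    # drop the ".port" suffix
      sub(/\.[^.]*$/, "", dst)
      if ($7 == "[S],")  syn[dst]++      # our SYN to the server
      if ($7 == "[S.],") synack[src]++   # server SYN-ACK back to us
    }
    END {
      for (ip in syn) {
        status = (synack[ip] > 0) ? "ok" : "NO SYN-ACK"
        printf "%s syn=%d synack=%d %s\n", ip, syn[ip], synack[ip] + 0, status
      }
    }
  ' "$@" | sort
}
```

For example, tcpdump -n -r /tmp/bogomolov.log | syn_report prints one line per server IP; -n keeps addresses numeric instead of reverse-resolving them. Repeated SYNs to the same IP with synack=0 are retransmissions that were never answered.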
In NodeJS, you can catch a connection timeout in the request’s “error” event; the error message contains the text “connect ETIMEDOUT”.
Since I don’t control this IP, I started looking for anyone who had the same issue, and found the same problem reported in 2020. Here is the link to that thread.
I also checked from another location, using curl with the IP address forced. It worked from there, which may be useful information for Cloudflare support.
I didn’t save the curl output, but if you have the same problem, you can use the command below.
curl -vvv -o /dev/null --header "Host:bogomolov.tech" 184.108.40.206
The next step was to fix the problem. There were at least two options:
- Write to Cloudflare support
- Disable the proxy and point the domain name directly to your server’s IP
The second option was easier, since the website would start working almost immediately. But you would then need to handle the SSL certificate yourself.
So I decided to contact Cloudflare support. Here are Cloudflare’s detailed instructions on how to open a support ticket.
But I had no time at that point and put it off. Later the problem was fixed, so I never opened a support ticket and had no chance to dig deeper into the issue.