NSQ Realtime messaging platform & BackQ PHP

Stateless containers

During recent migration of our app stack to containers i started to refactor different aspects of the app and ifrastructure, new CI/CD, docker, docker-compose, docker swarm, kubernetes. Problem i saw was we were saving state in the container, but not in the app, but rather in form of queue messages. Even if those messages would be dispatched in next 5…10 seconds it is still a state, so killing a container might affect the business logic.

Solution would be to introduce persistent/guaranteed queue in addition to fast/hot local queue.

Beanstalk is a simple, fast work queue

For years i have been using Beanstalk as a super fast local messaging queue. One simple binary with minimum dependencies, super stable, simple to use. It’s protocol covers needs for messaging queue, its lightweight and easy to implement.

/usr/bin/beanstalkd -l 127.0.0.1 -p 11300 -u beanstalkd -z 5242880 -b /var/lib/beanstalkd -s 10485760 

or without binlog and separate loopback ip for more port range, as is listens on TCP

/usr/bin/beanstalkd -l 127.0.0.2 -p 11303 -u beanstalkd

I also have RabbitMQ in project, but upgrade from Beanstalk to RabbitMQ (whether AMQP or MQTT) sounds like overkill.

So what i needed is a “modern” Beanstalk

  • minimum dependencies
  • fast & stable
  • production ready
  • authentication and authorization would be cool
  • some statistics/analytics integration would be cool, otherwise its black box what is happening with the queues

Since i was already aware of golang i ofc. knew about nsq - a queue written in golang, a really great application for golang as a language. So i took a look.

NSQ

I like Go programming language, i did the book, the online tour, i listened to >100 GolangShow podcast episodes. Unfortunately i didnt had much time or excuse to apply golang in my project, until now.

Because NSQ is written in go. I took me roughly few hours to make a decision and it turned out to work great.

  • simple binaries
  • docker image based on latest go and Apline linux
  • built-in healthckecks
  • built-in statistics/monitoring integration
  • built-in HTTP UI and some HTTP endpoints
  • TCP Protocol that is so somilar to beanstalkd i can guess that nsqio creators did use Beanstalk themselves, because both protocols are different from the usual MQTT/STOMP/AMQP and both are similar to each other, it was just a matter of 1 day to build the NSQ adapter with intefrace-compatible to Beanstalk adapter
  • widely adopted, community & enterprise supported modern codebase

NSQ / TCP Protocol Client for PHP

There were only few outdated clients on official page for PHP

No way i’m going to run 3rd party c extension in production, many other cons.

Nsqphp is based on react/react, outdated, officially “Not maintained anymore since Dec 15, 2017”.

Looks like a challenge, how come such simple TCP protocol for such popular nsq.io messaging queue is not implemented in PHP. I guess its cause PHP developers and Golang developers are a “little” bit on the other parts of programming worlds.

PHP is blocking in nature, old, OOP, used for frontend mostly, newbies friendly. Golang is concurrent, new, not OOP, used for system/internal tools, not as friendly as PHP.

The last library scottaubrey/simplensqclient actually inspired me for my first draft of the implementation, i saved some time on reading documentation and confirming the protocol flow.

Documentation for NSQ TCP protocol has some interesting points:

  • Explanation is split between two articles “Building client libraries” and TCP Protocol spec
  • It doesnt say that V2 protocol heartbeats are REQUIRED for SUB (Subscribe to a topic/channel) otherwise “Cannot SUB with heartbeats disabled” error is thrown.
  • You need to send RDY if you want to get more messages, maybe its obvious, but not explicitly mentioned, you start with “RDY 0” then manually specify “RDY N” and must respond with FIN/REQ N-times, and then again ask with “RDY X”, keeping internal counter in the client side is important
  • Heartbeats can come any time and you need to respond with NOP, but you can use them as loop breakers, same as Beanstalk “reserve-with-timeout " command.
  • CLS (Cleanly close your connection) command only applicable after SUB, i.e. when client is subscribed but not mentioned in documentation.
  • Some recommendations on protocol implementation are not applicable to PHP at all due to blocking nature
  • “heartbeat_interval” protocol option is specified in milliseconds, but nsqd “-max-heartbeat-interval” is configured with string (example “2m0s”) as argument.
  • “msg_timeout” protocol option is limited to nsqd “-max-req-timeout”, and limits DPUB , in short {defer_time <= msg_timeout <= max-req-timeout}

Library

I will be publishing the library soon after its proven itself in production environment as “remote” Beanstalk alternative, it is fully compatible with latest sshilko/backq Workers & Publishers, and to migrate your workers only adapter change needed. It is simpler than other attempts to implement the full protocol, but it supports the interface required for BackQ library and thats what i needed.

Big thanks again to Scott Aubrey for his work/implementation of simplensqclient.

Simplensqclient is published under MIT license at the time of writing, 
last updated on 2015-05-04, is currently alpha status and has no real world production use. 

Important Notice

Due to blocking nature of PHP streams, the job you are performing after receiving NSQ payload MUST NOT take more time than 2x nsq.io server hearbeats, in short heartbeats/tcp/socket processing are NOT done in separate thread/process/corouting. To compensate that the library does not allow to define maximum job duration > 1.5x defined heartbeat, but its only for validation - the job processing will not be interrupted if it takes more time.

For example you fetched payload/job from server and now uploading some files, and that takes >2x heartbeat duration. In this after file upload the library will try to report to nsq server about successfull job completion, but the server will already have the socket connection closed after not hearing anything back from client (worker) for 2x hearbeats. The job will be (or already might be) re-assigned to the next worker that asks for jobs.

The workaround is to correctly define your workers heartbeat that corresponts to expected time job will take, or having a separate channels/topics for workers processing fast and slow kind of jobs.

Docker Compose

Official nsq.io images nsqio/nsq. Official Docker installation tutorial.

  • added healthchecks
  • added explicit stop_signal
  • only publishing port 4171 (HTTP UI) and 4150 (TCP Protocol)
  • exposed max-heartbeat-interval, max-req-timeout settings (at least one hearbeat within 2 minutes, up to 48hrs message delay)
  • nsqd IP is explicitly set to 0.0.0.0, easily replaced with ${BINDIP} variable if you want to bind ONLY to specific interface for security reasons (i.e. do NOT bind to public IP)
version: '3'
services:
  nsqlookupd:
    image: nsqio/nsq
    command: /nsqlookupd
    healthcheck:
      test: ["CMD-SHELL", "/usr/bin/wget -q -s http://0.0.0.0:4161/ping && exit 0 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 2
    expose:
      - "4161/tcp"
      - "4160/tcp"
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
    sysctls:
      net.core.somaxconn: 32767
  nsqd:
    image: nsqio/nsq
    command: ["/nsqd",
              "--lookupd-tcp-address=nsqlookupd:4160",
              "--max-req-timeout=48h0m1s",
              "--max-body-size=6000000",
              "--max-msg-size=5242880",
              "--max-heartbeat-interval=2m0s",
              "--mem-queue-size=5000",
              "--sync-every=1000",
              "--max-rdy-count=1"]
    stop_signal: SIGTERM
    healthcheck:
      test: ["CMD-SHELL", "/usr/bin/wget -q -s http://0.0.0.0:4151/ping && exit 0 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 2
    depends_on:
      - nsqlookupd
    ports:
      - "0.0.0.0:4150:4150/tcp"
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
    sysctls:
      net.core.somaxconn: 32767
  nsqadmin:
    image: nsqio/nsq
    command: /nsqadmin --lookupd-http-address=nsqlookupd:4161
    healthcheck:
      test: ["CMD-SHELL", "/usr/bin/wget -q -s http://0.0.0.0:4171/ping && exit 0 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 2
    depends_on:
      - nsqlookupd
    ports:
      - "0.0.0.0:4171:4171/tcp"
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
    sysctls:
      net.core.somaxconn: 32767

Still running Beanstalkd? Dont forget kernel networking settings

Unfortunatelly beanstalkd listens on TCP, and you will be normally affected by TIME_WAIT time on default kernel settings, limiting number/speed of your queue. See How TCP backlog works in Linux for details.

I recommend changing following settings on your linux box

net.ipv4.ip_local_port_range=16384 65000
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_tw_reuse=1