This article was first published on Ubex - Medium
The process of data collection from webmaster websites.
The Ubex project continues its updates on the development of the platform. In this release, we will be discussing the process of data collection from webmaster websites and how data collection is organized for training the system core.
The webmaster installs our pixel on the pages of their website. When a user opens a page of the site, the JS code is executed, collects data, and sends it to our receiver (the collector). During the development of the collector, we ran many tests on data processing speed and were able to tune the collector to process ~17,500 requests per second (RPS). Considering that we have separate servers with balancers in every region of the world, overall we can receive up to ~250,000 RPS. For the moment, this capacity is enough for us, and if we need to process larger volumes, AWS will help us out.
After the data has arrived in the collector, we do not write it directly to the database but place it in a queue for processing. This is necessary to reduce the overall load on the collector and ensure the stable operation of the entire system.
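The idea of accepting an event and queuing it instead of writing to the database can be sketched as follows. This is only an illustration under stated assumptions, not the Ubex collector: the in-memory `event_queue`, the `collect` function, and the `site_id` and `url` fields are hypothetical names, and a production system would use a dedicated queue service rather than a Python object.

```python
import json
import queue
import time

# Hypothetical in-memory stand-in for the processing queue.
# The bound keeps a burst of traffic from exhausting memory.
event_queue = queue.Queue(maxsize=10_000)


def collect(raw_event):
    """Accept one pixel event and enqueue it. No database work happens here."""
    record = dict(raw_event, received_at=time.time())
    try:
        # put_nowait never blocks, so the request handler returns immediately.
        event_queue.put_nowait(json.dumps(record))
        return True
    except queue.Full:
        # Queue is saturated: report failure so the caller can shed load.
        return False
```

Because `collect` only serializes the record and appends it to the queue, its cost per request stays small and does not depend on how fast the databases are.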
For the queuing system, we have workers that pack up the records from the queue and distribute the data to the databases. We write the raw data to a log separately, and then write the aggregated data to the database for display in the user's personal account. We need the log so that the data can be recalculated from scratch if such an emergency arises. We use PostgreSQL as a ...
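The worker step described above (take records from the queue, keep a raw log, maintain aggregates, and recalculate from the log when needed) might look roughly like this. It is a minimal sketch under assumptions: the function names, the `site_id` field, the list used as the log, and the `Counter` used as the aggregate store are all hypothetical stand-ins for the real storage.

```python
import json
import queue
from collections import Counter


def take_batch(q, max_items):
    """Pull up to max_items records from the queue without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def process_batch(batch, raw_log, aggregates):
    """Append every record to the raw log, then update per-site counts."""
    for line in batch:
        raw_log.append(line)  # the log keeps the unmodified record
        aggregates[json.loads(line)["site_id"]] += 1


def rebuild_aggregates(raw_log):
    """Recalculate the aggregates from scratch using only the raw log."""
    aggregates = Counter()
    for line in raw_log:
        aggregates[json.loads(line)["site_id"]] += 1
    return aggregates
```

Since every record reaches the log before it is counted, `rebuild_aggregates` can reproduce the aggregate state even if the aggregate store is lost or corrupted.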
To keep reading, please go to the original article at:
Ubex - Medium