Analytics
Overview
The following components run on production (school) servers:
- Rusty Bridge as a service. Handles incoming MQTT events. To be removed in future.
- Redstone as a service. Backend to the analytics UI, API endpoints, and CAM backend. To be removed in future.
- Redstone 2 as a service. Will absorb Rusty Bridge, Redstone and Enderman components in future.
- Analytics Server UI, a compiled frontend served by Nginx.
- Enderman tool scheduled to run every day. Performs data processing. To be removed in future.
The Furnace service runs on analytics-collector.it.si, handling data uploads from production servers and interfacing with GCP.
Another Furnace service and the Pickaxe frontend will run somewhere yet to be decided when the internal analytics website is deployed, providing reports and search.
Data Flow
Data is collected into the analytics domain from these sources:
- EC Imports - Data is imported from the alchemy DB as well as over HTTP requests.
- App Events - Events are received from the apps over MQTT from the Mosquitto broker.
- Admin - School data is imported from admin over HTTP.
The first two sources live on production servers, and their data is uploaded to a central location using the minecart system. All data is then stored on GCP.
Components
Rusty Bridge
https://gitlab.it.si/analytics/rusty-bridge
This runs on every production server and is responsible for receiving device events from Mosquitto, and forwarding them to various other places. Events are encoded using protobuf. Proto definitions can be found here:
https://gitlab.it.si/analytics/protos
Each MQTT message may contain multiple protobuf events.
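For illustration, a minimal sketch of the receiving side in Go, using the Eclipse Paho MQTT client; the topic and the batch-splitting helper are assumptions, since the real message definitions live in the protos repo:

```go
package bridge

import (
	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// subscribe reads device events from Mosquitto. One MQTT message may carry
// several protobuf events, so the payload is split before validation.
func subscribe(client mqtt.Client) error {
	token := client.Subscribe("events/#", 1, func(_ mqtt.Client, msg mqtt.Message) {
		for _, raw := range splitBatch(msg.Payload()) {
			handleRawEvent(raw) // validate and fan out, as described below
		}
	})
	token.Wait()
	return token.Error()
}

// splitBatch would proto.Unmarshal a hypothetical batch wrapper and return
// the per-event byte slices; left abstract in this sketch.
func splitBatch(payload []byte) [][]byte { return nil }

func handleRawEvent(raw []byte) {}
```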
Events are checked for validity as follows:
- App time must be recent and not far in the future.
- Some required fields must be set like the username.
- The event must not be a duplicate, which is checked by creating a hash of the raw protobuf data and comparing it to events already in the DB (see the sketch after this list).
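A minimal sketch of these checks in Go, assuming invented field names, a SHA-256 hash, and a proto_hash column on reader_events; the actual hash and schema are defined in rusty-bridge:

```go
package bridge

import (
	"crypto/sha256"
	"database/sql"
	"encoding/hex"
	"errors"
	"time"
)

// Hypothetical decoded event; the real fields come from the protos repo.
type ReaderEvent struct {
	Username string
	AppTime  time.Time
}

// Assumed tolerances for "recent" and "not far in the future".
const (
	maxAge  = 24 * time.Hour
	maxSkew = 5 * time.Minute
)

// validate applies the three checks described above to one event.
func validate(db *sql.DB, ev ReaderEvent, raw []byte) error {
	now := time.Now()
	if ev.AppTime.Before(now.Add(-maxAge)) || ev.AppTime.After(now.Add(maxSkew)) {
		return errors.New("app time out of range")
	}
	if ev.Username == "" {
		return errors.New("missing required field: username")
	}
	// Duplicate check: hash the raw protobuf bytes and look for an existing row.
	h := sha256.Sum256(raw)
	var exists bool
	err := db.QueryRow(
		`SELECT EXISTS(SELECT 1 FROM reader_events WHERE proto_hash = $1)`,
		hex.EncodeToString(h[:]),
	).Scan(&exists)
	if err != nil {
		return err
	}
	if exists {
		return errors.New("duplicate event")
	}
	return nil
}
```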
Invalid events are inserted into Postgres in the reader_events_errors table, which is periodically cleared.
The following actions are taken on valid events:
- Parsed and inserted into Postgres in the reader_events and raw_events tables.
- Raw protobuf published to Redis under the protos channel. Each event is published separately; they are never grouped like in the original MQTT message.
- Processed into JSON and published to Redis under the lp-events channel. EC uses these events for LP progress tracking (see the publish sketch after this list).
- Pushed into a Kafka queue. This is legacy but currently still happens; the Kafka queue is not used anymore.
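A minimal sketch of the Redis fan-out, using the go-redis client; the JSON field names are invented, since the real shape is whatever EC expects:

```go
package bridge

import (
	"context"
	"encoding/json"

	"github.com/redis/go-redis/v9"
)

// fanOut publishes one valid event as described above: raw protobuf bytes to
// the protos channel (one Publish per event, never grouped), and a JSON
// rendering to lp-events for EC's LP progress tracking.
func fanOut(ctx context.Context, rdb *redis.Client, raw []byte, username, eventType string) error {
	if err := rdb.Publish(ctx, "protos", raw).Err(); err != nil {
		return err
	}
	payload, err := json.Marshal(map[string]string{
		"username": username,  // assumed field
		"type":     eventType, // assumed field
	})
	if err != nil {
		return err
	}
	return rdb.Publish(ctx, "lp-events", payload).Err()
}
```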
This project will eventually be merged into the Redstone 2 project. Most of the work for this has already been completed.
Redstone
https://gitlab.it.si/analytics/redstone
This runs on every production server and is the backend to the analytics webpage. In addition to the endpoints used by the web interface, it also serves the following:
- students/... used by the apps to show student analytics.
- book-page-times and timeline/... used by apps.
- old/... endpoints, presumably used by something old and legacy.
- portal/... used by portal for the stats shown there.
- devops/... endpoints serving some reports.
- ec/... endpoints for checking resource opened status.
It also runs the monitor backend for all CAM user tracking.
This project will eventually be replaced by Redstone 2, and work towards this is in progress. When this happens, the Redstone 2 project will be renamed to just Redstone.
Redstone 2
https://gitlab.it.si/analytics/redstone2
This runs on every production server and will eventually replace the Redstone, Rusty Bridge and Enderman projects. It has already replaced the retired Minecart and Sentinel projects. It also has new functionality not previously implemented.
These are the subcomponents contained in this project:
Events handling
System to replace Rusty Bridge, and eventually transition us from using MQTT to using websockets. All functionality as described in the Rusty Bridge section has been implemented over websockets.
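A minimal sketch of that transport, assuming a gorilla/websocket server where each binary frame carries the same protobuf payload previously read from Mosquitto; the /events path and handler names are made up:

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{} // default options; origin checking omitted

func eventsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade:", err)
		return
	}
	defer conn.Close()
	for {
		msgType, data, err := conn.ReadMessage()
		if err != nil {
			return // client disconnected
		}
		if msgType != websocket.BinaryMessage {
			continue // protobuf frames are binary
		}
		handleRawEvents(data) // validate, store and fan out as under Rusty Bridge
	}
}

func handleRawEvents(raw []byte) { /* proto.Unmarshal + the usual pipeline */ }

func main() {
	http.HandleFunc("/events", eventsHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```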
Timeline
Shows recent events for a student.
Minecart
Uploads events and data imported from EC to the Furnace server running on analytics-collector.it.si. Events are uploaded incrementally, and other tables are uploaded only if modified. All tables are uploaded once a day in avro format (http://avro.apache.org/docs/current/).
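A minimal sketch of producing such a file with the goavro library; the schema and field names are invented, the real ones follow the production tables:

```go
package minecart

import (
	"os"

	"github.com/linkedin/goavro/v2"
)

// writeTable writes rows as an avro object container file ready for upload.
// Incremental uploads would pass only rows newer than the last upload.
func writeTable(path string, rows []map[string]interface{}) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w, err := goavro.NewOCFWriter(goavro.OCFConfig{
		W: f,
		Schema: `{
			"type": "record", "name": "reader_event",
			"fields": [
				{"name": "username", "type": "string"},
				{"name": "ts", "type": "long"}
			]
		}`,
	})
	if err != nil {
		return err
	}
	native := make([]interface{}, len(rows))
	for i, r := range rows {
		native[i] = r
	}
	return w.Append(native)
}
```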
Sentinel
Health check endpoints for use by Zabbix.
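For illustration, such an endpoint could be as small as the following; the path, port, and the checks behind ok() are assumptions:

```go
package main

import (
	"log"
	"net/http"
)

// ok would ping Postgres, Redis, etc.; always healthy in this sketch.
func ok() bool { return true }

func main() {
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		if !ok() {
			http.Error(w, "unhealthy", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("OK")) // Zabbix just needs a 200 here
	})
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```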
EC Import
Data import from the alchemy DB, as well as some imports over HTTP. Runs once a day. Replaced the import functionality in Enderman.
Data Processing
System to replace data processing in Enderman. Currently only some flashcard processing is used in production, but most of the processing has been implemented.
Eventually this will also serve as the backend to the analytics frontend.
Analytics Server UI
https://gitlab.it.si/analytics/analytics-server-ui
This is the analytics frontend running on every production server. It calls Redstone and Redstone 2 API endpoints for data.
Enderman
https://gitlab.it.si/analytics/enderman
This tool is present on every production server. It runs once a day and performs analytical processing. The results are shown on the frontend, and some of them are uploaded via the minecart subcomponent. When all processing has been moved to Redstone 2, this component will be dropped.
Furnace
https://gitlab.it.si/analytics/furnace
This is the service running on analytics-collector.it.si. It handles the following tasks:
- Receives avro data from the minecart system in Redstone 2 and uploads it directly to a bucket on Google Storage. Also receives some metadata from minecart, and creates a document on Firestore with info about the upload and metadata (see the first sketch after this list).
- Initiates the transfer of avro uploads into BigQuery once a day. This uses the documents created in Firestore to plan the transfer. Status of the transfer is broadcast over Google Pubsub.
- Fetches school data from admin and uploads it as avro data to Google Storage. Immediately initiates a transfer of the data into BigQuery.
- Upload progress and metadata report endpoint. Compiled using data from Firestore.
- School data report endpoint. Compiled using data from BigQuery and Firestore.
- Book unlock report compiled using data from BigQuery.
- Search service. The search index is built from BigQuery data using the Bleve engine (https://blevesearch.com/). The index is rebuilt each time the data transfer into BigQuery completes, by listening to broadcasts from Google Pubsub (see the second sketch after this list).
- Auth and other endpoints for the Pickaxe frontend.
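A sketch of the first two tasks in Go, using the cloud.google.com/go client libraries; every bucket, collection, dataset, topic and field name here is invented:

```go
package furnace

import (
	"context"
	"io"
	"net/http"
	"time"

	"cloud.google.com/go/bigquery"
	"cloud.google.com/go/firestore"
	"cloud.google.com/go/pubsub"
	"cloud.google.com/go/storage"
)

type Service struct {
	bucket *storage.BucketHandle
	fs     *firestore.Client
	bq     *bigquery.Client
	status *pubsub.Topic
}

// HandleUpload streams an avro upload from minecart straight into the bucket
// and records an upload document in Firestore.
func (s *Service) HandleUpload(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	object := r.URL.Query().Get("table") + ".avro" // assumed naming scheme

	ow := s.bucket.Object(object).NewWriter(ctx)
	if _, err := io.Copy(ow, r.Body); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	if err := ow.Close(); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	_, _, err := s.fs.Collection("uploads").Add(ctx, map[string]interface{}{
		"object":     object,
		"uploadedAt": time.Now(),
	})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

// TransferDay plans the daily load into BigQuery from the Firestore upload
// documents, then broadcasts the outcome over Pub/Sub.
func (s *Service) TransferDay(ctx context.Context) error {
	docs, err := s.fs.Collection("uploads").Documents(ctx).GetAll()
	if err != nil {
		return err
	}
	for _, doc := range docs {
		object, err := doc.DataAt("object")
		if err != nil {
			return err
		}
		gcs := bigquery.NewGCSReference("gs://analytics-avro/" + object.(string))
		gcs.SourceFormat = bigquery.Avro
		job, err := s.bq.Dataset("analytics").Table("events").LoaderFrom(gcs).Run(ctx)
		if err != nil {
			return err
		}
		if _, err := job.Wait(ctx); err != nil {
			return err
		}
	}
	// The search index rebuild listens for this broadcast.
	_, err = s.status.Publish(ctx, &pubsub.Message{Data: []byte("transfer-complete")}).Get(ctx)
	return err
}
```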
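And a sketch of the index rebuild trigger, assuming an invented subscription name and message payload:

```go
package furnace

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
	"github.com/blevesearch/bleve/v2"
)

// WatchTransfers blocks on the Pub/Sub subscription carrying transfer-status
// broadcasts and rebuilds the Bleve index when a transfer completes.
func WatchTransfers(ctx context.Context, client *pubsub.Client) error {
	sub := client.Subscription("avro-transfer-status") // assumed name
	return sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		m.Ack()
		if string(m.Data) != "transfer-complete" { // assumed payload
			return
		}
		if err := rebuildIndex(); err != nil {
			log.Println("rebuild:", err)
		}
	})
}

func rebuildIndex() error {
	// A real rebuild would index rows fetched from BigQuery and would build
	// into a fresh path before swapping it in; this just shows the Bleve calls.
	idx, err := bleve.New("search.bleve", bleve.NewIndexMapping())
	if err != nil {
		return err
	}
	defer idx.Close()
	return idx.Index("school-1", map[string]interface{}{"name": "Example School"})
}
```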
Most of these services can be turned on and off in the config.
Pickaxe
https://gitlab.it.si/analytics/pickaxe
This is the frontend of the internal analytics website. It is currently under development and is not running anywhere except my laptop. It calls API endpoints provided by Furnace. Currently it has only two sections: the search page and the schools table.