Analytics

Overview

The following components run on production (school) servers:

  • Rusty Bridge as a service. Handles incoming MQTT events. To be removed in the future.
  • Redstone as a service. Backend to the analytics UI, API endpoints, and the CAM backend. To be removed in the future.
  • Redstone 2 as a service. Will eventually absorb the Rusty Bridge, Redstone and Enderman components.
  • Analytics Server UI as a compiled frontend served by Nginx.
  • Enderman as a tool scheduled to run every day. Performs data processing. To be removed in the future.

The Furnace service runs on analytics-collector.it.si, handling data uploads from production servers, and interfacing with GCP.

A second Furnace service and the Pickaxe frontend will be deployed once the internal analytics website goes live (the location is not yet decided), providing reports and search.

Data Flow

Data is collected into the analytics domain from these sources:

  • EC Imports - Data is imported from the alchemy DB as well as over HTTP requests.
  • App Events - Events are received from the apps over MQTT via the Mosquitto broker.
  • Admin - School data is imported from admin over HTTP.

The first two sources are on production servers, and the data is uploaded to a central location using the minecart system. All data is then stored on GCP.

Components

Rusty Bridge

https://gitlab.it.si/analytics/rusty-bridge

This runs on every production server and is responsible for receiving device events from Mosquitto, and forwarding them to various other places. Events are encoded using protobuf. Proto definitions can be found here:

https://gitlab.it.si/analytics/protos

Each MQTT message may contain multiple protobuf events.

Events are checked for validity as follows:

  • App time must be recent and not far in the future.
  • Required fields, such as the username, must be set.
  • The event must not be a duplicate, which is checked by creating a hash of the raw protobuf data and comparing it to events already in the DB.
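The three checks above can be sketched as follows. This is a minimal illustration, not the actual Rusty Bridge code: the field names, time limits, and the in-memory set standing in for the Postgres duplicate lookup are all assumptions.

```python
import hashlib
import time

# In-memory stand-in for the Postgres duplicate lookup; the real service
# compares hashes against events already stored in the DB.
seen_hashes = set()

MAX_FUTURE_SKEW = 5 * 60    # hypothetical: allow 5 minutes of clock skew
MAX_AGE = 30 * 24 * 3600    # hypothetical: reject events older than 30 days

def validate_event(raw_protobuf: bytes, app_time: float, username: str) -> bool:
    """Apply the three validity checks; names and limits are illustrative."""
    now = time.time()
    if app_time > now + MAX_FUTURE_SKEW or app_time < now - MAX_AGE:
        return False                      # app time not recent / too far in future
    if not username:
        return False                      # a required field is missing
    digest = hashlib.sha256(raw_protobuf).hexdigest()
    if digest in seen_hashes:
        return False                      # duplicate of an already-seen event
    seen_hashes.add(digest)
    return True
```

Hashing the raw protobuf bytes (rather than parsed fields) means any byte-identical resend is caught, regardless of content.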

Invalid events are inserted into Postgres in the reader_events_errors table, which is periodically cleared.

The following actions are taken on valid events:

  • Parsed and inserted into Postgres in the reader_events and raw_events tables.
  • Raw protobuf published to Redis under the protos channel. Each event is published separately; they are never grouped as in the original MQTT message.
  • Processed into JSON and published to Redis under the lp-events channel. EC uses these events for LP progress tracking.
  • Pushed into a Kafka queue. This is legacy behaviour that still happens, although the queue is no longer consumed.

This project will eventually be merged into the Redstone 2 project. Most of the work for this has already been completed.


Redstone

https://gitlab.it.si/analytics/redstone

This runs on every production server and is the backend to the analytics webpage. In addition to the endpoints used by the web interface, it also serves the following:

  • students/... used by the apps to show student analytics.
  • book-page-times and timeline/... used by apps.
  • old/... legacy endpoints, presumably still used by some old integration.
  • portal/... used by portal for the stats shown there.
  • devops/... endpoints serving some reports.
  • ec/... endpoints for checking resource opened status.

It also runs the monitor backend for all CAM user tracking.

This project will eventually be replaced by Redstone 2, and work towards this is in progress. When this happens, the Redstone 2 project will be renamed to just Redstone.


Redstone 2

https://gitlab.it.si/analytics/redstone2

This runs on every production server and will eventually replace the Redstone, Rusty Bridge and Enderman projects. It has already replaced the retired Minecart and Sentinel projects. It also has new functionality not previously implemented.

These are the subcomponents contained in this project:

Events handling

System to replace Rusty Bridge, and to eventually transition us from MQTT to WebSockets. All functionality described in the Rusty Bridge section has been implemented over WebSockets.

Timeline

Shows recent events for a student.

Minecart

Uploads events and data imported from EC to the Furnace server running on analytics-collector.it.si. Uploads run once a day in Avro format (http://avro.apache.org/docs/current/). Events are uploaded incrementally; other tables are uploaded only if modified.
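The incremental bookkeeping described above can be sketched as follows. All names are hypothetical, a dict of rows stands in for the DB, and the "upload" simply records what would be sent to Furnace.

```python
# Records (table name, rows) pairs in place of a real upload to Furnace.
uploaded = []

# Hypothetical persisted state: the highest event id already uploaded, and
# a per-table modification marker from the previous run.
state = {"last_event_id": 0, "table_versions": {}}

def upload(name: str, rows) -> None:
    uploaded.append((name, list(rows)))

def run_minecart(events: list[dict], tables: dict[str, tuple[int, list]]) -> None:
    # Events are incremental: only ids past the stored watermark go out.
    new = [e for e in events if e["id"] > state["last_event_id"]]
    if new:
        upload("events", new)
        state["last_event_id"] = max(e["id"] for e in new)
    # Other tables are uploaded only if their modification marker changed.
    for name, (version, rows) in tables.items():
        if state["table_versions"].get(name) != version:
            upload(name, rows)
            state["table_versions"][name] = version
```

A watermark plus per-table change markers keeps the daily run cheap: unchanged tables and already-seen events cost nothing.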

Sentinel

Health check endpoints for use by Zabbix.

EC Import

Data import from the alchemy DB, as well as some imports over HTTP. Runs once a day. Replaced the import functionality in Enderman.

Data Processing

System to replace the data processing in Enderman. Currently only some flashcard processing is used in production, but most of the processing has been implemented.

Eventually this will also serve as the backend to the analytics frontend.


Analytics Server UI

https://gitlab.it.si/analytics/analytics-server-ui

This is the analytics frontend running on every production server. It calls Redstone and Redstone 2 API endpoints for data.


Enderman

https://gitlab.it.si/analytics/enderman

This tool is present on every production server. It runs once a day and performs analytical processing. The results are shown on the frontend, and some of them are uploaded via the minecart subcomponent. When all processing has been moved to Redstone 2, this component will be dropped.


Furnace

https://gitlab.it.si/analytics/furnace

This is the service running on analytics-collector.it.si. It handles the following tasks:

  • Receives avro data from the minecart system in Redstone 2 and uploads it directly to a bucket on Google Storage. Also receives some metadata from minecart, and creates a document on Firestore with info about the upload and metadata.
  • Initiates the transfer of avro uploads into BigQuery once a day. This uses the documents created in Firestore to plan the transfer. Status of the transfer is broadcast over Google Pubsub.
  • Fetches school data from admin and uploads it as avro data to Google Storage. Immediately initiates a transfer of the data into BigQuery.
  • Upload progress and metadata report endpoint. Compiled using data from Firestore.
  • School data report endpoint. Compiled using data from BigQuery and Firestore.
  • Book unlock report compiled using data from BigQuery.
  • Search service. The search index is built using data from BigQuery with the Bleve engine (https://blevesearch.com/). The index is rebuilt each time the data transfer into BigQuery completes, by listening to broadcasts from Google Pubsub.
  • Auth and other endpoints for the Pickaxe frontend.

Most of these services can be turned on and off in the config.
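The daily transfer-planning step, which uses the Firestore upload documents to decide what to load into BigQuery, can be sketched like this. Plain dicts stand in for the Firestore documents, the load job is omitted, and the document fields (table, gcs_path, transferred) are assumptions.

```python
# Collects Pubsub-style status messages in place of a real broadcast.
broadcasts = []

def plan_transfer(docs: list[dict]) -> dict[str, list[str]]:
    """Group not-yet-transferred uploads by table, one load per table."""
    plan: dict[str, list[str]] = {}
    for doc in docs:
        if not doc.get("transferred"):
            plan.setdefault(doc["table"], []).append(doc["gcs_path"])
    return plan

def run_transfer(docs: list[dict]) -> None:
    for table, paths in plan_transfer(docs).items():
        # A real implementation would issue a BigQuery load job here for the
        # Avro files at `paths`, then update the Firestore documents.
        for doc in docs:
            if doc["gcs_path"] in paths:
                doc["transferred"] = True
        broadcasts.append({"table": table, "files": len(paths), "status": "done"})
```

Marking documents as transferred only after the (would-be) load keeps the plan idempotent: a rerun picks up only uploads the previous run missed.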


Pickaxe

https://gitlab.it.si/analytics/pickaxe

This is the frontend of the internal analytics website. It is currently under development and not running anywhere except my laptop. It calls API endpoints provided by Furnace. Currently it has only two sections: the search page and the schools table.