Addressing Network and Service Monitoring Challenges in the 5G Landscape


In the coming years, 5G infrastructure will become a ubiquitous, flexible, programmable broadband network at the core of every social, business, and cultural process, enabling both economic growth and social prosperity. Achieving this vision poses significant technical challenges. These include agile programmability and management mechanisms for efficiently instantiating innovative services across heterogeneous network components, virtualized infrastructures and geographically dispersed cloud environments.

One of the important issues to be addressed in this new era of 5G service management is network and service monitoring, which requires collecting and processing data about the network, computation and storage resources involved in the lifecycle management of 5G services. However, existing monitoring tools fail to satisfy the requirements of the services envisioned in the 5G landscape, since in most cases they are:

  1. Intrusive and heavy-handed for short-lived, lightweight network function instances.
  2. Not able to follow the fast pace of management changes enforced by continuous dynamic scheduling, provisioning and auto-scaling.
  3. Unable to cover the requirements of all the emerging technologies involved, including both hypervisor-based and containerized deployments, as well as monitoring data collection from different cloud environments (OpenStack, VMware, etc.).

In a nutshell, the SONATA monitoring framework collects and processes data from several sources, letting the developer activate metrics and thresholds to capture generic or service-specific behaviour. Moreover, the developer can define rules based on metrics gathered from one or more VNFs deployed in one or more NFVIs in order to receive notifications in real time. The developer can subscribe to a message queue, or receive alert notifications by email and/or SMS. Most importantly, monitoring data and alerts are also accessible through a RESTful API or directly through a websocket URL.
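The idea of a threshold rule spanning several VNFs can be illustrated with a minimal Python sketch. This is not the SONATA implementation: the `Rule` class, the `evaluate` and `notify` functions, the metric name `cpu_util` and the VNF identifiers are all hypothetical, and the notification channels (message queue, email, SMS) are represented by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical threshold rule over one metric and a set of VNFs."""
    name: str
    metric: str
    threshold: float
    vnfs: List[str]  # identifiers of the VNFs the rule covers


def evaluate(rule: Rule, samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the VNFs whose latest value of rule.metric exceeds the threshold.

    `samples` maps a VNF identifier to its latest metric values.
    VNFs with no sample for the metric are treated as not firing.
    """
    missing = float("-inf")
    return [
        vnf
        for vnf in rule.vnfs
        if samples.get(vnf, {}).get(rule.metric, missing) > rule.threshold
    ]


def notify(
    rule: Rule,
    samples: Dict[str, Dict[str, float]],
    channels: List[Callable[[str], None]],
) -> List[str]:
    """Send one message per channel if at least one VNF violates the rule."""
    firing = evaluate(rule, samples)
    if firing:
        message = (
            f"{rule.name}: {rule.metric} > {rule.threshold} "
            f"on {', '.join(firing)}"
        )
        for send in channels:
            send(message)
    return firing
```

In this sketch a channel is any function that accepts a string, so a queue publisher, an email sender or a list's `append` method (useful for testing) can be plugged in without changing the rule logic.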

One of the cornerstones of the monitoring framework implementation was to deliver a carrier-grade solution that fulfills scalability requirements in a multi-PoP environment. Several components of the monitoring framework therefore had to be distributed across the SONATA Points of Presence (PoPs). First, each PoP must have its own websocket server to accommodate developers’ demands for streaming data, although the websockets are managed centrally by the Monitoring Manager instance. Second, the Prometheus monitoring servers follow a distributed (cascaded) architecture. The local Prometheus servers collect and store metric data from the VNFs deployed in their PoP, while only the alerts are sent to the federated Prometheus server for further processing and forwarding to the subscribed users. Moreover, since the alerting rules and notifications are based on monitoring data collected in different PoPs, the decision must be made at the federation level.

Another scalability requirement concerns the large flow of data from the monitoring probes to the monitoring server and its database, which might affect service performance in extreme cases. To address this, the monitoring server and its database follow a distributed architecture working in a cascaded fashion, along with appropriate modifications at the component level. In particular, the monitoring probe will be changed so that it does not send data to the monitoring server when the difference in a metric's value is less than a threshold defined by the developer.
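The probe-side filtering described above can be sketched as a simple deadband filter. This is an illustrative Python sketch, not the SONATA probe code: the `DeadbandProbe` class and its `send` callback are hypothetical. It also assumes that the "value difference" is measured against the last value actually sent, which the text does not specify.

```python
from typing import Callable, Optional


class DeadbandProbe:
    """Forward a metric value only when it has changed by at least `threshold`.

    Assumption: the difference is taken against the last value that was
    sent, not the last value that was observed.
    """

    def __init__(self, threshold: float, send: Callable[[float], None]) -> None:
        if threshold < 0:
            raise ValueError("threshold must be non-negative")
        self.threshold = threshold
        self.send = send  # hypothetical transport to the monitoring server
        self.last_sent: Optional[float] = None

    def report(self, value: float) -> bool:
        """Send `value` if it differs enough; return True if it was sent."""
        if (
            self.last_sent is not None
            and abs(value - self.last_sent) < self.threshold
        ):
            return False  # change is below the threshold: suppress the sample
        self.send(value)
        self.last_sent = value
        return True
```

Comparing against the last sent value means that a slow drift is still reported once its accumulated change reaches the threshold. Comparing against the previous observation instead would suppress such drifts indefinitely.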