Preamble:
Though this document (authored as a white paper a few years ago) describes the service as seen from mobile phones connecting to server environments, the same principles hold when connecting other clients (workstations, business networks) to, say, hosting services.
Overview
While the operations center monitors all systems and plots performance, server and network load, storage utilization, database health, and overall availability and throughput to ensure premium service, there may still be situations where the end user sees a completely different picture. Other factors may be at play that impact phone service and contribute to a poor consumer experience.
The lack of transparency into an end-to-end mobile data service can easily, and without the operator knowing, lead to a situation where users get upset, all the more so when customer support cannot handle the calls in an expedient and satisfactory manner. Mobile operators need insight into critical key performance indicators (KPIs) to ensure continuous consumer satisfaction.
Service providers want to learn where, when, and how problems occur, and ideally address incidents before customers start alerting the service hotlines and perhaps become upset when service personnel appear unaware of the problems, let alone able to offer a solution!
Customers and operators alike can experience degraded service during traffic peaks and high network load, when increased demand causes bottlenecks and availability and performance drop. Users then get upset when tasks cannot be completed because timeouts and aborts hinder phone use.
Mobile phone operators then feel the users’ pain when hotlines are bombarded and, even worse, when frustrated customers turn to a competitor. Repairing the damage to the reputation, rebuilding consumer confidence, and winning back customers can become quite an expensive ordeal. The cost ratio between preventive measures and restoring the service after a failure can easily reach 1:10 (yes, one to ten), if not 1:20 or much more.
Device key performance indicator (KPI) tools can be designed to measure end-to-end email, web browsing, responsiveness of certain Apps, and more. This gives an inside view of the overall experience from the customer perspective. Various phones with the KPI tool installed can be assigned as monitors.
Every so often (15 minutes?) the collected KPI data is sent from the phone to a dedicated database web service for statistics and trend reporting.
Following the spirit of the original KPI service, tools will be provided to the mobile phone world to help track the end-to-end experience of the mobile service, in a selected area or around the globe, at any time, to ensure a consistent and desired level of “Quality of Experience” (QoE). Being proactive is key!
Scope
To monitor and record system response to a mobile phone, and to analyze and help improve overall service, critical performance data as seen on the mobile phone will be captured. This could be simple log reporting, or a more sophisticated App running selectable, pre-defined user profiles that simulate to some extent how end users would use their mobile phone. The data so collected will provide a view into the real world from the consumer perspective and help the mobile operator understand where and when quality is compromised.
Apps may be distributed to and loaded on phones set aside for that purpose and/or phones of (selected) Microsoft and mobile operator personnel. Scripts will describe a user profile to benchmark (e.g., load a web site, send a message, receive an email, etc.). The App would run independently without disturbing or interrupting the (human) user of the phone.
In a more generic and smaller fashion, customer service could probe end-to-end phone service from the consumer perspective by issuing and measuring I/O blips of test data blocks and/or capturing logs from a phone, or even from a defined group of phones.
A designated QoE server will collect the data from the connected phones and display statistics and alerts; it is an essential part of the operations center, showing the service from the outside. The data can be further filtered by geographic location (cell towers) and correlated with peak vs. quiet times, network or system load, and transfer times probed internally and externally throughout the various services.
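As an illustration of this kind of filtering, the following minimal sketch groups response-time samples by cell tower and by peak vs. quiet hours. The record fields (cell_id, hour, response_ms) and the peak-hour window are assumptions made for the example; the original design does not specify them.

```python
from collections import defaultdict
from statistics import mean

# Assumed peak window (local hours 08:00-19:59); a real deployment would configure this.
PEAK_HOURS = range(8, 20)

def summarize(samples):
    """Return the mean response time per (cell_id, 'peak' | 'quiet') bucket."""
    buckets = defaultdict(list)
    for s in samples:
        period = "peak" if s["hour"] in PEAK_HOURS else "quiet"
        buckets[(s["cell_id"], period)].append(s["response_ms"])
    return {key: mean(values) for key, values in buckets.items()}

# Hypothetical samples as a QoE server might receive them.
samples = [
    {"cell_id": "A1", "hour": 9, "response_ms": 420},
    {"cell_id": "A1", "hour": 10, "response_ms": 580},
    {"cell_id": "A1", "hour": 23, "response_ms": 150},
    {"cell_id": "B7", "hour": 14, "response_ms": 300},
]
summary = summarize(samples)
```

Grouping on a composite key keeps the sketch small; a production service would more likely run the same grouping as a database query.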
The goal is to get a sense of the client or phone users’ real-life experience; to allow troubleshooting, perhaps pinpointing the device, the application, or the mobile service; and to implement corrective actions before problems become widespread. It is also to help network operations look outside their “ivory tower”, further improve the monitoring, and sustain a positive consumer experience.
Scenario
Phone / Client Status
- The mobile phone QoE App connects to various services (continuously, on demand, or at set intervals), logging requests sent and responses received, simulating common user profiles.
- The end user could use the tool to test the connection status, perhaps even troubleshoot their phone.
- Customer Service connects to a user’s phone, probes phone health, line quality and connection status.
- Server connects to (defined) groups of mobile phones and collects logs with overall phone and line benchmark data.
Server Backend
- QoE Server monitors “its” phones, requests/receives data and displays key performance data.
- Collected data is qualified and stored in a QoE database to allow spot-on analysis and trends.
- Data is correlated with the service performance matrix to pinpoint problem areas and to challenge the (internal) network operations display.
…
Specification
Key Performance Data
How:
Phone responsiveness is monitored:
- Log files, test data I/O
- QoE App / Scripts (Email, Web, Messaging, real-time vs. batch tasks, etc.)
Who:
Identify client or phone:
- Client or Phone, OS, etc.
- IMEI, ICC ID, etc.
- User ID
Where:
Location parameters:
- GPS coordinates
- Cell tower connected
- Provider (roaming)
- signal strength / ASU
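Because ASU is an abstract unit, it usually needs converting to dBm before readings from different network types can be compared. The sketch below uses the commonly documented conversions for GSM (RSSI) and LTE (RSRP); the function name and the choice to return None for unknown readings are assumptions of this example.

```python
def asu_to_dbm(asu, network="gsm"):
    """Convert an ASU reading to dBm; return None for unknown (99) or out-of-range values."""
    if network == "gsm":
        # GSM RSSI: valid ASU range 0..31, dBm = 2 * ASU - 113
        return 2 * asu - 113 if 0 <= asu <= 31 else None
    if network == "lte":
        # LTE RSRP: valid ASU range 0..97, dBm = ASU - 140
        return asu - 140 if 0 <= asu <= 97 else None
    raise ValueError(f"unsupported network type: {network}")
```

For example, a GSM reading of 31 ASU corresponds to -51 dBm, and an LTE reading of 40 ASU to -100 dBm.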
When:
Timestamps recorded:
- Request sent out; start & finish
- Confirmation receive time
- first byte of message header (email) or web page (web)
- Task completed & total bytes
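A minimal sketch of how some of these timestamps could be captured around a single transfer. It covers the request-sent, first-byte, and task-completed points plus total bytes; the function name, the result keys, and the idea of passing the transfer in as a callable are assumptions of this example. A monotonic clock is used because only the intervals matter here; wall-clock time would be added separately for correlation on the server.

```python
import time

def measure_transfer(fetch, clock=time.monotonic):
    """Run fetch() (a callable returning an iterable of byte chunks) and time it."""
    result = {"request_sent": clock(), "first_byte": None, "total_bytes": 0}
    for chunk in fetch():
        if result["first_byte"] is None and chunk:
            result["first_byte"] = clock()  # first non-empty chunk arrived
        result["total_bytes"] += len(chunk)
    result["completed"] = clock()
    return result

def fake_fetch():
    # Stand-in for a real web or email request; yields the payload in pieces.
    yield b"HTTP/1.1 200 OK\r\n"
    yield b"<html>...</html>"

timing = measure_transfer(fake_fetch)
```

Passing the clock as a parameter makes the measurement easy to test with a fake clock.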
What:
Data collected:
- Device health (OS, Uptime, CPU %, Mem %, NW load, …)
- Apps loaded, version number
- Phone to Provider Network (to Partner Network to …?)
… and back!
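The who / where / when / what items above could be carried in a single record per measured task. The following sketch shows one possible shape; every field name is hypothetical and the values are invented for illustration.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class KpiRecord:
    # "Who" (hypothetical field names)
    device_id: str
    os_version: str
    # "Where"
    latitude: float
    longitude: float
    cell_id: str
    signal_asu: int
    # "When" (seconds since the epoch)
    request_sent: float
    completed: float
    # "What"
    task: str
    total_bytes: int
    health: dict = field(default_factory=dict)  # e.g. {"cpu_pct": 12, "mem_pct": 40}

    def to_json(self):
        """Serialize the record for transfer to the collecting service."""
        return json.dumps(asdict(self), sort_keys=True)

# Invented example values.
record = KpiRecord("device-001", "example-os 1.0", 47.64, -122.13, "A1", 17,
                   1000.0, 1001.5, "web", 20480, {"cpu_pct": 12})
payload = record.to_json()
```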
Device QoE Support
The phone collects status changes and other data along with time stamps in a log. Customer Support can retrieve the log files and inquire about overall phone health; if necessary, test I/O can be initiated.
A special App can do much more, continuously and repeatedly executing the tasks in specified scripts. Tasks are actions a phone user would perform at any given time: placing or answering a telephone call, sending and receiving emails and messages, browsing the web, etc.
Collected performance data will then be sent periodically or on request to the service.
Service QoE Tasks
Captured performance data (from the mobile phone) is collected, sanitized, and stored in a database. From there, reports can be prepared to help monitor overall network quality as experienced from the outside, in this case by the mobile phone user.
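The collect / sanitize / store step might look like the following sketch, which uses an in-memory SQLite database. The table layout, field names, and validation rules are assumptions for illustration, not the service's actual schema.

```python
import sqlite3

REQUIRED = ("device_id", "task", "request_sent", "completed", "total_bytes")

def is_valid(sample):
    """Reject samples with missing fields, empty IDs, or impossible timings."""
    if any(key not in sample for key in REQUIRED):
        return False
    return (bool(sample["device_id"])
            and sample["completed"] >= sample["request_sent"]
            and sample["total_bytes"] >= 0)

def store(conn, samples):
    """Insert only the samples that pass validation; return how many were kept."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kpi ("
        "device_id TEXT, task TEXT, request_sent REAL, completed REAL, total_bytes INTEGER)"
    )
    rows = [tuple(s[key] for key in REQUIRED) for s in samples if is_valid(s)]
    conn.executemany("INSERT INTO kpi VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
kept = store(conn, [
    {"device_id": "device-001", "task": "web", "request_sent": 10.0,
     "completed": 11.2, "total_bytes": 2048},
    {"device_id": "", "task": "web", "request_sent": 10.0,
     "completed": 11.0, "total_bytes": 10},   # rejected: empty device ID
    {"device_id": "device-002", "task": "email", "request_sent": 20.0,
     "completed": 19.0, "total_bytes": 5},    # rejected: completes before it starts
])
```

Rejected samples are simply dropped here; a real service would more likely log or quarantine them, since invalid data can itself point to a device problem.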
Furthermore, this “outside” data can help put the internal monitoring of network operations, with its many individual components, into perspective. Of course, communication between those components and their interrelationships is continuously probed, and the health of each service and of the servers, with their processor, memory, network, and disk utilization, is painstakingly watched. All key service data is continuously gathered and reported, and overall network usage is plotted so that performance at the desired levels can always be assured, even (or in particular) during peak demand.
Nevertheless, it is a long way from the network operations center to the mobile phone, and not all entities responsible for the communication to the mobile phone are under the (direct) control of that center. Are the data trunks to the mobile provider’s central office working within their parameters? What about the wireless access points? Or the network traffic “in between”?
Last but not least, we cannot exclude problems with the phone itself.
Closing the Loop
[Analytics]
- Integration to overall operation center monitoring and reporting
- Correlation of “inside” and “outside” probing
- Data comparable, scalable, representative
- …
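One simple way to correlate “inside” and “outside” probing is to look for intervals in which internal monitoring reported a healthy service while phones measured slow responses. The sketch below assumes both data sets are keyed by the same interval labels and uses an arbitrary 2000 ms threshold; neither is specified in the original.

```python
from statistics import median

def find_blind_spots(internal, external, max_response_ms=2000):
    """Return intervals that look healthy from inside but slow from outside.

    internal: {interval: bool}   True = operations center reported healthy
    external: {interval: [ms]}   response times reported by phones
    """
    blind_spots = []
    for interval, samples in sorted(external.items()):
        if not samples or not internal.get(interval, False):
            continue  # no outside data, or the problem is already known internally
        if median(samples) > max_response_ms:
            blind_spots.append(interval)
    return blind_spots

# Invented example data.
internal = {"09:00": True, "09:15": True, "09:30": False}
external = {
    "09:00": [400, 500, 450],
    "09:15": [2500, 3100, 2800],
    "09:30": [5000, 6000],
}
blind = find_blind_spots(internal, external)
```

Here only 09:15 is reported: 09:00 was fast, and 09:30 was already flagged by internal monitoring.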
Mobile Phone Services Topology
[Visio chart]
QoE Program Flow
QoE Scripts
- Define start/stop times and repeat cycles, timeouts
- List tasks to execute (e.g., email, web, messaging, streaming, down/upload, etc.)
- Contains “What–If” scenarios (e.g., cannot load website, insufficient resources, etc.)
- Send collected data periodically to …
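A script with these properties could be represented as plain data. The sketch below is one hypothetical shape; the class names, fields, and the three “What–If” actions are assumptions, and the targets use placeholder example.com addresses.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    kind: str                  # e.g. "web", "email", "messaging"
    target: str                # hypothetical: URL or address to exercise
    timeout_s: float = 30.0
    on_failure: str = "skip"   # assumed "What-If" actions: "skip", "retry", "abort"

@dataclass
class Script:
    name: str
    start_hour: int            # run window start (local hour, inclusive)
    stop_hour: int             # run window end (local hour, exclusive)
    repeat_every_s: int        # repeat cycle
    tasks: list = field(default_factory=list)

    def is_active(self, hour):
        """True when the given hour falls inside the run window."""
        return self.start_hour <= hour < self.stop_hour

commuter = Script(
    name="commuter-profile",
    start_hour=7, stop_hour=19, repeat_every_s=900,
    tasks=[
        Task("web", "https://example.com/", timeout_s=20.0, on_failure="retry"),
        Task("email", "qoe-probe@example.com"),
    ],
)
```

Keeping scripts as data rather than code makes them easy to distribute to phones and to change without updating the App.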
QoE App
- Retrieves static environment data (“who”)
- Executes scripts as specified, launches tasks
- Measures response times (“when”)
- Records dynamic data (“what” and “where”)
- Distributes to QoE Service
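The App's core loop can be sketched as follows: attach the static environment data to each result, run each task, time it, and record the outcome. Task implementations are stand-ins, and all names are assumptions of this example. Timeout enforcement and distribution to the service are left out.

```python
import time

def run_tasks(tasks, environment, clock=time.monotonic):
    """Run each (name, callable) task once and return one result row per task."""
    results = []
    for name, action in tasks:
        started = clock()
        try:
            action()
            status = "ok"
        except Exception as exc:  # a failed task must not stop the remaining ones
            status = f"failed: {exc}"
        results.append({
            **environment,                   # static "who" data
            "task": name,
            "elapsed_s": clock() - started,  # "when"
            "status": status,
        })
    return results

def load_page():
    pass  # placeholder for a real web request

def send_email():
    raise TimeoutError("no response")  # simulated failure

rows = run_tasks([("web", load_page), ("email", send_email)],
                 {"device_id": "device-001", "os": "example-os"})
```

Catching the exception per task means one failing service still produces a data point instead of ending the whole run.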
QoE Service
- Collects data from mobile phones / QoE App
- Qualifies, accumulates, analyzes, reports, plots trends
- Complements / challenges internal monitoring
QoE Requirements
- Service availability and responsiveness
- Real-time access 24/7, always on, no wait
- Fault-tolerance, resiliency, redundancy
- Data security and reliability
Basic Program Structure
[tbd]
Constraints
[tbd]
- What does the given mobile phone log (standard or debug) contain?
- Can customer support / network service initiate benchmark I/Os on a connected mobile phone?
- It is assumed the App can retrieve all necessary health data, initiate the proposed tasks, gather key performance data and send all that (in compressed form) to the service.
- Scripts can be readily distributed to the mobile phones to program and set the App to probe specific services.
- Can the QoE App and/or the service set up their own (QoE) user IDs to allow sending emails, SMS/MMS, etc. to themselves? What about hooks into partner systems to send (or echo) IMs, tweets, and other messages or content?
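For the assumption that data is sent in compressed form, a minimal sketch using JSON plus gzip is shown below. The wire format is an assumption for illustration; the original does not name one.

```python
import gzip
import json

def pack(records):
    """Serialize records as JSON and gzip-compress them for upload."""
    return gzip.compress(json.dumps(records).encode("utf-8"))

def unpack(blob):
    """Inverse of pack(), as the receiving service would apply it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

# Invented, highly repetitive sample data (typical of periodic measurements).
records = [{"task": "web", "elapsed_s": 1.2, "cell_id": "A1"}] * 200
blob = pack(records)
```

Repetitive measurement records like these compress well, which matters when the upload itself shares the mobile link being measured.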
Examples
Program Flow
Script Example
Prerequisites
Web Benchmark
Email Benchmark
SMS Benchmark
IM Benchmark
APP1 Benchmark
APPn Benchmark
Other Benchmarks
Appendices
Environment
Specific services can connect to (defined) groups of phones to collect more generic benchmark data and/or troubleshoot an individual phone.
An App running on Windows Phone provides sophisticated and continuous monitoring; flexible scripts are easy to change and adapt to resemble (representative) user profiles.
If necessary, dedicated user accounts are defined for the App to perform a variety of phone functions without impacting the environment or the user of the phone.
Collected data is sent to a dedicated server and database farm to analyse / plot / monitor / alert.
The service provides appropriate support for the QoE tasks running on the mobile phone; it is feasible to implement the various features in phases.
QoE Scripts
Universal application, running a series of scripts repeatedly to perform individual tasks
Telephone, voice + data + control
Send & Receive emails (text only vs. large attachments)
Messaging (IM, SMS, MMS), Social Network
Web site (simple vs. complex), file up/download
Real-time applications (e.g., navigation, online games, streaming, etc.)
Control Parameters
App running distributed script(s) as “user profile”
Script #, Tasks
User/Phone ID
Location & Scheduled Activity
Peak usage vs. sporadic use
Script run settings
Think time
Sleep time
Timeout
What–If
Key Performance Data
What:
Script info, data type, bandwidth, threshold
Time stamps send request & response receipt (aborts, retries)
GPS location, signal strength, mobile connection / service
User ID & Phone ID, (overall) usage utilization and health
Where:
Transmit collected data to monitor service
Immediately or periodically or on demand or some other trigger
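The transmit triggers listed above (immediate, periodic, on demand) can be combined in a small buffer, sketched below. The class name, the 900-second default (taken from the 15-minute interval floated earlier), and the size limit are assumptions.

```python
class UploadBuffer:
    """Hold samples until a flush trigger fires (urgent, interval, size, or explicit)."""

    def __init__(self, send, interval_s=900, max_items=100):
        self.send = send              # callable that receives a list of samples
        self.interval_s = interval_s  # periodic trigger
        self.max_items = max_items    # size trigger
        self.items = []
        self.last_flush = None        # set when the first sample arrives

    def add(self, sample, now, urgent=False):
        if self.last_flush is None:
            self.last_flush = now
        self.items.append(sample)
        due = now - self.last_flush >= self.interval_s
        if urgent or due or len(self.items) >= self.max_items:
            self.flush(now)

    def flush(self, now):
        """On-demand trigger; also used by add()."""
        if self.items:
            self.send(self.items)
            self.items = []
        self.last_flush = now

sent = []
uploads = UploadBuffer(sent.append, interval_s=900, max_items=3)
uploads.add({"task": "web"}, now=10)                  # held
uploads.add({"task": "email"}, now=20, urgent=True)   # immediate: sends both
uploads.add({"task": "sms"}, now=30)                  # held
uploads.add({"task": "web"}, now=950)                 # periodic: 950 - 20 >= 900
```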
How:
End-to-end
Transaction logs
Collect / Measure / Scale
Multimedia or secure data
Real-time vs. batch
Components
Backend
Wireless, LAN/WAN
Proxy, Filter, …
Database
3rd Party Connect
Services
Content
Phone & V/M, SMS
Web & Forms, down/upload, Email, IM
Streaming Audio/Video
FM Radio / TV
Games, XNA
Microphones, Cameras, Speakers
GPS, gyroscope, environment
Data security
Frontend
Device Hardware (processor, memory, interfaces, sensors, antennas)
Device OS
Installed / Running Apps
Wireless Connection
The Unforeseen
Now what?
Data out of sync / invalid;
Loss of connection;
Service failures;
App aborts; …
…
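For loss of connection in particular, one common mitigation is to retry uploads with exponential backoff and keep the data on the phone if all attempts fail. The sketch below illustrates that pattern under assumed names and delays; it is not a prescribed design.

```python
import time

def send_with_retry(send, payload, attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Try send(payload); on ConnectionError wait base_delay_s * 1, 2, 4, ... and retry.

    Returns True on success, False if every attempt failed (caller keeps the payload).
    """
    for attempt in range(attempts):
        try:
            send(payload)
            return True
        except ConnectionError:
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    return False

calls = []

def flaky_send(payload):
    # Simulated link that fails twice, then recovers.
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("link down")

delays = []
ok = send_with_retry(flaky_send, b"kpi-batch", sleep=delays.append)
```

The sleep function is passed in so the example runs instantly; in this run the recorded delays are 1.0 and 2.0 seconds before the third attempt succeeds.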