Digital transformation can feel like a buzzword that’s hard to grasp and even harder to implement.
In our last post - Getting started with digital transformation - we discussed how a well-structured roadmap prepares an organization for digital transformation. In this article we look into an essential building block of digitalization: collecting and storing data.
Building a house does not start with the walls or the roof – it begins with a strong, solid foundation. In the world of digitalization, that foundation is reliable and trustworthy data. Without quality data, even the best digital tools and systems will struggle to deliver meaningful results.
Making business decisions based on inaccurate or inconsistent data sets you up for costly mistakes. For asset management professionals and operations managers, this could mean misinterpreting equipment health, overlooking maintenance needs, or failing to optimize performance, all of which can lead to downtime and lost revenue. Reliable data enables you to make informed decisions with confidence: you know exactly what is happening with your assets, and you can act on real insights rather than guesswork. Turning data into actionable information drives improvements and keeps your operations running smoothly. But using operational, real-time data is not easy. It has traditionally been hiding in OT networks, inside SCADA or DCS systems, usually accessible only to operators through an HMI and out of reach for maintenance and planning teams and process engineers.
Good Data?
People often refer to digitalization as a journey, and for good reason. At its core, starting a journey requires the same steps whether you are launching a multi-year digitalization project or just visiting your grandparents in the next town. You need a roadmap to help you navigate to your destination, as knowing where you are going usually helps you get there 🙂
And just as a journey depends on a trustworthy map, digitalization depends on trustworthy data. Before setting off, make sure the data you collect meets the following criteria:
- Accuracy - Since the dawn of computing, "garbage in, garbage out" has proven to be true. If your data is not accurate, all your plans for building amazing analytics, machine learning and AI capabilities are immediately in trouble.
- Up-to-date - You want real-time access to your process or production data. New infrastructure does not help if half of your data still comes in through manual inputs or CSV imports a few times per day.
- Trustworthiness - Users must trust the data to use it in decision making. If they lose trust in the data, they will quickly stop using it altogether, and regaining that trust will be difficult. Remember involving subject matter experts in the planning process, from the previous article? Having them on board will help you identify trustworthiness problems early on.
- Understandable - Not everyone is an operator who knows instrument tags and P&IDs by heart. Presenting data in the context of physical assets makes it far easier to communicate and reduces the risk of errors.
- Accessibility - Storing all the data in the world does not help much if users can’t get their hands on it. They need easy-to-use tools to access the data from outside the control network.
These are requirements for the data alone. On top of that, the infrastructure itself has to be secure, reliable, fault tolerant and scalable.
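To make the first two criteria concrete, below is a minimal sketch of what automated checks for accuracy and freshness might look like. The tag names, plausibility limits and the five-minute staleness threshold are illustrative assumptions, not taken from any particular product.

```python
from datetime import datetime, timedelta, timezone

# Assumed reading format: (tag, timestamp, value); the limits below are illustrative.
SENSOR_LIMITS = {"TI-101": (-40.0, 150.0)}   # plausible value range per tag
MAX_AGE = timedelta(minutes=5)               # flag data older than this as stale

def check_reading(tag, timestamp, value, now=None):
    """Return a list of data-quality issues found for a single reading."""
    now = now or datetime.now(timezone.utc)
    issues = []
    low, high = SENSOR_LIMITS.get(tag, (float("-inf"), float("inf")))
    if not (low <= value <= high):
        issues.append("out of plausible range")        # accuracy
    if now - timestamp > MAX_AGE:
        issues.append("stale, older than 5 minutes")   # up-to-date
    return issues

reading_time = datetime.now(timezone.utc) - timedelta(minutes=12)
print(check_reading("TI-101", reading_time, 180.0))
# ['out of plausible range', 'stale, older than 5 minutes']
```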
Start small and remember the big picture
Before implementing large-scale data collection, make sure to review whether your selected technology stack really meets all the requirements above. Using the right tools can save you from severe headaches later on, when you need to contextualize, analyse or share data. The best way to validate the selected technology’s capabilities and expected outcomes is to start with a pilot project and a small set of assets. Find out which production sites have the best instrumentation coverage and pick a set of well-understood assets to make data quality validation easy. Setting up data collection and storage at a small scale is doable in a couple of weeks, or even in just days, if your project team is set up efficiently. A small-scale, agile, ‘fail fast’ approach avoids committing a lot of time and resources to the project before you know that the selected approach works well.
Several commercial and open-source tools are available for fetching and storing real-time data. Getting data out of an OPC UA server or MQTT broker? Storing data permanently? No problem, just take your pick from a multitude of open-source timeseries databases and connectors for various data sources. But collecting and storing the data solves only the first part of the problem: having accurate, up-to-date and trustworthy data. This is where things start getting a bit more complicated.
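To illustrate how little code the first step can take with open-source components, here is a minimal sketch that subscribes to an MQTT topic and writes each reading into InfluxDB. It assumes the paho-mqtt (2.x) and influxdb-client Python packages; the broker address, topic structure, bucket, organization and token are placeholders, and error handling and buffering are left out entirely.

```python
import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for the sketch.
influx = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = influx.write_api(write_options=SYNCHRONOUS)

def on_message(client, userdata, msg):
    # Topic is assumed to look like "plant/line1/TI-101" with a numeric payload.
    tag = msg.topic.split("/")[-1]
    point = Point("process_data").tag("tag", tag).field("value", float(msg.payload))
    write_api.write(bucket="raw-timeseries", record=point)

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.subscribe("plant/+/+")
client.loop_forever()
```

Production-grade collectors add exactly the things this sketch skips, such as buffering, failover and history recovery, which is where the commercial tools discussed below earn their keep.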
Saving instrument data into a timeseries database does not make it any more understandable than it was at the source. It may be a bit more accessible, but you still need to know which instrument tag you want to see.
This is where contextualization, analytics capabilities and reporting come in, and the list of available tools narrows down significantly. I’ll skip contextualization for now; we will talk more about it in the next article of this series.
With that out of the way, let’s continue to data collection and storage.
Validate the selected technology’s capabilities and expected outcomes through a pilot project and a small set of assets.
Data Collection
Requirements around data collection have changed in recent years, with companies swiftly moving away from Modbus, OPC DA and the like to more versatile protocols such as OPC UA and MQTT. Often the fastest way to add new instrumentation is to bring in an IoT device that pushes all data directly to the cloud without communicating with the existing control system at all. We are also seeing an increasing number of edge devices processing, refining and analysing data close to the source before it is passed on to the cloud or SCADA.
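For a sense of what "more versatile" means in practice, the sketch below reads a single value from an OPC UA server using the open-source asyncua package. The endpoint URL and node id are placeholder assumptions; in a real deployment you would typically browse the server’s address space or subscribe to changes rather than poll one node.

```python
import asyncio
from asyncua import Client

# Placeholder endpoint and node id; real values come from the server's address space.
ENDPOINT = "opc.tcp://plc.example.com:4840"
NODE_ID = "ns=2;s=Line1.TI-101.PV"

async def read_once():
    # Connect, read the current value of one node, and disconnect.
    async with Client(url=ENDPOINT) as client:
        node = client.get_node(NODE_ID)
        value = await node.read_value()
        print(f"{NODE_ID} = {value}")

asyncio.run(read_once())
```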
As an AVEVA PI System Integrator, we focus on what the AVEVA PI System has to offer in this field. A mix of proprietary protocols, various industry standards and modern IoT communications is not a problem, as the AVEVA PI System’s Adapters, Connectors, Interfaces and the CONNECT cloud service support over 200 industrial protocols for bringing data out of control systems and into the historian. Advanced failover, buffering and history recovery capabilities ensure that no data is lost during unexpected hardware or network outages.
- PI Interfaces connect to legacy systems, useful for organizations relying on older infrastructure. They are the way to go if you need to use OPC DA, OPC HDA, Modbus, or other firewall-unfriendly protocols. They typically run on a dedicated server close to the source, connecting directly to the PI Data Archive.
- PI Connectors provide a more modern and versatile option for collecting data over OPC UA or IEC 61850. OPC UA makes data collection easier across different systems and networks. Connectors can automatically discover data sources and create element structures in Asset Framework. Centralized configuration and automatic data discovery make PI Connectors ideal for quick integration with various data sources.
- PI System Adapters are the latest addition to the PI System’s data collection tools and fit especially well into edge computing and distributed architecture scenarios. They are designed to be lightweight and flexible, capable of running on Windows, on Linux or as containers. They can be deployed on edge devices, making them perfect for collecting data from remote or distributed assets.
- CONNECT data services is a fairly new secure cloud platform designed for managing industrial data. It is an easy way to collect IoT data from various sources into one location, where it can be accessed in real time or shared with external partners. CONNECT also provides cloud services for data contextualization, analytics and visualization, but we are not going to touch on those for now.
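Whichever collection route the data takes into the PI System, it typically becomes reachable from the IT side through PI Web API, which is also how the accessibility requirement from earlier gets met in practice. The sketch below reads the last hour of recorded values for one tag over REST; the host name and tag path are placeholders, and authentication and certificate handling are deliberately left out.

```python
import requests

BASE = "https://piwebapi.example.com/piwebapi"   # placeholder PI Web API host
TAG_PATH = r"\\MyDataArchive\TI-101"             # placeholder data archive and tag

# Resolve the point's WebId from its path, then fetch the last hour of recorded values.
point = requests.get(f"{BASE}/points", params={"path": TAG_PATH}).json()
web_id = point["WebId"]

recorded = requests.get(
    f"{BASE}/streams/{web_id}/recorded",
    params={"startTime": "*-1h", "endTime": "*"},
).json()

for item in recorded["Items"]:
    print(item["Timestamp"], item["Value"])
```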
Key considerations for storage
A strong data foundation includes a good historian, which needs to do much more than just store timeseries data for a long time. Historical real-time data must also be accessible, reliable, and performant across the organization.
It should be able to store data with future timestamps for forecasts and predictions. On the technical side, a good historian is also scalable, supports data compression, can fetch vast amounts of data quickly, ensures data integrity, optimizes storage usage, supports high availability and is secure.
Based on our experience, there are a few other key focus points that need special attention before and during the implementation of a new data archive:
- Understand OPEX - This is an amazingly often overlooked factor when setting up systems in the cloud. Real-time data comes in fast and there can be lots of it. And I mean LOTS. If you are involved with big wind farms, Oil & Gas or Mining, consider yourself lucky if monthly ingress is measured in gigabytes instead of terabytes. You need lots of fast disks, which in the cloud can quickly spin costs out of control unless you plan archive file distribution carefully. An unexpected 200K EUR monthly invoice for cloud services may drive the project success rating down a bit… (A back-of-the-envelope estimate is sketched after this list.)
- Plan scalability - Think long-term. Your infrastructure should scale with ever-increasing amounts of data. Consider factors like expected data growth, potential new data sources, and future digital initiatives.
- Redundancy and Backups - Real-time data has a tendency to find its way into business-critical processes. There have been several cases where a data archive turned out to be a business-critical application only after an unexpected outage prevented users from accessing it. It is always better to set the system up with high availability, redundancy and well-understood backup processes. Also keep practicing backup and restore so you can bring systems back online as quickly as possible. As mentioned before, timeseries databases can hold massive amounts of data, which can pose problems for both backup and restore.
- Security and Compliance - Make sure your historian setup aligns with any industry regulations on data storage and handling. Encryption, user access controls and regular audits can help keep sensitive data safe and compliant. Also think carefully about how you are going to address data security. Your data archive may support very granular access control, down to the individual data stream level. You need a clear strategy for managing access, as it can quickly turn into a massive, time-consuming manual task.
- Optimization - Regardless of which historian solution you use, optimization will be an ongoing task. Users will always ask for faster query speeds and want to store data at higher frequencies. And then there will always be that one user who runs a query to plot 10 kHz vibration data from three pumps for the last 12 months in Excel. Keep an eye on system performance and usage patterns if possible, and consider tuning system parameters when necessary. User training is also a kind of optimization: educating users on how to query data avoids putting excessive load on the archive.
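To show why the OPEX point deserves respect, here is the back-of-the-envelope ingress estimate mentioned above. Every number is purely illustrative; plug in your own tag counts, scan rates and storage costs.

```python
# Back-of-the-envelope ingress estimate; every number below is illustrative.
tags = 50_000                 # instruments being collected
avg_events_per_sec = 0.2      # per tag, after exception/compression filtering
bytes_per_event = 16          # rough storage cost incl. timestamp and overhead

events_per_month = tags * avg_events_per_sec * 60 * 60 * 24 * 30
gb_per_month = events_per_month * bytes_per_event / 1e9

print(f"{events_per_month:,.0f} events/month ≈ {gb_per_month:,.1f} GB/month before replication")
# 25,920,000,000 events/month ≈ 414.7 GB/month before replication
```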
Accessible, reliable, and performant across the organization.
Performance tuning for AVEVA PI Data Archive
Among the most powerful, yet most underutilized, configurations affecting AVEVA PI Data Archive performance are the compression and exception settings. They filter out unnecessary noise from your source data, giving you the option to save a lot of disk space (and cost), reduce network traffic and increase the system’s overall performance.
The idea behind both compression and exception handling is to filter out noise without risking the loss of meaningful data. At a minimum, filtering should be set to ignore repeating values. We also recommend filtering out readings that fall within the accuracy range of the instrument. If your temperature sensor has an accuracy of 1 °C, storing values at 0.1 °C resolution usually does not make sense. (It might be worth saving them if you plan to use AI or ML to identify potential sensor degradation through increased noise.)
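To make the principle tangible, here is a simplified, illustrative exception-style filter in Python. It is not AVEVA’s implementation (the real exception logic, for instance, also reports the value just before an exception), but it shows the core idea: a new value is kept only if it deviates from the last reported value by more than a deadband, or if a maximum time between reports has passed.

```python
from datetime import datetime, timedelta

class ExceptionFilter:
    """Simplified exception-style deadband filter, for illustration only."""

    def __init__(self, deviation, max_interval):
        self.deviation = deviation            # e.g. instrument accuracy, 1.0 °C
        self.max_interval = max_interval      # always report after this much time
        self.last_reported = None             # (timestamp, value) of last kept reading

    def should_report(self, timestamp, value):
        if self.last_reported is None:
            keep = True
        else:
            last_ts, last_val = self.last_reported
            keep = (abs(value - last_val) > self.deviation
                    or timestamp - last_ts >= self.max_interval)
        if keep:
            self.last_reported = (timestamp, value)
        return keep

f = ExceptionFilter(deviation=1.0, max_interval=timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
print(f.should_report(t0, 20.0))                            # True, first value is kept
print(f.should_report(t0 + timedelta(minutes=1), 20.4))     # False, inside deadband
print(f.should_report(t0 + timedelta(minutes=2), 21.6))     # True, deviation exceeded
```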
There is an old but still valid and excellent YouTube clip from the OSIsoft days that explains the compression and exception logic perfectly: OSIsoft: Exception and Compression Full Details.
Hopefully this article helps you identify the challenges and risks involved in starting to implement the data infrastructure for OT/IT convergence. If you need any assistance, the NodeIT AVEVA PI consultants are ready to support your journey. We have a highly experienced senior team, averaging 10+ years of experience in industrial digitalization, helping industrial clients successfully design, implement, operate and fully leverage the AVEVA PI Data Infrastructure to increase productivity, improve efficiency and reduce emissions.
Part of our DNA is collaborative growth. By that we mean sharing knowledge and working together to shape a smarter and more sustainable future! Stay tuned for our next article, where we’ll delve deeper into the intricacies of data contextualization, its role in enhancing data usability, and integrations with external systems like CMMS and MES.
Sincerely,
The NodeIT team