Model Data

Data exploration and statistical analysis is a predominant way to learn new insights from available manufacturing data, but a secondary option now available with Data Explorer is model analysis. One objective of data input/output modeling and model analysis on the type of explainable models now available in Data Explorer is to learn process or system relationships. A good model represents the system and not just the training data and Data Explorer modeling includes a broad number of capabilities to produce a good, explainable model (model explainability is one way to validate a model represents all that is known already about a system).
A Soft Sensor
®
model is a regression of any measurement or calculation from plant data. A common use case of a Soft Sensor is a Virtual Online Analyzer
®
application, which frequently builds a model of periodic laboratory or other analytic sampling, and the Virtual Online Analyzer option can be used for either of these objectives. The only difference is that the measurement is an analyzer, while any type of Soft Sensor application may or may not reference an analyzer as the model output.
From any Analysis or Plot set of variables as with Export above, the user may select variables of interest, which in general includes candidate or significant input indicators (direct or calculated measurements) and the target output (the sensor or analysis) measurement. Two types of modeling are supported in Data Explorer's hybrid model framework:
Expressions
are equation-based models (may be a series of expressions) that are frequently physics-based or at a minimum anticipated equation-based relationships. Model training is generally the determination of expression coefficients from plant data.
Artificial
Neural Network
models are empirical, learning algorithms able to learn any single expression set from available data. Its strength is it's a universal approximator of any expression system and its weakness is the same thing. The engineer managing training will want to work with data to match to the actual expression system being inferred. A powerful set of tools are provided to support engineering guidance and to leverage knowledge to get better models from nonideal data.
The goal of machine learning is to learn the actual system in these forms of digital twins (input/output models) from available historic data. Data Explorer and the user work together to avoid learning only the historic patterns and to perform poorly on future data.
Select Variables
Select Variables for initial model setup from dataset, plot, or analysis views
Two types of models are supported Expression and Artificial Neural Network (ANN) models. Both support any number of inputs or outputs, which are particularly useful on expression models based on physics-based equations. ANN models are generally more accurate built as single-output models but can accelerate development and deployment on highly integrated systems of input/output measurements.
Select expression models where equations, particularly physics-based expressions are known. Model training will generally identify the best equation coefficients from wear-dependent conditions. Physics-based models have the advantage of learning with limited data sampling and provide the most straight-forward strategy for causal models that map to system fundamentals. Many common Data Explorer expression use cases will leverage short-cut mathematical models.
Select ANN models (neural network) where the expression system is not known as ANN as implemented in Data Explorer (a three-layer ANN with nonlinear activation functions) has been proven capable of learning any single expression set (a universal approximator). ANN is fully data dependent and does anticipate on the order of 20-40 samples (rows) of labelled (measured output) data per input variable selected. More labelled data is generally better. Supplemental toolsets are provided within Data Explorer to help lead empirical ANN modeling to match existing causal knowledge.
Note that both model types support multiple output models, although delay ID shifts input features by row (time-step) simultaneously per model so this functionality sometimes recommends training and instantiating models independently and then setting up and running models in a composite fashion if simultaneous execution is required (this is currently still a manual process in the integrated Pavilion8 SolutionBuilder).
Provide Feedback
Have questions or feedback about this documentation? Please submit your feedback here.