Calculator Export
The second tab screens the defined dataset transforms to determine which are appropriate for Online (streaming or run-time) applications and which are necessary for the selected Export variables. Note that when the selected variables change via Add or Remove, the selected transforms are updated and should be reviewed again. In general, the order for preparing exported data is:
- Select variables in the Data Export tab. Formats may be updated now or at any later time.
- Calculate Delay ID for variables of interest to align relative time of different variables.
- Review and repair the Calculator Export as needed to support online operationalization of machine learning developed (trained) from that data export.
As much automation as possible is built into the transform filter, but it is sometimes necessary to remove filters or transforms used to exclude some data for online applications. As discussed:
- Transforms that are not necessary to generate the selected variables and patterns are automatically excluded.
- Transforms that are time- or batch-dataset dependent, or otherwise inappropriate for streaming calculations, must be excluded and are marked as “error” (unsupported). The user may unselect these transforms directly in the window, or modify the transform list to replace unsupported transforms with transformations supported for run-time/streaming.
- Transforms (or filters) that are necessary to prepare appropriate data for targeted machine learning may be unselected by the user on the Calculator Export page. The exported data will not be modified, but the streaming preparatory calculations will be appropriate. Examples include:
- Periods of bad operating data that were removed for anomaly detection model training (to learn normal or good behavior), but that should be detected as bad if they recur.
- Operating periods that were removed to isolate developing-failure periods for model training in Predictive Maintenance or other failure detection. Online, all operations should be tested to detect failures that may be occurring at that time.
It is the user's responsibility to review the transformations on the selected data and to use this interface to deactivate any transformations that are not wanted as preparatory calculations before scoring a machine learning application. In addition, the user should modify unsupported but desired transformations so that the desired components of each unsupported transformation can be operationalized.
- Individual checkboxes (or all at once, at the header) can be selected or unselected for variables or transforms as desired.
- Selected Dataset- If variables from more than one dataset are selected, they will be available on each export tab, but the transforms should be reviewed individually for each dataset by selecting the targeted dataset at the top.
- Filter Samples- Whenever a data filter is selected (here, a “High Power” filter is selected and displayed on the bottom display banner along with all active datasets), the user has three options for how streaming data excluded by the data filter is handled:
- None - do not exclude the filtered data; in streaming, include these values and process them.
- Skip - exclude the filtered data and do not pass these data samples to the next data processing step.
- Error - exclude the filtered data, but convey it as a complete row with “Error” status.
- Scale Data- Data may be scaled as set in either the Calculator or the Data tab; these settings are shared. This supports external toolsets that assume data is normalized or scaled prior to machine learning, in which case scaling is also desired prior to streaming ML scoring.
- Use Valid Identifiers- This setting is also shared with the Data tab. It adjusts data labels by replacing special, sometimes unacceptable characters with a simplified character set (for example, one supported in DataFlowML).
- Generate Comments- adds comments to the calculator script that describe the functions in the different script sections. These are not required by the streaming calculation processor but may be helpful for troubleshooting or later changes.
- Include User Comments- retains the comments built into the transforms, which is particularly relevant for full-line documentation such as headers. Comments within the transforms may or may not have been used.
- Execute Calculator- defines how the calculator and streaming pipelines will be executed: On Input Events, when values change; On Fixed Period (Automatic), at the longest data update period among all selected variables; or On Fixed Period (User Specified), at the period entered here or later updated in the online environment.
- Periodic- dependent processing may be configured differently depending on these settings, but the user may enter the expected execution frequency.
- Generate Log File- generates additional log information that is useful if the calculation export requires troubleshooting.
- Export Calculator- Exports the defined manifest files and calculator file.
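The three Filter Samples options listed above differ only in what reaches the next processing step. The following minimal Python sketch illustrates that difference; it is not Data Explorer's implementation. The names `FilterSamples`, `apply_filter`, and `is_excluded` are hypothetical, and representing an “Error” row as a dictionary of "Error" strings is an assumption made purely for illustration.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Iterator, Union

# Hypothetical row type: variable name -> numeric value or status string.
Row = Dict[str, Union[float, str]]


class FilterSamples(Enum):
    NONE = "None"    # keep filtered rows and process them as usual
    SKIP = "Skip"    # drop filtered rows; nothing is passed downstream
    ERROR = "Error"  # pass a complete row whose values all read "Error"


def apply_filter(
    rows: Iterable[Row],
    is_excluded: Callable[[Row], bool],
    mode: FilterSamples,
) -> Iterator[Row]:
    """Yield the rows that the next processing step would receive."""
    for row in rows:
        if mode is FilterSamples.NONE or not is_excluded(row):
            yield row
        elif mode is FilterSamples.ERROR:
            # Assumed representation of a full row of "Error" status.
            yield {name: "Error" for name in row}
        # FilterSamples.SKIP: emit nothing for an excluded row.


# Illustrative data and an assumed exclusion rule; not taken from the product.
samples = [{"power": 40.0, "temp": 310.0}, {"power": 95.0, "temp": 355.0}]
excluded = lambda row: row["power"] > 90.0

for mode in FilterSamples:
    print(mode.value, list(apply_filter(samples, excluded, mode)))
```

With None, both rows pass through unchanged; with Skip, only the first row is passed on; with Error, the second row is replaced by a row of "Error" values so that downstream steps still see one row per input sample.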
NOTE: In the exported calculator.xml file, a value of ‘-999.0’ is interpreted as bad data by Data Explorer's streaming calculator, as a way to exchange data quality on systems that do not support quality or data status exchange directly.
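As an illustration of the sentinel convention in the note above, an external system that exchanges values with the streaming calculator might translate between its own missing-value representation and -999.0. The sketch below is only an assumption about how such a consumer could be written; `BAD_VALUE`, `decode`, and `encode` are hypothetical names and are not part of Data Explorer.

```python
import math
from typing import Optional

BAD_VALUE = -999.0  # sentinel value named in the note above


def decode(value: float) -> Optional[float]:
    """Return None for the bad-data sentinel, otherwise the value unchanged."""
    return None if value == BAD_VALUE else value


def encode(value: Optional[float]) -> float:
    """Map a missing or NaN value to the sentinel; pass good values through."""
    if value is None or math.isnan(value):
        return BAD_VALUE
    return value


# Illustrative readings only.
readings = [12.5, -999.0, 3.0]
print([decode(v) for v in readings])
```

One consequence of this convention is that a genuine measurement of exactly -999.0 cannot be distinguished from bad data, so it suits variables whose valid range excludes that value.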