What is a Data Historian?

A data historian is a software system that continuously collects process data from control systems and timestamps every value. It compresses the data using algorithms that preserve process fidelity while dramatically reducing storage requirements, and it provides fast retrieval of the archived data to applications that need it, whether for trending, reporting, compliance, or analytics.

The historian originated from the paper chart recorder era of process control. When pneumatic instruments gave way to digital control systems in the 1970s and 1980s, operators lost the continuous paper roll that showed exactly what every instrument had been doing for the past week. The process historian was the digital replacement: software that continuously captured every sensor reading and control system output, stored it efficiently, and let engineers scroll back through history to understand what a process had done and when.

That core function -- collect, compress, store, retrieve -- has not changed. What has changed is scale. A modern plant historian may track tens of thousands of tags at subsecond collection rates. A corporate enterprise historian may aggregate data from dozens of plants across multiple continents. And the downstream consumers of that data now include machine learning models, AI agents, and cloud analytics platforms that were not imaginable when historians were first deployed.

Tag: In historian terminology, a tag is a named stream of process data corresponding to one measured or computed variable, such as a temperature sensor, a flow transmitter, or a calculated KPI. Every value stored in the historian belongs to a tag. A tag has a name, a data type, engineering units, a collection rate, and compression settings. The word comes from the physical metal tags that were attached to instruments on the plant floor to identify them before digital systems existed.
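
As a rough illustration of the attributes listed above, a tag definition could be modeled as a small record. This is a minimal sketch, not any vendor's schema: the class name, field names, tag name, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagConfig:
    """Hypothetical tag definition; field names are illustrative only."""
    name: str                    # instrument identifier, e.g. "R3.TT101.PV"
    data_type: str               # e.g. "float32"
    engineering_units: str       # e.g. "degC"
    scan_rate_s: float           # collection interval in seconds
    exception_deadband: float    # first-stage threshold, in engineering units
    compression_deadband: float  # second-stage tolerance, in engineering units


# Example values are made up for illustration.
reactor_temp = TagConfig(
    name="R3.TT101.PV",
    data_type="float32",
    engineering_units="degC",
    scan_rate_s=10.0,
    exception_deadband=0.1,
    compression_deadband=0.25,
)
print(reactor_temp)
```

Keeping the compression settings on the tag itself reflects the point that compression is configured per tag rather than globally.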

Historian vs database: why industrial data needs a specialized system

Engineers new to historian technology sometimes ask why a standard SQL database cannot serve the same purpose. The answer lies in the fundamentally different access patterns, write characteristics, and data volumes involved in industrial time-series data.

Relational database

General-purpose data store

Optimized for structured records with relationships between them
Write performance degrades at very high insert rates
No built-in time-series compression: every scan-cycle value occupies its own row
A 10-second scan rate across 2,000 tags generates over 6 billion rows per year
Retrieval of historical ranges requires table scans or complex indexing
No built-in aggregation for time-weighted averages, min/max over intervals
Well-suited for event and batch data: alarms, batch reports, production orders

Process historian

Industrial time-series database

Optimized for continuous streams of timestamped values per tag
Write throughput sustained at tens of thousands of values per second
Two-stage compression reduces storage by 80 to 95% without losing process fidelity
Retrieval of a year of data for one tag takes milliseconds
Built-in aggregation: time-weighted averages, min, max, total, count over any interval
Built-in access interfaces: OPC HDA, OPC UA Historical Access, SQL, REST API
Designed for compliance-grade audit trails with immutable archive storage
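
A time-weighted average differs from a plain arithmetic mean because archived values are unevenly spaced in time. The sketch below shows one way such an aggregate could be computed, assuming linear interpolation between archived points (trapezoidal integration). The function name and sample data are hypothetical, and real historians may weight values differently, for example by holding the last value as a step.

```python
def time_weighted_average(samples):
    """Average of a signal over time from (timestamp_s, value) pairs.

    samples must be sorted by timestamp and contain at least two points.
    Assumes the signal moves linearly between points (trapezoid rule).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("samples must span a positive time range")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        area += (v0 + v1) / 2.0 * (t1 - t0)  # area under one segment
    return area / duration


# The value sits at 30.0 for most of the interval, but only two of the
# four archived points record it.
archived = [(0, 10.0), (10, 10.0), (20, 30.0), (100, 30.0)]
twa = time_weighted_average(archived)
simple_mean = sum(v for _, v in archived) / len(archived)
print(twa, simple_mean)  # 27.0 20.0
```

The gap between the two results is why a row-based average over archived values is usually not a substitute for a time-weighted aggregate.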

How a historian works

A historian system has four primary functional layers: collection, compression, storage, and retrieval. Understanding each helps engineers diagnose problems with data collection, storage, or retrieval, and evaluate historian products against each other.

Collection

Data acquisition from control systems

The historian connects to data sources (OPC servers, SCADA systems, PLCs, DCS, other historians) through communication drivers or standard interfaces. It polls these sources at the configured scan rate for each tag, or receives pushed values from OPC subscriptions. Every incoming value is timestamped with millisecond resolution at the point of collection. The collection layer is where quality codes (Good, Bad, Uncertain) are also captured and stored alongside values.
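
A minimal sketch of what the collection layer records for each reading: a value, a millisecond-resolution timestamp, and a quality code. Everything here is an assumption made for illustration. `read_value` is a hypothetical stand-in for a driver or OPC read, and the tag name is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    GOOD = "Good"
    BAD = "Bad"
    UNCERTAIN = "Uncertain"


@dataclass(frozen=True)
class Sample:
    tag: str
    timestamp: datetime  # UTC, truncated to millisecond resolution
    value: float
    quality: Quality


def collect(tag, read_value):
    """Read one value and stamp it at the point of collection.

    read_value is a hypothetical callable standing in for a driver read.
    A failed read is recorded as a Bad-quality sample, not dropped.
    """
    now = datetime.now(timezone.utc)
    timestamp = now.replace(microsecond=now.microsecond // 1000 * 1000)
    try:
        return Sample(tag, timestamp, float(read_value()), Quality.GOOD)
    except (OSError, ValueError):
        return Sample(tag, timestamp, float("nan"), Quality.BAD)


good = collect("R3.TT101.PV", lambda: 81.4)
print(good.quality.value, good.value)
```

Storing the quality code with the value lets downstream consumers exclude Bad or Uncertain readings from aggregates instead of silently averaging them in.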

Compression

Exception and swinging-door compression

Before writing to the archive, the historian applies two-stage compression to reduce storage requirements by 80 to 95% while preserving the shape of the process signal. Not every scan-cycle value is stored; only values that represent meaningful change are archived. The compression algorithm is configured per tag and is the primary factor determining how faithfully the archive represents the original process signal.

How historian compression works

Compression is what makes historians practical for long-term storage of high-frequency process data. Without it, a 2,000-tag system collecting at 10-second intervals generates over 6 billion values per year, which would require impractical storage volumes and slow retrieval. With well-configured two-stage compression, an 80 to 95% reduction brings the same system down to roughly 300 million to 1.3 billion values per year while preserving the process signal faithfully enough for all practical engineering uses.
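
The storage arithmetic can be checked directly. The sketch below computes the raw value count for 2,000 tags at a 10-second scan interval and applies the 80 to 95% reduction range cited for two-stage compression. The function names are illustrative, and actual reduction depends on signal behavior and deadband settings.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def raw_values_per_year(tag_count, scan_interval_s):
    """Values produced per year if every scan of every tag is stored."""
    return tag_count * SECONDS_PER_YEAR // scan_interval_s


def stored_after_reduction(raw_values, reduction_percent):
    """Values remaining after removing reduction_percent of the raw values."""
    return raw_values * (100 - reduction_percent) // 100


raw = raw_values_per_year(tag_count=2_000, scan_interval_s=10)
print(f"raw: {raw:,}")  # raw: 6,307,200,000
for pct in (80, 95):
    print(f"{pct}% reduction: {stored_after_reduction(raw, pct):,}")
# 80% reduction: 1,261,440,000
# 95% reduction: 315,360,000
```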

Compression is not lossless in the engineering sense: Values between archive entries are reconstructed by interpolation when retrieved, not from stored actuals. For most process engineering uses (trending, reporting, aggregate calculations) this is entirely acceptable. For applications requiring every raw scan-cycle value, such as regulatory records with millisecond-level accuracy requirements, compression deadbands must be set very tight or raw storage mode must be used for specific tags.
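
To make the interpolation step concrete, here is a minimal sketch of how a value could be reconstructed for a timestamp that falls between two archive entries. It assumes simple linear interpolation over a sorted list of (timestamp, value) pairs. The function name and data are hypothetical, and some tags (discrete states, for instance) would use step interpolation instead.

```python
from bisect import bisect_right


def value_at(archive, t):
    """Reconstruct the value at time t from sorted (timestamp, value) pairs."""
    if not archive or t < archive[0][0] or t > archive[-1][0]:
        raise ValueError("timestamp outside archived range")
    times = [ts for ts, _ in archive]
    i = bisect_right(times, t)  # index of first entry strictly after t
    if i == len(archive):
        return archive[-1][1]   # t is exactly the last archived timestamp
    t0, v0 = archive[i - 1]
    t1, v1 = archive[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Only two values were archived over a minute; anything in between is
# computed on retrieval, not read from storage.
archive = [(0, 50.0), (60, 56.0)]
print(value_at(archive, 20))  # 52.0
```

The returned 52.0 was never measured. It is an estimate whose error is bounded by the compression deadband, which is why tight deadbands matter when raw fidelity is required.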

Figure: Effect of two-stage compression (raw scan values vs archived values)

Where historians are used and why

Process trend analysis and troubleshooting

Engineers retrieve historical trends of related process variables to understand what happened before and during a process upset, equipment failure, or quality deviation. A historian allows scrolling back hours, days, or weeks to find the precursor conditions that led to the problem.

Regulatory compliance and audit trails

Pharmaceutical, food and beverage, and chemical industries require documented evidence that process parameters stayed within specification during production. The historian provides a timestamped, immutable record of every process variable for every batch or production run.

Operational reporting and KPIs

Shift reports, daily production summaries, OEE calculations, and energy consumption reports all depend on historian data. Reporting tools connect to the historian via OPC HDA or REST API and retrieve aggregate values over the reporting period.

Predictive maintenance and asset health

Vibration signatures, temperature trends, and energy consumption patterns from rotating equipment change gradually in ways that indicate developing faults. Machine learning models trained on historian data can detect these patterns weeks or months before a failure becomes apparent to operators.

Energy management

Utility billing verification, energy intensity tracking, and emissions monitoring all require long-term time-series data from energy meters, flow computers, and environmental sensors. Historians provide the foundation for energy management systems.

AI and advanced analytics foundation

AI models trained on process data require years of historical context: normal operating envelopes, failure precursors, seasonal variations, and process interactions. The historian is the data source that makes this possible.

Deployment architectures: edge, cloud, and hybrid

Historian deployments have evolved from exclusively on-premises installations to hybrid and cloud architectures as connectivity and storage costs have changed. Each deployment model has its appropriate use cases.

On-premises / edge historian

Installed on a server in or near the control room
Lowest latency: data collection loop is local
No network dependency for collection or real-time retrieval
Works in air-gapped or connectivity-limited environments
Long-term archive can grow to terabytes over years
Traditional deployment model for process industries
Examples: AVEVA PI, AVEVA Historian, GE Proficy Historian

Cloud historian / time-series database

Historian infrastructure hosted in cloud environment
Elastic storage: no on-premises hardware to manage
Accessible from anywhere without VPN or tunnel
Requires reliable connectivity from plant to cloud
Integrates directly with cloud analytics platforms
Examples: InfluxDB Cloud, Amazon Timestream, Azure Data Explorer
Growing adoption for greenfield IIoT deployments

Hybrid: edge collection, cloud archive

Edge historian collects at full resolution locally
Historical data forwarded to cloud for long-term storage and analytics
Store-and-forward handles connectivity interruptions
Local system continues collection if cloud connection is lost
Combines low-latency local collection with cloud-scale analytics
Most common architecture for mature plants modernizing toward cloud
Examples: N3uron plus InfluxDB, TOP Server plus DataHub plus cloud endpoint
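
The store-and-forward behavior in the hybrid model can be sketched as a bounded local queue that drains in order whenever the uplink is available. This is an illustrative sketch only, not how any of the named products implement it. The class name is hypothetical, and `send` is a hypothetical callable assumed to raise `ConnectionError` when the cloud endpoint is unreachable.

```python
from collections import deque


class StoreAndForward:
    """Buffer samples locally and forward them in order when possible."""

    def __init__(self, send, max_buffered=100_000):
        self._send = send  # hypothetical uplink; raises ConnectionError on failure
        # Bounded buffer: once full, the oldest samples are dropped first.
        self._buffer = deque(maxlen=max_buffered)

    def publish(self, sample):
        """Local collection never blocks on the cloud connection."""
        self._buffer.append(sample)
        self.flush()

    def flush(self):
        """Send buffered samples oldest-first; stop at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False  # keep the sample and retry on the next flush
            self._buffer.popleft()
        return True

    @property
    def pending(self):
        return len(self._buffer)
```

A sample is removed from the buffer only after `send` returns, so an outage delays delivery without reordering or losing data, up to the buffer limit.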

Major historian platforms

OPC HDA and OPC UA Historical Access are supported by most major process historian products. TOP Server, Cogent DataHub, and OPC Router from Software Toolbox connect to these platforms through standard OPC interfaces, making it possible to bridge historian data to modern analytics environments without replacing the historian itself.

AVEVA PI System: AVEVA (previously OSIsoft)
AVEVA Historian: AVEVA (previously Wonderware)
GE Proficy Historian: GE Digital / GE Vernova
Honeywell Uniformance PHD: Honeywell
Aspen InfoPlus.21: AspenTech
Yokogawa Exaquantum: Yokogawa
Canary Labs Historian: Canary Labs
InfluxDB: InfluxData (cloud-native)
ABB 800xA Historian: ABB

TOP Server historian integration: TOP Server provides seamless integration with all major historian platforms through OPC UA and OPC DA protocols. TOP Server also includes a Local Historian plug-in for edge-level data collection and OPC HDA access, providing a historian function for systems where a full enterprise historian is not deployed. Cogent DataHub handles store-and-forward to cloud historian endpoints, and OPC Router manages complex historian routing workflows.

From historian to data lakehouse: the modern architecture

The traditional historian architecture is a closed loop: data flows from the plant to the historian, and applications query the historian directly. This works well for operations teams using the historian's built-in trending and reporting tools, but it creates a data silo when the goal is enterprise-wide analytics, AI model training, or integration with IT systems that speak SQL or cloud-native APIs rather than OPC HDA.

The modern answer is the data lakehouse: a cloud-based storage and compute architecture that ingests historian data alongside data from other enterprise systems (MES, ERP, quality systems) and makes it available through open SQL and analytics interfaces. The historian remains the operational system of record for real-time and near-term historical data; the lakehouse provides the long-term analytics and AI training layer.

Software Toolbox's Historian and Lakehouse Connectivity service connects existing historian platforms to cloud data lakehouses (Azure Data Lake, AWS S3, Databricks, Snowflake) using tools that handle protocol translation, data normalization, and continuous forwarding. The historian does not need to be replaced; the lakehouse layer is added on top of it.
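
As a rough picture of what data normalization can mean in such a pipeline, the sketch below flattens one historian sample into a self-describing row and serializes rows as newline-delimited JSON, a format most lakehouse ingestion paths accept. The column names, site name, and tag name are hypothetical, and this is not a description of any Software Toolbox product's output format.

```python
import json
from datetime import datetime, timezone


def to_lakehouse_row(site, tag, epoch_ms, value, quality):
    """Flatten one sample into a row with explicit UTC time and context."""
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return {
        "site": site,  # hypothetical context column
        "tag": tag,
        "ts_utc": ts.isoformat(timespec="milliseconds"),
        "value": float(value),
        "quality": quality,
    }


rows = [
    to_lakehouse_row("plant-a", "R3.TT101.PV", 1_700_000_000_000, 81.4, "Good"),
    to_lakehouse_row("plant-a", "R3.TT101.PV", 1_700_000_010_000, 81.9, "Good"),
]
ndjson = "\n".join(json.dumps(row) for row in rows)
print(ndjson)
```

Carrying site, tag, and quality in every row lets SQL consumers filter and join without first learning the historian's own tag database.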

chatUNS.ai and historian data: When operational data from the historian flows into a Unified Namespace or is accessible through a structured lakehouse, AI agents like chatUNS.ai can query it using natural language. An engineer asking "What was the average temperature on Reactor 3 during last night's shift, and how did it compare to the previous two weeks?" gets a grounded answer from real historian data, without needing a custom report.

Frequently asked questions

Does the historian store every value the PLC generates?

Not by default. The historian applies exception and swinging-door compression before archiving, which means only values that represent meaningful change are stored. Values that fall within the configured deadbands are discarded. The archived data is sufficient to reconstruct the process signal by interpolation, but intermediate values within deadband tolerance are not individually stored.

If you need every raw scan-cycle value, because of a specific compliance requirement or for high-frequency signal analysis, compression deadbands can be set very tight (near zero) or raw storage mode can be configured for specific tags. This significantly increases storage requirements and should be applied selectively to tags where raw fidelity is genuinely required.

What is the difference between exception deadband and compression deadband?

Exception deadband is the first filter: a simple threshold that discards an incoming value if it has not changed by more than the configured amount from the last value that passed the filter. Values that pass the exception filter become the working set for real-time displays and further compression processing.
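
A minimal sketch of the exception filter, assuming each incoming value is compared against the last value that passed (an assumption; products differ in detail). The function name and data are illustrative. Real implementations typically also force a value through after a maximum time interval, which is omitted here.

```python
def exception_filter(values, deadband):
    """Yield only values that moved more than deadband since the last pass."""
    last_passed = None
    for value in values:
        if last_passed is None or abs(value - last_passed) > deadband:
            last_passed = value
            yield value


# Small wiggles around 20.0 are dropped; the two real moves pass.
scans = [20.0, 20.05, 20.1, 20.3, 20.31, 19.9]
print(list(exception_filter(scans, deadband=0.2)))  # [20.0, 20.3, 19.9]
```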

Compression deadband (swinging-door) is applied to values that have already passed the exception filter. It evaluates whether the new value can be linearly predicted from the trajectory of previous values within a tolerance band. If yes, the value is not archived yet; the algorithm continues tracking the slope. When a new value falls outside the predictable slope corridor, the previous value is committed to the archive and the algorithm resets.
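
The slope-corridor logic can be sketched as follows. This is a textbook-style swinging-door implementation written for illustration, not any vendor's code. The function name, the `deviation` parameter, and the sample data are assumptions, and production algorithms add rules such as a maximum time between archived values.

```python
def swinging_door(points, deviation):
    """Return the subset of (time, value) points worth archiving.

    points must have strictly increasing times. A corridor of allowed
    slopes is tracked from the last archived point; while it stays
    non-empty, one straight line covers every point seen within
    +/- deviation.
    """
    if len(points) <= 2:
        return list(points)
    archived = [points[0]]
    pivot_t, pivot_v = points[0]
    slope_max, slope_min = float("inf"), float("-inf")
    previous = points[0]
    for t, v in points[1:]:
        upper = (v + deviation - pivot_v) / (t - pivot_t)
        lower = (v - deviation - pivot_v) / (t - pivot_t)
        new_max, new_min = min(slope_max, upper), max(slope_min, lower)
        if new_min > new_max:
            # Corridor closed: commit the previous point and restart there.
            archived.append(previous)
            pivot_t, pivot_v = previous
            slope_max = (v + deviation - pivot_v) / (t - pivot_t)
            slope_min = (v - deviation - pivot_v) / (t - pivot_t)
        else:
            slope_max, slope_min = new_max, new_min
        previous = (t, v)
    archived.append(previous)  # always keep the most recent point
    return archived


# A steady ramp followed by a plateau: only the start, the corner,
# and the end need to be stored.
signal = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 3.0), (5, 3.0)]
print(swinging_door(signal, deviation=0.1))
# [(0, 0.0), (3, 3.0), (5, 3.0)]
```

Three archived points reproduce all six scans by interpolation, which is the sense in which the archive preserves the shape of the signal.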

Exception deadband reduces noise and repeated values. Compression deadband reduces redundancy in slowly-changing signals. Together they reduce the number of stored values by 80 to 95% for typical process data without meaningful loss of signal fidelity.

Can I connect a reporting tool to any historian using OPC HDA?

If both the reporting tool and the historian implement OPC HDA, yes, without custom integration code. This is precisely the purpose OPC HDA was designed for: a standard interface that allows any compliant client to retrieve historical data from any compliant server, regardless of the historian product.

In practice, most major historian platforms expose an OPC HDA server interface, and most industrial reporting tools include an OPC HDA client. TOP Server also implements OPC HDA on the same installation as OPC DA and OPC UA, so it can serve as a bridge for data that TOP Server collects from devices and systems but that is not already stored in a historian.

What is an enterprise historian vs an operational historian?

An operational historian is deployed close to the process: in or near the control room, connected directly to the control system. Its primary users are operations engineers and operators who need process trend data and real-time context. Operational historians prioritize write performance, compression efficiency, and fast retrieval for individual tags and time ranges.

An enterprise historian aggregates data from multiple operational historians across sites, plants, or facilities. Its primary users are engineers and analysts at the corporate level who need to compare performance across sites, generate enterprise-wide reports, or provide data to analytics systems. Enterprise historians often add contextualization and data federation capabilities on top of the aggregated operational data.

How does a historian differ from a time-series database like InfluxDB?

Traditional process historians and modern cloud-native time-series databases like InfluxDB both store timestamped data efficiently. The key differences are in design intent and optimization.

Process historians were designed for industrial process data with built-in features like swinging-door compression, OPC HDA interfaces, exception-based deadbanding, and immutable archives for compliance. They integrate tightly with SCADA and DCS systems and have decades of proven reliability in regulated environments.

Cloud-native time-series databases like InfluxDB prioritize horizontal scalability, open APIs, integration with cloud analytics platforms, and modern query languages. They lack some historian-specific features but connect naturally to Grafana, Kafka, machine learning pipelines, and other modern analytics infrastructure.

Many organizations deploy both: the traditional historian for operational data collection and compliance, and a cloud time-series database as the analytics and AI layer, with a pipeline forwarding historian data to the cloud database for long-term analysis.

What protocols does Software Toolbox use to connect to historians?

Software Toolbox connects to historian platforms primarily through OPC UA and OPC DA interfaces, which all major historians expose. TOP Server supports OPC DA, OPC UA, and OPC HDA on the same installation, allowing it to both collect data from devices (as an OPC server) and deliver that data to historians (as an OPC client) in the same configuration.

Cogent DataHub bridges OPC data to historian endpoints that accept push-based data via its External Historian configuration, including AVEVA PI, AVEVA Historian, GE Proficy, and others. OPC Router handles more complex historian integration workflows with visual routing rules, transformation logic, and multi-destination forwarding. N3uron connects to cloud time-series databases including InfluxDB using native MQTT or REST API connectivity.