Database Fundamentals for Industrial Systems

Why Does a Factory Need a Database?

Imagine a factory with 200 sensors, each recording a reading every second. That is 200 x 60 x 60 x 24 = over 17 million readings per day. Where do you store them? How do you search through them? How do you find out that motor number 7 overheated last Thursday at 3 AM?

CSV files will collapse under that volume. You need a database -- a system purpose-built for storing and retrieving data quickly and safely.

The Three Types of Databases

Relational Databases (SQL)

Data is organized in tables linked by relationships -- like Excel spreadsheets but far more powerful. Each table has predefined columns (a schema).

Examples: PostgreSQL, MySQL, SQLite, Microsoft SQL Server

-- Create machines table
CREATE TABLE machines (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    type        TEXT NOT NULL,
    location    TEXT,
    max_temp    REAL DEFAULT 85.0,
    installed   DATE
);

-- Create sensor readings table
CREATE TABLE sensor_readings (
    id          INTEGER PRIMARY KEY,
    machine_id  INTEGER REFERENCES machines(id),
    timestamp   TIMESTAMP NOT NULL,
    temperature REAL,
    vibration   REAL,
    pressure    REAL
);

-- Insert data
INSERT INTO machines (name, type, location, max_temp)
VALUES ('CNC-01', 'Lathe', 'Hall 3', 90.0);

-- Query: machines that exceeded 85°C today
SELECT m.name, s.temperature, s.timestamp
FROM sensor_readings s
JOIN machines m ON s.machine_id = m.id
WHERE s.temperature > 85.0
  AND s.timestamp > CURRENT_DATE
ORDER BY s.temperature DESC;

Strengths:

Clear, enforced structure (schema)
Strong relationships between tables (foreign keys)
Standardized SQL language
Data integrity guarantees (ACID transactions)

Non-Relational Databases (NoSQL)

These handle flexible data that does not follow a fixed structure -- such as event logs where each event may carry different fields.

Subtypes:

Type	Description	Example
Document	Flexible JSON documents	MongoDB, CouchDB
Key-Value	Simple key and value pairs	Redis, DynamoDB
Graph	Nodes and relationships	Neo4j, SurrealDB
Wide-Column	Dynamic columns	Cassandra, ScyllaDB

// Example: MongoDB document for an industrial event
{
    "_id": "evt_20240315_001",
    "machine_id": "CNC-01",
    "timestamp": "2024-03-15T14:30:05Z",
    "event_type": "alarm",
    "severity": "high",
    "data": {
        "temperature": 92.3,
        "threshold": 85.0,
        "action_taken": "speed_reduced",
        "operator": "Ahmed"
    },
    "tags": ["temperature", "exceeded", "cnc_line"]
}

Strengths:

Flexible structure -- no predefined schema required
Excellent handling of nested data
Scales horizontally with ease
High write throughput

Time-Series Databases

Designed specifically for data that changes over time -- exactly what sensors produce. Optimized for writing massive volumes of timestamped data.

Examples: InfluxDB, TimescaleDB, QuestDB

-- InfluxDB example: write a sensor reading
INSERT sensor_data,machine=CNC-01,location=hall3
    temperature=78.5,vibration=2.3,pressure=145.0
    1710500000000000000

-- Query: average temperature every 5 minutes for the last hour
SELECT MEAN(temperature)
FROM sensor_data
WHERE machine = 'CNC-01'
  AND time > now() - 1h
GROUP BY time(5m)

Strengths:

Extreme write speed (millions of points per second)
Excellent data compression (90%+ space savings)
Built-in time-based aggregations (hourly, daily, monthly)
Automatic data expiration (retention policies)

Comprehensive Comparison

Criterion	SQL	NoSQL	Time-Series
Structure	Fixed schema	Flexible	Fixed with timestamp
Best for	Structured, relational data	Diverse, flexible data	Sensor and measurement data
Write speed	Good	Excellent	Extreme
Queries	Full SQL	Varies by type	Time-based queries
Industrial example	Machine and order management	Event logs	Sensor readings
Scaling	Vertical	Horizontal	Horizontal

Indexes: Speeding Up Searches

Without an index, a database scans every row to find what you are looking for. An index works like a book index -- it points you directly to the right location.

-- Without index: scans all readings (very slow with millions of rows)
SELECT * FROM sensor_readings
WHERE machine_id = 5 AND timestamp > '2024-03-15';

-- Create a composite index
CREATE INDEX idx_machine_time
ON sensor_readings (machine_id, timestamp);

-- The same query is now hundreds of times faster

When to create an index:

Columns you filter on frequently (WHERE)
Columns you sort by (ORDER BY)
Columns used in joins (JOIN)

Warning: each index slightly slows down write operations -- do not create too many.

SurrealDB: A Next-Generation Database

SurrealDB is a modern database that combines SQL, NoSQL, and graph capabilities in a single system. It supports tables, documents, graph relations, and real-time queries.

-- SurrealQL: define a table with constraints
DEFINE TABLE machine SCHEMAFULL;
DEFINE FIELD name ON machine TYPE string;
DEFINE FIELD machine_type ON machine TYPE string;
DEFINE FIELD location ON machine TYPE string;
DEFINE FIELD max_temp ON machine TYPE float DEFAULT 85.0;
DEFINE FIELD installed ON machine TYPE datetime;

-- Create a record
CREATE machine SET
    name = 'CNC-01',
    machine_type = 'Lathe',
    location = 'Hall 3',
    max_temp = 90.0,
    installed = '2023-06-15T00:00:00Z';

-- Graph relations: machine is located in a department
RELATE machine:cnc01 -> located_in -> department:machining;

-- Query with relations
SELECT name, ->located_in->department.name AS dept
FROM machine
WHERE max_temp > 85.0;

SurrealDB advantages for industry:

Multi-model: relational + document + graph in one system
Real-time: live subscriptions to data changes (Live Queries)
Permissions: fine-grained access control at the field level
Embedded + Server: runs embedded in your application or as a standalone server

Designing a Factory Database: Practical Example

-- Machines
CREATE TABLE machines (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE,
    type VARCHAR(30),
    department VARCHAR(30),
    max_temp REAL,
    max_vibration REAL,
    status VARCHAR(20) DEFAULT 'active'
);

-- Sensor readings (partitioned by month for performance)
CREATE TABLE readings_2024_03 (
    machine_id INTEGER REFERENCES machines(id),
    ts TIMESTAMP NOT NULL,
    temp REAL,
    vibration REAL,
    pressure REAL
);
CREATE INDEX idx_readings_machine_ts
ON readings_2024_03 (machine_id, ts);

-- Alarm log
CREATE TABLE alarms (
    id SERIAL PRIMARY KEY,
    machine_id INTEGER REFERENCES machines(id),
    triggered_at TIMESTAMP DEFAULT NOW(),
    severity VARCHAR(10),  -- low, medium, high, critical
    parameter VARCHAR(20), -- temperature, vibration, pressure
    value REAL,
    threshold REAL,
    acknowledged BOOLEAN DEFAULT FALSE,
    acknowledged_by VARCHAR(50)
);

-- Query: weekly alarm summary per machine
SELECT
    m.name,
    COUNT(*) AS total_alarms,
    COUNT(*) FILTER (WHERE a.severity = 'critical') AS critical,
    MAX(a.value) AS worst_reading
FROM alarms a
JOIN machines m ON a.machine_id = m.id
WHERE a.triggered_at > NOW() - INTERVAL '7 days'
GROUP BY m.name
ORDER BY critical DESC, total_alarms DESC;

How to Choose the Right Database

Question	Answer	Choice
Is your data structured and relational?	Yes	SQL (PostgreSQL)
Is your data flexible and varied?	Yes	NoSQL (MongoDB)
Is your data sensor readings with timestamps?	Yes	Time-Series (InfluxDB)
Do you need everything in one system?	Yes	SurrealDB
Do you need ultra-fast temporary caching?	Yes	Redis (Key-Value)

In practice, most smart factories use more than one type -- a relational database for managing machines and orders, and a time-series database for sensor data. The key is to choose based on the nature of your data, not on what is popular.