Debugging Techniques for Industrial Systems

When Code Fails: The Art of Bug Hunting

In industrial programming, a software bug is not just an annoyance -- it can mean an entire production line going down, damaged products, or even safety hazards for workers. This is why debugging skills are among the most critical abilities an automation engineer needs.

The term "bug" has a famous story attached to it: in 1947, operators of the Harvard Mark II computer found an actual moth stuck in a relay, where it was causing a malfunction. Engineers had already used "bug" for technical faults long before that, but the incident helped make "debugging" a standard term in computing.

Types of Errors in Industrial Control Code

Syntax Errors

The easiest to find -- the interpreter or compiler reports them before the code even runs:

# Error: missing colon
if temperature > 100
    activate_cooling()

# Correct:
if temperature > 100:
    activate_cooling()

Runtime Errors

These appear while the program is running:

def calculate_flow_rate(volume, time_seconds):
    return volume / time_seconds  # What if time_seconds = 0?

# Safe solution:
import logging

def calculate_flow_rate(volume, time_seconds):
    if time_seconds <= 0:
        logging.warning("Invalid time value: %s", time_seconds)
        return 0.0
    return volume / time_seconds
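
Returning 0.0 keeps the control loop alive, but the caller can no longer tell a real zero flow from invalid input. One possible alternative, sketched here on the assumption that the caller is able to handle an exception, is to raise `ValueError` and let the caller pick the fallback. The names `calculate_flow_rate_strict` and `flow_or_none` are hypothetical.

```python
import logging

logger = logging.getLogger("FlowCalc")

def calculate_flow_rate_strict(volume, time_seconds):
    """Return volume / time, refusing non-positive durations."""
    if time_seconds <= 0:
        raise ValueError(f"time_seconds must be positive, got {time_seconds}")
    return volume / time_seconds

def flow_or_none(volume, time_seconds):
    """Caller-side handling: log the problem and signal 'no valid value'."""
    try:
        return calculate_flow_rate_strict(volume, time_seconds)
    except ValueError as exc:
        logger.warning("Flow rate unavailable: %s", exc)
        return None
```

Whether `None`, the last good value, or a shutdown is the right reaction depends on the process, so that decision is left to the calling code.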

Logic Errors

The most dangerous and hardest to find -- the code runs without error messages, but produces wrong results:

# Logic error: using OR instead of AND
if pressure > 5 or temperature > 80:
    emergency_shutdown()  # Triggers even if only one condition is met!

# Intended: shut down only when BOTH conditions are true
if pressure > 5 and temperature > 80:
    emergency_shutdown()
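
Logic errors like this are easier to catch when the decision is separated from the action and checked against a small truth table. The following is a minimal sketch of that idea; `should_shut_down`, `TRUTH_TABLE` and `check_truth_table` are hypothetical names, and the expected results encode the intent stated above (both conditions must hold).

```python
PRESSURE_LIMIT = 5       # same thresholds as the example above
TEMPERATURE_LIMIT = 80

def should_shut_down(pressure, temperature):
    """Shutdown decision, isolated from the actuator so it can be tested."""
    return pressure > PRESSURE_LIMIT and temperature > TEMPERATURE_LIMIT

# One row per combination: (pressure, temperature, expected decision)
TRUTH_TABLE = [
    (4, 70, False),   # both normal
    (6, 70, False),   # only pressure high
    (4, 90, False),   # only temperature high
    (6, 90, True),    # both high
]

def check_truth_table():
    """Return the rows where the code disagrees with the specification."""
    return [row for row in TRUTH_TABLE
            if should_shut_down(row[0], row[1]) != row[2]]
```

An empty list from `check_truth_table()` means the code matches the table. With `or` in place of `and`, the two middle rows would be reported.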

Timing Errors

Specific to industrial systems where everything depends on timing:

# Error: reading sensor before it stabilizes
def read_pressure():
    sensor.power_on()
    value = sensor.read()  # Immediate read -- sensor hasn't stabilized!
    return value

# Correct: wait for settling time
import time

def read_pressure():
    sensor.power_on()
    time.sleep(0.5)  # Wait 500ms for sensor to stabilize
    value = sensor.read()
    return value
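
A fixed 500 ms wait works only if the settling time is known and constant. If the sensor can report its own readiness, polling with a timeout may be more robust. The sketch below assumes a hypothetical `is_ready()` method, which the original `sensor` object may not have; `FakeSensor` exists only to make the example runnable.

```python
import time

class FakeSensor:
    """Hypothetical stand-in: reports ready after a fixed settling time."""
    def __init__(self, settle_seconds=0.05):
        self._settle_seconds = settle_seconds
        self._powered_at = None

    def power_on(self):
        self._powered_at = time.monotonic()

    def is_ready(self):
        return (self._powered_at is not None
                and time.monotonic() - self._powered_at >= self._settle_seconds)

    def read(self):
        return 3.2  # fixed dummy pressure value

def read_pressure_when_ready(sensor, timeout=1.0, poll_interval=0.01):
    """Power on, poll until the sensor reports ready, then read."""
    sensor.power_on()
    deadline = time.monotonic() + timeout
    while not sensor.is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("Sensor did not stabilize in time")
        time.sleep(poll_interval)
    return sensor.read()
```

`time.monotonic()` is used for the deadline because it is not affected by system clock adjustments.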

Logging Strategies

Logging is your eyes inside the program while it runs. In industrial environments, you cannot stand next to the machine adding print statements -- you need a structured logging system.

Log Levels

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
    handlers=[
        logging.FileHandler('/var/log/plc_control.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger("MotorControl")

def start_motor(motor_id):
    logger.info("Starting motor %s", motor_id)

    current = read_motor_current(motor_id)
    logger.debug("Initial current: %.2f A", current)

    if current > 15.0:
        logger.warning("High start current: %.2f A for motor %s", current, motor_id)

    if current > 25.0:
        logger.error("Dangerous current! Stopping motor %s", motor_id)
        stop_motor(motor_id)
        return False

    logger.info("Motor %s running successfully", motor_id)
    return True

Golden Rules of Logging

Level	When to Use	Example
`DEBUG`	Internal details for developers only	Variable values each cycle
`INFO`	Important normal events	Motor start/stop, mode change
`WARNING`	Unexpected but program continues	Temperature approaching limit
`ERROR`	A specific operation failed	Sensor read failure
`CRITICAL`	Catastrophic failure needing immediate action	PLC connection lost
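
The levels in the table also act as a filter: a logger set to `WARNING` drops everything below it. The short sketch below illustrates this with an in-memory stream standing in for a log file; the logger name `LevelDemo` is arbitrary.

```python
import io
import logging

stream = io.StringIO()                      # stands in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

demo_logger = logging.getLogger("LevelDemo")
demo_logger.addHandler(handler)
demo_logger.propagate = False               # keep the demo out of the root logger
demo_logger.setLevel(logging.WARNING)       # production-style threshold

demo_logger.debug("Cycle value: 42")        # below threshold, dropped
demo_logger.info("Motor started")           # below threshold, dropped
demo_logger.warning("Temperature approaching limit")
demo_logger.error("Sensor read failure")

captured = stream.getvalue().splitlines()
# captured holds only the WARNING and ERROR lines
```

A common arrangement is `DEBUG` during commissioning and `INFO` or `WARNING` in production, so that per-cycle details do not flood the log.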

Structured Logging

For complex systems, plain text logs are not enough -- use structured logging:

import json
from datetime import datetime, timezone

def structured_log(level, event, **data):
    entry = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "level": level,
        "event": event,
        **data
    }
    print(json.dumps(entry))

# Usage:
structured_log("INFO", "motor_started",
    motor_id="M-101",
    current_amps=8.5,
    voltage=380,
    line="filling_line_1"
)

Output:

{"timestamp": "2026-04-05T10:30:00+00:00", "level": "INFO", "event": "motor_started", "motor_id": "M-101", "current_amps": 8.5, "voltage": 380, "line": "filling_line_1"}

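Because every entry is a single JSON object, tools like Elasticsearch, or Grafana through a log backend such as Loki, can parse and index it without custom text patterns.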
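
The same JSON output can also be produced through the standard `logging` module, so that levels, handlers and files keep working as before. The following is a rough sketch rather than a complete solution: `JsonFormatter` is a hypothetical class, and passing the fields under an `extra={"data": ...}` key is a convention chosen for this example.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each LogRecord as one JSON object per line."""
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"data": {...}} become an attribute of the record
        entry.update(getattr(record, "data", {}))
        return json.dumps(entry)

stream = io.StringIO()  # stands in for a file or socket
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

json_logger = logging.getLogger("JsonDemo")
json_logger.addHandler(handler)
json_logger.setLevel(logging.INFO)
json_logger.propagate = False

json_logger.info("motor_started",
                 extra={"data": {"motor_id": "M-101", "current_amps": 8.5}})

parsed = json.loads(stream.getvalue())  # the log line is valid JSON
```

Replacing the `StringIO` stream with a `FileHandler` would write one JSON object per line to disk.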

Breakpoints and Step-by-Step Tracing

Using Breakpoints

A breakpoint freezes the program at a specific line and lets you inspect all variables:

# In VS Code or PyCharm: place a breakpoint at the suspicious line

KP = 2.5
KI = 0.1

def control_loop(setpoint, sensor_value, integral, dt):
    error = setpoint - sensor_value
    # <-- Place breakpoint here
    # The caller keeps `integral` between cycles; resetting it here would disable the I term
    integral += error * dt
    output = KP * error + KI * integral

    if output > 100:
        output = 100  # Output saturation
    return output, integral

Conditional Breakpoints

Instead of stopping every cycle, stop only when a condition is met:

# In the debugger:
# Condition: error > 10 and cycle_count > 1000
# This stops only when the error is large after some runtime
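
When no IDE is attached, a similar effect can be approximated in code with Python's built-in `breakpoint()`. This is only a sketch for use on a test bench: `check_and_break` is a hypothetical helper, and the condition mirrors the one in the comment above.

```python
def check_and_break(error, cycle_count):
    """Drop into the debugger only when the suspicious state occurs."""
    if error > 10 and cycle_count > 1000:
        breakpoint()  # calls sys.breakpointhook(), which starts pdb by default
        return True
    return False
```

Setting the environment variable `PYTHONBREAKPOINT=0` turns every `breakpoint()` call into a no-op. Even so, a call like this should not stay in code that runs on a live machine, because it halts the control loop.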

Tracing Without Stopping

When you cannot stop the program (because the machine is running!), use tracing:

import functools

def trace_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("Calling %s(%s, %s)", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("Result of %s = %s", func.__name__, result)
        return result
    return wrapper

@trace_call
def calculate_pid(setpoint, current, kp, ki, kd):
    error = setpoint - current
    return kp * error  # Simplified for clarity

Simulation: Testing Without a Real Machine

In industrial automation, you cannot always test on the real machine. Simulation saves you:

import random

class SimulatedSensor:
    """Simulated temperature sensor for testing"""
    def __init__(self, initial_temp=25.0):
        self._temp = initial_temp
        self._noise_level = 0.5

    def read(self):
        noise = random.uniform(-self._noise_level, self._noise_level)
        return self._temp + noise

    def simulate_heating(self, rate_per_second, duration):
        self._temp += rate_per_second * duration

class SimulatedMotor:
    """Simulated motor with realistic ramp delay"""
    def __init__(self, rated_rpm=1450):
        self.target_rpm = 0
        self.actual_rpm = 0
        self._rated_rpm = rated_rpm
        self._ramp_rate = 200  # RPM per second

    def set_speed(self, rpm):
        self.target_rpm = min(rpm, self._rated_rpm)

    def update(self, dt):
        """Call each simulation cycle"""
        if self.actual_rpm < self.target_rpm:
            self.actual_rpm = min(
                self.target_rpm,
                self.actual_rpm + self._ramp_rate * dt
            )
        elif self.actual_rpm > self.target_rpm:
            self.actual_rpm = max(
                self.target_rpm,
                self.actual_rpm - self._ramp_rate * dt
            )

# Test control algorithm without a real machine
sensor = SimulatedSensor(initial_temp=20.0)
for cycle in range(100):
    temp = sensor.read()
    if temp < 50:
        sensor.simulate_heating(0.5, 0.1)
    print(f"Cycle {cycle}: temp = {temp:.1f}")
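
Random noise makes each simulation run slightly different, which can make a failure hard to repeat. One way to keep the noise but get repeatable runs is to give the sensor its own seeded generator. The sketch below is a variation on the sensor above; `SeededSensor`, `record_run` and the `seed` parameter are assumptions added for illustration.

```python
import random

class SeededSensor:
    """Simulated sensor whose noise sequence is repeatable for a given seed."""
    def __init__(self, initial_temp=25.0, noise_level=0.5, seed=0):
        self._temp = initial_temp
        self._noise_level = noise_level
        self._rng = random.Random(seed)  # private generator, not the global one

    def read(self):
        noise = self._rng.uniform(-self._noise_level, self._noise_level)
        return self._temp + noise

def record_run(seed, cycles=5):
    """Collect a short run of readings for comparison between runs."""
    sensor = SeededSensor(initial_temp=20.0, seed=seed)
    return [sensor.read() for _ in range(cycles)]
```

Two calls to `record_run` with the same seed return identical readings, so a failing test case can be replayed exactly by logging the seed.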

Remote Debugging

Industrial systems often run on remote devices. Here is how to debug remotely:

Using SSH and debugpy (Python)

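# On the target device (industrial PC):
pip install debugpy
# 0.0.0.0 listens on all network interfaces, and debugpy does not authenticate clients,
# so use this only on a trusted, isolated network (or listen on 127.0.0.1 and forward the port over SSH)
python -m debugpy --listen 0.0.0.0:5678 --wait-for-client control_main.py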

# On your workstation (VS Code), add to launch.json:

{
    "name": "Remote Debug",
    "type": "python",
    "request": "attach",
    "connect": {
        "host": "192.168.1.100",
        "port": 5678
    },
    "pathMappings": [
        {
            "localRoot": "${workspaceFolder}/src",
            "remoteRoot": "/opt/control/src"
        }
    ]
}

Centralized Network Logging

import logging
from logging.handlers import SocketHandler

# Send logs to a central server (records are sent pickled over TCP, so the server must run a matching receiver)
handler = SocketHandler('192.168.1.50', 9020)
logger.addHandler(handler)

# Now all logs from all devices reach one place
logger.info("Filling line 3 motor -- high current detected")

Common Pitfalls in Control Programming

1. Race Conditions

# Bug: two threads reading and writing the same variable
shared_counter = 0

def sensor_thread():
    global shared_counter
    shared_counter += 1  # Not an atomic operation!

# Fix: use a lock
import threading
lock = threading.Lock()

def sensor_thread_safe():
    global shared_counter
    with lock:
        shared_counter += 1
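
To check that the lock really protects the update, the counter can be exercised from several threads at once. The following is a small self-contained sketch; `SafeCounter` and `run_workers` are hypothetical names, and the counter is wrapped in a class instead of a global variable.

```python
import threading

class SafeCounter:
    """Counter whose updates are serialized by a lock."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

def run_workers(workers=4, events_per_worker=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = SafeCounter()

    def work():
        for _ in range(events_per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter.value
```

With the lock in place the total always equals `workers * events_per_worker`. Without it, some updates may be lost, and whether that shows up in a given run depends on thread scheduling.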

2. Memory Leaks

# Bug: storing readings forever
readings = []
while True:
    readings.append(sensor.read())  # Memory fills up!

# Fix: use a circular buffer
from collections import deque
readings = deque(maxlen=1000)  # Keep only the last 1000 readings
while True:
    readings.append(sensor.read())
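
A bounded `deque` is also a convenient base for a moving average, since old values fall out automatically. The sketch below uses a hypothetical `RollingAverage` class with a window of three values to keep the numbers easy to follow.

```python
from collections import deque

class RollingAverage:
    """Average over the most recent `size` readings."""
    def __init__(self, size=3):
        self._window = deque(maxlen=size)  # oldest value is discarded when full

    def add(self, value):
        self._window.append(value)
        return sum(self._window) / len(self._window)

avg = RollingAverage(size=3)
results = [avg.add(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
# results == [1.0, 1.5, 2.0, 3.0, 4.0]
```

Memory use stays constant no matter how long the loop runs, because the window never holds more than `size` values.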

3. Not Handling Connection Loss

# Bug: assuming connection is always available
value = plc.read_register(100)

# Fix: error handling with retry
import time

def safe_read(plc, register, retries=3):
    for attempt in range(retries):
        try:
            return plc.read_register(register)
        except ConnectionError:
            logger.warning("Connection failed (attempt %d/%d)", attempt+1, retries)
            time.sleep(1)
            plc.reconnect()
    logger.error("Final failure reading register %d", register)
    return None
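
A fixed one-second wait is a reasonable start. If the link stays down for longer, increasing the delay after each failure may reduce load on the network. The sketch below shows one possible variant with exponential backoff; `FlakyPLC` is a hypothetical test double, and the `read_register`/`reconnect` methods simply mirror the interface assumed in the example above.

```python
import time

class FlakyPLC:
    """Hypothetical test double: fails a set number of times, then succeeds."""
    def __init__(self, failures):
        self._failures_left = failures
        self.reconnects = 0

    def read_register(self, register):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("link down")
        return 1234  # dummy register value

    def reconnect(self):
        self.reconnects += 1

def read_with_backoff(plc, register, retries=3, base_delay=0.01):
    """Retry with a delay that doubles after each failed attempt."""
    for attempt in range(retries):
        try:
            return plc.read_register(register)
        except ConnectionError:
            if attempt == retries - 1:
                break  # no point waiting after the last attempt
            time.sleep(base_delay * 2 ** attempt)
            plc.reconnect()
    return None
```

The short `base_delay` keeps the example fast; a real installation would choose delays that match the network and the process.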

4. Floating-Point Comparison

# Bug: direct comparison of floating-point numbers
if temperature == 37.5:  # May never be true!
    activate_alarm()

# Fix: use a tolerance margin
EPSILON = 0.01
if abs(temperature - 37.5) < EPSILON:
    activate_alarm()
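
The standard library offers `math.isclose` for the same purpose. A minimal sketch follows; the helper `at_target` and its default values are assumptions for illustration.

```python
import math

def at_target(temperature, target=37.5, tolerance=0.01):
    """True when temperature is within +/- tolerance of target."""
    return math.isclose(temperature, target, rel_tol=0.0, abs_tol=tolerance)

# The classic illustration of why == is unreliable for floats
exact_match = (0.1 + 0.2 == 0.3)             # False
close_match = math.isclose(0.1 + 0.2, 0.3)   # True
```

For alarms, a threshold comparison such as `temperature >= 37.5` is often closer to the real intent than testing for one exact value.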

A Systematic Debugging Methodology

1. Reproduce the bug -- if you cannot repeat it, you cannot fix it
2. Isolate the problem -- disable components one by one until you find the cause
3. Read the logs -- the answer is often already in the log files
4. Check your assumptions -- is the sensor actually working? Is the connection alive?
5. Change only one thing at a time -- never modify multiple things then test
6. Document the fix -- write down what the bug was and how you fixed it to prevent recurrence

Summary

Debugging in industrial programming requires:

A structured logging system with clear levels
Smart breakpoints (especially conditional ones)
Simulation for safe testing without risking the real machine
Remote debugging for devices in the field
Awareness of common pitfalls like race conditions and memory leaks

Remember: a good programmer does not write bug-free code -- a good programmer writes code where bugs are easy to find and fix.