Industrial Network Troubleshooting

Why Do Industrial Networks Fail?

At a steel rolling mill, the entire line stopped for 45 minutes. The cause was not mechanical or electrical — an Ethernet cable was pinched under a metal cable tray, and electromagnetic interference caused packet loss between the main PLC and the HMI station. The downtime cost thousands of dollars.

Industrial network troubleshooting is a critical skill for every automation engineer. Industrial networks face harsh conditions (dust, heat, vibration, electromagnetic interference), downtime translates directly to production losses, and equipment often runs on diverse legacy protocols.

Systematic Troubleshooting Methodology

Before picking up any tool, you need a methodology. Random troubleshooting wastes time and can make problems worse. Follow these seven steps:

1. Define the problem precisely:

What exactly is not working? (One device? A group? The entire network?)
When did it start? (Suddenly or gradually?)
Did anything change recently? (Software update, new device, electrical maintenance)

2. Gather information:

Review event logs in PLC and SCADA systems
Check LED indicators on switches and network cards

3. Develop a hypothesis:

Based on symptoms, what is the most likely cause?

4. Test the hypothesis:

Use appropriate diagnostic tools

5. Resolve the problem:

Apply the fix

6. Verify the solution:

Confirm the problem is actually resolved and no new issues appeared

7. Document everything:

Record the problem, root cause, and solution for future reference

Essential Diagnostic Tools

Ping: The First Pulse

The simplest tool and your first step. The ping command sends an ICMP Echo Request and waits for a reply:

ping 192.168.1.10

What Ping tells you:

Successful reply: Network connectivity exists at the IP level
Request timed out: No reply was received. The device may be off or unreachable, or ICMP may be blocked by a firewall or disabled on the device
High response time (>10 ms on a local network): Possible congestion or cable issue
Intermittent packet loss: Electromagnetic interference or partially damaged cable
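
The interpretation rules above can be sketched as a small parser. This is only an illustration: it assumes the summary format printed by Linux iputils ping, the function name summarize_ping is hypothetical, and the 10 ms threshold is simply the figure quoted in the list.

```python
import re

def summarize_ping(output: str, slow_ms: float = 10.0) -> str:
    """Classify the summary lines of Linux iputils-style ping output."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    # rtt line looks like: rtt min/avg/max/mdev = 0.412/0.530/0.801/0.120 ms
    rtt = re.search(r"= [\d.]+/([\d.]+)/[\d.]+", output)
    if loss is None:
        return "unparsed"
    loss_pct = float(loss.group(1))
    if loss_pct >= 100.0:
        return "no connectivity"
    if loss_pct > 0.0:
        return "intermittent loss"
    if rtt and float(rtt.group(1)) > slow_ms:
        return "slow"
    return "ok"

SAMPLE = """\
--- 192.168.1.10 ping statistics ---
10 packets transmitted, 8 received, 20% packet loss, time 9012ms
rtt min/avg/max/mdev = 0.412/0.530/0.801/0.120 ms
"""
print(summarize_ping(SAMPLE))  # intermittent loss
```

Windows ping prints its statistics in a different layout, so the regular expressions would need adjusting there.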

Traceroute: Following the Path

When Ping fails, you need to know where connectivity breaks:

tracert 192.168.2.50     (Windows)
traceroute 192.168.2.50  (Linux)

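Traceroute shows every Layer 3 hop (router) in the path and the response time for each. Layer 2 switches do not appear, because they forward frames without decrementing the TTL. If the trace stops at a specific hop, the problem most likely lies between that hop and the next one, although some routers simply do not answer traceroute probes.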
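
Finding where a trace stops can be automated. The sketch below assumes the numeric output of Linux `traceroute -n` (one hop per line, `*` for a probe that timed out); the function name last_responding_hop and the sample addresses are made up for illustration.

```python
import re

def last_responding_hop(trace_output: str):
    """Return (hop_number, address) of the last hop that answered, or None."""
    last = None
    for line in trace_output.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if not m:
            continue  # header line or blank line
        hop, tokens = int(m.group(1)), m.group(2).split()
        if not tokens or all(t == "*" for t in tokens):
            continue  # every probe to this hop timed out
        # In -n output the first non-"*" token is the responder's address.
        addr = next(t for t in tokens if t != "*")
        last = (hop, addr)
    return last

SAMPLE = """\
traceroute to 192.168.2.50 (192.168.2.50), 30 hops max, 60 byte packets
 1  192.168.1.1  0.512 ms  0.480 ms  0.455 ms
 2  10.0.0.1  1.204 ms  1.180 ms  1.175 ms
 3  * * *
 4  * * *
"""
print(last_responding_hop(SAMPLE))  # (2, '10.0.0.1')
```

In this sample the search would start on the link or device just past hop 2.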

ARP Table: Who Owns This Address?

The ARP table maps IP addresses to physical MAC addresses:

arp -a

Common problems revealed by ARP:

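Duplicate IP address (two devices with the same IP): the MAC address listed for that IP keeps changing between checks
A device missing from the ARP table has not exchanged traffic with this host recently, or is not answering ARP requests. Ping it and check the table again before concluding that it is disconnected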
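
The duplicate-IP check can be sketched by comparing several ARP snapshots taken a few seconds apart. The snapshots here are plain dictionaries typed in by hand; parsing real `arp -a` output is left out, and the name find_flapping_ips is hypothetical.

```python
from collections import defaultdict

def find_flapping_ips(snapshots):
    """snapshots: list of {ip: mac} dicts taken at different times.
    Returns the IPs that were seen with more than one MAC address."""
    seen = defaultdict(set)
    for snap in snapshots:
        for ip, mac in snap.items():
            # Normalize Windows (aa-bb-..) and Linux (aa:bb:..) notation.
            seen[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}

snapshots = [
    {"192.168.1.10": "00-1B-1B-AA-00-01", "192.168.1.20": "00:0e:8c:bb:00:02"},
    {"192.168.1.10": "00:80:f4:cc:00:03", "192.168.1.20": "00:0E:8C:BB:00:02"},
]
print(find_flapping_ips(snapshots))
# {'192.168.1.10': ['00:1b:1b:aa:00:01', '00:80:f4:cc:00:03']}
```

An IP that shows up with two MAC addresses is a strong hint of a duplicate, though a replaced device or a redundant pair with a shared address can produce the same picture.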

Wireshark: The Advanced Analyzer

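Wireshark is the most widely used free network packet analyzer. It captures the packets that reach a network interface and displays each one in full detail, from the Ethernet frame (data link layer) up to the application layer. It does not show physical-layer signal quality; that requires a cable tester or TDR.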

When to Use Wireshark

When Ping and Traceroute are not sufficient
To diagnose protocol-specific issues (Modbus TCP, EtherNet/IP, PROFINET)
To detect abnormal traffic (attacks, broadcast storms)
To measure response times precisely

Essential Wireshark Filters for Industrial Networks

Filter	Purpose
`ip.addr == 192.168.1.10`	Show traffic for a specific device
`modbus`	Show Modbus TCP packets only
`enip`	Show EtherNet/IP packets
`pn_io`	Show PROFINET IO packets
`tcp.analysis.retransmission`	Retransmissions (trouble indicator)
`frame.time_delta > 0.1`	Packets captured more than 100 ms after the previous packet

Practical Example: Diagnosing Slow Modbus TCP

If PLC readings from a Modbus TCP device are delayed:

Open Wireshark on the network between PLC and device
Apply the filter modbus && ip.addr == [device address]
Look for: TCP retransmissions, request-to-response time exceeding expectations, or Modbus Exception error messages
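
The request-to-response check in the last step can be sketched offline. The example below assumes the Modbus TCP payloads and their capture timestamps have already been exported from the capture; it decodes the MBAP header, pairs requests with responses by transaction ID, and flags exception responses (function code with the high bit set). The names parse_mbap and response_times are illustrative, not part of any library.

```python
import struct

def parse_mbap(payload: bytes) -> dict:
    """Decode the 7-byte MBAP header plus the function code byte."""
    if len(payload) < 8:
        raise ValueError("too short for a Modbus TCP frame")
    tid, proto, length, unit, func = struct.unpack(">HHHBB", payload[:8])
    return {"tid": tid, "proto": proto, "length": length, "unit": unit,
            "func": func & 0x7F, "exception": bool(func & 0x80)}

def response_times(frames):
    """frames: iterable of (timestamp_s, is_response, payload_bytes).
    Returns a list of (transaction_id, seconds, is_exception)."""
    pending, results = {}, []
    for ts, is_response, payload in frames:
        hdr = parse_mbap(payload)
        if not is_response:
            pending[hdr["tid"]] = ts  # remember when the request left
        elif hdr["tid"] in pending:
            results.append((hdr["tid"], ts - pending.pop(hdr["tid"]),
                            hdr["exception"]))
    return results

frames = [
    (0.0,  False, bytes.fromhex("000100000006010300000002")),   # read 2 registers
    (0.25, True,  bytes.fromhex("00010000000701030400 0A000B".replace(" ", ""))),
    (1.0,  False, bytes.fromhex("000200000006010300640001")),
    (1.5,  True,  bytes.fromhex("000200000003018302")),          # exception 0x02
]
print(response_times(frames))  # [(1, 0.25, False), (2, 0.5, True)]
```

What counts as "too slow" depends on the device and the PLC's polling interval, so any threshold applied to these times is a site-specific choice.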

Common Failures and Solutions

Cable Problems

Cables are the number one cause of failures in industrial networks:

Problem	Symptoms	Solution
Complete break	No connectivity at all	Replace cable, test with Cable Tester
Partial break	Intermittent connection, packet loss	Inspect connectors, test with TDR
Wrong wiring (crossover or split pairs)	No Link indication, or link with errors	Verify color order (T568A/B); ports with Auto-MDIX tolerate crossover, but not split pairs
Cable too long	CRC errors, slowness	Do not exceed 100 m for Cat5e/6 Ethernet

Electromagnetic Interference (EMI)

Industrial environments are full of interference sources: large motors, variable frequency drives (VFDs), welders, and high-voltage power lines.

EMI symptoms on the network:

Intermittent packet loss that increases when a specific machine starts
High CRC error rates
Auto-negotiated speed drops (e.g., from 100 Mbps to 10 Mbps)

Solutions:

Use shielded cables (STP/FTP) instead of UTP
Separate data cable paths from power cables by at least 30 cm
Use fiber optics in high-interference areas
Ensure proper grounding of cable shields

Configuration Errors

Error	Symptom	Detection
Duplicate IP address	Intermittent connection for both devices	Check ARP Table
Wrong subnet mask	Device sees some devices but not others	Compare Subnet Masks
Wrong default gateway	Local connectivity works, no external access	Check Default Gateway
Wrong VLAN	Device is completely isolated	Check switch configuration
Speed/Duplex mismatch	Slow connection, many errors	Check port settings
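
The wrong-subnet-mask symptom in the table can be reproduced with Python's standard ipaddress module. The addresses and the helper name can_reach_directly are made up for illustration; the point is that a device decides from its own mask whether a peer is local.

```python
import ipaddress

def can_reach_directly(local_ip: str, local_mask: str, remote_ip: str) -> bool:
    """True if the local device considers remote_ip part of its own subnet."""
    local_net = ipaddress.ip_network(f"{local_ip}/{local_mask}", strict=False)
    return ipaddress.ip_address(remote_ip) in local_net

# HMI configured with 255.255.255.128 instead of the intended 255.255.255.0
print(can_reach_directly("192.168.1.10", "255.255.255.128", "192.168.1.20"))   # True
print(can_reach_directly("192.168.1.10", "255.255.255.128", "192.168.1.200"))  # False
print(can_reach_directly("192.168.1.10", "255.255.255.0", "192.168.1.200"))    # True
```

With the wrong mask the HMI treats 192.168.1.200 as remote and sends the traffic to its gateway, which is why it "sees some devices but not others".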

Broadcast Storms

A broadcast storm occurs when broadcast frames circulate endlessly in a Layer 2 loop, saturating the entire network. The most common cause is a cable accidentally connected between two ports on the same switch, or a redundant link between two switches that are not running STP (Spanning Tree Protocol).

Symptoms: all devices on the network are affected simultaneously, LED indicators on switches flash rapidly, CPU usage on devices jumps to 100%.

Immediate solution: disconnect the suspected cable. Permanent solution: enable STP or RSTP on all managed switches.
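
One rough way to confirm a suspected storm from a capture is to measure what share of frames is addressed to the broadcast MAC. The sketch below works on a list of destination MAC addresses exported from a capture; the function name and the 50% alarm level in the usage line are assumptions, not standard values.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def broadcast_ratio(dest_macs) -> float:
    """Fraction of frames whose destination is the Ethernet broadcast address."""
    macs = [m.lower() for m in dest_macs]
    if not macs:
        return 0.0
    return sum(m == BROADCAST for m in macs) / len(macs)

sample = ["FF:FF:FF:FF:FF:FF", "ff:ff:ff:ff:ff:ff",
          "00:0e:8c:bb:00:02", "ff:ff:ff:ff:ff:ff"]
ratio = broadcast_ratio(sample)
print(ratio, "storm suspected" if ratio > 0.5 else "normal")  # 0.75 storm suspected
```

A healthy network's baseline varies by site, so the ratio is most useful when compared with a capture taken during normal operation.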

Specialized Protocol Analyzers

For fieldbus and serial networks (such as PROFIBUS DP, DeviceNet, or Modbus RTU), specialized protocol analyzers are required, while Ethernet-based protocols can be analyzed with Wireshark:

Network	Analyzer	What It Reveals
PROFIBUS DP	ProfiTrace	Cycle times, signal quality, telegram errors
DeviceNet	DeviceNet Analyzer	Node status, CAN errors
Modbus RTU	Serial analyzer	Response times, CRC errors
EtherNet/IP	Wireshark (built-in ENIP and CIP dissectors)	CIP connections, RPI times

Building a Field Toolkit

Every industrial network engineer needs these tools in their kit:

Cable Tester: to verify Ethernet wiring (pair and order checks)
Laptop with Wireshark: for packet analysis
USB Network Adapter: backup if the built-in NIC does not support promiscuous mode
Small Managed Switch: for port mirroring and traffic analysis
TDR (Time Domain Reflectometer): to locate cable breaks by distance in meters
Labels and markers: cable labeling is the best prevention against future failures
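
How a TDR turns a reflection into a distance can be shown with a short calculation: the pulse travels to the fault and back, so the distance is half the round-trip time multiplied by the signal speed in the cable. The NVP (nominal velocity of propagation) of 0.68 used below is an assumed figure for illustration; the real value comes from the cable datasheet.

```python
SPEED_OF_LIGHT_M_S = 299_792_458

def tdr_distance_m(round_trip_s: float, nvp: float = 0.68) -> float:
    """Distance to the reflection point, given the round-trip time of the pulse.
    nvp is the cable's velocity of propagation as a fraction of c (assumed)."""
    return nvp * SPEED_OF_LIGHT_M_S * round_trip_s / 2

# A reflection seen 500 ns after the pulse was sent
print(round(tdr_distance_m(500e-9), 1))  # 51.0
```

An NVP that is off by a few percent shifts the result by the same percentage, which is why field TDRs ask for the cable type before measuring.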

Lessons from the Field

After years of dealing with industrial network failures, these are the key takeaways:

Start at the physical layer: 80% of industrial network problems are caused by cables or connectors
Change one thing at a time: if you change multiple things simultaneously, you will not know which one fixed the problem
Keep the network diagram updated: an accurate network map saves hours of searching
Back up switch configurations: before any change, take a backup
Monitor proactively: a monitoring tool like PRTG or Zabbix catches problems before they cause outages

Summary

Industrial network troubleshooting begins with a systematic methodology and ends with the right tools. From simple Ping to advanced Wireshark analysis, each tool has its place. The most important principles: do not rush, always start at the physical layer, and document everything. An engineer who knows their network well solves problems in minutes rather than hours.