
Industrial Network Troubleshooting

Why Do Industrial Networks Fail?

At a steel rolling mill, the entire line stopped for 45 minutes. The cause was not a mechanical or electrical fault in the production equipment: an Ethernet cable pinched under a metal cable tray was picking up electromagnetic interference, which caused packet loss between the main PLC and the HMI station. The downtime cost thousands of dollars.

Industrial network troubleshooting is a critical skill for every automation engineer. Industrial networks face harsh conditions (dust, heat, vibration, electromagnetic interference), downtime translates directly to production losses, and equipment often runs on diverse legacy protocols.

Systematic Troubleshooting Methodology

Before picking up any tool, you need a methodology. Random troubleshooting wastes time and can make problems worse. Follow these seven steps:

1. Define the problem precisely:

  • What exactly is not working? (One device? A group? The entire network?)
  • When did it start? (Suddenly or gradually?)
  • Did anything change recently? (Software update, new device, electrical maintenance)

2. Gather information:

  • Review event logs in PLC and SCADA systems
  • Check LED indicators on switches and network cards

3. Develop a hypothesis:

  • Based on symptoms, what is the most likely cause?

4. Test the hypothesis:

  • Use appropriate diagnostic tools

5. Resolve the problem:

  • Apply the fix

6. Verify the solution:

  • Confirm the problem is actually resolved and no new issues appeared

7. Document everything:

  • Record the problem, root cause, and solution for future reference

Essential Diagnostic Tools

Ping: The First Pulse

The simplest tool and your first step. The ping command sends an ICMP Echo Request and waits for a reply:

ping 192.168.1.10

What Ping tells you:

  • Successful reply: Network connectivity exists at the IP level
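  • Request timed out: No reply at the IP level; the problem may be in the network or the device, or a firewall on the device may be blocking ICMP
  • High response time (>10 ms on a local network, where replies normally take about 1 ms or less): Possible congestion, a cable issue, or an overloaded device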
  • Intermittent packet loss: Electromagnetic interference or partially damaged cable
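The interpretation rules above can be expressed as a small function. This is a minimal sketch, assuming the round-trip times have already been collected (for example, copied from ping output) as a list in milliseconds, with None marking a timed-out request. The 10 ms threshold mirrors the rule of thumb above, and the function name classify_ping is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

HIGH_RTT_MS = 10.0  # rule-of-thumb threshold for a local network

def classify_ping(rtts_ms: Sequence[Optional[float]]) -> str:
    """Map ping round-trip times (ms, None = timed out) to a first diagnosis."""
    if not rtts_ms:
        raise ValueError("no ping results supplied")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(replies)
    if not replies:
        return "no reply: network fault, device down, or ICMP blocked"
    if lost:
        # Some replies arrived and some did not: the classic intermittent pattern
        return f"intermittent loss ({lost}/{len(rtts_ms)}): check for EMI or damaged cable"
    if mean(replies) > HIGH_RTT_MS:
        return "high latency: possible congestion or cable issue"
    return "ok: IP connectivity confirmed"

# Four pings to 192.168.1.10, one of which timed out
print(classify_ping([0.4, 0.5, None, 0.6]))
# intermittent loss (1/4): check for EMI or damaged cable
```

Checking loss before latency is a design choice: a link that drops packets is the more urgent finding, even if the surviving replies are fast.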

Traceroute: Following the Path

When Ping fails, you need to know where connectivity breaks:

tracert 192.168.2.50     (Windows)
traceroute 192.168.2.50  (Linux)

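Traceroute lists every Layer 3 hop (router or Layer 3 switch) along the path and the response time for each; ordinary Layer 2 switches do not appear in the output. If the trace stops at a specific hop, the problem most likely lies between that hop and the next one, although some routers are configured not to answer traceroute probes at all.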
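Reading a trace comes down to finding the last hop that answered. A minimal sketch, assuming the trace has already been parsed into an ordered list of hop addresses with None for hops that timed out (the `* * *` lines); the function name locate_break is hypothetical. A silent hop is a lead, not proof, since some routers never reply to traceroute probes.

```python
from typing import Optional, Sequence

def locate_break(hops: Sequence[Optional[str]], target: str) -> str:
    """Describe where a parsed trace stops.

    hops holds one entry per hop, in order: the responding address, or None for * * *.
    """
    responding = [(n, addr) for n, addr in enumerate(hops, start=1) if addr is not None]
    if responding and responding[-1][1] == target:
        return "path complete: target reached"
    if not responding:
        return "no hop answered: check the local link and default gateway"
    n, addr = responding[-1]
    return f"trace stops after hop {n} ({addr}): inspect the link beyond it"

trace = ["192.168.1.1", "10.0.0.1", None, None]
print(locate_break(trace, "192.168.2.50"))
# trace stops after hop 2 (10.0.0.1): inspect the link beyond it
```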

ARP Table: Who Owns This Address?

The ARP table maps IP addresses to physical MAC addresses:

arp -a

Common problems revealed by ARP:

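  • Duplicate IP address (two devices with the same IP): the MAC address listed for that IP keeps changing between checks
  • A device missing from the ARP table right after you tried to reach it on the same subnet did not answer ARP: it may be disconnected, powered off, or on the wrong VLAN. Devices on other subnets never appear in the table, because traffic to them goes to the gateway's MAC address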
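Spotting a duplicate IP by eye means watching the MAC next to one address change between runs of arp -a. A minimal sketch of the same check, assuming the entries from several snapshots have already been collected as (IP, MAC) pairs; parsing arp -a itself is left out because its output format differs between Windows and Linux. The function name and the MAC addresses in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def find_duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each IP address that was seen with more than one MAC address."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        # Normalize so Windows (aa-bb-...) and Linux (aa:bb:...) formats compare equal
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Entries collected from three runs of arp -a (MAC addresses are made up)
seen = [
    ("192.168.1.10", "00:1d:9c:aa:bb:01"),
    ("192.168.1.20", "00:1d:9c:aa:bb:02"),
    ("192.168.1.10", "00-0E-8C-11-22-33"),
]
print(find_duplicate_ips(seen))
```

A changed MAC can also mean a device was legitimately replaced, so confirm against the equipment list before chasing a conflict.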

Wireshark: The Advanced Analyzer

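Wireshark is the most widely used free, open-source network packet analyzer. It captures the frames passing through a network interface and decodes them in full detail, from the Ethernet (data link) layer up to the application layer. On a switched network, a laptop only sees its own traffic plus broadcasts, so capturing traffic between two other devices requires a mirrored switch port.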

When to Use Wireshark

  • When Ping and Traceroute are not sufficient
  • To diagnose protocol-specific issues (Modbus TCP, EtherNet/IP, PROFINET)
  • To detect abnormal traffic (attacks, broadcast storms)
  • To measure response times precisely

Essential Wireshark Filters for Industrial Networks

Filter | Purpose
ip.addr == 192.168.1.10 | Show traffic to or from a specific device
modbus | Show Modbus TCP packets only
enip | Show EtherNet/IP packets
pn_io | Show PROFINET IO packets
tcp.analysis.retransmission | Show TCP retransmissions (a sign of trouble)
frame.time_delta > 0.1 | Show frames arriving more than 100 ms after the previous captured frame (gaps in traffic, not necessarily slow responses)

Practical Example: Diagnosing Slow Modbus TCP

If PLC readings from a Modbus TCP device are delayed:

  1. Capture with Wireshark at a point that sees the traffic between the PLC and the device (for example, a mirrored switch port)
  2. Apply the filter modbus && ip.addr == [device address]
  3. Look for: TCP retransmissions, request-to-response time exceeding expectations, or Modbus Exception error messages
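Modbus TCP makes request-to-response timing straightforward because each response echoes the 2-byte transaction identifier from its request's MBAP header. A minimal sketch of matching those pairs, assuming the captured frames have already been exported as (timestamp in seconds, source IP, TCP payload bytes) tuples and that each payload carries one Modbus message; the tuple layout, the function name, and the PLC address 192.168.1.5 are hypothetical. It also flags exception responses, which set the high bit (0x80) of the function code.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical export format: (timestamp in seconds, source IP, TCP payload bytes)
Frame = Tuple[float, str, bytes]

def modbus_response_times(frames: Iterable[Frame], plc_ip: str) -> List[Tuple[int, float, bool]]:
    """Pair requests from the PLC with responses by Modbus TCP transaction ID.

    Returns (transaction_id, response_time_ms, is_exception) for each matched pair.
    """
    pending: Dict[int, float] = {}
    results: List[Tuple[int, float, bool]] = []
    for ts, src, payload in frames:
        if len(payload) < 8:
            continue  # shorter than the 7-byte MBAP header plus a function code
        tid, protocol_id, _length, _unit_id = struct.unpack(">HHHB", payload[:7])
        if protocol_id != 0:
            continue  # the protocol identifier is always 0 for Modbus
        function_code = payload[7]
        if src == plc_ip:
            pending[tid] = ts  # request sent by the PLC (Modbus client)
        elif tid in pending:
            elapsed_ms = (ts - pending.pop(tid)) * 1000.0
            # The 0x80 bit in the function code marks an exception response
            results.append((tid, elapsed_ms, bool(function_code & 0x80)))
    return results

# One normal read, and one answered with exception code 02 (illegal data address)
req1 = struct.pack(">HHHBBHH", 1, 0, 6, 1, 0x03, 0, 10)
resp1 = struct.pack(">HHHBBB", 1, 0, 23, 1, 0x03, 20) + bytes(20)
req2 = struct.pack(">HHHBBHH", 2, 0, 6, 1, 0x03, 500, 10)
resp2 = struct.pack(">HHHBBB", 2, 0, 3, 1, 0x83, 0x02)
frames = [
    (0.000, "192.168.1.5", req1), (0.012, "192.168.1.10", resp1),
    (0.100, "192.168.1.5", req2), (0.350, "192.168.1.10", resp2),
]
for tid, ms, is_exc in modbus_response_times(frames, plc_ip="192.168.1.5"):
    print(f"transaction {tid}: {ms:.1f} ms{' (exception)' if is_exc else ''}")
```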

Common Failures and Solutions

Cable Problems

Cables and connectors are among the most common causes of failures in industrial networks:

Problem | Symptoms | Solution
Complete break | No connectivity at all | Replace the cable; confirm with a cable tester
Partial break | Intermittent connection, packet loss | Inspect connectors; locate the fault with a TDR
Wrong wiring (incorrect pin order, or crossover on ports without Auto-MDI-X) | No link indication | Verify the color order (T568A/B) at both ends
Cable too long | CRC errors, slowness | Keep each Cat5e/6 Ethernet link within 100 m
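Checking the color order is a pin-by-pin comparison against the two standards. A minimal sketch, assuming the wire colors have been read off each connector for pins 1 to 8 into a list; the two pinouts below are the standard T568A and T568B assignments, while the function names are hypothetical. The same standard at both ends gives a straight-through cable; T568A at one end and T568B at the other gives a crossover.

```python
from typing import List, Optional

T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]

def identify_end(colors: List[str]) -> Optional[str]:
    """Return 'T568A', 'T568B', or None if pins 1-8 match neither standard."""
    if colors == T568A:
        return "T568A"
    if colors == T568B:
        return "T568B"
    return None

def classify_cable(end1: List[str], end2: List[str]) -> str:
    a, b = identify_end(end1), identify_end(end2)
    if a is None or b is None:
        return "miswired: pin order matches neither T568A nor T568B"
    return "straight-through" if a == b else "crossover"

print(classify_cable(T568B, T568B))  # straight-through
print(classify_cable(T568A, T568B))  # crossover
```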

Electromagnetic Interference (EMI)

Industrial environments are full of interference sources: large motors, variable frequency drives (VFDs), welders, and high-voltage power lines.

EMI symptoms on the network:

  • Intermittent packet loss that increases when a specific machine starts
  • High CRC error rates
  • Auto-negotiated speed drops (e.g., from 100 Mbps to 10 Mbps)
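High CRC error rates are easiest to judge from switch port counters sampled twice, because the raw counters also include errors from weeks or months ago. A minimal sketch, assuming the received-frame and CRC error counters have been read from the port at two points in time and that the frame counter includes errored frames (check how your switch counts); the class, function, and readings are hypothetical, and any alarm threshold is site-specific.

```python
from dataclasses import dataclass

@dataclass
class PortCounters:
    rx_frames: int   # all frames received on the port (assumed to include errored frames)
    crc_errors: int  # frames received with a bad CRC/FCS

def crc_error_rate(before: PortCounters, after: PortCounters) -> float:
    """Fraction of frames received between two samples that failed the CRC check."""
    frames = after.rx_frames - before.rx_frames
    errors = after.crc_errors - before.crc_errors
    if frames < 0 or errors < 0:
        raise ValueError("counter went backwards (cleared or wrapped); take a new sample")
    return errors / frames if frames else 0.0

# Hypothetical readings taken 10 minutes apart, with the suspect VFD running
before = PortCounters(rx_frames=1_000_000, crc_errors=10)
after = PortCounters(rx_frames=1_500_000, crc_errors=260)
print(f"{crc_error_rate(before, after):.4%}")  # 0.0500%
```

Taking one pair of samples with the suspect machine stopped and another while it runs turns the first symptom above into a direct comparison.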

Solutions:

  • Use shielded cables (STP/FTP) instead of UTP
  • Separate data cable paths from power cables by at least 30 cm
  • Use fiber optics in high-interference areas
  • Ensure proper grounding of cable shields

Configuration Errors

Error | Symptom | Detection
Duplicate IP address | Intermittent connection for both devices | Check the ARP table
Wrong subnet mask | Device reaches some devices but not others | Compare subnet masks
Wrong default gateway | Local connectivity works, no access to other subnets | Check the default gateway
Wrong VLAN | Device is isolated from the devices it should reach | Check the switch port's VLAN configuration
Speed/duplex mismatch | Slow connection, many errors | Check port settings on both ends
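The subnet mask and default gateway rows come down to address arithmetic that Python's standard ipaddress module can do. A minimal sketch, assuming you have copied each device's IP address, mask, and gateway from its configuration; the function names and addresses are illustrative. Two devices talk directly only if each considers the other part of its own subnet, which is why a mismatched mask lets a device reach some neighbors but not others.

```python
import ipaddress

def same_subnet(ip_a: str, mask_a: str, ip_b: str, mask_b: str) -> bool:
    """True if each device considers the other to be on its local subnet."""
    net_a = ipaddress.IPv4Network(f"{ip_a}/{mask_a}", strict=False)
    net_b = ipaddress.IPv4Network(f"{ip_b}/{mask_b}", strict=False)
    return (ipaddress.IPv4Address(ip_b) in net_a
            and ipaddress.IPv4Address(ip_a) in net_b)

def gateway_in_subnet(ip: str, mask: str, gateway: str) -> bool:
    """A default gateway must lie inside the device's own subnet to be usable."""
    net = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    return ipaddress.IPv4Address(gateway) in net

# Device B has a /25 mask by mistake: A thinks B is local, but B disagrees
print(same_subnet("192.168.1.10", "255.255.255.0", "192.168.1.200", "255.255.255.128"))  # False
print(gateway_in_subnet("192.168.1.10", "255.255.255.0", "192.168.2.1"))  # False
```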

Broadcast Storms

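A broadcast storm occurs when broadcast frames circulate endlessly around a loop between switches, multiplying with every pass until they saturate the entire network; Ethernet frames carry no hop limit, so nothing stops them. The most common cause is a cable accidentally connected between two ports on the same switch, or an extra cable between switches that are already connected, while STP (Spanning Tree Protocol) is disabled or absent.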

Symptoms: all devices on the network are affected simultaneously, activity LEDs on the switches flash rapidly, and CPU usage on devices can jump to 100%.

Immediate solution: disconnect the suspected cable. Permanent solution: enable STP or RSTP on all managed switches.
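Whether a set of cables forms a loop is a graph question: a loop appears as soon as a cable joins two switches (or two ports of one switch) that are already connected by another path. A minimal sketch using union-find, assuming each cable from the network diagram is listed as a pair of switch names; the names and the function are made up. It reports one cable per independent loop; STP or RSTP would block one link in each loop, though not necessarily the same link this sketch reports.

```python
from typing import Dict, Iterable, List, Tuple

def find_loop_links(links: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the cables that close a loop (each one is a redundant path)."""
    parent: Dict[str, str] = {}

    def root(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps the trees shallow
            node = parent[node]
        return node

    loops: List[Tuple[str, str]] = []
    for a, b in links:
        ra, rb = root(a), root(b)
        if ra == rb:
            loops.append((a, b))  # a and b were already connected: this cable makes a loop
        else:
            parent[ra] = rb  # merge the two connected groups
    return loops

cables = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW1"), ("SW4", "SW4")]
print(find_loop_links(cables))  # [('SW3', 'SW1'), ('SW4', 'SW4')]
```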

Specialized Protocol Analyzers

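Fieldbus networks that are not Ethernet-based (such as PROFIBUS DP or DeviceNet) require specialized protocol analyzers, because a standard network card cannot capture them. For Ethernet-based protocols, Wireshark's dissectors usually suffice: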

Network | Analyzer | What It Reveals
PROFIBUS DP | ProfiTrace | Cycle times, signal quality, telegram errors
DeviceNet | DeviceNet analyzer | Node status, CAN errors
Modbus RTU | Serial analyzer | Response times, CRC errors
EtherNet/IP | Wireshark (built-in EtherNet/IP and CIP dissectors) | CIP connections, RPI times
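The CRC errors a serial analyzer reports for Modbus RTU refer to the 16-bit CRC that ends every frame. A minimal sketch of that check, useful for verifying a captured frame by hand: the Modbus CRC starts at 0xFFFF, uses the reflected polynomial 0xA001, and is transmitted low byte first. The function names are hypothetical.

```python
def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001, no final XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def frame_crc_ok(frame: bytes) -> bool:
    """Check a complete RTU frame whose last two bytes are the CRC, low byte first."""
    if len(frame) < 4:  # address + function code + 2 CRC bytes at minimum
        return False
    received = frame[-2] | (frame[-1] << 8)
    return modbus_crc16(frame[:-2]) == received

# Read 1 holding register at address 0 from slave 1
request = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x01])
print(hex(modbus_crc16(request)))  # 0xa84, sent on the wire as 84 0A
```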

Building a Field Toolkit

Every industrial network engineer needs these tools in their kit:

  • Cable Tester: to verify Ethernet wiring (pair and order checks)
  • Laptop with Wireshark: for packet analysis
  • USB Network Adapter: backup if the built-in NIC does not support capture mode
  • Small Managed Switch: for port mirroring and traffic analysis
  • TDR (Time Domain Reflectometer): to locate cable breaks by distance in meters
  • Labels and markers: cable labeling is the best prevention against future failures
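A TDR finds a break by timing how long a pulse takes to reflect back from it. Because the pulse travels to the fault and back, the distance is d = NVP × c × Δt / 2, where NVP (nominal velocity of propagation) is the signal speed in the cable as a fraction of the speed of light c. A minimal sketch of that arithmetic; the NVP of 0.69 in the example is an assumed typical figure for twisted-pair cable, so use the value from the cable datasheet, which most TDRs also let you enter.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def tdr_distance_m(round_trip_s: float, nvp: float) -> float:
    """Distance to the reflection, from the round-trip time and the cable's NVP."""
    if not 0 < nvp <= 1:
        raise ValueError("NVP must be a fraction of the speed of light (0 < NVP <= 1)")
    # Halve the round trip: the pulse goes out to the fault and comes back
    return nvp * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2

# A reflection arriving 500 ns after the pulse, on cable with an assumed NVP of 0.69
print(round(tdr_distance_m(500e-9, 0.69), 1))  # 51.7
```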

Lessons from the Field

After years of dealing with industrial network failures, these are the key takeaways:

  1. Start at the physical layer: in practice, most industrial network problems (often estimated at around 80%) are caused by cables or connectors
  2. Change one thing at a time: if you change multiple things simultaneously, you will not know which one fixed the problem
  3. Keep the network diagram updated: an accurate network map saves hours of searching
  4. Back up switch configurations: before any change, take a backup
  5. Monitor proactively: a monitoring tool such as PRTG or Zabbix can flag rising error counts, growing latency, or unreachable devices before they cause outages
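Dedicated monitoring tools do far more, but the core of an availability check is simple: try to reach each device's service port on a schedule and record failures and response time. A minimal sketch, assuming the device speaks Modbus TCP on the standard port 502; the function name probe is hypothetical. A successful TCP connect only proves the port is open, not that the application behind it is healthy.

```python
import socket
import time
from typing import Optional

def probe(host: str, port: int = 502, timeout_s: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in ms, or None if the port is unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, unreachable, or timed out
        return None
```

Calling probe() for every device once a minute and alerting after a few consecutive None results, or when connect times creep upward, is the basic pattern that full monitoring systems build on.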

Summary

Industrial network troubleshooting begins with a systematic methodology and ends with the right tools. From simple Ping to advanced Wireshark analysis, each tool has its place. The most important principles: do not rush, always start at the physical layer, and document everything. An engineer who knows their network well solves problems in minutes rather than hours.
