Why manufacturing AI fails

3 data problems most teams ignore

Key Takeaways

Manufacturing AI often fails not because of the AI itself but because it lacks a strong data foundation.
Misalignment around data architecture and low-quality data lead to outcomes that are inaccurate and ineffective.
Manufacturers often underestimate the volume of data they will need to maintain and store, which leads to lost data.

The South Korean shipbuilding industry is using AI to improve planning and scale operations, with digital twins that help engineers predict vessel performance, predictive maintenance that increases productivity by 30 percent, and AI-powered supply chain management solutions that reduce procurement bottlenecks.

Most manufacturers aren’t seeing this level of success—and many are burning millions trying.

But AI usually isn’t the problem. More often, it’s the data architecture underneath it.

Think of your data architecture like plumbing. You can install the most expensive pipes available, but if you hook them up to a muddy pond, you still get dirty water at the tap. Even the highest quality pipes can’t fix a bad source.

Here are three foundational data problems that can keep manufacturing AI from reaching its full potential.

1. Data architecture that silos IT from OT

Industrial manufacturing leaders are the masters of operational metrics like overall equipment effectiveness (OEE), downtime, and throughput. These are the data sources that will fuel AI systems and other intelligent products. But data architecture—that is, how these data types move through IT systems to create new operational knowledge—is often a blind spot.

Why? Historically, operational technology has been kept siloed from IT systems for security reasons. Creating data architecture to connect the two was a non-starter.

It’s a legitimate fear—a hacked IT system could lead to damaged machines on the floor. But maintaining fully separate systems limits an organization’s ability to use its data to fuel insights and intelligent products.

Take the Korean shipbuilders as an example: their digital twins help them predict the performance of ships in various conditions so they can design vessels with lower fuel consumption. As the building progresses, real-time data from the factory floor lets them adjust and make decisions on the go.

Without real-time data, the digital twin drifts from reality—and a model that no longer reflects the actual build is worse than no model at all.

Say a material intended for a specific part of the hull was causing an assembly line machine to malfunction. Getting that information to the digital twin is essential; the engineers would need to decide whether to find an alternative material, design, or machine. Not only that, they’d need the updated data (post-adjustment) to feed back to the model so they could continue to accurately predict fuel efficiency and performance in the finished ship.
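To make the feedback loop concrete, here is a minimal sketch of a check that compares what the digital twin assumes against what the factory floor reports. Everything in it is an assumption for illustration: the function name find_drift, the field names, the steel grades, and the 5 percent tolerance are hypothetical, not details from the shipbuilders' systems.

```python
def find_drift(twin_state: dict, floor_state: dict, tolerance: float = 0.05) -> dict:
    """Return fields where the floor data disagrees with the twin.

    Numeric fields count as drifted when the relative difference exceeds
    `tolerance`; all other fields count as drifted when they are not equal.
    """
    drift = {}
    for key, observed in floor_state.items():
        assumed = twin_state.get(key)
        if isinstance(observed, (int, float)) and isinstance(assumed, (int, float)):
            baseline = max(abs(assumed), 1e-9)  # avoid dividing by zero
            if abs(observed - assumed) / baseline > tolerance:
                drift[key] = (assumed, observed)
        elif observed != assumed:
            drift[key] = (assumed, observed)
    return drift


# Hypothetical example: the floor swapped the hull plate material.
twin = {"hull_plate_material": "AH36", "plate_thickness_mm": 20.0}
floor = {"hull_plate_material": "DH36", "plate_thickness_mm": 20.2}

drifted = find_drift(twin, floor)
print(drifted)
```

Any field that shows up in the result is a signal that the model needs fresh inputs before its predictions can be trusted again.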

Data architecture is not only the link between your OT and IT systems; it's also the link between your data and your business outcomes. To revisit the plumbing analogy: data architecture connects the pipes at the source to the pipes at the tap, so water (or data) can flow freely.

2. Poor data quality and missing lineage

Korean shipbuilders are seeing 30 percent productivity gains with predictive maintenance. To get similar results, manufacturers need not only an intelligent product that works for the organization but high-quality data to fuel it.

What does high-quality data look like? It needs to be tagged accurately, timestamped reliably, and formatted consistently. This may sound simple, but it becomes complex when you’re pulling in data from dozens of sources, each with its own labeling practices.

For example, if you’re looking to boost productivity with a predictive maintenance tool, you may need data from IoT sensors and maintenance logs, among other sources.

The sensors stream constant temperature and humidity measurements to one cloud-based repository. Meanwhile, inspection and maintenance logs are updated only weekly and go to another repository.

The work of bringing these sources together is not just about connecting the “pipes” (aka data architecture); it's also about standardizing their outputs. Are both sources recording dates and times in the same format? Are they using a shared lexicon?

A data model standardizes data across the organization, making accurate analysis possible.
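As a rough illustration of that standardization step, the sketch below maps two differently labeled records onto one shared set of field names and one timestamp format. The field aliases, the timestamp formats, and the sample records are all hypothetical, and the code assumes that timestamps without an offset were recorded in UTC.

```python
from datetime import datetime, timezone

# Hypothetical mapping from each source's field names to a shared lexicon.
FIELD_ALIASES = {
    "temp_c": "temperature_c",
    "MachineID": "machine_id",
    "machine": "machine_id",
    "ts": "timestamp",
    "date": "timestamp",
}

# Timestamp formats we assume the sources use; extend as needed.
TIMESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%m/%d/%Y %H:%M")


def parse_timestamp(raw: str) -> datetime:
    """Parse a raw timestamp string into a timezone-aware UTC datetime."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # try the next known format
        if parsed.tzinfo is None:
            # Assumption: values without an offset were recorded in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"Unrecognized timestamp format: {raw!r}")


def standardize(record: dict) -> dict:
    """Rename fields to the shared lexicon and normalize the timestamp."""
    clean = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    clean["timestamp"] = parse_timestamp(clean["timestamp"]).isoformat()
    return clean


sensor_reading = {"MachineID": "press-07", "temp_c": 71.4, "ts": "2024-03-05T14:30:00+0000"}
maintenance_log = {"machine": "press-07", "date": "03/05/2024 09:15", "note": "Replaced seal"}

print(standardize(sensor_reading))
print(standardize(maintenance_log))
```

Raising an error on an unknown format, rather than guessing, is a deliberate choice here: a silently misparsed date is harder to catch later than a rejected record.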

This data processing is essential to the work of building intelligent products. In general, it's more efficient to do this work at the source (i.e., by ensuring that uniform fields, formats, and language are used at the point of collection) rather than trying to clean it up after the fact.

A final component that makes analysis easier is data lineage, or understanding the full journey that your data takes from source to decision. Data lineage isn’t just an audit tool.

When a defective batch ships, lineage tells you which supplier, sensor reading, or inspection log missed it. Without lineage, root cause analysis becomes guesswork. Treat lineage as a first-class requirement alongside quality, not an afterthought.
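One simple way to picture lineage is as a graph in which every derived value keeps links to the inputs it was built from. The sketch below is a toy model under that assumption; the LineageNode class, the trace_upstream function, and the node names are hypothetical and do not represent any particular lineage tool.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LineageNode:
    """One step in a data point's journey, linked to the inputs it used."""
    name: str
    inputs: list["LineageNode"] = field(default_factory=list)


def trace_upstream(node: LineageNode) -> list[str]:
    """Return the names of everything upstream of `node`, nearest first."""
    seen: list[str] = []
    queue = deque(node.inputs)
    while queue:
        current = queue.popleft()
        if current.name not in seen:
            seen.append(current.name)
            queue.extend(current.inputs)
    return seen


# Hypothetical lineage for a batch quality score.
supplier = LineageNode("supplier:steel-lot-A")
sensor = LineageNode("sensor:weld-temp-3")
inspection = LineageNode("inspection:log-week-12", [supplier])
batch_score = LineageNode("model:batch-quality-score", [sensor, inspection])

upstream = trace_upstream(batch_score)
print(upstream)
```

With links like these recorded at ingestion time, a root cause investigation starts from a short list of candidates instead of a search across every system.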

3. Underestimating data volume

A common mistake that manufacturing leaders make is underestimating the amount of data they will need to manage as they build intelligent products.

Think about the Korean shipbuilders again. Their digital twins likely generate large volumes of vessel performance data, such as fuel consumption, speed, propeller efficiency, and shaft RPM. Over time, all that data adds up and can lead to high storage costs.

A data maintenance strategy is crucial to the long-term success of your intelligent products. For example, you might store frequently used IoT sensor data in a high-speed cloud storage system, while archiving historical maintenance logs in lower-cost storage tiers to balance your budget.
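A tiering rule like this can start out very simple. The sketch below assigns a storage tier based on how recently a dataset was accessed; the tier names and the 30-day and 365-day cutoffs are assumptions chosen for illustration, and real thresholds depend on your access patterns and your provider's pricing.

```python
from datetime import date

# Assumed cutoffs; tune them to your access patterns and storage pricing.
HOT_DAYS = 30
WARM_DAYS = 365


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_DAYS:
        return "hot"      # fast, more expensive storage
    if age_days <= WARM_DAYS:
        return "warm"     # cheaper storage with slower access
    return "archive"      # lowest-cost tier for historical records


today = date(2024, 6, 1)
print(choose_tier(date(2024, 5, 20), today))  # recent sensor data
print(choose_tier(date(2023, 9, 1), today))   # older inspection data
print(choose_tier(date(2021, 1, 15), today))  # historical maintenance logs
```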

Edge computing — processing data at the source, on the factory floor — reduces latency and prevents cloud overload. And starting with one IoT machine at a time lets you stress-test your systems before scaling.
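As one example of what processing at the source can mean, an edge device might collapse a window of raw readings into a small summary and forward only that summary plus any out-of-range values. The sketch below assumes that approach; the function name, the window contents, and the alert threshold are hypothetical.

```python
from statistics import mean


def summarize_window(readings: list[float], alert_above: float) -> dict:
    """Reduce a window of raw readings to a compact summary for the cloud."""
    if not readings:
        raise ValueError("Cannot summarize an empty window")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
        # Keep the raw values that crossed the threshold for later analysis.
        "alerts": [value for value in readings if value > alert_above],
    }


# Hypothetical temperature readings from one machine over one window.
window = [70.1, 70.4, 70.2, 85.3, 70.0]
summary = summarize_window(window, alert_above=80.0)
print(summary)
```

The trade-off is that raw readings that never leave the edge can't be analyzed later, so the summary fields should be chosen with future questions in mind.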

The deeper issue here isn't storage cost but strategic planning. Teams that don't model their data growth early end up making reactive, expensive decisions, like deleting data they later wish they had or paying premium rates for emergency capacity. Build a data volume projection into your AI roadmap from day one, the same way you'd project headcount or infrastructure costs.
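A first-pass projection doesn't need to be sophisticated. The sketch below estimates cumulative raw storage from a handful of inputs; the sensor count, sampling rate, record size, and growth rate are made-up numbers, and the model ignores compression, replication, and retention policies.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # simplification: every month has 30 days


def project_storage_gb(
    sensors: int,
    readings_per_second: float,
    bytes_per_reading: int,
    monthly_growth: float,
    months: int,
) -> list[float]:
    """Return cumulative storage (GB) at the end of each month.

    `monthly_growth` is the assumed fractional increase in active sensors
    per month, e.g. 0.10 for 10 percent.
    """
    totals = []
    cumulative_gb = 0.0
    active_sensors = float(sensors)
    for _ in range(months):
        monthly_bytes = (
            active_sensors * readings_per_second * bytes_per_reading * SECONDS_PER_MONTH
        )
        cumulative_gb += monthly_bytes / 1e9
        totals.append(cumulative_gb)
        active_sensors *= 1 + monthly_growth
    return totals


# Hypothetical rollout: 100 sensors, one 100-byte reading per second each,
# with the sensor count growing 10 percent per month for a year.
projection = project_storage_gb(100, 1.0, 100, 0.10, 12)
for month, total in enumerate(projection, start=1):
    print(f"Month {month:2d}: {total:8.1f} GB")
```

Even a rough curve like this makes it easier to have the storage conversation before the first sensor goes live rather than after the first bill.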

Your data system is your competitive advantage

Manufacturing AI use cases like predictive maintenance, quality analytics, and supply chain visibility all rely on trustworthy data. If your data is fragmented, inconsistent, or poorly structured, your AI will mislead.

That leads to poor business decisions and lost revenue. And bad recommendations mean your teams won’t trust the AI, so they won’t use it. Your AI investment will be wasted.

Quality data is akin to quality components: it’s a key part of your product, but you also need the right architecture (machinery, production lines, and safety protocols on the factory floor) for a successful output.

In other words, the systems powering your AI model are your competitive advantage.

Before you kick off your next AI initiative, ask this one diagnostic question: “If we had to explain exactly where every input for this model comes from and verify its accuracy, could we?” If the answer is no or even “probably,” that’s where to start. The AI can wait. The data foundation can’t.

About the author

Álan Gularte is a Senior Data Engineer at TXI in Chicago, where he transforms complex data into powerful solutions through innovative cloud engineering and machine learning approaches. He specializes in crafting strategic insights that help organizations like yours make data-driven decisions.

Published by Álan Gularte in Process