The Growing Importance of JATS XML Beyond Journal Publishing

For years, JATS XML was treated as a journals-only concern, something publishers handled to satisfy PubMed Central or Crossref, then moved on from. That thinking is now outdated, and it’s something we’ve watched shift firsthand across the projects we work on.
Across STM, academic, and even reference publishing, the same structured XML originally built for journal articles is becoming the foundation for three things every publisher now cares about: content reuse, AI readiness, and searchability.
Content reuse is the most immediate driver. A well-tagged JATS file isn’t locked into one output. The same structured content can generate a print PDF, an EPUB, an HTML version for a website, and a dataset for a learning platform, without re-keying or re-formatting each time. We’ve seen this play out with publishers managing multi-format portfolios (journals, books, course materials, reference works), where “structure once, output everywhere” went from a nice-to-have to the deciding factor in how fast a title could move across formats.
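To make "structure once, output everywhere" concrete, here is a minimal sketch in Python, using only the standard library. It assumes a heavily simplified, made-up JATS fragment (real files carry far more markup, namespaces, and a DOCTYPE), and the function names `to_html` and `to_plain_text` are hypothetical. It illustrates the idea and does not describe any production pipeline.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified JATS fragment used only for illustration.
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Sample Title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>Introduction</title>
      <p>First paragraph.</p>
    </sec>
    <sec sec-type="methods">
      <title>Methods</title>
      <p>Second paragraph.</p>
    </sec>
  </body>
</article>"""


def text_of(element):
    """Return all text inside an element, with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def to_html(root):
    """Render the article title and top-level sections as an HTML fragment."""
    parts = ["<h1>" + escape(text_of(root.find(".//article-title"))) + "</h1>"]
    for sec in root.iterfind("./body/sec"):
        parts.append("<h2>" + escape(text_of(sec.find("title"))) + "</h2>")
        for p in sec.iterfind("p"):
            parts.append("<p>" + escape(text_of(p)) + "</p>")
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same content as plain text, e.g. for a text-only feed."""
    lines = [text_of(root.find(".//article-title")), ""]
    for sec in root.iterfind("./body/sec"):
        lines.append(text_of(sec.find("title")))
        lines.extend(text_of(p) for p in sec.iterfind("p"))
        lines.append("")
    return "\n".join(lines).strip()


if __name__ == "__main__":
    article = ET.fromstring(JATS_SAMPLE)
    print(to_html(article))
    print(to_plain_text(article))
```

Both outputs come from the same parsed tree, so a correction made once in the XML shows up in every format the next time the outputs are regenerated.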
AI readiness is the less obvious driver, but arguably the more urgent one. Large language models and AI-assisted research tools work far better with structured, semantically tagged content than with PDFs or loosely formatted text. When publishers come to us exploring AI-powered search or summarization for their content, the first thing we end up looking at isn’t the AI tool itself; it’s the underlying XML. The quality of that XML directly determines how usable the content is for AI applications, and content that was tagged inconsistently, or never tagged at all, becomes the bottleneck before the AI project even starts.
Searchability ties both of these together. Structured XML with proper semantic tagging (section types, reference linking, metadata) is what makes content discoverable, both by traditional search and by the AI-driven discovery tools increasingly used by researchers and readers.
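As a rough illustration of why the tagging matters for search and AI tools, the sketch below turns each top-level section of a simplified JATS article into a record that a search index or retrieval pipeline could ingest. Everything specific here is an assumption made for the example: the sample XML, the placeholder DOI, the function name `section_records`, and the record fields. Real pipelines would need to handle nested sections, namespaces, and much richer metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified JATS fragment; the DOI is a placeholder, not a real one.
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.0001</article-id>
      <title-group><article-title>Sample Title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="methods">
      <title>Methods</title>
      <p>We measured X <xref ref-type="bibr" rid="ref1">[1]</xref>.</p>
    </sec>
    <sec>
      <title>Notes</title>
      <p>Untyped section.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1"><mixed-citation>Placeholder reference.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def section_records(root):
    """Build one record per top-level section, carrying article metadata along."""
    doi = root.findtext(".//article-id[@pub-id-type='doi']")
    known_refs = {ref.get("id") for ref in root.iterfind(".//ref-list/ref")}
    records = []
    for sec in root.iterfind("./body/sec"):
        # Collect citation targets; rid may hold several space-separated IDs.
        cited = []
        for xref in sec.iterfind(".//xref[@ref-type='bibr']"):
            cited.extend((xref.get("rid") or "").split())
        paragraphs = [" ".join("".join(p.itertext()).split()) for p in sec.iterfind("p")]
        records.append({
            "doi": doi,
            "section_type": sec.get("sec-type"),  # None when the section is untyped
            "heading": sec.findtext("title"),
            "text": " ".join(paragraphs),
            "cited_refs": [rid for rid in cited if rid in known_refs],
        })
    return records


if __name__ == "__main__":
    for record in section_records(ET.fromstring(JATS_SAMPLE)):
        print(record)
```

The second section in the sample has no `sec-type`, so its record arrives with `section_type` set to `None`. That is the small-scale version of the inconsistency problem: the content is still there, but a tool that filters or ranks by section type can no longer use it reliably.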
The publishers who are ahead on this didn’t necessarily plan for “AI readiness” specifically. In our experience, they simply maintained consistent, standards-based XML production as a baseline, often for years, and found themselves ready when new use cases emerged. The ones who come to us under pressure are usually the ones holding large backlists of inconsistently tagged or PDF-only content, where bringing everything up to a common standard becomes a much bigger lift than it would have been if it had been handled incrementally.
For publishers sitting on that kind of backlist, the question worth asking isn’t “do we need JATS XML for this book or reference work too?” It’s “what will it cost us not to have it, in two years, when reuse, AI, and discovery all depend on it?” It’s a question we’ve helped a number of publishers work through, usually starting with a straightforward audit of what’s already there.
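For a sense of what a first-pass audit can look at, here is a small Python sketch that reports a few structural signals for a single file. The specific checks (DOI present, abstract present, untyped top-level sections, citations that point at no reference) and the function name `audit_jats` are assumptions chosen for illustration, not a definitive checklist. A real audit would also validate against the JATS DTD or schema and run across the whole backlist.

```python
import xml.etree.ElementTree as ET


def audit_jats(xml_text):
    """Return a few illustrative structural signals for one JATS document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not even well-formed XML, so nothing else can be checked.
        return {"well_formed": False}

    sections = root.findall("./body/sec")
    ref_ids = {ref.get("id") for ref in root.iterfind(".//ref-list/ref")}

    # Gather every bibliographic citation target; rid may list several IDs.
    cited = []
    for xref in root.iterfind(".//xref[@ref-type='bibr']"):
        cited.extend((xref.get("rid") or "").split())

    return {
        "well_formed": True,
        "has_doi": root.find(".//article-id[@pub-id-type='doi']") is not None,
        "has_abstract": root.find(".//abstract") is not None,
        "untyped_sections": sum(1 for s in sections if s.get("sec-type") is None),
        "unresolved_citations": sum(1 for rid in cited if rid not in ref_ids),
    }


if __name__ == "__main__":
    # Hypothetical under-tagged sample: no DOI, no abstract, one dangling citation.
    sample = (
        "<article><body><sec><title>A</title>"
        "<p>Text <xref ref-type='bibr' rid='r9'/></p>"
        "</sec></body></article>"
    )
    print(audit_jats(sample))
```

Run over a backlist, counts like these give a quick picture of how far the content is from a common standard, which helps in sizing the conversion work before committing to it.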
Wordium works with STM, academic, and reference publishers on structured XML conversion, including JATS-based workflows for content beyond traditional journals. Get in touch to discuss your content’s XML readiness.