
Content Operations

Scriptorium - The Content Strategy Experts

199 episodes

  • Content Operations

    From ad hoc to autonomous: The AI content ops maturity model

    22/06/2026 | 18 mins.
    There are five levels of maturity for AI-driven content operations. Which level are you in? In this episode, Sarah O’Keefe and Bill Swallow walk through the AI content ops maturity model, from ad hoc experimentation to fully autonomous workflows.

    Sarah O’Keefe: We want this automation, right? We want the ability to go in and extract release notes and do something with them. We have to have a certain level of maturity on the software development process so that we can grab the appropriate information. The same thing is true on the content side. You have to have a certain level of maturity in your content development processes, in your content management, so that you can identify the right things to process and the right things to access.

    Related links:

    Want to know more about Sarah’s nifty little side project? Register for our upcoming webinar.

    AI in the content lifecycle

    Enterprise content strategy maturity model

    LinkedIn:

    Sarah O’Keefe

    Bill Swallow

    Transcript:

    Introduction with ambient background music

    Christine Cuellar: From Scriptorium, this is Content Operations, a show that delivers industry-leading insights for global organizations.

    Bill Swallow: In the end, you have a unified experience so that people aren’t relearning how to engage with your content in every context you produce it.

    Sarah O’Keefe: Change is perceived as being risky; you have to convince me that making the change is less risky than not making the change.

    Alan Pringle: And at some point, you are going to have tools, technology, and processes that no longer support your needs, so if you think about that ahead of time, you’re going to be much better off.

    End of introduction

    Bill Swallow: I am Bill Swallow.

    Sarah O’Keefe: And I’m Sarah O’Keefe.

    BS: And today we’re going to talk about AI in content operations, or more specifically, a maturity model for AI.

    SO: Everything needs a maturity model, even AI.

    BS: Even me.

    SO: I have no comment.

    BS: My maturity model is written in crayon, what can I say? So, okay, we need a maturity model for AI as far as content operations are concerned, and probably in, you know, many different areas, but we’ll focus on content operations. So what might that look like?

    SO: I’ve been thinking about this and what it looks like to employ AI as a tool to help you with content. And as I was thinking about what this looks like, you know, you always fall back on that standard five-step model where one is basically mass chaos, and five is, generally, the perfect world. Also, one is nearly always cheap, and five is nearly always expensive enterprise things. But, you know, let’s go a little beyond mass chaos versus governed, regulated, etc., and sort of back up a little bit and talk about what this might look like. So level one in every maturity model typically is ad hoc. And what that means is that in this case, AI is being used sporadically by some people. It’s inconsistent. And I would say that when we look at AI and content specifically,

    BS: Mm-hmm.

    SO: This is going to be things like reprocessing your content using public-facing models. So I wrote a draft of something, I shove it into ChatGPT, and I ask it to shorten it or tighten it up or identify areas that are problematic. Or I just say, hey, you know, write my article for me. The outcome that you’re gonna get with an ad hoc, level one

    BS: Mm-hmm.

    SO: AI approach is going to depend on the individual’s expertise and their level of interest. So if you want to just go in there and say, hey, I have a bio and it’s too long, and I’ve been asked to produce one that’s only 50 words for a particular conference, for example, then, you know, this is actually a really good example of ad hoc, right?

    BS: Mm-hmm.

    SO: We have these long multi-paragraph bios, and every conference I’ve been to has a different requirement for how that bio needs to be shaped. And the fastest way to success is to just shove it into a chatbot and say, give me a 50-word version. And then read it and make sure it didn’t invent things or give you a PhD or anything like that, and then ship it off to the conference organizer. But this is much, much faster than rewriting it from scratch by hand. And also, I think it’s a good example of something where I have the extended version and I’m going to summarize it down. And that usually works pretty well. So level one is ad hoc. It’s kind of sporadic. There’s no standard across the organization. It’s just me saying, this looks useful, or you’ve probably got some use cases in this space as well.

    BS: Right. So it kind of aligns with, I guess, level one of the content maturity model that we talked about a while back, where level one is simply that content exists. Could be, you know, someone typing stuff up in Word or using a myriad of different tools, no style guide, just kind of getting content out there because people need it.

    SO: Yep. So level two is tactical. And tactical is sort of like we’re using this tool to solve some specific problems. And what you’re going to see here is something like: Bill has invented some nifty time-saving tool and he has shared it with other people. Or in a larger organization, maybe somebody invented a nifty validator or something like that, and they’ve rolled it out across maybe the department, probably not the entire organization. Something like AI support is being rolled out. Maybe the organization has created a chatbot internally for customers, right? So there’s a chatbot, it’s sitting on the company website, people can use it to get answers, but it’s really bad. And the reason it’s really bad is because nobody thought too carefully about the content going into the chatbot, because again, we’re tactical. So probably this looked like the AI team just raided the local SharePoint, grabbed a bunch of content, and did not pay a whole lot of attention to whether this content was up to date or in released status. Those things don’t exist, right? It’s just, look, a bucket of PDFs. Cool. Let’s dump them into the AI and go for it.

    BS: Mm-hmm.

    SO: And tragically, in many cases, the techcomm team is sitting on rigorous, structured, vetted, approved content, and nobody remembered to go ask them, can we have your content? Or where is your official content source? Or how do I know what version belongs with which document?

    BS: Right, because, you know, from their point of view, there’s a PDF of it, so I don’t need to ask them.

    SO: Yeah, it’s just a PDF. How hard could it be? So something like AI support was rolled out, but nobody really thought about it. Maybe it’s at a departmental level; probably it’s not enterprise-wide. And nobody has really thought about connecting this AI thing to the assets inside the organization in a reasonable, rigorous, governed, organized kind of manner.

    BS: And I suppose that’s where you get to the next tier.

    SO: Right. So the next tier after tactical is strategic, right? So we have an actual strategy. Now, one of the difficulties in talking about AI is that AI is a tool, and it’s kind of like talking about electricity. You can apply it in lots of places, and it’s more sensible in some places than others. But when we say, what’s your AI strategy, it’s like asking, how do you use water? I mean, come on. And the answer is, of course, to drive the AI and, you know, destroy the environment. But there are things that you can do with AI that are useful for content. There are also things that you can do that are not. So if you have a strategic approach to the use of AI, backing up to the authors again rather than the delivery side, maybe this looks like a collection of prompts that have been built and shared.

    BS: Not with electricity.

    SO: Maybe this looks like saying, this is the workflow that you employ. These are the kinds of things that we do to actually test whether this thing is working. These are the metrics that we’re following. So there’s an actual overarching, bigger picture that somebody’s thinking about that goes beyond, let me go shove this into the chatbot of the day.

    BS: Mm-hmm. Right, right.

    SO: So there’s an actual strategy for the public-facing chatbots. Somebody has thought about the back end. The authors have useful AI tools that add to their productivity. One of the things that I’m hearing a lot now, you know, low-hanging fruit: release notes. Nobody wants to write release notes. It’s a terrible drudge task, but it needs to be done. Well,

    BS: Mm-hmm.

    SO: There are now a lot of solutions that look at the delta from, you know, version one to version one point one: find the diff in the code, find the changes that have been made, look at the JIRA tickets that have been resolved in release one point one, and then consolidate all of that into a set of release notes that say, here’s what’s been done. And that’s probably 90% of the work, and the last 10% of the work is reading that and making sure it’s accurate. Right? Please don’t skip that step. Actually look at what the thing is generating. Now, what’s interesting to me about level three, this sort of more strategic approach, is that you’re gonna start to see that you have prerequisites for this. You can’t do this without them.

    BS: Yes.

    SO: So release notes are a good example. Let’s say that hypothetically, and this is gonna sound insane, but let’s say that hypothetically, you have software development and you have no source control.

    BS: Hmm.

    SO: Everybody’s screaming, right? Because this is nuts, and why would you ever do this? Okay. But hypothetically, you have no source control. Okay. How do you know what’s changed between version one and version one point one?

    BS: It’s up here in my head.

    SO: Excellent, great. Ha, okay, cool. So I’m gonna need to connect the AI to your head so that we can pull those changes out of your head.

    BS: That sounds fun.

    SO: Yeah. Amazing. Right. So all of a sudden, because we want this automation, right? We want the ability to go in and extract release notes and do something with them. We have to have a certain level of maturity on the software development process so that we can grab the appropriate information. Now, the same thing is, of course, true on the content side. You have to have a certain level of maturity in your content development processes, in your content management, so that you can identify, you know, the right things to process and the right things to access. And why we know that software has to be governed, but we’re not so sure about content, is a mystery to me.
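
    As an illustration of why those inputs matter, here is a minimal Python sketch of the consolidation step in the release-notes workflow described above. It is deliberately deterministic rather than AI-driven, and it assumes the list of changed files has already been pulled from source control (for example, with `git diff --name-only` between two release tags) and that resolved tickets have been exported as simple records. The `Ticket` type, its fields, and the output layout are hypothetical; the result is a draft that still needs human review.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str      # hypothetical ticket ID, e.g. "DOC-101"
    summary: str  # one-line description of the resolved issue
    kind: str     # hypothetical category: "feature" or "fix"


def draft_release_notes(version: str, tickets: list[Ticket], changed_files: list[str]) -> str:
    """Group resolved tickets by kind and append a change count; output is a draft for review."""
    lines = [f"Release notes for version {version} (DRAFT: review before publishing)"]
    # Tickets with any other kind are left out of this sketch.
    for kind, heading in (("feature", "New features"), ("fix", "Bug fixes")):
        items = [t for t in tickets if t.kind == kind]
        if items:  # skip empty sections
            lines.append("")
            lines.append(f"{heading}:")
            lines.extend(f"- {t.summary} ({t.key})" for t in items)
    lines.append("")
    lines.append(f"Files changed in source control: {len(changed_files)}")
    return "\n".join(lines)


# Usage with made-up data:
print(draft_release_notes(
    "1.1",
    [Ticket("DOC-101", "Added PDF export", "feature"),
     Ticket("DOC-102", "Fixed crash on startup", "fix")],
    ["export.py", "startup.py"],
))
```

    Without source control there is no `changed_files` list to pass in at all, which is the prerequisite problem Sarah describes.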

    BS: I had never understood that.

    SO: Yeah. So there we are. Okay, so that’s kind of like a level three. There’s some sort of strategy emerging across the enterprise. There are some useful tools, and they’re shared. This is kind of like in content when you start thinking about templates. We’re gonna have some templates and we’re gonna give them to people and they’re gonna use them and it’s gonna be great. All right, so level four is governed, managed. And so now there’s a good understanding of AI.

    BS: Mm-hmm.

    SO: It is being applied in a useful, intelligent manner, by which I mean don’t apply it to the wrong problem sets, right? Apply it to the things where it makes sense to apply it. Thinking about governance, thinking about metrics, thinking about success. And then your data sources and your content sources are being managed in such a way that the AI gets good input and can actually generate good output. So I’m actually not a big fan of the term human in the loop because human in the loop implies that the AI is doing all the work and then like the human eventually gets around to QAing it. You know what? We’re terrible at QA. You know who’s good at QA?

    BS: Mm-hmm. AI.

    SO: No, computers, not AI. AI is all about probability and whatever. It is actually not very good at QA. What’s good at QA is traditional software, right? One plus one is always two. In an AI, one plus one, sometimes it’s not two. So you manage that stuff and you put up the guardrails that say, okay, when the AI kind of wanders off into the wilderness, we’re gonna bring it back to reality. We’re gonna put it in a box, right? Make the AI think inside the box, and we’re gonna govern what that box is. AI is great at thinking outside the box. Unfortunately, that’s usually not what we want from technical content. So it needs to be in the box, and it needs to be consistent and managed and all the rest of it.
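
    To make the contrast between probabilistic generation and deterministic QA concrete, here is a minimal Python sketch of rule-based checks run against AI output, reusing the 50-word bio example from earlier in the episode. The function name, the word limit, and the list of watched terms are illustrative assumptions; a check like this catches only a predefined class of errors and does not replace reading the output.

```python
import re


def check_summary(source: str, summary: str, max_words: int = 50,
                  watch_terms: tuple[str, ...] = ("PhD", "award", "founder")) -> list[str]:
    """Deterministic guardrail: return a list of problems (empty list means the summary passed)."""
    problems = []
    word_count = len(summary.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (limit {max_words})")
    for term in watch_terms:
        # Flag a watched term that appears in the summary but nowhere in the source text.
        pattern = rf"\b{re.escape(term)}\b"
        in_summary = re.search(pattern, summary, re.IGNORECASE)
        in_source = re.search(pattern, source, re.IGNORECASE)
        if in_summary and not in_source:
            problems.append(f"possible invented claim: {term!r}")
    return problems


source_bio = "Sarah O'Keefe is the founder of Scriptorium."
print(check_summary(source_bio, "Dr. Sarah O'Keefe, PhD, leads Scriptorium."))
# prints ["possible invented claim: 'PhD'"]
```

    Unlike the model that produced the summary, these checks give the same answer every time for the same input, which is the kind of box Sarah describes putting the AI in.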

    BS: Mm-hmm.

    SO: So we govern it, right? We go in there and we make sure that the processes, the tooling, and the automation being put in place, where we’re leveraging AI to do things, are managed. And so, the human-in-the-loop thing: I don’t want the human in the loop to fix things on the back end. I want the human in the loop to fix things on the front end so that what goes in is better, so that there’s less work to do when it comes out. You know, fix it beforehand. Don’t remediate it afterwards. That’s a boatload of work, and it is not fun. So fix it ahead of time.

    BS: Right. Yeah. And likewise you probably wanna have, you know, some guardrails in there so that, you know, your AI, whatever it is, doesn’t go playing around with content that has been approved and released and is not slated for updating.

    SO: Yeah, you know, don’t fix that. That one’s done. And, you know, we’re not even talking here about what it means to be in a regulated industry or in a regulatory environment. If you are a large organization and you are doing things in Europe, then you are likely subject to the EU AI Act.

    BS: That’s a completely different beast.

    SO: And you have to think about what that means for what you’re doing, because the fun, gold-rush, Wild West strategy of just throwing AI at everything is not gonna fly in Europe. Okay, so that’s governed, you know, hypothetically. And then level five is agentic, which basically means that everything, or a lot of it, is running autonomously.

    BS: Mm-hmm.

    SO: You know, the layman’s explanation of agentic AI is that instead of you putting a prompt into the chatbot, it does it itself, because you’ve built out the systems that drive all of that happening.

    BS: It understands what needs to happen at what point in time.

    SO: Well, let’s not say understands, but yes. I’m trying so hard. 

    BS: Well, yeah, not understands, but there’s a workflow in place that the AI is following.

    SO: And it’s so difficult. As a side note, why do we impose personality on the chatbots? The answer, I think, is that psychologically it’s very, very difficult to interact with something that play-acts at human interaction. What a great idea! Good for you. I love your thinking, blah, blah, blah. It makes you think you’re interacting with a human. And I don’t think that our brains are equipped to say, no, actually, this is a machine.

    BS: It’s like a scary version of Teddy Ruxpin.

    SO: Well, it passes the Turing test. And so we just can’t separate it out: it feels like you’re interacting with a person. You know you’re not, but it feels as though you are, and feeling is always gonna win over knowledge. So

    BS: Mm-hmm. Well yeah, the interaction is a lot more organic than you get from, you know, traditional tools.

    SO: Or, you know, yeah. I mean, think about the difference between typing in a search string and, you know, a conversational search, a conversational interface. It’s quite troubling, actually. Yeah, so this is kind of the five-level model, right? From big-mess ad hoc, where some things are happening and some things aren’t, up to completely autonomous. Now, the more autonomy you want, the better your inputs have to be, which circles us right back around: you have to do the work on the content side, because if you don’t, the AI is going to go off the rails in interesting, unexpected, and potentially disastrous ways.

    BS: It will play with the mess you leave it.

    SO: Yep. So that’s where we’re going with this. That’s the AI content ops maturity model as it stands today. I reserve the right to change it tomorrow.

    BS: Today. Of course. So this model came out of, I guess, some nifty little side project you’ve been working on recently.

    SO: I am working on a nifty little side project. We’re not quite ready to announce it, but I’ve got a co-author and we’re working on a thing.

    BS: Fair.

    SO: I could say more but then I’d, you know, be in trouble.

    BS: When might you be able to say more?

    SO: I believe that we have a webinar coming July 22nd, where we will say some more things.

    BS: Alrighty. Well we will learn more things then.

    SO: I, too, will learn more things, and we’ll probably have to change everything we’ve done up until this point because everything will change by then.

    BS: Of course. I guess that’s a good place to leave this podcast. Thank you, Sarah.

    SO: Thank you.


    The post From ad hoc to autonomous: The AI content ops maturity model appeared first on Scriptorium.
  • Content Operations

    Tool selection and the unpredictable variable

    01/06/2026 | 42 mins.
    How do you really choose the right documentation tool? In this podcast episode, Sarah O’Keefe (Scriptorium) talks with Paweł Kowaluk and Michał Skowron (Guidewire Software) about building a successful tool selection process, the realities of docs as code, and what happens when the technology becomes the unpredictable variable.

    Paweł Kowaluk: It’s funny how programming used to be deterministic, and it was the people who were messy. We always knew that people are going to be whimsical and maybe harder to rein in, but the technology is going to be predictable. Whereas now, technology is not predictable anymore, and you give it a prompt and you hope it’s going to do what you want. You adjust the system prompts and change the weight of things which are retrieved versus metadata, et cetera, and it doesn’t always work the way you expect it to.

    Sarah O’Keefe: And now the people are being asked to be the deterministic layer, right? To be the QA on top of the AI.

    Paweł Kowaluk: That’s actually very insightful. I like that. That is true. The human in the loop or whatever you call it, that’s supposed to be the voice of reason.

    Related links:

    Scriptorium: AI in the content lifecycle

    Tech Writer Koduje podcast

    Tech Writer Koduje: DITA as code – a modern approach to the classic standard

    Tech Writer Koduje: Are people abandoning docs as code?

    Tech Writer Koduje: A tech writing CCMS can also be a broken promise

    LinkedIn:

    Host: Sarah O’Keefe

    Guest: Paweł Kowaluk

    Guest: Michał Skowron

    Tech Writer Koduje LinkedIn profile

    Transcript:

    Introduction with ambient background music

    Christine Cuellar: From Scriptorium, this is Content Operations, a show that delivers industry-leading insights for global organizations.

    Bill Swallow: In the end, you have a unified experience so that people aren’t relearning how to engage with your content in every context you produce it.

    Sarah O’Keefe: Change is perceived as being risky; you have to convince me that making the change is less risky than not making the change.

    Alan Pringle: And at some point, you are going to have tools, technology, and processes that no longer support your needs, so if you think about that ahead of time, you’re going to be much better off.

    End of introduction

    Sarah O’Keefe: Hey, everyone. I’m Sarah O’Keefe, and welcome to the podcast. In this episode, we are going to talk about tool selection with a couple of special guests. With me today are Paweł Kowaluk, who is a software architect at Guidewire Software, and Michał Skowron, who is a documentation tools developer, also at Guidewire. Both of them are based in Poland. Welcome.

    Paweł Kowaluk: Hi.

    Michał Skowron: Hello.

    SO: I am glad to have you. For those of you listening who speak Polish, you’re probably already aware that they have the one and only techcomm podcast in Polish, and Michał and Paweł are also experts on documentation processes and tool selection, so that’s what we wanted to focus on today. So I will start by throwing it to Michał with the big-picture question: what does a good tool selection process actually look like?

    MS: For me, a good tool selection process is divided into three stages. The first one is gathering requirements, looking at what’s out there, and defining what you basically want to achieve with this new tool. Then I would go to a pilot project where you can actually test the selected tool in the real world. Manufacturers and producers of software will tell you that it can do anything, and they will promise, “Okay, you can meet all your requirements easily, and we can fix that, we can improve that, we can adjust that.” “Everything can be done” is usually what we hear, but you want to test it in the real world on a real project, so that will be a pilot project for you and your team.

    And the third phase depends on the outcome of the second phase: you either productize the selected solution or you just say, “Okay, that was a bad choice and we don’t need that.” Then we need to go back to the first stage and say, “Okay, we need to select another tool,” and again, requirements, et cetera, et cetera. So for me, that’s the whole process, and the first stage would probably be the longest one because you need to make sure that you are meeting all your goals.

    SO: So what’s the most common reason that a pilot doesn’t succeed, that you have to go back and say, “That didn’t work. We have to try something different”?

    MS: It’s usually because you didn’t see everything when you were planning. For example, you have some very specific projects, or you didn’t foresee all the problems coming your way. It’s hard to say exactly what the reason is, and there can be multiple reasons.

    For example, let’s say, branching in a specific tool. When you have multiple versions of your product and you want to keep their documentation separate, the feature description can say, “Okay, you can use branching, and then you can do it easily,” and then you start using it and it turns out that it doesn’t work the way you expect. This is actually a real-life example, because we had a system that… I’m not going to mention any names or anything like that, but there was a system, and they promised us… That was years ago, and it was a vendor that promised us that they were going to introduce a feature called branching, and after they did, it turned out that it wasn’t what we expected. So problems can turn up in many ways. Branching is just one example; many other things can go wrong.

    PK: Hey, if I can jump in here, I’ve got a couple of examples. One is what I’d call release strategy or, more broadly, versioning strategy, which is very hard to test in a pilot project. It’s very hard to scope for requirements because the little problems come out after a while, after a year of publishing, after two years of publishing. And another example, which is related, is reuse, and this one comes down to formulating the requirements correctly. Because I think just saying, “We want to reuse something,” is not enough. You have to say exactly what you want to reuse, how you want to reuse it, and what you want the result to be.

    So for example, you might say, “I want to reuse notes and warnings and things like that.” We sometimes call them admonitions. So, “I want to reuse these notes in my docs, and if I update a note, I want it to update in every published version of the doc.” Only if you have these details, like I want to update it once and then have it automatically update the published docs, will you see that it’s not working the way you expect. Because if you just say, “I want to reuse notes,” every system can reuse notes. Even in docs as code, there are scripts and macros that allow you to reuse notes.
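
    The difference between “reuse notes” and “one edit updates every published version” can be shown with a minimal Python sketch of single-sourced notes resolved at publish time. The `{{note:ID}}` placeholder syntax, the note IDs, and the `publish` function are hypothetical and do not reflect any particular docs-as-code tool or DITA mechanism.

```python
import re

# Single source for shared notes (admonitions); the IDs are hypothetical.
NOTES = {
    "backup-warning": "Warning: Back up your data before upgrading.",
}

# Hypothetical placeholder syntax, e.g. {{note:backup-warning}}.
NOTE_REF = re.compile(r"\{\{note:([\w-]+)\}\}")


def publish(topic_source: str, notes: dict[str, str]) -> str:
    """Replace each note reference with the current note text at build time."""
    def resolve(match: re.Match) -> str:
        note_id = match.group(1)
        if note_id not in notes:
            raise KeyError(f"unknown note: {note_id}")
        return notes[note_id]
    return NOTE_REF.sub(resolve, topic_source)


guide_v1 = "Upgrade guide for 1.0\n{{note:backup-warning}}"
print(publish(guide_v1, NOTES))
```

    Because references are resolved when a version is built, an updated note only reaches a published version if that version is rebuilt and republished. That is exactly the kind of detail the more specific requirement would surface during a pilot.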

    MS: It can also be that, for example, you compare the benefits with the actual cost of implementation, and it turns out it’s not worth it because people are reluctant to use your new tool, or the cost of training, licensing, and support is too big. You want to achieve a goal like, I don’t know, reducing the time to market, and then it turns out it doesn’t work because people are struggling to use the product on a real project. On paper, everything looks cool and you have all these features you can use, like Paweł mentioned, for example, reuse, conditional formatting, and things like that. And then it turns out it’s very hard to use, it doesn’t serve its purpose, you have to pay for every additional feature, and people don’t want to use it, so what’s the point?

    SO: Yeah. And I think we find that if you’ve done your requirements work, then usually the technical problems are solvable if the people engaged in the project want to solve them, and that’s where you run into the change management issues that you’re talking about. If the team that is being asked to pilot, to test, to try things out is sufficiently uninterested in making it succeed, they will find a way to make it fail. And the reverse is true as well. If they want it to succeed, you can implement tools that are … Well, all tools are imperfect, but you can implement tools that are not perfect solutions and succeed if the team is behind you, and if they’re not, bad, bad things will happen.

    PK: Oh yeah, that’s true. And I’ve been on projects where we did not do proper change management and I’ve been on projects where we really did it well. If you start early and you involve people, like get the biggest troublemakers, people who are the most opposed to any change, get them on the team, and if you can convince them, that means, one, you are making the right choice because you’re convincing people who are skeptical, and then two, you are set up for success. These people are going to be your biggest champions of the new solution.

    MS: But it’s good that you mentioned it, because I think it’s worth emphasizing that the goal of the pilot project is not to succeed. That’s not the actual goal. The goal is to verify the requirements against a real project, and so failure is also a success to some extent. It’s not like you have to do everything to prove that the selected tool is the right one. No, you should be aware that the pilot can end in, let’s call it, failure, and then you realize that it’s either a bad choice or you don’t need it at all. For example, you don’t need that tool at all. That can also be the outcome of the pilot project. So there are many different outcomes. Don’t be fixated on the success path, meaning, “Okay, the pilot project was a success because everybody agreed that that was the right tool we want to use,” as the only right outcome. Keep it in mind.

    SO: Yeah. I actually got into a debate with somebody about this. They were telling me that a pilot project, a proof of concept, and, what was the third one, a prototype are not the same thing. Okay. Well, so a pilot project is, “We think this is the right answer and we’re going to try it small and then we’ll expand.” A prototype is, “We’re just going to try it and throw it away,” and a proof of concept is, “We think this is the right answer, but we’re not sure and we’re prepared to throw it away.” And I thought, “This is way too specific for me, but okay, sure.” But to your point, the project, whatever category it belongs in, has different purposes, and it’s important to be clear about what kind of project this is. Is this essentially beta testing, this is step one, or is this more speculative? We’re just really, really not sure.

    PK: I think it has to be falsifiable. Like they say in the scientific method, there has to be a failure criterion somewhere. This will fail if ABC happens, because if you don’t have that-

    MS: Or a success criterion, right?

    PK: Or a success criterion, but I’d be more focused on the failure criteria, because you can always show, “Yeah, we accomplished this, 30%,” but if your failure criterion is “below 50% is a fail,” then you will say, “I failed this.” You fail five out of six and the project is a no-go.
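
    Paweł’s point about falsifiability can be sketched as a small Python check where failure thresholds are fixed before the pilot starts. The criterion names, the 50% threshold, and the decision rule (`max_failures`, defaulting to zero tolerated failures) are assumptions for illustration; the conversation gives only the “five out of six” example, not a general rule.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    fail_below: float  # a result below this fraction (0.0 to 1.0) counts as a failure


def evaluate_pilot(criteria: list[Criterion], results: dict[str, float],
                   max_failures: int = 0) -> tuple[str, list[str]]:
    """Return ("go" or "no-go", names of failed criteria); max_failures is an assumed decision rule."""
    failed = [c.name for c in criteria if results[c.name] < c.fail_below]
    decision = "go" if len(failed) <= max_failures else "no-go"
    return decision, failed


# Hypothetical pilot: six criteria, each failing below 50%.
criteria = [Criterion(f"criterion {i}", 0.5) for i in range(1, 7)]
results = {f"criterion {i}": 0.3 for i in range(1, 6)}
results["criterion 6"] = 0.8
print(evaluate_pilot(criteria, results))
```

    Writing the thresholds down first is what keeps a “we accomplished 30%” result from being reported as a partial success after the fact.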

    SO: So I wanted to switch gears a little bit and talk about some specific tool chains and problem sets. I think it’s fair to say that the two of you are, I’m going to say mostly but not exclusively, focused on docs as code, so what does that look like? And there’s a lot of debate about docs as code versus structured content and they’re very much pitted against each other, but ultimately, what’s your perspective on the situations where docs as code is appropriate or maybe is not appropriate, and what do you look for in a project or in a problem set that matches the docs as code model?

    PK: I think the reason people associate us with docs as code is that for the last eight years, we’ve been working at Guidewire and that’s the strategy we’ve chosen there, so I think we’ve been entrenched in this worldview. Before Guidewire, I worked as a consultant, and I would go from company to company and set up different systems for documentation. That was my specialty, systems for publishing and updating documentation. So yeah, circling back to your question of when it works and when it doesn’t, I guess docs as code works when the right mix of people is creating the docs, and that mix includes software developers. So for example, at Guidewire, we have several dozen static websites that are maintained by software teams without any technical writers involved, and those are a variety of internal tools, tools that are almost external, and tools that go out to customers, and all of these little websites integrate very well with our publication system.

    And I think this is the main criterion for docs as code being a fit: you are giving people tools for writing where they work. So software developers work in code. You give them tools for writing in their code so they don’t have to buy extra licenses, get training on anything external, or use some process that’s alien to them, because they just follow the SDLC, the software development lifecycle, to update their docs. It’s what they’ve been doing already. And then it’s easier to fit in a smaller team of technical writers, I don’t know, like 40 technical writers versus 2,000 software developers, with everyone contributing to the same documentation system. It’s easier for the tech writers to adapt and join the docs as code platform than the other way around.

    MS: Just like Paweł mentioned, docs as code makes sense in certain environments. It’s not like we are tribal about it. We love this solution because it works for us. After a few years at Guidewire, we realized that this is the way we want to go, because, as I said, it makes sense and it just works. We tried different things before; before I joined Guidewire, there were different solutions. For example, we used a CCMS and had a more traditional approach to producing docs, let’s call it that, and then we started building our own pipelines, our own solutions. We started integrating with what the other devs already used for their work, not related to documentation, and we decided, “Hey, maybe we can use what’s out there and just plug into the same infrastructure.” So as I said, it’s not for everybody, and it doesn’t work in every environment. I wouldn’t say, “Okay, if you’re in a factory, you should use docs as code.” No, I wouldn’t say that. Maybe we’ve positioned ourselves as docs as code proponents, let’s call it that, but that’s because we use it every day, we build it, and it just works for us. And I can also mention that we decided to end this holy war by putting DITA into Git and CI/CD pipelines, so we have DITA as code, and now everybody’s happy.

    PK: Oh, this is actually a great point, because Sarah also mentioned docs as code versus structured content. Now we’re doing docs as code in a structured way. Where technical writers are working on the docs, they are free to use DITA, and a lot of teams do, with all the reuse and all of that. But even if it’s something simple, like some Markdown files in a repo, we still impose a metadata structure that allows us to integrate the docs into our publishing pipeline. The two key things we integrate with are authentication and search, and those need metadata, right? The system needs to know who has access to this content, and it needs to know how to filter and direct searches. From that, our next project emerged. Like everyone else in this industry, we started working on AI solutions, and AI is a third integration point where this structured approach to content also pays off.
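A minimal sketch of the kind of metadata gate Paweł describes, assuming a simple `key: value` front-matter block at the top of each Markdown file. The required keys (`title`, `product`, `audience`) are hypothetical placeholders, not Guidewire’s actual schema; a real pipeline would more likely use a YAML parser and a schema validator.

```python
# Hypothetical metadata schema: the keys a publishing pipeline might need
# for access control (audience) and search filtering (product).
REQUIRED_KEYS = {"title", "product", "audience"}


def parse_front_matter(text: str) -> dict[str, str]:
    """Return key/value pairs from a leading ---...--- block (empty dict if none)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # unterminated block: treat the file as having no metadata


def missing_keys(text: str) -> set[str]:
    """Required keys that this file does not declare."""
    return REQUIRED_KEYS - set(parse_front_matter(text))
```

A build step could fail any file where `missing_keys` returns a non-empty set, which keeps even lightweight Markdown contributions searchable and access-controlled.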

    SO: Yeah, I did want to touch on AI and so we should probably just jump to that. What are the implications that you’re seeing? What are the effects that you’re seeing on your current processes of the use of AI or the requirement to deliver to AI, and how do you see that working?

    MS: Paweł, you want to start?

    PK: Yeah. The content we’ve been working on for years remains the same, and the structure and the metadata are all the same. We could not change those overnight, but overnight, we had to introduce an AI that looks into the content and finds the topics, or chunks as we call them. It finds the right chunks of documentation and generates answers based on those chunks. So we had to adjust the AI, but since we were so structured, that was pretty easy, and then, once people started using our chatbot, we gained a new source of feedback from customers through the AI. We built the chatbot on docs.guidewire.com, and it’s only available to customers and partners, so you need a login on the website.
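To make the chunk-retrieval step concrete, here is a simplified sketch, assuming each chunk carries a hypothetical `audiences` access tag. It filters by access first and then ranks by words shared with the question; production chatbots typically use embedding similarity instead, so treat the term-overlap scoring as a stand-in.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    audiences: frozenset[str]  # hypothetical access metadata


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[Chunk], user_audience: str, k: int = 3) -> list[Chunk]:
    """Up to k chunks the user may see, ranked by shared terms with the question."""
    q = tokens(question)
    visible = [c for c in chunks if user_audience in c.audiences]  # access check first
    scored = [(len(q & tokens(c.text)), c) for c in visible]
    scored = [(s, c) for s, c in scored if s > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

Filtering before ranking means a chunk the user isn’t entitled to can never reach the answer-generation step, however relevant it looks.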

    When you go there, the AI answers questions, and we monitor the exchanges people have with it to see what needs to improve in terms of content ingestion and content structure. We generate a lot of internal tickets for our tech writing team identifying gaps, because now that this AI interaction is possible, we can see what people are asking of the content. For example, people ask, “How do I implement X, Y, and Z in the scenario where ABCD?” We didn’t know people were looking for that before, because they had no interaction with the docs. They just came to the website and either found or didn’t find what they needed, and we didn’t know anything about it. We could ask them, but we’d only be asking a small group of people, whereas now, it’s as if we’re asking everyone.

    So we’re seeing these new patterns, and something emerges out of those patterns. For example, everyone’s asking for code samples in a specific area, so we go back to the team and work on adding more code samples, or people are looking for illustrations of particular workflows, so people create those illustrations. It’s amazing how rich a source of feedback this has become.
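One way to turn chatbot logs into writer tickets, sketched under the assumption that each logged exchange already has a topic label and a flag for whether supporting content was found (both hypothetical fields). Counting unanswered questions per topic surfaces patterns like the code-sample requests mentioned above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Exchange:
    question: str
    topic: str      # hypothetical: a topic label assigned to the question
    answered: bool  # hypothetical: did retrieval find supporting content?


def gap_report(log: list[Exchange], min_count: int = 2) -> list[tuple[str, int]]:
    """Topics with repeated unanswered questions, most frequent first."""
    misses = Counter(e.topic for e in log if not e.answered)
    return [(topic, n) for topic, n in misses.most_common() if n >= min_count]
```

The `min_count` threshold is a design choice: it keeps one-off questions out of the ticket queue so writers see recurring gaps first.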

    MS: And it also adds complexity to testing all the solutions that we have right now. With keyword search, I’m not saying it was easy, because it wasn’t, but now we have another layer: this middleman, let’s call it the chatbot. It’s a middleman that can hallucinate, so you can have a problem with your content or a problem with the agent that responds. Before, when you had keyword search and you were missing results, you usually went to the content and saw, “Okay, I’m missing this topic. It wasn’t indexed properly, or I just need to turn some knobs in my search engine and adjust the fuzziness or something like that.” Now you have this content, it’s processed by the AI tool, it gives you an answer, and you don’t know why the answer is wrong. Because it didn’t find the content? Because the content isn’t there? Because it’s not the right content? Because the content is right, but the AI missed something? There are so many more moving parts than before that it adds complexity to our job, and this is just one side of it.
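The diagnosis Michal describes can be made more systematic once a writer has identified which chunk should have answered a question. The sketch below is a hypothetical triage function, not a description of Guidewire’s tooling: it separates “the content isn’t there” from “retrieval missed it” from “the model had the right chunk and still got it wrong.”

```python
from enum import Enum


class Failure(Enum):
    CONTENT_MISSING = "the topic isn't in the corpus"
    NOT_RETRIEVED = "the topic exists but retrieval didn't surface it"
    GENERATION = "the right chunk was retrieved but the answer is still wrong"


def triage(expected_id: str, corpus_ids: set[str], retrieved_ids: list[str]) -> Failure:
    """Classify a wrong answer, given the chunk that should have answered it."""
    if expected_id not in corpus_ids:
        return Failure.CONTENT_MISSING   # a writing task
    if expected_id not in retrieved_ids:
        return Failure.NOT_RETRIEVED     # a search/metadata task
    return Failure.GENERATION            # a prompt or model task
```

Each branch points at a different owner, which is the practical payoff: content gaps go to writers, retrieval misses to whoever tunes indexing and metadata.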

    The second thing is writers using AI to create content, and we’re also exploring this path, because everybody’s trying to incorporate AI solutions into their work. We mostly code on a daily basis, and we use some AI-based tools that help us with coding, but we also want our writers to benefit from all this development in technology, so we’re exploring, let’s say, Oxygen XML and Positron. Just to clarify, we’re not sponsored by anyone. We’re just using it, so I’m mentioning the names specifically. But even if you don’t have a writing tool that integrates directly with an AI tool, you can still use, let’s say, Copilot or any other solution that you like and ask it to help you with the content, but it comes with a lot of caveats.

    It’s not like you just throw a prompt and just get what you need. It involves a lot of fine-tuning, a lot of working on instructions, giving it context, giving it information that it needs to use for producing code or docs. Because I use it for both, for coding and for documenting, for internal purposes mostly, but it takes time to make it work the way you want it.

    PK: Yeah. It’s funny how programming used to be deterministic, and it was the people and the processes that were messy. When we set up the processes for the doc tools, we always knew people were going to be whimsical and maybe harder to rein in, but the technology was going to be predictable. Now, technology is not predictable anymore: you give it a prompt and you hope it’s going to do what you want. You adjust the system prompts and change the weight of retrieved content versus metadata, et cetera, and it doesn’t always work the way you expect it to.

    SO: And now the people are being asked to be the deterministic layer, right? To be the QA on top of the AI.

    PK: That’s actually very insightful. I like that. That is true. The human in the loop or whatever you call it, that’s supposed to be the voice of reason.

    SO: I don’t know about you, I’m not good at voice of reason. I’m much better at causing trouble. So what you’re describing is a fairly complex and time-consuming approach to this in that you’re saying we have this AI chatbot and we’re looking at the metrics and we’re adjusting accordingly and making changes, and then using the AI on the backend for some productivity kinds of things, but not as a replacement. We’ve seen in the US and in North America, we’ve seen a significant number of people losing their jobs because the idea is that, oh, the AI can just do it, and what you’re describing is not that at all. So are you seeing any of this sort of, “Oh, this will make us more efficient, and therefore, we need fewer people,” or is it a different perspective in your piece of the tech com world?

    MS: I’m aware of all the layoffs and all the bad things happening right now in the IT world, let’s call it, because it’s not only technical writers; it’s also developers and other jobs. But I think I’m still in a kind of bubble where it’s not happening directly next to me, so maybe I’m being too optimistic. Maybe I’m going to eat my hat someday, but I keep saying, “Show me one tech writer whose backlog isn’t too long.” We usually have too much work, and I hope that with the right approach and sensible management, these AI tools will do the things we don’t want to do or don’t have time to do, and that will give us time for something more significant.

    I’m looking at my job, and I’m not saying it’s perfect because of AI, because AI has so many challenges right now and creates different problems, but I can see that I can speed up my cumbersome tasks very, very easily. Here’s a simple example. I do a lot of infrastructure and backend work, so many things go wrong, and very often I had to debug difficult problems. It sometimes took me weeks to pin down the actual place where something happened. Now, I can do it much faster. If I have to debug a problem, it takes me, I don’t know, an hour, sometimes even minutes, and I know I would spend a day, two, or even a week if I didn’t have these tools. Those are the good examples, but there are also a lot of bad examples, and I don’t think we have time for that.

    SO: We’ll take the optimistic view.

    PK: Yeah, I agree with Michal. Our approach is that these tools can give us tremendous productivity boosts, but not in the sense of getting rid of people. We can redirect people’s work to where it’s more meaningful. Take what I mentioned about feedback and identifying content gaps. Some of these content gaps are very mechanical, and you can solve them by generating a bunch of docs out of code, for example, and then putting those docs in the repository where the chatbot can find them, because they don’t require a lot of thinking. It’s kind of like API docs, where you create the Swagger spec and then generate the docs from that. You just grab the source code, generate some samples, and use that to generate answers for people. Without people, this wouldn’t work, because you need, like we said, the voice of reason, but yeah, people are still required in the process.
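For the mechanical gaps Paweł mentions, reference text can be generated straight from source. As a rough Python analogue of generating API docs from a Swagger/OpenAPI spec, this sketch renders public functions, their signatures, and their docstrings as Markdown using the standard `inspect` module; the output format is an assumption, and for a module it will also pick up functions that module imported.

```python
import inspect


def reference_markdown(obj) -> str:
    """Render a Markdown reference section for each public function of a module or class."""
    lines = [f"# {obj.__name__}", ""]
    for name, fn in inspect.getmembers(obj, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        lines.append(f"## `{name}{inspect.signature(fn)}`")
        lines.append("")
        lines.append(inspect.getdoc(fn) or "_No docstring._")
        lines.append("")
    return "\n".join(lines)
```

The generated pages can then be committed where the chatbot indexes content, while writers keep their time for the gaps that need research.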

    SO: I think that looking at it across the industry, if you take AI, take a chatbot, especially a public facing chatbot, and ask it for answers on literally anything, it will give you the average of what’s out there in the world because it’s math, so it gives you the average essentially. And so from a content creation point of view, I think the fact of the matter is that there is in fact a lot of below average content out there. Roughly half is below average, or perhaps exactly half.

    If the information is bad enough, if the content being produced is not of high quality or ungrammatical, and we’ve all seen terrible, terrible documentation that was badly translated and is just incomprehensible, then the AI as a tool for creating content may be able to produce something that is better than terrible. It’s not going to replace a professional, well-trained, highly experienced and knowledgeable product documentation group, or it shouldn’t, but that’s not every company. Not every company has a really good group of tech writers that understand the product and are adding value as they produce the content that goes with that product. And so I suspect that what we’re going to see is a split with commoditization on one end, just auto generated, it’s not great but it’s adequate maybe, and then the higher end stuff, which needs to be done well.

    PK: Well, there’s definitely documentation which only exists because it absolutely has to, and companies like that can easily generate just some kind of documentation and just use it, right? But in cases where … I think the worth, the value of a technical writer is not just writing and generating the content from below their fingertips. It’s more about the research and understanding, like you said, understanding the needs of the users and understanding how to meet them. You throw AI at a problem which sounds like a generic problem, it’s going to give you a generic answer, not necessarily one that is rooted in the organization you’re in. The list of products that you have, the way those products interact with one another, it’s going to miss all that.

    It’s just going to give you … It’s like ordering a hamburger and you say, “So what’s in the burger?” And the AI is going to tell you, “Well, usually in a burger, it’s a patty and lettuce.” And it doesn’t mention that at your restaurant, you use pear and avocado. It doesn’t know that. You know that. You’re the chef back in the kitchen and you know why your burger is special.

    SO: Yeah, I think that’s right. So I did want to touch briefly on the question of, because this is a rare opportunity to talk to some folks that are not based in the US or North America, what differences, if any, do you see in tech writing in the market? Now, you’re based in Poland but working for a, I think, US company, but what kinds of things do you see that are maybe different that especially the people listening to this in the US would not be aware of coming out of Eastern Europe?

    MS: For me, it was always the fact that in Poland, tech comm appeared much later than in the United States. I’m talking about Poland specifically, not Europe in general, because there are also differences between countries in Europe. For example, Germany is totally different from the Polish market, I would say, because in Germany, you usually have factories, and technical writing is associated with hardware, let’s say.

    SO: Yeah, heavy industry, machinery.

    MS: Yeah, heavy industry, that’s right. And in Poland, it’s very often about software, maybe because we have a lot of companies that outsource to Poland when it comes to software development, R&D and stuff like that, so that’s the first thing. And don’t get me wrong, people were doing tech com way before it appeared in the mainstream, but sometimes they were not even aware that this is called technical writing or technical communication. Actually, it happened to me because I moved to techcomm from some kind of IT support job, and then after a year, I was like, “I’m wondering if this is anything regulated or there are some rules or it’s actually a profession.” Then I started digging and here I am.

    But it turned out there are so many things, so I was also surprised: okay, this is called that. These are standards. There are books. Actually, one of the first books that I read was your book, Technical Writing 101, which I still recommend. It’s been years since it was first published, but I think it still holds a lot of truth about techcomm. The core values of techcomm are still there. So going back to your question, the techcomm scene in Poland is relatively young, let’s call it that, which gives us a big opportunity to skip some stages. We don’t have the legacy; we don’t have things that we’ve always done a certain way and like, or are just accustomed to. We can just start fresh and jump in. In the 2000s, we just jumped into techcomm and said, “Okay, let’s do it this way, because now this is the way we do it.” And I think people are flexible when it comes to looking at things in a certain way. Not everybody, because there are also people who don’t like change.

    But I think also what is unique about Polish techcomm scene is it’s relatively small, so if you do something outside your work, you don’t treat your job as only a means to earn money and just survive and you want to do something else, let’s say just write an article, like give a presentation, go to a meetup. After a year or two, you just keep meeting the same people, you know basically everyone who is in the circle, so it’s much easier to be visible if you do something outside your work. So I think that would be something unique about our techcomm scene.

    PK: I think what might be a downside of the Polish techcomm scene is a lack of veteran experts, because we don’t have people who have been doing this for 40 years. I’ve been in the industry for 18 years.

    SO: But you do understand that you two are the veteran experts.

    MS: That’s what he’s trying to say, that-

    PK: I’m getting to this. So yeah-

    MS: Imagine that.

    PK: Putting on my clown makeup every morning. What I mean is there’s nobody to look to who wrote the book on technical writing and has been there for 40 years. I’ve been here for 18 years doing this thing, and out of the people who are visible publicly and who contribute to the scene, I don’t know if there’s anyone who has more experience than me. Because I know there are people who have more experience but they’re just not visible. They don’t share their knowledge with anyone, and we miss that, so that’s why we look to the West, to people like you guys, like Scriptorium, and we read books which were created there and we see there’s definitely value, even though something is as old as the scene here or older.

    MS: And people also have this tendency to look at things done in the past as, “This is the old stuff. We don’t need that. This is the old way. Let’s do it this way because it’s better now, because we have all these tools, et cetera.” But I think that’s a big mistake, because what do they say? History doesn’t repeat, but it rhymes, something like that. So in order to do your job in the present, you need to look at the past, and that way, you can also be ready for the future.

    I think it’s worth looking at all this, let’s call it legacy stuff, all the experience that people with 30, 40 years of experience in the field have because it’s valuable. Because usually, when you look at it much deeper, it’s nothing new. It’s usually something that already happened but it’s dressed up as something else. So if you look closer, it’s usually something that let’s say you can understand what happened and then you can apply the same rules, maybe slightly change them or bend them. But my experience is the same happens in software development, in coding. People are inventing new things, and when you look closer, it turns out somebody already said that in the past. It was already done this way, but it’s dressed up as something else or named differently or is hyped more, or it was invented too early and now is the time to use it. So the history gives you a lot of perspective, a lot of things that you can use, so don’t dismiss it.

    SO: Well, I think that’s a good point, and as we close this out, I want to ask you what you see in the past or where you’re gaining perspective on the introduction of AI. Whether it’s from a delivery point of view or a backend productivity point of view, where do you see that pattern previously, or do you?

    PK: There are similarities, but they only go so far. They’re not perfect analogies. Going digital was one example. When I started, companies were just on the brink of going from print to web, and that was a huge paradigm shift. We’re seeing reverberations of this even today, where teams are still thinking in books and chapters instead of thinking in pages, or about Every Page Is Page One, which is, I don’t know, an idea older than me, but teams still haven’t adopted it. Now, AI is probably a similar earthquake: it’s AI reading your docs and giving answers, you’re using AI to generate those docs, et cetera, et cetera. I don’t think we’re going to get 20 years of leeway to adapt to that and still see people doing things the old way. I think the world maybe moves faster nowadays, but I don’t know, we’ll see.

    MS: It may be something similar to the invention of the computer. Instead of writing, let’s say, manually, people got this new device that gives you more power, so you can do more things, and then came programming languages and everything. But as Paweł mentioned, the pace was much, much slower. We had years of development to slowly adjust to the change, maybe that’s the wrong word, and now it’s an earthquake. Every day, every week, there is something new and you need to keep up, keep up, so the pace of development is, I think, the biggest challenge. Because we tech writers have survived many revolutions. I just started reading a book by Sharon Barton about women in technical communication, where women talk about their careers, and many of them started very early; there are a lot of examples from the States.

    So they started working when they were not even treated equally with men, which was years ago, and they mentioned all those changes, all those revolutions, all those evolutions. And I’m saying, “Okay, this is what we are witnessing right now, but the pace is definitely faster, so we need to just keep up,” which is hard, of course.

    PK: Well, I have another one which is a good one. I don’t want to not say it. The dot-com bubble. I think maybe we’re in a bubble again, maybe. You’re seeing the inflation of AI prices right now, and I think we’re going to end up in a position where you cannot do all these things with AI which we’re hoping to do with AI, so the surface of usage is going to shrink and there’s only going to be specialized uses and special case scenarios when you use AI because it’s going to be expensive, and a whole lot of AI applications are just going to disappear. This might be a parallel, but I don’t know, it’s always hard to predict, especially the future.

    MS: I’m also predicting some corrections. Let’s call them corrections.

    SO: And I, three, am also predicting some corrections. You touched earlier on fuzziness. AI is very good at fuzzy problem solving, but it’s being thrown at old problems that we know how to solve with scripting. And scripting, from a computing power point of view, is cheap or maybe free. You write your script and you run it, and every time you run it, you get the same predictable result. Now, sometimes you don’t want that. Sometimes you need that fuzzy pattern matching, but in the cases where that’s not what you want and we’re still using AI, that is part of the bubble, right? Everything looks like an AI-shaped problem. So yeah, I agree with you. I think that any predictions we make on specifics as to where this is going are guaranteed to be wrong, because I’ve tried this before and I’m always wrong. So I want to thank you both. This has been very entertaining and very interesting, and I hope that we will see you again and you’ll come back and tell us more about what you’re up to over there.

    PK: That would be lovely. I would like that very much. Thank you.

    MS: Yeah, sure. No problem. Thank you very much. That was fun.

    SO: Yeah. Thank you both. We will see you soon.

    Want to learn more? Download our book, Content Transformation.

    The post Tool selection and the unpredictable variable appeared first on Scriptorium.
  • Content Operations

    Taming AI: Using AI for content conversion at scale

    18/05/2026 | 24 mins.
    AI promises to transform content conversion, but what does it actually look like when you’re processing thousands of documents a day? In this episode, Sarah O’Keefe (Scriptorium) and Rich Dominelli (DCL) dig into the real-world challenges of using AI for large-scale structured content conversion.

    Rich Dominelli: If you have millions of articles and you’re asking the AI, “What did we do for this project six months ago?” The AI has to find those articles, pull the relevant information out of those articles, summarize it, and hand it back to you. The best way of doing that is to give extra signals to the AI, structured relevant bits of information, front matter, back matter, publication date, keywords, abstract, that allows the AI to query the corpus and get the relevant chunks out of that corpus in a very quick manner. Then, it can summarize what those chunks are. So the AI almost becomes the user interface over that corpus. But to find that data in the first place, structured content is key. Structured content is key when you’re dealing with big indexes and the web, and it’s the same with AI.

    Related links:

    Defeating Nondeterminism in LLM Inference (white paper)

    Data Conversion Laboratory (DCL)

    Scriptorium, Machine experience (MX): Making content work for humans and machines (podcast)

    LinkedIn:

    Host: Sarah O’Keefe

    Guest: Rich Dominelli

    Transcript:

    Disclaimer: This is a machine-generated transcript with edits.

    Introduction with ambient background music

    Christine Cuellar: From Scriptorium, this is Content Operations, a show that delivers industry-leading insights for global organizations.

    Bill Swallow: In the end, you have a unified experience so that people aren’t relearning how to engage with your content in every context you produce it.

    Sarah O’Keefe: Change is perceived as being risky; you have to convince me that making the change is less risky than not making the change.

    Alan Pringle: And at some point, you are going to have tools, technology, and processes that no longer support your needs, so if you think about that ahead of time, you’re going to be much better off.

    End of introduction

    Sarah O’Keefe: Hey everyone, I’m Sarah O’Keefe and I’m here today with Rich Dominelli who is a Senior Developer and Architect at DCL. Rich, welcome.

    Rich Dominelli: Hi, thank you for having me.

    SO: Glad to have you. We were talking before we hit the record button, and you described yourself as a perhaps hopeful AI evangelist.

    RD: Yeah, I am well and thoroughly immersed in the AI game at DCL, and I also play with AI assistants at home. I’m enthusiastic about the future of AI, sometimes disappointed about the present.

    SO: So DCL, as I think many of our listeners know, is focused on conversion at scale, which to me makes a great use case for AI because ultimately conversion is about edge cases and about inconsistency, right? If everything was 100% consistent, conversion would be pretty easy.

    RD: Yeah, no, DCL does a lot of structured content generation out of unstructured data, and the creativity, especially in the academic space, of what that unstructured data looks like is sometimes nightmarish. So the AI does a lot of the heavy lifting for us when it comes to looking for particular items, identifying concrete data points within the documents, and pulling things like authors and affiliations, front matter type information, and back matter type information out of the documents in an automated fashion. It can be painful from time to time, but it’s definitely helped.

    SO: Yeah, so this is, I think, you know, the reality of working with AI, and working with it in a production environment in order to address all these weird edge cases and what’s going on. So tell us a little bit about how you’re using AI in, you know, these conversion use cases. What does it look like to go in there and start applying some of these tools that we have?

    RD: So, I mean, typically our flows work in a way where we’re coming in with a PDF or a Word document or some other unstructured format. We take it, we reformat it into a version that’s more AI-friendly, like Markdown, for example. And that’s usually the first step we’re doing when we’re looking for information to pull out of it like front matter. It’s a very common use case.

    If you look at academic papers, the front matter, the authors and the affiliations that are on that paper can be formatted in more ways than I could list out during the course of this podcast. It’s kind of crazy. So what we’ve started doing, and we’ve been doing this for a couple of years now, is we’re using the AI, we’re handing it the Markdown document, and we’re saying we need to list authors and affiliations, please extract it for us. 

    Now, naively, when we started that process, we assumed that the AI would give us a consistent list of authors and affiliations. And sometimes it does. But every time you do that call, you’ll get it in a different format. So then you have to start tightening things down. So OK, give me a list of authors and affiliations. I want it to be structured exactly like this. And typically, we have a JSON structure that we’re presenting to the AI, along with our prompt, and saying, give it to us. Well, okay, and that gets you a good chunk of the way there. And that was very exciting when we had that working consistently, we were getting things out of the system on a consistent basis. Awesome. But then you start looking at the results, and every once in a while, you get an author that was missed, or there would be too many authors on that paper. 
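A sketch of sending a target JSON structure along with the prompt and checking the reply’s shape. The schema, prompt wording, and function names are hypothetical, and the model call itself is left out; the hand-written shape check stands in for a proper JSON Schema validator.

```python
import json

# Hypothetical target structure presented to the model alongside the prompt.
AUTHOR_SCHEMA = {"authors": [{"name": "string", "affiliations": ["string"]}]}


def build_prompt(markdown: str) -> str:
    """Prompt text asking for authors/affiliations in exactly the example structure."""
    return (
        "List every author and their affiliations in the document below. "
        "Respond with JSON only, structured exactly like this example:\n"
        f"{json.dumps(AUTHOR_SCHEMA, indent=2)}\n\n"
        f"Document:\n{markdown}"
    )


def parse_authors(response: str) -> list[dict]:
    """Parse the model's reply and check its shape; raise ValueError if it doesn't match."""
    data = json.loads(response)  # json.JSONDecodeError is a subclass of ValueError
    authors = data.get("authors") if isinstance(data, dict) else None
    if not isinstance(authors, list):
        raise ValueError("missing 'authors' list")
    for a in authors:
        if not (
            isinstance(a, dict)
            and isinstance(a.get("name"), str)
            and isinstance(a.get("affiliations"), list)
            and all(isinstance(x, str) for x in a["affiliations"])
        ):
            raise ValueError(f"malformed author entry: {a!r}")
    return authors
```

A shape check like this only confirms the reply is well-formed; it says nothing about whether the right people are in the list.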

    We had one test paper, which I loved, which had 600 collaborative authors in it. And the AI would just choke after about 280-ish. So then you have to start dealing with things like paging through the data and formatting the data. And then you have to figure out, well, did it miss anything? You have 600 authors. Good luck. So now you have to take what the AI did and compare it against your own representation of it and write a program to do that comparison to say, OK, is it good? Is it good?
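Paging can be sketched as splitting the author block into fixed-size pages, running the extraction on each page, and merging the results in order. `extract` is a hypothetical stand-in for the model call, and the page size of 100 is an arbitrary assumption; exact duplicates are dropped in case the model repeats a name.

```python
from typing import Callable


def extract_in_pages(
    author_lines: list[str],
    extract: Callable[[list[str]], list[str]],
    page_size: int = 100,
) -> list[str]:
    """Run `extract` on fixed-size pages and merge results, dropping exact duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for start in range(0, len(author_lines), page_size):
        for name in extract(author_lines[start:start + page_size]):
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

Keeping each page well under the point where the model starts dropping names is the design goal; the merged list still needs to be compared against your own representation of the author block.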

    You have to take a step back and you look at it and you say, okay, we have the information that’s in the non-structured format. We’re handing it to the AI. The AI is gonna give us a structured version of it and we need to validate it. Well, the first validation is very easy. Does that structured version match the schema that we gave it? Yes or no, that’s easy. Well, then you have to say, okay, is everybody there? Well, is there anybody added? Because the nice thing about AI is they occasionally get very creative. Even if you have that temperature dial turned all the way down to zero, it will pull names out of thin air and then come back to you with some random name and stick it in the middle of the data where it’s not obvious, of course, and then hand it back to you. So then you have to start saying, are all the names that appear in this list actually in the document? Are the counts matching? And if it’s not, you go back to the AI and you ask it again, and usually you’ll get a better answer the second or sometimes the third or fourth time. 
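The validation loop described here (counts match, no names that aren’t in the source, ask again on failure) might look roughly like the following sketch. `call_model` is a hypothetical zero-argument function wrapping the AI request, and `expected_count` is assumed to come from some independent, deterministic count of the authors; plain substring matching is the simplest possible presence check.

```python
from typing import Callable


def validate(names: list[str], source_text: str, expected_count: int) -> list[str]:
    """Return a list of problems; an empty list means the extraction passed."""
    problems = []
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} names, got {len(names)}")
    for name in names:
        if name not in source_text:
            problems.append(f"not found in source: {name!r}")  # likely invented
    return problems


def extract_with_retries(
    call_model: Callable[[], list[str]],
    source_text: str,
    expected_count: int,
    max_attempts: int = 4,
) -> list[str]:
    """Re-ask the model until validation passes or attempts run out."""
    problems: list[str] = []
    for _ in range(max_attempts):
        names = call_model()
        problems = validate(names, source_text, expected_count)
        if not problems:
            return names
    raise RuntimeError(f"no valid extraction after {max_attempts} attempts: {problems}")
```

Raising after the last attempt, rather than returning the best guess, routes stubborn documents to a human instead of letting an invented name slip through.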

    But you need to be able to catch that, especially if you’re doing this at scale. If you’re doing a few, it’s easy; you can eyeball it. If you’re doing 1,000 of these a day, you can’t eyeball all of them. You can ask the AI, OK, give me a confidence level, but if you can’t trust what it’s returning in the first place, “Yeah, I’m very confident about what I’m giving you right now. It’s really the truth, I promise you this time,” I don’t know how trustworthy that would be. So you have to write tools to validate what the AI is producing, or you have to use the AI to validate what it’s producing. So coming in the first time, obviously, we did the count and we did the schema validation. We then said, okay, we’re going to check to make sure all the names appear in the document, and we’re going to have landmarks in the document that we can refer back to. So if you start with Microsoft Word and you have track changes on, you can have paragraph IDs that are supplied. So you can make sure that you can find all of the authors in that list, they all have a paragraph ID, and you have your landmarks, and that’s great. Or you can even hand the results to a separate AI call and say, proofread this. Is this accurate? Is this the best possible answer for each of these? It’ll come back with an answer, and you can use that as a signal to gauge accuracy and repeatability and make sure it’s correct.
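The paragraph-ID landmarks can be read from a .docx with the Python standard library, since a .docx is a ZIP archive whose main text is in `word/document.xml` (for example, `zipfile.ZipFile(path).read("word/document.xml")`). This sketch assumes the paragraphs carry `w14:paraId` attributes; a name with no landmark is a candidate hallucination. Text split across runs is joined, but text held in other elements (tabs, fields) is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
W14 = "{http://schemas.microsoft.com/office/word/2010/wordml}"


def paragraphs_by_id(document_xml: str) -> dict[str, str]:
    """Map each w14:paraId in word/document.xml to that paragraph's plain text."""
    root = ET.fromstring(document_xml)
    result = {}
    for p in root.iter(f"{W}p"):
        para_id = p.get(f"{W14}paraId")
        if para_id:
            # Join the text of every run (<w:t>) in the paragraph.
            result[para_id] = "".join(t.text or "" for t in p.iter(f"{W}t"))
    return result


def landmarks(names: list[str], paragraphs: dict[str, str]) -> dict[str, Optional[str]]:
    """For each extracted name, the first paragraph ID containing it, or None."""
    return {
        name: next((pid for pid, text in paragraphs.items() if name in text), None)
        for name in names
    }
```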

    SO: So you’re, let’s see, generating an AI, not a test bed, but an AI environment that’s doing this conversion or that’s processing the files for you for conversion. And then you have to go in and do all this validation to make sure that the output that you’re getting is actually correct. As compared to, I’m gonna say old-fashioned, but you know, as compared to scripting, deterministic, pretty straightforward, if A then B kinds of scripting. What are the differences between that and AI-driven conversion in testing and validation? What are the test plans? How are they different conceptually?

    RD: So from our perspective, the frustrating thing sometimes is the AI is completely non-deterministic. 

    SO: Mm-hmm.

    RD: It can give you a name formatted one way today, and then tomorrow, its formatting might be subtly different. Where the paper has “Richard Dominelli, Junior,” the AI may decide, well, that comma probably shouldn’t be there, or Junior should be followed by a period, even though it wasn’t in the paper originally. And you can try prompting around that and tell it to make sure that it’s accurate. But it doesn’t always follow your instructions exactly.

    SO: And why is that? Why is it non-deterministic?

    RD: Because AIs are built on neural networks, and the neural network itself has fuzziness within it, mostly due to floating-point arithmetic. So when you’re looking at it, the weight on a particular key might be out to like 16 digits, and it might shift slightly one way or the other. There is a fantastic paper from, I wanna say it’s Anthropic, that goes through the different reasons why AIs are non-deterministic. It goes through repeatedly querying the AI about who Richard Feynman is and getting back a different answer every single time. They’re all correct. However, they’re all slightly different. The other thing that will lean into that is if the AI is being heavily used, the memory and model weights will shift ever so slightly, and you’ll get a different result.
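    The floating-point effect can be seen without any model at all. This sketch only shows that the grouping or order of additions changes a float result; it does not model how any particular inference stack schedules its arithmetic.

```python
# Floating-point addition is not associative: grouping changes the rounding.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left, grouped_right, grouped_left == grouped_right)
# 0.6000000000000001 0.6 False


def left_to_right(values):
    """Add floats strictly in the given order (no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, different order, different answer.
print(left_to_right([1e16, 1.0, -1e16]))  # 0.0 (the 1.0 is lost to rounding)
print(left_to_right([1e16, -1e16, 1.0]))  # 1.0
```

    The explicit loop is used instead of the built-in `sum()` because recent Python versions apply compensated summation in `sum()`, which would hide the effect.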

    So you’ll end up with an issue where today I’m getting accurate results this way, and it’s relatively consistent, not perfectly, but close enough. And then tomorrow, it may just give you a dumpster fire of random information, and you need to be able to detect that. Okay, the other challenge we hit fairly early on is that more and more people are aggressively using AI right now. So we’re actually starting to hit issues where the LLM providers are overwhelmed. So you have to be able to code in failover, because you’ll literally get “too many requests,” 429 errors, which basically say, I’m too busy. I can’t deal with your request right now. Call me back. And you’ll have to go back and repeatedly query to get around that. I am hoping that some day in the near future, we’ll be able to have in-house AI at scale, with wonderful models that are so intelligent that we can run them on our local hardware, and then I won’t have to deal with that. But right now, that’s not the case.
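    A common way to handle 429 responses is exponential backoff combined with failover to another provider. In this sketch, `providers` is a hypothetical list of callables, and `RateLimited` is an exception the caller is assumed to raise when a provider answers HTTP 429; real client libraries surface rate-limit errors in their own ways.

```python
import time


class RateLimited(Exception):
    """Assumed to be raised by a provider call when the service answers HTTP 429."""


def call_with_failover(providers, prompt, retries_per_provider=3, base_delay=1.0, sleep=time.sleep):
    """Try each provider in turn, backing off exponentially after each 429."""
    last_error = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except RateLimited as err:
                last_error = err
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("all providers are rate-limited") from last_error
```

    Servers may also send a `Retry-After` header with a 429 response; when it is present, honoring it is usually better than a fixed backoff schedule. Passing `sleep` as a parameter makes the delays easy to test.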

    SO: So given all of this, I mean, I’ve asked you the leading question about the issues and the negatives, but what then makes an AI-driven conversion appealing versus a sort of scripted, deterministic, if I plug in A, I will always get B output?

    RD: So part of it is the type of data we’re dealing with. We’re dealing with unstructured information, and the creativity of that unstructured information is rather astonishing. You’ll have people format things, you know, we’ll get papers in where the entire paper is placed in different cells of a table. It’s not tabular information at all. They just, you know, wanted this particular section to be in this cell and that particular section to be in that cell. And the AI, I don’t want to say it’s immune to that, but it’s a lot more forgiving than having to write regexes or traditional programming or Word interop code to try to extract that information, because the AI can address it in a much fuzzier fashion. I know approximately what an author’s name looks like. I know approximately what a reference looks like. Even though today they decided to do it in Comic Sans or with Wingdings fonts, I can still read that and move on. So that’s really the wonderful aspect of it: it gets around a lot of that fuzzy-logic coding. You’re not dealing with having to address each of these nuances in a generic switch or state machine to try to figure out, OK, this paper should be classified this way and this approach used. Instead, the AI does a lot of that heavy lifting for you.

    SO: Okay, so it gives us that sort of fuzzier, I’m gonna say more flexible, I don’t know if that’s exactly the right word, approach. And then the outcome, what you’re describing, is that you’re ingesting unstructured Word, PDF, those kinds of things, and turning them into structured content, presumably fundamentally XML of some sort, but also some other downstream formats. So I wanted to switch gears a little bit. There’s been a lot of conversation about using structured content as an input for AI. So this, I guess, is the scenario where you’ve already ingested the unstructured content and remediated it in various ways. We now have structured content, and we’re gonna take that and feed it into, I guess, AI part two, right? So we’re past conversion. And there are a lot of people saying you should feed structured content into AI, that it will make the AI better. And so my question for you is, is that the case, and also maybe why, and what goes into structured content that makes it produce better AI outcomes, assuming that it does?

    RD: So there are two parts to this conversation. First, there’s a bunch of guides out there for prompting AIs where they suggest using XML or simplified XML tagging to give the AI signals about your prompt that aren’t verbally expressible. So here is my question. Here is an example. Here’s what I want my output to look like. And you can put tags around those parts when you’re actually prompting the AI, and the AI will know that those signals mean that it should pay attention to them. Okay, so putting that aside, what I think you’re really asking, though, is how does structured content, structured documents, the JATS and the DITAs and the S1000Ds, how does that help the world of AI? And to answer that question, we have to go through two things.
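    A prompt with XML-style tags, as those prompting guides describe, might be assembled like this. The tag names are illustrative assumptions; the guides do not require a fixed vocabulary.

```python
from xml.sax.saxutils import escape


def tagged_prompt(question: str, example: str, output_format: str) -> str:
    """Wrap each part of a prompt in XML-style tags so the model can tell the parts apart."""
    sections = {
        "question": question,
        "example": example,
        "output_format": output_format,
    }
    # escape() turns &, <, > into entities so content cannot open or close a tag.
    return "\n".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in sections.items())


print(tagged_prompt(
    "Which authors are listed in this paper?",
    "Smith, J.; Lee, K.",
    "One author per line, as <given> <family>",
))
```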

    One, we have to go through retrieval-augmented generation and context rot. So let’s talk about context rot first, because that’s a really interesting topic and people don’t talk about it enough. You have these large language models that are coming out right now, and they’re advertising this sticker-shock value of, it can ingest a million tokens, and it has this tremendous memory, so you can stick the entire Encyclopaedia Britannica in it and it will be able to ingest it and regurgitate it. There’s a whole lot of academic work out there that basically says, hold on a second: practically speaking, once you exceed a certain size, even though they can technically hold that million tokens of data in memory, they’re not gonna be answering as accurately as a smaller model.

    The most common example, or the easiest test, for that is the needle-in-a-haystack test, where you take a document, stick a random fact in the middle of it, hand the AI the document, and then ask it for that random fact. Nine times out of 10, it will answer incorrectly. An even easier test: there’s a website which I actually like called A Thousand Names. All this website is is a thousand randomly generated human names. You take that, you give it to the AI, and you say, how many names are there? When you do 100, you’ll get an accurate answer. 200, accurate answer. 300, things start to break down. You might get 300, or you might get 280 or 320. You might get a random answer.
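    Both tests are easy to script. This sketch only builds the prompts and scores a numeric answer; sending the prompt to a model is left out, and the function names are hypothetical. Calling `count_probe` with 100, 200, 300 names and so on reproduces the kind of sweep described above.

```python
import re


def needle_probe(filler, needle, question):
    """Bury one fact in the middle of filler sentences and append the question."""
    mid = len(filler) // 2
    document = " ".join(filler[:mid] + [needle] + filler[mid:])
    return f"{document}\n\n{question}"


def count_probe(names):
    """Return a counting prompt and the correct answer for a list of names."""
    prompt = "\n".join(names) + "\n\nHow many names are in this list? Answer with a number."
    return prompt, len(names)


def score_count(model_answer: str, expected: int) -> bool:
    """True if the first integer in the model's answer equals the expected count."""
    numbers = re.findall(r"\d+", model_answer)
    return bool(numbers) and int(numbers[0]) == expected
```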

    And then it gets progressively worse as it gets bigger and bigger. So if you’re working in the content world, you’re looking at ingesting documents into a corpus of some sort. You’re making these structured documents for the sole purpose of making them retrievable. You want the AI to be able to retrieve the relevant documents from the corpus so that it can answer the question. A, because your corpus is probably bigger than that million tokens. And B, because the less data you send the AI, the more accurate the answer is. So the better way of thinking…

    SO: And so a token is roughly a character, right?

    RD: No, a token is actually roughly a word. It’s a bit less than a word.

    SO: Roughly a word. So a million tokens is kind of a lot. It’s a lot of words.

    RD: It’s kind of a lot, but it’s still not like a PubMed-sized corpus or anything like that. It’s roughly the size of the Old and New Testaments of the Bible, roughly a million words, just to give everybody that mental picture. But that’s just one book.

    RD: So if you have millions of articles, and you’re asking the AI, you know, what did we do for this project six months ago that involved JATS in this solution? The AI has to find those articles, and then it has to find the relevant information in those articles to be able to summarize it and hand it back to you. And the best way we know how to do that is by giving extra signals to the AI, giving it those structured, relevant bits of information, like front matter, back matter, publication date, keywords, and abstract, that allow the AI to query the corpus and get the relevant chunks out of it very quickly, and then summarize what those chunks are. So the AI almost becomes the user interface over that corpus, because it’s going to summarize the data. But to find that data in the first place, structured content is key. For the same reason that structured content is key when you’re dealing with big indexes and the web, it’s key with AI.
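    Using structured metadata to narrow the corpus before anything reaches the model can be as simple as a filter. The `Chunk` fields, keywords, and dates below are hypothetical; in a JATS article, comparable values could come from front-matter elements such as `<kwd>` and `<pub-date>`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    keywords: frozenset  # e.g. taken from <kwd> elements in JATS front matter
    published: date      # e.g. taken from <pub-date>


def prefilter(chunks, required_keywords, not_before):
    """Narrow the corpus with structured metadata before any text is sent to the model."""
    return [
        c for c in chunks
        if required_keywords <= c.keywords and c.published >= not_before
    ]
```

    Only the surviving chunks would then be ranked and passed to the model, which keeps the prompt small for the reasons given above.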

    SO: So then structured content is potentially helpful. And I guess then circling back, let’s say I’m sitting on a pile of content of varying degrees of structured or unstructured, varying degrees of quality or lack thereof. What kinds of things should be happening before that content gets ingested into some sort of an LLM or some sort of a corpus to be used in AI-generated outputs?

    RD: So these are the same types of things you would do to make them easily retrievable ahead of time. The standard approach that was being espoused about two years ago, a year and a half ago, was something called Naive RAG. You can just take your PDFs and throw them at the AI, and the AI will ingest them into a vector database and do semantic similarity searches to find the documents that you care about. That’s not the best approach when you start talking about large numbers of documents. There are issues with semantic similarity, where the AI will have a hard time distinguishing negative cases, will have a hard time peeling out the best documents, and that type of thing. So the best approach is to take those documents and turn them into structured information in such a way that it’s easy for the AI to ingest. Typically that involves chunking them into topic-level pieces, or semantic chunking, coming up with summaries to make them easy for the AI to find, and whatever other information you may want to tease out of them.
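    Topic-level chunking of a Markdown export can be as simple as splitting on headings. This is a minimal sketch under that assumption: it does not special-case `#` lines inside fenced code blocks, and the summarization step mentioned here would be a separate AI call.

```python
import re


def chunk_by_heading(markdown: str):
    """Split a Markdown document into topic-level chunks, one per heading."""
    chunks = []
    title, body = "Untitled", []

    def flush():
        text = "\n".join(body).strip()
        if text:  # skip headings that have no body text
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            flush()
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```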

    So, for example, if I’m handing a PDF to an AI and saying, I want to be able to search this PDF later, well, six months from now, if I get a new version of that PDF and I search against the two of them, I really want my answers coming out of the second PDF. That’s metadata, structured information that doesn’t appear, or may not appear, in the text of the PDF. You wanna be able to do things like versioning, you wanna be able to do things like dates, you wanna be able to give these signals to the AI so it can pull that information back quickly. And that’s really where structured content comes in. So for the purposes of preparing your own corpus, you want to convert your documents into an easy-to-ingest format, which typically means Markdown or XML or something that the AI can deal with. You want to give it whatever other signals you can so that it’s easy to find. Then you want to hand it to something that first does chunking and then text embedding, which is basically turning the information into numbers so that you can do those cosine similarity searches. And then you want everything handed off to some kind of object store, like a hybrid vector database or a graph database, so that they’re easy to pull out.
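    The “answers should come from the newer PDF” rule can be applied as a metadata filter before cosine-similarity ranking. In this sketch, the two-dimensional vectors are toy stand-ins for real embeddings, and the `doc_id` and `version` fields are hypothetical metadata attached at ingestion.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def latest_versions(chunks):
    """Keep only chunks that belong to the newest version of each document."""
    newest = {}
    for c in chunks:
        newest[c["doc_id"]] = max(newest.get(c["doc_id"], c["version"]), c["version"])
    return [c for c in chunks if c["version"] == newest[c["doc_id"]]]


def search(query_vec, chunks, top_k=3):
    """Filter on version metadata first, then rank the survivors by cosine similarity."""
    candidates = latest_versions(chunks)
    return sorted(candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)[:top_k]
```

    Filtering before ranking means an older version can never outrank its replacement, even when its text happens to be a closer semantic match.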

    SO: Awesome. So you started this off talking about being the hopeful evangelist, and now having gone through all of this, it sounds as though you’re really thinking about these issues and dealing with them at scale. What are some of the top things that you’re thinking about going forward, whether hopeful or not, the good, the bad, and the ugly?

    RD: So one of the interesting aspects of my job is that I get to do a lot of interactions with AI from an R&D perspective, and do some in-house programming and some in-house tool use. And what we’re finding is that developing our own internal mechanisms for AI to call third-party tools, to be able to call Crossref or GROBID or some of these reference facilities out there through something like Model Context Protocol or through API calls, so we can execute those calls, get that information back, and do validation before the AI hands back the results, is a very interesting topic for us. That would let us have the AI do the first few rounds of validation before it ever comes back to us, instead of having it go to the next step, do a validation step, then the next step, and then possibly do a round trip. It would be a much faster interaction. Right now, of course, like most of the world, we’re using a lot of AI coding tools to tighten up our code bases, make sure things are working well, and basically act as a force multiplier when we’re doing development on projects, which is phenomenal.
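    A reference-checking tool call might start by pulling DOIs out of the extracted text and preparing Crossref lookups. This sketch targets the public Crossref REST API path `/works/{doi}` and uses a DOI pattern based on Crossref’s published guidance for modern DOIs; the HTTP request itself and the comparison of returned metadata with the AI’s output are left out.

```python
import re
from urllib.parse import quote

# Pattern based on Crossref's guidance for matching modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def crossref_lookup_urls(reference_text: str):
    """Find DOIs in a reference string and build Crossref REST API lookup URLs for them."""
    urls = []
    for doi in DOI_PATTERN.findall(reference_text):
        doi = doi.rstrip(".")  # drop a sentence-ending period picked up by the pattern
        urls.append("https://api.crossref.org/works/" + quote(doi, safe="/"))
    return urls
```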

    I can’t say enough good things about Claude Code, you know, because it’s really become an essential tool in my day-to-day life. But I’m also seeing a lot of people out there using these tools to analyze and improve their own workflows and day-to-day work. We talked with one of our customers recently, and they use Claude Code, even though the person giving the demo was not a developer; they use Claude Code to answer RFPs. Claude Code does a tool-use call against their document corpus and answers the RFP correctly, and what used to take two or three days of slogging through documents and finding things is now being done in an hour by one person instead of multiple people working on it. So it’s great to start seeing that type of stuff just blossom in the enterprise, because it’s really exciting.

    SO: Well, Rich, I really appreciate your insights on this. I learned a few things, and I think it’s great to hear from people who are actually using this stuff in a production world, in a high-stakes world where you actually need to get the content right, get the information right, as opposed to just playing around with it and not worrying about it too much. So thank you, and we’ll look forward to hearing more from you and what you’re doing at DCL.

    RD: Sounds great. Thanks for having me.

    Conclusion with ambient background music

    CC: Thank you for listening to Content Operations by Scriptorium. For more information, visit Scriptorium.com or check the show notes for relevant links.

    Want to learn more? Download our book, Content Transformation.

    The post Taming AI: Using AI for content conversion at scale appeared first on Scriptorium.
  • Content Operations

    Machine experience (MX): Making content work for humans and machines

    04/05/2026 | 19 mins.
    Your website may look great to humans, but can machines understand it? In this episode, Sarah O’Keefe (Scriptorium) and Tom Cranstoun (Digital Domain Technologies) explore the emerging discipline of machine experience (MX). Sarah and Tom discuss what AI agents actually encounter when they visit your web pages, why microdata and metadata are critical, and what content creators must do to ensure content is consumable for both human and machine audiences.

    Tom Cranstoun: Humans are looking for pictures, they’re looking for text, and they can infer. You may think, “Well, we’ve already added information on the page,” but by putting it in as microdata, it doesn’t appear on the page for the humans. It appears on the page for the machine. I think that that’s a critical distinction. We are trying to design for both. We don’t want to overload a human with information, but we do want to give the machine as much information as it can take.

    Related links:

    The Gathering

    Digital Domain Technologies

    MX books

    The Scriptorium Content Ops manifesto

    LinkedIn:

    Host: Sarah O’Keefe

    Guest: Tom Cranstoun

    Transcript:

    Disclaimer: This is a machine-generated transcript with edits.

    Introduction with ambient background music

    Christine Cuellar: From Scriptorium, this is Content Operations, a show that delivers industry-leading insights for global organizations.

    Bill Swallow: In the end, you have a unified experience so that people aren’t relearning how to engage with your content in every context you produce it.

    Sarah O’Keefe: Change is perceived as being risky; you have to convince me that making the change is less risky than not making the change.

    Alan Pringle: And at some point, you are going to have tools, technology, and processes that no longer support your needs, so if you think about that ahead of time, you’re going to be much better off.

    End of introduction

    Sarah O’Keefe: Hey, everyone. I’m Sarah O’Keefe. Today, our guest is Tom Cranstoun, who is the founder of a machine experience, or MX, community called The Gathering. He has a couple of books on MX and is currently a consultant operating as Digital Domain Technologies. Tom, after 53 years in the business, including experience with AEM at very, very large companies and a huge project at Nissan, has turned his attention to the question of how machines, which is to say AI agents, interoperate with the current public-facing web. And so today, Tom, I’m delighted to have you on to talk about machine experience, or MX, and what this all means as we move forward in this brave new AI world. So welcome.

    Tom Cranstoun: Thank you, Sarah. I’m very pleased to be with you today.

    SO: I am delighted to have you. So I guess we’ll start with the extreme basics here, which is what is machine experience, or MX?

    TC: Yeah. MX, well, by my definition, machine experience is like user experience, but for machines. Machines cannot ask a friend for help if something goes wrong when they’re browsing a website. They can’t turn to a partner and say, “What do you think this means?” They can’t retry a failing form input, because they will just go through the same mechanical patterns to try and carry on through the web journey. Therefore, machine experience is thinking about what elements one must put on a webpage to help a machine understand and act on the final goal of the webpage, whether that be a CTA that lets you purchase something, or an information document that lets you know about a government policy, or a charity good, whatever the author of the page is trying to get across to the audience.

    SO: And so at a high level, what does it look like to build out machine experience? What are some examples of things that you need to put onto a webpage to accommodate the machine that’s reading it?

    TC: Well, the very first level is the disabilities angle: things like the Americans with Disabilities Act and WCAG, W-C-A-G, the accessibility work. The more accessibility information is on the page, the more the machine can understand the background of the page. So at the top level, machine experience and accessibility are pretty much the same sort of thing. If you put in JSON-LD and microdata, and you enrich your pages with the things that the Americans with Disabilities Act would like, you’re actually helping a machine understand the page. So that is the top-level constraint. When you go below that level, you need to give the machine lots of information about your product, not just the thing that a human wants when glancing at the page right now; as a human goes through the journey, things get added on. Humans can only take in two or three items at a time, so we design pages to reveal information progressively. You go to a catalog, to a product, to a variation, to a purchase: four different steps. Each step introduces different pricing and concepts. It’s best to feed the machine, on whichever page it lands on, all of the information that it needs. This may not necessarily be surfaced to the human reading the page, but it’s there for the machine. This helps the machine when it arrives at your webpage.

    SO: So I’m really enjoying this concept that a properly organized page with proper accessibility WCAG or ADA compliance and support then results in the machine being better able to parse the page for essentially the same reason, right? It’s properly structured, it’s predictable. The things that are labeled are labeled correctly. I don’t know that we should be driving accessibility in order to enable AI, but on the other hand, if it gets us more accessible pages, then let’s certainly do that. Can you give some examples of what happens when pages are not machine-compatible? What are the kinds of problems that people run… Or not people. What are the kinds of problems that the AIs run into when they try to parse a page that has not been labeled properly or encoded properly?

    TC: Yeah, I collect these examples from real life. Whenever I use the web as a normal person, I ask, “Well, how would a machine interpret this?” Recently, I was looking for a holiday, and I asked an LLM to give me a list of five companies that offer cruises up the Mekong Delta. The machine came back with one offer at $200,000 for a week’s holiday, and the rest were $2,000 for a week’s holiday. What had happened there was that the machine had found a European website. Now, many Europeans use the comma and the dot in monetary values the opposite way from the Anglo-Americans. We use a comma as the thousands separator and a full stop before the fraction. The Europeans put the full stop as the thousands separator and a comma before the fraction. This meant that when the LLM built a table of prices for holidays, it didn’t understand the distinction, and it tripped up. The agent hadn’t been instructed to compare prices and make sure that they were all within the same range and were reasonable. It just presented them as fact: “Here’s a holiday for you. One of them is $200,000. The rest of them are $2,000.” There was no information that could tell the agent what was happening. If those pages had been decorated with currency microdata, which uses a full stop as the decimal separator regardless of locale, the machine wouldn’t have tripped up. A human, by contrast, could have read the page, seen the locale-specific values, and understood what was going on. So that’s a typical trip-up from an undecorated page.
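    The arithmetic behind this example: the same digits mean 2,000 or 200,000 depending on which separator convention the reader assumes. This sketch covers only two hypothetical locale codes, and `naive_parse` is one plausible way such a misreading can happen, not a claim about what the LLM actually did. A machine-readable value such as `2000.00` with an explicit currency removes the guesswork.

```python
from decimal import Decimal

# (thousands separator, decimal separator) for two common conventions.
SEPARATORS = {
    "en": (",", "."),  # 2,000.00
    "de": (".", ","),  # 2.000,00
}


def parse_price(text: str, locale: str) -> Decimal:
    """Parse a displayed amount using the page's locale conventions."""
    thousands, decimal_sep = SEPARATORS[locale]
    return Decimal(text.replace(thousands, "").replace(decimal_sep, "."))


def naive_parse(text: str) -> Decimal:
    """A careless parser that strips every separator (shown only to illustrate the failure)."""
    return Decimal(text.replace(",", "").replace(".", ""))


shown_on_german_page = "2.000,00"
print(parse_price(shown_on_german_page, "de"))  # 2000.00
print(naive_parse(shown_on_german_page))        # 200000
```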

    SO: And so essentially, it’s the presentational component that says, because I’m serving this page to somebody in, for example, Germany, they are expecting a comma separator between the full euro amount and the cents, the euro cents. But that comma is essentially formatting, as opposed to data, and so here we are.

    TC: Yes, correct. And the microdata holds the value in a proper machine-readable way. The other thing that we always get problems with in the world is English and American date formats. We swap the day and month around when writing the short form. The machine-readable version uses ISO dates, and an ISO date in microdata tells the machine categorically, regardless of locale, this is the date and time.

    SO: Yeah. And so the expression of the date, whether April 1st is written 1-4 or 4-1, is essentially a formatting problem.
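    The same short-form string yields two different dates depending on the assumed convention, while the ISO 8601 form is unambiguous. HTML’s `<time>` element with a `datetime` attribute is one standard way to carry the ISO value alongside the human-facing text; the helper name below is hypothetical.

```python
import html
from datetime import date, datetime

raw = "01/04/2026"  # short-form date as displayed on a page

as_uk = datetime.strptime(raw, "%d/%m/%Y").date()  # day first: 1 April 2026
as_us = datetime.strptime(raw, "%m/%d/%Y").date()  # month first: 4 January 2026
print(as_uk.isoformat(), as_us.isoformat())  # 2026-04-01 2026-01-04


def time_element(d: date, display: str) -> str:
    """Keep the human-facing format but carry an unambiguous ISO 8601 value for machines."""
    return f'<time datetime="{d.isoformat()}">{html.escape(display)}</time>'


print(time_element(as_uk, raw))  # <time datetime="2026-04-01">01/04/2026</time>
```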

    TC: Correct. And these are not visibility problems. These are machine experience problems. So it’s layering up. You start by fixing accessibility, which gets you machine experience, and then you fix the locality and the community values, the human factors, the display factors.

    SO: And so I think we’re all familiar with the concept of a customer journey, but you’re now talking about a machine or an MX journey. What does that look like? I mean, how does the machine process a website? How do you explore that journey and what it looks like?

    TC: The machines will not discover your website, come in through your landing page, and then look for offers or products. A machine will have an idea of where it wants to go and will land straight on a page. It will arrive five pages into your journey and read the webpage as it is. The owner of the website has lost all of the signals about what the dwell time was on each page and how the reader arrived at the end location. Did they go sideways and look at other things? Those things don’t happen with machines. They go straight in and see if they can get what they need. If they can, they will act on it. If they can’t, they will move on to another page or another person’s website and do exactly the same there.

    So when a machine arrives at your webpage, it will not give you any referral details. It will not tell you what journey it’s on, and it won’t tell you what else it’s interested in. You’ll just get a cold caller who will arrive and disappear. I call them invisible users. They’re invisible to your analytics, they’re invisible to your tracking, and they’re invisible to your future. You cannot tickle them and say, “Hey, you left something in the basket.” You cannot use those parts of the journey. A machine comes in, gets what it wants or it doesn’t, and goes. So you must front-load as much information as possible on any and every page that a machine may land on.

    SO: So then coming at this from the perspective of structured content people: a lot of what you’re talking about is web experience, the end-state result of the content that we’re creating. So if I have an enormous DITA CCMS full of stuff and I output it to some semblance of a website, your focus is on what needs to be on that website so that it describes itself in such a way that the machine, an AI or a crawler, can go in there, pick up what it needs, and process it accurately, and not offer you a vacation for $200,000. I assume you did not pick that one. So what are the opportunities? When you look at MX and also DITA as a backend, what kinds of opportunities do you see to map those things across and take advantage of some of the structure that perhaps is already in the XML and/or structured content systems?

    TC: Yeah, I see the backend as full of good content operations stuff. Everybody has details about pricing and dates and frequency, and there’s lots of backend information that often doesn’t make it to the front end for people. Humans are looking for pictures, and they’re looking for text, and they can infer. They can infer if two prices are on a page and it says, “Was $200, Now $180.” A human understands that. A machine, well, it depends on the quality of the machine whether it can read and infer those things. So the backend information has to be made more visible, and in a redundant manner. You may think, “Well, we’ve already added this information on the page.” But by putting it in as microdata, it doesn’t appear on the page for the humans, but it appears on the page for the machine. And I think that that’s a critical distinction. We are trying to design for both. We don’t want to overload a human with information, but we do want to give the machine as much information as it can take. We don’t necessarily have to surface all of the information within the page, but we have to carry it with the page. So a page taken in isolation contains the entire story, not just the fraction that a human is looking at, which does mean pushing a lot more data from the backend to the front end. And some people will think that’s a waste of time, but I don’t think so. I think giving that extra material to the machine is what makes the journey successful for the machine.
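    Carrying backend data with the page is often done with schema.org markup. This sketch emits it as JSON-LD, one of the formats mentioned earlier, for a hypothetical product; `price`, `priceCurrency`, and `priceValidUntil` are schema.org `Offer` properties, and all values are made up for illustration.

```python
import json


def offer_json_ld(name: str, price: str, currency: str, valid_until: str) -> str:
    """Build a schema.org Product/Offer block carrying machine-readable pricing."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,                  # full stop as decimal separator, no thousands separator
            "priceCurrency": currency,       # ISO 4217 code such as "EUR"
            "priceValidUntil": valid_until,  # ISO 8601 date
        },
    }
    # Escape "</" so text inside the JSON cannot close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(offer_json_ld("Mekong Delta river cruise, 7 nights", "2000.00", "EUR", "2026-12-31"))
```

    The block is invisible to human readers but travels with the page, which is the distinction drawn above between what is surfaced and what is carried.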

    SO: I’ve been to a lot of conferences in the past couple of weeks, and the conversation around what is needed for successful LLM processing, crawling, ingestion, or agents, for that matter, versus what is already provided in a metadata-rich structured content system, is sort of: well, we have all of this. Now what do we do with it, where do we put it, and how do we make sure that this all works? So it seems like this discussion around machine experience is going to help to maybe close that gap and connect the pieces such that we can do this successfully.

    And so as we move into this, I know that you have some material out there, but also a community. Can you talk a little bit about the MX community, and what you’re looking for there, and what it’s called? We will put all of the links in the show notes. But what does it look like to participate in that community, and what sort of participants are you looking for?

    TC: Yeah, we are looking for content creators. We are looking for business owners. We are looking for technical writers. It’s called The Gathering, gathering being a Scottish term for the gathering of the clans. We all get together to do something that’s good for the combined grouping. And then after we’ve created whatever we’re going to create, we go away and do our own things. The Gathering is at tg.community. That’s https://tg.community. We are building a set of community-led standards to try and make it easier for machines to understand documents. The Gathering is not just interested in HTML. We’re talking about documents of all types, and we’re talking about keeping the metadata that you have in the backend content creation systems, whether that be DITA or other content creation systems, and passing it through into the end documents.

    You have metadata in PowerPoint slides. You have metadata in Word documents. You have metadata in JPEGs. These, too, deserve machine experience. If you can tell the machine details about an image inside a JPEG, then the machine doesn’t have to try to scan and interpret the image to find out what it is. It makes things so much better. And The Gathering is a community that is trying to build these as open, community-led standards. The community was just launched on the 2nd of April, 2026, by the way. It’s very young, and we hope to build at the speed of LLMs. We need to work fast.

    As for the key thing that helps LLMs understand your website, there’s a file called llms.txt, which people don’t really understand and machines don’t really use. It’s a proposed standard for describing your website in a way that helps a machine understand what’s going on without reading your site map. It is not used by the machines because, one, it’s not served as HTML, and, two, it’s not in your site map. Therefore, the crawlers that gather training material do not pick it up and do not ingest it. I am suggesting, and I talk about this in my books, that if you wrap the llms.txt in HTML, serve it as HTML, and put it in your site map, then you will get a better response at the training stage and at the inference stage. So you are seeding the machines with information about your website, something that is currently missing from the world, and that’s step one. There are five steps that you’ve got to go through before you can have a successful e-commerce position: feed the machine, get noticed, be descriptive, be MX-aware, and be citable. And MX lets all of those things happen.
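    The suggestion to wrap llms.txt in HTML and list it in the sitemap could be scripted roughly as follows. The page URL is hypothetical, the `<pre>` wrapper is one simple choice (converting the Markdown to HTML would also work), and whether crawlers then use the page is Tom’s claim, not something this code demonstrates. The `urlset`, `url`, and `loc` elements come from the sitemaps.org protocol.

```python
import html
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize without an "ns0:" prefix


def llms_txt_as_html(llms_txt: str, title: str) -> str:
    """Serve the llms.txt content inside a minimal HTML page (escaped, whitespace preserved)."""
    return (
        '<!DOCTYPE html>\n<html lang="en">\n<head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><pre>{html.escape(llms_txt)}</pre></body>\n</html>\n"
    )


def add_to_sitemap(sitemap_xml: str, page_url: str) -> str:
    """Append a <url><loc> entry for the new page to an existing sitemap."""
    root = ET.fromstring(sitemap_xml)
    url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    return ET.tostring(root, encoding="unicode")
```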

    SO: Perfect. Well, Tom, I know that there’s a lot to discuss here, and we could go on for a very long time, but I hope this gives people a little bit of an introduction to this idea and an opportunity, if they’re interested, to reach out to you and to the community that you have. And there’s also a book or three. Any closing thoughts that you want to pass on before we close this out?

    TC: My personal opinion is that we should treat the machines as first-class citizens, not block them from our content, and create content that works for them. The more that we do for them, the more they will do for us. And if we start treating them as an afterthought, it’s not going to be as good a web as we could build.

    SO: Okay. Well, thank you so much. I’m glad we had an opportunity to talk.

    And we will, again, put the links to the various resources that Tom mentioned, including the community. There are some RFCs, some standards drafts, a manifesto, and a book. We will put all of that in the show notes. So Tom, thank you again for being here, and I look forward to hearing more on this effort.

    TC: Thank you very much, Sarah.

    Conclusion with ambient background music

    CC: Thank you for listening to Content Operations by Scriptorium. For more information, visit Scriptorium.com or check the show notes for relevant links.

    Want to learn more? Download our book, Content Transformation.

    The post Machine experience (MX): Making content work for humans and machines appeared first on Scriptorium.
  • Content Operations

    Make the move successful: Replatforming content ops

    27/04/2026 | 22 mins.
    Replatforming your content operations isn’t just about swapping systems. In this episode, Alan Pringle and Bill Swallow share what organizations must consider to successfully replatform. From system integration to the people caught in the middle, they discuss change management, technical debt, and why your exit strategy should be part of the plan from day one.

    Software isn’t forever. Systems come, systems go, they get improved. Your requirements are ever changing with the content that you need to manage. Not thinking about your next jump is really to your detriment.

    — Bill Swallow

    Related links:

    Replatforming structured content

    Your tech expertise + our CCMS knowledge = replatforming success (case study)

    Cutting technical debt with replatforming (podcast)

    Replatforming with localization in mind

    LinkedIn:

    Alan Pringle

    Bill Swallow

    Transcript:

    Disclaimer: This is a machine-generated transcript with edits.

    Introduction with ambient background music

    Christine Cuellar: From Scriptorium, this is Content Operations, a show that delivers industry-leading insights for global organizations.

    Bill Swallow: In the end, you have a unified experience so that people aren’t relearning how to engage with your content in every context you produce it.

    Sarah O’Keefe: Change is perceived as being risky; you have to convince me that making the change is less risky than not making the change.

    Alan Pringle: And at some point, you are going to have tools, technology, and processes that no longer support your needs, so if you think about that ahead of time, you’re going to be much better off.

    End of introduction

    Alan Pringle: Hey everybody, I am Alan Pringle, and today I want to talk with Bill Swallow about content operations and replatforming. Hey Bill, how are you? 

    Bill Swallow: Good, how are you doing?

    AP: Good. So I guess we should start by saying that the reason we want to talk about replatforming is that we have done a few replatforming projects, and we’ve had some prospects reach out who are interested in doing it. So I guess we need to explain what it is and some of the things you have to think about when you’re going through the process. If you don’t mind, would you define what we mean by replatforming content operations?

    BS: Sure. Generally, when I talk about replatforming, it’s in the context of a company having one system in place whose time has come, so they need to move to a new one. It’s the entire process of determining what type of system you’re going to need and what your requirements are for it, lifting everything you want to carry forward out of the old system, putting it in the new system, and configuring it to work going forward.

    AP: So we’re not talking about using a whole new technology or a whole new platform. It’s shifting to a similar platform for some of the reasons that you just mentioned. And I think that’s another thing: there are several reasons why a company might want to do this, and I know our clients have had various reasons for doing it. Let’s focus on those for a little bit. You already touched on one of them: sometimes you just outgrow a system. That’s just how it is. So let’s start with that: it’s not sustainable anymore because you’re too big now for what that system can do.

    BS: Sure, either you’ve outgrown it or it’s approaching end-of-life or it’s just not meeting the needs that you had five or ten years ago when you bought the system. So there are a lot of different factors there, but basically it comes down to what are your requirements and is it meeting your requirements? 

    AP: Right.

    BS: Are you able to get the things done that you need to do, given the fact that, you know, the world is quite different now than it was five or ten years ago?

    AP: Exactly, and there’s another angle here too that I think we need to briefly mention is that sometimes you’re gonna have two sets of requirements because two companies can merge or there can be an acquisition and then all of a sudden you’ve got two content operations platforms that are pretty much doing the same exact thing and I guarantee you the IT department is not gonna have that. 

    BS: Absolutely.

    AP: So there could be a situation where you’ve got two, and one of those is going to go away. And in some cases, and we should talk about this too, it’s not necessarily about picking one of them. It’s not uncommon to move to a whole other system, so there is, quote, “no loser.” That’s also an option.

    BS: That’s very common because usually in the case of a merger, you have two established groups with two established systems that may be starting to age out on both sides. And it doesn’t make sense to spend the time and the effort to move one group into the other system when that system is probably going to be replaced in a few years anyway.

    AP: Yep. So basically, the circumstances of the merger have provided a perfect opportunity to do something that’s painful. Replatforming is not magical. There’s still going to be technical hiccups and everything else. But at least it’s not as painful because you’re both moving out of systems that maybe aren’t optimal into something that will basically treat your content creators and all the people managing content much better because it’s going to support their needs better.

    BS: Or at least everyone goes through the same pain together of learning a new system. It’s team building.

    AP: And so you bond through shared pain experiences. Exactly. All right. Yeah. Huh. We’ll go with that. We will go with that. There are some other aspects here, too, that are kind of related to that: the idea that because things are getting rickety, they’re getting too expensive to maintain. You keep making these little tweaks and changes that become less and less repeatable. And that adds up. That’s time and money that you have invested.

    BS: It certainly does. Aside from the hard costs of licensing and the general time it takes to use the system and produce things, after you’ve used the system for so long, you’ve got your workarounds built in. They may not be best practice, and the workaround may solve the problem on its face, but you’re doing a lot of things that you really shouldn’t be doing with that workaround in place. You really should be doing something that’s a little bit more streamlined. And as you’re bringing new people and new groups in, whether it’s a merger or just another department that realizes, “Hey, they have this shiny system over here. Why don’t we start using it too?”, if you have workarounds in place, it takes a lot longer to get people up and running in the new system, because not only do they have to learn the system, but they have to learn how you’ve worked around it.

    AP: Right, and that’s where you start talking about technical debt because all of those workarounds that you’re describing, that equates to technical debt. And one day, you’re gonna get your backside handed to you because you have all of this technical debt. And replatforming is the perfect time to press that reset button and say, we’re gonna get rid of those things. We’re gonna have a system that addresses those problems in an official, correct way, and none of these weird workarounds. 

    BS: Mm-hmm.

    AP: And by the way, those workarounds: what if the person who created them leaves and they haven’t been documented well?

    BS: Yeah, that’s a big problem. My guess is that if you’ve been using one particular system for, you know, 10 years or even more, you probably have a lot of content just sitting there that hasn’t been touched in years, hasn’t been needed in years, but it’s still sitting in the system.

    And it’s still coming up in search results as people look for topics they need to edit for a new release, and it’s just getting in the way. It’s a good time to cut clean, ditch all of that old content you no longer need, and focus on what you need to produce going forward.

    AP: Yeah, I think maybe sometime the show Hoarders can do episodes on content people and their technical debt, basically hoarding all of this content in various digital forms that nobody’s actually looked at. I’m sure we would all laugh and be horrified at the same time by such a show.

    BS: It wouldn’t make for exciting TV, though.

    AP: Yeah. So let’s talk about the overall process for how this kind of works. We’ve talked about what it is and a lot of the reasons for it, so let’s talk about how to do it. It’s not a one-size-fits-all thing; we can tell you about the experiences we’ve had. But I know one of the first things you’ve got to do, for example, is choose the new system you’re going to be moving into.

    BS: Right. And out of the gate, the knee-jerk reaction is to go with something new and shiny, but you really need to sit back and figure out what you need that system to be able to do and how you need it to do it. You know, we’ve had a lot of clients come to us two or three years after setting up a system who are just like, “This is just not working for us.” And as we talk with them, we realize that they essentially had a square peg and bought a round hole to put it in. And here they are three years later, still trying to force that peg into the hole. So you need to sit back and really think about your requirements: not just the requirements that you have today, but the requirements you anticipate having at least five years down the road.

    You have to leave yourself open, because otherwise your opportunity for growth in that system is limited by your choice. And, you know, we always say: choose tools last. The same thing goes for systems. The reason we’re talking about it up front is that you do have an existing system. You at least need to identify some candidate systems that you might move into, and have clear requirements for those systems and clear reasons for looking at them further, before deciding on the one you’re going to implement.

    AP: And those requirements can help you identify the differentiators, the things that make one system a better fit for your needs. And the more fine-tuned and discrete your requirements are, the easier time you’re going to have finding that match for the new platform that you need to address all of those requirements. 

    BS: Mm-hmm.

    AP: Another part of this is moving the content from the old system to the new one. So let’s talk about content migration, because I think a lot of times people underestimate what that can take, even when you’re talking about two very similar technology stacks.

    BS: That’s the easy part. Content is content, Alan. It’s just all just words. It’s fine. You can move it. No problem. Yeah, I think this is the most overlooked piece of all of it. Even if you’re moving from one system that uses the same format for the content under the hood to another system, you’re still going to have to make changes. 

    AP: We can take this outside later. Yeah. Yeah. Yeah.

    BS: The old system maybe had a couple of bells and whistles that handled things a very specific way, and the new system has a couple of other ones, and they don’t match. So you’re going to have to find a way of mapping from content type A1 to content type A2. Even though they’re both content type A, you still have these little differences that you need to map out.

    AP: Right, because what may have been best practice in your existing system may have required a custom feature that system A supports and system B does not. So you’ve got to find the equivalent of that custom feature. We’ve run up against that quite a few times, and it’s not that uncommon. But you’re right: it’s rarely a one-to-one situation, unfortunately, even if the underlying foundation or structured standard you’re using, DITA in particular, is the same.

    BS: Mm-hmm. Yeah, DITA in particular is interesting, because you would think that all systems would work with a documentation standard in the same way, because it is a standard. That’s not the case. The systems bring different efficiencies that come with some modification, not to the standard itself, but to how the system interacts with it. For example, a system may replace linking via file name with linking via a UID, a unique identifier. That unique identifier makes perfect sense in the system you have now, but it makes absolutely no sense once you move the content to another system. So you have to find some way of converting it over.
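    As an illustration of the UID-to-file-name conversion Bill describes, here is a minimal Python sketch. It assumes, hypothetically, that the old system writes UIDs as `GUID-` plus hex digits in `href` and `conref` attributes, and that an export from that system gives you a UID-to-path mapping. Real CCMS formats differ, so treat the pattern and the mapping as placeholders.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical UID format from the old system: "GUID-" plus hex digits/hyphens.
# Captures the attribute name, the UID, and an optional DITA fragment
# such as "#topic_id/element_id".
UID_ATTR = re.compile(r'\b(href|conref)="(GUID-[0-9A-Fa-f-]+)(#[^"]*)?"')


def uid_links_to_paths(source: str, uid_to_path: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite UID-based href/conref values as file-name-based ones.

    Returns the rewritten text plus the UIDs that had no mapping; those
    attributes are left untouched so someone can review them by hand.
    """
    unresolved: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        attr, uid, fragment = match.group(1), match.group(2), match.group(3) or ""
        path = uid_to_path.get(uid)
        if path is None:
            unresolved.append(uid)
            return match.group(0)  # keep the original attribute as-is
        return f'{attr}="{path}{fragment}"'

    return UID_ATTR.sub(replace, source), unresolved
```

    A regular expression is used here rather than an XML parser so that the DOCTYPE declarations and formatting in DITA files pass through unchanged; the trade-off is that it only recognizes attributes written exactly in the assumed `name="value"` form.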

    AP: Exactly.

    BS: That being said, that’s the best-case scenario: you have two systems that use the same underlying content technology, and you just need to map a few things differently. There are other cases where you have a completely different approach to content from system A to system B. One might use XHTML, RTF, or something else, who knows? And then you move to another system that uses XML or Markdown or what have you. That is a bigger lift and shift, where you suddenly have to remap and convert everything to a new format.

    AP: And that’s really the distinction between moving to a whole new system and replatforming. What you just described there is really going to a whole new tool environment, a new process. Whereas what we’re talking about more is where you’re basically using a tool in the same area or a competitor of the tool you’ve got now. 

    BS: Mm-hmm.

    AP: And it’s just moving things over and fixing those little custom things that aren’t going to work in your new system. So yeah, there are all these levels here. And I think one thing we really need to communicate here, even when you’re replatforming from one tool that’s very similar to the new one you need, there is still work to be done there. It’s rarely just a very clean cut, lift and shift. And a good example of that is the publishing pipelines, because tools in this area have slightly different ways for publishing and getting your content out into the world.

    BS: They do. And even if you’re able to somehow lift your publishing pipelines from one system and drop them into another, the changes in how the new system handles the source content mean you’re still going to need to modify those pipelines. Take my example with links: because links are going to work differently in the other system, you need to tweak the output generators to handle them appropriately.

    AP: And another example of that is when your content system integrates with other systems. The way your content system integrates with a workflow management system, for example, may be different in the new system, and your product lifecycle software may also have to be hooked up differently. Or, who knows, maybe you’re changing those altogether. So beyond the publishing pipelines, you also need to look at how other systems are integrated with your content development system.

    BS: Right. It’s not a matter of just unplugging all the wires and plugging them back into the new box that you bought. It’s very different in some cases. One system may have a built-in API, while another might have no handling at all, and you have to build an API to talk to your portal, your workflow management system, your digital asset management system, what have you. It’s almost never clean cut. You can never just unplug those wires and plug them back in. And yes, there are usually no wires involved.

    AP: Yeah, well, and by extension of that, just like you can’t unplug and replug, you also have to think about people and how they have used that system to create and manage the content. You’ve got to help them understand the differences and basically remap and reprogram their brains: okay, you did it this way in system A, and you’re going to have to do it this way in system B. It’s a little bit different. So you still have the training and change management requirements. Again, it is not lift and shift, and that goes for people’s brains. It’s not gonna work like that. It’s just not.

    BS: Mm-hmm. No, no. And another thing that a lot of people tend to gloss over is the amount of testing that’s required once you get the system stood up. You have to make sure that all the content is valid in the new system, that it’s running and behaving properly, that you’re able to publish outputs, find where things might be dropping out as you publish content and fix those. So it’s never going to be a very straightforward project.
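    One concrete slice of that testing is checking that links still resolve after migration. The sketch below assumes, hypothetically, that the migrated topics are available as a dict mapping relative paths to file text and that links live in `href` or `conref` attributes. It covers only link integrity, not DTD validation or publishing checks.

```python
import posixpath
import re
from typing import Dict, List, Tuple

# Captures the file part of an href/conref value, dropping any "#fragment".
LINK_ATTR = re.compile(r'\b(?:href|conref)="([^"#]*)(?:#[^"]*)?"')


def find_broken_links(files: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source_file, link_target) pairs whose target file is missing."""
    broken: List[Tuple[str, str]] = []
    for path, text in files.items():
        base = posixpath.dirname(path)
        for target in LINK_ATTR.findall(text):
            if not target or "://" in target or target.startswith("mailto:"):
                continue  # same-file fragment or external link: not checked here
            # Resolve the link relative to the linking file's folder.
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved not in files:
                broken.append((path, target))
    return broken
```

    Running a check like this against both the legacy export and the new system’s content makes it easier to tell links that were already broken from links the migration broke.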

    AP: Yes. I agree, and I think this is a good time to like offer up some closing tips on things to think about, and I know one of them is this is going to take longer than you think.

    BS: Yeah, yeah. I’ve warned people to budget at least six months, and that doesn’t mean about six months. That means at least six months, and expect it to take longer. Even if it’s as close to a lift and shift as you can get, it’s going to take time. Part of the reason is that the system is going to be different, and you have to stress test it and make sure it’s going to work in a live working environment. But remember that you also have competing demands at work. You can’t have your entire team stop what they’re doing for six months, or even compress it into three months and say, “We’re going to stop everything, not do any production work at all, and focus on just standing up this new system.” You really can’t do that.

    AP: That never happens because there is no such vacuum on this planet in the business world, right?

    BS: No, you can’t stop. You cannot stop the production machine. You need to keep going. Aside from all of your daily job requirements, you now have the additional requirement of setting up a system. Trying to shortchange yourself with a short timeline is, first of all, unrealistic. Second, it’s not going to gain you anything. If anything, you’re going to implement things incorrectly, you’re going to start out of the gate with workarounds in the new system, and it just ends poorly.

    AP: And sometimes you’re going to have to keep that legacy system running. You’re going to have parallel systems, because it’s CYA, is what that is. Just in case something goes sideways with the new one, you still have your old process and can use it to deliver content that has to hit a particular deadline, for example.

    BS: Right. You’ve got to keep things moving. You can’t just stop work. But all that being said, the number one thing you need to do when you’re thinking about any type of shift in technology like that is to take advantage of the changes you’re going to be making. If there is content that you don’t think you’re going to need going forward, move it to the side. You might be able to move it in later. You don’t necessarily have to delete it, but don’t bring it into the system unless you know you’re going to need it. It’s a great time to do some spring cleaning on your existing content database. Move in the stuff that you know you absolutely are going to use, and then slowly start bringing in other stuff, or not, if you end up not needing it.

    AP: And then do a hoarder intervention, because you may need one. And that brings up one of the last points I want to talk about. Having an outside perspective, yes, like us, come in and help you think through this can also be helpful. And really, the last point I want to make, on the edges of this discussion, is that you always have to have an exit strategy, even when you’re going into a new tool. It really will benefit you to do something that seems counterintuitive: while you’re implementing the new tool, think about what you’re going to do if it goes away. The fact that you’re doing a replatforming already tells you that exiting is a reality, and sometimes you’ve got to do it for all the reasons we just outlined.

    BS: Mm-hmm.

    AP: So always be thinking about how are we gonna get out of here if we have to? That’s something that a lot of people in the heat of trying to get something new stood up, they really don’t think about.

    BS: Mm-hmm. Yeah, software isn’t forever. Systems come, systems go, they get improved. And your requirements are ever changing with the content that you need to manage. So not thinking about your next jump is really to your detriment.

    AP: And on that most excellent note, I will thank you, Bill, and we will end this here. Thanks.

    BS: Thanks.

    Conclusion with ambient background music

    CC: Thank you for listening to Content Operations by Scriptorium. For more information, visit Scriptorium.com or check the show notes for relevant links.

    Want to learn more about replatforming? Download our book, Content Transformation!



    The post Make the move successful: Replatforming content ops appeared first on Scriptorium.
About Content Operations
The Content Operations podcast from Scriptorium delivers industry-leading insights for scalable, global, AI-optimized content.