Large Language Models (LLMs) cure or curse for OSINT?
Speaker 1: Welcome to the World of Intelligence, a podcast for you to discover the latest analysis of global military and security trends within the open source defense intelligence community. Now onto the episode with your host, Harry Kemsley.
Harry Kemsley: Hello, and welcome to this edition of World of Intelligence here at Janes, your host Harry Kemsley. And as usual, by this time in the room, Sean Corbett. Hello, Sean.
Sean Corbett: Hi, Harry.
Harry Kemsley: Sean, we have spoken a number of times in this podcast about the power and utility of AI. It's come up a great deal in conversation. And indeed, recently, we've even talked about large language models. I'm sure we need to find somebody who can help us understand what large language models are. And to that event, I've been able to secure some time with one of our colleagues here at Janes, Harry Lawson. Hello Harry.
Harry Lawson: Hello.
Harry Kemsley: Harry, thanks for joining us. Now, the reason I've asked Harry to join us, Sean, is that one of the things he's been doing in Janes within the, so- called, red team is looking very, very closely at large language models. So in a minute, Harry, I'll get you just to help us understand what a large language model is. And one of the things you've been doing is looking at how large language models could be used, the pros and cons of use in and around an intelligence tradecraft. How can we build it into best practice? How can we use it in the process? And Sean, we've talked about tradecraft almost every podcast in the last two and a half, three years. So I think including how we use modern technology, and capabilities like large language models makes sense, right?
Sean Corbett: Yeah, it does. This is such a topical conversation, because there are two schools of thoughts right now. The AI guru would say, AI is the future, it's the best thing since sliced bread, and we just need to rely on that. Nothing else matters. And then there's the skeptics. They go, oh no, AI this and that. And the truth as always, is somewhere in the middle. But this is a really important one, because it gets under the hood or the bonnet, depending on where you live, in terms of what you can actually use it for, what the limitations, and where it's going next. So this is a great one.
Harry Kemsley: Yeah. That whole topic, Harry, of large language models inaudible. I'll get you to talk about what that is in a moment. But I think one of the biggest concerns that we've generally run into, when we talk about tradecraft, is the fact that the black box, that is the analytical engine of this artificial intelligence, that's the bit that people in part struggle with most. Is that, I can't get to the how it did it, to give me the answer, is one of those areas that we can look at. So just to get us started, Harry, thank you again for taking the time out of your very busy day to talk us. What is a large language model?
Harry Lawson: Well, first of all, thank you very much for having me. So a large language model is essentially, it's an element of the whole artificial intelligence scope. So what the large language model does is, it reads text that a human has created, it searches for information within that text, and then it tries to piece together an answer based on that text, based on its knowledge of how a human would interact with another human. So in the case of large language models that we know, that the public can see, these are things like ChatGPT, those types of functions. So you've inputted some text, it might be the question, a piece of research. You've inputted that, and it goes off and searches all of the internet. Now being ChatGPT, that is something we call commercial off the shelf LLM or AI, quite interchangeable in those words. And that searches everything on the internet. So there's no bars, there's no parameters on it. Searches everything on the internet and tries to piece together the information that it thinks you are looking for in the text that you've given it. What it will then do is, piece together the information it's found in a way that it thinks a human would talk to another human. So from the user's point of view, you've just typed a question into the search bar, and in 10 seconds it's come back with a response for you.
Harry Kemsley: And that human aspect, we'll probably come back to you later in the discussion, but the ability to have in quotes" a conversation." I've certainly had plenty of conversations that are apparently going on with large language models. In fact, Sean, I think I've sat next to you in an airport whilst you were arguing with the last... That this couldn't do what you wanted it to. Again, a matter for later perhaps. So how do we begin to look at these capabilities, large language models? And how they might, or might not be helpful to an intelligence tradecraft, which is, at the end of the day, something we have spent a regular time talking about. So we'd want to understand, how do we begin to do that?
Harry Lawson: Yeah, well just to add some context to what I've been doing in terms of exploring the use of LLMs. What I've been doing on the red team is essentially trying to answer the questions our customers ask us as Janes. So loads of metrics involved in saying, how easy was it to answer this question, this sub question, all this kind of... All the things that our customers are looking for. I then compare that against outside content. So anything that I can get via inaudible that I don't have to pay for, and incorporating, well, what can I get via asking an LLM? And for most of the time in red teaming, I'm looking at, as mentioned before, Binge ChatGPT just because that is the most openly available one, I don't have to pay for it. I can get into it quite quickly. So I've been looking at it at the point of view of, well, how useful is this in OSINT? How useful is this in helping a customer who might be an intelligence customer, someone trying to create an intelligence report, how simple is it for them to use, and how usable and what answers they're getting. So that's where I started off from, just kind of, yes, we're doing these comparisons, but then I decided to do a bit more of a deeper dive into, well, where is it getting these sources from? How's it interacting? How's the LLM producing the answers given me? And one way I could've decided to look at this, and do a bit of testing was to compare what I'm getting from the LLM with some already established intelligence tradecraft. So I've been looking at the US ICD 203, and essentially the tradecraft will be put down into almost a tick box exercise. So you've got different sections that you tick off to say, yes, this is good, or no, this is bad.
Harry Kemsley: And the ICD, sorry to interrupt you, but the ICD is essentially the manual, the guide on how it should be done, right?
Sean Corbett: Yeah, it's basically an Intelligence Community Directive, which is what it stands for, basically to lead the analyst by the hand in terms of analytical standards to follow, to make sure that you are being objective, and complete and all those other good things.
Harry Kemsley: Got it. Okay. Thanks Sean.
Harry Lawson: Yeah, and that standard involves essentially just a list of elements that, as I mentioned, you're trying to tick off. So it's things like, is it objective? Is this information objective? Is it timely? Is it aware of its source quality? All those types of things. And basically try and measure those tick boxes against the returns I'm getting from LLMs.
Harry Kemsley: Okay. So a direct comparison then between a tradecraft that's well established, an LLM, and then other sources you can probably reach out, and get from the open source environment. That's essentially the comparison you would do?
Harry Lawson: Yes.
Harry Kemsley: Great. Okay. So how did that work?
Harry Lawson: So essentially what I did is, I took some questions I'd already run via the red team. So I have two big essay type questions. They were already broken down into central element information. Sub- questions. So I was running those sub- questions. I think within that I had 43 sub- questions from those two big sprints. And those were concerning some questions that had performed really well for Janes, when I had done the red team exercise on them. So those questions were, how do the main battle tanks provided to Ukraine compare to those fielded by Russia? Very Janes strongly suited questions, talking about tanks, we love tanks. And then the second question, and it's associated sub- questions, were what is the ballistic missile inventory of North Korea? And where are they deployed from? So they're very Janes- centric questions. Yes. And from that I got 43 sub- questions that I went and essentially tried asking the LLM to answer for me. See what kind of results it gave me. And from that, I was able to come away with six areas that may be roadblocks for anyone that wants to use an LLM in OSINT. So those six are control, how much you control we have over the use of the LLM. The uniformity of the answers, so are they coming back looking the same? Are they different replication issues? So if I want to capture this information as part of my intelligence assessment, can I go back and do it again? Will I get the same answers? The manipulation of the sources involved, which will then manipulate the answer of the LLM. Timeliness of that information, and then the ethics around the questions and answers.
Harry Kemsley: Sean, I've got to comment on that. Because I think I've just heard part of your almost every episode mantra about things like replication, for example, and manipulation and so on perhaps become... Do mean in terms of manipulation, mis and disinformation being deliberately fed in?
Harry Lawson: Yes.
Harry Kemsley: Again, another topic, Sean, that we've talked a huge amount about. So those six factors scream the tradecraft relevant.
Sean Corbett: Absolutely. It is all effective, tradecraft.... What the really interesting thing for me is that, there's an assumption out there that, once you go into LLMs, there's a black box inaudible for you and you have to trust it or not. And of course it's all about the trust, but if you go back into the ICD 203, you have to show you're working out. So what the really interesting thing for me is that, we are able to show how the LLM does its working out, which is absolutely critical. Sourcing in particular is a really inaudible will come onto to it. The question then becomes, at what scale can you do that? Because I don't know, I might be stealing your thunder, but if you are running scripts on LLMs, you might have a million sources. How do you verify all those? I don't know, that's just a question I'm putting out there.
Harry Kemsley: Well, let's dig into Harry. So you mentioned control. Now I've heard phrases like, prompt engineering. Knowing how to prompt the LLM is a skill you can learn to get it to a place where it gives you, quote, " good answers," whatever that means. So what have you learned then from your studies through this analysis from a control perspective?
Harry Lawson: So as mentioned before, I was just using the Bing ChatGPT, and I had to take it verbatim, just as open as possible, open- ended questions to see what I could get from it. So first thing I did was looked at all the sources, those 43 questions. Well what sources the LLM used to answer those questions? I got 137 different sources, and they were broken down into different levels, and different types. And I think most interesting aspect was the source type. So I only got 3% of government type sources from the LLM. So that's normally where I would go as an analyst, would be the government area. This is the official statement, as close to a primary source as I can get. 12% were analytical pieces. Again, as an analyst I'll probably go looking at them a little bit more often to get a bit more insight. And then the big numbers, 39% were reference. And this is stuff like Wikipedia. It loves dragging out some Wikipedia sources. It's a great one- stop shop if you just need to do a bit of jumping off. In research, I think, everyone's taught that at university, it's great to be your jumping off point, but do not quote it, because the sources are within that quotes as well. And you can go find them individually. But the biggest percentage was news, and that was 41%. Those 137 sources were news. Now that draws up its own bigger issue. So I had another deep dive into those sources to see what was going on. So news was referenced as a source 56 times. Of that 56, only six were defense specific news websites or sources. So this is, things like, the drive of its raw element, forces. net, things like that. The other 50 were just your general news. BBC news, Daily Mail, Moscow Times, NPC. The problem with them, they are very reputable sources, they do some direct reporting. The only problem is that their defense journalists have to be very general in what they're talking about. They're not going to give you specific bits of information, they're not going to give you specifications, the difference between an A1 and an A2 variant of attack.
Harry Kemsley: Yeah.
Sean Corbett: So if I could just ask a question in terms of the timeliness of those sources, the news in particular, because you've heard me say this a million times, the first report is generally wrong, or certainly not complete. And particularly with the news cycle the way it is right now, you get a report, it doesn't really matter if it's right or wrong, because you moved onto the next thing. So were you able to drill into the timeliness of those sources in terms of, this has only just come out, and therefore there may be some waiting. That might be a level of order that you couldn't get down into.
Harry Lawson: No, I couldn't get down to that level with the news, but I certainly was able to do that with the reference sections. I'll come onto that when I get to my timeliness section. But just going back to news quickly, another thing that you have to think about with all news reporting is, there's going to be some bias in there. So I took those sources and I ran them through the Media Bias Checker, and it's free to use website, just put in the news type, and it will tell you which bias leaning is it? Is it right leaning, is it left leaning? And how factual is their information? So you've got things like, the Daily Mail went in there. It's obviously going to be mixed factuality, and to the extreme right of the table. And then things like the Guardian, says it's mostly factual, sits in the middle left. Just for a bit of fun, I put in Janes, to see where we sat. Said very high factuality, but just only a small fraction to the right. That's probably expected, inaudible. And another issue I found with sources whilst I was there, was sometimes you would ask the LLM for a source. So it'd give you the answer, you'd ask it, what was your source for that? And one or two times it would say, " This is based on my own knowledge." So I didn't even get a source to begin with.
Harry Kemsley: It's own knowledge?
Harry Lawson: It's own knowledge.
Harry Kemsley: And what do you suppose that means?
Sean Corbett: Slightly worrying.
Harry Lawson: Yeah, one slightly worrying. It's relying on information that it's generated in the past. It's perhaps, those are stored away in its data. It's calling them back in without having to reference a source for that. So essentially it's done two steps of looking for a source when normally it does one step of looking for a source.
Harry Kemsley: Interesting. But even so, and to your thinking, Sean, the ability to show your working, your sources, that answer is awful.
Sean Corbett: Is it is awful actually. And it goes back to one thing you said right at the start, actually. You're looking at the human that's created the text, what happens when the AI creates the text of which it then uses it, and reports against. Because if there's an error in there, or bias as you said, that just compounded.
Harry Lawson: Exactly. And if we even just go back to the ICD 203 tradecraft, this automatically red flags a few things for us. So the section, or to be aware of source quality, big red flag there, because I'm having to go away and check it, but the AI's not doing it's own checks there.
Harry Kemsley: I do want to get onto the other things you discovered, like you talked about, uniformity replication and so on. But I'm just curious, if you had selected a more, if they exist, a more bespoke LLM. An LLM that was designed for defense, could it be that you would find A, a better answer, and B, that you can be more comfortable about the source?
Harry Lawson: Oh, exactly. Exactly. There are know LLMs being developed at the moment, which are specifically tailored for OSINT applications, not just defense applications. And this relies on something called Retrieval Augmented Generation or REGs. And that's when you're saying, only look in these places for this information. So you're decluttering the space that the LLMs working in. You're saying, just look at these sources, bring them back.
Harry Kemsley: So at least in that regard, Sean, I guess you're saying the sourcing, at least, you should be less concerned about, more relaxed about. You still want to be point to the sources, but the fact you've got a source set that you're more comfortable with than the ones you mentioned earlier, that would at least be better. All right, so what else do we need to talk about, in terms of what we've learned from this LLM review?
Harry Lawson: So another section is a smaller one, than control, is just the uniformity of the answers. So if I'm a Janes user and I'm using Janes, I expect when I'm looking at say, an equipment record in Janes, all the sections are going to be laid out the same. I'm going to find weapons first, and they're going to find propulsion after that. Same with any country Intel profile, stumbled on the MESI system, so I know where everything's going to be. Every time I look at with the LLM, the results it's giving you are going to differ based on first of all what you've asked it before. So it might contain information from those previous answers, or miss them out completely. Sometimes it will give you just a list of information, bullet points. Sometimes it'll just give you an image with some text underneath it. Sometimes it'll just be a massive paragraph you have to go wading through. Another thing with the uniformity as well is that, in some LLMs you can decide how focused, or unfocused you want that answer to be from the LLM. So you can choose a conversation to start. You can ask for more creative, which is storytelling. Write me a story about building a missile, it'll go do that for you. You can ask for more balanced, which is what I try and use. Or you can ask for more precise, which is, it will go looking for more bullet- pointed information.
Harry Kemsley: Right.
Sean Corbett: So that's just similar to, you would with an analyst. The more specific a question you ask, hopefully the more precise and specific an answer you get. But this is the gray area where, this is why it's called intelligence, because it's not complete information, that you come up with your best assessment based on all the other good things you know. And this is where, would an analyst on three consecutive days give you a different readout on whatever the exam question was? I'd like to think not. But you don't know really because you've got that unconscious bias, you've got the conscious bias. They might have read something recently, gone, oh, this adds to the weight of sourcing. So it's a really interesting one in terms of do you trust the LLM? Or do you trust the analyst? Or is it going to be a combination of both? If you're use... Sorry, just to finish. If you're using tried and tested analytical techniques, which we've talked about previously, then you should be coming up within a parameter of about the same, so what.
Harry Kemsley: Yeah, so presumably, you could set up your interaction with an LLM, almost templating the style, the sources, and the way you would ask the question, to try and get a consistency, in both the answer and the quality answer. Could you? Sorry, it's a question on the statement. Could you actually do it?
Harry Lawson: Yes. If you developed a particular REG interface for that, then yes, you can say, I only want to look at these sources. I only want these types of answers. But that's something that becomes more of a paid partnership with other people, because inaudible to do that for you.
Harry Kemsley: And for the open source environment that we're talking about, and the listeners that we have, who range from governments to academics, to people who are just interested in the intelligence world, I suppose the tools that were available to them depend on what they're prepared to spend money and time.
Harry Lawson: Exactly. Yeah.
Harry Kemsley: All right. Okay, so we talked about control, talked about uniformity. Replication was one you alluded to earlier, in terms of visibility to give the same answer twice.
Harry Lawson: Yes. So this was something I first spotted when, as part of the red team, I'm trying to record the answers I'm getting anyways. So that's something we can go back through, find if there's information that Janes has missed, is there anything we need to add in? And I found when I was trying to copy the URLs for the LLM I'm using, I'd get the massive URL, we know the search number, and all that stuff. But when I went to click back to it, it would just take me straight back to the top of the LLM home page. So all that conversation had disappeared for me. And I waved a little flag in my head saying, what if I wanted to repeat this again and again and again, looking for that same answer, that's going to become an issue for me. So I did ran a little experiment with this. What I did is, I took one of those sub questions. I took the question, how many of each tank variant does Ukraine have? I'm expecting just a nice list answer, bullet points, they've got this, they've got this. I asked the LLM that same question at the same time every day for 11 days to see what would happen. And the changes were quite impressive in those answers. As part of this, I was resetting the VPN every day. I was running it through ghost email accounts when I was logging in to Microsoft to start using the LLM. So hopefully none of those cookies should be-
Harry Kemsley: So it wasn't remembering that it was you from yesterday at 11 o'clock?
Harry Lawson: No, it wasn't.
Harry Kemsley: Completely new person asking the exactly same question.
Harry Lawson: Yeah.
Harry Kemsley: Interesting.
Sean Corbett: And how did that compare with what Janes experts know the answer to be? Did we do that?
Harry Lawson: So the LLM response was very limited in what it would give me. So some days I would get a bullet point list, and each bullet point would be a couple sentences, and it'd be the top level of that tank variant family. So I'd get T- 64, T- 72, T-84 or inaudible, I'd get that. Other days it'd be just one big paragraph with strings of information in there, sometimes listening out, hi, it is also got T- 72 A1 and T-72 B, and different levels of information each time I was asking it. As opposed to Janes, yes, we could answer that question. We had to break things down, there are obviously some time differences as well, where we captured the more up- to- date information. It's to be expected, it's a rolling warzone, things are changing, chopping and changing all the time. But the LLM didn't do that as well. And you would expect it to do that if it has so much open source information to gather.
Harry Kemsley: Is there any clue as to why it would be so unable to replicate the same answer, or the same format? Is there any reason why, that you can think of, that it would be changing it so much?
Harry Lawson: I would think it's probably to do with the algorithm that's running in the background. So it may be that, in the hour previous, it had had been asked similar questions, or similar makeup type questions. So how many of type questions. And it would be predicting, well one of the response it wants to get. So it is trying to put out a big list of information. And then if you click one of its recommended questions, or ask it another question in the same stream, it could be then saying, okay, this is a popular type of answer that I'm giving. Do I need to change the answers I'm giving for similar types of questions when they come around?
Harry Kemsley: Notwithstanding the explanations to why it's changing, which is an interesting answer you gave. Thank you. I think from my perspective, if I'm looking at answers out of LLMs, I need to be very careful. I'm not trusting what I'm seeing. Because what I'm seeing today might be different tomorrow, or the day after. That's the point you're making, right? Is that that lack of consistency in the answer. Even if the data itself is changing like the number of tanks of a certain variant, which is changing by virtue of attrition of tanks, the fact that it gives you a different format, and arguably definitely different answers is the bit that should be a red flag.
Harry Lawson: Yeah.
Sean Corbett: Yeah. I think one of the really important parts of that is, you asked it exactly the same question. Because within the intelligence community we used to have a sort of slightly disingenuous saying that, there's no such thing as a bad intelligence answer. It's just bad questions, when you didn't answer the exam question that the boss said. And there is something to that, because the more detail you get... If you want to know about the main battle tanks, ask me about the main battle tanks. Not just what has Ukraine got, which was generally what used to come out. But this is interesting because it's a specific question and you're getting different answers too, albeit within the bounds of none of them are wrong, they're just incomplete.
Harry Lawson: They're incomplete, there are different sizes, there are different quality, and there are different templates when I see that information. So now if I think back to ICD 203, that's going to give me a little red flag on the point of visualizing data in appropriate ways. Or even exhibiting and implementing article trade trial standards, it waves the flag there definitely for us to go inaudible to consider, is this the correct information I'm getting in front of me?
Harry Kemsley: Yeah. Let's move on. I'd like to spend a bit more time on that, but I'm going to move us on to manipulation, because it's a topic that we've spent so much time on. Mis and disinformation, perhaps one of the biggest threats we think to the way we all understand what's going on in the world around us. So if that's happening in the AI world, then that should be a real concern for us. So what did you find in terms of manipulation?
Harry Lawson: Manipulation is one of my favorites results of this. I found some quite interesting stuff. So what we need to consider with any LLM is, it's running off of a search engine function most of the time. Or it's running off, find all these bits of information for me. So it's got free rein on what it's looking for. If it's a commercial off the shelf one, if it's got REG, it's going to be looking at very specific places. But if you know where that LLM is looking, you can start manipulating the sources that it's dragging to the front. If we think about the Google search engine. When you type into Google any piece of information, top four results normally have a little ad sticker next to them, and those are paid promotions. So this is where you've shifted some money to the search engine builder, and they'll say, yeah, we'll put your stuff in at the top. Happens all the time. Happens with LLM as well. You can pay for your stuff to be near the front.
Harry Kemsley: Interesting.
Harry Lawson: And you can say whatever you want, and their source, they might be incorrect, but you put them to the top, and they're going to turn up every time. That goes to the same for sponsored promotions. So the actual LLM builders being, may say, every time someone looks for a map, we drag up Bing Maps. We don't use Google Maps, we look at Bing Maps, and that comes up every time. I know in the analyst world, there's a big discussion of who's better there, so you can decide if that's a good thing, or a bad thing for you. Buzzwords as well. So if you are trying to push your source to the front of any search results, you can just put in buzzwords which people are looking for. So I could write an article on tanks that has nothing to do with Ukraine, but if I put in Ukraine five times, that's probably going to be pushed up the list further, than if I didn't put it in there. It might have nothing to do with the question, but it's there, and the LLM is going to look at it. Another thing which is inaudible, if I'm doing OSINT, things like blind tagging. So on Twitter, or as it's known now X, you might see that you are looking for a particular hashtag. Hashtag explosion or something. You are going to get thousands of results that are using that hashtag, but it's got nothing to do with it. For them it's traffic mining.
Harry Kemsley: Traffic bait.
Harry Lawson: Exactly. Traffic baiting people through. Another interesting thing, and this is probably why I had the most fun with this, was looking at the source tampering. So I've mentioned, you could put in your buzzwords. But one thing I found, which it still happens now, the editors are quite quick to pick this up, that this is wiki vandalism. So this is where you go in, if we think about it, 31% of the results were Wikipedia- based. So this is where, as you know, Wikipedia's quite open, you can go in and edit it, and then you have to wait for an admin to approve it, or change it, change it back. But there have been some cases where this has fallen through the cracks. So I think one example I found was talking about the first law of thermodynamics. And in the first sentence it says, first law of thermodynamics is, you don't talk about thermodynamics. And that got through after it was edited, and someone had to go back and edit it again. Another one that I found, which just blew my mind completely was, Spot the Dog. Within the first section of Wikipedia insinuated that Spot the Dog, was written by Ernest Hemingway. And opens with the line, " Where is Spot? He is he under the stairs? Is he in the box? No, he's in the bar, sipping whiskey and sucking on cigarettes." Types of things that it takes time for those admins to go and find and update. But if I was looking for information on thermodynamics or Spot the Dog, would the LLN drag that through, and I'd have to take that as the answer? It's something I have to consider though.
Harry Kemsley: I think if you were doing a series, I'm going to go to your example earlier, the thousands of websites, or the thousands of searches you've done and your scale as a task. Little bits and pieces of information that are coming through that are completely spurious can throw the whole thing off.
Sean Corbett: Yeah, absolutely. And one thing I found when we were sitting in airports again, and you've got to not humanize, ChatGPT, you've seen me arguing with it. But I was doing just some of my own very less scientific research, and I put stuff on climate change. Because it's quite contentious topic, without going into any detail or any politics. It was very definitive about a certain question I'd asked it. So I challenged it, I said, "Are you sure? That just seems very biased." And there was a pause, which it doesn't often do, and said basically, yeah, you might be right. I might have been biased on that. And that shock... It didn't actually admit bias, but it said, I tried to consider everything, but sometimes I get it wrong. And I ran it again and it came up with a far more balanced answer, which was interesting.
Harry Kemsley: Interesting. This is actually fascinating, because what we're seeing here is exactly the same sort of fun abilities that we often ascribe to human beings. Biases has actually been coded almost, not into, a deliberate act, but it's almost inevitable we get the same sort of fun abilities with the tech. But of course we're assuming the tech is objective, it is not biased, it is somewhat more inaudible in this work, because that's what we think machine code means. But of course, it is only as good as the coding that was done. Speed though, I know that what I've used chatGPT as you described with Wikipedia, it's a good kickoff point. I want to get a sense of, what should I think about in regard to a holiday in Sweden for three days? What are the top things I should do? I could ask a variety of different places, I'll just go to ChatGPT, it runs an itinerary for me. It dose it in seconds. So speed has got to be an advantage with these sorts of things.
Harry Lawson: Yeah. When I was looking at timeliness, there's two sides to timeliness, that's exactly what you mentioned there of, I've got the information as quick as possible. I don't have to mess, it about looking through sources, and asking people questions. It's just there for me. So on one hand, bit of a green flag there for LLMs and OSINT. But if you look at timeliness from a different point of view of, well how relevant are the articles I'm getting, which is something we alluded to earlier on. That's where it becomes a bit more of a red flag. So I looked at one of the answers it gave me about tanks in Ukrainian service, looked at... I think it gave me something like it's got 31 M1 Abrams in service with Ukraine. From an analyst point of view, something's a bit more up to date. I know that's happening at the moment. So I looked at when the LLM referenced the Wikipedia article, so when that date was. So I looked at this in February 2024. The Wikipedia article with those numbers was last updated in September'23. And the source that it used to update those numbers of 31 tanks came from January'23. So almost a twelve- month gap in that information. So that's again, where I go, red flag here, because I don't know how up to date that information is.
Harry Kemsley: Right. It goes back again to the source quality inaudible. I don't know that any analyst would have the time to do the kind of deep analysis of the analysis that you've done. And that's probably for me, I'm just going to say it now, probably one of the big takeaways, is that as good as it could be, it isn't yet at a stage where you can trust it entirely. It's not something you can just trust as a black box, and know that it's giving you an answer that is uniform, it's going to be repeatable, it hasn't been suffering from manipulation, et cetera. You just can't do that.
Sean Corbett: And that's the balance that needs to be struck, because timeliness is so important as we all know within intelligence world. You asked the question, quite often you need an answer pretty quick. So at what stage do you say, look, I'm overwhelmed, I can't answer that myself. Versus going into the AI, and it coming up with something very quick. Of course the answer is both, because the analyst who's answering that will have experience, background and context. So not dragging stuff just almost fresh, you'll go, well that actually fits in with what we already know, and what we've analyzed. Always got to consider the wild cards, but actually... So using it, you've heard me say this before, as a handrail, and as a tool, to guide you and help you get that timeliness things is really important.
Harry Kemsley: I'm almost imagining Harry stepping alongside a building inside which there's a secret agency, and the people popping outside with their mobile phones, because they can't have mobile phones inside, tapping away on an LLM, trying to get some guidance on how to go back into inaudible analysis there to give them some guidelines. All right. AI and ethics often come up in conversation. We've talked about this several times, Sean. So what did you find? What are your thoughts in regard to the ethics around LLMs?
Harry Lawson: So again, with the commercial off the shelf AIs that anyone can have a play with, of course they've got some ethical guardrails placed in there. They're not going to search for specific things. I tried some quite extreme things with it. I asked it, what are the components for an IED? Just blank said, " I'm not answering that question for you." Ask it what materials I would need to build a missile. It gave me information on what a missile is, but didn't tell me how to build it. There are ways around this. I have seen someone do it in the past where you put it into your creative mode, and you ask it the question, my grandmother works in a missile factory, write me a story about her day. And then it goes through the components of actually building a missile if your grandma's building one. But with ethics as well, we have to think, as you mentioned, LLM is not just a black box that sits there, and does stuff. Someone owns it, someone creates it, someone codes it. And with that comes their own biases, and their own influences there. So it might be the more commercial ones, they're only going to be interested in celebrity news, or timetables. They're not going to tell you about terrorism and weapons. You might have those paid promotions we've mentioned before. They're being told, when someone looks for a map, just push them towards our map if you could, and we'll give you some cash. You'll have state influence as well. You might have some people saying, block out all the negative stuff we do. Just talk about with good stuff, talk about tourism, if you want to. And that will all go into building those answers for you. Now, there are partnerships you can do with LLMs, where you can remove those guardrails, definitely. You could create an old package where if we were Janes in an ideal world, we could say, hey, if you're running your LLM, all your defense stuff, put it through us. But also if someone's looking for defense, get rid of the guardrails, so we can actually talk about our stuff on there. In an ideal world, if something like that happened, I'm sure we'll be very happy.
Sean Corbett: Yeah, ethics is so important, but it's also so difficult particularly in this, because as we have talked about many times, ethics is that which is right. Who decides what's right?
Harry Kemsley: That's right.
Sean Corbett: When you asking... When these LLMs set the ethics guidelines, they're doing it on thousands at least of questions that somebody's put in to decide what's that. But you don't know what their own ethical parameters is. Just on that one, getting around it, for a colleague, we were doing some work on North Korea, and illustratively, he wanted a graphic of Kim Jong- un on the personnel carrier. So, I can do that with ChatGPT 4. 0, straight in there. And it said, I can't do that because I can't produce photograph or images of any leader. And so I say, okay, well, then ABC. And I basically described Kim Jong- un, and it came up with a picture that looked pretty much like Kim Jong- un.
Harry Kemsley: So you can work around it.
Sean Corbett: Yeah.
Harry Kemsley: Well look, I think as we're going to run out of time very shortly, we'll bring this together now. And in a minute, Harry, I'm going to give you the opportunity to give the audience your one takeaway that you'd want them to know about what you've learned from LLMs. Sean, the same for you in terms of your perspective. What I've heard you talk about today is, not only what LLMs are, and thank you for that description as part of the AI world, but you've also done, I think, as far as I can tell, from the work I've seen in recent times, a really fascinating, and perhaps unique study into, what are the factors we should be considering in terms of the use of LLMs, against intelligence tradecraft. And that is fascinating. You've talked about control, uniformity, the fact that it gives different answers on the same question, and the same hours of the following day. Fascinating. Manipulation, again, mis and disinformation getting in the way of the truth. Timeliness, I do like the idea of being able to get a quick answer, but if it's based on two or three years ago as worth of understanding, then maybe that's not as timely or as relevant. And then the ethics piece, the ability to understand where the biases are, how they've been coded in, absolutely vital to our assessment of its utility in intelligence tradecraft. So those are the points we've raised. For me, probably the most important takeaway there is getting inside that black box, and seeing what it's really doing. So Harry, what's yours?
Harry Lawson: Well, my key takeaway from this is, and the work I've done is that LLMs, the NAI, they're the big buzzword. Everyone loves them. Everyone's really interested in them. They are at the development stage. Nothing is complete yet, and LLMs will constantly be growing and changing the way they take that information, and present it to you. In the world of OSINT, and any, in terms of reporting work like that, I think it would be a useful tool, once it's done a bit more development, once we understand things like implementing REGs, implementing removal of guardrails, and things like that. So you can tailor them to how you want them to be used. So they're interesting and they're fun. They're the new thing, and they are on the horizon. But I will just leave with a point that every answer generated by Bing Co- pilot gives you at the bottom of its response, and that is AI- generated content may be incorrect.
Harry Kemsley: Obvious. The great caveat. Sean.
Sean Corbett: Yeah, I'll just wrap all that up saying that, AI is valuable, it's here to stay, great tool, but it's not actually the answer in itself. And at least for the, I'd say, short to midterm future, at the very least, it is not going to replace the analyst. The analyst just needs to use it as a tool. And until and unless it can demonstrate all those tradecraft things that we've spoken about, it's not going to be trusted as a single version of the truth.
Harry Kemsley: Fantastic. Thank you again, Harry, for taking the time out of your busy day to talk to us about this. I think we're going to want to revisit this, particularly as your understanding of LLM improves. For the listener, any questions or comments on what you've heard, stand by, we're about to have our own email address. Good Lord, we're entering the 21st century, so you can email us directly rather than through the GeneralJanes. com email address. But until then, thank you for listening. And until the next time. Goodbye.
Speaker 1: Thanks for joining us this week on the World of Intelligence. Make sure to visit our website, janes. com/ podcast, where you can subscribe to the show on Apple Podcasts, Spotify, or Google Podcasts, so you'll never miss an episode.
DESCRIPTION
Artificial intelligence (AI) and large language models are becoming a mainstay in our daily lives, but how are these tools being used in delivering open-source intelligence? Janes Red Team Analyst Harry Lawson explores the role these tools have in intelligence tradecraft, uncovering the balance between cutting-edge technology and established analytical standards.