Producers Aubrey and Shannon review the latest articles and research on using AI in OSINT. Should you consider using AI chatbots in research now or in the future? And if so, how can you do so securely and with verification in mind?
Shannon Ragan
Which I think is the problem a lot of people are facing right now with using AI for their skill set, for their main job. Presumably you're good at your main job. Welcome to NeedleStack, the podcast for professional online research. I'm Shannon Ragan.
Aubrey Byron
And I'm Aubrey Byron. Today we're wrapping up our discussion on AI and OSINT by looking at the big-picture state of affairs, plus some tips and precautions for using it in your research.
Shannon Ragan
Yes, it's been a great discussion with our recent guests. Shout out to Chris Poulter, Neil Spencer, Trent Lewis and Declan Trezise. Wonderful discussions with all of them. We really enjoyed having them on, and I think we've learned a lot along the way. What we want to talk about today is the perspective we've come to through these discussions, which hopefully our listeners have come to as well. To color the conversation, we also wanted to bring in some new material. There have been a couple of great articles on AI and OSINT recently: one on a specific OSINT use case, geolocation, and one on the fraudster, cybercriminal, and scammer side of things. Being able to clone LLMs and use them for good or ill is just going to be the way of the world in the future. So should we start with the Wired article? Should we start with the ill?
Aubrey Byron
The ill? Yeah. Wired put out a great article summarizing some research on AIs like FraudGPT and WormGPT that are already coming out and are going to assist scammers, particularly low-level phishing scammers who probably weren't that good at their jobs, if you can call it that, before. It's going to up everybody's skills. Phishing emails are going to get better, more accurate looking and less suspicious looking all the time, and everyone, from security teams to users, needs to be on the lookout for that.
Shannon Ragan
The article covered these two. One is predominantly for phishing, and one was touting that it was for creating, quote, undetectable malware, finding vulnerabilities, and assisting with scams as well. This is a window into the future, but the article was also good to point out that there's no telling whether these are actually legitimate. If you're a lowly cybercriminal or scammer and you decide to hand over your money for this, there's no guarantee you'll get anything in return. The dark web and marketplaces like this are full of illegitimate products and services that are just trying to scam the scammers. So, scammers, buyer beware: this might not be a real thing.
Aubrey Byron
Scamming the scammers. What a moral conundrum.
Shannon Ragan
What do I do? But even if these two aren't legitimate, there will be ones that are. Think of the number of LLMs that are going to come out. When ChatGPT first came on the scene, it was like, oh, what is this one thing? And now it's like, oh no, what is this universe of things that is going to change the way we communicate and produce content and produce anything, really? It's just huge.
Aubrey Byron
You know, this is a callback to episode 17 in season one with Eileen Ormsby, where she talked about hitman services on the dark web that are really just scamming people out of their money, because why would they actually go do that?
Shannon Ragan
Seems very dangerous. "Just give me your money and I'll run." You know, these two are touting specific services for phishing or malware or whatnot, but AI can be used for anything you want. One of the main lessons that has come out of these discussions with AI experts in OSINT is that we're really going to have to start doubting everything, which feels so horrible to say. The issue of truth has been up for discussion a lot in recent years, but this is such a game changer for people whose job it is to collect and verify information. On one hand, OSINTers will be really good at doing this, but on the other hand, it makes their job that much more difficult, because information you assume is reliable is that much more likely to have been manipulated.
Aubrey Byron
Well, speaking of reliability, another article that came out recently from Bellingcat tested Bing and Bard to see whether these AI chatbots can geolocate an image, specifically one without EXIF data. For most OSINT researchers, if an image has EXIF data, bam, the job is done: unless it's been manipulated, you've geolocated that image. So the test focused on images with no EXIF data, meaning no coordinates stored in the image's metadata.
Shannon Ragan
Yeah, it's a really detailed article. There's so much black boxing in AI, and one of the nice things the article does is share screenshots of the responses, particularly Bing's, which show that Bing is much better at walking through how it arrived at its conclusion. So you could go back and either replicate those steps on your own to verify the information, or distrust a step that isn't what you would do. But its approach seemed pretty basic, because as the researcher points out, Bing is essentially teaching itself how to be a geolocator from the Internet, from articles and ten-step programs and things like that. It's learning the way a novice would, but presenting its information in a very authoritative way. Bing also does a good job, though, of saying, "The credibility of this one is low. I'm not going to tell you for sure this is where it is, but this is where I think it is." So it was really interesting, the way the article broke down the inner workings of what's going on in the AI and then the commentary on the findings as well.
Aubrey Byron
Yeah, the other interesting part for me is that the problem with this exercise is the same problem with getting anything out of ChatGPT or the other current AI chatbots: it takes excessive prompting to get the results you're looking for. Even in this case, when the writer told it there was no EXIF data, it hallucinated some at first, and he had to say, no, that's not correct, and go back. But I find it interesting that on one of the tests (he did several), it gave him coordinates, and they weren't correct, but they were close. It got part of the way there.
Shannon Ragan
Yeah, and that was a lot of what he was saying. Because of the other information it's looking at on the Internet, or the language it's built from, it would reference things that weren't in the photo but were nearby, since it's understanding certain things about the image in the context of all this other information. So it can get you close in that sense if you have no idea where in the world this could be. It's like, oh, maybe this is in Edmonton or Ottawa, to use the examples in the article. But for an OSINT researcher, close is not necessarily good enough. You're looking for this street, this person, this shop, whatever. Close is not great.
Aubrey Byron
Yeah. And you need to be skeptical and verify the results it gives you, because you're probably going to get several errors back, at least for now. His conclusion at the end of the article is that using an AI chatbot for geolocation is inadvisable right now. But the big asterisk there, as with so many things in AI, is "right now," because this technology is going to keep improving, and in the near or distant future it might be more worth it. The other aspect is how much time you spend prompting a chatbot to get inaccurate results, time you could be using to just geolocate the image yourself. Right now, that scale is probably tipped a little too far against the chatbot.
Shannon Ragan
Yeah, you could definitely sense a degree of pain in the researcher's writing. It's like, "I asked it to do this, and then it did this. Get a load of this thing." Which I think is the problem a lot of people are facing right now with using AI for their skill set, for their main job. Presumably you're good at your main job, and AI is strange at the thing that is your niche. If you're really good at it and you ask it to do simple tasks, it's like, well, I didn't really need you to do that; I can do that in a second. And if you ask it to do things you're not good at, it's hard to verify the results, because you don't have the background knowledge to know what you can trust. It's in a weird spot right now, but as you said, it's moving at lightning speed. Think about where we'll be with this at the end of the year. If the same article were written in December 2023, it would be interesting to see how much better it had gotten. We've seen the change from GPT-3 to 3.5 to 4, the enormous strides it has made on these rigorous tests between those versions, and it is leveling up at frightening speed.
Aubrey Byron
Right. We're in the dial-up era of ChatGPT, and make no mistake, there will be a fiber connection in the future.
Shannon Ragan
Yeah.
Aubrey Byron
By the way, the articles that we mentioned are by Matt Burgess and Dennis Kovtun, respectively.
Shannon Ragan
You know, when you were talking about when we'll get to fiber, I was thinking about trying to gain some perspective on this. Because it's new and very disruptive, people have a lot of feelings around using or not using AI and how good it is at what we're asking it to do. It helps to look at this in the long line of technologies that have disrupted, but in the end benefited, OSINT. OSINT grew out of clandestine, non-open-source intelligence work in government and spy agencies, which nonetheless used newspapers, media, and other open source information. And then you get the Internet. I cannot imagine how frustrating it would have been for a career CIA guy sitting at his desk in '92 and being told, "We're going to connect you to the Internet, and you're going to need to start using it all of the time." He probably had it in, like, '89 or something, to be fair. But there's that feeling of, oh, is this really something I have to start mastering, and very quickly? The Internet, the search engine aspect of the Internet, social media.
Shannon Ragan
Again, if you told that same guy, "I'm really going to need you to get on Facebook. Now I'm really going to need you to get on TikTok and Discord," and all these other things, you just have to keep constantly learning. So I understand the pushback, but now is the time to at least start getting comfortable talking to these things: learning prompt writing, and learning when to distrust the output it gives you versus when to bake it into your everyday workflows. Now is at least the time to start experimenting and adopting, because it is going to get baked into everything very soon.
Aubrey Byron
Absolutely.
Shannon Ragan
This has been my public service announcement.
Aubrey Byron
I will say too, we all know we need to verify the results it gives us. But if you have not been using ChatGPT, you may underestimate the vigilance you need when verifying. When I had heard about hallucinations, I was expecting something like "the sky is red, not blue," or something slightly more interpretive. But I was asking it to summarize articles around OSINT to see if it could, and for one of them it made up a division of the People's Liberation Army in China. It made up the division, a name for it, an event that happened with this division, and a year, all supposedly from the article I was trying to get it to read to me, and none of the pieces of what it made up exist. When I asked it where it got the information, it told me paragraph two. I reread it and said, no, that's not there, and we just went back and forth. The creativity with which it makes stuff up may take you by surprise. So vigilance is definitely needed.
Shannon Ragan
This came up in the Bellingcat article as well. During one of the tests, it says it ran a reverse image search and is like, "Oh, I found it on Flickr, and here's a link." So it makes up a URL, and he checks, and it's nothing. It's a 404 page. He looks in the Wayback Machine, and it's nothing. You just made up a page. Or at the very least, if this is somehow an artifact from the Internet long ago, he can't verify it, so it doesn't help him in his search. So yeah, it's interesting to see the shapes hallucinations take, not just in spouting claims but in providing resources, events, links, and things like that. It's just weird.
Aubrey Byron
Yeah. Because it may not know the answer, but it does know what a URL looks like and can provide you with one.
Shannon Ragan
Yeah, sure.
Aubrey Byron
"Here you go." Yeah. And then of course, the other thing is that, as with anything that has a perspective, there are built-in biases you need to consider in your research as well.
Shannon Ragan
Trent Lewis's episode focused a lot on thinking about the way these things were built. Like all technology, it inherits some of the traits of its builders. So how does that affect outputs? There may be certain people it targets especially, or groups it just doesn't understand because it doesn't have as much data on them. So it's not necessarily malicious, but it can still be wildly unhelpful.
Aubrey Byron
Absolutely. So beyond being skeptical of everything it gives you, another thing to consider is that a purpose-built AI, like some of the things Babel Street is coming out with, or Navigator, or PuriTech, something that was built for OSINT from the start, is going to be a lot better than general-purpose chatbots.
Shannon Ragan
Or if you have the ability to work with a data scientist and create your own LLM for your own data sets, that's a way to really cut down on the risk of hallucinations, because it's not searching the entire history of the Internet.
Aubrey Byron
Not to mention the security aspect: you're not feeding your data to a program when you don't know what it will do with it. And that's kind of the last recommendation. We talk about this a lot, and it may seem unrelated, but I feel like it's very similar to our recommendations on using the dark web. We do recommend using the dark web for open source investigations; we have a whole series on that, and a lot on our website. But one of the most important parts of that is having a policy with your employer on how you're going to use it, the legality, and how you're going to audit how you use it. All of that applies here. You need an AI access policy.
Shannon Ragan
Also centers of excellence within companies, building best practices for when to use it, how to use it, and how to verify it. Building that educational base is huge as this gets off the ground, because people want to be able to harness it, but they want to harness it successfully and not mess up.
Aubrey Byron
Absolutely.
Shannon Ragan
All right, well, maybe we can stop talking about AI for a while, but.
Aubrey Byron
I highly doubt it, unfortunately. But at least on this podcast, maybe.
Shannon Ragan
Yeah, we'll try to spare you. Well, thanks for listening today as we wrap up our series on AI and OSINT. If you liked what you heard, you can view transcripts and other episode info on our website, authentic8.com/needlestack. That's "authentic" with the number eight, dot com, slash needlestack. And be sure to let us know your thoughts on Twitter @needlestackpod, and to like and subscribe wherever you're listening today. We'll be back in September with more discussions, tips, and advice on open source research. Thank you. We'll see you then. Bye.