Monday, May 16, 2016

Panama Papers in Maltego

By now everyone knows about the Panama Papers and the Offshore Leaks. If you don't, you should read about it [here]. We've downloaded the CSV files from them, imported them into a SQL database, and written some transforms for Maltego. That's the context.
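As a rough illustration of that import step, here is a minimal Python sketch using SQLite. It is not the actual import script: the table name, the sample columns (node_id, name, countries), the made-up rows and the load_csv helper are all assumptions for the example.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Tiny in-memory stand-in for one of the downloaded CSV files (made-up rows).
sample = io.StringIO(
    "node_id,name,countries\n"
    "1001,EXAMPLE PERSON,South Africa\n"
    "1002,ANOTHER PERSON,Panama\n"
)
conn = sqlite3.connect(":memory:")
load_csv(conn, "officers", sample)

# A transform would then be little more than a parameterised query.
rows = conn.execute(
    "SELECT node_id, name FROM officers WHERE name LIKE ?", ("%EXAMPLE%",)
).fetchall()
print(rows)
```

Everything is stored as text here to keep the sketch short; a real import would probably want proper column types and indexes on the node IDs.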

Disclaimers. You should really really read this!

First off - some disclaimers. I know nobody ever reads disclaimers but these are pretty important so you really need to read them.

Disclaimer 1: Not everyone in the database is 'bad'. Having an offshore account is not a crime. There are good reasons to have one. Like they say on their site: "There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly."

Disclaimer 2: People have the same names. Who would have thought?! You find someone in the data and go 'oooh! Het jou katvis!' - but remember that it could be someone else with that same name. Manually verify results - always!

Disclaimer 3: The data is not very clean. There could be four entries for the same person and in Maltego these nodes will not merge (different node_IDs). You'll need to manually merge them if you feel like it. Of course, see 2 - e.g. they could be four different people. The same goes for addresses - the data was clearly captured by hand, so people write the same address in many different ways. Best thing here is to take the most significant part of the address and search for that - then manually verify.
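To give an idea of what hunting for merge candidates could look like outside of Maltego, here is a small Python sketch. It is only an illustration under assumptions: the records are made up, and normalise and merge_candidates are hypothetical helpers, not part of the transforms.

```python
from collections import defaultdict


def normalise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name)
    return " ".join(kept.lower().split())


def merge_candidates(records):
    """Group node IDs whose names normalise to the same string."""
    groups = defaultdict(list)
    for node_id, name in records:
        groups[normalise(name)].append(node_id)
    # Only names that occur more than once are merge candidates.
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# Made-up records: (node_id, name as captured).
records = [
    (1, "John  Smith"),
    (2, "JOHN SMITH"),
    (3, "Smith, John"),
    (4, "Jane Doe"),
]
candidates = merge_candidates(records)
print(candidates)
```

Note that 'Smith, John' is not caught by this simple normalisation, and that a match only means the names look alike. See Disclaimer 2 - you still need to verify manually.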

Disclaimer 4: The transforms might break. I am not even a proper coder. It should be OK, but when a query does not return or stuff falls apart then remember this disclaimer. If we get a LOT of interest on this then we might rewrite the transforms properly. Also - there's a lot of improvements that can be made on the transforms. Display info etc. etc. Don't tell us - we know this.

This was hacked together on a Friday afternoon and a Saturday night and by the end of the day it seemed very useful and that's why we're releasing it now.

With that out the way, let's first see how to get the transforms and entities into Maltego. We thought about adding this into the Transform Hub but decided against it. It's cool, but it's not THAT cool. That means you need to install the transforms by hand. Luckily, it's pretty easy.

How to install

In the transform hub, click on the [+] sign. Fill in the fields as you wish. The only part that needs to be the same as our example is the seed URL. The seed URL is []


Once you've filled it in, hit OK. You'll now see the item appear in the Transform Hub:

Hover over it and click on 'Install'. It should look something like this when you're done (this is Maltego 4, but the other versions should look similar):

Woot! Now you're ready to start using the transforms.

How to use 

Before we start we want to quickly discuss the data. There are 4 tables. Officers (people), Entities (companies, trusts or other legal entities), Addresses (duh - addresses), Intermediaries (think agents or companies or people doing the work on behalf of the officers). Then there's a table that links all of these together. 

There are 5 entities in Maltego - Officers, Entities, Intermediaries, Addresses and Country. The transforms implement an almost fully meshed grid between these with a couple of spaces where it's not really applicable.

The starting point for all transforms is a Phrase. As the data is mostly linked by node IDs you cannot start with any of the 'PanamaP' entities as you don't know what the node ID is. You always start with a Phrase and search from there.
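The idea of going from a Phrase to node IDs and then following the link table can be sketched roughly as below. This is a simplified Python/SQLite illustration, not the transform code itself: the table layout (officers, entities and an edges table with node_1, rel_type, node_2), the sample rows and both function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE officers (node_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entities (node_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (node_1 INTEGER, rel_type TEXT, node_2 INTEGER);

    INSERT INTO officers VALUES (1, 'EXAMPLE OFFICER');
    INSERT INTO entities VALUES (10, 'EXAMPLE HOLDINGS LTD');
    INSERT INTO entities VALUES (11, 'UNRELATED CORP');
    INSERT INTO edges VALUES (1, 'shareholder of', 10);
""")


def search_officers(conn, phrase):
    """Phrase -> officers: substring match on the name, returns node IDs."""
    return conn.execute(
        "SELECT node_id, name FROM officers WHERE name LIKE ?",
        (f"%{phrase}%",),
    ).fetchall()


def entities_for_officer(conn, officer_id):
    """Officer node ID -> linked entities, via the link table."""
    return conn.execute(
        "SELECT e.node_id, e.name, g.rel_type "
        "FROM edges AS g JOIN entities AS e ON e.node_id = g.node_2 "
        "WHERE g.node_1 = ?",
        (officer_id,),
    ).fetchall()


hits = search_officers(conn, "example")
linked = entities_for_officer(conn, hits[0][0])
print(hits, linked)
```

The first query is why you start with a Phrase: it is the only step that works on text. Everything after that works on the node IDs it returns.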

Let's see how this works. Let's assume we're looking for an officer called 'Hillary Clinton'. We suggest looking for just the word 'Clinton'. We drag a Phrase entity (in the Personal section) onto the graph, double click on the text and change it to 'Clinton'. Then we right click on the entity to bring up the context menu, navigate all the way to the top (right click on the menu) and select the Panama Papers transforms:
In that group we select the 'PP Search officer' transform:
This results in:
Let's assume we're interested in one of the nodes and want to see what entities and addresses are connected to that officer. We select one of the nodes, right click and run the 'PP Get details' transform:
We can do the same on the Entity that's returned from here:

And so the story goes on...

Another interesting way to look at the data is to start looking for the Addresses. This is sometimes useful to identify Officers from certain locations. For broader searches you can start from a country...

Let's see which officers stay in Beverly Hills. We start with a phrase 'Beverly Hills' and run the 'PP Search addresses' transform:
We get 47 addresses in Beverly Hills that are in the database. Let's see what's going on there. We select all the nodes and run the 'PP To officers or entities here' transform:

...but wait...

Does 'Beverly Hills' exist in other countries too? Yes. In Australia. In Hong Kong. Probably in other countries too. So we need to remove them. Control F, type in 'Hong'. Hit find. Control shift down arrow (select children). Delete. Rinse and repeat for others. Hmmm.. perhaps Beverly Hills was a bad choice. There's even a Beverly Hills in Ballito, South Africa. Really? REALLY?

Anyhow. Rinse. Repeat. And then:

Pretty please read the disclaimers at the start of this post. You probably scrolled to the end right away. But please read them.

And this time, for realsies -- use responsibly!

Tuesday, May 3, 2016

Maltego 4 - it's finally time...


Maltego 4 is finally here. Click on the picture below to view the release video:

Download the software [here]

...but if you want to know more...

The Maltego 4 story

In March of 2015 Chris, Andrew and I sat in a room in Cape Town to decide which feature to build next. It's one of the hardest challenges managing Maltego - deciding what to do next. There's always at least five major features competing for our attention. Be that geospatial view, temporal view, feeders or a browser plugin - there's always the next big thing waiting. We argued the entire day, everyone having their own favorite. At around 7 o'clock we were tired, hungry and irate. I asked Paul (at the time still pretty green and struggling to keep up with all the intricacies of a new design) "if you could have any feature in Maltego - what would it be?". He didn't have to think long and answered "handling big graphs". Then he casually put his headphones back on and ignored us.

It wasn't what I wanted to hear. We kept on ignoring the issue to the point that we almost believed it wasn't a problem anymore. We didn't want to fix it. It was hard to fix. It meant ripping the guts out of our product. We all knew that it would mean many months of nothing but rebuilding things we already had. No new features, no new flashy bits. Just hard work - rebuilding Maltego from the ground up. But Paul was right. It wasn't the popular answer, but probably the right answer.

For the months to follow we had no new features coming out. We issued a couple of patches for Maltego Chlorine (3.6) and kept supporting the old version. I asked Chris and Sonja if they had a rough idea on when we'll be done. The first date we tried for was Black Hat Las Vegas 2015. August. By June we all knew it was way too early and we pushed it back to Christmas 2015. In early December 2015 they sent me a barely working version. It included lots of disclaimers on which parts I could play with - but it could handle 30 000 nodes with ease. It was exciting, so exciting that I had to make a video about it. We decided we needed a new website too. Paul was to run with that - it had to be ready to go with the release of the new version.

Putting back all the pieces took longer than we anticipated and we hadn't even started on collection nodes - the secret weapon in the fight against large graphs. Collection nodes were not a new concept. We tried it back in 2009 and never released it - it failed miserably, partly because the product (and perhaps we ourselves) were simply not mature enough. The trick then was usability and the usability of collection nodes was a major struggle now. We decided to completely redo the interface. The version I had in my hands looked really bad. The user experience was bad. It was riddled with bugs, things that simply didn't work. I pulled the video. It said we'll have it before 2016. There was simply no way we'd have it done. Christmas came and went and we had nothing.

During January 2016 I felt like the new version was never going to happen and that, even if we did get it right, users would hate it.  I didn't even want Andrew and Paul to try this version because it would leave a bad taste in their mouths. But we kept slogging and gradually things started to get better.

The turning point was early March 2016. After many usability / look /feel meetings we were slowly getting there. Things started to fall into place. It was looking the part and after several iterations the interface was starting to behave the way you expected it to. Preparing for a conference in April I exclusively used the new version. Using it in anger for the first time it was clear that this was something really special. All of the hard work was starting to pay off. Things that only lived in our imagination for a year were now right there on the interface, and it was working exactly the way we envisioned it. It was fast - terribly fast. And slick. And it handled almost anything I could throw at it. There would be no going back to Chlorine ever. It was time to set a date for the final release.

The date was set to be the first of May 2016. But that was a Sunday so we went with May 2. This was a public holiday in South Africa (and in many other countries) so we went with Tuesday May 3. Now we had to tie up all loose ends (memory leaks, branding, testing/fixing/testing/fixing). We contemplated calling the new version Plutonium, but this release was so different to anything we've had in the past that we decided it would be easier to just go with 'Maltego 4'. We sent out betas to a select group of trusted users. The feedback was phenomenal. They loved it.

We made a 'camera-ready' release on the 26th of April and I flew to Cape Town to go make the release video. We shot an afternoon, an evening and the next morning and I flew back to Gauteng to edit. After some hiccups the final edit was ready on the Sunday before the release.

Today is Monday. Tomorrow we release. A brand new website, a brand new product. The release is not perfect. There are always things we want to improve and there are most likely a few minor bugs that we'll squash over time. With a system as complex as Maltego it's almost impossible to achieve perfection and I have to constantly remind myself that nobody cares about Maltego as much as we do. It's a child we all raised together as parents, siblings and a crazy uncle.

Some other stuff we probably need to say

Maltego 4 comes in two commercial flavors. Classic (the standard version) and XL (the pro version). The *only* difference between the two is that Classic is capped at 10 000 nodes. Oh wait - and the price - Classic is still $760 and XL is $1800. We had lots and lots of discussions about the price. We haven't raised the price on Maltego for a long time and we didn't want to raise the price for the new version. So we decided to split it into two products (we've been wanting to do this for a while now). We then had to decide what's in the XL version and what's not. An easy out would have been to exclude collection nodes from the Classic version. But collection nodes are super useful - even when working with small graphs as they quickly show you where you need to (probably) look - NOT at the collections. So collection nodes stayed. Then it was crippling Classic in some way...but that just felt wrong and so we didn't. Every time we thought about taking things out of Classic we cringed. Finally we decided on capping the total number of nodes in a graph. But where to cap it? We decided on 10K nodes for two reasons - the first being that in the past, working with 10K nodes would be painfully slow - so - we weren't taking functionality away from users, as they never had it. Secondly the slider was always maxed out at 10K - it didn't make sense to have it at a lower number. 10K it was.

Still more stuff

Maltego Chlorine users will be able to simply download Maltego 4 Classic and activate it with their license key. No upgrade fee to Maltego 4. Users that wish to upgrade to XL should just pop us an email.

Then there's the question of the community edition. Ye - we're no longer supporting it and we'll be removing it from our site. Hehe.. no. Give it a bit of time. We'll create Maltego 4 CE and Maltego 4 Kali soon. No really. We will. Currently the CE versions are still using the old tech.

And finally..

One last thing. CaseFile. The one we always leave behind at the bus stop. There's good news. With Maltego 4 being so totally amazing we're making CaseFile completely free. No registration. No nothing. Just download and use. And in time we'll upgrade CaseFile to the goodness of collection nodes, large graphs and a face lift.

Right, that's about it. We're super excited to see what you think about all our new tech. It's been a long journey and we're really pleased with our progress. We hope you are too!

RT and the rest of the (tired) team.

Monday, May 2, 2016

Network Footprinting with Maltego.

One common task that Maltego is used for is doing infrastructure footprints on an organisation's network. This post will detail a possible methodology used for network footprints as well as demonstrate how they can be performed in Maltego. Finally the post will show how the process is drastically simplified with the use of machines that automate the process of running transforms in Maltego.

Network footprinting methodology.

When performing a footprint on a domain the goal is to find as much information about the domain as possible on an infrastructure level. When dealing with a large footprint it can be quite difficult to know when you have found all possible information that is publicly available for that particular domain. To make the process a little easier we have a structured methodology that we follow when conducting a network footprint in Maltego. This process is outlined in the data model in Image 1 below.

(Image 1)

At each level of this data model we want to find as much information as possible relating to the domain in question. Arrows in the data model relate to transforms within Maltego that can be used to find related information either above, below or on the same level of the model. Throughout this blog post I will refer back to this data model.

Starting at the top of the model with the target domain you'll see an arrow that points from a domain back to a domain. This transform relates to the TLD (top level domain) expansion of the target. In the real world this means going from (for example) to all the other Google domains (, etc.)

Once the top level domains are enumerated the first step is to try to find as many DNS names as possible from that domain’s zone file. This includes getting the domain’s MX records, its NS records and as many A records as possible. In Maltego there are nine transforms for finding DNS names related to a domain. Explaining how each of these transforms works is out of the scope of this post, however, transform explanations can be found in our transforms guide. In Maltego there is also a transform set named DNS from Domain that includes all nine of these transforms. Running this transform set on the domain results in the graph shown in Image 2 below:

(Image 2)

Note that there are 742 entities in the DNS name collection node.

From the DNS name level on the data model back in Image 1 you will see that there are three transforms for going back up a level from DNS names to find more related domains. Two of these transforms look for domains that share the same name servers (NS) or mail servers (MX) that have been found from our original domain. The third transform simply extracts the domain from that DNS name.
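Extracting a domain from a DNS name sounds trivial but has one catch: suffixes such as co.za count as a single unit. The Python sketch below shows the idea in a deliberately naive way. It is not how the Maltego transform is implemented; the function name and the tiny suffix set are assumptions, and a real implementation would use the full Public Suffix List.

```python
# Small, hand-picked sample of two-label public suffixes (an assumption for
# this sketch; the real Public Suffix List is far longer).
TWO_LABEL_SUFFIXES = {"co.za", "co.uk", "com.au", "com.hk"}


def domain_from_dns_name(dns_name):
    """Return the registrable domain for a DNS name, naively."""
    labels = dns_name.rstrip(".").lower().split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep three labels when the last two form a known two-label suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


print(domain_from_dns_name("mail.example.com"))
print(domain_from_dns_name("www.example.co.za"))
```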

When finding shared infrastructure it is important to consider whether the name servers and mail servers are hosted by your target organisation or by an ISP. Looking at the shared infrastructure belonging to an ISP will result in many domains being returned that are hosted by the ISP but not related to your target. Determining if an MX or NS is hosted can be tricky but visiting the website of the related entity mostly helps in making that decision. It is outside the scope of this document to detail this process (but it's mostly just common sense).

The next step in going down the data model is to resolve all the DNS names to IP addresses. Doing so results in the graph below:

(Image 3)

It is interesting to note here that 283 of the DNS names that we found all resolve to a single IP address shown in Image 3 above. From the image it can also be noted that there were 97 DNS names that currently do not resolve to an IP address at all. This might be an indication of old DNS names or DNS names that resolve to internal resources configured on a split DNS system.
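Outside of Maltego, the same two observations (many names on one IP address, and names that do not resolve) could be pulled out of a list of resolution results with a few lines of Python. This is only a sketch: the input is a made-up dictionary of already-resolved names using documentation IP ranges, and group_by_ip is a hypothetical helper.

```python
from collections import defaultdict


def group_by_ip(resolutions):
    """Split {dns_name: [ips]} into names per IP and unresolved names."""
    by_ip = defaultdict(set)
    unresolved = set()
    for name, ips in resolutions.items():
        if not ips:
            unresolved.add(name)
        for ip in ips:
            by_ip[ip].add(name)
    return dict(by_ip), unresolved


# Made-up resolution results; an empty list means the name did not resolve.
resolutions = {
    "www.example.com": ["192.0.2.10"],
    "blog.example.com": ["192.0.2.10"],
    "mail.example.com": ["192.0.2.20"],
    "intranet.example.com": [],
}
by_ip, unresolved = group_by_ip(resolutions)

# IP addresses shared by the most DNS names come first.
busiest = sorted(by_ip, key=lambda ip: len(by_ip[ip]), reverse=True)
print(busiest[0], sorted(unresolved))
```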

On the IP address layer of the data model we could now go back up a level to find more DNS names related to the IP addresses. This can be done by looking at historical DNS records collected from passive DNS, reverse DNS and by querying Bing to see what other websites have been seen resolving to the same IP address (aka the "IP:" trick).

Continuing down the data model from the IP addresses we next want to find the netblocks that the addresses belong to and determine whether each entire netblock actually belongs to our target organisation. Finding the correct netblock size can be a tricky process and often requires some trial and error to get right. In Maltego there are three transforms for finding netblocks from an IP address and it is important to understand how each of these works. These three transforms are listed below:

  • To Netblocks [Using natural boundaries] - This transform will sort IP addresses into netblock sizes specified by the user. 
  • To Netblocks [Using routing info] - This transform determines the netblock that an IP address belongs to by looking up its routing table information.
  • To Netblocks [Using WHOIS info] - This transform will look up the Netblock for an IP address by querying the registrars.
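To make the first option a bit more concrete, sorting IP addresses into fixed-size blocks can be sketched with Python's standard ipaddress module. This only illustrates the idea of natural boundaries and is not the transform's actual code; to_netblocks and its prefix_len parameter are names made up for the example, and the addresses come from documentation ranges.

```python
import ipaddress
from collections import defaultdict


def to_netblocks(ip_addresses, prefix_len=24):
    """Group IP addresses by the enclosing block of the given prefix length."""
    blocks = defaultdict(list)
    for ip in ip_addresses:
        # strict=False masks off the host bits, giving the enclosing block.
        block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        blocks[str(block)].append(ip)
    return dict(blocks)


ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
as_24 = to_netblocks(ips, prefix_len=24)
as_25 = to_netblocks(ips, prefix_len=25)
print(as_24)
print(as_25)
```

With a /24 the first two addresses land in the same block; with a /25 they are split into two. The same addresses can tell a different story depending on the block size you choose.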

It is very important to place IP addresses into the correctly sized netblocks.  If you make the block size too small you will miss out on IP space belonging to your target organisation. You also do not want to make the netblock too large and include IP space belonging to someone else. Running the transform To Netblocks [Using WHOIS info] on our example graph from Image 3 results in the following graph:

(Image 4)

Image 4 above shows a portion of the resulting graph. Once we have these netblocks it is important to validate that we are still looking at our target's infrastructure and have not included IP space belonging to "innocent bystanding" organisations. One way of doing this is to run the historical DNS transform on the netblock and then manually inspect whether or not the block belongs to your target by looking at the (reverse) DNS names that you get back. This is done by running the transform To DNS names in netblock [reverse DNS]. Running this transform on the netblock found previously results in 121 DNS names being returned.

(Image 5)

Manually inspecting these DNS names it is quite easy to see that they all do belong to our target organisation and we can therefore make the assumption with near certainty that the entire netblock does in fact belong to our target. In this step we have also found more DNS names related to our target and the process can be repeated by resolving the newly found DNS names to IP addresses and then finding the netblock that they belong to. 

Next from the netblocks we have found we can have a look at the Autonomous Systems (AS-es) that they belong to and determine whether the entire AS is in fact owned by our target organisation. First we run the transform To AS number on the netblocks we have. We then run the transform To Company [Owner] to see who owns the AS. Doing so on our example results in seven AS-es being returned that belong to the LinkedIn organisation. Image 6 below shows a small portion of the graph and the path taken to get to one of these AS-es:

(Image 6)

At this point we have reached the bottom level of the data model from Image 1. The next step would be to take the AS-es we have found belonging to our target organisation and start moving back up the data model to find more related information at each level. First we would get all the netblocks in the AS-es and from these new netblocks we would then find more DNS names by looking at their historical or reverse DNS records. From new DNS names that are found we could potentially find more domains belonging to the target and then start the whole process again on the new domains. Note that this step is not included on the example graph in this post.

An important aspect to realize here is that a network footprint is a cyclical process, not a linear one (and you're never done, you just give up ;)). The simplest footprint you can do would be to go from the top of the data model to the bottom without moving up the model at any stage, as we have done in this example. However we could continue this footprint by moving back up the data model from the AS-es that we have found belonging to our target.
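The cyclical nature of the process can be sketched as a simple worklist loop: keep running transforms on whatever is new until nothing new turns up (or you give up). The Python below is a toy model under assumptions, not how Maltego works internally. The footprint function, the entity type labels and the static lookup tables standing in for real DNS queries are all made up for illustration.

```python
from collections import deque


def footprint(seed, transforms, max_entities=1000):
    """Expand from `seed` until nothing new turns up or a soft cap is reached.

    `seed` is a (type, value) pair and `transforms` maps an entity type to a
    list of functions that take a value and return new (type, value) pairs.
    """
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < max_entities:
        entity_type, value = queue.popleft()
        for transform in transforms.get(entity_type, []):
            for found in transform(value):
                if found not in seen:
                    seen.add(found)
                    queue.append(found)
    return seen


# Static stand-ins for real lookups (made-up data).
DNS = {"example.com": ["www.example.com", "mail.example.com"]}
IPS = {"www.example.com": ["192.0.2.10"], "mail.example.com": ["192.0.2.20"]}
REVERSE = {"192.0.2.20": ["vpn.example.com"]}

transforms = {
    "domain": [lambda d: [("dns", n) for n in DNS.get(d, [])]],   # down
    "dns": [lambda n: [("ip", i) for i in IPS.get(n, [])]],       # down
    "ip": [lambda i: [("dns", n) for n in REVERSE.get(i, [])]],   # back up
}
result = footprint(("domain", "example.com"), transforms)
print(sorted(result))
```

The 'ip' step moves back up the model and finds a DNS name that the first pass down never saw, which is exactly why a single top-to-bottom run is only the simplest footprint.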

The final graph from our example in bubble view is shown in Image 7 below. Bubble view will size entities according to the number of incoming links they have from different sources. This makes it easy to identify the most connected parts of the network as well as its outliers.

(Image 7)

Foot-printing machines

Fortunately, it is not required to remember every step of this footprinting process thanks to the concept of machines in Maltego. Machines allow you to script transforms together and have them run sequentially in an automated fashion. Out-the-box Maltego comes with three machines for network footprinting that roughly follow the process described previously. These three machines are described briefly below. Note that Maltego also ships with a fourth machine for footprinting named Footprint XXL. Footprint XXL uses a different method which is useful when footprinting larger networks. However this machine is not within the scope of this blog post as it is aimed at advanced Maltego users footprinting massive multi-national organizations.

Footprint L1:
This is the most basic footprinting machine and runs through the data model from Image 1 straight down from top to bottom without looking at any shared infrastructure or historical DNS records. 

Footprint L2:
This machine will run through the same steps as Footprint L1 above. Additionally this machine will look for additional domains related to the original domain by looking for shared infrastructure of its name servers (NS) and mail servers (MX). The machine will also look for other websites hosted on the same IP addresses. The machine also has user filters - these are popups which are displayed while the machine is running and prompt the user to manually inspect results and decide which ones to continue with. In machine L2's case user filters are used to allow the user to choose which name servers, mail servers and websites are hosted by the target organisation or by an ISP. This is done to prevent the machine from looking for shared infrastructure on DNS names that are not hosted by the target. An example of a user filter when running Footprint L2 on the domain is shown in Image 8 below:

(Image 8)
From visual inspection it is clear that Paterva's mail is hosted by Google and their name servers are hosted by Linode. Therefore you would not want the machine to continue to run transforms that look for shared infrastructure on these entities as you'll follow the rabbit hole all the way to Google's (and Linode's) infrastructure!
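Conceptually, a machine with user filters is a list of steps where the results of each step pass through a filter before the next step runs. Maltego machines are written in Maltego's own machine scripting language, so the Python below only mimics the idea. The run_machine function, the lookup tables and the filter rule (standing in for the popup where a human chooses) are assumptions for illustration.

```python
def run_machine(start, steps):
    """Run `steps` in order; each step is a (transform, user_filter) pair."""
    current = list(start)
    for transform, user_filter in steps:
        found = []
        for item in current:
            found.extend(transform(item))
        # Only the results the user filter accepts feed the next step.
        current = [item for item in found if user_filter(item)]
    return current


# Static stand-ins for real lookups (made-up data).
MX_NS = {"example.com": ["ns1.example.com", "mx.bigmailhost.example"]}
SHARED = {
    "ns1.example.com": ["example.org", "example.net"],
    "mx.bigmailhost.example": ["thousands-of-unrelated-domains.example"],
}

steps = [
    # Step 1: domain -> NS and MX names; keep only self-hosted servers.
    (lambda d: MX_NS.get(d, []), lambda name: name.endswith(".example.com")),
    # Step 2: server -> other domains sharing it; keep everything.
    (lambda name: SHARED.get(name, []), lambda domain: True),
]
related = run_machine(["example.com"], steps)
print(related)
```

Because the hosted mail server is filtered out after the first step, the second step never touches it and the unrelated domains never reach the result.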

Footprint L3:
Footprint L3 runs the same transforms as Footprint L2 but additionally it will look at historical / reverse DNS records on the netblocks that are found in order to find additional DNS names belonging to the target. Again the machine will use user filters to allow the user to specify which of the netblocks are still relevant.

Footprint L3 will also run a transform named ToServerTechnologyWebsite on selected website entities on the graph and return the names of the different server technologies that are used on that particular website. Running this transform provides an easy way to identify which technologies are used commonly across many of the target's websites as well as outliers - the (sometimes more outdated) technologies that are only used on one or two servers. The screenshot in Image 9 below shows the results of the transform ToServerTechnologyWebsite on web servers of

(Image 9)

51 websites are found that are related to the domain and inspecting the graph you can identify the website technologies that are commonly used. Selecting all the BuiltWith entities on the graph and ordering the detail view in descending order according to the number of incoming links shows which website technologies are the 'odd-ones-out' and are only used on a couple of websites - these are often the more 'interesting' sites...
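If you export the results, finding the odd-ones-out comes down to counting how many websites each technology appears on. Here is a minimal Python sketch with made-up data; rare_technologies and its max_sites threshold are hypothetical names chosen for this example.

```python
from collections import Counter


def rare_technologies(site_tech, max_sites=2):
    """Return technologies used on at most `max_sites` websites, rarest first."""
    counts = Counter(
        tech for techs in site_tech.values() for tech in set(techs)
    )
    return sorted(
        (tech for tech, n in counts.items() if n <= max_sites),
        key=lambda tech: (counts[tech], tech),
    )


# Made-up mapping of website -> technologies detected on it.
site_tech = {
    "www.example.com": ["nginx", "PHP"],
    "blog.example.com": ["nginx", "PHP"],
    "shop.example.com": ["nginx"],
    "legacy.example.com": ["nginx", "Apache Tomcat"],
}
odd_ones_out = rare_technologies(site_tech, max_sites=1)
print(odd_ones_out)
```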


The examples shown in this blog post provide one possible strategy for conducting a network footprint in a structured and repeatable way. The three footprinting machines that come with Maltego out-the-box provide an easy method for applying this strategy to any domain while each machine differs in exploration depth.