Investigative network mapping: Link analysis with Maltego and Gephi
Learn Link Analysis Graphing (LAG) to visualize complex data, find gatekeepers, and uncover hidden connections in your investigations
From exposing corruption to tracking extremist groups, the best investigative stories rarely hinge on a single figure; they rest on complex webs of connections among people, companies, and digital assets.
Manually analyzing such links takes too much time and often fails to reveal subtle, non-obvious relationships. Network mapping, or Link Analysis Graphing (LAG), converts disparate facts (names, emails, domains) into a graphical map that you can navigate to find key actors, gatekeepers, and hidden structures in a complex investigation.
1.1. The investigative need (The “why”)
Journalists accumulate hundreds of data points from public records, documents, and search engine results during investigations. In its raw state, this data can be daunting and can obscure relationships. Maltego automates queries against public sources, and both Maltego and Gephi visualize the resulting relationships, revealing “who is connected to whom” and “how are they connected”. This is essential for demonstrating coordination, naming major players, and making a dense narrative digestible to the public.
1.2. Learning outcomes
Differentiate between Nodes (Entities) and Edges (Relationships) in a network graph.
Use Maltego to automatically gather OSINT and visually map connections from a single starting entity (e.g., an email or domain).
Import structured data (CSV) into Gephi and calculate key network centrality metrics.
Interpret the investigative significance of Betweenness Centrality and Clustering Coefficient.
1.3. Case study hook
A journalist investigates a global influence network. They begin with a single domain name from a shell company. Using Maltego’s Transforms, they map all associated email addresses, then pivot from those emails to find linked social media profiles, and finally use graph analysis in Gephi to identify a single, seemingly obscure individual who is the only connection (the “gatekeeper”) between the entire finance arm and the political messaging arm of the operation.
💡 2. Foundational theory and ethical-legal framework
2.1. Key terminology
Node/Entity: A data point (e.g., a person, company, email, phone number, IP address) represented as a circle or icon in the graph.
Edge/relationship: The line connecting two nodes, representing a link (e.g., “Works At,” “Owns,” “Communicated With”).
Transform (Maltego): An automated query that fetches related data from a source (e.g., given a Domain, run a Transform to find associated Email Addresses).
Centrality metrics: Algorithms that measure the importance of a node in a network (e.g., Degree, Betweenness).
Graph visualization: The process of using algorithms (layouts) to arrange the nodes and edges for readability and pattern recognition.
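The terms above can be made concrete with a minimal sketch in plain Python. It stores a handful of edges as (source, target, label) tuples and computes Degree centrality, the simplest of the centrality metrics. All names, the domain, and the email address are hypothetical placeholders, and the function `degree_centrality` is an illustration rather than anything Maltego or Gephi exposes.

```python
from collections import defaultdict

# Hypothetical edges: (source node, target node, relationship label).
edges = [
    ("Jane Doe", "Acme Holdings Ltd", "Works At"),
    ("Acme Holdings Ltd", "acme-holdings.example", "Owns"),
    ("Jane Doe", "j.doe@acme-holdings.example", "Uses"),
    ("John Roe", "Acme Holdings Ltd", "Works At"),
]


def degree_centrality(edge_list):
    """Return {node: degree / (n - 1)}, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for source, target, _label in edge_list:
        neighbors[source].add(target)
        neighbors[target].add(source)
    node_count = len(neighbors)
    if node_count < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (node_count - 1) for node, adj in neighbors.items()}


scores = degree_centrality(edges)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy graph the company node scores highest because three of the other four nodes connect to it directly. Degree only counts direct links, which is why the later sections rely on Betweenness to find gatekeepers.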
⚠️ 2.2. Ethical and legal boundaries
2.2.1. Consent & privacy: Data aggregation risk
Maltego and similar tools aggregate public data quickly, which can inadvertently create a highly intrusive picture of an individual, even if each data point is publicly available.
The ethical bar is therefore higher: do not use these tools to map personal networks or expose private individuals unless their activities are clearly a matter of public interest and relevant to the investigation. Always keep collection passive (no attempted logins and no network probing, such as aggressive Nmap scans).
2.2.2. Legal Considerations: Licensing and Intent
Maltego offers a free Community Edition (CE) with usage limits; commercial editions have more features and data integrations. Using the tool improperly, especially running Transforms against systems or data sources you are not authorized to query, could be considered illegal network reconnaissance. Use tools such as Maltego only for OSINT (Open Source Intelligence), meaning information derived from publicly indexed or API-accessible sources. Unauthorized access is illegal.
Mandatory disclaimer: Consult your legal department to ensure your investigation’s scope and methods comply with jurisdictional laws regarding data collection, privacy, and unauthorized access, particularly when dealing with cross-border data.
🛠️ 3. Applied methodology: Step-by-step practical implementation
3.1. Required tools & setup
Maltego CE (Community Edition): Free version for graphical link analysis (requires registration).
Gephi: Free, open-source software for visualizing and analyzing large graphs (requires Java).
Data scraper/parser: A tool or script to export collected data (names, emails, phone numbers) into a structured CSV file. Octoparse is a recommended no-coding solution.
Isolated environment: A VM or dedicated clean machine for running these tools.
👷‍♀️ 3.2. Practical execution (The “How”)
Scenario A: Automated mapping with Maltego
Goal: Start with a domain name and automatically identify its registered owner, hosting details, and associated infrastructure.
Scenario B: Advanced analysis with Gephi (For structured data)
Goal: Import a spreadsheet of scraped data (e.g., leaked email communication logs, where each row is a “Source” email talking to a “Target” email) and calculate the most critical person.
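For Scenario B, Gephi's spreadsheet importer reads an edge table whose columns are named Source and Target, with an optional Weight column. The sketch below shows one way to turn raw sender/recipient pairs into such a file. The sample addresses are hypothetical, and the cleaning rules (lowercasing, trimming whitespace, dropping self-loops) are assumptions you should adapt to your own data.

```python
import csv
import io
from collections import Counter

# Hypothetical raw log: one (sender, recipient) pair per message.
raw_log = [
    ("Alice@Example.org", "bob@example.org"),
    ("alice@example.org ", "bob@example.org"),
    ("bob@example.org", "carol@example.org"),
    ("carol@example.org", "carol@example.org"),  # self-loop, dropped below
]


def build_weighted_edges(pairs):
    """Normalize addresses and count messages per (source, target) pair."""
    counts = Counter()
    for sender, recipient in pairs:
        source = sender.strip().lower()
        target = recipient.strip().lower()
        if source and target and source != target:
            counts[(source, target)] += 1
    return sorted((s, t, weight) for (s, t), weight in counts.items())


def write_gephi_edges(edge_rows, handle):
    """Write rows under the Source/Target/Weight header Gephi expects."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edge_rows)


edge_rows = build_weighted_edges(raw_log)
buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to save to disk
write_gephi_edges(edge_rows, buffer)
print(buffer.getvalue())
```

Collapsing repeated messages into a Weight keeps the graph readable and lets edge thickness in Gephi reflect how often two addresses communicated.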
💾 3.3. Data preservation and Chain of Custody
Maltego Export: Export the final graph visualization as a high-resolution image (PNG/PDF) and the underlying data as a GraphML file or a CSV of nodes and edges.
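Gephi Export: Export the static image (PNG) of the final visualized graph, and a GEXF (Graph Exchange XML Format) file, which can preserve the computed metrics as node attributes along with the layout positions. Also save the Gephi project file (.gephi) so the workspace can be reopened later.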
Source Log: Create a table or spreadsheet logging the original source of every Entity (Node) added to the graph. If an email originated from a Google Dork search that led to a public PDF, log the exact URL of the PDF and the date and time it was collected. Generate an SHA-256 hash for all source files.
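The hashing and logging step can be sketched with Python's standard library. The helper names `sha256_of_file` and `make_log_entry`, the field names, and the example URL and timestamp are all hypothetical; the demo writes a small temporary file only so the snippet runs on its own.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large source documents do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_log_entry(entity, source_url, collected_at, path):
    """Build one source-log row for a node; the field names are illustrative."""
    return {
        "entity": entity,
        "source_url": source_url,
        "collected_at": collected_at,
        "sha256": sha256_of_file(path),
    }


# Demo with a stand-in file; in practice `path` is the saved source document.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

entry = make_log_entry(
    entity="j.doe@acme-holdings.example",
    source_url="https://example.org/reports/annual.pdf",
    collected_at="2024-01-15T09:30:00Z",
    path=demo_path,
)
os.remove(demo_path)
print(entry)
```

Recording the hash at collection time lets you show later that the file you cite is byte-for-byte the file you downloaded.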
🧠 4. Verification, analysis, and editorial integration
4.1. Corroboration strategy
Network analysis only suggests a possible relationship; it does not prove why the relationship exists.
Edge corroboration: If a Maltego Transform shows an “Affiliation”, you must verify the nature of that link. Is the link based on a shared phone number, a past job listing, or a public registry entry? Examine the original source document or API record to confirm the link’s validity.
Cross-method verification: If Gephi indicates that "Jane Doe" has a high Betweenness Centrality score, this suggests, but does not prove, that she acts as a broker. Do her LinkedIn profile, public records, or the contents of the documents you collected (Section 3 of the previous tutorial) confirm her intermediary role?
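To sanity-check a Betweenness ranking outside the GUI, the metric can be recomputed independently. The sketch below is a plain-Python version of Brandes' algorithm for an undirected, unweighted graph. The toy network (two tight groups joined only through "Jane Doe") is hypothetical, and the scores are raw counts, so expect Gephi's figures to differ in scale if normalization is enabled.

```python
from collections import defaultdict, deque

# Hypothetical network: a finance group (F*) and a messaging group (M*)
# whose only connection runs through "Jane Doe".
edge_list = [
    ("F1", "F2"), ("F1", "F3"), ("F2", "F3"),
    ("M1", "M2"), ("M1", "M3"), ("M2", "M3"),
    ("F1", "Jane Doe"), ("Jane Doe", "M1"),
]


def build_adjacency(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def betweenness_centrality(adjacency):
    """Raw betweenness: number of shortest paths passing through each node."""
    scores = {node: 0.0 for node in adjacency}
    for start in adjacency:
        order = []  # nodes in the order the breadth-first search reaches them
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[start], distance[start] = 1, 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in adjacency[current]:
                if distance[neighbor] < 0:  # first visit
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):  # accumulate from the farthest nodes back
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != start:
                scores[node] += dependency[node]
    # Each undirected pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in scores.items()}


scores = betweenness_centrality(build_adjacency(edge_list))
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.1f}")
```

Here "Jane Doe" ranks first because every shortest path between the two groups runs through her. The score flags her as a candidate gatekeeper; the documents still have to confirm what the role actually is.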
4.2. Translating data to narrative
Translate network metrics into actionable story elements.
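As one illustration of turning a metric into a story element, the local Clustering Coefficient measures how many of a node's contacts also know each other: values near 1 point to a closed group, values near 0 to someone bridging otherwise separate contacts. The sketch below is a plain-Python calculation on a hypothetical adjacency map, not Gephi's own implementation.

```python
from itertools import combinations

# Hypothetical undirected network as {node: set of neighbors}.
adjacency = {
    "F1": {"F2", "F3", "Jane Doe"},
    "F2": {"F1", "F3"},
    "F3": {"F1", "F2"},
    "Jane Doe": {"F1", "M1"},
    "M1": {"M2", "M3", "Jane Doe"},
    "M2": {"M1", "M3"},
    "M3": {"M1", "M2"},
}


def local_clustering(graph):
    """Fraction of each node's neighbor pairs that are themselves connected."""
    result = {}
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            result[node] = 0.0  # fewer than two neighbors: no pairs to check
            continue
        linked_pairs = sum(
            1 for a, b in combinations(sorted(neighbors), 2) if b in graph[a]
        )
        result[node] = 2 * linked_pairs / (k * (k - 1))
    return result


coefficients = local_clustering(adjacency)
for node, value in sorted(coefficients.items()):
    print(f"{node}: {value:.2f}")
```

In narrative terms, F2 and F3 sit inside a closed circle (coefficient 1.0), while "Jane Doe" scores 0.0 because her two contacts have no direct link to each other.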
4.3. AI assistance in analysis
LLMs can help interpret the context behind the graph’s structure.
Summarizing log files: Input the source data (e.g., the text of the meeting minutes or email logs that generated the Edges) to an LLM.
Prompt example:
Analyze the provided source text (log files/meeting minutes/emails) that generated the network graph.
Identify the dominant context and relationship behind the Edge between the nodes identified as [Node A Name/ID] and [Node B Name/ID].
Specifically, summarize the primary topic of discussion, the purpose of their interaction, and the tone/sentiment expressed in the source material related to this connection.
Finally, explain if the source text supports the weight or directionality assigned to this specific Edge in the graph structure.
Identifying key entities and roles: Once a cluster of 15 names is found, an LLM can scan their publicly available job titles (if provided in the node attributes) and group them into functional roles such as “Finance Team”, “Logistics Support”, etc.
Translation: Translate all foreign-language node labels or Edge descriptions (e.g., from a foreign corporate registry) into the working language of the investigation.
⚠️ Critical warning: Privacy and hallucination
Never upload an investigative graph (GEXF file) or its associated sensitive data (node/edge attributes) to any public LLM service. The graph itself can reveal the structure of your investigation and potentially expose sources or targets. Always use private, secure tools.
All interpretations, summaries, or roles identified by an AI must be rigorously fact-checked by cross-referencing against the original public source documents/records that generated the graph’s Nodes and Edges. The graph is a map; the underlying facts are the territory.
🚀 5. Practice and resources
5.1. Practice exercise
Open Maltego CE.
Add a Website Entity using a public company’s domain (e.g., paterva.com).
Run the Transform: “To Email Address – using search engine”.
Run the Transform: “To Person – PGP Key” on one of the found email addresses.
Filter and rearrange the graph to clearly visualize the link from Domain → Email → Person.
5.2. Advanced Resources
Maltego documentation/training: Maltego offers a robust series of tutorial videos and documentation for advanced features.
NetworkX (Python): A library for advanced network analysis scripting, allowing for custom metrics and data integration beyond GUI tools.
GHDB (Google Hacking Database) for network tools: Find dorks that specifically expose lists of names, emails, or servers to quickly generate bulk data for graph import.
6. Key takeaways and investigative principles
Nodes & edges: The graph is only as good as the underlying data—ensure every Node and Edge is derived from a verifiable source.
Pivot wisely: In Maltego, only run Transforms that will yield relevant data to avoid cluttering the graph with noise.
Analyze centrality: Use Betweenness Centrality (Gephi) to quickly identify the crucial “gatekeeper” entities in a large network.
Visualize the narrative: Use layouts, sizing, and coloring to make the graph tell the story visually—big nodes are important; tight clusters are closed groups.
Security First: Never expose sensitive graph data to insecure environments or public AI models.
👁️ Coming next week…
Detecting bots, trolls, and disinformation campaigns
This tutorial provides the analytical framework to identify coordinated inauthentic behavior (CIB) in real-time. We cover key techniques like analyzing post frequency shifts, tracking user network centralization, and decoding rapid sentiment shifts to expose hidden botnets and troll farms.






