
Spam Terminology
Spam is commonly transmitted from bots operating from cloud or hosting service accounts or from compromised devices that host malware (spambots). Spambots are malware, installed without consent. The bot itself and the thousands of emails each bot emits consume CPU, RAM, bandwidth, and storage, from the source of transmission to the spam recipients’ devices. These unauthorized uses of resources are threats that are considered cybercrimes in the Council of Europe's Convention on Cybercrime (see also, Measurements).
We measure and provide scoring metrics for three aspects of spam identified in the Council of Europe’s Guidance Note #8: content (specifically, hyperlinks that direct recipients to illicit or scam sites), the act of sending unsolicited messages, and the mechanism for sending (e.g., delivery infrastructures such as spambots or botnets).
The terminology that we use in our analyses and reporting follows.
Spam Report
We gather information about spam activity detected by multiple feeds. Spam reports are source records that we collect from a threat intelligence feed (a blocklist). These identify the URL or domain name that a feed associates with hosting spammed content or the transmission of spam messages.
A spam activity can be identified in more than one feed. In some cases, an activity tagged as spam by one feed is tagged as phishing by a different feed. When we prefix “report” with a particular threat such as spam, we are indicating that we use only those source records that were tagged as spam for our analyses.
Spam Records
We eliminate duplicates and enhance spam reports with metadata — domain and IP address registration data, ICANN registry and registrar monthly reports, routing data, and other indicators — to create spam records. These records are the bases for our measurements and scoring metrics.
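The step from spam reports to spam records can be sketched as follows. This is a minimal illustration, not our production pipeline: the Report fields, the feed names, the metadata dictionary, and the choice to treat two reports as duplicates when they name the same normalized domain are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    feed: str    # name of the threat intelligence feed (hypothetical)
    domain: str  # domain name the feed reported
    tag: str     # threat label assigned by the feed, e.g. "spam" or "phishing"


def build_spam_records(reports, metadata):
    """Keep spam-tagged reports, merge duplicates by domain, attach metadata."""
    records = {}
    for report in reports:
        if report.tag != "spam":
            continue  # only source records tagged as spam are used
        # Assumed duplicate rule: same domain after lowercasing and
        # stripping a trailing dot.
        key = report.domain.lower().rstrip(".")
        record = records.setdefault(key, {"domain": key, "feeds": set()})
        record["feeds"].add(report.feed)
    for key, record in records.items():
        # Enrichment: registration data, routing data, and so on.
        record.update(metadata.get(key, {}))
    return records


reports = [
    Report("feed-a", "Example.com", "spam"),
    Report("feed-b", "example.com.", "spam"),
    Report("feed-b", "example.net", "phishing"),
]
metadata = {"example.com": {"registrar": "Example Registrar", "tld": "com"}}
spam_records = build_spam_records(reports, metadata)
```

Here the two spam reports for example.com collapse into one record that remembers both feeds, and the phishing-tagged report is left out of the spam analysis.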
Spam Domain Scores
To allow comparison of large and small Top-level Domains, we use a scoring metric, the TLD Spam Domain Score, which is calculated by dividing the number of domain names associated with spam content or spambot hosting in a TLD by the number of domains delegated from that TLD.
TLD Spam Domain Score =
(number of unique domains reported for spam in a TLD/domains delegated from TLD) * 10,000
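As a worked illustration of the formula above, the score can be computed as shown below. The function name and the sample figures are hypothetical; only the ratio and the 10,000 multiplier come from the definition.

```python
def spam_domain_score(spam_domains, domains_in_zone):
    """Unique spam domains per 10,000 domains delegated from the TLD."""
    if domains_in_zone <= 0:
        raise ValueError("domains_in_zone must be positive")
    # Count each reported domain once, ignoring case and a trailing dot.
    unique = {d.lower().rstrip(".") for d in spam_domains}
    return len(unique) * 10_000 / domains_in_zone


# Hypothetical TLD: 3 unique spam domains among 20,000 delegated domains.
reported = ["a.example", "b.example", "B.example", "c.example"]
score = spam_domain_score(reported, 20_000)
```

With these figures the score is 1.5, that is, 1.5 spam domains for every 10,000 domains in the zone. Scaling by 10,000 is what lets a small TLD be compared with a large one.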
We use a similar scoring metric to compare small and large gTLD registrars. The gTLD Registrar Spam Domain score is a ratio of the number of domain names used for spam to the number of registered domain names under management (DUM) at that gTLD registrar.
gTLD Registrar Spam Domain Score =
(number of unique domains reported for spam in a gTLD registrar DUM / DUM at gTLD Registrar) * 10,000
Lastly, we use a metric to compare small and large hosting networks (ASNs). The Spamhost-Spambot score is a ratio of the number of IPv4 addresses associated with hosting spam content or spambots to the number of routed IPv4 addresses allocated to an autonomous system.
Spamhost-Spambot score =
(number of unique IPv4 addresses associated with spamhosts-spambots in ASN / routed IPv4 addresses allocated to ASN) * 10,000
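The denominator of this score is a count of addresses rather than a count of prefixes, which the following sketch makes explicit. It is an illustration under assumptions: the function name is hypothetical, the prefixes are documentation ranges, and merging overlapping prefixes before counting is our choice for the example.

```python
import ipaddress


def spamhost_spambot_score(reported_ips, routed_prefixes):
    """Unique reported IPv4 addresses per 10,000 routed addresses in an AS."""
    # Merge overlapping prefixes so no address is counted twice.
    networks = list(ipaddress.collapse_addresses(
        ipaddress.ip_network(p) for p in routed_prefixes))
    routed_total = sum(net.num_addresses for net in networks)
    if routed_total == 0:
        raise ValueError("the AS has no routed IPv4 addresses")
    unique_ips = {ipaddress.ip_address(ip) for ip in reported_ips}
    # Count only reported addresses that fall inside the routed space.
    hits = sum(1 for ip in unique_ips if any(ip in net for net in networks))
    return hits * 10_000 / routed_total


prefixes = ["192.0.2.0/24", "198.51.100.0/25"]   # 256 + 128 addresses
reported = ["192.0.2.10", "192.0.2.10", "198.51.100.5", "198.51.100.200"]
score = spamhost_spambot_score(reported, prefixes)
```

In this example two unique addresses fall inside the 384 routed addresses (198.51.100.200 lies outside the /25), so the score is 2 * 10,000 / 384, or roughly 52.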
In our annual landscape studies, we use a similar metric to measure the prevalence of spam domains in TLDs and gTLD registrars. Here, we take the sum of the four quarters of unique spam domains reported and divide by the average of the domains under management per TLD and per gTLD registrar for each of the four quarters. We call these metrics Yearly Spam Scores.
Yearly TLD Spam Domain Score =
(number of unique Spam domains reported in a TLD at end of period / number of domains delegated from a TLD) * 10,000
Yearly gTLD Registrar Spam Score =
(number of unique Spam domains reported in a gTLD registrar at end of period / number of domains under management at gTLD Registrar) * 10,000
Yearly Spamhost-Spambot score =
(number of unique IPv4 addresses associated with spamhosts-spambots in ASN at end of period / routed IPv4 addresses allocated to ASN) * 10,000
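The yearly calculation can be illustrated with a short sketch that follows the prose description: the four quarterly counts of unique spam domains are summed and divided by the average of the four quarterly domains-under-management figures. The function name and the sample numbers are hypothetical, and how a domain reported in more than one quarter is counted is not specified here, so the sketch simply adds the quarterly counts.

```python
from statistics import mean


def yearly_spam_score(quarterly_spam_domains, quarterly_dum):
    """Yearly score per 10,000 domains, from four quarterly figures."""
    if len(quarterly_spam_domains) != 4 or len(quarterly_dum) != 4:
        raise ValueError("expected exactly four quarterly values")
    total_spam = sum(quarterly_spam_domains)  # sum over the four quarters
    average_dum = mean(quarterly_dum)         # average quarterly DUM
    if average_dum <= 0:
        raise ValueError("average DUM must be positive")
    return total_spam * 10_000 / average_dum


# Hypothetical registrar: 400 spam domains over the year,
# 200,000 domains under management on average.
score = yearly_spam_score([120, 80, 100, 100],
                          [190_000, 200_000, 210_000, 200_000])
```

With these figures the yearly score is 20, meaning 20 spam domains per 10,000 domains under management over the year.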
Maliciously registered domain names
(also, malicious domain registrations).
We define a maliciously registered domain as a domain registered by a criminal to carry out a malicious or criminal act. For our studies, we distinguish maliciously registered domains from compromised domains, which we define as domain names that were registered for legitimate purposes but co-opted by criminals through some form of compromise.
For example, an attacker may hijack a legitimate user’s domain registrar account and alter the DNS to resolve a name or URL to a host that the attacker controls; here, the domain and DNS are compromised. An attacker may also exploit a vulnerability at a legitimate web hosting site, upload fake or malicious content to a web site, and create a URL that points to the malicious content at the legitimate web site; in this case, the web server is compromised.
This distinction is important because it often identifies where investigators should go for assistance with mitigation of the criminal activity:
If the domain is maliciously registered, an investigator will seek assistance from a domain name registrar, a TLD operator, or the operator that provides DNS for the malicious domain to suspend the domain name registration or name resolution.
For a compromised domain, such suspensions further victimize a legitimate party already victimized by the compromise, so investigators will contact the administrator of the compromised host to have the malicious content removed.
Note that parties that discover spam hosted on a compromised domain will do their best to blocklist only the URLs that identify the malicious content, to avoid further victimization of the legitimate party, whereas they may block a maliciously registered domain name (and thus all hostnames and URLs created using this name) to contain the pervasive malicious activity.
For this measurement, we consider:
1. The age of the domain name: the number of days elapsed between domain registration and the use of the domain for a malicious purpose. In general, the older the domain name, the higher the likelihood that it is legitimate. Miscreants tend to use their domains within the first year of registration, before they must pay for renewal.
2. The content of the domain name. We apply rules to determine whether the composition of the name contains indicators of misuse or harmful intent, for example, the presence of a famous brand, a misspelled brand, or a string intended to resemble a brand.
When the above criteria identify domains, we then look for clear evidence of common control and usage as an indicator to flag additional domains in a batch.
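The two criteria above can be sketched as simple indicator checks. This is an illustration only and not the rule set we apply: the brand list, the 365-day window (taken from the "first year of registration" remark), the 0.85 similarity threshold, and the use of difflib for near-matches are all assumptions.

```python
from datetime import date
from difflib import SequenceMatcher

BRANDS = ("examplebank", "examplepay")  # hypothetical brand strings
MAX_AGE_DAYS = 365                      # assumed "first year" window
SIMILARITY = 0.85                       # assumed near-match threshold


def resembles_brand(domain):
    """True if the leftmost label contains or closely resembles a brand."""
    label = domain.lower().split(".")[0]
    for brand in BRANDS:
        if brand in label:
            return True  # exact brand string present
        for part in label.split("-"):
            # A misspelled brand scores just below a perfect match.
            if SequenceMatcher(None, part, brand).ratio() >= SIMILARITY:
                return True
    return False


def registration_indicators(domain, registered, first_abuse):
    """Report the age and name-content indicators for one domain."""
    age_days = (first_abuse - registered).days
    return {
        "age_days": age_days,
        "young": age_days <= MAX_AGE_DAYS,
        "brand_like": resembles_brand(domain),
    }


suspect = registration_indicators(
    "examp1ebank-login.com", date(2024, 1, 10), date(2024, 1, 24))
benign = registration_indicators(
    "weather-report.org", date(2010, 5, 1), date(2024, 1, 24))
```

The sketch returns the indicators separately rather than a single verdict, because how the criteria are weighed, and how common control across a batch is established, is a matter of analyst judgment that the example does not attempt to model.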
Hosting Network (ASN)
Autonomous system (AS) is a term used to describe a collection of networks that operate under a common administration. Its primary use is to identify peers and destinations in the global Internet routing system. Conceptually, routing at the AS level is a function of (i) identifying the autonomous systems that are adjacent to your AS, (ii) learning from your AS peers which destination autonomous systems you can reach by routing through them, and (iii) choosing which peer to use to optimally forward traffic to a given destination AS.
Autonomous systems are assigned unique numbers (ASNs) as part of the registration process that is required for operators to participate in the global Internet routing system. Autonomous system numbers are a sort of “shorthand”. Each number “represents” a list of IP address blocks (or IP prefixes) that are “reachable” in the AS. Cyber investigators are interested in destination AS numbers because they identify the hosting network in which an IP address that reportedly hosts a phishing, malware, or other criminal site (or content) is located.
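Finding the hosting network for a reported IP address amounts to looking up the most specific routed prefix that contains it. The sketch below assumes a small hand-made table; the prefixes are documentation ranges, the AS numbers are taken from the range reserved for documentation, and the function name is hypothetical. A real lookup would draw on routing data instead.

```python
import ipaddress

# Hypothetical prefix-to-ASN table (documentation prefixes and ASNs).
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64496,
    "198.51.100.0/24": 64497,
    "198.51.100.128/25": 64498,  # more specific block inside the /24
}


def origin_asn(ip, table=PREFIX_TO_ASN):
    """Return the ASN of the longest prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        # Prefer the most specific (longest) matching prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


asn_specific = origin_asn("198.51.100.200")  # inside the /25
asn_general = origin_asn("198.51.100.5")     # only inside the /24
asn_unknown = origin_asn("203.0.113.1")      # not in the table
```

The longest-match rule matters because a block can be announced by one AS while a smaller block inside it is announced by another; attributing the address to the less specific prefix would point investigators at the wrong hosting network.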
Whois services operated by Regional Internet Registries (ARIN, APNIC, AFRINIC, LACNIC, RIPE NCC) provide registration data, including contact data, for autonomous systems and the IP address blocks that were allocated to autonomous system registrants. Some organizations operate several or dozens of autonomous systems. While we study the degree of “churn” in autonomous systems (adds and drops of IP address block allocations to an AS), we only report on individual autonomous systems, by number. Thus, we can call attention to individual hosting networks (ASNs) where there are interesting concentrations of cybercrime activity, but we continue to explore ways to identify organizations where interesting concentrations of cybercrime activity are present across several of the autonomous systems under that organization’s administration.