Abstract |
Current binary and multi-class (family) approaches to malware classification are of little use for identifying and analyzing other samples. Popular family classification methods lack formal naming definitions and cannot describe samples that exhibit single or multiple behaviors. However, alternatives such as detailed manual analysis of malware samples are expensive in both time and computational resources. This creates the need for a middle ground that speeds up the labeling of samples while also yielding a better description of their behavior. In this paper, we propose a new automated malware sample labeling scheme. The scheme assigns a set of labels to each sample by mapping keywords found in the file, behavior, and analysis reports provided by VirusTotal to a proposed multi-label, behavior-focused taxonomy; it also measures similarity between samples using multiple fuzzy hashing functions. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG. |
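The keyword-to-label mapping described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TAXONOMY` dictionary, its labels and keywords, and the `label_sample` function are hypothetical and are not the taxonomy or implementation proposed in the paper. The report is treated as plain text rather than as an actual VirusTotal response, and the fuzzy-hashing similarity step is not covered.

```python
import re

# Hypothetical behavior taxonomy: label -> keyword stems.
# Illustrative only; NOT the taxonomy proposed in the paper.
TAXONOMY = {
    "ransomware": {"encrypt", "ransom"},
    "spyware": {"keylog", "screenshot"},
    "downloader": {"download", "dropper"},
}


def label_sample(report_text):
    """Return a sorted list of every label whose keyword stem prefixes a report token."""
    # Lowercase the report and split it into alphanumeric tokens.
    tokens = set(re.findall(r"[a-z0-9]+", report_text.lower()))
    labels = set()
    for label, keywords in TAXONOMY.items():
        # Multi-label: a sample receives every label that matches, not just one.
        if any(tok.startswith(kw) for tok in tokens for kw in keywords):
            labels.add(label)
    return sorted(labels)


if __name__ == "__main__":
    print(label_sample("Trojan encrypts files and downloads a keylogger"))
```

Because each sample receives a set of labels rather than a single family name, one report can carry several behaviors at once, which is the property the abstract contrasts with family classification.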