Christopher Talib 84e4937f85 Major version update
This new work implements the server and the loader as two separate
binaries, allowing the code to keep running while the IOC list is updated.

It also updates the documentation to reflect the new changes.
2020-08-24 17:20:07 +02:00

# Styx
## What are we trying to solve
Styx is the passive sibling of Vader. Where Vader lets users pull "on-demand"
data from connectors, Styx continuously ingests streams of data. In contrast to
retro-hunting, Styx finds patterns in current events and flags them on the
spot. It is not retro-hunting but present-hunting, or even future-hunting, as
we hope to spot actors' movements as they happen.
## Prerequisites
Styx relies on a few other services to run:
* Kafka for messaging (not yet included in the Docker setup, and not currently required)
* Dgraph for graph representation of results
* Docker Compose to launch everything
For that purpose, there is a `docker-compose.yml` file that you can spin up
with the following command from the repository directory:
```sh
docker-compose up -d
```
*Note*: for some reason, OpenVPN prevents Docker Compose from coming up.
Alternatively, you can run Dgraph manually:
```sh
docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph dgraph/standalone:v20.03.0
```
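
Before starting `styxd`, it can help to check that Dgraph is actually reachable. The following is a minimal sketch, not part of Styx: it assumes the Alpha HTTP port is mapped to `localhost:8080` as in the command above and that Dgraph answers on its `/health` endpoint. The helper name `waitForDgraph` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForDgraph polls healthURL until it answers 200 OK or the attempts run out.
// It is a hypothetical helper, not part of the Styx code base.
func waitForDgraph(healthURL string, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	client := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(healthURL)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
			lastErr = fmt.Errorf("unexpected status %s", resp.Status)
		} else {
			lastErr = err
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("dgraph not reachable after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Assumption: Dgraph Alpha is exposed on localhost:8080.
	if err := waitForDgraph("http://localhost:8080/health", 5, time.Second); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("dgraph is ready")
}
```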
## Install
```sh
go get -u gitlab.dcso.lolcat/LABS/styx
cd $GOPATH/src/gitlab.dcso.lolcat/LABS/styx
go build gitlab.dcso.lolcat/LABS/styx/cmd/styxd
docker-compose up -d # or the other docker command
./styxd
# build the loader helper binary
go build gitlab.dcso.lolcat/LABS/styx/cmd/iocloader
# update the IOC list while the program is already running
./iocloader
```
*Note*: if you have issues with the docker compose, make sure it runs on the
same subnet. Check [this](https://serverfault.com/questions/916941/configuring-docker-to-not-use-the-172-17-0-0-range) for inspiration.
### Example configuration:
*Note*: For Pastebin, you will have to authorise your IP address after logging in through the web interface.
```
certstream:
  activated: true
pastebin:
  activated: true
shodan:
  activated: true
  key: "SHODAN_KEY"
  ports:
    - 80
    - 443
```
## Dgraph Interface
You can connect to the Dgraph interface at its default address: http://localhost:8000.
There you can run GraphQL+- queries, for example to fetch a single node by its UID:
```graphql
query {
  Node(func: uid(0x23)) {
    uid
    ndata
    modified
    type
    id
  }
}
```
Or filter node by type, this example works for certstream nodes:
```graphql
query {
  Node(func: eq(type, "certstream")) {
    uid
    created
    modified
    type
    ndata
    certNode {
      uid
      fingerprint
      cn
      raw {
        uid
        id
      }
      chain {
        uid
        id
      }
      sourceName
      serialNumber
      basicConstraints
      notBefore
      notAfter
    }
  }
}
```
Example query for pastebin data:
```graphql
query {
  Node(func: eq(type, "pastebin")) {
    uid
    created
    modified
    type
    ndata
    pasteNode {
      id
      type
      created
      modified
      fullPaste {
        full
        meta {
          full_url
          size
          expire
          title
          syntax
          user
          scrape_url
          date
          key
        }
      }
    }
  }
}
```
Dgraph also supports full text search, so you can query things like:
```graphql
query {
  Node(func: allofterms(full, "code")) {
    uid
    created
    modified
    type
    full
  }
}
```
The following fields are indexed and can be used in searches:
* id
* type
* sourceName
* cn
* serialNumber
* hostnames
* organization
* full (full text of a Pastebin paste)
* title
* user
By design, each node has a `type` field, so you always know which fields to
request when querying it.
## Data structure
### Meta
Edges represent a relation between two nodes of different origins. They are a built-in Dgraph feature.
Node --[Edge]-- Node
```go
type Node struct {
	ID   string `json:"id"`
	Type string `json:"type"`
	// For plain Node, the data is the ID of another typed node or a
	// unique value like a domain or a host name.
	Data     string `json:"data"`
	Created  string `json:"created"`
	Modified string `json:"modified"`
}

// Edge defines a relation between two nodes.
type Edge struct {
	ID        string `json:"id"`
	NodeOneID string `json:"nodeOneID"`
	NodeTwoID string `json:"nodeTwoID"`
	Timestamp string `json:"timestamp"`
	Source    string `json:"source"`
}
```
### Certstream
Node -- CertNode -- CertStreamRaw
```go
// CertStreamRaw is a wrapper around the stream function to unmarshal the
// received data into a Go structure.
type CertStreamRaw struct {
	ID       string           `json:"id"`
	Type     string           `json:"type"`
	Data     CertStreamStruct `json:"data"`
	Created  string           `json:"created"`
	Modified string           `json:"modified"`
}

// CertNode represents our custom struct of data extraction from CertStream.
type CertNode struct {
	ID               string     `json:"id"`
	Fingerprint      string     `json:"fingerprint"`
	NotBefore        string     `json:"notBefore"`
	NotAfter         string     `json:"notAfter"`
	CN               string     `json:"cn"`
	SourceName       string     `json:"sourceName"`
	SerialNumber     string     `json:"serialNumber"`
	BasicConstraints string     `json:"basicConstraints"`
	RawUUID          string     `json:"rawUUID"`
	Chain            []CertNode `json:"chainedTo"`
}
```
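
Because `Chain` is itself a list of `CertNode` values (serialised under the `chainedTo` key), a certificate and its chain form a small tree. The sketch below decodes a made-up JSON payload into the struct and walks that tree. The payload values and the helper `chainCNs` are hypothetical and only serve to show how the JSON tags map onto the fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CertNode mirrors the struct documented above.
type CertNode struct {
	ID               string     `json:"id"`
	Fingerprint      string     `json:"fingerprint"`
	NotBefore        string     `json:"notBefore"`
	NotAfter         string     `json:"notAfter"`
	CN               string     `json:"cn"`
	SourceName       string     `json:"sourceName"`
	SerialNumber     string     `json:"serialNumber"`
	BasicConstraints string     `json:"basicConstraints"`
	RawUUID          string     `json:"rawUUID"`
	Chain            []CertNode `json:"chainedTo"`
}

// chainCNs returns the common name of the certificate followed by the
// common names found in its chain, depth first. Hypothetical helper.
func chainCNs(n CertNode) []string {
	cns := []string{n.CN}
	for _, c := range n.Chain {
		cns = append(cns, chainCNs(c)...)
	}
	return cns
}

func main() {
	// Made-up payload for illustration only.
	raw := []byte(`{
		"id": "cert-1",
		"cn": "example.com",
		"chainedTo": [{"id": "cert-2", "cn": "Example Intermediate CA"}]
	}`)

	var node CertNode
	if err := json.Unmarshal(raw, &node); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(chainCNs(node))
}
```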
### Pastebin
Node -- PasteNode -- FullPaste
```go
// PasteNode is a node from PasteBin.
type PasteNode struct {
	ID       string    `json:"id"`
	Type     string    `json:"type"`
	Data     FullPaste `json:"data"`
	Created  string    `json:"create"`
	Modified string    `json:"modified"`
}

// FullPaste wraps the metadata and full text of a paste from Pastebin.
type FullPaste struct {
	Meta PasteMeta `json:"meta"`
	Full string    `json:"full"`
	Type string    `json:"type"`
}
```
### Shodan
Node -- ShodanNode -- Node(s) (hostnames and domains)
```go
type ShodanNode struct {
	ID       string           `json:"id"`
	Type     string           `json:"type"`
	Data     *shodan.HostData `json:"data"`
	Created  string           `json:"created"`
	Modified string           `json:"modified"`
}
```
### Balboa (not in Dgraph yet)
Balboa enrichment happens on domains and hostnames extracted from Certstream
and Shodan streams and the node is created only if Balboa returns data.
Node -- ShodanNode -- Node (domain) -- BalboaNode
```go
type BalboaNode struct {
	ID       string           `json:"id"`
	Type     string           `json:"type"`
	Data     []balboa.Entries `json:"data"`
	Created  string           `json:"created"`
	Modified string           `json:"modified"`
}
```