Introduction

This is the user/operator facing manual for the River reverse proxy application.

River is a reverse proxy application under development, built on the pingora reverse proxy engine from Cloudflare and written in Rust. It is configurable, with options including routing, filtering, and modification of proxied requests.

River acts as a binary distribution of the pingora engine, providing a typical application interface for configuration and customization by operators.

The source code and issue tracker for River can be found on GitHub.

For developer facing documentation, including project roadmap and feature requirements for the 1.0 release, please refer to the docs/ folder on GitHub.

Installation

Pre-compiled versions of River and installation instructions are provided on the Releases page on GitHub.

Currently, builds are provided for:

  • x86-64 Linux (GNU libc)
  • x86-64 Linux (MUSL libc)
  • aarch64 macOS (M-series devices)

The primary target is currently x86-64 Linux (GNU libc). Other platforms may not support all features, and are supported on a best-effort basis.
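
Once a build is installed, you can check that the binary runs by printing its help text, which is described in the Command Line Interface section below:

river --help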

Core Concepts

River is a Reverse Proxy application.

It is intended to handle connections from Downstream clients, forward Requests to Upstream servers, and then forward Responses from the Upstream servers back to the Downstream clients.

┌────────────┐          ┌─────────────┐         ┌────────────┐
│ Downstream │       ┌ ─│─   Proxy  ┌ ┼ ─       │  Upstream  │
│   Client   │─────────▶│ │           │──┼─────▶│   Server   │
└────────────┘       │  └───────────┼─┘         └────────────┘
                      ─ ─ ┘          ─ ─ ┘
                        ▲              ▲
                     ┌──┘              └──┐
                     │                    │
                ┌ ─ ─ ─ ─ ┐         ┌ ─ ─ ─ ─ ─
                 Listeners           Connectors│
                └ ─ ─ ─ ─ ┘         └ ─ ─ ─ ─ ─

For the purpose of this guide, we define Requests as messages sent from the downstream client to the upstream server, and define Responses as messages sent from the upstream server to the downstream client.

River is capable of handling connections, requests, and responses from numerous downstream clients and upstream servers simultaneously.

When proxying between a downstream client and upstream server, River may modify or block requests and responses. Examples of modification include removing or adding HTTP headers of requests or responses, for instance to add internal metadata or to strip sensitive information. Examples of blocking include rejecting requests for authentication or rate limiting purposes.

Services

River is oriented around the concept of Services. Services are composed of three major elements:

  • Listeners - the sockets used to accept incoming connections from downstream clients
  • Connectors - the listing of potential upstream servers that requests may be forwarded to
  • Path Control Options - the modification or filtering settings used when processing requests or responses.

Services are configured independently from each other. This allows a single instance of the River application to handle the proxying of multiple different kinds of traffic, and to apply different rules when proxying these different kinds of traffic.

Each service also creates its own pool of worker threads, allowing the operating system to provide equal time and resources to each Service, and preventing one highly loaded Service from starving other Services of resources such as memory and CPU time.

Listeners

Listeners are responsible for accepting incoming connections and requests from downstream clients. Each listener is a single listening socket, for example listening to IPv4 traffic on address 192.168.10.2:443.

Listeners may optionally support the establishment and termination of TLS. They may be configured with a TLS certificate and SNI, allowing them to securely accept traffic sent to a certain domain name, such as https://example.com.

Unlike some other reverse proxy applications, in River, a given listener is "owned" by a single service. This means that multiple services may not be listening to the same address and port. Traffic received by a given Listener will always be processed by the same Service for the duration of time that the River application is running.

Listeners are configured "statically": they are set in the configuration file loaded at the start of the River application, and are constant for the time that the River application is running.

Connectors

Connectors are responsible for the communication between the Service and the upstream server(s).

Connectors manage a few important tasks:

  • Allowing for Service Discovery, changing the set of potential upstream servers over time
  • Allowing for Health Checks, selectively enabling and disabling which upstream servers are eligible for proxying
  • Load balancing of proxied requests across multiple upstream servers
  • Optionally establishing secure TLS connections to upstream servers
  • Maintaining reusable connections to upstream servers, to reduce the cost of connection and proxying

Similar to Listeners, each Service maintains its own unique set of Connectors. However, Services may have overlapping sets of upstream servers, with each Service tracking the same upstream server independently in its own Connectors. This allows multiple Services to proxy to the same upstream servers, but pooled connections and other state managed by Connectors are not shared across Services.

Path Control

Path Control allows for configurable filtering and modification of requests and responses at multiple stages of the proxying process.

Configuration

River has three sources of configuration:

  1. Command Line Options
  2. Environment Variable Options
  3. Configuration File Options

When configuration options are available in multiple sources, priority is given in the order specified above.
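
For example (using a hypothetical configuration file path), if the configuration file sets a threads-per-service value of 8, but River is started as:

river --config-kdl ./river.kdl --threads-per-service 4

then the command line value of 4 takes priority over the value from the configuration file.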

Configuration File Options

The majority of configuration options are provided via configuration file, allowing users of River to provide files as part of a regular deployment process. Currently, all configuration of Services (and their Listener, Connector, and Path Control options) is provided via configuration file.

At the current moment, two configuration file formats are supported:

  • KDL - the current preferred format
  • TOML - likely to be removed soon

For more information about the available configuration parameters, see the Configuration File (KDL) section.

Environment Variable Options

At the moment, there are no options configurable via environment variables.

In the future, environment variables will be used for configuration of "secrets", such as passwords used for basic authentication, or bearer tokens used for accessing management pages.

It is not expected that River will make all configuration options available through environment variables, as providing highly structured configuration (e.g. for Services) via environment variables would require complex logic that is hard to reason about, both to parse and to implement.

Command Line Options

A limited number of options are available via command line. These options are intended to provide information such as the path to the configuration file.

It is not expected that River will make all configuration options available through CLI.

For more information about the options available via the Command Line Interface, please refer to the Command Line Interface section.

Command Line Interface

River: A reverse proxy from Prossimo

Usage: river [OPTIONS]

Options:
      --validate-configs
          Validate all configuration data and exit
      --config-toml <CONFIG_TOML>
          Path to the configuration file in TOML format
      --config-kdl <CONFIG_KDL>
          Path to the configuration file in KDL format
      --threads-per-service <THREADS_PER_SERVICE>
          Number of threads used in the worker pool for EACH service
      --daemonize
          Should the server be daemonized after starting?
      --upgrade
          Should the server take over an existing server?
      --upgrade-socket <UPGRADE_SOCKET>
          Path to upgrade socket
      --pidfile <PIDFILE>
          Path to the pidfile, used for upgrade
  -h, --help
          Print help

--validate-configs

Running River with this option will validate the configuration, and immediately exit without starting any Services. A non-zero return code will be given when the configuration fails validation.
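
For example, to validate a configuration file (at a hypothetical path) without starting any Services:

river --config-kdl ./river.kdl --validate-configs

A return code of zero indicates that the configuration passed validation.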

--config-toml <CONFIG_TOML>

Running River with this option will instruct River to load the configuration file from the provided path. Cannot be used with --config-kdl.

--config-kdl <CONFIG_KDL>

Running River with this option will instruct River to load the configuration file from the provided path. Cannot be used with --config-toml.

--threads-per-service <THREADS_PER_SERVICE>

Running River with this option will instruct River to use the given number of worker threads per service.

--daemonize

Running River with this option will cause River to fork after the creation of all Services. The application will return once all Services have been started.

If this option is not provided, the River application will run in the foreground until it is commanded to stop or a fatal error occurs.

--upgrade

Running River with this option will cause River to take over an existing River server's open connections. See Hot Reloading for more information about this.

--upgrade-socket <UPGRADE_SOCKET>

Running River with this option will instruct River to look at the provided socket path for receiving active Listeners from the currently running instance.

This must be an absolute path. This option only works on Linux.

See Hot Reloading for more information about this.

--pidfile <PIDFILE>

Running River with this option will set the path for the created pidfile when the server is configured to daemonize.

This must be an absolute path.

Environment Variables

TODO: We don't use any environment variables yet

Configuration File (KDL)

The primary configuration file format used by River is based on the KDL Configuration Language.

KDL is a language for describing structured data.

There are currently two major sections used by River:

The system section

Here is an example system configuration block:

system {
    threads-per-service 8
    daemonize false
    pid-file "/tmp/river.pidfile"

    // Path to upgrade socket
    //
    // NOTE: `upgrade` is NOT exposed in the config file, it MUST be set on the CLI
    // NOTE: This has issues if you use relative paths. See issue https://github.com/memorysafety/river/issues/50
    // NOTE: The upgrade command is only supported on Linux
    upgrade-socket "/tmp/river-upgrade.sock"
}

system.threads-per-service INT

This field configures the number of threads spawned by each service. This configuration applies to all services.

A positive, non-zero integer is provided as INT.

This field is optional, and defaults to 8.

system.daemonize BOOL

This field configures whether River should daemonize.

A value of true or false is provided as BOOL.

This field is optional, and defaults to false.

If this field is set as true, then system.pid-file must also be set.

system.pid-file PATH

This field configures the path of the pidfile created when River is configured to daemonize.

A UTF-8 absolute path is provided as PATH.

This field is optional if system.daemonize is false, and required if system.daemonize is true.

system.upgrade-socket PATH

This field configures the path to the upgrade socket used when River is configured to take over an existing instance.

A UTF-8 absolute path is provided as PATH.

This field is optional if the --upgrade flag is provided via CLI, and required if --upgrade is not set.

The services section

Here is an example services block:

services {
    Example1 {
        listeners {
            "0.0.0.0:8080"
            "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=true
        }
        connectors {
            load-balance {
                selection "Ketama" key="UriPath"
                discovery "Static"
                health-check "None"
            }
            "91.107.223.4:443" tls-sni="onevariable.com" proto="h2-or-h1"
        }
        path-control {
            request-filters {
                filter kind="block-cidr-range" addrs="192.168.0.0/16, 10.0.0.0/8, 2001:0db8::0/32, 127.0.0.1"
            }
            upstream-request {
                filter kind="remove-header-key-regex" pattern=".*(secret|SECRET).*"
                filter kind="upsert-header" key="x-proxy-friend" value="river"
            }
            upstream-response {
                filter kind="remove-header-key-regex" pattern=".*ETag.*"
                filter kind="upsert-header" key="x-with-love-from" value="river"
            }
        }
        rate-limiting {
            rule kind="source-ip" \
                max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10

            rule kind="specific-uri" pattern="static/.*" \
                max-buckets=2000 tokens-per-bucket=20 refill-qty=5 refill-rate-ms=1

            rule kind="any-matching-uri" pattern=r".*\.mp4" \
                tokens-per-bucket=50 refill-qty=2 refill-rate-ms=3
        }
    }
    Example3 {
        listeners {
            "0.0.0.0:9000"
            "0.0.0.0:9443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=true
        }
        file-server {
            // The base path is what will be used as the "root" of the file server
            //
            // All files within the root will be available
            base-path "."
        }
    }
}

Each block represents a single service, with the name of the service serving as the name of the block.

services.$NAME

The $NAME field is a UTF-8 string, used as the name of the service. If the name does not contain spaces, it is not necessary to surround the name in quotes.

Examples:

  • Example1 - Valid, "Example1"
  • "Example2" - Valid, "Example2"
  • "Server One" - Valid, "Server One"
  • Server Two - Invalid (missing quotation marks)

services.$NAME.listeners

This section contains one or more Listeners. This section is required. Listeners are specified in the form:

"SOCKETADDR" [cert-path="PATH" key-path="PATH" [offer-h2=BOOL]]

SOCKETADDR is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.

If the listener should accept TLS connections, the certificate and key paths are specified in the form cert-path="PATH" key-path="PATH", where PATH is a UTF-8 path to the relevant files. If these are not provided, connections will be accepted without TLS.

If the listener should offer HTTP2.0 connections, this is specified in the form offer-h2=BOOL, where BOOL is either true or false. offer-h2 may only be specified if cert-path and key-path are present. This configuration is optional, and defaults to true if TLS is configured. If this field is true, HTTP2.0 will be offered (but not required). If this field is false then only HTTP1.x will be offered.
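
For example, a listeners block offering one plain HTTP listener and one TLS listener with HTTP2.0 disabled might look like this (the certificate and key paths are placeholders):

listeners {
    "0.0.0.0:8080"
    "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=false
}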

services.$NAME.connectors

This section contains one or more Connectors. This section is required. Connectors are specified in the form:

"SOCKETADDR" [tls-sni="DOMAIN"] [proto="PROTO"]

SOCKETADDR is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.

If the connector should use TLS for connections to the upstream server, the TLS-SNI is specified in the form tls-sni="DOMAIN", where DOMAIN is a domain name. If this is not provided, connections to upstream servers will be made without TLS.

The protocol used to connect with the upstream server is specified in the form proto="PROTO", where PROTO is a string with one of the following values:

  • h1-only: Only HTTP1.x will be used to connect
  • h2-only: Only HTTP2.0 will be used to connect
  • h2-or-h1: HTTP2.0 will be preferred, with fallback to HTTP1.x

The proto field is optional. If it is not specified and TLS is configured, the default will be h2-or-h1. If TLS is not configured, the default will be h1-only, and any other option will result in an error.
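
For example, a connectors block with one plain connector (which defaults to h1-only) and one TLS connector preferring HTTP2.0 might look like this (the first address is a placeholder for a local upstream):

connectors {
    "127.0.0.1:8000"
    "91.107.223.4:443" tls-sni="onevariable.com" proto="h2-or-h1"
}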

services.$NAME.connectors.load-balance

This section configures the load balancing behavior for the connectors in this set.

This section is optional.

services.$NAME.connectors.load-balance.selection

This defines how the upstream server is selected.

Options are:

  • selection "RoundRobin"
    • Servers are selected in a Round Robin fashion, giving equal distribution
  • selection "Random"
    • Servers are selected on a random basis, giving a statistically equal distribution
  • selection "FNV" key="KEYKIND"
    • FNV hashing is used based on the provided KEYKIND
  • selection "Ketama" key="KEYKIND"
    • Stable Ketama hashing is used based on the provided KEYKIND

Where KEYKIND is one of the following:

  • UriPath - The URI path is hashed
  • SourceAddrAndUriPath - The source address and URI path are hashed
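
For example, a load-balance block using stable Ketama hashing of the source address and URI path might look like this (the discovery and health-check fields are copied from the example above and are not covered in this section):

load-balance {
    selection "Ketama" key="SourceAddrAndUriPath"
    discovery "Static"
    health-check "None"
}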

services.$NAME.path-control

This section contains the configuration for path control filters.

Each path control filter allows for modification or rejection at different stages of request and response handling.

This section is optional.

Example:

path-control {
    request-filters {
        filter kind="block-cidr-range" addrs="192.168.0.0/16, 10.0.0.0/8, 2001:0db8::0/32, 127.0.0.1"
    }
    upstream-request {
        filter kind="remove-header-key-regex" pattern=".*(secret|SECRET).*"
        filter kind="upsert-header" key="x-proxy-friend" value="river"
    }
    upstream-response {
        filter kind="remove-header-key-regex" pattern=".*ETag.*"
        filter kind="upsert-header" key="x-with-love-from" value="river"
    }
}

services.$NAME.path-control.request-filters

Filters at this stage are applied earliest, when a request is first received from a downstream client. Currently supported filters:

  • kind = "block-cidr-range"
    • Arguments: addrs = "ADDRS", where ADDRS is a comma separated list of IPv4 or IPv6 addresses or CIDR address ranges.
    • Any requests from matching source IP addresses will be rejected with a 400 error code.

services.$NAME.path-control.upstream-request

Filters at this stage are applied to requests before they are forwarded to the upstream server. Currently supported filters:

  • kind = "remove-header-key-regex"
    • Arguments: pattern = "PATTERN", where PATTERN is a regular expression matching the key of an HTTP header
    • Any matching header entry will be removed from the request before forwarding
  • kind = "upsert-header"
    • Arguments: key="KEY" value="VALUE", where KEY is a valid HTTP header key, and VALUE is a valid HTTP header value
    • The given header will be added with the value VALUE, replacing any existing value

services.$NAME.path-control.upstream-response

Filters at this stage are applied to responses received from the upstream server before they are forwarded back to the downstream client. Currently supported filters:

  • kind = "remove-header-key-regex"
    • Arguments: pattern = "PATTERN", where PATTERN is a regular expression matching the key of an HTTP header
    • Any matching header entry will be removed from the response before forwarding
  • kind = "upsert-header"
    • Arguments: key="KEY" value="VALUE", where KEY is a valid HTTP header key, and VALUE is a valid HTTP header value
    • The given header will be added with the value VALUE, replacing any existing value

services.$NAME.rate-limiting

This section contains the configuration for rate limiting rules.

Rate limiting rules are used to limit the total number of requests made by downstream clients, based on various criteria.

Note that rate limiting is applied on a per-Service basis; Services do not share rate limiting information.

This section is optional.

Example:

rate-limiting {
    rule kind="source-ip" \
        max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10

    rule kind="specific-uri" pattern="static/.*" \
        max-buckets=2000 tokens-per-bucket=20 refill-qty=5 refill-rate-ms=1

    rule kind="any-matching-uri" pattern=r".*\.mp4" \
        tokens-per-bucket=50 refill-qty=2 refill-rate-ms=3
}

services.$NAME.rate-limiting.rule

Rules are used to specify rate limiting parameters, and applicability of rules to a given request.

Leaky Buckets

Rate limiting in River uses a Leaky Bucket model for determining whether a request can be served immediately, or if it should be rejected. For a given rule, a "bucket" of "tokens" is created, where one "token" is required for each request.

The bucket for a rule starts with a configurable tokens-per-bucket number. When a request arrives, it attempts to take one token from the bucket. If one is available, it is served immediately. Otherwise, the request is rejected immediately.

The bucket is refilled at a configurable rate, specified by refill-rate-ms, and adds a configurable number of tokens specified by refill-qty. The number of tokens in the bucket will never exceed the initial tokens-per-bucket number.

Once a refill occurs, additional requests may be served.
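
For example, with tokens-per-bucket=10, refill-qty=1, and refill-rate-ms=10 (as in the source-ip rule shown above), a bucket allows an initial burst of up to 10 requests, and is then refilled with one token every 10 milliseconds, giving a sustained rate of roughly 100 requests per second.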

How many buckets?

Some rules require many buckets. For example, rules based on the source IP address will create a bucket for each unique source IP address observed in a request. We refer to these as "multi" rules.

However, each of these buckets requires space to store its metadata. To avoid unbounded growth, a configurable max-buckets number limits the total memory required for storing buckets. Buckets are stored in an Adaptive Replacement Cache, which allows for concurrent access to the buckets, as well as automatic eviction of buckets that are not actively being used (somewhat similar to an LRU, or "Least Recently Used", cache).

There is a trade-off here: the larger max-buckets is, the longer River can "remember" a bucket for a given factor, such as a specific IP address. However, larger values also require more resident memory to retain this information.

If max-buckets is set too low, then buckets will be "evicted" from the cache, meaning that subsequent requests matching that bucket will require the creation of a new bucket (with a full set of tokens), potentially defeating the objective of accurate rate limiting.

For "single" rules, or rules that do not have multiple buckets, a single bucket will be shared by all requests matching the rule.

Gotta claim 'em all

When multiple rules apply to a single request, for example rules based on both the source IP address and the URI path, a request must claim ALL applicable tokens before proceeding. If a given IP address is making its first request, but to a URI that has an empty bucket, it will immediately obtain the IP address token, but the request will be rejected because the URI bucket claim failed.

Kinds of Rules

Currently three kinds of rules are supported:

  • kind="source-ip" - this tracks the IP address of the requestor.
    • This rule is a "multi" rule: A unique bucket will be created for the IPv4 or IPv6 address of the requestor.
    • The max-buckets parameter controls how many IP addresses will be remembered.
  • kind="specific-uri" pattern="REGEX" - This tracks the URI path of the request, such as static/images/example.jpg
    • This rule is a "multi" rule: if the request's URI path matches the provided REGEX, the full URI path will be assigned to a given bucket
    • For example, if the regex static/.* was provided:
      • index.html would not match this rule, and would not require obtaining a token
      • static/images/example.jpg would match this rule, and would require obtaining a token
      • static/styles/example.css would also match this rule, and would require obtaining a token
      • Note that static/images/example.jpg and static/styles/example.css would each have a UNIQUE bucket.
  • kind="any-matching-uri" pattern="REGEX" - This tracks the URI path of the request, such as static/videos/example.mp4
    • This is a "single" rule: ANY path matching REGEX will share a single bucket
    • For example, if the regex .*\.mp4 was provided:
      • index.html would not match this rule, and would not require obtaining a token
      • static/videos/example1.mp4 would match this rule, and would require obtaining a token
      • static/videos/example2.mp4 would also match this rule, and would require obtaining a token
      • Note that static/videos/example1.mp4 and static/videos/example2.mp4 would share a SINGLE bucket (also shared with any other path containing an MP4 file)

services.$NAME.file-server

This section is only allowed when connectors and path-control are not present.

This is used when serving static files, rather than proxying connections.

services.$NAME.file-server.base-path

This is the base path used for serving files. ALL files within this directory (and any children) will be available for serving.

This is specified in the form base-path "PATH", where PATH is a valid UTF-8 path.

This section is required.

Configuration File (TOML)

TODO: We're probably going to retire TOML configuration file support.

Hot Reloading

River does not support changing most settings while the server is running. In order to change the settings of a running instance of River, it is necessary to launch a new instance of River.

However, River does support "Hot Reloading" - the ability for a new instance of River to take over the responsibilities of a currently executing server.

From a high level view, this process looks like:

  1. The existing instance of River is running
  2. A new instance of River is started, configured with "upgrade" enabled via the command line. The new instance does not yet begin execution, and is waiting for a hand-over of Listeners from the existing instance
  3. A SIGQUIT signal is sent to the FIRST River instance, which causes it to stop accepting new connections, and to transfer all active listening Listener file descriptors to the SECOND River instance
  4. The SECOND River instance begins listening to all Listeners, and operating normally
  5. The FIRST River instance continues handling any currently active downstream connections, until either all connections have closed, or until a timeout period is reached. If the timeout is reached, all open connections are closed ungracefully.
  6. At the end of the timeout period, the FIRST River instance exits.

In most cases, this allows a seamless hand-over from the OLD instance of River to the NEW instance of River, without any interruption of service. As long as no connections are longer-lived than the timeout period, the hand-over will not be observable by downstream clients.

Once the SIGQUIT signal is sent, all new incoming connections will be handled by the new instance of River. Existing connections will continue to be serviced by the old instance until their connection has been closed.

There are a couple of moving pieces necessary for this process to occur:

pidfile

When River is configured to be daemonized, it will create a pidfile containing its process ID at the configured location.

This file can be used to determine the process ID to which the SIGQUIT signal should be sent.

When the second instance has taken over, the pidfile of the original instance will be replaced with the pidfile of the new instance.

In general, both instances of River should be configured with the same pidfile path.

upgrade socket

In order to facilitate the hand-over, the listening socket file descriptors are transferred from the old instance to the new instance over a socket at the configured upgrade socket path.

This transfer begins when the SIGQUIT signal is sent to the first process.

Both instances of River MUST be configured with the same upgrade socket path.
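
Putting these pieces together, a hot reload might look roughly like the following sketch. It uses the example pidfile and upgrade socket paths from the configuration sections above, along with a hypothetical configuration file path; both instances are assumed to be configured with the same pidfile and upgrade socket paths.

# Note the process ID of the currently running (old) instance
OLD_PID=$(cat /tmp/river.pidfile)

# Start the new instance, instructing it to take over the existing Listeners
river --config-kdl /etc/river/river.kdl --daemonize --pidfile /tmp/river.pidfile \
    --upgrade --upgrade-socket /tmp/river-upgrade.sock

# Signal the old instance to hand over its Listeners and begin winding down
kill -QUIT "$OLD_PID"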