Introduction
This is the user/operator facing manual for the River reverse proxy application.
River is a reverse proxy application under development, built on the pingora reverse proxy engine from Cloudflare and written in Rust. It is configurable, with options including routing, filtering, and modification of proxied requests.
River acts as a binary distribution of the pingora engine, providing operators with a typical application interface for configuration and customization.
The source code and issue tracker for River can be found on GitHub.
For developer-facing documentation, including the project roadmap and feature requirements for the 1.0 release, please refer to the docs/ folder on GitHub.
Installation
Pre-compiled versions of River and installation instructions are provided on the Releases page on GitHub.
Currently, builds are provided for:
- x86-64 Linux (GNU libc)
- x86-64 Linux (MUSL libc)
- aarch64 macOS (M-series devices)
The primary target is currently x86-64 Linux (GNU libc). Other platforms may not support all features, and are supported on a best-effort basis.
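After installation, you can confirm that the binary runs and that a configuration file parses by asking River to validate it and exit. The configuration path below is a placeholder; substitute your own:

# Show the available command line options
river --help

# Validate a configuration file and exit without starting any Services
river --validate-configs --config-kdl ./river.kdl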
Core Concepts
River is a Reverse Proxy application.
It is intended to handle connections from Downstream clients, forward Requests to Upstream servers, and then forward Responses from the Upstream servers back to the Downstream clients.
┌────────────┐         ┌───────────┐         ┌────────────┐
│ Downstream │         │           │         │  Upstream  │
│   Client   │────────▶│   Proxy   │────────▶│   Server   │
└────────────┘         └───────────┘         └────────────┘
                         ▲       ▲
                         │       │
                     Listeners Connectors
For the purpose of this guide, we define Requests as messages sent from the downstream client to the upstream server, and define Responses as messages sent from the upstream server to the downstream client.
River is capable of handling connections, requests, and responses from numerous downstream clients and upstream servers simultaneously.
When proxying between a downstream client and an upstream server, River may modify or block requests or responses. Examples of modification include adding or removing HTTP headers on requests or responses, for instance to attach internal metadata or to strip sensitive information. Examples of blocking include rejecting requests for authentication or rate limiting purposes.
Services
River is oriented around the concept of Services. Services are composed of three major elements:
- Listeners - the sockets used to accept incoming connections from downstream clients
- Connectors - the listing of potential upstream servers that requests may be forwarded to
- Path Control Options - the modification or filtering settings used when processing requests or responses.
Services are configured independently from each other. This allows a single instance of the River application to handle the proxying of multiple different kinds of traffic, and to apply different rules when proxying these different kinds of traffic.
Each service also creates its own pool of worker threads, allowing the operating system to give each Service equal time and resources and preventing one highly loaded Service from starving other Services of resources such as memory and CPU time.
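As a rough sketch of how these three elements fit together, a single proxy Service in the KDL configuration format (described in detail later in this manual) combines listeners, connectors, and path-control blocks. The service name, addresses, and header below are placeholders:

services {
    ExampleService {
        // Listeners: where downstream clients connect
        listeners {
            "0.0.0.0:8080"
        }
        // Connectors: the upstream servers requests may be forwarded to
        connectors {
            "192.168.1.10:80"
        }
        // Path Control: optional filtering and modification of requests/responses
        path-control {
            upstream-request {
                filter kind="upsert-header" key="x-example" value="river"
            }
        }
    }
}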
Listeners
Listeners are responsible for accepting incoming connections and requests from downstream clients. Each listener is a single listening socket, for example listening to IPv4 traffic on the address 192.168.10.2:443.
Listeners may optionally support the establishment and termination of TLS. They may be configured with a TLS certificate and SNI, allowing them to securely accept traffic sent to a certain domain name, such as https://example.com.
Unlike some other reverse proxy applications, in River, a given listener is "owned" by a single service. This means that multiple services may not be listening to the same address and port. Traffic received by a given Listener will always be processed by the same Service for the duration of time that the River application is running.
Listeners are configured "statically": they are set in the configuration file loaded at the start of the River application, and are constant for the time that the River application is running.
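For illustration, a listeners block for a Service might contain both a plain listener and a TLS listener; the addresses and certificate paths are placeholders:

listeners {
    // Accept plain HTTP connections on port 8080
    "0.0.0.0:8080"
    // Accept TLS connections on port 443 using the given certificate and key
    "0.0.0.0:443" cert-path="./assets/example.crt" key-path="./assets/example.key" offer-h2=true
}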
Connectors
Connectors are responsible for the communication between the Service and the upstream server(s).
Connectors manage a few important tasks:
- Allowing for Service Discovery, changing the set of potential upstream servers over time
- Allowing for Health Checks, selectively enabling and disabling which upstream servers are eligible for proxying
- Load balancing of proxied requests across multiple upstream servers
- Optionally establishing secure TLS connections to upstream servers
- Maintaining reusable connections to upstream servers, to reduce the cost of connection and proxying
Similar to Listeners, each Service maintains its own unique set of Connectors. Services may still have overlapping sets of upstream servers, with each Service listing the same upstream server among the proxy-able servers in its own Connectors. This allows multiple Services to proxy to the same upstream servers, but pooled connections and other state managed by Connectors are not shared across Services.
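As a sketch, a connectors block lists one or more upstream servers, optionally together with load balancing settings; the addresses below are placeholders:

connectors {
    load-balance {
        selection "RoundRobin"
    }
    // Requests are distributed across these two upstream servers
    "10.0.0.10:8000"
    "10.0.0.11:8000"
}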
Path Control
Path Control allows for configurable filtering and modification of requests and responses at multiple stages of the proxying process.
Configuration
River has three sources of configuration:
- Command Line Options
- Environment Variable Options
- Configuration File Options
When configuration options are available in multiple sources, priority is given in the order specified above.
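For example, if the number of worker threads is set both in the configuration file and on the command line, the command line value takes priority (the paths and values below are placeholders):

# river.kdl contains: system { threads-per-service 8 }
# The command line option below takes priority, so each Service uses 4 worker threads
river --config-kdl ./river.kdl --threads-per-service 4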
Configuration File Options
The majority of configuration options are provided via configuration file, allowing users of River to provide files as part of a regular deployment process. Currently, all configuration of Services (and their Listener, Connector, and Path Control options) are provided via configuration file.
At the current moment, two configuration file formats are supported:
- KDL - the current preferred format
- TOML - likely to be removed soon
For more information about the available configuration parameters, see The KDL Configuration Format section.
Environment Variable Options
At the moment, there are no options configurable via environment variables.
In the future, environment variables will be used for configuration of "secrets", such as passwords used for basic authentication, or bearer tokens used for accessing management pages.
It is not expected that River will make all configuration options available through environment variables, as providing highly structured configuration (e.g. for Services) via environment variables would require complex and hard-to-reason-about parsing logic.
Command Line Options
A limited number of options are available via command line. These options are intended to provide information such as the path to the configuration file.
It is not expected that River will make all configuration options available through CLI.
For more information about options that are available via Command Line Interface, please refer to The CLI Interface Format.
Command Line Interface
River: A reverse proxy from Prossimo
Usage: river [OPTIONS]
Options:
--validate-configs
Validate all configuration data and exit
--config-toml <CONFIG_TOML>
Path to the configuration file in TOML format
--config-kdl <CONFIG_KDL>
Path to the configuration file in KDL format
--threads-per-service <THREADS_PER_SERVICE>
Number of threads used in the worker pool for EACH service
--daemonize
Should the server be daemonized after starting?
--upgrade
Should the server take over an existing server?
--upgrade-socket <UPGRADE_SOCKET>
Path to upgrade socket
--pidfile <PIDFILE>
Path to the pidfile, used for upgrade
-h, --help
Print help
--validate-configs
Running River with this option will validate the configuration, and immediately exit without starting any Services. A non-zero return code will be given when the configuration fails validation.
--config-toml <CONFIG_TOML>
Running River with this option will instruct River to load the configuration file from the provided path. Cannot be used with --config-kdl.
--config-kdl <CONFIG_KDL>
Running River with this option will instruct River to load the configuration file from the provided path. Cannot be used with --config-toml.
--threads-per-service <THREADS_PER_SERVICE>
Running River with this option will instruct River to use the given number of worker threads per service.
--daemonize
Running River with this option will cause River to fork after the creation of all Services. The application will return once all Services have been started.
If this option is not provided, the River application will run until it is commanded to stop or a fatal error occurs.
--upgrade
Running River with this option will cause River to take over an existing River server's open connections. See Hot Reloading for more information about this.
--upgrade-socket <UPGRADE_SOCKET>
Running River with this option will instruct River to look at the provided socket path for receiving active Listeners from the currently running instance.
This must be an absolute path. This option only works on Linux.
See Hot Reloading for more information about this.
--pidfile <PIDFILE>
Running River with this option will set the path for the created pidfile when the server is configured to daemonize.
This must be an absolute path.
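Putting several of these options together, a typical daemonized start might look like the following; the paths are placeholders and will vary by deployment:

# Start River in the background, writing the process ID to the given pidfile
river --config-kdl /etc/river/river.kdl \
    --daemonize \
    --pidfile /tmp/river.pidfile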
Environment Variables
TODO: We don't use any environment variables yet
Configuration File (KDL)
The primary configuration file format used by River uses the KDL Configuration Language.
KDL is a language for describing structured data.
There are currently two major sections used by River:
The system section
Here is an example system configuration block:
system {
    threads-per-service 8
    daemonize false
    pid-file "/tmp/river.pidfile"
    // Path to upgrade socket
    //
    // NOTE: `upgrade` is NOT exposed in the config file, it MUST be set on the CLI
    // NOTE: This has issues if you use relative paths. See issue https://github.com/memorysafety/river/issues/50
    // NOTE: The upgrade command is only supported on Linux
    upgrade-socket "/tmp/river-upgrade.sock"
}
system.threads-per-service INT
This field configures the number of threads spawned by each service. This configuration applies to all services.
A positive, non-zero integer is provided as INT.
This field is optional, and defaults to 8.
system.daemonize BOOL
This field configures whether River should daemonize.
The value true or false is provided as BOOL.
This field is optional, and defaults to false.
If this field is set to true, then system.pid-file must also be set.
system.pid-file PATH
This field configures the path to the pidfile created when River is configured to daemonize.
A UTF-8 absolute path is provided as PATH.
This field is optional if system.daemonize is false, and required if system.daemonize is true.
system.upgrade-socket
This field configures the path to the upgrade socket when River is configured to take over an existing instance.
A UTF-8 absolute path is provided as PATH.
This field is optional if the --upgrade flag is not provided via the CLI, and required if --upgrade is set.
The services section
Here is an example services block:
services {
    Example1 {
        listeners {
            "0.0.0.0:8080"
            "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=true
        }
        connectors {
            load-balance {
                selection "Ketama" key="UriPath"
                discovery "Static"
                health-check "None"
            }
            "91.107.223.4:443" tls-sni="onevariable.com" proto="h2-or-h1"
        }
        path-control {
            request-filters {
                filter kind="block-cidr-range" addrs="192.168.0.0/16, 10.0.0.0/8, 2001:0db8::0/32, 127.0.0.1"
            }
            upstream-request {
                filter kind="remove-header-key-regex" pattern=".*(secret|SECRET).*"
                filter kind="upsert-header" key="x-proxy-friend" value="river"
            }
            upstream-response {
                filter kind="remove-header-key-regex" pattern=".*ETag.*"
                filter kind="upsert-header" key="x-with-love-from" value="river"
            }
        }
        rate-limiting {
            rule kind="source-ip" \
                max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10
            rule kind="specific-uri" pattern="static/.*" \
                max-buckets=2000 tokens-per-bucket=20 refill-qty=5 refill-rate-ms=1
            rule kind="any-matching-uri" pattern=r".*\.mp4" \
                tokens-per-bucket=50 refill-qty=2 refill-rate-ms=3
        }
    }
    Example3 {
        listeners {
            "0.0.0.0:9000"
            "0.0.0.0:9443" cert-path="./assets/test.crt" key-path="./assets/test.key" offer-h2=true
        }
        file-server {
            // The base path is what will be used as the "root" of the file server
            //
            // All files within the root will be available
            base-path "."
        }
    }
}
Each block represents a single service, with the name of the service serving as the name of the block.
services.$NAME
The $NAME field is a UTF-8 string, used as the name of the service. If the name does not contain spaces, it is not necessary to surround the name in quotes.
Examples:
- Example1 - Valid, "Example1"
- "Example2" - Valid, "Example2"
- "Server One" - Valid, "Server One"
- Server Two - Invalid (missing quotation marks)
services.$NAME.listeners
This section contains one or more Listeners. This section is required. Listeners are specified in the form:
"SOCKETADDR" [cert-path="PATH" key-path="PATH" [offer-h2=BOOL]]
SOCKETADDR is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.
If the listener should accept TLS connections, the certificate and key paths are specified in the form cert-path="PATH" key-path="PATH", where PATH is a UTF-8 path to the relevant files. If these are not provided, connections will be accepted without TLS.
If the listener should offer HTTP/2 connections, this is specified in the form offer-h2=BOOL, where BOOL is either true or false. offer-h2 may only be specified if cert-path and key-path are present. This configuration is optional, and defaults to true if TLS is configured. If this field is true, HTTP/2 will be offered (but not required). If this field is false, then only HTTP/1.x will be offered.
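For example, the following listeners block shows a plain listener, a TLS listener using the default HTTP/2 offer, and a TLS listener restricted to HTTP/1.x; the certificate paths are placeholders:

listeners {
    "0.0.0.0:8080"
    "0.0.0.0:8443" cert-path="./assets/example.crt" key-path="./assets/example.key"
    "0.0.0.0:9443" cert-path="./assets/example.crt" key-path="./assets/example.key" offer-h2=false
}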
services.$NAME.connectors
This section contains one or more Connectors. This section is required. Connectors are specified in the form:
"SOCKETADDR" [tls-sni="DOMAIN"] [proto="PROTO"]
SOCKETADDR is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.
If the connector should use TLS for connections to the upstream server, the TLS SNI is specified in the form tls-sni="DOMAIN", where DOMAIN is a domain name. If this is not provided, connections to upstream servers will be made without TLS.
The protocol used to connect to the upstream server is specified in the form proto="PROTO", where PROTO is a string with one of the following values:
- h1-only: Only HTTP/1.x will be used to connect
- h2-only: Only HTTP/2 will be used to connect
- h2-or-h1: HTTP/2 will be preferred, with fallback to HTTP/1.x
The proto field is optional. If it is not specified and TLS is configured, the default will be h2-or-h1. If TLS is not configured, the default will be h1-only, and any other option will result in an error.
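For example, the following connectors block contains one upstream reached without TLS and one reached over TLS preferring HTTP/2; the addresses and domain are placeholders:

connectors {
    // Plain connection, using HTTP/1.x (the default when TLS is not configured)
    "127.0.0.1:8000"
    // TLS connection using the given SNI, preferring HTTP/2 with fallback to HTTP/1.x
    "203.0.113.10:443" tls-sni="example.com" proto="h2-or-h1"
}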
services.$NAME.connectors.load-balance
This section defines how load balancing properties are configured for the connectors in this set.
This section is optional.
services.$NAME.connectors.load-balance.selection
This defines how the upstream server is selected.
Options are:
- selection "RoundRobin" - Servers are selected in a round-robin fashion, giving equal distribution
- selection "Random" - Servers are selected on a random basis, giving a statistically equal distribution
- selection "FNV" key="KEYKIND" - FNV hashing is used based on the provided KEYKIND
- selection "Ketama" key="KEYKIND" - Stable Ketama hashing is used based on the provided KEYKIND
Where KEYKIND is one of the following:
- UriPath - The URI path is hashed
- SourceAddrAndUriPath - The source address and URI path are hashed
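For example, to distribute requests using stable Ketama hashing keyed on the request's URI path:

load-balance {
    // Requests with the same URI path are consistently routed to the same upstream server
    selection "Ketama" key="UriPath"
}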
services.$NAME.path-control
This section contains the configuration for path control filters.
Each path control filter allows for modification or rejection at different stages of request and response handling.
This section is optional.
Example:
path-control {
    request-filters {
        filter kind="block-cidr-range" addrs="192.168.0.0/16, 10.0.0.0/8, 2001:0db8::0/32, 127.0.0.1"
    }
    upstream-request {
        filter kind="remove-header-key-regex" pattern=".*(secret|SECRET).*"
        filter kind="upsert-header" key="x-proxy-friend" value="river"
    }
    upstream-response {
        filter kind="remove-header-key-regex" pattern=".*ETag.*"
        filter kind="upsert-header" key="x-with-love-from" value="river"
    }
}
services.$NAME.path-control.request-filters
Filters at this stage are the earliest. Currently supported filters:
- kind = "block-cidr-range"
  - Arguments: addrs = "ADDRS", where ADDRS is a comma-separated list of IPv4 or IPv6 addresses or CIDR address ranges.
  - Any matching source IP addresses will be rejected with a 400 error code.
services.$NAME.path-control.upstream-request
Filters at this stage are applied to requests before they are forwarded to the upstream server. Currently supported filters:
- kind = "remove-header-key-regex"
  - Arguments: pattern = "PATTERN", where PATTERN is a regular expression matching the key of an HTTP header
  - Any matching header entry will be removed from the request before forwarding
- kind = "upsert-header"
  - Arguments: key="KEY" value="VALUE", where KEY is a valid HTTP header key, and VALUE is a valid HTTP header value
  - The given header will be added, or its value replaced with VALUE
services.$NAME.path-control.upstream-response
Filters at this stage are applied to responses received from the upstream server before they are forwarded to the downstream client. Currently supported filters:
- kind = "remove-header-key-regex"
  - Arguments: pattern = "PATTERN", where PATTERN is a regular expression matching the key of an HTTP header
  - Any matching header entry will be removed from the response before forwarding
- kind = "upsert-header"
  - Arguments: key="KEY" value="VALUE", where KEY is a valid HTTP header key, and VALUE is a valid HTTP header value
  - The given header will be added, or its value replaced with VALUE
services.$NAME.rate-limiting
This section contains the configuration for rate limiting rules.
Rate limiting rules are used to limit the total number of requests made by downstream clients, based on various criteria.
Note that rate limiting is applied on a per-service basis; services do not share rate limiting information.
This section is optional.
Example:
rate-limiting {
    rule kind="source-ip" \
        max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10
    rule kind="specific-uri" pattern="static/.*" \
        max-buckets=2000 tokens-per-bucket=20 refill-qty=5 refill-rate-ms=1
    rule kind="any-matching-uri" pattern=r".*\.mp4" \
        tokens-per-bucket=50 refill-qty=2 refill-rate-ms=3
}
services.$NAME.rate-limiting.rule
Rules are used to specify rate limiting parameters, and applicability of rules to a given request.
Leaky Buckets
Rate limiting in River uses a Leaky Bucket model for determining whether a request can be served immediately, or if it should be rejected. For a given rule, a "bucket" of "tokens" is created, where one "token" is required for each request.
The bucket for a rule starts with a configurable tokens-per-bucket number of tokens. When a request arrives, it attempts to take one token from the bucket. If a token is available, the request is served immediately. Otherwise, the request is rejected immediately.
The bucket is refilled at a configurable rate, specified by refill-rate-ms, adding a configurable number of tokens specified by refill-qty on each refill. The number of tokens in the bucket will never exceed the initial tokens-per-bucket number.
Once a refill occurs, additional requests may be served.
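As a worked example of how these parameters interact, consider the following rule (the numbers match the example above and are chosen for illustration):

rate-limiting {
    // Each bucket starts with 10 tokens and gains 1 token every 10 milliseconds,
    // never exceeding 10 tokens.
    //
    // A client may therefore burst up to 10 requests immediately, and is then
    // limited to roughly 1 request per 10 ms (about 100 requests per second)
    // until it pauses long enough for its bucket to refill.
    rule kind="source-ip" \
        max-buckets=4000 tokens-per-bucket=10 refill-qty=1 refill-rate-ms=10
}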
How many buckets?
Some rules require many buckets. For example, rules based on the source IP address will create a bucket for each unique source IP address observed in a request. We refer to these as "multi" rules.
However, each of these buckets requires space to hold its metadata. To avoid unbounded growth, we allow for a configurable max-buckets number, which serves to limit the total memory required for storing buckets. This uses an Adaptive Replacement Cache to allow for concurrent access to these buckets, as well as the ability to automatically evict buckets that are not actively being used (somewhat similar to an LRU or "Least Recently Used" cache).
There is a trade-off here: the larger max-buckets is, the longer River can "remember" a bucket for a given factor, such as a specific IP address. However, it also requires more resident memory to retain this information.
If max-buckets is set too low, then buckets will be "evicted" from the cache, meaning that subsequent requests matching that bucket will require the creation of a new bucket (with a full set of tokens), potentially defeating the objective of accurate rate limiting.
For "single" rules, or rules that do not have multiple buckets, a single bucket will be shared by all requests matching the rule.
Gotta claim 'em all
When multiple rules apply to a single request, for example rules based on both the source IP address and the URI path, the request must claim ALL applicable tokens before proceeding. If a given IP address is making its first request, but to a URI that has an empty bucket, it will immediately obtain the IP address token, but the request will be rejected because the URI bucket claim failed.
Kinds of Rules
Currently three kinds of rules are supported:
- kind="source-ip" - This tracks the IP address of the requestor.
  - This rule is a "multi" rule: a unique bucket will be created for the IPv4 or IPv6 address of the requestor.
  - The max-buckets parameter controls how many IP addresses will be remembered.
- kind="specific-uri" pattern="REGEX" - This tracks the URI path of the request, such as static/images/example.jpg
  - This rule is a "multi" rule: if the request's URI path matches the provided REGEX, the full URI path will be assigned to a given bucket
  - For example, if the regex static/.* was provided:
    - index.html would not match this rule, and would not require obtaining a token
    - static/images/example.jpg would match this rule, and would require obtaining a token
    - static/styles/example.css would also match this rule, and would require obtaining a token
    - Note that static/images/example.jpg and static/styles/example.css would each have a UNIQUE bucket.
- kind="any-matching-uri" pattern="REGEX" - This tracks the URI path of the request, such as static/videos/example.mp4
  - This is a "single" rule: ANY path matching REGEX will share a single bucket
  - For example, if the regex .*\.mp4 was provided:
    - index.html would not match this rule, and would not require obtaining a token
    - static/videos/example1.mp4 would match this rule, and would require obtaining a token
    - static/videos/example2.mp4 would also match this rule, and would require obtaining a token
    - Note that static/videos/example1.mp4 and static/videos/example2.mp4 would share a SINGLE bucket (also shared with any other path containing an MP4 file)
services.$NAME.file-server
This section is only allowed when connectors and path-control are not present.
This is used when serving static files, rather than proxying connections.
services.$NAME.file-server.base-path
This is the base path used for serving files. ALL files within this directory (and any children) will be available for serving.
This is specified in the form base-path "PATH", where PATH is a valid UTF-8 path.
This field is required.
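For example, a minimal static file serving Service might look like the following; the service name, listening address, and base path are placeholders:

services {
    StaticAssets {
        listeners {
            "0.0.0.0:9000"
        }
        file-server {
            // Serve all files under /var/www/static (and its children)
            base-path "/var/www/static"
        }
    }
}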
Configuration File (TOML)
TODO: We're probably going to retire TOML configuration file support.
Hot Reloading
River does not support changing most settings while the server is running. In order to change the settings of a running instance of River, it is necessary to launch a new instance of River.
However, River does support "Hot Reloading" - the ability for a new instance of River to take over the responsibilities of a currently executing server.
From a high level view, this process looks like:
- The existing instance of River is running
- A new instance of River is started, configured with "upgrade" enabled via the command line. The new instance does not yet begin execution, and is waiting for a hand-over of Listeners from the existing instance
- A SIGQUIT signal is sent to the FIRST River instance, which causes it to stop accepting new connections and to transfer all active Listener file descriptors to the SECOND River instance
- The SECOND River instance begins listening to all Listeners, and operating normally
- The FIRST River instance continues handling any currently active downstream connections, until either all connections have closed, or until a timeout period is reached. If the timeout is reached, all open connections are closed ungracefully.
- At the end of the timeout period, the FIRST River instance exits.
In most cases, this allows seamless hand-over from the OLD instance of River to the NEW instance of River, without any interruption of service. As long as no connections are longer-lived than the timeout period, this hand-over will not be observable by downstream clients.
Once the SIGQUIT signal is sent, all new incoming connections will be handled by the new instance of River. Existing connections will continue to be serviced by the old instance until their connection has been closed.
There are a couple of moving pieces that are necessary for this process to occur:
pidfile
When River is configured to be daemonized, it will create a pidfile containing its process ID at the configured location.
This file can be used to determine the process ID necessary for sending SIGQUIT to.
When the second instance has taken over, the pidfile of the original instance will be replaced with the pidfile of the new instance.
In general, both instances of River should be configured with the same pidfile path.
upgrade socket
In order to facilitate the hand-over, the listening socket file descriptors of the first instance are transferred to the second instance over a socket at the configured upgrade socket path.
This transfer begins when the SIGQUIT signal is sent to the first process.
Both instances of River MUST be configured with the same upgrade socket path.
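Assuming the pidfile and upgrade socket paths used in the earlier system example, a hot reload might look like the following; the paths are placeholders and the exact steps will vary by deployment:

# 1. Record the process ID of the currently running instance
OLD_PID="$(cat /tmp/river.pidfile)"

# 2. Start the new instance, telling it to take over the existing Listeners
river --config-kdl /etc/river/river.kdl \
    --daemonize \
    --upgrade \
    --upgrade-socket /tmp/river-upgrade.sock \
    --pidfile /tmp/river.pidfile

# 3. Ask the old instance to hand over its Listeners and finish existing connections
kill -QUIT "$OLD_PID"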