Monoid Protocol
The Monoid Protocol describes a set of standard components and the interactions between them. Its purpose is to declare a specification for a data silo and to enable automated requests against that data silo.
The formulation of the Monoid Protocol (along with Monoid's documentation) is inspired by Airbyte.
Monoid Protocol Interface
Communication with live data silos follows the Monoid Protocol interface. Any protocol implementation (Monoid itself uses Docker) must implement the functions defined in the interface and communicate using the protocol's types (which are described below).
The Monoid Protocol Interface is as follows:
spec() -> MonoidSiloSpec
validate(Config) -> MonoidValidateMessage
query(Config, MonoidQuery) -> MonoidStream
delete(Config, PersistenceConfig, MonoidQuery) -> MonoidStream
scan(Config, PersistenceConfig, MonoidSchemasMessage) -> Stream<MonoidRecord>
stream_read(Config, PersistenceConfig, MonoidStreamHandle) -> Stream<MonoidRecord> | MonoidFile
stream_status(Config, MonoidStreamHandle) -> MonoidStreamStatus
schema(Config) -> MonoidSchemasMessage
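One way to picture the interface listed above is as an abstract base class with one method per command. The sketch below is illustrative only: the class name SiloInterface, the Python parameter names, and the choice to model every protocol type as a plain dict are assumptions made for this example, not part of the protocol.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, Union

# Assumption: every protocol type is modelled as a plain dict (or a str path
# for MonoidFile). The authoritative definitions are the JSON schemas.
Config = Dict[str, Any]
PersistenceConfig = Dict[str, Any]
MonoidQuery = Dict[str, Any]
MonoidRecord = Dict[str, Any]
MonoidStream = Dict[str, Any]
MonoidStreamHandle = Dict[str, Any]
MonoidStreamStatus = Dict[str, Any]
MonoidSchemasMessage = Dict[str, Any]
MonoidSiloSpec = Dict[str, Any]
MonoidValidateMessage = Dict[str, Any]
MonoidFile = str


class SiloInterface(ABC):
    """Hypothetical Python rendering of the Monoid Protocol Interface."""

    @abstractmethod
    def spec(self) -> MonoidSiloSpec: ...

    @abstractmethod
    def validate(self, config: Config) -> MonoidValidateMessage: ...

    @abstractmethod
    def query(self, config: Config, query: MonoidQuery) -> MonoidStream: ...

    @abstractmethod
    def delete(self, config: Config, persistence: PersistenceConfig,
               query: MonoidQuery) -> MonoidStream: ...

    @abstractmethod
    def scan(self, config: Config, persistence: PersistenceConfig,
             schemas: MonoidSchemasMessage) -> Iterator[MonoidRecord]: ...

    @abstractmethod
    def stream_read(self, config: Config, persistence: PersistenceConfig,
                    handle: MonoidStreamHandle
                    ) -> Union[Iterator[MonoidRecord], MonoidFile]: ...

    @abstractmethod
    def stream_status(self, config: Config,
                      handle: MonoidStreamHandle) -> MonoidStreamStatus: ...

    @abstractmethod
    def schema(self, config: Config) -> MonoidSchemasMessage: ...
```

A concrete connector would subclass this and fill in each method. Python refuses to instantiate the class until all eight are defined.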
Spec
spec() -> MonoidSiloSpec
The spec command allows a data silo to broadcast information about itself and how it can be configured.
Input:
- None
Output:
- spec -- a MonoidSiloSpec wrapped in a MonoidMessage of type SPEC.
Validate
validate(Config) -> MonoidValidateMessage
The validate command indicates whether a configuration is valid or not.
Input:
- config -- a JSON object that conforms to the JSON Schema spec provided by the spec function. The configuration should include any API keys/passwords and other data required to connect to the silo.
Output:
- MonoidValidateMessage -- a message that indicates whether or not the configuration is valid. If the configuration is valid, the status field will be SUCCESS; otherwise, it will be FAILURE. The message field should contain any error message if the validation fails.
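As a rough illustration of the output shape, the sketch below returns a dict with the status and message fields described above. It only checks for missing keys. The required_keys parameter is a hypothetical stand-in for whatever the silo's spec requires, and a real connector would typically also try to connect to the silo.

```python
from typing import Any, Dict, List


def validate(config: Dict[str, Any], required_keys: List[str]) -> Dict[str, Any]:
    """Return a MonoidValidateMessage-shaped dict for the given config.

    Assumption: required_keys is a hypothetical extra parameter used to keep
    this example self-contained; the protocol's validate takes only Config.
    """
    missing = [key for key in required_keys if key not in config]
    if missing:
        return {
            "status": "FAILURE",
            "message": "missing config keys: " + ", ".join(missing),
        }
    # The message field is only needed when validation fails.
    return {"status": "SUCCESS"}
```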
Query
query(Config, MonoidQuery) -> MonoidStream
The query command is used to pull data from the silo for a right-to-know request. The type of
result can vary depending on the type of silo.
Input:
- Config -- The silo config.
- MonoidQuery -- Information about the query. This is an array of MonoidQueryIdentifiers, each of which includes information about the data source to query, the field to query by, and the schema of the output. Some of this information may be irrelevant for some silos -- for example, a SaaS service that offers its own data export features may not use the output schema fields (see the Mixpanel Connector for an example of this).
Output:
- MonoidStream -- provides a handle that can be used to access the stream, along with the stream status. If the stream status is COMPLETE, you must provide a dataType as well.
Postgres Example
When you run a query on a Postgres database, the stream that is returned is immediately complete, with the Monoid query as the data field of the handle.
{
    status: {
        status: 'COMPLETE',
        dataType: 'RECORDS'
    },
    handle: {
        data: {
            request: 'query',
            query: ...MonoidQuery
        }
    }
}
This is because retrieving data from Postgres is fully synchronous -- we can read directly from the stream of records
from the database after running a query. This may not be the case, however, when you run a query on a SaaS data silo like Mixpanel, since it may not be able to generate a complete picture of your data on demand.
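A minimal sketch of how a synchronous connector might build such a stream is shown below, together with a check for the rule that a COMPLETE stream must carry a dataType. The function names are hypothetical, and placing the query under a query key of the handle data is this example's reading of the object above.

```python
from typing import Any, Dict


def complete_record_stream(monoid_query: Any) -> Dict[str, Any]:
    """Build a MonoidStream that is already COMPLETE and yields RECORDS."""
    return {
        "status": {"status": "COMPLETE", "dataType": "RECORDS"},
        # The handle carries everything stream_read needs to run the query.
        "handle": {"data": {"request": "query", "query": monoid_query}},
    }


def check_stream(stream: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if a COMPLETE stream is missing its dataType."""
    status = stream["status"]
    if status["status"] == "COMPLETE" and "dataType" not in status:
        raise ValueError("a COMPLETE stream must provide a dataType")
    return stream
```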
Mixpanel Example
Initially, the result of a Mixpanel query operation will look something like this:
{
    status: {
        status: 'PROCESSING'
    },
    handle: {
        data: {
            requestId: ...
        }
    }
}
You can pass the handle directly to the stream_status function to get updates on the status of the
request. When the status becomes COMPLETE, you can call stream_read to get the zip file of the data
provided by Mixpanel. Of course, most of these calls are abstracted by the Monoid platform -- you only have
to fill in the implementation if you're writing a new data silo integration.
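The polling flow described above could look roughly like the following. This is a sketch of what a caller (such as the platform) might do, not Monoid's actual implementation. The wait_for_stream name, the poll_seconds and max_polls parameters, and the one-argument stream_status callable (with the config assumed to be bound by the caller) are all assumptions.

```python
import time
from typing import Any, Callable, Dict


def wait_for_stream(
    stream: Dict[str, Any],
    stream_status: Callable[[Dict[str, Any]], Dict[str, Any]],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll stream_status with the stream's handle until it is COMPLETE."""
    status = stream["status"]
    polls = 0
    while True:
        if status["status"] == "COMPLETE":
            return status  # dataType says what stream_read will return
        if status["status"] == "FAILED":
            raise RuntimeError("stream failed")
        # Any other status is treated as still in progress.
        if polls >= max_polls:
            raise TimeoutError("stream did not complete in time")
        time.sleep(poll_seconds)
        status = stream_status(stream["handle"])
        polls += 1
```

Once this returns, the caller would pass the same handle to stream_read.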
Delete
delete(Config, PersistenceConfig, MonoidQuery) -> MonoidStream
The delete command is used to run a right-to-delete request.
Input:
- Config -- The silo config.
- PersistenceConfig -- Information about a file store that the connector can use to persist data for further processing. For example, a connector that parses unstructured data can store pointers to the locations of data that may need to be deleted.
- MonoidQuery -- Information about the data to delete.
Output:
- MonoidStream -- provides a handle that can be used to access the stream, along with the stream status. The resulting MonoidStream will frequently be complete with a dataType of NONE, since the data has already been deleted.
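For a silo that deletes synchronously, the returned stream might be built as in the sketch below. The in-memory list of rows stands in for a real data source, and the delete_rows name and the contents of the handle are assumptions made for this example.

```python
from typing import Any, Dict, List


def delete_rows(rows: List[Dict[str, Any]], field: str, value: Any) -> Dict[str, Any]:
    """Delete matching rows in place and report a finished, data-less stream."""
    rows[:] = [row for row in rows if row.get(field) != value]
    return {
        # Nothing is left to read, so the stream is COMPLETE with dataType NONE.
        "status": {"status": "COMPLETE", "dataType": "NONE"},
        "handle": {"data": {"request": "delete"}},
    }
```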
Scan
scan(Config, PersistenceConfig, MonoidSchemasMessage) -> Stream<MonoidRecord>
The scan command is used to iterate over a subset of the records in the data source.
This can be a no-op if the data silo doesn't need to be scanned, but it's highly recommended that
your connectors implement this function.
Input:
- Config -- The silo config.
- PersistenceConfig -- The persistence config.
- MonoidSchemasMessage -- The schemas that need to be scanned and output.
Output:
- Stream<MonoidRecord> -- A stream of records that are being scanned.
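A scan that samples a few records per schema could be sketched as a generator. Everything about the shapes here is an assumption made for illustration: the tables dict standing in for the silo, the schemas and name keys on the schemas message, the schema and data keys on each record, and the limit parameter. The real shapes are defined by the JSON schemas.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List


def scan(
    tables: Dict[str, List[Dict[str, Any]]],
    schemas_message: Dict[str, Any],
    limit: int = 2,
) -> Iterator[Dict[str, Any]]:
    """Yield up to `limit` records for each requested schema."""
    for schema in schemas_message["schemas"]:
        name = schema["name"]
        # Only a subset of each data source is read, not the whole table.
        for row in islice(tables.get(name, []), limit):
            yield {"schema": name, "data": row}
```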
Stream Status
stream_status(Config, MonoidStreamHandle) -> MonoidStreamStatus
The stream_status command gets the status of a MonoidStream.
Input:
- Config -- The silo config.
- PersistenceConfig -- The persistence config.
- MonoidStreamHandle -- Represents the data that can be used to retrieve data from a stream.
Output:
- MonoidStreamStatus -- An object that describes information about the stream. If the stream is ready to read from, the status field will be COMPLETE. The dataType field indicates the type of data that should be expected from stream_read.
Stream Read
stream_read(Config, PersistenceConfig, MonoidStreamHandle) -> Stream<MonoidRecord> | MonoidFile
The stream_read command is used to read data from a MonoidStream.
Input:
- Config -- The silo config.
- PersistenceConfig -- The persistence config.
- MonoidStreamHandle -- Represents the data that can be used to retrieve data from a stream.
Output:
- Stream<MonoidRecord> | MonoidFile -- If the stream's dataType is RECORDS, this function will return a stream of records. If the stream's dataType is FILE, this must return the path to the file on the PersistenceConfig instance. Do not output URLs that are not localized to the data layer described by the PersistenceConfig, because the platform will not be able to read them.
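To make the FILE rule concrete, a connector could check a path before returning it. The helper below is a hypothetical sketch that assumes the persistence layer is a local directory. It rejects URLs and any path that resolves outside that directory.

```python
import os


def is_local_to_persistence(directory: str, path: str) -> bool:
    """Return True if `path` resolves to a location inside `directory`."""
    if "://" in path:
        return False  # URLs are not readable through the persistence layer
    root = os.path.realpath(directory)
    # Relative paths are interpreted relative to the persistence directory.
    target = os.path.realpath(os.path.join(root, path))
    return os.path.commonpath([root, target]) == root
```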
Monoid Protocol Types
The JSON schema definitions of these types can be found in the Monoid GitHub repository. Please refer to those files for a more comprehensive reference on how to use these types.
MonoidMessage
A MonoidMessage is a wrapper for all output generated by an integration. The type field of the
message is required, and the field of the MonoidMessage that corresponds to that type is guaranteed to be defined.
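The wrapper can be pictured as a tagged union. The sketch below unwraps a message by looking up the field that corresponds to its type. Only the SPEC pairing is inferred from this document (the spec command's output). The unwrap helper and the lookup table are hypothetical, and the remaining types would have to be taken from the JSON schemas.

```python
from typing import Any, Dict

# Assumption: only the SPEC -> "spec" pairing is taken from this document;
# the other message types are defined in the JSON schemas and omitted here.
FIELD_FOR_TYPE = {"SPEC": "spec"}


def unwrap(message: Dict[str, Any]) -> Any:
    """Return the payload that corresponds to the message's type field."""
    if "type" not in message:
        raise ValueError("a MonoidMessage requires a type field")
    msg_type = message["type"]
    field = FIELD_FOR_TYPE.get(msg_type)
    if field is None:
        raise ValueError("unknown message type: %r" % (msg_type,))
    if field not in message:
        raise ValueError("message of type %s must define %r" % (msg_type, field))
    return message[field]
```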
MonoidStream
A MonoidStream message wraps a MonoidStreamStatus and a MonoidStreamHandle, and is the result
of a query or delete operation.
MonoidStreamStatus
A MonoidStreamStatus message includes two fields: status (COMPLETE, PROGRESS, or FAILED), which indicates whether the stream has finished
processing, is still in progress, or has failed, and dataType, which is either NONE, FILE, or RECORDS. The
dataType field may be empty if the status is not COMPLETE.
MonoidStreamHandle
A MonoidStreamHandle includes a data field, which represents arbitrary data that can be specified by a
connector as a result of a query or delete operation.
PersistenceConfig
A PersistenceConfig object includes a directory field that provides a handle to the directory in which
you can write files. Before being persisted, the data will be encrypted by the Monoid Platform.
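As an illustration of how a connector might use the directory field, the sketch below writes a small JSON file of pointers into it. The persist_pointers name, the file name, and the JSON layout are assumptions. Encryption is not shown because the platform handles it.

```python
import json
import os
from typing import Any, Dict, List


def persist_pointers(
    persistence_config: Dict[str, Any],
    name: str,
    pointers: List[Dict[str, Any]],
) -> str:
    """Write `pointers` as JSON into the persistence directory; return the path."""
    directory = persistence_config["directory"]
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(pointers, handle)
    return path
```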