Getting Started
This documentation describes a database called "comma-separated value store" or csvs.
The goal of csvs is to be accessible and approachable. An engineer should be able to write a csvs library in an evening, and a child should be able to glean the contents of the database by inspecting them with a text reader. A dataset should contain valuable data even after corruption and be easy to repair. Such transparency and naïveté are of higher priority than processing and memory efficiency.
A csvs dataset is a directory that contains plain text files in the "comma-separated value" format, or CSV. Any directory that contains a .csvs.csv file is a valid CSVS dataset. Each CSV file represents a table with two columns and is called a "tablet". The first column is a key, and the second column is a value. You can store records by appending lines to the tablets. To represent complex objects and connect the tablets to each other, specify the relationships between values in the schema file _-_.csv.
Here's an example of the simplest CSVS dataset that contains a record about visiting Japan in 2001.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event-date.csv
visited-japan,2001-01-01
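As a rough illustration, the dataset above can be written to disk with a few lines of Python. This is only a sketch: the helper names `write_dataset` and `is_dataset` are made up for this example and are not part of any csvs library.

```python
import tempfile
from pathlib import Path

# Tablet names and lines copied from the example above.
SIMPLEST_DATASET = {
    ".csvs.csv": "csvs,0.0.2\n",
    "_-_.csv": "event,date\n",
    "event-date.csv": "visited-japan,2001-01-01\n",
}


def write_dataset(directory, tablets):
    """Write each tablet as a plain text file inside `directory` (hypothetical helper)."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    for name, content in tablets.items():
        (root / name).write_text(content, encoding="utf-8")
    return root


def is_dataset(directory):
    """A directory counts as a dataset when it contains a .csvs.csv file."""
    return (Path(directory) / ".csvs.csv").is_file()


with tempfile.TemporaryDirectory() as tmp:
    dataset = write_dataset(Path(tmp) / "dataset", SIMPLEST_DATASET)
    print(is_dataset(dataset))  # True
    print(is_dataset(tmp))      # False, the parent directory has no .csvs.csv
```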
To learn more about csvs, see the Tutorial and the User guides.
Tutorial
Here's an example of the simplest CSVS dataset that contains a record about visiting Japan in 2001.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event-date.csv
visited Japan,2001-01-01
Technically, this dataset represents three records:
- an event record that says "visited Japan in 2001-01-01"
- a date record that says "2001-01-01"
- a _ record, pronounced "schema record", that says "event" has a "date"
Let's add another event about climbing Everest in 2003.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
Now, the dataset represents five records:
- an event record that says "visited Japan in 2001-01-01"
- an event record that says "climbed Everest in 2003-03-03"
- a date record that says "2001-01-01"
- a date record that says "2003-03-03"
- a _ record, pronounced "schema record", that says "event" has a "date"
Let's add another value to the database to show that events happened to different people.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
Finally, the dataset represents seven records:
- an event record that says "Donell visited Japan in 2001-01-01"
- an event record that says "Eva climbed Everest in 2003-03-03"
- a date record that says "2001-01-01"
- a date record that says "2003-03-03"
- a name record that says "Eva"
- a name record that says "Donell"
- a _ record, pronounced "schema record", that says "event" has a "date" and a "name"
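To see how the tablets combine into event records, here is a small Python sketch. It assumes the tablets are already loaded as strings. The function names `read_pairs` and `build_events` and the choice to keep every leaf as a list are assumptions of this example, not something csvs prescribes.

```python
import csv
import io

# Tablet contents copied from the example above.
EVENT_DATE = "visited Japan,2001-01-01\nclimbed Everest,2003-03-03\n"
EVENT_NAME = "visited Japan,Donell\nclimbed Everest,Eva\n"


def read_pairs(text):
    """Return the (key, value) pairs of a two-column tablet."""
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip empty lines
            pairs.append((row[0], row[1] if len(row) > 1 else ""))
    return pairs


def build_events(tablets):
    """Merge tablets that share the "event" key into one dict per event.

    `tablets` maps a leaf name such as "date" to the text of event-{leaf}.csv.
    """
    events = {}
    for leaf, text in tablets.items():
        for key, value in read_pairs(text):
            record = events.setdefault(key, {"_": "event", "event": key})
            record.setdefault(leaf, []).append(value)
    return list(events.values())


events = build_events({"date": EVENT_DATE, "name": EVENT_NAME})
for event in events:
    print(event)
```

Each printed dict corresponds to one of the two event records listed above, with the date and the name attached to the event key.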
To remove records from the dataset, delete corresponding lines from the tablets.
Learn more about csvs in the User guides.
User Guides
To learn more about csvs, see Design and Requirements.
Nested Records
Branches can depend on each other. A branch is a named piece of structure inside a dataset. Imagine that a dataset is a grove of trees, and each tree is made up of branches connected to each other. A branch is called a trunk if it has leaves, that is, branches that describe it. A branch without leaves is called a twig and sits at the very top of the tree. A branch that does not describe any other branch, and thus does not have a trunk, is called a root and sits at the very bottom of the tree.
Let's add an age to a name of a person that experienced an event.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
name,age
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
name-age.csv
Donell,35
Eva,70
Now, let's add a favorite quote of each person, and the author of each quote.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
name,age
name,quote
quote,author
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
name-age.csv
Donell,35
Eva,70
name-quote.csv
Donell,The only way to do great work is to love what you do
Eva,"Sometimes you need to scorch everything to the ground, and start over"
quote-author.csv
The only way to do great work is to love what you do,Donovan
"Sometimes you need to scorch everything to the ground, and start over",Celeste Ng
You can even define a recursive relation to specify the parents of each person.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
name,age
name,parent
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
name-parent.csv
Donell,Jack
Donell,Jacqueline
Jack,Rona
Jack,Bernard
Jacqueline,Leif
Jacqueline,Fatuma
Eva,Ismail
Eva,Hauwa
Ismail,Nelson
Ismail,Dennis
Hauwa,Rabi
Hauwa,Louis
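A recursive relation can be walked with an ordinary recursive function. The sketch below lists every known ancestor of a person from the name-parent.csv tablet above. The names `parents_by_name` and `ancestors` are invented for this example, and the `seen` set is an extra guard so that the walk stops if the data contains a cycle.

```python
import csv
import io

# Contents of name-parent.csv from the example above.
NAME_PARENT = """Donell,Jack
Donell,Jacqueline
Jack,Rona
Jack,Bernard
Jacqueline,Leif
Jacqueline,Fatuma
Eva,Ismail
Eva,Hauwa
Ismail,Nelson
Ismail,Dennis
Hauwa,Rabi
Hauwa,Louis
"""


def parents_by_name(text):
    """Group the tablet into {name: [parent, ...]}."""
    parents = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            parents.setdefault(row[0], []).append(row[1])
    return parents


def ancestors(name, parents, seen=None):
    """Return parents, grandparents and so on, depth first."""
    seen = set() if seen is None else seen
    found = []
    for parent in parents.get(name, []):
        if parent in seen:  # already visited, avoid looping forever
            continue
        seen.add(parent)
        found.append(parent)
        found.extend(ancestors(parent, parents, seen))
    return found


parents = parents_by_name(NAME_PARENT)
print(ancestors("Donell", parents))
# ['Jack', 'Rona', 'Bernard', 'Jacqueline', 'Leif', 'Fatuma']
```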
Lists of Values
Repeat a line with the same key to represent a list of values. For example, let's say Donell visited Japan every year for three years.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
branch,description
event-date.csv
visited Japan,2001-01-01
visited Japan,2002-02-02
visited Japan,2003-03-03
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
Notice that the date 2003-03-03 appears twice: once in the list of events about Japan, and once in the list of events about Everest. Be careful when you add new branches that describe the date "2003-03-03", because they might apply to both mentions!
To avoid conflicts, make sure to use unique identifiers when you want singleton values. For example, use a unique identifier as a key.
An empty value is still a value: a line with a single comma relates an empty key to an empty value.
An empty line is not a value.
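The following Python sketch groups repeated keys into lists and also shows which values are shared between keys. The function names `values_by_key` and `keys_by_value` are hypothetical. Empty lines are skipped, and a line with a single comma yields an empty key with an empty value.

```python
import csv
import io
from collections import defaultdict

# Contents of event-date.csv from the example above.
EVENT_DATE = (
    "visited Japan,2001-01-01\n"
    "visited Japan,2002-02-02\n"
    "visited Japan,2003-03-03\n"
    "climbed Everest,2003-03-03\n"
)


def values_by_key(text):
    """Collect every value that shares a key into one list."""
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:  # an empty line is not a value
            continue
        grouped[row[0]].append(row[1] if len(row) > 1 else "")
    return dict(grouped)


def keys_by_value(text):
    """Reverse lookup: which keys mention a given value."""
    grouped = defaultdict(list)
    for key, values in values_by_key(text).items():
        for value in values:
            grouped[value].append(key)
    return dict(grouped)


print(values_by_key(EVENT_DATE)["visited Japan"])
# ['2001-01-01', '2002-02-02', '2003-03-03']
print(keys_by_value(EVENT_DATE)["2003-03-03"])
# ['visited Japan', 'climbed Everest']
```

The second lookup makes the shared date visible: anything attached to "2003-03-03" would describe both events.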
Dataset Settings
If you want to store arbitrary settings, try to write them as branches.
You can also write new key-value pairs to the .csvs.csv file.
Don't store credentials and other sensitive data in the dataset next to other data.
You can make a separate dataset for sensitive data with stricter security practices.
If you use git for version control, you can write sensitive key-value pairs to .git/config.
Asset Storage
You can store media files in a folder in the dataset and address them in one of the tablets.
.csvs.csv
csvs,0.0.2
_-_.csv
event,file
event-file.csv
visited Japan,IMG_0890.jpeg
img/IMG_0890.jpeg
)\ O_._._._A_._._._O /(
\`--.___,'=================`.___,--'/
\`--._.__ __._,--'/
\ ,. l`~~~~~~~~~~~~~~~'l ,. /
__ \||(_)!_!_!_.-._!_!_!(_)||/ __
\\`-.__ ||_|____!!_|;|_!!____|_|| __,-'//
\\ `==---='-----------'='-----------`=---==' //
| `--. ,--' |
\ ,.`~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~',. /
\|| ____,-------._,-------._,-------.____ ||/
||\|___!`======="!`======="!`======="!___|/||
|| |---||--------||-| | |-!!--------||---| ||
__O_____O_ll_lO_____O_____O|| |'|'| ||O_____O_____Ol_ll_O_____O__
o H o o H o o H o o H o o |-----------| o o H o o H o o H o o H o
___H_____H_____H_____H____O =========== O____H_____H_____H_____H___
/|=============|\
()______()______()______() '==== +-+ ====' ()______()______()______()
||{_}{_}||{_}{_}||{_}{_}/| ===== |_| ===== |\{_}{_}||{_}{_}||{_}{_}||
|| || || / |==== s( )s ====| \ || || ||
======================() ================= ()======================
----------------------/| ------------------- |\----------------------
/ |---------------------| \
-'--'--' () '---------------------' ()
/| ------------------------- |\ --'--'--'
--'--' / |---------------------------| \ '--'
() |___________________________| () '--'-
--'- /| _______________________________ |\
--' gpyy / |__________________________________| \
Art by Glory Moon
If you version control the dataset with git, you can add the asset folder to Large File Storage:
git lfs install
git lfs track "img/**"
git add .gitattributes
git add img/IMG_0890.jpeg
git commit -m "add picture of Japan"
Branch Metadata
If you want to describe each branch in detail, you can create a set of tablets for the "branch" branch.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
branch,description
branch-description.csv
event,something that happened
date,something happened at this time
name,something happened to this person
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
By default, a csvs dataset represents lists of objects of strings. You can define custom value types based on details about each branch. For example, define a tablet called branch-datatype.csv and specify an age branch with type number. Just make sure to check that age values are really numbers.
.csvs.csv
csvs,0.0.2
_-_.csv
event,date
event,name
name,age
branch,datatype
branch-datatype.csv
age,number
event-date.csv
visited Japan,2001-01-01
climbed Everest,2003-03-03
event-name.csv
visited Japan,Donell
climbed Everest,Eva
name-age.csv
Donell,35
Eva,70
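Checking that values match their declared type is left to the client. A minimal sketch of such a check follows. It assumes that "number" means anything Python's `float` accepts, which is a choice made for this example. The names `is_number` and `invalid_values` are hypothetical.

```python
import csv
import io

# Tablets copied from the example above.
BRANCH_DATATYPE = "age,number\n"
NAME_AGE = "Donell,35\nEva,70\n"


def is_number(text):
    """Assumption: a value is a number when float() can parse it."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def invalid_values(branch, tablet_text, datatype_text):
    """Return the values of `branch` that break its declared datatype."""
    datatypes = {
        row[0]: row[1]
        for row in csv.reader(io.StringIO(datatype_text))
        if len(row) >= 2
    }
    if datatypes.get(branch) != "number":
        return []  # no declared type, every string is accepted
    return [
        row[1]
        for row in csv.reader(io.StringIO(tablet_text))
        if len(row) >= 2 and not is_number(row[1])
    ]


print(invalid_values("age", NAME_AGE, BRANCH_DATATYPE))  # []
print(invalid_values("age", "Donell,35\nEva,seventy\n", BRANCH_DATATYPE))  # ['seventy']
```

Note that `float` also accepts strings such as "inf", so a stricter client may want its own parser.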
Writing a Client
Let's write a simple client for csvs. We will use pseudocode so you can follow along in your favorite language.
A client library defines three functions that mirror SQL's SELECT, UPDATE and DELETE commands.
# finds a branch key that is connected to `value`
select branch value =
  # find a leaf of the branch
  schema = parse "_-_.csv"
  relation = find line in schema where line contains branch
  tokens = split relation ","
  leaf = tokens[1]
  # find a line in the tablet where leaf matches value
  tablet = parse "$branch-$leaf.csv"
  relation = find line in tablet where line contains value
  tokens = split relation ","
  # the first token of the line is the branch key
  key = tokens[0]
  return key
# adds a trunk key connected to a leaf value
update trunk leaf key value =
  line = "$key,$value"
  append "$trunk-$leaf.csv" line
# removes the branch key and the connected leaf value
delete branch key =
  # find a leaf of the branch
  schema = parse "_-_.csv"
  relation = find line in schema where line contains branch
  tokens = split relation ","
  leaf = tokens[1]
  # find a line in the tablet where branch matches key
  tablet = parse "$branch-$leaf.csv"
  relation = find line in tablet where line contains key
  # delete the line from the tablet
  filter tablet relation
This can be implemented in an evening. A more carefully written client could be much more robust and efficient - try to write your own!
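For readers who prefer something runnable, here is one possible Python rendering of the three functions. It is a sketch under a few assumptions, not a reference client: the helper names `read_tablet`, `write_tablet` and `leaves_of` are invented, every leaf of a branch is searched rather than only the first one, and tablets are assumed to end with a newline so that appending is safe.

```python
import csv
import tempfile
from pathlib import Path


def read_tablet(path):
    """Return (key, value) pairs; extra commas stay part of the value."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open(newline="", encoding="utf-8") as f:
        return [(row[0], ",".join(row[1:])) for row in csv.reader(f) if row]


def write_tablet(path, pairs):
    """Overwrite a tablet with the given pairs."""
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerows(pairs)


def leaves_of(dataset, branch):
    """Look up the leaves of a branch in the schema tablet _-_.csv."""
    schema = read_tablet(Path(dataset) / "_-_.csv")
    return [leaf for trunk, leaf in schema if trunk == branch]


def select(dataset, branch, value):
    """Find the branch keys connected to `value` in any leaf tablet."""
    keys = []
    for leaf in leaves_of(dataset, branch):
        tablet = read_tablet(Path(dataset) / f"{branch}-{leaf}.csv")
        for key, leaf_value in tablet:
            if leaf_value == value and key not in keys:
                keys.append(key)
    return keys


def update(dataset, trunk, leaf, key, value):
    """Append one line to the tablet {trunk}-{leaf}.csv."""
    path = Path(dataset) / f"{trunk}-{leaf}.csv"
    with path.open("a", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerow([key, value])


def delete(dataset, branch, key):
    """Remove every line with `key` from each leaf tablet of the branch."""
    for leaf in leaves_of(dataset, branch):
        path = Path(dataset) / f"{branch}-{leaf}.csv"
        if path.is_file():
            kept = [pair for pair in read_tablet(path) if pair[0] != key]
            write_tablet(path, kept)


with tempfile.TemporaryDirectory() as tmp:
    write_tablet(Path(tmp) / "_-_.csv", [("event", "date")])
    update(tmp, "event", "date", "visited Japan", "2001-01-01")
    update(tmp, "event", "date", "climbed Everest", "2003-03-03")
    print(select(tmp, "event", "2003-03-03"))  # ['climbed Everest']
    delete(tmp, "event", "visited Japan")
    print(read_tablet(Path(tmp) / "event-date.csv"))
```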
Design
plain-text relational database
competes: recutils, sql
interacts: filesystem, clients, text editors
constitutes: a set of csv files
includes: config, schema, data
resembles: recutils
patterns:
stakeholders: fetsorn
Specification
Also see the Requirements.
CSVS File Format
This document specifies the CSVS file format.
"CSVS" stands for "Comma-Separated Value Store"
CSVS file format is a subset of RFC 4180. In cases where this document contradicts the RFC, RFC takes precedence and this document should be corrected.
A CSVS file MUST have UTF-8 encoding.
A CSVS file MUST have .csv file extension.
grammar
- newline: either Carriage Return 0x0D \r, Line Feed 0x0A \n, or both CR LF 0x0D 0x0A \r\n
- string: sequence of any utf8 characters, newlines MUST be escaped
- key: string
- value: string
- file:
[[key][,[value]]newline]
each line in csvs file MUST represent a relation between values of two collections
each line MUST contain zero, one, or two values separated by a comma
value that contains a comma, a newline or a double quote MUST be escaped with double quotes ""
omitted value MUST represent an empty string
all characters between the first unescaped comma and an unescaped newline MUST be read as part of the second value
multiple identical lines MUST represent multiple unique relations between identical values
an exact duplicate of a line MUST represent two unique relations
a line that consists only of a newline character MUST represent a relation between one empty string value "" and another empty string value ""
the file in csvs format CAN be called a "tablet"
the first column CAN be called a "key"
the second column CAN be called a "value"
empty lines \n MUST be ignored.
a trailing newline \n MUST be ignored.
these are equivalent
,\n
"",\n
"",""\n
a line CAN have no comma.
these are equivalent
2024-01-01\n
2024-01-01,""\n
these are equivalent
"\n"\n
"\n",\n
examples
- 1,bob\n : key is 1, value is bob
- 1,bob\\n\n : key is 1, value is bob\n
- ,bob\n : key is "", value is bob
- 1,\n : key is 1, value is ""
- 1\n : key is 1, value is ""
- \n : key is "", value is ""
- 2,bob,alice\n : key is 2, value is bob,alice
- 3,apple\n3,pear\n : key is 3, values are apple and pear
- 3,apple\n3,apple\n : key is 3, values are apple and apple
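The examples above can be checked with a short parser built on Python's standard `csv` module. This is a sketch, not a conforming implementation. The name `parse_tablet` is hypothetical. The sketch follows the rule that empty lines are ignored, so a lone newline yields no pair. It also rejoins any extra fields with commas, which approximates "everything after the first comma is the value" but would drop quotes around later fields.

```python
import csv
import io


def parse_tablet(text):
    """Parse the text of a tablet into a list of (key, value) pairs."""
    pairs = []
    # newline="" leaves line endings untouched so the csv module handles them
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:  # assumption: empty lines are ignored
            continue
        key = row[0]
        value = ",".join(row[1:])  # later commas stay in the value
        pairs.append((key, value))
    return pairs


print(parse_tablet("1,bob\n"))            # [('1', 'bob')]
print(parse_tablet("1\n"))                # [('1', '')]
print(parse_tablet("2,bob,alice\n"))      # [('2', 'bob,alice')]
print(parse_tablet("3,apple\n3,pear\n"))  # [('3', 'apple'), ('3', 'pear')]
```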
CSVS Dataset Format
This document specifies the CSVS dataset format.
"CSVS" stands for "Comma-Separated Value Store"
a csvs dataset represents relationships between collection values
terminology
each collection CAN be called a "branch", plural "branches"
a collection without attributes CAN be called a "twig", plural "twigs"
a collection with attributes CAN be called a "trunk", plural "trunks"
a collection that is an attribute of another collection CAN be called a "leaf", plural "leaves"
a collection that is not an attribute of any other collection CAN be called a "root", plural "roots"
a dataset CAN have multiple roots
a branch CAN have multiple trunks
a branch CAN have multiple leaves
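The terminology can be made concrete by sorting the branches of a schema into these roles. The sketch below assumes the schema text comes from a _-_.csv tablet. The function name `classify` and the sample schema are chosen for this example only.

```python
import csv
import io

# A sample schema: event has date and name, name has age.
SCHEMA = "event,date\nevent,name\nname,age\n"


def classify(schema_text):
    """Sort the branches of a schema into trunks, leaves, twigs and roots."""
    relations = [
        (row[0], row[1])
        for row in csv.reader(io.StringIO(schema_text))
        if len(row) >= 2
    ]
    trunks = {trunk for trunk, _ in relations}  # collections with attributes
    leaves = {leaf for _, leaf in relations}    # attributes of another collection
    branches = trunks | leaves
    return {
        "trunks": sorted(trunks),
        "leaves": sorted(leaves),
        "twigs": sorted(branches - trunks),  # no attributes of their own
        "roots": sorted(branches - leaves),  # not an attribute of anything
    }


print(classify(SCHEMA))
# {'trunks': ['event', 'name'], 'leaves': ['age', 'date', 'name'],
#  'twigs': ['age', 'date'], 'roots': ['event']}
```

Here "name" is both a leaf of "event" and a trunk of "age", which shows that the roles are not exclusive.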
.csvs.csv
a dataset MUST contain a tablet named .csvs.csv which describes the dataset
tablet is for metadata
.csvs.csv tablet MUST have a line csvs,0.0.2
this line is to support future breaking changes to the format.
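A client might read the version line before touching the rest of the dataset. The sketch below shows one way to do that. The function name `dataset_version` is hypothetical, and what a client does with an unknown version is left open.

```python
import csv
import io


def dataset_version(metadata_text):
    """Return the value of the "csvs" key in a .csvs.csv tablet, or None."""
    for row in csv.reader(io.StringIO(metadata_text)):
        if len(row) >= 2 and row[0] == "csvs":
            return row[1]
    return None


print(dataset_version("csvs,0.0.2\n"))   # 0.0.2
print(dataset_version("title,trips\n"))  # None
```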
_-_.csv
a dataset SHOULD contain a tablet named _-_.csv which describes relationships between collections
reserved technical implementation details
underscore-dash-underscore
examples:
- _-_.csv : event,date - dataset has an "event" collection with an attribute "date"
if there is no _-_.csv tablet, dataset MUST be considered empty
a collection name MUST NOT be "_".
a collection name MUST NOT include the following characters: [/\<>':"```|?*-.,[];{}$&].
a collection name CAN include any of the following: [azAZ09_%+@], whitespace and other unicode characters
NOTE: when there's no _-_.csv file, list directory and deduce relations from tablet names.
collection-collection.csv
underscore is like SQL table? underscore is not like SQL table? underscore is like MongoDB collection? underscore is not like MongoDB collection?
a dataset CAN have a tablet named {collection1}-{collection2}.csv
which describes relationships between values of two collections
contains values of two collections
"went to groceries" is an identifier here
examples:
- description-date.csv : went to groceries,2024-01-01
- description-date.csv : went to groceries,2003-01-01
{ _: description, description: "went to groceries", date: [2024-01-01, 2003-01-01]}
{ _: date, date: "2024-01-01"}
{ _: date, date: "2003-01-01"}
- event-description : 0acab,went to groceries\n0abac,went to groceries
- event-date : 0acab,2024-01-01\n0abac,2003-01-01
{ _: event, event: "0acab", description: "went to groceries", date: "2024-01-01"}
{ _: event, event: "0abac", description: "went to groceries", date: "2003-01-01"}
how to create two different values with the same text
a relation between collections MUST be listed in _-_.csv
a relation between collections CAN be recursive.
examples:
- collection "person" CAN have an attribute "person".
- collection "product" CAN have an attribute "competitor" which has an attribute "product".
notes for the dataset maintainer
to remove a value of a collection from the dataset, prune the value from the tablet of each leaf of the collection, {collection}-{leaf}.csv
csvs dataset SHOULD be version controlled.
credentials and other sensitive data SHOULD be stored separately from the dataset, e.g. in .git/config or in another csvs dataset under access control.
binary blobs SHOULD be stored in a folder inside the dataset directory, and each blob SHOULD be associated with a value of collection filename.
In datasets version controlled by git, the asset directory SHOULD be filtered with git-lfs or git-index.
collection "text" with large multiline string values CAN be refactored into two collections - "text_hash" where each value is a hash of a text value, to create a content-addressable index of text records.
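A content-addressable index can be sketched in a few lines of Python. The choice of SHA-256 is an assumption made for this example, as are the function names `hash_text` and `index_texts`. The note above does not fix a hash function.

```python
import hashlib


def hash_text(text):
    """Return the SHA-256 hex digest of a text value (assumed hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def index_texts(texts):
    """Map each hash to its text, so identical texts are stored once."""
    return {hash_text(text): text for text in texts}


index = index_texts(["first line\nsecond line", "first line\nsecond line", "other"])
print(len(index))  # 2, the repeated text collapses into one entry
```

Other tablets can then refer to the short hash as a key instead of repeating the multiline text.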
CSVS File Format
This document specifies the CSVS file format.
"CSVS" stands for "Comma-Separated Value Store"
grammar: [(no comma) [comma (any utf8)] newline]
A CSVS file MUST have UTF-8 encoding.
A CSVS file MUST have .csv file extension.
A CSVS file MUST NOT start with a comma.
A CSVS file MUST consist of lines separated by a newline, either Carriage Return 0x0D \r, Line Feed 0x0A \n, or both CR LF 0x0D 0x0A \r\n.
Each line in a CSVS file CAN have no commas.
Each line in a CSVS file CAN have one comma, first comma in each line separates a KEY from VALUE.
only two columns, second optional
all following commas are part of value
unless specified in the extension, keys SHOULD be unique. If non-unique keys are found, only the first value MUST be treated, and the remaining lines with matching uuids are to be discarded
csvs dataset format 0.0.1
This document specifies the CSVS dataset format.
"CSVS" stands for "Comma-Separated Value Store"
metadir.json
json object MUST have fields for each entity, fields of:
type of list: string, none
task of list: date, text, app-defined
trunk for relative table
description for UI labels
props/
MUST have a folder {prop} for each branch
folder MUST have csvs file "index.csv"
if type is string, value MUST be json-escaped
if type hash, there CAN be no value, and key MUST be treated as value
pairs/
MUST have {trunk}-{leaf}.csv for each branch with trunk
Requirements
shrug drive shed
- user must view dataset in a text editor
family park lazy
- user must version control the dataset
pistol puzzle own
- user must query with grep
adapt satoshi limb
- user must deduplicate values
render elegant inner
- user without a client should figure out the dataset structure
bubble immense boat
- developer without a client should figure out how to restore data
field woman slot
- developer must implement searches
fine unveil juice
- user must specify relations between entities
attract chief school
- user must store arrays, lists
raccoon leave turn
- user must store records, sets
will face innocent
- user must store escaped strings
cattle fork depth
- user must store similar data in efficient space
title shell snap
- user should specify foreign keys as sha256sum hashes of values
scout path cousin
- user should specify foreign keys as multiformats hashes of values
inch thunder bind
- user should store relative values
snack heavy square
- user should store escaped string
under pass useful
- user should store without duplication
deputy another health
- user must store values
bean weapon rely
- developer must extend
upon digital execute
- user must read plain text
layer blind glide
- user must read dataset files
stick congress label
- user must write dataset files
relax slender wise
- developer must easily reverse engineer
club step tennis
- developer must easily support