Lattice: A Lisp Database System
The inspiration:
Why are we not out there to offer a real database system with Common Lisp datatypes instead of the tragic mess that SQL imposes on us in the C-based APIs out there (not to mention the XML calamity)?
—Erik Naggum, "Re: PART TWO: winning industrial-use of lisp: Re Norvig's latest paper on Lisp"
Project Overview
Create a database system that natively stores Lisp data types. While Dr. Naggum would have liked for this to be implemented in Common Lisp, this might be prototyped first in Racket.
Working name: lattice
Goals for this project:
- Operate on native Lisp data types. At a minimum: numbers, symbols, strings, and Lists. Also desirable are vectors and structures, and for Common Lisp, CLOS objects and keywords.
- Rethink what the interface to this database looks like. Think
past SQL or key-value store datastores: can Lisp-style
operations (
map
,filter
,reduce=/=fold
) be used as the query language? Perhaps a declarative logic-based language (of course I find out about Datalog while poking around the Racket docs late Sunday night)? While valuable, to what extent are we constrained to think of database systems in terms of SQL, key-value stores, and document stores? What should the database systems in 30 years look like? Will they still be SQL? Is it because the design has vetted itself so perfectly, or is it because we gave in to an incremental futures?
Some obvious and some not-so-obvious questions:
- How are items organised? How should they be named? Keeping in mind Joe Armstrong's "Use hashes for names, use DHTs for host discovery" in relation to distributed systems, how does this apply (does it apply?) to this system?
- How is data stored to disk? Do we take the Postgres-style approach or Mnesia's "Don't think about it—just indicate whether the store should back to disk at all".
- How does this relate to Datomic? Wasn't there a Lispish data store released in the past two (three?) years?
- How is this useful to non-Lisp programmers? It isn't, and they are outside the scope of the system. This project aims entirely to serve Lisp programmers. (But which ones?)
- How is data distributed? Does the system take the form of a ØMQ-based (nanomq perhaps?) or a DHT? Some other mechanism?
- How are the standard security expectations met? Things like access control for reads, writes, etc.; remote connections; cluster management.
Sketch 1: Hash-Table-esque Interface
This sketch imagines an interface that is presented similarly to hash tables in Common Lisp.
> (define network (list (node 'node-1 '()) (node 'node-2 '(node-1 node-3)) (node 'node-3 '(node-1 node-2)) (node 'node-4 '(node-3)) (node 'node-5 '(node-1)))) > (define datastore (connect-store "127.0.0.1" 4321)) > (store! (datastore-workspace datastore 'networks) network) > (query-datastore datastore (let ([hub (get-node 'node-1)]) (filter (λ (node) (linked? node hub)) (datastore-workspace datastore 'networks)))) (list (node 'node-2 '(node-1 node-3)) (node 'node-3 '(node-1 node-2)) (node 'node-5 '(node-1)))
Sketch 2: A very crude prototype of sorts
;;;; lattice is an imagining of what this data store might look like, ;;;; how users might interface with it. This prototype is based on ;;;; hash-tables, with reader serialisation to and from disk for ;;;; persistence --- a very crude beginning, but it's meant as the set ;;;; on a stage: it's no way to build a house, but we can talk about ;;;; living. (defvar *workspaces* (make-hash-table :test 'equal)) (defun create-workspace (name) "Create the workspace; this is primarily called inside store." (setf (gethash name *workspaces*) (make-hash-table :test 'equal))) (defun write-store (filespec) "Write the store to disk, overwriting any existing copy." (with-open-file (out filespec :direction :output :if-exists :supersede) (with-standard-io-syntax (write *workspaces* :stream out)))) (defun load-store (filespec) "Load the store from disk." (with-open-file (in filespec) (with-standard-io-syntax (setf *workspaces* (read in))))) (defun workspace-existsp (workspace) "Returns true if the workspace exists in the data store." (multiple-value-bind (workspace present) (gethash workspace *workspaces*) (declare (ignore workspace)) present)) (defun store (workspace name record) "store places the record under name into workspace. Any previous record stored under that name will be overwritten." (let ((workspace-ht (if (workspace-existsp workspace) (gethash workspace *workspaces*) (create-workspace workspace)))) (let ((rec-ht (setf (gethash name workspace-ht) (make-hash-table :test 'equal)))) (dolist (rec record) (format t "Store '~A' as '~A'...~%" (cadr rec) (car rec)) (setf (gethash (car rec) rec-ht) (cadr rec)))))) (defun lookup (workspace name) "Retrieve the named record from the workspace." (if (not (workspace-existsp workspace)) nil (let ((rec-ht (gethash name (gethash workspace *workspaces*)))) (loop for key being the hash-keys of rec-ht collecting (list key (gethash key rec-ht)))))) (defun filter (workspace search-key pred) "Filter applies pred to the key in all records in the workspace, returning a list of records that match." (if (not (workspace-existsp workspace)) nil (let ((lookup-key (lambda (key) (lookup workspace key))) (workspace (gethash workspace *workspaces*))) (loop for key being the hash-keys of workspace when (funcall pred (gethash search-key (gethash key workspace))) collecting (list key (funcall lookup-key key))))))
Here is a sample interaction with this datastore:
CL-USER> (defun store-hacker (name langs projects company location) (store 'hackers name `((:languages ,langs) (:projects ,projects) (:company ,company) (:location ,location)))) STORE-HACKER CL-USER> (store-hacker "kyle" '(:COMMON-LISP :RACKET :LFE :GO) '("dropsonde" "folio") "CloudFlare" "San Francisco") Store '(COMMON-LISP RACKET LFE GO)' as 'LANGUAGES'... Store '(dropsonde folio)' as 'PROJECTS'... Store 'CloudFlare' as 'COMPANY'... Store 'San Francisco' as 'LOCATION'... NIL CL-USER> (store-hacker "steve" '(:CLOJURE :RACKET :C-SHARP) '("goup" "rockets") "SFX" "NYC") Store '(CLOJURE RACKET C-SHARP)' as 'LANGUAGES'... Store '(goup rockets)' as 'PROJECTS'... Store 'SFX' as 'COMPANY'... Store 'NYC' as 'LOCATION'... NIL CL-USER> (store-hacker "jeremy" '(:CLOJURE :RACKET :FACTOR) '("parsers" "flobot") "SFX" "NYC") Store '(CLOJURE RACKET FACTOR)' as 'LANGUAGES'... Store '(parsers lime)' as 'PROJECTS'... Store 'SFX' as 'COMPANY'... Store 'NYC' as 'LOCATION'... NIL CL-USER> (filter 'hackers :location (lambda (loc) (equal loc "NYC"))) (("steve" ((:LANGUAGES (:CLOJURE :RACKET :C-SHARP)) (:PROJECTS ("goup" "rockets")) (:COMPANY "SFX") (:LOCATION "NYC"))) ("jeremy" ((:LANGUAGES (:CLOJURE :RACKET :FACTOR)) (:PROJECTS ("parsers" "lime")) (:COMPANY "SFX") (:LOCATION "NYC")))) CL-USER> (defun contains-p (key lst) (reduce (lambda (x y) (or x y)) (mapcar (lambda (x) (equal x key)) lst))) CONTAINS-P CL-USER> (filter 'hackers :languages (lambda (lst) (contains-p :CLOJURE lst))) (("steve" ((:LANGUAGES (:CLOJURE :RACKET :C-SHARP)) (:PROJECTS ("goup" "rockets")) (:COMPANY "SFX") (:LOCATION "NYC"))) ("jeremy" ((:LANGUAGES (:CLOJURE :RACKET :FACTOR)) (:PROJECTS ("parsers" "lime")) (:COMPANY "SFX") (:LOCATION "NYC")))) CL-USER> (filter 'hackers :languages (lambda (lst) (contains-p :racket lst))) (("kyle" ((:LANGUAGES (:COMMON-LISP :RACKET :LFE :GO)) (:PROJECTS ("dropsonde" "folio")) (:COMPANY "CloudFlare") (:LOCATION "San Francisco"))) ("steve" ((:LANGUAGES (:CLOJURE :RACKET :C-SHARP)) (:PROJECTS ("goup" "rockets")) (:COMPANY "SFX") (:LOCATION "NYC"))) ("jeremy" ((:LANGUAGES (:CLOJURE :RACKET :FACTOR)) (:PROJECTS ("parsers" "lime")) (:COMPANY "SFX") (:LOCATION "NYC")))) CL-USER> (filter 'hackers :languages (lambda (lst) (contains-p :go lst))) (("kyle" ((:LANGUAGES (:COMMON-LISP :RACKET :LFE :GO)) (:PROJECTS ("dropsonde" "folio")) (:COMPANY "CloudFlare") (:LOCATION "San Francisco"))))
Sketch 3: The Racket version
#lang racket #| lattice is an imagining of what this data store might look like, how users might interface with it. This prototype is based on hash-tables, with reader serialisation to and from disk for persistence --- a very crude beginning, but it's meant as the set on a stage: it's no way to build a house, but we can talk about living. The code aims to avoid exposing users to its implementation, and instead provides utilities to focus on the semantics. |# (provide write-store! load-store! store lookup select) (define WORKSPACES (make-hash)) ;; Store the global workspaces to file. (define (write-store! filespec) (write-to-file WORKSPACES filespec #:mode 'binary #:exists 'replace)) ;; Read the global workspaces from a file, preferably created with ;; load-store!. (define (load-store! filespec) (set! WORKSPACES (file->value filespec))) ;; Create a new workspace and return it; this is less-useful for ;; direct calling but is used internally. (define (create-workspace name) (hash-set! WORKSPACES name (make-hash)) (hash-ref WORKSPACES name)) ;; workspace-exists returns true if the workspace exists in the data ;; store. (define (workspace-exists? name) (hash-has-key? WORKSPACES name)) ;; store writes the record to the named workspace under the given ;; name. record should be an alist; e.g. ;; ;; (store 'users "John Q. Public" '((#:id 42) ;; (#:plan #:basic) ;; (#:email "jqp@example.net") ;; (#:documents ("7C19B" "23FA6" "48ABD")))) (define (store workspace name record) (let ([workspace-ht (if (workspace-exists? workspace) (hash-ref WORKSPACES workspace) (create-workspace workspace))]) (let ([record-ht (make-hash)]) (for ([rec record]) (hash-set! record-ht (car rec) (cadr rec))) (hash-set! workspace-ht name record-ht)))) ;; record-exists? returns true if a record named by name is present ;; in the named workspace. (define (record-exists? workspace name) (if (workspace-exists? workspace) (hash-has-key? (hash-ref WORKSPACES workspace) name) #f)) ;; Retrieve the record from the named workspace; if no such record ;; exists, return an empty list. (define (lookup workspace name) (if (and (workspace-exists? workspace) (record-exists? workspace name)) (hash->list (hash-ref (hash-ref WORKSPACES workspace) name)) '())) ;; Return all records in the workspace for which the application of ;; the predicate on the search-key for each record returns true. ;; ;; For example, given the user workspace and "John Q. Public" entry ;; from the comment on store: ;; ;; > (select 'users '#:id (λ (id) (equal? id 42))) ;; '(("John Q. Public" ;; (#:plan . #:basic) ;; (#:id . 42) ;; (#:documents "7C19B" "23FA6" "48ABD") ;; (#:email . "jqp@example.net"))) ;; ;; If we add some more users: ;; > (store 'users "Alyssa P. Hacker" '((#:id 314) ;; (#:plan #:basic) ;; (#:email "aph@example.net") ;; (#:documents ("87FAC" "19D1A" "86BC2")))) ;; > (store 'users "Ben Bitdiddle" '((#:id 17) ;; (#:plan #:pro) ;; (#:email "bb@example.net") ;; (#:documents ()))) ;; > (select 'users '#:plan (λ (plan) (equal? plan '#:basic))) ;; '(("Alyssa P. Hacker" ;; (#:plan . #:basic) ;; (#:id . 314) ;; (#:documents "87FAC" "19D1A" "86BC2") ;; (#:email . "aph@example.net")) ;; ("John Q. Public" ;; (#:plan . #:basic) ;; (#:id . 42) ;; (#:documents "7C19B" "23FA6" "48ABD") ;; (#:email . "jqp@example.net"))) (define (select workspace search-key pred) (when (workspace-exists? workspace) (let ([lookup-key (lambda (key) (lookup workspace key))] [workspace (hash-ref WORKSPACES workspace)]) (filter-not empty? (hash-map workspace (lambda (key value) (if (pred (hash-ref value search-key)) (cons key (hash->list value)) '())))))))
A sample interaction:
$ racket -i -t lattice.rkt Welcome to Racket v6.0.1. > (define (store-hacker name langs projects company location) (store 'hackers name `((#:languages ,langs) (#:projects ,projects) (#:company ,company) (#:location ,location)))) > (store-hacker "kyle" '(#:common-lisp #:racket #:lfe #:go) '("dropsonde" "folio") "CloudFlare" "San Francisco") > (store-hacker "steve" '(#:clojure #:racket #:c-sharp) '("goup" "rockets") "SFX" "NYC") > (store-hacker "jeremy" '(#:clojure #:racket #:factor) '("parsers" "lime") "SFX" "NYC") > (select 'hackers '#:location (λ (loc) (equal? loc "NYC"))) '(("steve" (#:location . "NYC") (#:company . "SFX") (#:projects "goup" "rockets") (#:languages #:clojure #:racket #:c-sharp)) ("jeremy" (#:location . "NYC") (#:company . "SFX") (#:projects "parsers" "lime") (#:languages #:clojure #:racket #:factor))) > (select 'hackers '#:languages (λ (languages) (member '#:racket languages))) '(("steve" (#:location . "NYC") (#:company . "SFX") (#:projects "goup" "rockets") (#:languages #:clojure #:racket #:c-sharp)) ("kyle" (#:location . "San Francisco") (#:company . "CloudFlare") (#:projects "dropsonde" "folio") (#:languages #:common-lisp #:racket #:lfe #:go)) ("jeremy" (#:location . "NYC") (#:company . "SFX") (#:projects "parsers" "lime") (#:languages #:clojure #:racket #:factor))) > (select 'hackers '#:languages (λ (languages) (member '#:go languages))) '(("kyle" (#:location . "San Francisco") (#:company . "CloudFlare") (#:projects "dropsonde" "folio") (#:languages #:common-lisp #:racket #:lfe #:go)))
Commentary on the sketches
Rather than focusing on the implementation mechanics of the sketch, I'd like to focus on what it provides and what sorts of interface questions and ideas it brings up.
There are some features that should be highlighted:
- The data store is persistable—though it should have a running thread to persist it regularly.
- The user doesn't know that this implementation uses hash-tables under the hood.
It does open up some questions, though.
- What are the alternatives to using alists?
- How would a user connect to a remote datastore?
- What sort of wire-protocol is sent?
Sketch 4: Illustrating an idea for client/server connections
This is implemented on some ugly syntax hacks—need to read more on syntax objects and suchforth in Racket. For example, select won't work because the predicate would need to be converted into a syntax object.
Server
#lang racket (require "lattice.rkt") (define-namespace-anchor a) (define ns (namespace-anchor->namespace a)) (define (serve port-no) (define listener (tcp-listen port-no 5 #t)) (define die #f) (define (loop) (accept-and-handle listener) (unless die (loop))) (thread loop) (lambda () (set! die #t))) (define (accept-and-handle listener) (define-values (in out) (tcp-accept listener)) (thread (lambda () (handle in out) (close-input-port in) (close-output-port out)))) (define last-request '()) (define (handle in out) (let* ([request (read in)]) (set! last-request request) (let ([request (read (open-input-string request))]) (define-values (command) (car request)) (set! last-request request) (cond [(equal? command 'store) (handle-request request out)] [(equal? command 'lookup) (handle-request request out)] [(equal? command 'select) (handle-request request out)] [else (fprintf out "((#:success #f) (#:result \"Unknown command ~A\"))" command) (set! last-request request)])))) (define (handle-request request out) (with-handlers ([exn:fail? (lambda (v) (eprintf "Error handling request: ~v~%" v) ( `((#:success #f) (#:result ,v)) out))]) (fprintf out "~v" (eval request ns)))) (define DEFAULT-PORT 4321) (define shutdown (serve DEFAULT-PORT))
Client
#lang racket (provide datastore%) (define datastore% (class object% (init host [port 4321]) (define ds-host host) (define ds-port port) (super-new) (define/public (store workspace name record) (send-request ds-host ds-port (format "(store ~v ~v ~v)" workspace name record))) (define/public (lookup workspace name) (send-request ds-host ds-port (format "(lookup ~v ~v)" workspace name))) (define/public (select workspace search-key pred) (send-request ds-host ds-port (format "(select ~v ~v ~v)" workspace search-key pred))))) (define (send-request host port request) (define-values (in out) (tcp-connect host port)) (write request out) (close-output-port out) (let ([response (read in)]) (close-input-port in) response)) (define (store-hacker ds name langs projects company location) (send ds store 'hackers name `((#:languages ,langs) (#:projects ,projects) (#:company ,company) (#:location ,location)))) (define (lookup-hacker ds name) (send ds lookup 'hackers name)) (define lh (new datastore% [host "127.0.0.1"])) (store-hacker lh "kyle" '(#:common-lisp #:racket #:lfe #:go) '("dropsonde" "folio") "CloudFlare" "San Francisco") (lookup-hacker lh "kyle")
Commentary:
In this model, s-expressions are encoded as strings and sent over the
network. I toyed briefly with Racket syntax objects, but that will
require a some more research to get working (and isn't necessarily
applicable to Common Lisp). This model means that the select
model
won't work over the network, as users would (should) pass in a
λ-form—which naturally requires lexical context. It quickly becomes
a remarkably difficult problem, but a fascinating one. This
generalised problem could be interesting in sending work to a remote
system… but that's an entirely different problem.