Part 4. Keys, patterns, maps, and learning — the beginnings of representational semantics in data
Imagine what happens when you walk into a smart building, wielding your biometrics and wearing digital electronic devices that record your environment with embedded sensors, smart glasses, or pads. A torrent of information interactions and registrations is unleashed, leading to a virtual “wiring up” of devices with services for identification and authentication. It’s a lot of information, which could be represented by flowcharts, websites, data circuitry, supply chains, user logs, cartography, etc. In all cases, there are complex interactions between key-value look-ups, document archives, scratch-pads used by algorithms, and graphical maps of events, and these play a basic role in both what we experience and what we create. In this post, I’ll explain (hands-on) the mechanics of representing spacetime processes as associative memory, both in Go code and in the ArangoDB database.
Space and time underlie all data
Pattern is the foundation of all information, and patterns need space. A pattern can be a simple bit (on or off), a complex string, one or more documents with structured fields, or a complex web of nodes expressing intricate relationships — and all changing in time. Without some kind of notion of space, there couldn’t be information. Space is what we use for memory. It’s the fabric of state.
We probably tend to think of space as the kind of open space we learn in school (Euclidean space), but there are other kinds too. If we ignore reserved mathematical terminology, then space is just a collection of distinct locations, without any assumed structure: a single bit (a light bulb), a row of buckets, a spider’s web, or an infinite chess board, are all versions of space. They contain degrees of freedom that can hold and express information.
Time, on the other hand, is the phenomenon of state changing, expressed at these locations. Changes to that state are known as a process’s proper time. So space and time are not independent things. Together, they describe processes (or spacetime phenomena), and we strive to observe, understand, and even intentionally manipulate these processes for a purpose. This purpose is what we refer to by semantics.
The options for representing space in IT begin with the smallest atoms: bits. From there we move to words, and we are already into the realm of key-value stores. From associating names with values, we can form arrays of them, then more structured packages like “documents” or “tables” (vectors and tuples), and finally graphs or networks with completely ad hoc structures. I’ll work through these in the languages of the chosen tools, starting with the basic atomic building blocks.
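To make that ladder concrete, here is a minimal Go sketch that holds a small fact at each level, from a single bit up to a graph. The type names (Document, Graph), the Link helper, and the sample values are purely illustrative assumptions, not part of any library.

```go
package main

import "fmt"

// Document is a structured package of named fields (a "document" or table row).
type Document struct {
	Name  string
	Room  int
	Floor int
}

// Graph is an ad hoc network: each node name maps to the names of its neighbours.
type Graph map[string][]string

// Link adds a directed edge from one node to another.
func (g Graph) Link(from, to string) {
	g[from] = append(g[from], to)
}

func main() {
	bit := true                           // the smallest atom: on or off
	word := "occupied"                    // a word: a pattern made of many bits
	kv := map[string]string{"room": word} // a key-value association
	list := []int{101, 102, 103}          // an ordered array of locations
	doc := Document{Name: "guest", Room: 123, Floor: 1}

	g := Graph{}
	g.Link("guest", "room 123")
	g.Link("room 123", "floor 1")

	fmt.Println(bit, word, kv["room"], list, doc, g)
}
```

Each step up the ladder adds structure to the same underlying idea: named locations that can hold state.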
Key-Value pairs
Space is memory, and memory means “variables”. Key-value associations were called associative arrays in the heyday of Perl. Now they are more pretentiously called “maps” in Go, and in programming libraries. Famous cloud-embedded databases like Redis, ZooKeeper, Consul, and etcd specialize in key-values, but in fact every kind of memory is some form of key-value store.
A key is a “data address” in the broadest sense (just a variable name), and a value is something we want to remember (variable contents). The keys and the values might have multiple lines, representing several “dimensions”. Local key stores like the Windows Registry, BerkeleyDB, SQLite, TokyoCabinet, and others pioneered the use of structured data to store distributed configurations, logs, and data records locally on every host. ArangoDB offers key-value storage as a simple option too.
A key-value pair is the simplest kind of table or ledger too (see figure 1): a two-column list with labels (names or numbers) as keys, and any kind of data as the value corresponding to each key. It has no other structure. It could be dates and times, names and phone numbers, or whatever basic association we like. Key-value lists are extremely useful for counting or jotting down values. If we treat them not as static lookup tables, but rather as dynamical living values, then we can also use them to update moving averages, current positions, scores, balances, and so on. A key-value list like figure 1 is basically a histogram.
Figure 1: Key-Value pairs are natural histograms. As we update the values, we learn more about the profile generated by the keys. It’s all about the interpretation.
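As a minimal sketch of this “living values” idea, the following Go snippet uses one map as a histogram of counts and another to keep a running average per key. The names Histogram, Observe, and UpdateAverage are my own illustrative choices, and the exponentially weighted form of the moving average is an assumption, just one simple way to do it.

```go
package main

import "fmt"

// Histogram counts how often each key has been seen.
type Histogram map[string]int

// Observe increments the count for key; a missing key starts from zero.
func (h Histogram) Observe(key string) {
	h[key]++
}

// UpdateAverage folds a new sample into the moving average stored under key.
// weight (between 0 and 1) says how much the newest sample counts.
func UpdateAverage(avgs map[string]float64, key string, sample, weight float64) {
	old, seen := avgs[key]
	if !seen {
		avgs[key] = sample // first sample: nothing to average with yet
		return
	}
	avgs[key] = weight*sample + (1-weight)*old
}

func main() {
	h := Histogram{}
	for _, drink := range []string{"coffee", "tea", "coffee", "water", "coffee"} {
		h.Observe(drink)
	}
	fmt.Println(h["coffee"], h["tea"], h["juice"]) // prints: 3 1 0

	avgs := map[string]float64{}
	UpdateAverage(avgs, "temperature", 20, 0.5)
	UpdateAverage(avgs, "temperature", 22, 0.5)
	fmt.Println(avgs["temperature"]) // prints: 21
}
```

Reading a key that was never written simply gives the zero value, which is exactly what we want for an empty histogram bar.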
We might think of a key-value database as an itemized list. It can be ordered somehow, e.g. by numbering the keys, like in a hotel, with rooms labelled by room number and floor coordinates (room,floor). Yes, a hotel is a key-value store for humans. However, in other examples, keys have no particular order (they could be just names), so to find the location of the data, we need to search through the data. If the hotel itself is a key-value store for humans by room number, the check-in ledger is a key-value store for rooms by name.
Key: Mr Burgess
Value: Room 123, Suite with sea view, jacuzzi, and butler.
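The hotel and its ledger can be sketched as two Go maps pointing in opposite directions: one keyed by (room, floor), the other keyed by guest name. This is only an illustration under assumptions; the types RoomKey and Hotel, the functions NewHotel, CheckIn, and FindGuest, and the floor number are all hypothetical.

```go
package main

import "fmt"

// RoomKey is a composite key: Go accepts any comparable struct as a map key.
type RoomKey struct {
	Room  int
	Floor int
}

// Hotel holds both directions of the association.
type Hotel struct {
	occupant map[RoomKey]string // the hotel itself: room -> guest
	ledger   map[string]RoomKey // the check-in ledger: guest -> room
}

// NewHotel allocates both maps so they are ready for writing.
func NewHotel() *Hotel {
	return &Hotel{
		occupant: map[RoomKey]string{},
		ledger:   map[string]RoomKey{},
	}
}

// CheckIn records the guest in both maps, refusing a room that is already taken.
func (h *Hotel) CheckIn(guest string, key RoomKey) bool {
	if _, taken := h.occupant[key]; taken {
		return false
	}
	h.occupant[key] = guest
	h.ledger[guest] = key
	return true
}

// FindGuest looks a guest up by name, without searching room by room.
func (h *Hotel) FindGuest(guest string) (RoomKey, bool) {
	key, ok := h.ledger[guest]
	return key, ok
}

func main() {
	h := NewHotel()
	h.CheckIn("Mr Burgess", RoomKey{Room: 123, Floor: 1})
	if key, ok := h.FindGuest("Mr Burgess"); ok {
		fmt.Println("Mr Burgess is in room", key.Room, "on floor", key.Floor)
	}
}
```

Keeping the second map is the price of fast lookup by name: every check-in has to update both directions to keep them consistent.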
Searching linearly through a list is inefficient when the list is long, so one uses a hash table, in which a non-integer key is turned into the integer number of some reserved data slot, something like booking a hotel room and being assigned a room number. Hashes are usually integer numbers (say replacing “Mr Burgess” with 1836537), and expensive hashes are often used in cryptography (so they may be called cryptohashes). They aim to assign a unique number to any key string of data, which helps to map a non-ordered name onto an ordered memory structure. There may be occasional accidental collisions between hashes (as when two people are assigned the same room in a hotel), so we have to be careful and handle those cases in large tables.
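Here is a toy hash table in Go to show the mechanics. It is a sketch under assumptions: it uses the standard library’s non-cryptographic FNV-1a hash (hash/fnv) to turn a name into a slot number, and it handles collisions by letting each slot hold a short chain of entries. Go’s built-in maps do all of this for us in their own way; the Table type, its methods, and the extra guest names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// entry is one key-value pair stored in a slot.
type entry struct {
	key, value string
}

// Table has a fixed number of slots; each slot holds a chain of entries,
// so that keys whose hashes collide can share the slot.
type Table struct {
	slots [][]entry
}

// NewTable creates a table with n empty slots.
func NewTable(n int) *Table {
	return &Table{slots: make([][]entry, n)}
}

// slot turns a string key into a slot number using the FNV-1a hash.
func (t *Table) slot(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(len(t.slots)))
}

// Put stores value under key, overwriting any earlier value for the same key.
func (t *Table) Put(key, value string) {
	i := t.slot(key)
	for j := range t.slots[i] {
		if t.slots[i][j].key == key {
			t.slots[i][j].value = value
			return
		}
	}
	t.slots[i] = append(t.slots[i], entry{key, value})
}

// Get searches only the chain in the key's own slot, not the whole table.
func (t *Table) Get(key string) (string, bool) {
	for _, e := range t.slots[t.slot(key)] {
		if e.key == key {
			return e.value, true
		}
	}
	return "", false
}

func main() {
	t := NewTable(2) // three keys in two slots guarantees at least one collision
	t.Put("Mr Burgess", "Room 123")
	t.Put("Ms Smith", "Room 7")
	t.Put("Dr Jones", "Room 42")
	v, ok := t.Get("Mr Burgess")
	fmt.Println(v, ok) // prints: Room 123 true
}
```

Because the full key is stored alongside the value, a collision costs only a slightly longer search within one slot, and never returns the wrong guest.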
Key-Value Maps in Go
A map in Go is a tool for using key-value stores. It’s what we used to call associative arrays in Perl. It’s what many technologists use Redis for. All databases are some variation on key-value stores.
Golang maps are associative arrays. One of the quirks of Go is that, somewhat inconsistently, we have to use the “make(data type)” memory allocation function (or a map literal) to set up structures that contain multiple key-value pairs before we can write to them. It’s one of those many ad hoc things that you trip over when using different languages, but it’s quickly learned, so not a show stopper. The following is how you could define an associative array to map from integers to strings, and from strings to strings.
// Inside a function body such as main(), with "fmt" imported:
var mytable = make(map[int]string)
mytable[4] = "coffee table"

var association = make(map[string]string)
association["number"] = "an idea"

fmt.Println("test", association["number"])
Maps store data in much the same way that databases do, so the two make an ideal pairing.
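A few everyday map idioms are worth seeing before we wrap anything: testing whether a key exists, deleting a key, and listing keys in a predictable order (Go leaves map iteration order unspecified). The helper names Lookup and SortedKeys below are my own illustrative choices, not standard functions.

```go
package main

import (
	"fmt"
	"sort"
)

// Lookup returns the stored value, or fallback when the key is absent.
// The second result of a map read ("ok") tells us whether the key exists.
func Lookup(m map[string]string, key, fallback string) string {
	if v, ok := m[key]; ok {
		return v
	}
	return fallback
}

// SortedKeys returns the keys in alphabetical order, because ranging
// over a map directly gives no guaranteed order.
func SortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	association := map[string]string{"number": "an idea", "table": "furniture"}

	fmt.Println(Lookup(association, "number", "unknown")) // prints: an idea
	fmt.Println(Lookup(association, "colour", "unknown")) // prints: unknown

	delete(association, "table")         // removing a key is a built-in operation
	fmt.Println(SortedKeys(association)) // prints: [number]
}
```

Note that reading a missing key is harmless (you get the zero value), whereas writing to a map that was declared but never allocated causes a run-time panic, which is why the allocation step matters.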
As scientific users, who are not dedicated developers, we don’t really want the hassle of fussy programming APIs, especially for a database, as these are largely concerned with the details of exception handling. Robustness can be secondary to simplicity in a friendly environment. We’d like to use some simple layer of abstractions that we can trust to take care of these details in normal circumstances. So, let’s begin by building a layer on top of the flexible but fragile Go APIs for mapping and for ArangoDB to create robust and repeatable read/write functions.