ADDED www/html_node/Back_002dends.html Index: www/html_node/Back_002dends.html ================================================================== --- www/html_node/Back_002dends.html +++ www/html_node/Back_002dends.html @@ -0,0 +1,106 @@ + + + + + +zeptodb: Back-ends + + + + + + + + + + + + + + + + + + + + +
+


+
+
+ +

1.2 Back-ends

+ +

By default, zeptodb uses the GNU dbm (GDBM) library to create and manipulate the DBM databases. +Alternatively, you may choose to use the +Kyoto Cabinet library +instead. This is specified by passing the +--with-kyotocabinet option to the configure script +before compiling zeptodb. +

+

Note that databases created with these two back-ends are +not compatible with each other. In particular, databases created with Kyoto Cabinet can +only be accessed by zeptodb if it has been compiled with support for +the library. +

+

Databases created with Kyoto Cabinet are required to have the +.kch file extension. By convention, databases created with +GDBM should have the .db file extension. +
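
As a minimal sketch of the naming rules above, the following shell function guesses which back-end a database file is meant for from its extension alone. The function name backend_for is hypothetical and is not part of zeptodb; since .db is only a convention, a "gdbm" answer is a guess rather than a guarantee.

```shell
# Hypothetical helper (not part of zeptodb): guess the intended back-end
# from the file name. It only inspects the extension and never opens the file.
backend_for() {
    case "$1" in
        *.kch) echo "kyotocabinet" ;;  # extension required for Kyoto Cabinet
        *.db)  echo "gdbm" ;;          # extension used by convention for GDBM
        *)     echo "unknown" ;;
    esac
}

backend_for foo.db    # prints "gdbm"
backend_for big.kch   # prints "kyotocabinet"
```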

+

For most purposes, databases created with GDBM should be sufficient. +For particularly large data sets, however, Kyoto Cabinet is +preferred, since it can add values more quickly and has a much larger +upper limit on the database size. On the other hand, Kyoto Cabinet is +not as widely available in GNU/Linux distributions as GDBM, so it often +must be installed manually. +

+ + + + + ADDED www/html_node/Tutorial.html Index: www/html_node/Tutorial.html ================================================================== --- www/html_node/Tutorial.html +++ www/html_node/Tutorial.html @@ -0,0 +1,230 @@ + + + + + +zeptodb: Tutorial + + + + + + + + + + + + + + + + + + + + +
+


+
+
+ +

1.1 Tutorial

+ +

The zeptodb tools are used to create small databases that are stored on +disk and then to store, fetch and remove records from those databases. +Note that these databases are much simpler than, say, SQL databases. +The databases follow the DBM format as created by the GDBM library +(see Back-ends). Each record in a DBM database consists of a key and +a value. All keys and values are stored as plain text, regardless of +their formats. +

+

First, you create a new database with zdbc: +

+
+
$ zdbc foo.db
+
+ +

You can customize the creation of a database in two ways. The first is +by specifying the number of "buckets" that comprise the database, +via the -b/--num-buckets option. A DBM +database can be imagined as a series of buckets. When a new item is +added, an algorithm determines which bucket it belongs in based on its +key. Likewise, the same algorithm is used to determine the +bucket from which to fetch an item. If each bucket contains a +maximum of one item, then any item can be found +in the same amount of time as any other item. If, however, the +number of buckets is smaller than the number of items, then when you +fetch an item from a bucket, you might have to search through +all the items in that bucket to find the one that you want, which +slows down the lookup. Conversely, if the number of buckets is far +greater than the maximum number of items that will be added, +most buckets will stay empty, which is wasteful. Thus it’s best to use a number of buckets +that is at least as large as the expected maximum number of items. +As a rule of thumb, use about one to four times as many buckets as items. +
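
The bucket idea can be sketched in a few lines of shell. This is an illustration only: it uses the CRC printed by cksum as a stand-in hash, which is an assumption and not the hash function that the DBM library actually uses. The function name bucket_for is likewise hypothetical.

```shell
# Illustration only: map a key ($1) to one of N buckets ($2).
# The CRC from cksum stands in for the library's real hash function.
bucket_for() {
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    # The remainder selects a bucket in the range 0 .. N-1.
    echo $(( sum % $2 ))
}

for key in Brandon Joe Mary; do
    echo "$key -> bucket $(bucket_for "$key" 4)"
done
```

The same key always lands in the same bucket, which is what makes fetching possible. With only 4 buckets, some of the three keys may end up sharing a bucket; a larger second argument makes that less likely.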

+

The second option is the size (in bytes) of the memory-mapped region to +use, via the -m/--mmap-size option. While the +database is stored on the disk as a file, when it is opened by zeptodb, +some or all of that file is mapped one-to-one onto a region +of virtual memory. Thus, when the program reads from some address in +that region of memory, it reads directly from the corresponding offset +in the file. This will generally speed up reading and writing compared +to traditional file access. If the memory-mapped region is smaller than +the size of the database, only portions of the file can be mapped at a +time, which slows down access. Therefore, it is recommended to +use a value comfortably larger than the expected size of the database (taking +into account the expected number of records and the size of the data +that is expected to fill the record values). +
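
One rough way to pick a value for --mmap-size is to multiply the expected number of records by the expected average record size and then add headroom. The sketch below does exactly that; the function name estimate_mmap_size and the headroom factor of 2 are assumptions made for illustration, and the estimate ignores any overhead that the database format itself adds.

```shell
# Rough estimate in bytes: records ($1) * average record size ($2) * headroom.
# The factor of 2 is an arbitrary assumption, not a zeptodb recommendation.
estimate_mmap_size() {
    echo $(( $1 * $2 * 2 ))
}

estimate_mmap_size 2500 64   # prints 320000
```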

+

Thus, for a big database, you might do: +

+
+
$ zdbc --num-buckets=10000 --mmap-size=512000000 big.db
+
+ +

With the database created, you may now store values to it using +zdbs. zdbs normally takes its input from +stdin. It expects one record per line, with the key and value +of each record separated by a delimiter character (’|’ by default). Note +that keys are unique: an attempt to store a record with a +pre-existing key will overwrite that record with the new value. +
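
Because a later record replaces an earlier one with the same key, it can be useful to check the input for repeated keys before storing it. The following sketch lists such keys using standard tools; duplicate_keys is a hypothetical helper, not a zeptodb command, it assumes the default ’|’ delimiter, and the second address for Brandon is made-up sample data.

```shell
# Hypothetical helper: read '|'-delimited records from stdin and print
# every key that occurs more than once.
duplicate_keys() {
    cut -d '|' -f 1 | sort | uniq -d
}

# Made-up sample input in which the key "Brandon" appears twice.
printf 'Brandon|foo@example.com\nJoe|bar@example.com\nBrandon|new@example.com\n' |
    duplicate_keys   # prints "Brandon"
```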

+

For example, let’s say that you have a text file emails.txt +containing the following records: +

+
+
Brandon|foo@example.com
+Joe|bar@example.com
+Mary|baz@example.com
+
+ +

You could store the records in foo.db like so: +

+
+
$ zdbs foo.db <emails.txt
+
+ +

If you prefer not to use shell redirection, you can +also use the -i or --input option to specify the input +file: +

+
+
$ zdbs -i emails.txt foo.db
+
+ +

Of course, it’s more likely that you’ll want to pipe in records from +some other process: +

+
+
$ fancy_pipeline.sh | zdbs foo.db
+
+ +

If your records are formatted differently, using, say, ’-’ as the +delimiter (i.e., "key-value"), you would specify it using the -d +or --delimiter option. +
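
Before storing records with a non-default delimiter, you may want to confirm that every line actually contains it. A minimal sketch, assuming a hypothetical helper named lines_missing_delimiter that reads records from stdin and takes the delimiter as its only argument:

```shell
# Hypothetical helper: print the line number of every input record
# that does not contain the delimiter given as $1.
lines_missing_delimiter() {
    awk -v d="$1" 'index($0, d) == 0 { print NR }'
}

# The second sample line uses a space instead of '-', so it is reported.
printf 'Brandon-foo@example.com\nJoe bar@example.com\n' |
    lines_missing_delimiter '-'   # prints 2
```

Once no line numbers are reported, the input can be stored with the same delimiter passed to the -d option of zdbs.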

+

Records can then be fetched from the database using zdbf. In +this case, queries in the form of keys with one key per line are read +from stdin: +

+
+
$ zdbf foo.db
+Brandon
+foo@example.com
+Joe
+bar@example.com
+Jon
+../trunk/src/zdbf: Key does not exist in the database: Jon: Invalid argument
+
+ +

As with zdbs, you can also specify a file containing the +queries using the -i option or you can read them in through a +pipe. +

+

If you would prefer the output to include the key, you must specify your +desired delimiter using the -d option: +

+
+
$ echo Brandon | zdbf -d':' foo.db
+Brandon:foo@example.com
+
+ +

Finally, you can dump out all of the contents of the database using the +-a option. Note that the order of the records is in no way +guaranteed. +

+
+
$ zdbf -d'|' -a foo.db
+Joe|bar@example.com
+Brandon|foo@example.com
+Mary|baz@example.com
+
+ +
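
Since the order of dumped records is not guaranteed, sorting the output makes two dumps comparable. The sketch below sorts ’|’-delimited records by key; sort_records is a hypothetical helper, and the sample records are piped in with printf so that the snippet runs without a database.

```shell
# Hypothetical helper: sort '|'-delimited records from stdin by key.
# LC_ALL=C gives plain byte ordering regardless of the user's locale.
sort_records() {
    LC_ALL=C sort -t '|' -k 1,1
}

# Sample records in the same order as the dump shown above.
printf 'Joe|bar@example.com\nBrandon|foo@example.com\nMary|baz@example.com\n' |
    sort_records
# prints:
# Brandon|foo@example.com
# Joe|bar@example.com
# Mary|baz@example.com
```

In practice you would pipe the output of zdbf -d'|' -a foo.db into the helper instead of the printf sample.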

Records may then be removed from the database using zdbr. +zdbr operates in a very similar way to zdbf: +

+
+
$ zdbr foo.db <<EOF
+> Brandon
+> Joe
+> EOF
+$ zdbf -a -d'|' foo.db
+Mary|baz@example.com
+
+ +
+
+


+
+ + + + + ADDED www/html_node/zdbc.html Index: www/html_node/zdbc.html ================================================================== --- www/html_node/zdbc.html +++ www/html_node/zdbc.html @@ -0,0 +1,124 @@ + + + + + +zeptodb: zdbc + + + + + + + + + + + + + + + + + + + + +
+


+
+
+ +

2.1 zdbc

+ +

zdbc is used to create a new database file. It accepts two +options, one to choose the number of buckets for the database and the +other to choose the size of the memory-mapped region. These options +may only be set upon database creation and may not be altered later. +

+

As a general rule of thumb, you should have around one to four times +as many buckets as entries in the database. So, if your database will +have 200 entries, you should specify 200 to 800 buckets. A greater +number of buckets lowers the probability of collisions (two entries +mapping to the same location). +
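
The rule of thumb above amounts to simple arithmetic. The following sketch prints the suggested lower and upper bucket counts for a given number of entries; bucket_range is a hypothetical helper name, not a zeptodb command.

```shell
# Hypothetical helper: print the suggested minimum and maximum number of
# buckets (one to four times the number of entries given as $1).
bucket_range() {
    echo "$1 $(( $1 * 4 ))"
}

bucket_range 200   # prints "200 800"
```

Either bound, or any number in between, can then be passed to zdbc via the --num-buckets option.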

+

If possible, you should set the size of the memory-mapped region (in +bytes) to be larger than the expected size of the database or +otherwise as large as possible. +

+
+
-b, --num-buckets=NUM
+

The number of buckets to use +

+
+
-m, --mmap-size=NUM
+

The size (in bytes) of the memory-mapped region to use +

+
+
-v, --verbose
+

Print more run-time information +

+
+
-?, --help
+

Show helpful information +

+
+
--usage
+

Show shorter helpful information +

+
+
-V, --version
+

Print the program version +

+
+ + + + + +