Overview
Comment: update documentation
SHA1: 429fc028b5940dcc15deb48203c22f0b65ed0992
User & Date: brandon 2016-06-12 19:10:35
Context
2016-06-12
19:11
fix block-size short option check-in: 64bc518a88 user: brandon tags: trunk
19:10
update documentation check-in: 429fc028b5 user: brandon tags: trunk
17:49
update ChangeLog check-in: 2cad563fc9 user: brandon tags: trunk
Changes

Changes to doc/zeptodb.texi.

@setfilename zeptodb.info
@include version.texi
@settitle zeptodb
@c %**end of header
@copying
This manual is for zeptodb (version @value{VERSION}, updated @value{UPDATED}).

Copyright @copyright{} 2013  Brandon Invergo

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
................................................................................

@detailmenu
 --- The Detailed Node Listing ---

Introduction

* Tutorial::
* Back-ends::

Commands

* zdbc::
* zdbs::
* zdbf::
* zdbr::


Copying This Manual

* GNU Free Documentation License::  License for copying this manual.

@end detailmenu
@end menu

@node Introduction, Commands, Top, Top
@chapter Introduction

zeptodb is a small collection of relatively tiny command-line tools for
interacting with @dfn{DBM databases}.  For the uninitiated, DBM
databases are flat (non-relational) databases; in other words, they
are persistent key-value hash tables.  Typically they are created via a
library for C, Python, Perl, etc.  These tools fill a gap by providing
useful command-line tools.  Some DBM libraries come with very basic
binaries for manipulating the databases, but they are not designed to be
very flexible or useful in the real world.

These tools may be helpful in scripts, for example, when persistent data
storage is needed but a full database would be overkill.  DBM
databases offer constant look-up time for any record in them, as
opposed to, say, searching through a text file, which scales linearly
with the number of lines in the file.  Thus, scripts requiring fast data
look-up can benefit greatly from them.  These commands may also be
useful if, for whatever reason, one would like to manipulate, via the
command-line or scripts, DBM databases created by other programs.

@menu
* Tutorial::
* Back-ends::
@end menu

@node  Tutorial, Back-ends, Introduction, Introduction
@section Tutorial

The zeptodb tools are used to create small databases that are stored to
disk and then to store, fetch and remove records from those databases.
Note that these databases are much simpler than, say, SQL databases.
The databases follow the DBM format as created by the GDBM library
(@pxref{Back-ends}).  Each record in a DBM database consists of a key and
a value.  All keys and values are stored as plain text, regardless of
their formats.

First, you create a new database with @command{zdbc}:

@example
$ zdbc foo.db
@end example

Note: the following two paragraphs contain technical information that is
only necessary if you will be creating large databases with many
records.  If that is not the case, you may safely skip them.

You can customize the creation of a database in two ways.  The first is
the number of @dfn{buckets} that comprise the database, set via the
@option{-b}/@option{--num-buckets} option.  A DBM database can be
imagined as a series of buckets.  When a new item is added, a hash
function determines which bucket it belongs in based on its key.
Likewise, the same function is used to determine the bucket from which
to fetch an item.  If each bucket contains at most one item, then any
item can be found in the same amount of time as any other item.  If,
however, the number of buckets is smaller than the number of items,
fetching an item may require searching through all the items in its
bucket to find the one that you want, which slows look-ups down.
Conversely, if the number of buckets is far greater than the maximum
number of items that will be added, space will be wasted.  Thus it is
best to use more buckets than the expected maximum number of items.
As a rule of thumb, use about four times as many buckets as items.
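To see why the bucket count matters, the following Python sketch hashes
1000 hypothetical keys into different numbers of buckets and reports the
longest chain and the number of empty buckets.  It uses
@code{zlib.crc32} from the Python standard library purely as a stand-in
hash function; GDBM uses its own hashing scheme, so the exact numbers
will differ.

@example
import zlib

def bucket_loads(keys, num_buckets):
    """Return how many keys fall into each of num_buckets buckets.

    zlib.crc32 is a stand-in hash; GDBM uses its own hash function.
    """
    loads = [0] * num_buckets
    for key in keys:
        loads[zlib.crc32(key.encode()) % num_buckets] += 1
    return loads

# 1000 hypothetical keys: "key0", "key1", ..., "key999"
keys = ["key%d" % i for i in range(1000)]

for num_buckets in (250, 1000, 4000):
    loads = bucket_loads(keys, num_buckets)
    # Longest chain: worst-case number of items searched in one bucket.
    # Empty buckets: space reserved but never used.
    print(num_buckets, "buckets: longest chain", max(loads),
          "- empty buckets", loads.count(0))
@end example

With 250 buckets for 1000 keys, some bucket must hold at least four
items; with 4000 buckets, at least 3000 buckets stay empty.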

The second option is the size (in bytes) of the memory mapped region to
use, via the @option{-m}/@option{--mmap-size} option.  While the
database is stored on the disk as a file, when it is opened by zeptodb,
some or all of that file is mapped in a one-to-one manner with a region
of virtual memory.  Thus, when the program reads from some address in
that region of memory, it reads directly from the corresponding address
in the file.  This will generally speed up reading and writing compared
to traditional file access.  If the memory-mapped region is smaller than
the size of the database, only portions of the file can be mapped at a
time, which slows down performance.  Therefore, it is recommended to
use a value sufficiently larger than the size of the database (taking
into account the expected number of records and the size of the data
that is expected to fill the record values).
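The following Python sketch illustrates the idea of a memory map with
the standard @code{mmap} module: once a file is mapped, slicing the
mapped region reads the corresponding bytes of the file.  It only
demonstrates the concept; it is not how zeptodb or GDBM manage their
maps, and the file name is hypothetical.

@example
import mmap
import os
import tempfile

def read_mapped(path, offset, length):
    """Read length bytes at offset from path through a read-only map."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            # Offset N in the region corresponds to byte N of the file,
            # so slicing reads straight from the mapping.
            return region[offset:offset + length]

# Write a small hypothetical data file to map.
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    f.write(b"hello, memory map")

print(read_mapped(path, 7, 6))  # b'memory'
@end example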

Thus, for a big database, you might do:

@example
$ zdbc --num-buckets=10000 --mmap-size=512000000 big.db
@end example

With the database created, you may now store values to it using
@command{zdbs}.  @command{zdbs} normally takes its input from
@file{stdin}.  It expects one record per line and for each key/value
pair to be separated by a delimiter character ('|' by default).  Note
that records are unique: an attempt to store a record with a
pre-existing key will overwrite that record with a new value.

................................................................................
programs from the command-line, you are more likely to use them in
scripts.  For example, one script might save data to a database while
another script reads from that data.  You can even build up relations
between multiple databases, storing the keys of one database as values
in another database, allowing quite complex, but always fast, look-ups
within your scripts.

@node  Back-ends,  , Tutorial, Introduction
@section Back-ends

By default, zeptodb uses the @uref{http://www.gnu.org/software/gdbm,
GNU dbm} (GDBM) library to create and manipulate the DBM databases.

Alternatively, you may choose to use the
@uref{http://fallabs.com/kyotocabinet/, Kyoto Cabinet} library
instead.  This is specified by passing the
@option{--with-kyotocabinet} option to the @file{configure} script
before compiling zeptodb.


Note that databases created with these two different back-ends are
@emph{not} compatible: databases created with Kyoto Cabinet can only be
accessed by zeptodb if it has been compiled with support for that
library.

Databases created with Kyoto Cabinet are required to have the
@file{.kch} file extension.  By convention, databases created with
GDBM should have the @file{.db} file extension.


For most purposes, databases created with GDBM should be sufficient.
For particularly large data sets, however, Kyoto Cabinet is
preferred, since it can add values more quickly and has a much larger
upper limit on the database size.  On the other hand, Kyoto Cabinet is
not as widely available in GNU/Linux distributions as GDBM, so it often
must be installed manually.

@node Commands, Copying This Manual, Introduction, Top
@chapter Commands

Four commands are provided with zeptodb: @command{zdbc}, for creating
databases, @command{zdbs} for storing records in them, @command{zdbf},
for fetching records, and @command{zdbr}, for removing records.


@menu
* zdbc::
* zdbs::
* zdbf::
* zdbr::

@end menu


@node zdbc, zdbs, Commands, Commands
@section zdbc

@command{zdbc} is used to create a new database file.  It accepts two
options, one to choose the number of buckets for the database and the
other to choose the size of the memory-mapped region.  These options
may only be set upon database creation and may not be altered later.


As a general rule of thumb, you should have around one to four times
as many buckets as entries in the database.  So, if your database will
have 200 entries, you should specify 200 to 800 buckets.  A greater
number of buckets lowers the probability of collisions (two entries
mapping to the same location).

If possible, you should set the size of the memory-mapped region (in
bytes) to be larger than the expected size of the database or
otherwise as large as possible.

@table @option
@item -b, --num-buckets=NUM
The number of buckets to use

@item -m, --mmap-size=NUM
The size (in bytes) of the memory-mapped region to use

@item -v, --verbose
Print more run-time information

@item -?, --help
Show helpful information

@item --usage
Show shorter helpful information

@item -V, --version
Print the program version

@end table

@node zdbs, zdbf, zdbc, Commands
@section zdbs

@command{zdbs} is used to store records in a database file.  Records
are entered via @file{stdin} or, optionally, they are read from an
input file, with one record per line.  Each record should consist of
one key-value pair.  The values should be separated from the keys by a
common delimiter ('|' by default), for example ``key|value''.

In addition to the database file to be used, the @command{zdbs}
command accepts the following options:

@table @option
@item -d, --delim=CHAR
Delimiter character separating keys from values (default '|')

@item -i, --input=FILE
Read new records from a file instead of from @file{stdin}

@item -v, --verbose
Print more run-time information

@item -?, --help
Show helpful information

@item --usage
Show shorter helpful information

@item -V, --version
Print the program version
@end table

@node zdbf, zdbr, zdbs, Commands
@section zdbf

@command{zdbf} is used to fetch records from a database file.  Queries
are read from @file{stdin} or, optionally, from a text file.  Records
with key values that match the queries will be printed to
@file{stdout}.  By default, only the corresponding values will be
printed.  However, if a delimiter character is provided, both keys and
values will be printed.  Finally, an option is available to simply
print all records in the database.

In addition to the database file to be used, the @command{zdbf}
command accepts the following options:

@table @option
@item -a, --all
Fetch all the records in the database

@item -d, --delim=CHAR
Delimiter character to separate printed keys from values (default
none; only values will be printed)

@item -i, --input=FILE
Read queries from a file instead of from @file{stdin}

@item -v, --verbose
Print more run-time information

@item -?, --help
Show helpful information

@item --usage
Show shorter helpful information

@item -V, --version
Print the program version
@end table

@node zdbr,  , zdbf, Commands
@section zdbr

@command{zdbr} is used to remove records from a database.  The records
to be removed are specified by their keys and are entered via
@file{stdin} or, optionally, they are read from a text file.  If many
records are removed from the database, some fragmentation can occur.
In this case, it is advisable to reorganize the database, which is
possible via the @option{--reorganize} option.

In addition to the database file to be used, the @command{zdbr}
command accepts the following options:

@table @option
@item -i, --input=FILE
Read queries from a file instead of from @file{stdin}

@item -r, --reorganize
Reorganize the database

@item -v, --verbose
Print more run-time information


@item -?, --help
Show helpful information


@item --usage
Show shorter helpful information

@item -V, --version
Print the program version
@end table


@node Copying This Manual, Index, Commands, Top
@appendix Copying This Manual

@menu
* GNU Free Documentation License::  License for copying this manual.
@end menu

@setfilename zeptodb.info
@include version.texi
@settitle zeptodb
@c %**end of header
@copying
This manual is for zeptodb (version @value{VERSION}, updated @value{UPDATED}).

Copyright @copyright{} 2013, 2016  Brandon Invergo

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
................................................................................

@detailmenu
 --- The Detailed Node Listing ---

Introduction

* Tutorial::
* Common Options::

Commands

* zdbc::
* zdbs::
* zdbf::
* zdbr::
* zdbi::

Copying This Manual

* GNU Free Documentation License::  License for copying this manual.

@end detailmenu
@end menu

@node Introduction, Commands, Top, Top
@chapter Introduction

zeptodb is a small collection of relatively tiny command-line tools
for interacting with @dfn{DBM databases}.  DBM databases are flat
(non-relational) databases; in other words, they are persistent
key-value hash tables.  Typically they are created via a library for C,
Python, Perl, etc.  These tools fill a gap by providing useful
command-line tools.  Some DBM libraries come with very basic binaries
for manipulating the databases, but they are not designed to be very
flexible or useful in the real world.

These tools may be helpful in scripts, for example, when persistent
data storage is needed but a full database would be overkill.  DBM
databases offer constant look-up time for any record in them, as
opposed to, say, searching through a text file, which scales linearly
with the number of lines in the file.  Thus, scripts requiring fast
data look-up can benefit greatly from them (but note that disk access
is slower than memory access, so if you really need the performance and
your table fits in memory, these are not the appropriate tools).  These
commands may also be useful if, for whatever reason, one would like to
manipulate, via the command-line or scripts, DBM databases created by
other programs.
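As an illustration of what ``persistent key-value hash table'' means,
the following sketch uses Python's standard @code{dbm} module (which may
be backed by GDBM, among other libraries, depending on how Python was
built) rather than zeptodb itself.  Records written in one session are
still there after the file is closed and reopened.  The file name
@file{colors} is hypothetical.

@example
import dbm
import os
import tempfile

def save_and_reload(path):
    """Store two records, close the file, then reopen it and read one."""
    # Flag "c": open for reading and writing, creating the file if needed.
    with dbm.open(path, "c") as db:
        db["apple"] = "red"       # str keys and values are stored as bytes
        db["banana"] = "yellow"
    # Flag "r": reopen the same file read-only in a fresh handle.
    with dbm.open(path, "r") as db:
        return db["apple"].decode()

path = os.path.join(tempfile.mkdtemp(), "colors")
print(save_and_reload(path))  # red
@end example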

@menu
* Tutorial::
* Common Options::
@end menu

@node  Tutorial, Common Options, Introduction, Introduction
@section Tutorial

The zeptodb tools are used to create small databases that are stored
to disk and then to store, fetch and remove records from those
databases.  These databases are much simpler than, say, SQL databases,
so no queries need to be constructed.  The databases follow the DBM
format as created by the GDBM library.  Each record in a DBM database
consists of a key and a value.  All keys and values are stored as
plain text, regardless of their formats.

First, you create a new database with @command{zdbc}:

@example
$ zdbc foo.db
@end example

With the database created, you may now store values to it using
@command{zdbs}.  @command{zdbs} normally takes its input from
@file{stdin}.  It expects one record per line and for each key/value
pair to be separated by a delimiter character ('|' by default).  Note
that records are unique: an attempt to store a record with a
pre-existing key will overwrite that record with a new value.
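The record format and the overwrite rule can be modelled in a few lines
of Python with the standard @code{dbm} module.  This is an analogue
written for illustration, not zeptodb's implementation; in particular,
splitting each line at the @emph{first} delimiter and skipping blank
lines are assumptions of the sketch.

@example
import dbm
import io
import os
import tempfile

def store_records(path, lines, delim="|"):
    """Store one key/value record per line, like zdbs input on stdin."""
    with dbm.open(path, "c") as db:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue                  # assumption: ignore blank lines
            # Assumption: the key ends at the first delimiter.
            key, _, value = line.partition(delim)
            db[key] = value               # an existing key is overwritten

path = os.path.join(tempfile.mkdtemp(), "foo")
store_records(path, io.StringIO("apple|red\nbanana|yellow\napple|green\n"))
with dbm.open(path, "r") as db:
    print(db["apple"].decode(), len(db))  # green 2
@end example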

................................................................................
programs from the command-line, you are more likely to use them in
scripts.  For example, one script might save data to a database while
another script reads from that data.  You can even build up relations
between multiple databases, storing the keys of one database as values
in another database, allowing quite complex, but always fast, look-ups
within your scripts.

@node Common Options,  , Tutorial, Introduction
@section Common Options

The following options are available for all zeptodb commands.


@table @option
@item -b, --block-size=NUM
The block size (in bytes) to be used, representing the size of a
transfer from disk to memory.  The default value is 512.


@item -m, --mmap-size=NUM
The size (in bytes) of the memory-mapped region to be used.  With a
value greater than zero, a memory map of the database will be created;
thus the size specified must be large enough to fit the entire
database.

@item -c, --cache-size=NUM
The size (in bytes) of the bucket cache to be used.


@item -l, --no-lock
Do not perform file locking on the database.


@item -n, --no-mmap
Do not create a memory map of the database.


@item -v, --verbose
Print more run-time information.

@item -?, --help
Show helpful information.

@item --usage
Show shorter helpful information.

@item -V, --version
Print the program version.
@end table
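When the tools are driven from a Python script, the common options can
be assembled into an argument list for @code{subprocess.run}.  The
helpers below are hypothetical convenience functions, not part of
zeptodb; they use only the option names documented above and place the
options before the database file name.

@example
import subprocess

def zeptodb_args(command, db_file, block_size=None, mmap_size=None,
                 cache_size=None, no_lock=False, no_mmap=False):
    """Build an argument list using zeptodb's common options.

    Hypothetical helper: only options that are set are included.
    """
    args = [command]
    if block_size is not None:
        args.append("--block-size=%d" % block_size)
    if mmap_size is not None:
        args.append("--mmap-size=%d" % mmap_size)
    if cache_size is not None:
        args.append("--cache-size=%d" % cache_size)
    if no_lock:
        args.append("--no-lock")
    if no_mmap:
        args.append("--no-mmap")
    args.append(db_file)
    return args

def run_zeptodb(command, db_file, stdin_text="", **options):
    """Run a zeptodb command, feeding stdin_text and returning its stdout."""
    result = subprocess.run(zeptodb_args(command, db_file, **options),
                            input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout
@end example

For instance, @code{run_zeptodb("zdbf", "foo.db", "apple\n",
no_lock=True)} would run @code{zdbf --no-lock foo.db} with @samp{apple}
on its standard input and return what it prints.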


@node Commands, Copying This Manual, Introduction, Top
@chapter Commands

Five commands are provided with zeptodb: @command{zdbc} for creating
databases, @command{zdbs} for storing records in them, @command{zdbf}
for fetching records, @command{zdbr} for removing records, and
@command{zdbi} for displaying information about a database.

@menu
* zdbc::
* zdbs::
* zdbf::
* zdbr::
* zdbi::
@end menu


@node zdbc, zdbs, Commands, Commands
@section zdbc

@command{zdbc} is used to create a new database file.  It accepts all
of the common options.  Running the command on an existing database
will @emph{overwrite} the existing contents!
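Because @command{zdbc} overwrites an existing database, a script may
want to check for the file first.  The following is a minimal Python
sketch, assuming @command{zdbc} is on the @env{PATH}; the hypothetical
@code{runner} parameter exists only so the check can be exercised
without zeptodb installed.  The check and the creation are not atomic,
so two scripts racing to create the same file could still both run
@command{zdbc}.

@example
import os
import subprocess

def create_db_once(db_file, runner=subprocess.run):
    """Create db_file with zdbc unless it already exists.

    Returns True if zdbc was run, False if an existing file was left alone.
    """
    if os.path.exists(db_file):
        return False          # zdbc would overwrite the existing contents
    runner(["zdbc", db_file], check=True)
    return True
@end example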


In addition to the database file to be used and the common options,
the @command{zdbc} command accepts the following options:

@table @option
@item -s, --sync
Automatically synchronize all database operations to the disk.
@end table

@node zdbs, zdbf, zdbc, Commands
@section zdbs

@command{zdbs} is used to store records in a database file.  Records
are entered via @file{stdin} or, optionally, they are read from an
input file, with one record per line.  Each record should consist of
one key-value pair.  The values should be separated from the keys by a
common delimiter ('|' by default), for example ``key|value''.

In addition to the database file to be used and the common options,
the @command{zdbs} command accepts the following options:

@table @option
@item -d, --delim=CHAR
Delimiter character separating keys from values (default '|').

@item -i, --input=FILE
Read new records from a file instead of from @file{stdin}.

@item -s, --sync
Automatically synchronize all database operations to the disk.

@end table

@node zdbf, zdbr, zdbs, Commands
@section zdbf

@command{zdbf} is used to fetch records from a database file.  Queries
are read from @file{stdin} or, optionally, from a text file.  Records
with key values that match the queries will be printed to
@file{stdout}.  By default, only the corresponding values will be
printed.  However, if a delimiter character is provided, both keys and
values will be printed.  Finally, an option is available to simply
print all records in the database.

In addition to the database file to be used and the common options,
the @command{zdbf} command accepts the following options:

@table @option
@item -a, --all
Fetch all the records in the database.

@item -d, --delim=CHAR
Delimiter character to separate printed keys from values (default
none; only values will be printed).

@item -i, --input=FILE
Read queries from a file instead of from @file{stdin}.

@end table
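The fetch rules above (values only by default; @var{key}, delimiter and
@var{value} when a delimiter is given; or every record with
@option{--all}) can be sketched in Python with the standard @code{dbm}
module.  This is an illustrative analogue, not zeptodb's code; silently
skipping keys that are not found is an assumption, since this manual
does not say how @command{zdbf} reports them.

@example
import dbm
import os
import tempfile

def fetch(path, queries=(), delim=None, fetch_all=False):
    """Return output lines for the given keys, or for every record."""
    lines = []
    with dbm.open(path, "r") as db:
        keys = list(db.keys()) if fetch_all else [q.encode() for q in queries]
        for key in keys:
            if key not in db:
                continue  # assumption: missing keys produce no output
            value = db[key].decode()
            if delim is None:
                lines.append(value)                         # value only
            else:
                lines.append(key.decode() + delim + value)  # key, delim, value
    return lines

path = os.path.join(tempfile.mkdtemp(), "foo")
with dbm.open(path, "c") as db:
    db["apple"] = "red"
    db["banana"] = "yellow"

print(fetch(path, ["apple", "cherry"]))    # ['red']
print(fetch(path, ["banana"], delim="|"))  # ['banana|yellow']
print(sorted(fetch(path, delim="|", fetch_all=True)))
@end example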

@node zdbr, zdbi, zdbf, Commands
@section zdbr

@command{zdbr} is used to remove records from a database.  The records
to be removed are specified by their keys and are entered via
@file{stdin} or, optionally, they are read from a text file.  If many
records are removed from the database, some fragmentation can occur.
In this case, it is advisable to reorganize the database, which is
possible via the @option{--reorganize} option.

In addition to the database file to be used and the common options,
the @command{zdbr} command accepts the following options:

@table @option
@item -i, --input=FILE
Read queries from a file instead of from @file{stdin}.

@item -r, --reorganize
Reorganize the database.

@item -s, --sync
Automatically synchronize all database operations to the disk.
@end table
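A Python analogue of @command{zdbr} can be written with the standard
@code{dbm} module: keys are read one per line and deleted.  The
reorganize step is only attempted when the underlying back-end provides
it (for example, Python's @code{dbm.gnu} objects have a
@code{reorganize()} method).  This is an illustrative sketch, not
zeptodb's implementation; ignoring keys that are not present is an
assumption.

@example
import dbm
import io
import os
import tempfile

def remove_records(path, lines, reorganize=False):
    """Delete the keys listed one per line; optionally compact the file."""
    removed = 0
    with dbm.open(path, "w") as db:     # "w": open an existing file read/write
        for line in lines:
            key = line.rstrip("\n")
            if key and key in db:       # assumption: skip unknown keys
                del db[key]
                removed += 1
        # Only some back-ends (such as dbm.gnu) offer reorganize().
        if reorganize and hasattr(db, "reorganize"):
            db.reorganize()
    return removed

path = os.path.join(tempfile.mkdtemp(), "foo")
with dbm.open(path, "c") as db:
    db["apple"] = "red"
    db["banana"] = "yellow"

print(remove_records(path, io.StringIO("apple\ncherry\n"), reorganize=True))  # 1
@end example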


@node zdbi,  , zdbr, Commands
@section zdbi


@command{zdbi} prints out information on a database file.  It accepts
the common options.

@node Copying This Manual, Index, Commands, Top
@appendix Copying This Manual

@menu
* GNU Free Documentation License::  License for copying this manual.
@end menu