JDB

JDB is a package of commands for manipulating flat-ASCII databases from shell scripts. JDB is useful to process medium amounts of data (with very little data you'd do it by hand, with megabytes you might want a real database). JDB is very good at doing things like:

Rather than hand-code scripts to do each special case, JDB provides higher-level functions.

JDB is built on flat-ASCII databases. Because data is stored in simple text files and processed with pipelines, it is easy to experiment (in the shell) and look at the output. The original implementation of this idea was /rdb, a commercial product described in the book ``UNIX relational database management: application development in the UNIX environment'' by Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at their web page). JDB is an incompatible re-implementation of their idea without any accelerated indexing or forms support. (But it's free!)
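As a rough illustration of the flat-ASCII idea (not JDB's own code), the Python sketch below reads a small table from text and selects rows, much as one stage of a shell pipeline might. The layout it assumes (a header line starting with `#h` that names whitespace-separated columns, with other `#` lines treated as comments) and the names `read_table` and `SAMPLE` are assumptions made for this example only; see the README for the actual file format.

```python
import io

# Hypothetical sample data; the "#h" header layout is an assumption for illustration.
SAMPLE = """#h name score
# a comment line
alice 90
bob 72
carol 85
"""

def read_table(stream):
    """Return (column_names, rows); each row is a dict keyed by column name."""
    columns = []
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#h"):
            # Header line: everything after the marker names a column.
            columns = line.split()[1:]
        elif line.startswith("#") or not line.strip():
            # Skip comments and blank lines.
            continue
        else:
            rows.append(dict(zip(columns, line.split())))
    return columns, rows

columns, rows = read_table(io.StringIO(SAMPLE))
# Select the names of rows whose score is at least 85.
high = [r["name"] for r in rows if int(r["score"]) >= 85]
print(columns)
print(high)
```

Because the data is plain text, the intermediate result of every step can be printed and inspected directly, which is the property the paragraph above relies on.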

For more information, see the README file or download JDB 2.1 (released 6-Apr-08): Jdb-2.1.tar.gz, and in RPM format perl-Jdb-2.1-1.noarch.rpm, perl-Jdb-2.1-1.src.rpm.

Warning: the 2.0 series is an incompatible change from 1.x, and I still expect some format and API changes. I expect 2.2 or 2.3 to be feature complete and available mid-2008. Users of jdb-1.x may wish to stick with 1.15 for now. New users can choose whichever they prefer, since both are stable. Download JDB 1.15: Jdb-1.15.tar.gz, and in RPM format perl-Jdb-1.15-1.noarch.rpm, perl-Jdb-1.15-1.src.rpm.

For announcements and discussion about JDB, please subscribe to the mailing lists jdb-announce and jdb-talk: http://www.heidemann.la.ca.us/mailman/listinfo/jdb-talk

(Download older releases in tar.gz format: 2.1, 2.0, 1.15, 1.14, 1.13, 1.12, 1.11, 1.10, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0.)

Known Bugs

1.12 has a problem with trailing spaces in dbstats for data with a double-space field separator. (Fixed in 1.13.)

Change History

1.15, 12-Nov-07

- NEW: jdb-1.14 added to the MacOS Fink system.
	(Thanks to Lars Eggert for maintaining this port.)
- NEW: Jdb::IO::Reader and Jdb::IO::Writer now provide reasonably
	clean OO I/O interfaces to Jdb files.  Highly recommended
	if you use jdb directly from perl.  In the fullness of time
	I expect to reimplement the entire thing using these APIs
	to replace the current dblib.pl which is still hobbled by
	its roots in perl4.
- NEW: dbmapreduce now implements a Google-style map/reduce abstraction,
       generalizing dbmultistats.

- ENHANCEMENT: jdb now uses the Perl build system (Makefile.PL, etc.),
  	instead of autoconf.  This change paves the way to better 
	perl-5-style modularization, proper manual pages, input
	of both listize and colize format for every program, and
	world peace.
- ENHANCEMENT: dblib.pl is now moved to Jdb::Old.pm.

- BUG FIX: dbmultistats now propagates its format argument (-f).
	Bug and fix from Martin Lukac (thanks!).
- ENHANCEMENT: dbformmail documentation now is clearer that
	it doesn't send the mail, you have to run the shell script
	it writes.  (Problem observed by Unkyu Park.)
- ENHANCEMENT: adapted to autoconf-2.61 (and then these changes were
	discarded in favor of The Perl Way).
- BUG FIX: dbmultistats memory usage corrected (O(# tags), not O(1))
- ENHANCEMENT: dbmultistats can now optionally run with pre-grouped input
  	in O(1) memory
- ENHANCEMENT: dbroweval -N was finally implemented (eat comments)
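The dbmapreduce and dbmultistats entries above describe grouping rows by a tag and reducing each group, with memory that is either O(# tags) or, for pre-grouped input, O(1). The Python sketch below shows one way to see that trade-off. It is not the dbmapreduce implementation; the function names and the (tag, value) row shape are hypothetical, chosen only for this example.

```python
from itertools import groupby
from operator import itemgetter

def reduce_hashed(rows, reducer):
    """Reduce (tag, value) rows in any order; memory grows with the number of tags."""
    state = {}
    for tag, value in rows:
        state[tag] = reducer(state.get(tag), value)
    return state

def reduce_pregrouped(rows, reducer):
    """Reduce rows whose equal tags are adjacent; holds one group's state at a time."""
    for tag, group in groupby(rows, key=itemgetter(0)):
        acc = None
        for _, value in group:
            acc = reducer(acc, value)
        yield tag, acc

def count_and_sum(acc, value):
    """Reducer: keep a running (count, sum), enough to report a mean per tag."""
    count, total = acc if acc is not None else (0, 0.0)
    return count + 1, total + value

rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
hashed = reduce_hashed(rows, count_and_sum)
streamed = dict(reduce_pregrouped(iter(rows), count_and_sum))
means = {tag: total / count for tag, (count, total) in streamed.items()}
print(means)
```

The streaming version only gives correct results when all rows for a tag are adjacent; if the same tag reappears later in the input, it is reported as a second, separate group.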

1.14,  24-Aug-06

- ENHANCEMENT: README cleanup
- INCOMPATIBLE CHANGE: dbcolsplit renamed dbcolsplittocols
- NEW: dbcolsplittorows  split one column into multiple rows
- NEW: dbcolsregression compute linear regression and correlation for two columns
- ENHANCEMENT: csv_to_db: better error handling, normalize field names, skip blank lines
- ENHANCEMENT: dbjoin now detects (and fails) if non-joined files have duplicate names
- BUG FIX: minor bug fixed in calculation of Student t-distributions
	(doesn't change any test output, but may have caused small errors)
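For readers unfamiliar with what "linear regression and correlation for two columns" involves, here is a minimal Python sketch of an ordinary least-squares fit and the Pearson correlation coefficient. It illustrates the standard formulas only; it is not taken from dbcolsregression, and the function name `linear_regression` is hypothetical.

```python
import math

def linear_regression(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson correlation r."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length columns with at least two rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("regression and correlation need both columns to vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Points lying exactly on y = 2x + 1.
slope, intercept, r = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r)
```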


1.13,  4-Feb-04

- NEW: jdb added to the FreeBSD ports tree
	maintainer: larse@isi.edu
- BUG FIX:  properly handle trailing spaces when data must be numeric
	(ex. dbstats with -FS, see test dbstats_trailing_spaces)
	Fix from Ning Xu.
- NEW: dbcolize error message improved (bug report from Terrence
	Brannon), and list format documented in the README.
- NEW: cgi_to_db converts CGI.pm-format storage to jdb list format
- BUG FIX: handle numeric synonyms for column names in dbcol properly
- ENHANCEMENT: "talking about columns" section added to README.
	Lack of documentation pointed out by Lars Eggert.
- CHANGE: dbformmail now defaults to using Mail ("Berkeley Mail")
	to send mail, rather than sendmail (sendmail is still an option,
	but mail doesn't require running as root)
- NEW: on platforms that support it (i.e., with perl 5.8), jdb works
	fine with unicode
- NEW: dbfilevalidate: check a db file for some common errors


1.12,  30-Oct-02

- BUG FIX: dbmultistats documentation typo fixed
- NEW: dbcolmultiscale
- NEW: dbcol has -r option for "relaxed error checking"
- NEW: dbcolneaten has new -e option to strip end-of-line spaces
- NEW: dbrow finally has a -v option to negate the test
- BUG FIX: math bug in dbcoldiff fixed by Ashvin Goel
	*** need to check Scheaffer test cases
- BUG FIX: some patches to run with Perl 5.8
	Note: some programs (dbcolmultiscale, dbmultistats, dbrowsplituniq)
	generate warnings like:
		Use of uninitialized value in concatenation (.)
		or string at /usr/lib/perl5/5.8.0/FileCache.pm line 98,
		 line 2.
	Please ignore this until I figure out how to suppress it.
	(Thanks to Jerry Zhao for noticing perl-5.8 problems.)
- BUG FIX: fixed an autoconf problem where configure would fail
	to find a reasonable prefix (thanks to Fabio Silva
	for reporting the problem)
- NEW: db_to_html_table: simple conversion to html tables
	(NO fancy stuff)
- NEW: dblib now has a function dblib_text2html() that will
	do simple conversion of iso-8859-1 to HTML


1.11,  2-Nov-01

- BUG FIX: dbcolneaten now runs in constant memory
- NEW: dbcolneaten now supports "field specifiers" that
	allow some control over how wide columns should be
- OPTIMIZATION: dbsort now tries hard to be filesystem cache-friendly
	(inspired by "Information and Control in Gray-box Systems" by
	the Arpaci-Dusseaus at SOSP 2001)
- INTERNAL: t_distr now ported to perl5 module DbTDistr


1.10, 10-Apr-01

- BUG FIX: dbstats now handles the case where there are more n-tiles
	than data
- NEW: dbstats now includes a -S option to optimize work on
	pre-sorted data (inspired by code contributed by Haobo Yu)
- BUG FIX: dbsort now has a better estimate of memory usage when
	run on data with very short records (problem detected by Haobo Yu)
- BUG FIX: cleanup of temporary files is slightly better


1.9,  6-Nov-00

- NEW: dbfilesplit, split a single input file into multiple output files
	(based on code contributed by Pavlin Radoslavov).

- BUG FIX: dbsort now works with perl-5.6


1.8, 28-Jun-00

- BUG FIX:  header options are now preserved when writing with dblistize

- NEW:  dbrowuniq now optionally checks for uniqueness only on certain fields

- NEW: dbrowsplituniq makes one pass through a file and splits it into
	separate files based on the given fields

- NEW:  converter for "crl" format network traces

- NEW:  anywhere you use arbitrary code (like dbroweval),
	_last_foo now maps to the last row's value for field _foo.

- OPTIMIZATION: comment processing slightly changed so that
	dbmultistats now is much faster on files with lots of comments
	(for example, ~100k lines of comments and 700 lines of data!)
	(Thanks to Graham Phillips for pointing out this performance
	problem.)

- BUG FIX: dbstats with median/quartiles now correctly handles singleton
	data points

1.7,  5-Jan-00

- NEW: dbcolize now detects and rejects lines that contain embedded
	copies of the field separator

- NEW: configure tries harder to prevent people from improperly 
	configuring/installing jdb

- NEW: tcpdump_to_db converter (incomplete)

- NEW: tabdelim_to_db converter:  from spreadsheet tab-delimited files to db

- NEW: mailing lists for jdb are
	jdb-announce@heidemann.la.ca.us and
	jdb-talk@heidemann.la.ca.us
     To subscribe to either, send mail to
	jdb-announce-request@heidemann.la.ca.us
	or jdb-talk-request@heidemann.la.ca.us.
     with "subscribe" in the BODY of the message.

- BUG FIX:  dbjoin used to produce incorrect output if there
	were extra, unmatched values in the 2nd table.
	Thanks to Graham Phillips for providing a test case.

- BUG FIX:  the sample commands in the usage strings
	now all should explicitly include the source of data
	(typically from "cat foo.jdb |").  Thanks to Ya Xu
	for pointing out this documentation deficiency.

- DOCUMENTATION BUG FIX: dbcolmovingstats had incorrect sample output.


1.6, 24-May-99
	- NEW: dbsort, dbstats, dbmultistats now run in constant memory
		(using tmp files if necessary)
	- NEW: dbcolmovingstats does moving means over a series of data
	- NEW: dbcol has a -v option to get all columns except those listed
	- NEW: dbmultistats does quartiles and medians
	- NEW: dbstripextraheaders now also cleans up bogus comments
		before the first header
	- BUG FIX: dbcolneaten works better with double-space-separated data
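To make "moving means over a series of data" concrete, the Python sketch below computes the mean of a sliding window in a single pass. It is an illustration under stated assumptions rather than dbcolmovingstats itself: the function name `moving_means` is hypothetical, and the choice to emit nothing until the window is full is an assumption (the changelog does not say how partial windows are handled).

```python
from collections import deque

def moving_means(values, window):
    """Yield the mean of the most recent `window` values, once the window is full."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()
    total = 0.0
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Drop the oldest value so only `window` values contribute.
            total -= recent.popleft()
        if len(recent) == window:
            yield total / window

result = list(moving_means([1, 2, 3, 4, 5], 3))
print(result)
```

Keeping a running total means each input row costs constant work, and memory is bounded by the window size rather than the length of the series.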

1.5, 25-Jun-98
	- BUG FIX: dbcolhisto and dbcolpercentile now handle non-numeric
		values like dbstats
	- NEW: dbcolstats computes zscores and tscores over a column
	- NEW: dbcolscorrelate computes correlation coefficients
		between two columns
	- INTERNAL: ficus_getopt.pl has been replaced by DbGetopt.pm
	- BUG FIX: all tests are now ``portable'' (previously some tests
		ran only on my system)
	- BUG FIX: you no longer need to have the db programs in your path
		(fix arose from a discussion with Arkadi Gelfond)
	- BUG FIX: installation no longer uses cp -f (to work on SunOS 4)

1.4, 27-Mar-98
	- improves error messages
		(all should now report the program that makes the error)
	- fixed a bug in dbstats output when the mean is zero

1.3, 17-Mar-98
	- adds median and quartile options to dbstats
	- adds dmalloc_to_db converter
	- fixes some warnings
	- dbjoin now can run on unsorted input
	- fixes a dbjoin bug
	- some more tests in the test suite