
Prototype2009: Data Formats

From The EvoGrid: The Evolution Technology Grid

A diagram showing where these data formats fit in the Prototype2009 design



Simulation History Storage

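The format is JSON. For storage, I suggest compressing it with gzip, which strikes a good balance between disk size and CPU usage. A gzipped record can also be sent directly as an HTTP response body with the header Content-Encoding: gzip. (The /* */ comments in the examples on this page are annotations only; standard JSON does not allow comments.)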

{
	"id" : 0,
	"natoms" : 2,	/* Convenience parameter that optimizes allocation of velocity and position arrays */
	"nframes" : 2,	/* Convenience parameter that optimizes allocation of frames */
	"atom type" : [ 0, 1 ],
	"frames" : 
		[
			{
				"position" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"velocity" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"nbonds" : 1,	/* Convenience parameter that optimizes allocation of bonds */
				"bonds" : [ [0, 1] ],
				"box" : [ 1, 1, 1]
			},
			{
				"position" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"velocity" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"nbonds" : 1,
				"bonds" : [ [0, 1] ],
				"box" : [ 1, 1, 1]
			}
		]
}
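The gzip suggestion above can be sketched with Python's standard library. This is a minimal illustration, not the EvoGrid code: it assumes the history record is already a Python dict (with the annotation comments removed), and encode_history, decode_history and http_headers are hypothetical names. Compression level 6 is gzip's command-line default, used here as a middle ground between size and CPU cost.

```python
import gzip
import json


def encode_history(history: dict, level: int = 6) -> bytes:
    """Serialize a simulation history record to compact JSON and gzip it."""
    raw = json.dumps(history, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw, compresslevel=level)


def decode_history(blob: bytes) -> dict:
    """Inverse of encode_history: gunzip the bytes and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def http_headers(blob: bytes) -> dict:
    """Headers for serving the stored blob unchanged; the client decompresses it."""
    return {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(blob)),
    }


if __name__ == "__main__":
    frame = {
        "position": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        "velocity": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        "nbonds": 1,
        "bonds": [[0, 1]],
        "box": [1, 1, 1],
    }
    history = {"id": 0, "natoms": 2, "nframes": 2,
               "atom type": [0, 1], "frames": [frame, frame]}
    blob = encode_history(history)
    assert decode_history(blob) == history
    print(http_headers(blob))
```

Because the stored bytes are already a valid gzip stream, a web handler can return them as the response body without recompressing on every request.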
 

Transmission

Retrieving Simulation Parameters

This format is a direct transliteration of the Simulation Manager Database Design and contains many GROMACS-specific settings. These should be made more generic so that other simulators can be supported.

{
	/* Simulation Metadata */
	"id"			:	0,
	"parent simulation"	:	0,
	"priority"		:	0.0,
	"generator"		:	"Generator Identifier",
	"date generated"	:	"2009-12-15 21:56:35",
	/* Simulation functional data */
	"density"	:	0.1,	/* Used to generate atom positions if not specified. */
	"natoms"	:	1000,	/* Particle Count */
	"nsteps"	:	1000,	/* Number of loops inside GROMACS */
	"nloops"	:	1000,	/* Number of loops outside GROMACS */
	"natomtypes"	:	2,	/* Convenience parameter for allocating atom type information. */
	"temperature"		:	293.15,	/* Used to generate atom velocities if not specified. */
	"temperature couple"	:	true,	/* Uses temperature parameter. */
	"atom types"		:	[
		{
			/* Atom type 0 */
			"name"			:	"A",
			"ratio"			:	0.5,
			"radius"		:	0.1,	/* van der Waals radius */
			"volume"		:	-1.0,	/* Used only in QM, may not be needed here */
			"surface tension"	:	-1.0,	/* Used only in QM, may not be needed here */
			"mass"			:	0.6,
			"massB"			:	0.6,	/* Used only in QM, may not be needed here */
			"q"			:	1.2,
			"qB"			:	1.2,	/* Used only in QM, may not be needed here */
			"valence"		:	1,
			"electronegativity"	:	2.20,
			"forces"		:	[
				{
					/* Interaction between this type and type 0 */
					"c6"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"c12"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"bond type"	:	0,	/* 0 = cannot bond, 1 = can bond */
					"rA"		:	0.1,
					"krA"		:	0.1,
					"rB"		:	0.1,
					"krB"		:	0.1
				},
				{
					/* Interaction between this type and type 1 */
					"c6"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"c12"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"bond type"	:	0,
					"rA"		:	0.1,
					"krA"		:	0.1,
					"rB"		:	0.1,
					"krB"		:	0.1
				} ]
		},
		{
			/* Atom type 1 */
			"name"			:	"A",
			"ratio"			:	0.5,
			"radius"		:	0.1,
			"volume"		:	-1.0,	/* Used only in QM, may not be needed here */
			"surface tension"	:	-1.0,	/* Used only in QM, may not be needed here */
			"mass"			:	0.6,
			"massB"			:	0.6,	/* Used only in QM, may not be needed here */
			"q"			:	1.2,
			"qB"			:	1.2,	/* Used only in QM, may not be needed here */
			"valence"		:	1,
			"electronegativity"	:	2.20,
			"forces"			:	[
				{
					/* Interaction between this type and type 0 */
					"c6"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"c12"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"bond type"	:	0,
					"rA"		:	0.1,
					"krA"		:	0.1,
					"rB"		:	0.1,
					"krB"		:	0.1
				},
				{
					/* Interaction between this type and type 1 */
					"c6"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"c12"	:	0.1,	/* Used in non-bonded Lennard Jones force calculation */
					"bond type"	:	0,
					"rA"		:	0.1,
					"krA"		:	0.1,
					"rB"		:	0.1,
					"krB"		:	0.1
				} ]
		} ],
	"atom type"	: [ 0, 0, 0, 0 ],	/* This should contain natoms values, each less than natomtypes. */
	"position"	: [	[0.0,0.0,0.0],	/* This should contain natoms vectors. */
					[0.0,0.0,0.0] ],	/* Both atom type and position should be specified if either is to be used. If neither is specified, density and natoms are used to generate a random distribution. */
	"velocity"	: [	[0.0,0.0,0.0],	/* This should contain natoms vectors. If not specified, temperature is used to generate a Maxwell distribution. */
				[0.0,0.0,0.0] ],
	"nbonds"	: 1,	/* Convenience parameter that optimizes allocation of bonds */
	"bonds"		: [ [0, 1] ],
	"charge"	: [ 0, 0, 0, 0 ]	/* This should contain natoms values between -1.0 and 1.0. */
}
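The annotation comments in the parameters example state several consistency rules. A minimal Python sketch of a checker for them follows, assuming the record has already been parsed into a dict with the comments removed; check_parameters is a hypothetical helper, not part of the EvoGrid code, and the rule that each "forces" list has natomtypes entries is taken from the Forces description.

```python
def check_parameters(p: dict) -> list:
    """Return a list of problems with a simulation-parameters record.

    Encodes the constraints stated in the example's annotation comments;
    an empty list means no problems were found.
    """
    problems = []
    natoms = p["natoms"]
    ntypes = p["natomtypes"]

    if len(p["atom types"]) != ntypes:
        problems.append("'atom types' must have natomtypes entries")
    for i, t in enumerate(p["atom types"]):
        if len(t.get("forces", [])) != ntypes:
            problems.append(f"atom type {i}: 'forces' must have natomtypes entries")

    # "atom type" and "position" are given together or not at all.
    if ("atom type" in p) != ("position" in p):
        problems.append("'atom type' and 'position' must be specified together")
    if "atom type" in p:
        if len(p["atom type"]) != natoms:
            problems.append("'atom type' must have natoms entries")
        if any(not 0 <= t < ntypes for t in p["atom type"]):
            problems.append("'atom type' values must be in [0, natomtypes)")

    for key in ("position", "velocity"):
        if key in p and (len(p[key]) != natoms or any(len(v) != 3 for v in p[key])):
            problems.append(f"'{key}' must have natoms 3-vectors")

    if "charge" in p:
        if len(p["charge"]) != natoms or any(not -1.0 <= q <= 1.0 for q in p["charge"]):
            problems.append("'charge' must have natoms values in [-1.0, 1.0]")

    if len(p.get("bonds", [])) != p.get("nbonds", 0):
        problems.append("'nbonds' must equal the length of 'bonds'")
    return problems
```

Note that the example on this page is abbreviated (natoms is 1000 but the arrays hold only a few entries), so it would not pass this check as written.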
 

NOTE: As of 05:17, 30 December 2009 (UTC), "date generated" is in the server's time zone. This should be forced to UTC.

The implementation is case-sensitive.

Bookkeeping

  • id – Internal identifier of the simulation. An unsigned integer.
  • parent simulation – Identifier of the simulation this was branched from (if submitted as part of a search).
  • priority – Affects the likelihood of the simulation being run sooner. This is an open-ended floating-point value.
  • generator – Free-form string.
  • date generated – MySQL-formatted date (YYYY-MM-DD HH:MM:SS), in server time (currently configured to UTC).
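Forcing "date generated" to UTC, as the note above recommends, is straightforward on the producer side. A minimal Python sketch, assuming naive datetimes are already UTC; mysql_utc_timestamp and parse_mysql_utc are hypothetical helper names.

```python
from datetime import datetime, timezone

MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def mysql_utc_timestamp(moment=None) -> str:
    """Format a moment as a MySQL DATETIME string in UTC.

    Timezone-aware datetimes are converted to UTC; naive ones are assumed
    to be UTC already. With no argument, the current time is used.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    elif moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime(MYSQL_FORMAT)


def parse_mysql_utc(text: str) -> datetime:
    """Parse a "date generated" value into a timezone-aware UTC datetime."""
    return datetime.strptime(text, MYSQL_FORMAT).replace(tzinfo=timezone.utc)
```

Converting at the point of generation means the stored value no longer depends on how the server's time zone happens to be configured.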

Simulation wide data

  • density – Used when generating atom positions (if necessary). Using natoms and density, a box size is calculated that results in the requested density when the box is filled evenly. Specifically: $\text{volume} = \text{natoms} / \text{density}$ and $\text{box size} = \text{volume}^{1/3}$.
  • natoms – Number of atoms in the simulation.
  • nsteps – Number of steps performed inside GROMACS for each outer loop (see nloops). The step size is currently hard coded to 0.001 in GROMACS time units (picoseconds), i.e. 1 fs of simulated time. This value was chosen arbitrarily.
  • nloops – Number of loops performed outside GROMACS. Bond calculation is performed and history data is stored once per loop.
  • natomtypes – Number of atom types. Convenience parameter used for allocating memory and performing loops.
  • temperature – Initial temperature for the simulation in kelvin, used when generating atom velocities (if necessary). It is passed to the maxwell_speed function of the GROMACS API to generate atom velocities.
  • temperature couple – Specifies whether the GROMACS temperature bath (temperature coupling) feature is enabled. If so, GROMACS adjusts the simulation temperature over time to keep it in equilibrium near the temperature parameter. This should be enabled to avoid runaway thermal expansion caused by an implementation bug.
  • bond outer threshold – The maximum distance at which atom bonding can be performed. See Bond Formation for more information.
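The density rule and the velocity generation described in this list can be sketched in Python. This is a hedged illustration, not the EvoGrid generator: it assumes a cubic box, uniformly random positions, and GROMACS units (nm, ps, atomic mass units, kJ/mol), so the Boltzmann constant is taken as 0.008314462618 kJ mol⁻¹ K⁻¹. Instead of calling GROMACS's maxwell_speed, it uses the standard result that each velocity component of a Maxwell-Boltzmann distribution is normal with mean 0 and variance k_B·T/m. All function names are hypothetical.

```python
import math
import random

# Boltzmann constant in kJ mol^-1 K^-1 (assumes GROMACS-style units).
K_BOLTZMANN = 0.008314462618


def box_size(natoms: int, density: float) -> float:
    """Edge length of a cubic box holding natoms at the requested density."""
    volume = natoms / density
    return volume ** (1.0 / 3.0)


def random_positions(natoms: int, density: float, rng: random.Random) -> list:
    """Scatter natoms points uniformly inside the cubic box."""
    edge = box_size(natoms, density)
    return [[rng.uniform(0.0, edge) for _ in range(3)] for _ in range(natoms)]


def maxwell_velocities(masses: list, temperature: float, rng: random.Random) -> list:
    """Draw one velocity vector per atom from a Maxwell-Boltzmann distribution.

    Each Cartesian component is normal with mean 0 and standard deviation
    sqrt(k_B * T / m); the resulting speeds follow the Maxwell distribution.
    """
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_BOLTZMANN * temperature / m)
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities
```

With the example values (natoms 1000, density 0.1), the volume is 10,000 and box_size returns an edge of about 21.54.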

Atom Types

These properties are specified once per atom type. Most of these entries match entries in the GROMACS data structure.

  • name – Free form string.
  • ratio – Used when assigning atom types during initial condition generation (if necessary). For each type, the generator assigns up to ratio × natoms atoms of that type. If the total is less than natoms, the remaining atoms are assigned to the first type specified.
  • radius – van der Waals radius.
  • volume – Used in QM. -1.0 indicates it should be ignored.
  • surface tension – Used in QM. -1.0 indicates it should be ignored.
  • mass
  • massB – Used in QM.
  • q
  • qB – Used in QM.
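The ratio rule above can be expressed as a short Python sketch. It assumes atoms are assigned in order (type 0 first) and that "up to ratio × natoms" means the product is truncated to an integer; a real generator may shuffle the result afterwards. assign_atom_types is a hypothetical name.

```python
def assign_atom_types(natoms: int, ratios: list) -> list:
    """Return one atom-type index per atom, based on per-type ratios.

    Each type receives up to int(ratio * natoms) atoms, in the order the
    types are listed, never exceeding natoms in total. Any atoms left over
    are assigned to the first type (index 0).
    """
    types = []
    for type_index, ratio in enumerate(ratios):
        count = int(ratio * natoms)               # truncate toward zero
        count = min(count, natoms - len(types))   # never exceed natoms
        types.extend([type_index] * count)
    types.extend([0] * (natoms - len(types)))     # remainder goes to type 0
    return types
```

For example, 10 atoms with ratios [0.25, 0.25] give 2 atoms of each type from the ratios, and the remaining 6 become type 0.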

Forces

For each atom type, the interaction with every type (including itself) is specified. The number of force entries therefore grows quadratically with the number of atom types (natomtypes²): 3 atom types means 9 forces, 4 atom types means 16 forces.

These entries correspond to entries in the GROMACS data structure.

  • c6 – Used in the non-bonded Lennard-Jones interaction (between atoms not otherwise bonded).
  • c12 – Used in the non-bonded Lennard-Jones interaction (between atoms not otherwise bonded).
  • bond type – Ignored. The bond type is hard coded to the GROMACS harmonic type, which is used when a bond is specified between two atoms of these types.
  • rA – Equilibrium length of the harmonic bond.
  • krA – Force constant of the harmonic bond.
  • rB – Equilibrium length in the GROMACS B state (used for free-energy perturbation).
  • krB – Force constant in the GROMACS B state (used for free-energy perturbation).
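To show how these parameters enter the energy, here is a minimal Python sketch using the standard C6/C12 Lennard-Jones form V = C12/r¹² − C6/r⁶ and the harmonic bond V = ½·k·(r − r₀)², with rA as r₀ and krA as k. It is an illustration under those assumptions, not the GROMACS kernel; B-state parameters are ignored, and PairForce and pair_table are hypothetical names. pair_table builds the natomtypes × natomtypes lookup from the nested "forces" lists.

```python
from dataclasses import dataclass


@dataclass
class PairForce:
    """One entry of an atom type's "forces" list (B-state fields omitted)."""
    c6: float
    c12: float
    rA: float
    krA: float


def lennard_jones(r: float, f: PairForce) -> float:
    """Non-bonded Lennard-Jones energy in C6/C12 form: C12/r^12 - C6/r^6."""
    inv_r6 = 1.0 / r ** 6
    return f.c12 * inv_r6 * inv_r6 - f.c6 * inv_r6


def harmonic_bond(r: float, f: PairForce) -> float:
    """Harmonic bond energy 0.5 * krA * (r - rA)^2 using A-state parameters."""
    return 0.5 * f.krA * (r - f.rA) ** 2


def pair_table(atom_types: list) -> list:
    """Build table[i][j] from atom_types[i]["forces"][j] (natomtypes**2 entries)."""
    return [
        [PairForce(e["c6"], e["c12"], e["rA"], e["krA"]) for e in t["forces"]]
        for t in atom_types
    ]
```

With c6 = c12 = 0.1 as in the example, the Lennard-Jones energy crosses zero at r = 1 and reaches its minimum of −0.025 at r = 2^(1/6).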

Retrieving Simulation Statistics

An analysis client providing a Simulation Score will need one or more of the available Simulation Statistics. Each statistic is returned as an array with one value per frame.

{
	"nframes" : 2,
	"statistic name" : [ 0.0, 1.0 ],
	"other statistic name" : [ 0.0, 1.0 ]
}
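A client consuming the statistics response might parse and sanity-check it as in the following Python sketch. It assumes every top-level key other than "nframes" is a statistic holding exactly nframes numbers, as in the example; parse_statistics is a hypothetical name.

```python
import json


def parse_statistics(text: str) -> dict:
    """Parse a statistics response into {statistic name: per-frame values}.

    Raises ValueError if any statistic does not have exactly nframes values.
    """
    doc = json.loads(text)
    nframes = doc["nframes"]
    stats = {}
    for name, values in doc.items():
        if name == "nframes":
            continue
        if len(values) != nframes:
            raise ValueError(
                f"statistic {name!r} has {len(values)} values, expected {nframes}")
        stats[name] = [float(v) for v in values]
    return stats
```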
 

There will be a mechanism for a client to request both the atom history and the statistics together. This returns a merge of the two formats, as follows:

{
	"natoms" : 2,	/* Convenience parameter that optimizes allocation of velocity and position arrays */
	"nframes" : 2,	/* Convenience parameter that optimizes allocation of frames */
	"frames" : 
		[
			{
				"position" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"velocity" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"bonds" : [ [0, 1] ]
			},
			{
				"position" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"velocity" : [	[0.0,0.0,0.0],
						[0.0,0.0,0.0] ],
				"bonds" : [ [0, 1] ]
			}
		],
	"statistic name" : [ 0.0, 1.0 ],
	"other statistic name" : [ 0.0, 1.0 ]
}
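The merge can be sketched in Python as follows. This is a hypothetical server-side helper, not the EvoGrid implementation: it adds each statistic array as a top-level key next to "frames", keeps whatever history fields it is given, and, as an assumed design choice, rejects statistic names that would overwrite a history field.

```python
def merge_history_and_statistics(history: dict, statistics: dict) -> dict:
    """Combine a history record and a statistics record into one response."""
    if history["nframes"] != statistics["nframes"]:
        raise ValueError("history and statistics disagree on nframes")
    merged = dict(history)  # shallow copy; the input dict is left untouched
    for name, values in statistics.items():
        if name == "nframes":
            continue
        if name in merged:
            raise ValueError(f"statistic name {name!r} collides with a history field")
        merged[name] = list(values)
    return merged
```

Rejecting collisions keeps a statistic that happens to be named, say, "frames" from silently replacing the atom history.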
 

Why JSON?

It's standardized. It's human readable. It's extensible without breaking existing parsers. It's not as wordy as XML. It's platform agnostic. There are JSON parsers for almost every language.

Comparison to binary format

This test was performed with a file containing positions and velocities for 1000 atoms over 1000 frames, with an average of 500 bonds per frame (the bond count was normally distributed between 0 and 1000). Positions and velocities were written with 12 significant figures (e.g. 0.671783851307). The JSON file contained some additional whitespace for human readability. Compression used maximum settings (gzip and bzip2 on Debian). The Estimated Binary Format assumes a position of 3 floats of 4 bytes each, a velocity of the same size as a position, and each bond as 2 integers taking 4 bytes in total.

Format                     Size (bytes)   Relative to JSON
JSON                        100,901,586   100%
JSON (gzip)                  44,237,505   43.8%
JSON (bzip2)                 37,421,010   37.1%
Estimated Binary Format      26,000,000   25.8%
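The Estimated Binary Format row can be checked with a little arithmetic. The sketch below reads the stated assumptions as 24 bytes per atom per frame (position plus velocity, 3 four-byte floats each) and 4 bytes per bond, i.e. two 2-byte indices, which is enough to address 1000 atoms; that reading reproduces the 26,000,000-byte figure. With a 4-byte integer for each bond index (8 bytes per bond), the same formula gives 28,000,000 bytes. estimated_binary_size and its parameters are hypothetical names.

```python
def estimated_binary_size(natoms: int, nframes: int, mean_bonds: int,
                          float_bytes: int = 4, bond_bytes: int = 4) -> int:
    """Raw bytes for positions, velocities and bonds, with no header or framing."""
    per_atom = 2 * 3 * float_bytes  # position + velocity, 3 floats each
    per_frame = natoms * per_atom + mean_bonds * bond_bytes
    return nframes * per_frame


size = estimated_binary_size(1000, 1000, 500)
print(size, f"{size / 100_901_586:.1%}")  # prints: 26000000 25.8%
```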

Compression tests were also run on a random file the same size as the Estimated Binary Format, read from /dev/urandom. Both gzip and bzip2 increased the file size.
