Documentation/How Tos/Calc: Database functions
List of Calc 'Database' functions
DCOUNT | DCOUNT counts the numerical entries in a column of a database table, that are in rows meeting the criteria. |
DCOUNTA | DCOUNTA counts the non-empty cells in a column of a database table, that are in rows meeting the criteria. |
DGET | DGET returns the contents of a cell in a column of a database table, in the unique row meeting the criteria. |
DMAX | DMAX returns the largest value in a column of a database table, in rows that meet the criteria. |
DMIN | DMIN returns the smallest value in a column of a database table, in rows that meet the criteria. |
DAVERAGE | DAVERAGE returns the average of the entries in a column of a database table, in rows that meet the criteria. |
DPRODUCT | DPRODUCT returns the product of the entries in a column of a database table, in rows that meet the criteria. |
DSTDEV | DSTDEV calculates the standard deviation of the entries in a column of a database table, in rows that meet the criteria. The records are treated as a sample, not as the whole population. |
DSTDEVP | DSTDEVP calculates the standard deviation of the entries in a column of a database table, in rows that meet the criteria. The records are treated as the whole population, not as a sample of the population. |
DSUM | DSUM returns the total of the entries in a column of a database table, in rows that meet the criteria. |
DVAR | DVAR returns the variance of the entries in a column of a database table, in rows that meet the criteria. The records are treated as a sample, not as the whole population. |
DVARP | DVARP calculates the variance of the entries in a column of a database table, in rows that meet the criteria. The records are treated as the whole population, not as a sample of the population. |
Overview
In the Apache OpenOffice spreadsheet 'Calc', a 'database' is simply a table of values, and has nothing to do with the more complex Apache OpenOffice database 'Base'. The 'database' functions in Calc (listed here) are quite simple to use, and allow you to choose and process data in a table.
A Calc 'database' table might look like this:
The first row of the 'database' table has headings (Name, Grade, Age … ), and each subsequent row contains the data values.
The 'criteria', which are used to select rows from the 'database', are entered in another table:
The first row of this table has headings. Subsequent rows specify the criteria: for example, using the criteria table above will match those children whose distance to school is more than 600.
The Calc 'database' functions all have a similar form. As a simple first example, using the tables above:
DCOUNT(A1:E10; 0; A13:E14)
- returns 5, the number of children whose “Distance to School” is more than 600.
- A1:E10 is the database table and A13:E14 is the criteria table.
There are other 'database' functions to return a sum, an average, a standard deviation, and so on.
Criteria
The criteria table may be placed anywhere on the spreadsheet, but it is often wise to have it near the database table.
The criteria table headings must exactly match headings in the database table, but they can appear in any order, and can appear more than once:
All criteria on a row must be satisfied for the row to be satisfied, so in this example above we find rows which have:
- “Distance to School” more than 600 AND
- “Age” greater than 8 AND
- “Age” less than or equal to 10
in other words '9 and 10 year olds who travel more than 600 to school'.
If the criteria table has more than one row of criteria, any one satisfied row means the overall criteria table is satisfied:
In this example the test is:
- (“Distance to School” more than 600 AND
- “Age” greater than 8 AND
- “Age” less than or equal to 10)
- OR
- ( “Age” less than or equal to 8)
In other words we find rows where the children are either 'aged 9 or 10 with 600 or more to travel to school', or 'aged 8 or under'.
The condition which is entered in a cell of a criteria table (eg >4 ) is simply text and has the following form:
- comparator value where
- comparator is one of >, <, >=, <=, =, <> (if comparator is omitted = is assumed );
- value is the value (number or text) to be compared.
For example:
- the condition “>4” tests if the contents of cells are greater than 4.
- the condition “<lamp” tests if the contents of cells come alphabetically before lamp (so lady and ant meet the condition, but late and zebra do not).
- the condition “lamp” tests if the contents of cells are lamp - but Calc has a number of settings which define the exact behaviour - please read the following section:
Settings for text criteria
When matching a simple text condition, (such as “<lamp”) it can be important to check the settings on the Tools → Options → OpenOffice Calc → Calculate dialog:
Case sensitive
- this check box has no effect on Calc database functions. Case is ignored - so that “lamp” will always match lamp, Lamp and LAMP, etc.
Search criteria = and <> must apply to whole cells
- if this check box is ticked, “lamp” will only match a cell containing just lamp. If this check box is not ticked, “lamp” will match lamp, clamp, lampoon, etc. - in other words it will match if lamp is found anywhere in the cell's text.
Enable regular expressions in formulas
- if this check box is ticked, then the condition is treated as a regular expression. For example the condition “l?amp” would match lamp and amp (in regular expressions “l?” means the “l” is optional). Regular expressions are similar to wildcards, but more powerful; they are described in the HowTo: Regular Expressions in Calc. Note that the Search criteria = and <> must apply to whole cells setting above also works when regular expressions are selected.
The functions results can depend on these settings - but unfortunately they are rather hidden away so that a user may be unaware if the settings are wrong.
There are two ways to combat this:
Firstly you can design the spreadsheet to work irrespective of the settings
- most easily, never do a text comparison - or if you do, always look for a whole cell match, and make sure that none of the cells that you check is a submatch of any other cell (eg if you search for 'apple' as the sole contents of a cell, make sure no other cell can contain 'crabapple'), AND
- do not use regular expressions AND
- do not use regular expressions special characters such as ., *, + [, { in criteria
Secondly you can include on your spreadsheet a warning if the settings are wrong.
- in cell A3 enter the text:
- Check:
- Now note that
- COUNTIF(A3;".*") returns 1 if regular expressions are turned on.
- COUNTIF(A3;"<>e") returns 1 if whole cell matching is turned on.
- and construct an appropriate formula in cell A4 - for example:
- To check for both regular expressions and whole cell matching:
- =IF( AND(COUNTIF(A3;".*"); COUNTIF(A3;"<>e")); "OK"; "Error: " & IF(COUNTIF(A3;".*")=0; "Turn on regular expressions. "; "") & IF(COUNTIF(A3;"<>e")=0; "Turn on whole cell matching."; "") )
- To check for regular expressions but no whole cell matching:
- =IF( AND(COUNTIF(A3;".*"); COUNTIF(A3;"<>e")=0); "OK"; "Error: " & IF(COUNTIF(A3;".*")=0; "Turn on regular expressions. "; "") & IF(COUNTIF(A3;"<>e"); "Turn off whole cell matching."; "") )
- To check for whole cell matching but no regular expressions:
- =IF( AND(COUNTIF(A3;".*")=0; COUNTIF(A3;"<>e")); "OK"; "Error: " & IF(COUNTIF(A3;".*"); "Turn off regular expressions. "; "") & IF(COUNTIF(A3;"<>e")=0; "Turn on whole cell matching."; "") )
- To check for neither regular expressions nor whole cell matching:
- =IF( AND(COUNTIF(A3;".*")=0; COUNTIF(A3;"<>e")=0); "OK"; "Error: " & IF(COUNTIF(A3;".*"); "Turn off regular expressions. "; "") & IF(COUNTIF(A3;"<>e"); "Turn off whole cell matching."; "") )
Tips and Tricks
- To make sure that the criteria table headings exactly match the database table headings, you can either copy and paste, or use '='. For example, in the table above =D1 will give 'Distance to School'
- If the criteria do not seem to work, make sure that any apparently empty cells really are empty by selecting them and deleting.
See Also