com.penguinwerks.jodene.data
Class DataReader

java.lang.Object
  extended by com.penguinwerks.jodene.data.DataReader
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
FileDataReader

public abstract class DataReader
extends java.lang.Object
implements java.io.Serializable

An abstract data reader. A data reader pulls data from a source, such as a file or a JDBC connection, and returns a data set. The input fields can optionally be translated and scaled. For example, a data file with the rows
30, 0.005, green
12, 0.006, blue
87, 0.002, red
could be translated and scaled to:
0.30, 0.5, 0.1, 0.9, 0.1
0.12, 0.6, 0.1, 0.1, 0.9
0.87, 0.2, 0.9, 0.1, 0.1

Translators convert text input to numeric input, and scalers adjust the range of the numeric input. The raw input would first be translated to:
30.0, 0.005, 0.0, 1.0, 0.0
12.0, 0.006, 0.0, 0.0, 1.0
87.0, 0.002, 1.0, 0.0, 0.0

Then it would be scaled to:
0.30, 0.5, 0.1, 0.9, 0.1
0.12, 0.6, 0.1, 0.1, 0.9
0.87, 0.2, 0.9, 0.1, 0.1
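
The two stages above can be sketched in a few lines of Java. This is only an illustration and not the library's implementation: the class name TranslateThenScale, the class order red, green, blue, and the scaling rules (divide by 100, multiply by 100, map 0/1 onto 0.1/0.9) are assumptions inferred from the sample numbers.

```java
import java.util.Arrays;
import java.util.List;

public class TranslateThenScale {
    // Assumed class order, inferred from the sample rows (red -> 1, 0, 0).
    static final List<String> COLORS = Arrays.asList("red", "green", "blue");

    // Stage 1: turn text fields into doubles; the color becomes one value per class.
    static double[] translate(String[] fields) {
        double[] out = new double[2 + COLORS.size()];
        out[0] = Double.parseDouble(fields[0].trim());
        out[1] = Double.parseDouble(fields[1].trim());
        int k = COLORS.indexOf(fields[2].trim());
        if (k < 0) {
            throw new IllegalArgumentException("Unknown color: " + fields[2]);
        }
        out[2 + k] = 1.0; // all other class slots stay at 0.0
        return out;
    }

    // Stage 2: adjust the range of each numeric value (assumed rules).
    static double[] scale(double[] raw) {
        double[] out = new double[raw.length];
        out[0] = raw[0] / 100.0;
        out[1] = raw[1] * 100.0;
        for (int i = 2; i < raw.length; i++) {
            out[i] = 0.1 + 0.8 * raw[i]; // 0 -> 0.1, 1 -> 0.9
        }
        return out;
    }

    public static void main(String[] args) {
        String[] line = "30, 0.005, green".split(",");
        double[] scaled = scale(translate(line));
        System.out.println(Arrays.toString(scaled));
    }
}
```

In the library itself these roles are played by Translator and Scaler objects registered through defineTranslatedInputField and defineColumn.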

Returning a raw list of values, however, would force the user to locate each value by its position, for example by remembering that weight is the 3rd value and height is the 17th. Instead, the result of reading data is returned as a list of maps. The list preserves the order in which the data was read. Each row in the list is a map of name-value pairs, with the names taken from the column definitions.
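
As a rough sketch of how such a result might be consumed, the example below builds a list of maps by hand and reads values by name. The class RowsByName, the column names, and the sample values are hypothetical. The generic types are also an assumption: the documented signatures use the raw java.util.List and java.util.Map types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowsByName {
    // Builds a small result by hand in the shape described above:
    // one map per row, in the order the rows were read.
    static List<Map<String, Double>> sampleRows() {
        List<Map<String, Double>> rows = new ArrayList<>();

        Map<String, Double> first = new LinkedHashMap<>();
        first.put("weight", 0.30);
        first.put("height", 0.5);
        rows.add(first);

        Map<String, Double> second = new LinkedHashMap<>();
        second.put("weight", 0.12);
        second.put("height", 0.6);
        rows.add(second);

        return rows;
    }

    public static void main(String[] args) {
        // Values are looked up by column name rather than by position.
        for (Map<String, Double> row : sampleRows()) {
            System.out.println(row.get("weight") + " " + row.get("height"));
        }
    }
}
```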

The basic operation of the data reader is to first define the input fields, along with any translations. The output columns are then defined, along with any scaling to be applied to the raw input values. The readValues method, which is implemented in a concrete subclass, returns name-value pairs where each name is taken from the column definition.

Author:
Paul Hoehne
See Also:
Serialized Form

Constructor Summary
DataReader()
           
 
Method Summary
 void defineColumn(java.lang.String name)
          Define a column in the output set that is not scaled.
 void defineColumn(java.lang.String name, Scaler scaler)
          Define a new column in the output set with the given name and scaler.
 void defineRealInputField(java.lang.String name)
          Defines a real-valued input field in the data set with the given name.
 void defineTranslatedInputField(java.lang.String name, Translator translator)
          Defines a translated input field with the given name.
 java.util.Map nameValues(double[] scaledLine)
          Assigns names to values.
abstract  java.util.List readValues()
          This is an abstract method that is implemented by a particular data reader; for example, a JDBC data reader would read from a database.
 double[] scaleLine(double[] rawLine)
          Performs the scaling on a line translated into doubles.
 double[] translateLine(java.lang.String[] aLine)
          Performs translation on a raw line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataReader

public DataReader()
Method Detail

defineColumn

public void defineColumn(java.lang.String name,
                         Scaler scaler)
Define a new column in the output set with the given name and scaler.

Parameters:
name - The column name
scaler - The scaler used to transform the column.

defineColumn

public void defineColumn(java.lang.String name)
Define a column in the output set that is not scaled.

Parameters:
name - The column name

defineRealInputField

public void defineRealInputField(java.lang.String name)
Defines a real-valued input field in the data set with the given name.

Parameters:
name - The field name.

defineTranslatedInputField

public void defineTranslatedInputField(java.lang.String name,
                                       Translator translator)
Defines a translated input field with the given name. Normally, translated input fields are used for classifiers. An example is translating a color to a set of K classes using the KClassTranslator.

Parameters:
name - The name of the input field.
translator - The translator used to modify the value.

translateLine

public double[] translateLine(java.lang.String[] aLine)
                       throws DataReadingException
Performs translation on a raw line. When a line is read in from a data source it is parsed into an array of strings. The array of strings is translated into an array of doubles.

Parameters:
aLine - The array of strings
Returns:
The doubles after translating the strings.
Throws:
DataReadingException - Thrown if there is a problem reading the data.

scaleLine

public double[] scaleLine(double[] rawLine)
                   throws DataReadingException
Performs the scaling on a line translated into doubles. After a line is read and translated into an array of doubles, it is scaled accordingly. The scaling is defined using the defineColumn method.

Parameters:
rawLine - The raw double values read in.
Returns:
The scaled double values.
Throws:
DataReadingException - Thrown if there is a problem with data reading.
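
The per-column scaling step might look roughly like the sketch below. It does not use the library's Scaler type, whose methods are not documented on this page; java.util.function.DoubleUnaryOperator stands in for it, and the class ColumnScaling, the use of null for an unscaled column, and the length check are all assumptions.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class ColumnScaling {
    // One scaler per column; a null entry marks a column that is left unscaled.
    static double[] scaleLine(double[] rawLine, DoubleUnaryOperator[] scalers) {
        if (rawLine.length != scalers.length) {
            throw new IllegalArgumentException(
                "Expected " + scalers.length + " values but got " + rawLine.length);
        }
        double[] scaled = new double[rawLine.length];
        for (int i = 0; i < rawLine.length; i++) {
            scaled[i] = (scalers[i] == null)
                ? rawLine[i]
                : scalers[i].applyAsDouble(rawLine[i]);
        }
        return scaled;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator[] scalers = { x -> x / 100.0, null };
        double[] scaled = scaleLine(new double[] {30.0, 7.0}, scalers);
        System.out.println(Arrays.toString(scaled));
    }
}
```

In this sketch a mismatch between the number of values and the number of columns raises IllegalArgumentException; the documented method reports problems with a DataReadingException instead.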

readValues

public abstract java.util.List readValues()
                                   throws DataReadingException
This is an abstract method that is implemented by a particular data reader; for example, a JDBC data reader would read from a database. The result is a list of name-value pairs, where the names are taken from the column names.

Returns:
A list of name-value pairs.
Throws:
DataReadingException - Thrown if there is an error during data reading.

nameValues

public java.util.Map nameValues(double[] scaledLine)
Assigns names to values. The data reader's output consists of name-value pairs. This method matches each value to a column name.

Parameters:
scaledLine - The inputs after scaling.
Returns:
The collection of doubles with their names attached.
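
A minimal sketch of the pairing step follows. It is a stand-in, not the library's code: the documented method takes only the scaled line and presumably uses the column names held by the reader, whereas this hypothetical NameValuesSketch takes the names as an explicit argument. The use of LinkedHashMap to keep column order is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NameValuesSketch {
    // Pairs each scaled value with the column name at the same position.
    static Map<String, Double> nameValues(String[] columnNames, double[] scaledLine) {
        if (columnNames.length != scaledLine.length) {
            throw new IllegalArgumentException(
                "Expected " + columnNames.length + " values but got " + scaledLine.length);
        }
        // LinkedHashMap keeps the columns in definition order when iterated.
        Map<String, Double> named = new LinkedHashMap<>();
        for (int i = 0; i < scaledLine.length; i++) {
            named.put(columnNames[i], scaledLine[i]);
        }
        return named;
    }

    public static void main(String[] args) {
        Map<String, Double> row =
            nameValues(new String[] {"weight", "height"}, new double[] {0.30, 0.5});
        System.out.println(row.get("height"));
    }
}
```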