csv: a vocabulary for describing CSV files

Rurik Thomas Greenall (2014-01-26)
This document describes a vocabulary for describing CSV (and other column-oriented) files.
The vocabulary is provided under the ODC-PDDL.

Representations: HTML | RDF/XML | Turtle

Namespace

The URI for this vocabulary is
http://www.ntnu.no/ub/data/csv/
The suggested prefix for this vocabulary is
csv

Terms

Term                     URI
ColumnOrientedDocument   http://www.ntnu.no/ub/data/csv#ColumnOrientedDocument
CsvDocument              http://www.ntnu.no/ub/data/csv#CsvDocument
Column                   http://www.ntnu.no/ub/data/csv#Column
Cell                     http://www.ntnu.no/ub/data/csv#Cell
hasColumn                http://www.ntnu.no/ub/data/csv#hasColumn
hasCell                  http://www.ntnu.no/ub/data/csv#hasCell
hasIndex                 http://www.ntnu.no/ub/data/csv#hasIndex
hasCharacterEncoding     http://www.ntnu.no/ub/data/csv#hasCharacterEncoding
encodesLinebreaksAs      http://www.ntnu.no/ub/data/csv#encodesLinebreaksAs
hasHeaderLine            http://www.ntnu.no/ub/data/csv#hasHeaderLine
hasColumnIndex           http://www.ntnu.no/ub/data/csv#hasColumnIndex
hasRowIndex              http://www.ntnu.no/ub/data/csv#hasRowIndex
mapsTo                   http://www.ntnu.no/ub/data/csv#mapsTo
hasMultivalueSeparator   http://www.ntnu.no/ub/data/csv#hasMultivalueSeparator

Properties and classes

ColumnOrientedDocument

A column-oriented document, typically a spreadsheet or data table
type: rdfs:Class
subclass of: foaf:Document
term status: stable

CsvDocument

A CSV document broadly conforming to IETF's RFC4180 (http://tools.ietf.org/html/rfc4180); csv:hasEscapeSymbol and csv:hasColumnDelimiter are both set for this class (to \ and , respectively).
type: rdfs:Class
subclass of: ColumnOrientedDocument
term status: stable
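
As a rough illustration of what these fixed values mean for a consumer, the sketch below configures Python's standard csv module with the delimiter and escape symbol stated for csv:CsvDocument. The names CSV_DOCUMENT_DEFAULTS and read_csv_document are hypothetical, chosen for this example only. Python's reader also keeps its default handling of doubled quotes (""), which is how RFC4180 itself escapes quotes.

```python
import csv
import io

# Hypothetical defaults mirroring the values stated for csv:CsvDocument:
# csv:hasColumnDelimiter "," and csv:hasEscapeSymbol "\".
CSV_DOCUMENT_DEFAULTS = {"delimiter": ",", "escapechar": "\\"}


def read_csv_document(text, delimiter=None, escapechar=None):
    """Parse text into rows, falling back to the csv:CsvDocument defaults."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter or CSV_DOCUMENT_DEFAULTS["delimiter"],
        escapechar=escapechar or CSV_DOCUMENT_DEFAULTS["escapechar"],
    )
    return [row for row in reader]
```

With these settings, both a backslash-escaped comma and a quoted field keep their commas as data: `read_csv_document('a,b\\,c\r\n"Cat,Dog",x\r\n')` yields `[['a', 'b,c'], ['Cat,Dog', 'x']]`.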

Column

A column in the document
type: rdfs:Class
status: stable

Cell

[DEPRECATED: describing such minute detail was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] A cell in the document.
type: rdfs:Class
status: archaic

hasColumnDelimiter

Explicit statement of which symbol is used to delimit columns in rows
type: rdf:Property
domain: ColumnOrientedDocument
range: literal
status: stable

hasEscapeSymbol

Explicit statement of which symbol is used to escape characters in data
type: rdf:Property
domain: ColumnOrientedDocument
range: literal
status: stable

hasColumn

Denotes the relationship between a CSV document and a column within that document.
type: rdf:Property
domain: CsvDocument
range: Column
status: stable

hasCell

[DEPRECATED: describing such minute detail was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] Denotes the relationship between a column and a cell within that column.
type: rdf:Property
domain: Column
range: Cell
status: archaic

hasIndex

An index that denotes the position of a column in a document, numbered from left to right, starting at 1 and increasing sequentially.
type: rdf:Property
domain: Column
range: nonNegativeInteger
status: stable
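
Because csv:hasIndex counts from 1, a consumer working in a zero-based language has to shift the value by one before looking up a cell. A minimal sketch, assuming each row has already been split into a list of strings; cell_for_column is a hypothetical helper name.

```python
def cell_for_column(row, has_index):
    """Return the cell of a column whose csv:hasIndex is has_index (1-based)."""
    if has_index < 1:
        # Indices start at 1, so 0 or a negative value cannot name a column.
        raise ValueError("csv:hasIndex values start at 1")
    # Shift the 1-based index to Python's 0-based list positions.
    return row[has_index - 1]
```

For the row `["John", "Doe", "38", "Cat,Dog"]`, index 1 gives "John" and index 4 gives "Cat,Dog".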

hasCharacterEncoding

The character encoding of a document, for example UTF-8, US-ASCII or ISO-8859-1. This information corresponds to the optional MIME parameter 'charset' defined in RFC4180.
type: rdf:Property
domain: CsvDocument
range: literal
status: stable
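
One way a consumer might use this value is as the codec name when turning the raw bytes of the file into text before parsing. The sketch below assumes the literal is a name Python's codec registry recognises (such as "UTF-8", "US-ASCII" or "ISO-8859-1"); an unknown name raises LookupError. rows_from_bytes is a hypothetical helper.

```python
import csv
import io


def rows_from_bytes(raw, charset="UTF-8"):
    """Decode raw bytes with the declared charset, then split them into rows."""
    # The csv:hasCharacterEncoding value is passed straight to bytes.decode.
    text = raw.decode(charset)
    return list(csv.reader(io.StringIO(text, newline="")))
```

For instance, `rows_from_bytes(b"Jos\xe9,Doe\r\n", "ISO-8859-1")` returns `[["José", "Doe"]]`, whereas decoding the same bytes as UTF-8 would fail.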

encodesLinebreaksAs

How line breaks are encoded in a document; a value such as LF, CR or CRLF (i.e. CR+LF) is expected.
type: rdf:Property
domain: CsvDocument
range: literal
status: stable
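
Python's csv reader already accepts LF, CR and CRLF on input, so this value matters most when writing a file back out. A sketch under that assumption: the LINEBREAKS table and write_rows function are hypothetical, and "CR+LF" is treated as another spelling of "CRLF".

```python
import csv
import io

# Hypothetical mapping from csv:encodesLinebreaksAs values to characters.
LINEBREAKS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n", "CR+LF": "\r\n"}


def write_rows(rows, linebreak="CRLF"):
    """Serialise rows using the line break named by csv:encodesLinebreaksAs."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator=LINEBREAKS[linebreak.upper()])
    writer.writerows(rows)
    return out.getvalue()
```

`write_rows([["a", "b"], ["c", "d"]], "LF")` produces `"a,b\nc,d\n"`, while the CRLF default produces `"a,b\r\nc,d\r\n"`.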

hasHeaderLine

Whether or not the first line of the document contains column headers; a boolean is expected. 'false' indicates that there is no header, while 'true' indicates that there is a header and that the data therefore begins at row 2. This information corresponds to the optional MIME parameter 'header' defined in RFC4180: the boolean value 'false' here represents 'absent', while 'true' represents 'present'.
type: rdf:Property
domain: CsvDocument
range: boolean
status: stable
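
The rule "data begins at row 2 when the value is true" can be sketched as follows. parse_xsd_boolean accepts the four lexical forms of xsd:boolean ("true", "false", "1", "0"); both function names are hypothetical, and rows are assumed to be lists of strings already.

```python
def parse_xsd_boolean(lexical):
    """Map an xsd:boolean lexical form ("true", "false", "1", "0") to bool."""
    value = lexical.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean lexical form: {lexical!r}")


def split_header(rows, has_header_line):
    """Separate column labels from data rows according to csv:hasHeaderLine."""
    if has_header_line:
        # 'true' ('present'): row 1 holds the headers, data starts at row 2.
        header = rows[0] if rows else None
        return header, rows[1:]
    # 'false' ('absent'): every row, including the first, is data.
    return None, list(rows)
```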

hasColumnIndex

[DEPRECATED: describing such minute detail was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] An index that denotes the position of a cell in a column.
type: rdf:Property
domain: Cell
range: Column
status: archaic

hasRowIndex

[DEPRECATED: describing such minute detail was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] An index that denotes the position of a cell in a row.
type: rdf:Property
domain: Cell
range: nonNegativeInteger
status: archaic

mapsTo

The RDF property to which values in the column map.
type: rdf:Property
domain: Column
range: rdf:Property
status: stable
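
To make the role of csv:mapsTo concrete, the sketch below turns one data row into N-Triples lines, using each column's csv:mapsTo property as the predicate. The vocabulary does not say how rows should be identified, so the subject URI is supplied by the caller; the column_map structure (hasIndex value to full property URI) and both function names are assumptions made for this example.

```python
def ntriple_literal(value):
    """Escape a Python string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(subject, row, column_map):
    """Emit one triple per mapped column of a single data row.

    column_map maps a 1-based csv:hasIndex value to the full URI given by
    csv:mapsTo; columns without an entry are skipped.
    """
    lines = []
    for index, predicate in sorted(column_map.items()):
        value = row[index - 1]
        lines.append(f"<{subject}> <{predicate}> {ntriple_literal(value)} .")
    return lines
```

Mapping columns 1 and 2 to foaf:givenName and foaf:familyName turns the row "John,Doe,38" into two triples, one per mapped column, in hasIndex order.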

hasMultivalueSeparator

Where a column contains multiple values in a single cell, indicates the separator symbol, given as (escaped) text. Please note: we provide no solution to your obvious meatspace problem.
type: rdf:Property
domain: Column
range: literal
status: stable
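
Splitting a cell on this separator is straightforward once the literal has been parsed. The sketch assumes the separator arrives as the literal's value after RDF parsing, so a Turtle escape such as "\t" has already become a real tab character; it also assumes that whitespace around values and empty fragments are insignificant. split_multivalue is a hypothetical name.

```python
def split_multivalue(cell, separator):
    """Split one cell into values using the column's csv:hasMultivalueSeparator."""
    if not separator:
        # No separator declared: the cell holds a single value.
        return [cell]
    # Trim whitespace and drop empty fragments such as one after a trailing ",".
    return [part.strip() for part in cell.split(separator) if part.strip()]
```

For the "Pets" column, "Horse,Dog,Sausage" becomes `["Horse", "Dog", "Sausage"]`, while "Mouse" stays `["Mouse"]`.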

Usage example

Consider a file "file.txt" with the following structure:
	
First name,Last name,Age,Pets
John,Doe,38,"Cat,Dog"
Jane,Doe,31,"Dog,Parakeet"
Maxie,Doe,33,Mouse
Les,Doe,39,"Horse,Dog,Sausage"
	
This file can be described as follows:
	
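@prefix csv: <http://www.ntnu.no/ub/data/csv#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# The default and ex: namespaces are illustrative placeholders.
@prefix : <http://example.com/file.txt#> .
@prefix ex: <http://example.com/ns#> .

<http://example.com/file.txt> a csv:CsvDocument ;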
  dcterms:title "people, ages and pets" ;
  dcterms:creator "J. Doe" ;
  dcterms:date "2011-04-21" ;
  csv:hasCharacterEncoding "ASCII" ;
  csv:encodesLinebreaksAs "CRLF" ;
  csv:hasHeaderLine "true"^^xsd:boolean ;
  csv:hasColumn :column1 ;
  csv:hasColumn :column2 ;
  csv:hasColumn :column3 ;
  csv:hasColumn :column4 .

:column1 a csv:Column ;
  rdfs:label "First name" ;
  rdfs:comment "Contains the first name of a person" ;
  csv:mapsTo foaf:givenName ;
  csv:hasIndex "1"^^xsd:nonNegativeInteger .

:column2 a csv:Column ;
  rdfs:label "Last name" ;
  rdfs:comment "Contains the last name of a person" ;
  csv:mapsTo foaf:familyName ;
  csv:hasIndex "2"^^xsd:nonNegativeInteger .

:column3 a csv:Column ;
  rdfs:label "Age" ;
  rdfs:comment "Contains the age of a person" ;
  csv:mapsTo foaf:age ;
  csv:hasIndex "3"^^xsd:nonNegativeInteger .

:column4 a csv:Column ;
  rdfs:label "Pets" ;
  rdfs:comment "Contains the pets people own separated by commas (yes, a useful textual comment for your computer)" ;
  csv:mapsTo ex:pet ;
  csv:hasMultivalueSeparator "," ;
  csv:hasIndex "4"^^xsd:nonNegativeInteger .
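
A description like this can also be checked against the file it describes. The sketch below hard-codes the example's column labels and indices in a Python dictionary rather than parsing the Turtle (an assumption made to keep it dependency-free), assumes the file uses CRLF line breaks and has a header line as stated above, and reports any gaps in the csv:hasIndex values, header labels that differ from rdfs:label, and rows with the wrong number of cells. check_description, SAMPLE and COLUMNS are hypothetical names.

```python
import csv
import io

# The example file, assuming CRLF line breaks as the description states.
SAMPLE = (
    "First name,Last name,Age,Pets\r\n"
    'John,Doe,38,"Cat,Dog"\r\n'
    'Jane,Doe,31,"Dog,Parakeet"\r\n'
    "Maxie,Doe,33,Mouse\r\n"
    'Les,Doe,39,"Horse,Dog,Sausage"\r\n'
)

# csv:hasIndex -> rdfs:label, copied by hand from the Turtle description.
COLUMNS = {1: "First name", 2: "Last name", 3: "Age", 4: "Pets"}


def check_description(text, columns):
    """List mismatches between a column description and a file with a header line."""
    problems = []
    if sorted(columns) != list(range(1, len(columns) + 1)):
        problems.append("csv:hasIndex values are not 1..n without gaps")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        problems.append("file is empty")
        return problems
    # csv:hasHeaderLine is "true" in the example, so row 1 holds the labels.
    header, data = rows[0], rows[1:]
    for index, label in sorted(columns.items()):
        if index < 1 or index > len(header) or header[index - 1] != label:
            problems.append(f"column {index}: header does not match {label!r}")
    for number, row in enumerate(data, start=2):
        if len(row) != len(columns):
            problems.append(f"row {number}: expected {len(columns)} cells, got {len(row)}")
    return problems
```

For the example file and description, `check_description(SAMPLE, COLUMNS)` returns an empty list: the quoted "Pets" cells each count as one cell, even though they contain the multivalue separator.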