
Introduction to the CSV format

CSV (Comma-Separated Values) files are a simple and widely used format for storing tabular data in plain text. Each line in a CSV file represents a row of data (otherwise known as a “record”), with individual fields separated by commas. Other delimiters can also be used, such as tabs in TSV (Tab-Separated Values) files.
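
As a minimal sketch of the record and delimiter idea, the snippet below reads the same two records from a comma-delimited string and a tab-delimited string using Python's standard `csv` module. The sample data and the `read_records` helper are hypothetical and exist only for illustration; this is not a description of how Delimited itself parses files.

```python
import csv
import io

# Hypothetical sample data: the same two records, comma- and tab-delimited.
csv_text = "name,city\r\nAda,London\r\n"
tsv_text = "name\tcity\r\nAda\tLondon\r\n"


def read_records(text, delimiter=","):
    """Return every record in `text` as a list of field strings."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


csv_rows = read_records(csv_text)
tsv_rows = read_records(tsv_text, delimiter="\t")
```

Only the delimiter differs between the two calls; the resulting rows are the same lists of fields.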

Because CSV files are plain text, they can be opened and edited with any text editor, making them highly portable across different systems. However, this flexibility has led to inconsistent implementations over time. In an attempt to standardise the format, the Internet Engineering Task Force (IETF) published RFC 4180, an informational document that describes the commonly accepted format for CSV files and registers the text/csv MIME type. It aims to establish a uniform structure for creating and parsing CSV files so that different tools and systems can exchange them reliably. Delimited is built to follow the RFC 4180 specification, so it is compatible with files that conform to it.

The RFC 4180 specification can be found online.


Definition of the CSV format

  1. Each record is located on a separate line, delimited by a line break (CRLF). For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF
                            
  2. The last record in the file may or may not have an ending line break. For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx
                            
  3. There may be an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional "header" parameter of this MIME type). For example:

    field_name,field_name,field_name CRLF
    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF
                            
  4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example:

    aaa,bbb,ccc
                            
  5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx
                            
  6. Fields containing line breaks (CRLF), double quotes, or commas should be enclosed in double quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx
                            
  7. If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"
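
The quoting and escaping rules above (5 to 7) can be sketched with a short round trip through Python's standard `csv` module. This is an illustrative example, not Delimited's implementation, and it assumes the module's default dialect matches the rules for these particular cases; the `records` data is hypothetical.

```python
import csv
import io

# Hypothetical records containing the characters that force quoting.
records = [
    ["aaa", "b\r\nbb", "ccc"],   # rule 6: embedded line break (CRLF)
    ["zzz", 'y"yy', "x,xx"],     # rules 6 and 7: embedded double quote and comma
]

# Write with CRLF record terminators, quoting only the fields that need it.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerows(records)
encoded = buffer.getvalue()

# Parse the text back; newline="" preserves line breaks inside quoted fields.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
```

In `encoded`, the field `y"yy` is written as `"y""yy"`, with the inner double quote doubled as rule 7 describes, and parsing the text back yields the original records unchanged.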