A Poor Man’s CSV Parser (in Java)

For one of my personal projects, I needed a small piece of code to load a well-formed CSV file into memory. Since I didn’t want to depend on external libraries, I wrote a small “Poor Man’s CSV parser” that still handles most of the CSV rules (double quotes in quoted fields, optional quoting, etc.).

The code works well on input like the examples below, but it clearly needs better error handling: if the CSV file is malformed, the parser does not detect that yet. On the other hand, that’s nothing that cannot be handled with a few flags and if-else statements in various corners :)
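
The flags mentioned above might look roughly like the following sketch. It is not part of the original parser: the class name `CsvLineCheck` and the method `firstProblem` are made up for illustration, and it only covers two kinds of malformed input, namely a quoted field that is never closed and stray characters after a closing quote.

```java
import java.util.Optional;

// Hypothetical helper, not part of PoorMansCSVParser.
final class CsvLineCheck {

    /**
     * Returns a description of the first problem found in the line,
     * or an empty Optional if none of the checked problems occurs.
     */
    static Optional<String> firstProblem(String line) {
        boolean inQuotes = false;   // inside a quoted section
        boolean afterClose = false; // a quoted section was just closed

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        i++; // doubled quote: escaped, stay inside the field
                    } else {
                        inQuotes = false;
                        afterClose = true;
                    }
                }
            } else if (c == ',') {
                afterClose = false; // field separator, next field starts
            } else if (afterClose) {
                return Optional.of("unexpected character after closing quote at index " + i);
            } else if (c == '"') {
                inQuotes = true;
            }
        }
        return inQuotes ? Optional.of("unterminated quoted field") : Optional.empty();
    }
}
```

A caller could run this check before parsing and skip or report lines for which the returned Optional is not empty. Stricter rules, such as rejecting a quote in the middle of an unquoted field, would need one more flag.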

Two example runs (↙ marks a line that is wrapped for display only, ↵ marks a real line break inside a field):

-- Input:

>Lorem ipsum,dolor sit amet,"consectetur adipiscing, elit sed do eiusmod tempor", incididunt ut labore,"et dolore ""magna"" aliqua,↙
 Ut enim", ad minim,"",veniam quis nostrud, "exercitation↵ullamco"<

-- Output:

>Lorem ipsum<

>dolor sit amet<

>consectetur adipiscing, elit sed do eiusmod tempor<

> incididunt ut labore<

>et dolore "magna" aliqua, Ut enim<

> ad minim<

><

>veniam quis nostrud<

> exercitation
ullamco<

━━━━━━━━━━━━

-- Input:

>"Lorem ipsum,dolor sit amet,",consectetur adipiscing,"elit ,,sed, do eiusmod tempor, ""incid,,,idunt"" ut labore,et dolore magna aliqua,↙
 ",Ut enim, ad minim,",veniam ""quis"" nostrud,",",exercitation ullamco"<

-- Output:

>Lorem ipsum,dolor sit amet,<

>consectetur adipiscing<

>elit ,,sed, do eiusmod tempor, "incid,,,idunt" ut labore,et dolore magna aliqua, <

>Ut enim<

> ad minim<

>,veniam "quis" nostrud,<

>,exercitation ullamco<

The code:

package be.imifos.csvparser;

import java.util.ArrayList;
import java.util.List;

/**
 * Parses a single line of a CSV file.
 * Licence: Wineware, Freeware, use it at your own risk.
 *
 * https://en.wikipedia.org/wiki/Comma-separated_values
 * - CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
 * - A record ends at a line terminator. However, line-terminators can be embedded as data within fields, so software must recognize
 *   quoted line-separators in order to correctly assemble an entire record from perhaps multiple lines.
 *   (Note: this has to be handled by the loader, not by the line parser.)
 * - All records should have the same number of fields, in the same order.
 * - Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes.
 * - Adjacent fields must be separated by a single comma.
 * - Any field may be quoted. Some fields must be quoted, as specified in following rules.
 * - Fields with embedded commas or double-quote characters must be quoted.
 * - Fields with embedded line breaks must be quoted.
 * - In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
 * - Double quotes are not allowed in unquoted fields.
 */
public class PoorMansCSVParser {

    /**
     * Parses a line of a CSV file and adds the fields to the list passed as input.
     */
    public static List<String> parse(String line, List<String> target) {

        StringBuilder sb = new StringBuilder();

        // True while the current position is inside a quoted section.
        boolean betweenQuotes = false;

        for (int i = 0; i < line.length(); i++) {

            char c = line.charAt(i);

            if (betweenQuotes) {
                if (c != '"') {
                    sb.append(c); // Commas and line breaks are plain data here
                }
                else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // Doubled quote: one literal quote. Looking ahead also handles
                    // fields that start with an escaped quote, e.g. """hello""".
                    sb.append('"');
                    i++; // Skip the second quote of the pair
                }
                else {
                    betweenQuotes = false; // Closing quote
                }
            }
            else if (c == '"') {
                betweenQuotes = true; // Opening quote, not part of the data
            }
            else if (c == ',') {
                target.add(sb.toString()); // End of field
                sb.setLength(0);
            }
            else {
                sb.append(c);
            }
        }

        target.add(sb.toString()); // Last field of the line

        return target;
    }

    /**
     * Parses a line of a CSV file and returns the fields in a List.
     */
    public static List<String> parse(String line) {
        List<String> target=new ArrayList<>();
        return parse(line,target);
    }


    /**
     * Entry Point for testing.
     */
    public static void main(String [] args) {

        String str = "Lorem ipsum,dolor sit amet,\"consectetur adipiscing, elit sed do eiusmod tempor\", incididunt ut labore,\"et dolore \"\"magna\"\" aliqua, Ut enim\", ad minim,\"\",veniam quis nostrud, \"exercitation\nullamco\"";

        System.out.println("All   :>"+str+"<");
        for (String s:parse(str))
            System.out.println("Field:>"+s+"<");

        str = "\"Lorem ipsum,dolor sit amet,\",consectetur adipiscing,\"elit ,,sed, do eiusmod tempor, \"\"incid,,,idunt\"\" ut labore,et dolore magna aliqua, \",Ut enim, ad minim,\",veniam \"\"quis\"\" nostrud,\",\",exercitation ullamco\"";

        System.out.println("All   :>"+str+"<");
        for (String s:parse(str))
            System.out.println("Field:>"+s+"<");
    }
}
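
The Javadoc above notes that records containing quoted line breaks have to be assembled by the loader before the line parser sees them. A minimal sketch of that step, assuming the physical lines have already been read into a list: the class name `CsvRecordAssembler` and the method `joinRecords` are hypothetical and not part of the original code. It relies on the fact that in well-formed CSV a complete record always contains an even number of double quotes, since an escaped quote is written as two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader helper, not part of PoorMansCSVParser.
final class CsvRecordAssembler {

    /**
     * Joins physical lines into logical records. A record is complete
     * once the number of double quotes seen so far is even.
     */
    static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean open = false; // true while a quoted field continues on the next line

        for (String line : physicalLines) {
            if (open) {
                pending.append('\n'); // restore the line break between the physical lines
            }
            pending.append(line);
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    open = !open;
                }
            }
            if (!open) {
                records.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (open) {
            // Input ended inside a quoted field: keep what was collected.
            records.add(pending.toString());
        }
        return records;
    }
}
```

Each returned record can then be passed to `PoorMansCSVParser.parse(...)`. The physical lines could come from `java.nio.file.Files.readAllLines(path)`, for instance. Note that the original line terminator is replaced by `\n` in this sketch.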
