> I wrote a Perl script, tailored to their format, to check
> for duplicate or out of order bars. I append it below, though with
> other data, you may need to alter the script for different formats.
Sanford is obviously a much studlier Perl hacker than I am. :-)
I have a rather more verbose script that works for TS-format
data, which I've improved based on ideas from his script.
Gary
==========================================================
# Check standard TS-format data for duplicate or out-of-order lines.
# Expected data format: date,time,(etc), with date as MM/DD/YYYY
# Handles 1-digit MM & DD, 2-digit YYYY, and :'s in time
# Usage: perl datatest.pl < inputfile
$errors = 0;
$lastdatenum = 0;
$lasttime = 0;
$linenum = 0;
while (defined($line = <STDIN>)) {
    $linenum++;
    if (index(lc($line), "date") >= 0) {
        # skip header line, if any
    }
    else {
        ($date, $time, $extra) = split(/,/, $line);
        # Convert $date to YYYYMMDD format
        $date =~ m;^(\d+)/(\d+)/(\d+)$; or die "Bad date in line $linenum: $line";
        $mm = $1; $dd = $2; $yy = $3;
        # Expand 2-digit years: 00-19 become 20xx, 20-99 become 19xx
        if ($yy < 100) { $yy += ($yy < 20 ? 2000 : 1900); }
        $datenum = sprintf("%04d%02d%02d", $yy, $mm, $dd);
        $time =~ s/://g; # remove all :'s from time (HH:MM or HH:MM:SS)
        if ($datenum < $lastdatenum) {
            print "Date out of order in line $linenum: \n $lastline $line";
            $errors++;
        }
        if (($datenum == $lastdatenum) && ($time == $lasttime)) {
            print "Repeated time in line $linenum:\n $lastline $line";
            $errors++;
        }
        if (($datenum == $lastdatenum) && ($time < $lasttime)) {
            print "Time out of order in line $linenum:\n $lastline $line";
            $errors++;
        }
    }
    # Remember this line so the next one can be compared against it
    $lastline = $line;
    $lastdatenum = $datenum;
    $lasttime = $time;
}
if ($errors == 0) { print "No errors found!\n"; }
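
If you want to reuse the date and time handling in another script, here is a minimal sketch of those two steps pulled out as subs. The names normalize_date and normalize_time are made up for illustration and are not part of the script above. The 2-digit-year pivot at 20 is the same one the script uses, and I'm assuming any year below 100 is a 2-digit year.

```perl
use strict;
use warnings;

# Hypothetical helper: turn MM/DD/YYYY (or M/D/YY) into a sortable
# YYYYMMDD string. Returns undef if the field is not a date.
sub normalize_date {
    my ($date) = @_;
    my ($mm, $dd, $yy) = $date =~ m{^(\d+)/(\d+)/(\d+)$}
        or return;
    # Assumed pivot, same as the script: 00-19 -> 20xx, 20-99 -> 19xx
    $yy += ($yy < 20 ? 2000 : 1900) if $yy < 100;
    return sprintf("%04d%02d%02d", $yy, $mm, $dd);
}

# Hypothetical helper: strip every colon so HH:MM and HH:MM:SS
# can be compared as plain numbers.
sub normalize_time {
    my ($time) = @_;
    (my $t = $time) =~ s/://g;
    return $t;
}

print normalize_date("1/5/98"), "\n";      # prints 19980105
print normalize_date("12/31/2003"), "\n";  # prints 20031231
print normalize_time("09:30"), "\n";       # prints 0930
```

Because the result is zero-padded YYYYMMDD, a plain numeric < or == comparison gives the right chronological order, which is what the checks in the script rely on.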