How to determine whether the fields of a CSV file are tab-delimited or comma-delimited
2022-08-30 23:55:10
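How can I determine whether the fields of a CSV file are tab-delimited or comma-delimited? I need to validate this in PHP. Can anyone help? Thanks in advance.
It is rather late to answer this question, but I hope it helps someone else.
Below is a simple function that returns the delimiter of a file.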
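function getFileDelimiter($file, $checkLines = 2){
    $file = new SplFileObject($file);
    // Double quotes so that "\t" is a real tab character
    $delimiters = array(
        ",",
        "\t",
        ";",
        "|",
        ":"
    );
    $results = array();
    $i = 0;
    // Inspect at most $checkLines lines
    while($file->valid() && $i < $checkLines){
        $line = $file->fgets();
        foreach ($delimiters as $delimiter){
            // Escape the delimiter so that characters such as | are matched literally
            $regExp = '/'.preg_quote($delimiter, '/').'/';
            $fields = preg_split($regExp, $line);
            if(count($fields) > 1){
                // Count the lines in which this delimiter occurs
                if(!empty($results[$delimiter])){
                    $results[$delimiter]++;
                } else {
                    $results[$delimiter] = 1;
                }
            }
        }
        $i++;
    }
    // None of the candidates occurred in the inspected lines
    if(empty($results)){
        return null;
    }
    $results = array_keys($results, max($results));
    return $results[0];
}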
Use this function as shown below:
$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
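Once a delimiter has been detected, it can be passed to PHP's built-in CSV reader. The following is only a minimal sketch, assuming the delimiter is a single character; `readCsvRows` is a hypothetical helper name and not part of the answer above.

```php
<?php
// Hypothetical helper: read every row of a delimited file into an array.
// $delimiter is whatever the detection step returned, e.g. "," or "\t".
function readCsvRows(string $path, string $delimiter): array
{
    $rows = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Enclosure and escape are passed explicitly instead of relying on defaults.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}
```

Note that fgetcsv() only accepts a single-character delimiter, so the value passed in must be a real tab ("\t" in double quotes) rather than the two-character string '\t'.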
P.S. I used preg_split() instead of explode() because explode('\t', $value) does not give the correct result.
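In PHP, escape sequences such as \t are only interpreted inside double-quoted strings, which is the likely reason explode('\t', ...) appears not to work. A small sketch with made-up sample data:

```php
<?php
$value = "id\tname\temail"; // sample line containing two real tab characters

// Single quotes: '\t' is the two characters backslash and t, so nothing matches.
$single = explode('\t', $value);

// Double quotes: "\t" is one tab character, so the line splits into three fields.
$double = explode("\t", $value);

var_dump(count($single)); // int(1)
var_dump(count($double)); // int(3)
```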
Update: thanks to @RichardEB for pointing out a bug in the code. I have now updated it.
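This is what works for me.
It won't work 100% of the time, but it is a decent starting point. At the very least it reduces the number of possible delimiters (making it easier for the user to choose the correct one).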
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab'       => "\t",
                    'comma'     => ",",
                    'semicolon' => ";"
                    );
# $file is the path to the CSV file
$handle = file( $file, FILE_IGNORE_NEW_LINES );  # Grabs the CSV file, loads into array
$line = array();             # Stores the count of delimiters in each row
$valid_delimiter = array();  # Stores Valid Delimiters
# Count the number of Delimiters in Each Row
# (rows 1 to 5; row 0 is skipped, and the loop stops early on short files)
$last = min( 6, count( $handle ) );
for ( $i = 1; $i < $last; $i++ ){
    foreach ( $delimiters as $key => $value ){
        $line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
    }
}
# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){
    # Check that the first two values exist and are not 0
    if ( isset( $count[1], $count[2] ) and $count[1] > 0 and $count[2] > 0 ){
        $match = true;
        $prev_value = null;
        foreach ( $count as $value ){
            if ( $prev_value !== null )
                $match = ( $prev_value == $value and $match );
            $prev_value = $value;
        }
    } else {
        $match = false;
    }
    if ( $match ) $valid_delimiter[] = $delimiter;
}//foreach
# Set Default delimiter to comma
$delimiter = !empty( $valid_delimiter ) ? $valid_delimiter[0] : "comma";
/*  !!!! This is good enough for my needs since I have the priority set to "tab"
    !!!! but you will want to have the user select from the delimiters in $valid_delimiter
    !!!! if multiple delimiter counts match
*/
# The Delimiter for the CSV
echo $delimiters[$delimiter];
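The same consistency check can also be wrapped in a function that returns every candidate, so the caller (or the user) can pick one. This is only a sketch under stated assumptions: `candidateDelimiters` is a hypothetical name, the sample lines are made up, and str_getcsv() is swapped in for explode() so that a delimiter inside a quoted field is not counted.

```php
<?php
// Hypothetical helper: return the names of all delimiters that split every
// sampled line into the same number of fields (and into more than one field).
function candidateDelimiters(array $lines, array $delimiters): array
{
    $candidates = [];
    foreach ($delimiters as $name => $char) {
        $counts = [];
        foreach ($lines as $line) {
            if (trim($line) === '') {
                continue; // ignore blank lines
            }
            // str_getcsv respects quoted fields, unlike a plain explode()
            $counts[] = count(str_getcsv($line, $char, '"', '\\'));
        }
        // Keep the delimiter when at least one line was seen, all field
        // counts agree, and each line has more than one field.
        if ($counts !== [] && count(array_unique($counts)) === 1 && $counts[0] > 1) {
            $candidates[] = $name;
        }
    }
    return $candidates;
}

$delimiters = ['tab' => "\t", 'comma' => ",", 'semicolon' => ";"];
$sample = [
    "name;city;note",
    "Ann;Oslo;\"likes a, b\"",
    "Bob;Rome;none",
];
print_r(candidateDelimiters($sample, $delimiters));
```

The order of the `$delimiters` array still acts as the search priority: when several delimiters qualify, they are returned in that order.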