
unix sort is giving different results for different users

2019-03-09 · tags: sorting, unix, csh

Question

I wanted to sort a file named reportA with the following contents:

pat_int_parallel_all


/projects/test
-v ../../../../../../te
min_custom.v
-v ../../../../../../tes
-y ../../../../../../test_
-y ../../../../../../test_lib/test
../../../../../../tesla
/projects/checklist
../../../../../../test_lib/LIB
../../../../../../telib/av
../../../../../../telib/te
+libext+.v
+incdir+/projectsst_relea/ana

When I ran sort -u -r reportA > output, I got this result:

-y ../../../../../../test_lib/test
-y ../../../../../../test_
-v ../../../../../../tes
-v ../../../../../../te
../../../../../../test_lib/LIB
../../../../../../test
../../../../../../telib/te
../../../../../../telib/av
/projects/test
/projects/checklist
pat_int_parallel_all
min_custom.v
+libext+.v
+incdir+/projectsst_relea/ana

My locale output shows en_US:

LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

But for another user, the same sort command produced different output:

pat_int_parallel_all
min_custom.v
/projects/test
/projects/checklist
../../../../../../test_lib/LIB
../../../../../../tesla
../../../../../../telib/te
../../../../../../telib/av
-y ../../../../../../test_lib/test
-y ../../../../../../test_
-v ../../../../../../tes
-v ../../../../../../te
+libext+.v
+incdir+/projectsst_relea/ana

My friend's locale output shows C:

LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C 

I was wondering why a plain Unix sort command gives two different results when my sort alias and shell version are the same as the other user's. Even the .cshrc settings are the same. Is it due to the special characters?

Can someone explain what's wrong here?

Solution

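The root cause of the different behavior of sort is the value of LC_COLLATE. Yours is "en_US", while your friend's is "C" (and your friend also has LC_ALL=C, which overrides all the other LC_* variables). The locale(7) man page (man 7 locale) says: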

LC_COLLATE

This category governs the collation rules used for sorting and regular expressions, including character equivalence classes and multicharacter collating elements. This locale category changes the behavior of the functions strcoll(3) and strxfrm(3), which are used to compare strings in the local alphabet. For example, the German sharp s is sorted as "ss".

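My (very quick) analysis of the sort source code is that it transforms the lines to be sorted with strxfrm() to get a basis of comparison. As a result, byte strings that differ can still compare as equal, or compare in a different order than their raw bytes would suggest.

That is also what the special characters are doing here. With LC_COLLATE=en_US, punctuation such as +, -, . and / carries little or no weight at the first comparison level. Your lines are therefore ordered mostly by their letters: ytest..., vte..., test..., tel..., projects..., pat..., min..., libext..., incdir..., in reverse. With LC_COLLATE=C, lines are compared byte by byte in ASCII order, where + (0x2B) < - (0x2D) < . (0x2E) < / (0x2F) < lowercase letters. That is why your friend's output is grouped by the first character of each line.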
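The effect can be reproduced with a few of the lines from reportA. The following is a minimal POSIX sh sketch, not a definitive test. It runs the same sort -ru under the C locale. Then, only if locale -a lists it, it runs the same command under en_US.UTF-8. That locale name is an assumption; glibc systems usually list it as en_US.utf8. The commands use env rather than the VAR=value prefix, so they can also be typed at a csh prompt.

```sh
#!/bin/sh
# A handful of lines taken from reportA.
lines='pat_int_parallel_all
min_custom.v
/projects/test
../../../../../../tesla
-v ../../../../../../te
+libext+.v'

# C locale: plain byte-by-byte (ASCII) comparison.
echo "== LC_ALL=C =="
printf '%s\n' "$lines" | env LC_ALL=C sort -ru

# en_US: locale-aware collation, run only if such a locale is installed
# (the exact locale name is an assumption; check `locale -a`).
if locale -a 2>/dev/null | grep -qi '^en_US\.utf-\{0,1\}8$'; then
  echo "== LC_ALL=en_US.UTF-8 =="
  printf '%s\n' "$lines" | env LC_ALL=en_US.UTF-8 sort -ru
fi
```

Under the C locale this prints the following lines in this order:
pat_int_parallel_all
min_custom.v
/projects/test
../../../../../../tesla
-v ../../../../../../te
+libext+.v

This is the same relative order as in your friend's output. The en_US result depends on the system's locale data, but it should follow the letters, much like your own output.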

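As @Amadan said, the fact that you still get the same output is quite strange. Are you sure you have set the locale properly? Could you try LC_COLLATE="C" sort -ru your_file? Note that this VAR=value command prefix only works in Bourne-style shells such as sh and bash. In csh/tcsh, use env LC_COLLATE=C sort -ru your_file instead.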
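Since the question mentions csh, here is a small sketch of applying the fix for a single command. It is a slight variant of the suggestion above: it sets LC_ALL=C rather than LC_COLLATE=C, because a non-empty LC_ALL overrides LC_COLLATE. The file names reportA.demo and output.demo are hypothetical stand-ins that keep the example self-contained.

```sh
#!/bin/sh
# Show which settings sort will see: a non-empty LC_ALL overrides
# LC_COLLATE, which in turn overrides LANG.
echo "LC_ALL=${LC_ALL-}  LC_COLLATE=${LC_COLLATE-}  LANG=${LANG-}"

# Hypothetical stand-in for reportA (assumption, for a self-contained demo).
printf '%s\n' '+libext+.v' '-y ../../../../../../test_' 'min_custom.v' > reportA.demo

# `env` sets the variable for this one command only. Unlike the sh/bash
# prefix form `LC_COLLATE=C sort ...`, it also works in csh/tcsh.
# LC_ALL=C is used so that an inherited LC_ALL cannot override it.
env LC_ALL=C sort -ru reportA.demo > output.demo
cat output.demo
```

In reverse byte order this prints min_custom.v, then -y ../../../../../../test_, then +libext+.v. There are two ways to make the C ordering the default for every command, provided LC_ALL stays empty:

- In csh, add a line such as setenv LC_COLLATE C to ~/.cshrc.
- In sh/bash, use export LC_COLLATE=C.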
