
How to concatenate lines in a text file?

2019-03-04 python

Question

I have been tasked with joining the text in a text file, but nothing I try works. I tried split, but it needs a string and not a list, and join doesn't help me at all, since I already have code that does that job.

The text file with the words is as follows (filename = demo_fasta_file_2019.fsa):

>sequence_1
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
>sequence_2
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
>sequence_3
TTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAA
TTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAA
>sequence_4
GGTTAACCATGGATC

The code that I have is as follows:

#def Read_FastA_Names_And_Sequences(filepath):

#############
filepath=str("demo_fasta_file_2019.fsa")
##sequence_names,sequences = Read_FastA_Names_And_Sequences(filepath)
sequence_names=[]
sequences=[]
number_of_sequences=4
#############
textfile = open(filepath, 'r')

sequence = textfile.readlines()

for i in sequence:
    if i.__contains__('>'):
        a=i[1:]
        sequence_names.append(a[:a.__len__()-1])
    i=+1
print(sequence)
#list1 = sequence
#s = "\n"
#s = s.join(list1)
#print(s)
list2 = sequence
words2 = list2.split(">")
print(words2)

So my question is: how can I join only the sequence text, without the >sequence_1, >sequence_2, >sequence_3 and >sequence_4 header lines?

Solution

You can use a generator expression to skip the lines that start with > and pass the remaining lines to str.join to concatenate them:

with open("demo_fasta_file_2019.fsa") as f:
    print(''.join(line for line in f if not line.startswith('>')))
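
The one-liner above keeps each line's trailing newline and does not record which sequence a line belongs to. The question's commented-out Read_FastA_Names_And_Sequences suggests that names and sequences are wanted as two parallel lists, so the following is a minimal sketch of that idea, not a definitive FASTA parser. The function name read_fasta_names_and_sequences and the shortened demo lines are assumptions made for illustration. It also avoids the original error: readlines() returns a list, and a list has no split method, so list2.split(">") raises AttributeError.

```python
from typing import Iterable, List, Tuple


def read_fasta_names_and_sequences(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect header names and their joined sequences from FASTA-style lines.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    names: List[str] = []
    chunks: List[List[str]] = []  # one list of sequence lines per header
    for raw in lines:
        line = raw.strip()  # drop the trailing newline and surrounding whitespace
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            names.append(line[1:])  # header name without the leading '>'
            chunks.append([])       # start collecting a new sequence
        elif chunks:
            chunks[-1].append(line)  # lines before the first header are ignored
    sequences = ["".join(parts) for parts in chunks]
    return names, sequences


# Shortened demo data (an assumption, not the full file from the question).
demo = [
    ">sequence_3\n",
    "TTTTGGAAAA\n",
    "TTTTGGAAAA\n",
    ">sequence_4\n",
    "GGTTAACCATGGATC\n",
]
names, sequences = read_fasta_names_and_sequences(demo)
print(names)
print(sequences)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example inside a with open("demo_fasta_file_2019.fsa") as f: block, which also makes sure the file is closed afterwards.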