python - Try opening a file as an archive, otherwise read as a regular file -


I'm trying to process a list of files, where each can be either a regular text file or a bz2 archive.

How can I most efficiently use a try-except block to attempt to open each file in the appropriate format? I would rather not check the file's extension, because it can't always be relied upon (and isn't very EAFP).

Currently I have:

    import bz2
    import codecs

    def data_generator(*corpora):
        def parse_lines(fobj):
            for line in fobj:
                # Do lots of processing.
                # ...
                # Many lines later:
                yield ('lots', 'of', 'data')

        for corpus in corpora:
            try:
                # Try the file as a bz2 archive first.
                with bz2.BZ2File(corpus, mode='r') as f:
                    for data in parse_lines(f):
                        yield data
            except IOError:
                # Not bz2: fall back to reading it as UTF-8 text.
                with codecs.open(corpus, encoding='utf-8') as f:
                    for data in parse_lines(f):
                        yield data

I think the repeated "for data in parse_lines(f): ..." loop looks redundant, but I can't think of a way to get rid of it. Is there any way to reduce the repetition, or is there another way to try to "smart open" a file?

Edit: Optional Followup

What would be a suitable way to scale up the number of formats checked? As an example, 7zip lets you right-click on any file and try to open it as an archive (of any type 7zip supports). With the current try-except strategy, it seems you would start to get stuck in nested blocks after just a few formats, such as:

    try:
        f = ...
    except IOError:
        try:
            f = ...
        except IOError:
            try:
                ...
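One way to avoid that nesting is to loop over a sequence of candidate openers and keep the first one that can actually read the file. The following is only a sketch under some assumptions: the name smart_open is hypothetical, it targets Python 3, and gzip is included purely to illustrate a second format. Because the decompressing file objects only validate the data when it is first read, each candidate is probed with a one-byte read.

```python
import bz2
import gzip
import io


def smart_open(path, encoding='utf-8'):
    """Return a text-mode file object for path (hypothetical helper).

    Each compressed format is tried in order; plain text is the fallback.
    """
    for opener in (bz2.open, gzip.open):
        f = opener(path, 'rb')
        try:
            # Opening alone does not validate the format, so force a read.
            f.read(1)
        except (OSError, EOFError):
            f.close()
            continue  # Not this format; try the next one.
        f.seek(0)  # Rewind so the caller sees the whole stream.
        return io.TextIOWrapper(f, encoding=encoding)
    return open(path, encoding=encoding)
```

With this in place, the generator would need only one "with smart_open(corpus) as f:" block, and supporting another format means adding one entry to the tuple rather than another level of nesting.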

If it is really just the duplicated loop that worries you, you can move f out of the try-except block and write the loop once afterwards:

    try:
        f = bz2.BZ2File(corpus, mode='r')
    except IOError:
        f = codecs.open(corpus, encoding='utf-8')
    # Caveat: BZ2File only validates the data on the first read, so a
    # non-bz2 file raises IOError inside the loop below rather than in
    # the try block above.
    for data in parse_lines(f):
        yield data
    f.close()

However, I would look into opening the file only once, checking for the bz2 header (the first two bytes being the characters BZ), and using that to decide whether to continue reading it as plain text or to pass the data to a bz2.BZ2Decompressor instance.
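A minimal sketch of that header check might look like the following. The function name iter_lines and the chunk size are assumptions made for illustration, and the code targets Python 3. It also assumes a single-stream bz2 file, and the two-byte check is only a heuristic: a plain text file that happens to begin with "BZ" would be misdetected and fail during decompression.

```python
import bz2
import codecs


def iter_lines(path, encoding='utf-8', chunk_size=64 * 1024):
    """Yield text lines from path, which may be plain text or bz2.

    Hypothetical helper: the file is opened once and sniffed by header.
    """
    with open(path, 'rb') as raw:
        is_bz2 = raw.read(2) == b'BZ'  # bz2 streams start with 'BZ'
        raw.seek(0)
        decompressor = bz2.BZ2Decompressor() if is_bz2 else None
        # Incremental decoding copes with multi-byte characters that are
        # split across chunk boundaries.
        decoder = codecs.getincrementaldecoder(encoding)()
        pending = ''
        for chunk in iter(lambda: raw.read(chunk_size), b''):
            if decompressor is not None:
                if decompressor.eof:
                    break  # Ignore anything after the first stream.
                chunk = decompressor.decompress(chunk)
            pending += decoder.decode(chunk)
            lines = pending.split('\n')
            pending = lines.pop()  # Keep the trailing partial line.
            for line in lines:
                yield line + '\n'
        pending += decoder.decode(b'', final=True)
        if pending:
            yield pending
```

The generator in the question could then call parse_lines(iter_lines(corpus)) once per file, with no try-except block at all.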
