Thread: Hi
September 4th, 2009, 10:06 AM
allenbir

Hi,

I have two files.

File 1 has entries like this:
PF00912
PF00913
PF00914
PF00915
PF00916
..
..

File 2 has entries like this:

>128UP_DROME |==============================================| P32234.2 368 a.a.
MMR_HSR1 1 ______________ (9407) PF01926.15 GTPase of unknown function 65-176
TGS 1 _________ (2078) PF02824.13 TGS domain 292-366

>12KD_FRAAN |=====================================| Q05349.1 111 a.a.
Auxin_repressed 1 __________________________________ (61) PF05564.4 Dormancy/auxin associated protein 7-110

>12S1_ARATH |================================================| P15455.2 472 a.a.
Cupin_1 2 _______________ _______________ (1556) PF00190.14 Cupin 41-199 295-444

>12S2_ARATH |==============================================| P15456.2 455 a.a.
Cupin_1 2 ________________ _______________ (1556) PF00190.14 Cupin 35-192 282-431

>12S_PROFR |===============================================| Q8GBW6.3 611 a.a.
Carboxyl_trans 1 ______________________________________ (2491) PF01039.14 Carboxyl transferase domain 34-522

>13S1_FAGES |================================================| O23878.1 565 a.a.
Cupin_1 2 __________________ ____________ (1556) PF00190.14 Cupin 49-275 390-539
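
To make the File 2 layout concrete, here is a minimal Python sketch of how these blocks could be read. It is only an illustration under some assumptions that are not stated above: the name to keep is the accession after the |====| bar with its version suffix dropped (P32234.2 becomes P32234), each domain line carries exactly one PF accession, and the number ranges are always the last fields on the line. The names parse_file2, HEADER_RE, PFAM_RE and RANGES_RE are made up for this example.

```python
import re

# Header line: ">NAME |=====| ACCESSION.VERSION LENGTH a.a."
HEADER_RE = re.compile(r"^>(\S+)\s+\|=+\|\s+(\S+?)(?:\.\d+)?\s+(\d+)\s+a\.a\.")
# Pfam accession such as PF01926.15 (the version suffix is not captured)
PFAM_RE = re.compile(r"\b(PF\d{5})(?:\.\d+)?\b")
# One or more start-end ranges at the very end of a domain line
RANGES_RE = re.compile(r"((?:\s+\d+-\d+)+)\s*$")


def parse_file2(lines):
    """Yield (accession, {pfam_id: [ranges]}) for each '>' block."""
    accession, domains = None, {}
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous block, if there was one.
            if accession is not None:
                yield accession, domains
            m = HEADER_RE.match(line)
            accession = m.group(2) if m else None
            domains = {}
        elif line and accession is not None:
            pf = PFAM_RE.search(line)
            if pf:
                tail = RANGES_RE.search(line[pf.end():])
                ranges = tail.group(1).split() if tail else []
                domains.setdefault(pf.group(1), []).extend(ranges)
    # Do not forget the last block in the file.
    if accession is not None:
        yield accession, domains
```

With a real file this could be called as `for acc, doms in parse_file2(open("file2.txt")): ...`, where file2.txt is a placeholder name.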

The problem is that I need to take one entry at a time from File 1 (e.g. PF00912) and match it against File 2. If a match is found, it should do the following:

1. Take one entry from File 1, say PF00912, and look it up in File 2.
2. Take only those entries in File 2 (each running from one > to the next >) that contain PF00912.
3. In File 2, look at the position after the |=====| bar, e.g. P32234, and store this name.
4. Look for the PF* entries and the number ranges like 65-176 (they can be any numbers) under that same P32234, then classify accordingly:

a. If there is only one PF* and only one range (e.g. 65-176), put P32234 into bin 1.

b. If there is more than one PF* and only one range for each PF*, put P32234 into bin 2.

c. If there is only one PF* and more than one range, or more than one PF* and more than one range, put P32234 into bin 3.

d. If there is only one PF* and more than 5 ranges, put P32234 into bin 4.

* Note: 65-176 is only an example; the range can be any numbers.

5. Finally, we need to count the number of entries in each bin.
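
The steps above could be sketched in Python roughly like this. It is only a sketch and makes several assumptions that the rules leave open: each File 2 block has already been reduced to an accession plus a mapping from PF ID to its list of ranges; rule d is checked before rule c, since one PF* with more than 5 ranges would otherwise also fit rule c; with more than one PF*, a single PF* having more than one range is enough for bin 3; and a protein that matches several File 1 IDs is counted only once. The names classify and fill_bins are made up for this example.

```python
def classify(domains):
    """Return the bin number (1-4) for one protein, or None if no rule fits.

    domains maps a PF ID to the list of ranges found for it.
    """
    counts = [len(ranges) for ranges in domains.values()]
    if not counts:
        return None
    if len(counts) == 1:
        n = counts[0]
        if n > 5:
            return 4  # rule d, checked before rule c (assumption)
        if n > 1:
            return 3  # rule c, single PF*
        return 1 if n == 1 else None  # rule a
    if any(c > 1 for c in counts):
        return 3  # rule c, several PF* (assumption: one multi-range PF* suffices)
    if all(c == 1 for c in counts):
        return 2  # rule b
    return None


def fill_bins(records, wanted_ids):
    """Group accessions by bin, keeping only proteins with a File 1 PF ID."""
    wanted = set(wanted_ids)
    bins = {1: [], 2: [], 3: [], 4: []}
    for accession, domains in records:
        # Skip blocks that contain none of the File 1 entries.
        if wanted.isdisjoint(domains):
            continue
        b = classify(domains)
        if b is not None:
            bins[b].append(accession)
    return bins


def count_bins(bins):
    """Step 5: number of entries in each bin."""
    return {b: len(accessions) for b, accessions in bins.items()}
```

For example, `count_bins(fill_bins(records, ["PF01926", "PF00190"]))` would give the per-bin totals for those two IDs.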

Thanks in advance

allen