application_development thread: large text files processing in .NET


Message #1 by "come2study" <come2study@y...> on Mon, 17 Mar 2003 12:11:42
Hi,
We are designing a system that reads data from large text files
(gigabytes of data!), processes it and stores it in a database. The
application is in .NET. Has anyone here worked with text files this
large? Can you please tell me which I/O method in .NET gives the best
performance? If you have come across any case studies on this for .NET,
please let me know the URL. Please help.

Thanks.
Message #2 by Tim.Musschoot@f... on Mon, 17 Mar 2003 13:20:35 +0100
Well, I suggest you use native C++ code for this.  It lets you program
at a much lower level, which can seriously improve performance.

Also, whatever language you use, do NOT use techniques that buffer too much
data.  I recommend splitting the data into blocks.  In theory, the ideal
block size is a multiple of your hard disk's physical block size...

What worries me when I read your question is .NET's garbage collection.
When you read multiple GB of data, it all ends up cached in buffers that
the garbage collector releases only long after you no longer need the
data.  I have serious doubts about the performance of this method...

HTH,
Tim
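
[Editor's sketch] The block-splitting advice above can also be followed
inside .NET. The following is a minimal sketch, not a tuned implementation:
it reads a file in fixed-size blocks into a single buffer that is allocated
once and reused, so the garbage collector has almost nothing to reclaim per
block. The class name BlockReader, the 64 KB block size and the
newline-counting "processing" are assumptions made for illustration; the
FileOptions overload assumes .NET 2.0 or later.

```csharp
using System;
using System.IO;

static class BlockReader
{
    // Assumed block size: 64 KB, a multiple of the common 512-byte and
    // 4 KB disk sector sizes. Measure on your own hardware before fixing it.
    const int BlockSize = 64 * 1024;

    // Reads the file block by block and returns the number of '\n' bytes.
    // Counting newlines is only a placeholder for the real processing.
    public static long CountNewlines(string path)
    {
        byte[] buffer = new byte[BlockSize]; // allocated once, reused per block
        long newlines = 0;

        using (FileStream stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            BlockSize, FileOptions.SequentialScan))
        {
            int bytesRead;
            // Read returns 0 at end of file; it may return fewer bytes
            // than requested, so only the first bytesRead bytes are valid.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == (byte)'\n') newlines++;
                }
            }
        }
        return newlines;
    }
}
```

Calling BlockReader.CountNewlines(path) keeps memory use at roughly one
block regardless of the file size, which is the point of the advice.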

Message #3 by "come2study" <come2study@y...> on Tue, 18 Mar 2003 04:39:01
Thanks a lot for the reply. I can only use .NET; I don't have the
option to change the platform. Yes, garbage collection is a serious
problem to look into. .NET has many classes for this, such as FileStream,
BufferedStream, StreamReader and StreamWriter. From what I have read
while comparing them, reading through StreamReader is generally said to
be faster. Any idea which method is efficient and fast, and whether
buffering should be done? We also have other tools such as BizTalk. How
can the files be handled efficiently?  Thanks.


Message #4 by "jerry Weidong Lo" <cswdluo@c...> on Tue, 18 Mar 2003 12:49:47 +0800
   In my experience of reading and writing large text files, StreamReader
and StreamWriter may be faster than the other classes provided by .NET or
Java, such as BufferedReader (Java) and BufferedStream (.NET).
   Since you can only use .NET, I suggest you use StreamReader and split
the large files into blocks; a buffering strategy is best.
   Regards,
                                              jerry.lo
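
[Editor's sketch] One way to combine StreamReader with block-wise
processing, as suggested above, is to read line by line and hand the lines
on in fixed-size batches. This is a sketch under assumptions, not a
benchmarked solution: the class name LineBatcher, the store callback
(standing in for the database write), the UTF-8 encoding and the 64 KB
reader buffer are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineBatcher
{
    // Reads the file one line at a time and passes the lines to `store`
    // in batches of at most `batchSize`. Returns the total line count.
    // The same list is reused for every batch, so `store` must finish
    // with it (or copy it) before returning.
    public static long ProcessInBatches(
        string path, int batchSize, Action<List<string>> store)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        long total = 0;
        List<string> batch = new List<string>(batchSize);

        // Assumed: UTF-8 input (with BOM detection) and a 64 KB buffer.
        using (StreamReader reader =
            new StreamReader(path, Encoding.UTF8, true, 64 * 1024))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                batch.Add(line);
                total++;
                if (batch.Count == batchSize)
                {
                    store(batch);   // e.g. one bulk insert per batch
                    batch.Clear();
                }
            }
        }

        if (batch.Count > 0) store(batch); // flush the final partial batch
        return total;
    }
}
```

Only one batch of lines is held in memory at a time, so memory use depends
on the batch size rather than on the size of the file.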

Message #5 by "Jerry Lanphear" <jerrylan@q...> on Wed, 19 Mar 2003 06:45:50 -0700
I have one other suggestion for you. It may not help, because your program
will probably not be large, just your files, but you may get a performance
improvement by using NGEN.EXE to precompile your executable into native
machine code. NGEN.EXE is part of the .NET Framework.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cptools/html/cpgrfnativeimagegeneratorngenexe.asp

Regards
