application_development thread: large text files processing in .NET
Message #1 by "come2study" <come2study@y...> on Mon, 17 Mar 2003 12:11:42
Hi,
We are designing a system that reads data from large text files
(gigabytes of data!), processes it, and stores the results in a database.
The application is in .NET. Has anyone here worked with text files this
large? Can you tell me which I/O method in .NET gives the best
performance? If you have come across any case studies on this for .NET,
please send me the URL. Please help.
Thanks.
Message #2 by Tim.Musschoot@f... on Mon, 17 Mar 2003 13:20:35 +0100
Well, I suggest you use native C++ code to do this. It lets you program
at a much lower level, which can seriously improve performance.
Also, whatever language you use, do NOT use techniques that buffer too much
data. I recommend splitting the data into blocks. In theory, the ideal size
of a data block is a multiple of the physical block size of your
hard disk...
What worries me, reading your question, is .NET garbage collection.
When you read multiple GB of data, all of it passes through
buffers that are released by the garbage collector only long after you no
longer need the data. I have serious doubts about the performance of this
approach...
HTH,
Tim
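
For readers who have to stay in .NET, the block-wise reading described above can be sketched in C#. This is only a minimal sketch under stated assumptions, not a benchmarked recommendation: the `BlockReader` class, the `ReadInBlocks` method and the 64 KB block size are hypothetical names and values chosen for illustration. It allocates one buffer and reuses it for every block, so the garbage collector sees a single allocation instead of one per read.

```csharp
using System;
using System.IO;

public static class BlockReader
{
    // Assumed block size: 64 KB, a multiple of the common 4 KB cluster size.
    public const int BlockSize = 64 * 1024;

    // Reads the file block by block, passing each block to handleBlock
    // together with the number of valid bytes in it. Returns total bytes read.
    public static long ReadInBlocks(string path, Action<byte[], int> handleBlock)
    {
        long total = 0;
        byte[] buffer = new byte[BlockSize]; // allocated once, reused for every block

        using (FileStream fs = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read, BlockSize))
        {
            int read;
            // Read returns 0 only at end of file; it may return fewer bytes than requested.
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                handleBlock(buffer, read);
                total += read;
            }
        }
        return total;
    }
}
```

Note that a raw byte block can end in the middle of a line or of a multi-byte character, so text processing on top of this has to carry the unfinished tail over into the next block.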
=======================================================
This email, including any attached document, is confidential. If you are not the addressee, any disclosure, copying or use of it
is prohibited. If you have received this message in error, please destroy it and notify the sender immediately. The security and
accuracy of email messages cannot be guaranteed, as the information can be intercepted, tampered with, lost or infected by viruses;
the sender therefore declines all liability in such cases. If verification is required, please request a paper copy.
=======================================================
Message #3 by "come2study" <come2study@y...> on Tue, 18 Mar 2003 04:39:01
Thanks a lot for the reply. I can only use .NET; I don't have the
option to change the platform. Yes, garbage collection is a serious
problem to look into. .NET offers many classes for this, such as FileStream,
BufferedStream, StreamReader and StreamWriter. From what I have read
comparing them, reading through StreamReader is generally said to be
faster. Any idea which method is most efficient, and whether buffering
should be done? We also have other tools available, like BizTalk. How can
files be handled efficiently? Thanks..
Message #4 by "jerry Weidong Lo" <cswdluo@c...> on Tue, 18 Mar 2003 12:49:47 +0800
In my experience of reading and writing large text files,
StreamWriter and StreamReader may be faster than the alternatives in .NET
or Java, such as BufferedReader (Java) and BufferedStream (.NET).
Since you can only use .NET, I suggest you use StreamReader and split
the large files into blocks, ideally with a buffering strategy.
Regards!
jerry.lo
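
The StreamReader suggestion above could look roughly like the following C# sketch. Everything named here is an assumption made for illustration: the `LineBatchReader` class, its two methods, the UTF-8 encoding and the caller-supplied buffer and batch sizes do not come from the thread. It also relies on iterators (`yield return`), which need a newer C# version than the one current when this thread was written. StreamReader keeps its own internal buffer, so only that buffer and the current batch of lines are held in memory at any time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineBatchReader
{
    // Streams the file one line at a time; the whole file is never loaded.
    // Assumes UTF-8 text (a byte order mark, if present, overrides this).
    public static IEnumerable<string> ReadLines(string path, int bufferSize)
    {
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8, true, bufferSize))
        {
            string line;
            // ReadLine returns null at end of file.
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // Groups lines into batches so that each database round trip can carry many rows.
    public static IEnumerable<List<string>> ReadBatches(string path, int bufferSize, int batchSize)
    {
        List<string> batch = new List<string>(batchSize);
        foreach (string line in ReadLines(path, bufferSize))
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize); // fresh list; the caller keeps the old one
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final, possibly smaller batch
        }
    }
}
```

A caller would loop over `LineBatchReader.ReadBatches(path, 64 * 1024, 1000)` and write each batch to the database in one operation. The best buffer and batch sizes depend on the hardware and the database, so they are worth measuring rather than guessing.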
Message #5 by "Jerry Lanphear" <jerrylan@q...> on Wed, 19 Mar 2003 06:45:50 -0700
I have one other suggestion for you. This may not help because your program
will probably not be large, just your files, but you may get a performance
improvement by using NGEN.EXE to precompile your executable into native
machine code. NGEN.EXE is part of the .NET Framework:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cptools/html/cpgrfnativeimagegeneratorngenexe.asp
Regards