|
MacRaider4 wrote: ran it with no threads
How'd you manage that?
I guess I'd need a higher-level view of the process. The data gathering may benefit from multi-threading, but the writing to disk is less likely to, so somehow have the data-gathering threads pass the gathered data to the writing thread. There are a number of ways to accomplish that.
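One of those ways, as a minimal sketch: several gathering threads push results into a thread-safe queue, and a single writer drains it. `GatherAndWrite`, `gather`, and `gatherThreads` are names made up for illustration; `gather` stands in for whatever fetches and parses one message.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class GatherAndWrite
{
    // gather: hypothetical stand-in for the code that reads and parses one message.
    public static void Run(IEnumerable<int> messageNumbers, Func<int, string> gather,
                           TextWriter output, int gatherThreads = 4)
    {
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Single writer: the only code that ever touches the output.
            Task writer = Task.Run(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                    output.WriteLine(line);
            });

            // Several gatherers feed the queue in parallel.
            Parallel.ForEach(messageNumbers,
                new ParallelOptions { MaxDegreeOfParallelism = gatherThreads },
                n => queue.Add(gather(n)));

            queue.CompleteAdding(); // lets the writer's loop run to completion
            writer.Wait();
        }
    }
}
```

The lines come out in whatever order the gatherers finish, so this only fits if the output order does not matter.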
|
|
|
|
|
I copied the original application and then created another that was threaded... and gave it another name.
Actually the initial gathering takes about 2 seconds, the parsing of the data and the writing is what takes forever. I've actually decided that I'm going to split up the file creation part into two workers as well. Ok I'll explain what I'm doing in full maybe that will help...
- This app logs onto a mail server
- reads the number of messages on it and reports the number
- then it writes the information I need from those messages to text files (currently it just reports progress back from a thread)
- then it goes through those files, parses the information further to thin down the data, and writes that to the csv file
It does more after that, but that's working fine so far.
Does this help more? And I thought the POP stuff was hard.
|
|
|
|
|
Rather than having all those text files, maybe you should consider having a single database, storing the relevant data in appropriately typed fields as soon as possible.
Anyway, if parsing text files is taking that long, I'd venture you're doing it wrong. I wouldn't be surprised if you were using lots of regexes, a prime tool for slowing things down and obfuscating your intentions.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
What's taking so long is I have to search for particular lines in the email and they aren't always in the same order, same line or anything. So I have to search for the messageID, the From line, the subject line, the actual text of the email, whether there is an attachment and what the name of said attachment could be, then I have to watch out for the end-of-email marker, or if I found everything then I just end it there. So yes, there are some regexes in there, but I think only a few lines for the one thing I'm looking for (I'd need to go look at that part again to see what it's for).
Though I am doing that line by line, isn't there a way with the StreamReader to search for whatever it is you are looking for (let's say I need to find "From: ", not "Received: from ")? This is why I'm doing it line by line, as I'm able to look at the start of the line and determine if that is what I need. This is where I'm writing that "master file" rather than the individual ones.
I could probably skip the write to the csv file and just go straight to the database, but for some reason when I first wrote this 9 months ago I had some problems with something and thus the writing to the csv file (I don't remember what they were).
This is also my first large application in C#, up till 1 1/2 years ago I was mainly doing VB, VBA & Access in M$ land.
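The line-by-line check described above can be done in a single pass that stops as soon as everything has been found. A rough sketch, where `HeaderScanner` and the list of wanted headers are made up for illustration, and folded (multi-line) header values are not handled:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderScanner
{
    // Header names to collect; adjust to whatever the application needs.
    static readonly string[] Wanted = { "Message-ID", "From", "Subject" };

    public static Dictionary<string, string> Scan(TextReader reader)
    {
        var found = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) break; // a blank line ends the header block

            foreach (string name in Wanted)
            {
                string prefix = name + ": ";
                // StartsWith looks only at the start of the line, so
                // "Received: from ..." is never mistaken for "From: ...".
                if (!found.ContainsKey(name) &&
                    line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                {
                    found[name] = line.Substring(prefix.Length).Trim();
                }
            }

            if (found.Count == Wanted.Length) break; // everything found, stop early
        }
        return found;
    }
}
```

Because it takes a `TextReader`, the same method works on a `StreamReader` over the server response or on a `StringReader` in a test.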
|
|
|
|
|
One should not perform multiple passes on a (text) file: just read it once; or read a part, skip some, read some more, and never go back. Anything else is bound to be slow. If you need searching back and forth, store it all in memory or use a database. From your description it really sounds like a DB is in order. IMO you should thoroughly rethink the whole approach; a sub-optimal approach will not get fixed by throwing in some multi-threading.
|
|
|
|
|
Isn't some of that available as properties or something?
Without having seen what the emails look like, I'd recommend reading the whole email into one string and using one RegEx to extract what you need.
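Something along these lines, as a sketch only: one pattern with named groups, run once over the whole message text. `HeaderRegex` and the three header names are assumptions picked for illustration. Note that it scans the body as well, so the first occurrence of each header wins.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderRegex
{
    // Multiline makes ^ and $ match at line boundaries, so "Received: from"
    // cannot match the "From" alternative. The trailing \r? copes with CRLF.
    static readonly Regex Header = new Regex(
        @"^(?<name>Message-ID|From|Subject):[ \t]*(?<value>.*?)\r?$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static Dictionary<string, string> Extract(string message)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Header.Matches(message))
        {
            string name = m.Groups["name"].Value;
            if (!result.ContainsKey(name))      // keep the first occurrence only
                result[name] = m.Groups["value"].Value;
        }
        return result;
    }
}
```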
|
|
|
|
|
Luc Pattyn wrote: consider having a single database
I concur.
Luc Pattyn wrote: lots of regexes
Hadn't thought of that, but yeah, good point.
|
|
|
|
|
PIEBALDconsult wrote: Luc Pattyn wrote: lots of regexes
Here are the only lines:
Regex objLongDollar = new Regex("\\d+,\\d+\\.\\d+");
Regex objNumRd = new Regex("\\d+rd");
Regex objNumSt = new Regex("\\d+st");
This is in the parse message section...
|
|
|
|
|
The result will depend on how large the search object is, and how often you execute such regexes.
When I care about performance, I avoid the Regex class. I use string methods, maybe a StringBuilder, maybe a character array, maybe several nested loops, but no regexes. Regexes are good for compact code when performance does not matter at all, and readability is not a primary concern either.
Here[^] is the report on a little experiment I once performed.
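To illustrate what "string methods instead of Regex" can look like, here is a sketch of a plain-string check that should accept the same inputs as `Regex.IsMatch` with `\d+rd` or `\d+st`. `OrdinalSuffix` and `HasNumberWithSuffix` are made-up names, and no speed claim is attached; measure on real data before deciding.

```csharp
using System;

public static class OrdinalSuffix
{
    // True when the suffix (e.g. "rd" or "st") appears directly after a digit,
    // which is what the patterns \d+rd and \d+st test for.
    public static bool HasNumberWithSuffix(string text, string suffix)
    {
        if (string.IsNullOrEmpty(suffix))
            throw new ArgumentException("suffix must not be empty", nameof(suffix));

        int i = text.IndexOf(suffix, StringComparison.Ordinal);
        while (i >= 0)
        {
            if (i > 0 && char.IsDigit(text[i - 1]))
                return true;
            // Keep looking after this occurrence.
            i = text.IndexOf(suffix, i + 1, StringComparison.Ordinal);
        }
        return false;
    }
}
```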
|
|
|
|
|
That seems reasonable. I don't know anything about reading messages from a mail server, but...
Each thread can read and parse one message and report back the result to be written. Whether or not the thread also downloads the message I don't know, but that should be doable.
So you can have a class that distributes work to a bunch of threads.
The process on the thread performs the work and reports back when finished.
For writing, you can have an event handler that locks a stream when it writes.
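A rough sketch of that arrangement: a distributor hands messages to threads and raises an event per result, and the handler takes a lock before writing. All type and member names here (`WorkDistributor`, `MessageParsed`, `LockedWriter`, `parse`) are invented for the example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public sealed class MessageParsedEventArgs : EventArgs
{
    public MessageParsedEventArgs(string csvLine) { CsvLine = csvLine; }
    public string CsvLine { get; }
}

public sealed class WorkDistributor
{
    public event EventHandler<MessageParsedEventArgs> MessageParsed;

    // parse: hypothetical stand-in for the real per-message parsing code.
    public void Process(string[] rawMessages, Func<string, string> parse)
    {
        Parallel.ForEach(rawMessages, raw =>
        {
            string line = parse(raw);
            // Raised on whichever thread did the work.
            MessageParsed?.Invoke(this, new MessageParsedEventArgs(line));
        });
    }
}

public sealed class LockedWriter
{
    private readonly object gate = new object();
    private readonly TextWriter output;

    public LockedWriter(TextWriter output) { this.output = output; }

    public void OnMessageParsed(object sender, MessageParsedEventArgs e)
    {
        lock (gate) // only one thread writes at a time
        {
            output.WriteLine(e.CsvLine);
        }
    }
}
```

Wiring it up would look like `distributor.MessageParsed += writer.OnMessageParsed;` followed by a call to `Process`.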
|
|
|
|
|
 I don't "download" the message, just read it from the server and write the needed info to the file.
This is an example of the first section of the mail that I'm working with...
+OK 670581 octets
Return-Path: <email address>
Received: from hrndva-omtalb.email host([ip address])
by hrndva-imta01.email hostwith ESMTP
id <20100324163246244.LLFZ11363@hrndva-imta01.email host>
for <email it's going to>; Wed, 24 Mar 2010 16:32:46 +0000
Return-Path: <email address>
X-Authority-Analysis: v=1.0 c=1 a=Y--C8wIrtp4A:10 a=ed-Ggqp32-PxgnFQ28IA:9 a=gFYqYUHr3cvJf5tUtWv3jj12YwYA:4 a=wPNLvfGTeEIA:10 a=SSmOFEACAAAA:8 a=Xz8RjLcVAAAA:8 a=bvyAQD6M8USi_luE8VwA:9 a=zkXRgtjM-mmOsYMX5XAA:7 a=lSBj04H3UYGbvfZ5gLKUj7ga3v4A:4 a=TQY7aazGoy4vupPYzM8A:9 a=A9QQSRYdmLSsclXicRDPQfuie2oA:4 a=IKIoO-ieCDEA:10 a=l42U5Vqe35IA:10 a=OU-3oeRcviPOZ7V7:21 a=r3OCwUNA-PGXxiAt:21
X-Cloudmark-Score: 0
X-Originating-IP: IP Address
Received: from [IP Address] ([IP Address] helo=computer it's from (I think))
by hrndva-oedge02.email host (envelope-from <email address>)
(ecelerity 2.2.2.39 r()) with ESMTP
id 8E/A4-28072-8AE3AAB4; Wed, 24 Mar 2010 16:32:45 +0000
Received: from 127.0.0.1 (AVG SMTP 8.5.437 [271.1.1/2767]); Wed, 24 Mar 2010 12:31:41 -0500
Message-ID: <006301cacb77$ddf18460$ae02a8c0@pc it's from>
From: "Name" <emailaddress>
To: "person it's going to" <their email>
Subject: kinda obvious but using this data
Date: Wed, 24 Mar 2010 12:31:41 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_005F_01CACB4D.F4FC82B0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1983
Disposition-Notification-To: "Person from" <email address>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1983
There is a lot more after that but should give you an idea... hope I replaced all the stuff I should have 
|
|
|
|
|
Ok, so to speed up that section you are suggesting I create a class that passes work to, let's say, 4 background workers? That sounds really good, but I've never done anything like that, and how would I then return that info back to the Form? With a return? I would also lose my updates on the progressBar, would I not?
Ok my brain is really starting to hurt now, thankfully I've only got 10 min left in my day right now... will have to get back to this tomorrow!
Thank you all for everything thus far...
|
|
|
|
|
MacRaider4 wrote: return that info back to the Form
Well, I question the use of a form at all; I'd use a Windows Service, but that's just me. You can have a Service that pulls the data into the database and then the form pulls it (already fluffed and folded) from there.
Or you could use an event.
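With a BackgroundWorker the events are already there: `DoWork` can hand its result back through `e.Result`, and `ReportProgress` keeps the progress bar alive. A small sketch, where `WorkerDemo`, `onProgress`, and `onDone` are made-up names and the loop body is a placeholder for the real parsing:

```csharp
using System;
using System.ComponentModel;

public static class WorkerDemo
{
    public static BackgroundWorker Create(int messageCount,
        Action<int> onProgress, Action<int> onDone)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };

        worker.DoWork += (s, e) =>
        {
            int parsed = 0;
            for (int i = 1; i <= messageCount; i++)
            {
                parsed++; // placeholder for parsing message i
                worker.ReportProgress(i * 100 / messageCount);
            }
            e.Result = parsed; // handed to RunWorkerCompleted
        };

        worker.ProgressChanged += (s, e) => onProgress(e.ProgressPercentage);

        worker.RunWorkerCompleted += (s, e) =>
        {
            if (e.Error == null)
                onDone((int)e.Result);
        };

        return worker;
    }
}
```

In a form this might be used as `WorkerDemo.Create(count, p => progressBar1.Value = p, n => lblStatus.Text = n + " parsed").RunWorkerAsync();` (the control names are assumptions). When the worker is started from the UI thread, `ProgressChanged` and `RunWorkerCompleted` run on the UI thread, so the controls can be touched directly.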
|
|
|
|
|
I'll have to look up services as I've never done anything with that before. Though I will say doing this project has made me a better C# programmer, at this rate in another year I'll be answering some of these questions for other people.
Someone else mentioned just doing this all in one pass; now that I'm looking back at my code I think that is a very good idea. Is this something I could do with the service or event?
I could then do my initial pass to get the number, then have a couple of workers work on the list, storing the data in arrays. Once those are done, combine the arrays, or better yet just have the arrays loaded straight into the database, which should take no time at all even with checking to make sure that message isn't already there?
|
|
|
|
|
MacRaider4 wrote: have the arrays loaded straight into the database
Right. The Service would periodically (once a minute?) query the email server for messages; if there are some, get them, process them, and stick the results in the database. You could still use a thread to process each message if necessary.
Depending on your needs, you could then have the same Service host a WCF Web Service that your client application can use to get the data. 
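The polling part could be sketched like this, leaving out the Windows Service plumbing itself. `MailPoller`, `fetch`, and `store` are invented for the example; `fetch` stands in for reading (Message-ID, raw text) pairs from the server and `store` for the database insert. The in-memory `HashSet` only illustrates the "is it already there" check; a real version would more likely rely on a unique key in the database.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class MailPoller : IDisposable
{
    private readonly Func<IEnumerable<KeyValuePair<string, string>>> fetch;
    private readonly Action<string, string> store;
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly object gate = new object();
    private Timer timer;

    public MailPoller(Func<IEnumerable<KeyValuePair<string, string>>> fetch,
                      Action<string, string> store)
    {
        this.fetch = fetch;
        this.store = store;
    }

    // Polls immediately, then once per interval (e.g. TimeSpan.FromMinutes(1)).
    public void Start(TimeSpan interval)
    {
        timer = new Timer(_ => PollOnce(), null, TimeSpan.Zero, interval);
    }

    // Returns how many new messages were stored on this pass.
    public int PollOnce()
    {
        lock (gate) // keeps overlapping timer ticks from running together
        {
            int added = 0;
            foreach (KeyValuePair<string, string> message in fetch())
            {
                if (seen.Add(message.Key)) // false when the ID was seen before
                {
                    store(message.Key, message.Value);
                    added++;
                }
            }
            return added;
        }
    }

    public void Dispose()
    {
        timer?.Dispose();
    }
}
```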
|
|
|
|
|
That was my original intent once I got it working, just didn't know about the service part.
So let me see if I have this right now:
1. Log into the server and get the number of messages
2. Decide if I need to use a bgw and how many
3. Do the work with no or a couple workers:
a. have the worker/s log in with the number of account/s each is processing
b. process the "entire" message all at once and store in an array
4. Update the database
5. Have the form check for updates?
Does this sound about right?
|
|
|
|
|
Yeah, basically. But remember that I'm not familiar with reading messages from an email server, so I don't understand the "a. have the worker/s log in with the number of account/s each is processing" part.
I would have a worker read a message, process it, and stick it in the database; then maybe get another.
Or read all the available messages and pass them to the workers.
There are many ways to skin this cat.
|
|
|
|
|
Still can't get my 3rd worker to "work", as I'm still getting "backgroundWorker3 does not exist in the current context" on the first occurrence in each line of InitializeBackgroundWorker3. So that's putting a hindrance on everything.
public Form1()
{
    InitializeComponent();
    InitializeBackgroundWorker();
    InitializeBackgroundWorker2();
    InitializeBackgroundWorker3();
    btnGetMessageInfo.Enabled = false;
    btnCancelConnection.Enabled = false;
}

private void InitializeBackgroundWorker()
{
    backgroundWorker1.DoWork += new DoWorkEventHandler(backgroundWorker1_DoWork);
    backgroundWorker1.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker1_RunWorkerCompleted);
    // Worker 1 gets its own progress handler (the posted code pointed this at worker 3's).
    backgroundWorker1.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker1_ProgressChanged);
}

private void InitializeBackgroundWorker2()
{
    backgroundWorker2.DoWork += new DoWorkEventHandler(backgroundWorker2_DoWork);
    backgroundWorker2.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker2_RunWorkerCompleted);
    backgroundWorker2.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker2_ProgressChanged);
}

private void InitializeBackgroundWorker3()
{
    backgroundWorker3.DoWork += new DoWorkEventHandler(backgroundWorker3_DoWork);
    backgroundWorker3.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker3_RunWorkerCompleted);
    backgroundWorker3.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker3_ProgressChanged);
}
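That compiler error means the name backgroundWorker3 is not declared anywhere the form can see it. A likely cause (a guess, since the designer file isn't shown) is that only two BackgroundWorker components were dropped on the form. One way around it, sketched here with made-up names (`WorkerSet`, `Workers`), is to create the workers in code and share the handlers instead of writing one Initialize method per worker:

```csharp
using System;
using System.ComponentModel;

public class WorkerSet
{
    public BackgroundWorker[] Workers { get; }

    public WorkerSet(int count,
        DoWorkEventHandler doWork,
        ProgressChangedEventHandler progressChanged,
        RunWorkerCompletedEventHandler completed)
    {
        Workers = new BackgroundWorker[count];
        for (int i = 0; i < count; i++)
        {
            // Declared and created here, so no designer entry is needed.
            var worker = new BackgroundWorker
            {
                WorkerReportsProgress = true,
                WorkerSupportsCancellation = true
            };
            worker.DoWork += doWork;
            worker.ProgressChanged += progressChanged;
            worker.RunWorkerCompleted += completed;
            Workers[i] = worker;
        }
    }
}
```

Each worker can be told which messages it owns through the argument to `RunWorkerAsync(object)`, which arrives in the handler as `e.Argument`.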
So what I have now is: when you click on the connect button (the first thing you can do), it logs into the server, gets the total number and size of each message, stores that in a global variable, and displays some info on the form.
Then it figures out how many workers to use (based on the number of messages) and assigns start and end values for each worker.
I'm now in the process of writing the work for the workers (focusing on 1 and 2 since only those work).
It's moving along, though slowly.
modified on Friday, February 11, 2011 1:33 PM
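The "start and end values for each worker" step could look roughly like this. `RangeSplitter` is a made-up name, and the ranges are 1-based and inclusive on the assumption that they are POP3 message numbers, which start at 1.

```csharp
using System;
using System.Collections.Generic;

public static class RangeSplitter
{
    // Splits messages 1..messageCount into at most workerCount contiguous
    // ranges whose sizes differ by no more than one.
    public static List<Tuple<int, int>> Split(int messageCount, int workerCount)
    {
        if (workerCount < 1)
            throw new ArgumentOutOfRangeException(nameof(workerCount));

        var ranges = new List<Tuple<int, int>>();
        int baseSize = messageCount / workerCount;
        int extra = messageCount % workerCount; // the first 'extra' workers take one more
        int start = 1;

        for (int w = 0; w < workerCount && start <= messageCount; w++)
        {
            int size = baseSize + (w < extra ? 1 : 0);
            ranges.Add(Tuple.Create(start, start + size - 1));
            start += size;
        }
        return ranges;
    }
}
```

With fewer messages than workers the list simply comes back shorter, so the surplus workers are never started.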
|
|
|
|
|
This is my first foray into real data binding in C# and .NET so forgive me if it's a stupid question. I have consulted MSDN and a number of CP articles but still cannot quite get my head round it.
I have a SortedList of objects (of my own Class) which I can successfully bind to my DataGridView so all the correct details are visible. I have also enabled "user can delete rows" on the DataGridView . However, when I delete some rows, the data is not removed from the datasource (the SortedList ) as I thought it would be. I presume I have to add some other property, or implement some interface method in my class to make this happen.
Any suggestions?
I must get a clever new signature for 2011.
|
|
|
|
|
|
I don't quite see how that fits (the MSDN documentation on this subject really sucks). I have a SortedList bound to a DataGridView and I want the element(s) of the SortedList to be deleted when the user deletes the related row in the DataGridView . So where do I implement INotifyCollectionChanged ? Or are you saying that I should provide my own collection class as the BindingSource ?
|
|
|
|
|
Richard MacCutchan wrote: Or are you saying that I should provide my own collection class as the BindingSource?
Yup. I usually use an ObservableCollection, as it already has this functionality, not a SortedCollection. To quote the site:
"However, to set up dynamic bindings so that insertions or deletions in the collection update the UI automatically, the collection must implement the INotifyCollectionChanged interface. This interface exposes the CollectionChanged event that must be raised whenever the underlying collection changes."
I are Troll
|
|
|
|
|
That all sounds fine, and I just found an article which covers this subject in some detail. However, my issue is that I want to be notified when the content of the DataGrid changes (user deletes a row) so I can update the contents of the data source (i.e. the collection). Both of the aforementioned features seem to be about notifying the UI when the underlying collection changes, but in my case the collection will not change on its own.
|
|
|
|
|
Richard MacCutchan wrote: my issue is that I want to be notified when the content of the DataGrid changes (user deletes a row) so I can update the contents of the data source (i.e. the collection).
You were talking about a DataGridView earlier. The DataGrid has a RowDeleting event, the DataGridView has an OnRowsRemoved.
Just dropped a grid on a form, and printed the count of the DGV-rows in that event. It does not alter the collection, only the rows in the DGV itself. ..dunno why, but something tells me that this event is probably not being raised in your project
Richard MacCutchan wrote: Both of the aforementioned features seem to be about notifying the UI when the underlying collection changes
Yes, a biased mind
|
|
|
|
|
Eddy Vluggen wrote: You were talking about a DataGridView earlier.
I've been talking about it throughout this thread; indeed it is part of my subject line.
Eddy Vluggen wrote: something tells me that this event is probably not being raised in your project
Which is the entire issue I am struggling with. I delete some rows in the DataGridView but nothing changes in my data source (the SortedList ). I thought data binding was there to solve problems, not to create them!
To recap: I have a SortedList bound to a DataGridView . When I delete a row from the DataGridView I expect the corresponding entry to be removed from the SortedList but I cannot find any examples or documentation to help explain why it doesn't happen. I'm somewhat surprised that no-one has come across this problem before.
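One possible workaround, offered as a sketch rather than an explanation of the behaviour: bind the grid to a BindingList built from the SortedList's values, and mirror removals back into the SortedList. As far as I know the grid deletes a row by removing the item from the bound list, which ends up in `RemoveItem`. `Item`, `Key`, `Name`, and `SyncedBindingList` are hypothetical names standing in for the poster's own class.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical stand-in for the poster's own class.
public class Item
{
    public string Key { get; set; }
    public string Name { get; set; }
}

public class SyncedBindingList : BindingList<Item>
{
    private readonly SortedList<string, Item> source;

    public SyncedBindingList(SortedList<string, Item> source)
        : base(new List<Item>(source.Values)) // copy of the values, in key order
    {
        this.source = source;
    }

    // Called for every removal from the list, including RemoveAt and Remove.
    protected override void RemoveItem(int index)
    {
        source.Remove(this[index].Key); // keep the SortedList in step
        base.RemoveItem(index);
    }
}
```

The grid would then be bound with something like `dataGridView1.DataSource = new SyncedBindingList(mySortedList);` (control and variable names assumed).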
|
|
|
|