|
Mycroft Holmes wrote: do some serious record by record processing Will you be writing to that record (manipulating it), or will the result be written/aggregated elsewhere?
Mycroft Holmes wrote: for multi million recordsets Does it have to process the original data, or could you work from a copy?
Mycroft Holmes wrote: Both processes would run on the same server or at least of the same spec. "Could" run - if you were to cut that table in five, you could use five clients to do the processing.
Those need not be dedicated clients; if you're on a network, there are probably a few computers that sit idle at some point. The easiest way to detect that is by writing a screensaver.
If you go this route, please write an article and hand us the resulting code
--edit
Y'er link is broken.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
modified 16-Feb-14 6:48am.
|
|
|
|
|
We will need to extract the records (probably up to 1m), process each record 36 to 120 times and write the information back to the DB.
The original data is already a copy, partially aggregated as much as possible.
I'm really not interested in getting into a processing farm; another team failed to do this, and I'll stick to stored procs if that is a requirement. A commercial package takes approximately 4 hours to process, and we don't need their level of complexity, so it should be doable.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
It heavily depends on the nature of the process...
1. Are there computations involved?
2. Is there string processing involved?
3. Is it an in-place update, or do the results go to a new table?
4. What volume of data are we talking about?
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)
|
|
|
|
|
1. Yes, per record, dependent on the previous transaction (think compound interest on steroids).
2. No, all decimal - some referencing string values external to the transaction.
3. Indeterminate - this will depend on the design we take on; my preference would be for insert, as it is faster than update.
4. 4+ million records each pass.
I expect the passes to be broken down into smaller chunks, but 1m is not unfeasible.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
IMHO - do it in SQL.
Do the running computations (via the select itself) and put the result in a temporary table - then use it as you wish...
Passing 4+ million records over the network can be very painful. However, a well-configured SQL Server with properly indexed tables can handle 4+ million records with no problem...
Try to turn the dependencies into parameters (pre-calculated, maybe in a previous run, or fixed from some external source) if you can, so even string manipulations will not slow you down...
In the optimal case you may turn it into a SQL job and leave it alone for ages...
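Not from the thread itself, but a minimal sketch of that pattern, assuming SQL Server 2012 or later; the table and column names (dbo.Transactions, AccountId, TxnDate, Amount) are placeholders invented for illustration:
<pre lang="SQL">
-- Sketch only: dbo.Transactions and its columns are placeholders.
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results;

SELECT  t.AccountId,
        t.TxnDate,
        t.Amount,
        -- the running computation is done by the SELECT itself (SQL Server 2012+)
        SUM(t.Amount) OVER (PARTITION BY t.AccountId
                            ORDER BY t.TxnDate
                            ROWS UNBOUNDED PRECEDING) AS RunningTotal
INTO    #Results
FROM    dbo.Transactions AS t;

-- then use #Results as you wish, e.g. join it back or insert into a permanent table
SELECT TOP (10) * FROM #Results ORDER BY AccountId, TxnDate;
</pre>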
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)
|
|
|
|
|
Kornfeld Eliyahu Peter wrote: do it in SQL
That is my default option, but I don't want to eliminate what may be a better solution just because I am comfortable with TSQL.
Trust me, the various transaction and working tables will be indexed within an inch of their lives - one reason I want to use inserts instead of updates.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I can't see any reason not to do it in SQL - as you have nothing that is beyond the capabilities of SQL, why add networking (even in-server data transfer will add to it...)?
Can you explain why you had doubts about SQL in the first place?
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)
|
|
|
|
|
I don't think it is a doubt about SQL, but rather an examination of all possibilities. I'd do the same; examine all possibilities, if only to provide justification when asked why I picked what I did.
Tim
|
|
|
|
|
Tim is right; in my original post I made it clear I was investigating options put forward by one of my senior devs. Ignoring the option would be a disservice to that dev.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I would say that, performance-wise, a database engine is going to be faster than shifting all that data over the network to the client, processing it and sending it back again.
With the proviso that you set up the indexes correctly, I would say that SQL will be the fastest method of processing the data from a computing point of view (in terms of the human side, GUI etc., that is something only you would know about).
Go with SQL, as that is what you are most comfortable with, and correctly created indexes can help things fly.
I would avoid cursors and use temporary tables (don't forget to add indexes to the temporary tables too), doing the processing in steps – in my experience this is the fastest way of processing large quantities of data.
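To illustrate that, a minimal sketch of the "indexed temp table, processed in set-based steps" approach; #Stage, dbo.SourceTable and the 1.05 factor are placeholders, not anything taken from this thread:
<pre lang="SQL">
-- Step 0: create the working temp table
CREATE TABLE #Stage
(
    RecordId   INT           NOT NULL,
    PassNumber INT           NOT NULL,
    Value      DECIMAL(18,4) NOT NULL
);

-- Step 1: load the working set once
INSERT INTO #Stage (RecordId, PassNumber, Value)
SELECT RecordId, 0, StartValue
FROM   dbo.SourceTable;

-- Don't forget the index on the temp table before the heavy steps
CREATE CLUSTERED INDEX IX_Stage ON #Stage (RecordId, PassNumber);

-- Step 2..n: each pass is one set-based statement instead of a cursor loop
INSERT INTO #Stage (RecordId, PassNumber, Value)
SELECT RecordId, PassNumber + 1, Value * 1.05   -- placeholder calculation
FROM   #Stage
WHERE  PassNumber = 0;
</pre>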
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|
|
|
|
|
We are well experienced with TSQL and the importance of indexing correctly; it's just that one of the senior devs suggested a C# solution, so I thought I'd get some other opinions.
I have a rule of thumb: table variables for small reference-type info, temp tables with indexing for serious volume, and cursors only under duress.
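As a small illustration of that rule of thumb (all names below are made up): a table variable for a tiny lookup set, an indexed temp table for the bulk working set.
<pre lang="SQL">
-- Small reference data: a table variable is enough
DECLARE @StatusLookup TABLE
(
    StatusCode  CHAR(1)     PRIMARY KEY,
    Description VARCHAR(50) NOT NULL
);

-- Serious volume: a temp table that gets statistics and can be indexed after loading
CREATE TABLE #WorkingSet
(
    RecordId INT           NOT NULL,
    Amount   DECIMAL(18,4) NOT NULL
);
CREATE CLUSTERED INDEX IX_WorkingSet ON #WorkingSet (RecordId);
</pre>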
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Sounds like you know exactly what you need to use.
I understand checking with others because a senior dev suggested something.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|
|
|
|
|
There's a third option; you don't need to choose between C# and processing in the database.
You can do both using a CLR SQL Server User-Defined Function[^].
That should appeal to both you and that senior developer.
Note that I've never done it myself, so I can't say how much fuss there is to it. But I know that MS designed it with performance in mind.
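For anyone curious, this is roughly what the T-SQL side of that looks like; the assembly, class and method names here are hypothetical, and the .NET method itself would be written and compiled separately:
<pre lang="SQL">
-- Enable CLR on the instance (one-off)
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- Register the compiled assembly (path and name are hypothetical)
CREATE ASSEMBLY InterestCalc
FROM 'C:\clr\InterestCalc.dll'
WITH PERMISSION_SET = SAFE;
GO
-- Expose one of its static methods as a scalar UDF
CREATE FUNCTION dbo.CompoundValue
(
    @principal DECIMAL(18,4),
    @rate      DECIMAL(9,6),
    @periods   INT
)
RETURNS DECIMAL(18,4)
AS EXTERNAL NAME InterestCalc.[Calculations].CompoundValue;
GO
-- Once registered it is called like any other scalar function:
-- SELECT dbo.CompoundValue(Amount, 0.05, 36) FROM dbo.Transactions;
</pre>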
|
|
|
|
|
Per the other post... "process each record 36 to 120 times"
Given that, it seems likely that the processing isn't going to be simple, which suggests the TSQL is going to be rather CPU intensive.
So what other work is the database expected to do at the same time this runs, now and in the future?
And what is the growth rate of the record set?
Does this occur every day?
Moving records out of and into the system is a concern, but given the processing numbers above it is something I would consider. A separate application allows processing to be moved off box (easier, at least in my experience).
|
|
|
|
|
I have done this on both sides, and the DB side is much, much faster. I have a big data-modelling app (35 million transactions across 20 tables) with DB procedures, and a big ETL app that has to exist and run on the Microsoft client side. The DB is much, much faster.
|
|
|
|
|
Quote:
select distinct(Emp_Status),
count ( Att_Mints/60) as OTHOUR,
COUNT (Att_Totalmints /60 ) as ProductionHours
from Attendence
inner join EmployeeMaster on fk_Att_EmpCode=fk_Att_EmpCode
where year(Att_Date)='2014' and Month (Att_Date)='1'
group by Emp_Status
In the above query, the count of Att_Mints is 150 and the count of Att_Totalmints is 120.
I want the output to be OTHOUR = 1:30 (hours:minutes) and ProductionHours = 2 hours.
Please help me.
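For what it's worth, a minimal sketch of one common approach, assuming the intent is to SUM the minutes rather than COUNT the rows and then format the totals; the join condition is a guess, since the real key columns aren't shown in the question:
<pre lang="SQL">
SELECT  e.Emp_Status,
        -- e.g. 90 minutes -> '1:30'
        CAST(SUM(a.Att_Mints) / 60 AS VARCHAR(10)) + ':' +
        RIGHT('0' + CAST(SUM(a.Att_Mints) % 60 AS VARCHAR(2)), 2) AS OTHOUR,
        -- e.g. 120 minutes -> 2.0 hours
        SUM(a.Att_Totalmints) / 60.0 AS ProductionHours
FROM    Attendence AS a
INNER JOIN EmployeeMaster AS e
        ON e.Emp_Code = a.fk_Att_EmpCode   -- hypothetical key column on EmployeeMaster
WHERE   YEAR(a.Att_Date) = 2014
  AND   MONTH(a.Att_Date) = 1
GROUP BY e.Emp_Status;
</pre>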
|
|
|
|
|
|
<pre lang="vb">DECLARE @CheckQuantity INT
SET @ParmDefinition = N'@CheckQuantity INT'
SET @SQL = N'
SET @CheckQuantity= 12'
PRINT @SQL
EXEC [dbo].sp_executesql @SQL, @ParmDefinition,
@CheckQuantity=@CheckQuantity;
PRINT @CheckQuantity</pre>
It should print 12, but the output is NULL. Please tell me how to set the value for a variable declared outside the SQL text created for the dynamic query.
|
|
|
|
|
Does anyone have a solution for this?
|
|
|
|
|
You need to make the parameter an OUTPUT parameter.
https://support.microsoft.com/kb/262499[^]
DECLARE @CheckQuantity INT;
DECLARE @ParmDefinition NVARCHAR(500);
DECLARE @SQL NVARCHAR(MAX);
SET @ParmDefinition = N'@CheckQuantity INT OUTPUT';
SET @SQL = N'SET @CheckQuantity = 12';
PRINT @SQL;
EXEC [dbo].sp_executesql @SQL, @ParmDefinition, @CheckQuantity = @CheckQuantity OUTPUT;
PRINT @CheckQuantity;
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Thank you friend it's working now
|
|
|
|
|
Hi Team,
We have an application called TMART. Basically, TMART is used to monitor applications such as web, Citrix, etc.
Under a particular project we have multiple monitors that are being monitored.
We would like to have a SQL query through which we can pull the availability error.
Can someone please provide us with the SQL query for the same?
It's a bit urgent...
Your response would be highly appreciated.
|
|
|
|
|
It is impossible to give you an answer without details of the tables and columns involved, so please give us the information needed to help you.
=========================================================
I'm an optoholic - my glass is always half full of vodka.
=========================================================
|
|
|
|
|
Hi Chris,
I am really happy with the quick response.
But there are no tables or columns involved in this.
We have the data as below:
(Complete time of the period – (sum of the failure durations)) / (number of failures + 1)*
*for n failures there are n + 1 periods of functioning, and therefore:
For example, for a month such as January, with 44,640 minutes for the complete period and errors as below:
• From 1/1/14 09:17 to 1/1/14 17:20 (error 1 duration = 483 minutes)
• From 14/1/14 17:20 to 15/1/14 07:40 (error 2 duration = 860 minutes)
• From 21/1/14 07:40 to 21/1/14 11:12 (error 3 duration = 212 minutes)
The MTBF would be:
(44 640 - (483 + 860 + 212)) / 4 = 5 385 minutes 30s, so the MTBF is 89 hours 45min 30s = 3 days 17 hours 45min 30s.
This is what we require...
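Once the failure periods are loaded into a table, the calculation itself is a single query; a minimal sketch, assuming a hypothetical dbo.MonitorFailures(FailureStart, FailureEnd) table that has already been populated from TMART:
<pre lang="SQL">
DECLARE @PeriodStart DATETIME = '2014-01-01',
        @PeriodEnd   DATETIME = '2014-02-01';

SELECT  ( DATEDIFF(MINUTE, @PeriodStart, @PeriodEnd)           -- complete period in minutes
          - SUM(DATEDIFF(MINUTE, FailureStart, FailureEnd)) )  -- minus the failure durations
        / (COUNT(*) + 1.0)                                     -- n failures => n + 1 periods
        AS MTBF_Minutes
FROM    dbo.MonitorFailures
WHERE   FailureStart >= @PeriodStart
  AND   FailureStart <  @PeriodEnd;
</pre>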
|
|
|
|
|
If there are no tables or columns, how can you expect a SQL query to work?
How is the data held, and where is the data held?
=========================================================
I'm an optoholic - my glass is always half full of vodka.
=========================================================
|
|
|
|