|
Is this a NoSQL table, or a flattened table as is common in high performance environments such as banking?
|
|
|
|
|
it is a high performance table in Teradata...
diligent hands rule....
|
|
|
|
|
What type of database? If it's SQL Server, use
exec sp_help '<your table name>'. Oracle has no sp_help; use
DESCRIBE <your table name> in SQL*Plus. Anything else - use Google. Informix or Interbase - you are out of luck...
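For anyone who wants to script the "describe the table" step, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since I can't assume access to the Teradata system in question (Teradata has its own commands for this, HELP TABLE for instance). The table big_table and its columns are made up for the example.

```python
import sqlite3

def describe_table(conn, table):
    """Return (column name, declared type, nullable?, part of primary key?) per column."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [
        (name, ctype, not notnull, pk > 0)
        for _cid, name, ctype, notnull, _default, pk in rows
    ]

# Hypothetical table, only here so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table ("
    "id INTEGER PRIMARY KEY, acct_no TEXT NOT NULL, amt REAL, memo TEXT)"
)
for column in describe_table(conn, "big_table"):
    print(column)
```

With hundreds of columns, the useful part is having the list as data you can sort, filter and diff, rather than as a screen dump.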
Advertise here – minimum three posts per day are guaranteed.
|
|
|
|
|
very helpful. Thank you!
diligent hands rule....
|
|
|
|
|
Southmountain wrote: now I have a big table in a database, what is the best way to get understanding of this table quickly?
Documentation. If there's a table, there's a developer and there should be documentation.
Southmountain wrote: it has hundreds of fields and I only know some keys.
Hundreds of fields??
DROP TABLE would be the best start; no normalized table contains that many fields.
I'm serious; no such table should exist. You asking how to understand it implies no documentation either.
Name your company.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: DROP TABLE would be the best start;
lol...
I worked with a table like that. It was batch loaded every night from some other mysterious source that was definitely COBOL and probably DB2. What I did know was that on the COBOL side they had reached the maximum number of columns of the system. It would not allow them to add any more.
I think there was something like 300 or 400 columns.
But 200 or so were just for a single indexed value. So something like column 30 had an int. Then the value in that column pointed to one of another sequential 200 columns with a value. The other 200 columns were null.
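The "one column says which of the other columns holds the value" layout can be sketched as below. This is only an illustration of the pattern as described; the names slot_no and val_1..val_4 are invented, and the real table presumably had around 200 slot columns.

```python
def resolve_indexed_value(row, selector_col, slot_prefix):
    """Follow the selector column to the one slot column that holds the value."""
    slot = row[selector_col]
    if slot is None:
        return None  # no slot selected, so there is nothing to look up
    return row.get(f"{slot_prefix}{slot}")

# Hypothetical row: slot_no points at val_3, every other slot is null.
row = {"slot_no": 3, "val_1": None, "val_2": None, "val_3": 42.5, "val_4": None}
print(resolve_indexed_value(row, "slot_no", "val_"))
```

Once you have spotted a selector column like this, the whole block of slot columns can be read as a single (slot, value) pair.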
Probably could not have dropped it. It held credit card transaction data.
|
|
|
|
|
If you design a database, you normalize the model.
jschell wrote: I think there was something like 300 or 400 columns
Give or take 50 columns.
That's not design, that's a disaster.
jschell wrote: Probably could not have dropped it. It held credit card transaction data.
That's why I stopped visiting the hospital. I don't want to die by VB6.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: That's why I stopped visiting the hospital. I don't want to die by VB6
You'll be fine if you believe in reincarnation...
On Error Resume Next Life
|
|
|
|
|
In the early 1980s, one model that was proposed was 'the universal relation'. The database had a single relation (table) for all applications. A new application might need some new fields/columns and add those, but usually it also made use of columns already in the universal relation.
There was at least one implementation of this model - I'm sorry, I can't remember what it was called - and the developers claimed that having everything in one relation drastically simplified some query optimizations. I see that the idea even has a brief Wikipedia entry, Universal relation assumption, stating that applying it to "real database designs is often plagued with a number of difficulties". So there were reasons why it didn't succeed. Yet it did have some benefits as well. Maybe those who designed the relation you have been introduced to were trying to collect some of those.
The Wikipedia article links to a slide set for a talk, "Who won the Universal Relation war?". It is very much a slide set - you can't learn much about Universal Relations from it. But it gives you a certain impression of the magnitude and intensity of the debate, 30-40 years ago.
|
|
|
|
|
There's a good reason why it is not practiced anymore:
It didn't work.
--edit
I still like the story though.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
The only acceptable reason for having such a flat schema would be performance, and there are many, many better ways of getting the performance required if that's a concern. At the heart of it, that relations table would have a lot of null values and would seem only to simplify joins - in which case perhaps they would do better to learn SQL views if they wish to reduce joins.
For the performance side, if read speed needs to be optimized, it's OK to have a flattened, cached table or NoSQL doc storage with flattened data that is hydrated from the unflattened tables in a one-way sync. But the core data model that's the source of truth shouldn't be janky.
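A rough sketch of the one-way sync idea, using Python's built-in sqlite3 as a stand-in for whatever database is really in play. The tables customer, address and customer_flat are invented for the example; the point is only that the flat table is disposable and always rebuilt from the normalized source of truth, never the other way round.

```python
import sqlite3

def rebuild_flat_cache(conn):
    """One-way sync: discard the flat copy and rebuild it from the normalized tables."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM customer_flat")
        conn.execute(
            """
            INSERT INTO customer_flat (customer_id, name, city, country)
            SELECT c.customer_id, c.name, a.city, a.country
            FROM customer AS c
            LEFT JOIN address AS a ON a.customer_id = c.customer_id
            """
        )

# Hypothetical schema and data so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (customer_id INTEGER REFERENCES customer, city TEXT, country TEXT);
    CREATE TABLE customer_flat (customer_id INTEGER, name TEXT, city TEXT, country TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO address VALUES (1, 'Oslo', 'NO');
    """
)
rebuild_flat_cache(conn)
print(conn.execute("SELECT * FROM customer_flat ORDER BY customer_id").fetchall())
```

A full rebuild is the simplest version; an incremental refresh is possible too, but the direction of the sync stays the same.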
Jeremy Falcon
|
|
|
|
|
Although it used SQL as the backend, I remember a Customer Relationship Management system called Maximiser that took a similar approach. There were, ISTR, just two tables: one to hold all the relatively constant client data itself and one to hold the collection of notes linked to that.
In the Maximiser app there were complex joins on one table producing 'subtables' that held various views on the data. Some columns contained numbers that indicated what other columns actually held! I was given the job of moving all the data held in this system to another SQL based program.
It took ages (in the absence of any database schema documentation) to unravel the various actual combinations of joins required to get what we wanted. Here's just one query to extract a little of the info: All the tables named as a, b, c, d etc duplicate joins used in 'built-in' queries on the maximiser database.
I thought you might find an example of the stuff I had to build mildly amusing 8)
-- Build the View of the Maximiser data that shows what we want and store it
SELECT
    CASE
        WHEN c.Record_Type = 1 THEN c.Name
        WHEN c.Record_Type = 31 THEN d.Name + ' - ' + c.First_Name + ' ' + c.Name
        WHEN c.Record_Type = 2 AND len(c.Firm) > 0 THEN c.Firm
        WHEN c.Record_Type = 2 AND len(c.Firm) < 1 THEN c.First_Name + ' ' + c.Name
        WHEN c.Record_Type = 32 THEN
        (
            CASE
                WHEN len(d.Firm) > 0 THEN d.Firm + ' - ' + c.First_Name + ' ' + c.Name
                WHEN len(d.Firm) < 1 THEN d.First_Name + ' ' + d.Name + ' - ' + c.First_Name + ' ' + c.Name
            END
        )
        ELSE c.Name
    END AS Company,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Address_Line_1
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Address_Line_1
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Address_Line_1
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Address_Line_1
        ELSE c.Address_Line_1
    END AS Address_1,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Address_Line_2
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Address_Line_2
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Address_Line_2
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Address_Line_2
        ELSE c.Address_Line_2
    END AS Address_2,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.City
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.City
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.City
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.City
        ELSE c.City
    END AS City,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.State_Province
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.State_Province
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.State_Province
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.State_Province
        ELSE c.State_Province
    END AS State,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Zip_Code
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Zip_Code
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Zip_Code
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Zip_Code
        ELSE c.Zip_Code
    END AS Zip,
    CASE
        WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Country
        WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Country
        WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Country
        WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Country
        ELSE c.Country
    END AS Country,
    CASE
        WHEN n.Type = 0 THEN 'Manual Note'
        WHEN n.Type = 1 THEN 'Mail - Out'
        WHEN n.Type = 2 THEN 'Phone Call'
        WHEN n.Type = 3 THEN 'Timed Note'
        WHEN n.Type = 4 THEN 'Transfer'
        WHEN n.Type = 5 THEN 'Task'
        WHEN n.Type = 6 THEN 'Reserved'
        WHEN n.Type = 7 THEN 'Reserved'
        WHEN n.Type = 8 THEN 'Opportunity'
        WHEN n.Type = 12 THEN 'Customer Service'
        ELSE 'Unknown'
    END AS Activity_Type,
    n.DateCol, n.TextCol, n.Owner_Id, n.Client_Id, n.Contact_Number, n.Note_Type,
    ' ' AS sndex
INTO BookerNotes
FROM
    dbo.AMGR_Client_Tbl AS c
    LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS d ON c.Client_Id = d.Client_Id AND d.Contact_Number = 0
    LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS e ON c.Client_Id = e.Client_Id AND c.Address_Id = e.Contact_Number
    LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS f ON c.Client_Id = f.Client_Id AND c.Address_Id = f.Contact_Number
    LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS g ON c.Client_Id = g.Client_Id AND g.Contact_Number = 0
    RIGHT OUTER JOIN dbo.AMGR_Notes_Tbl AS n ON c.Client_Id = n.Client_Id AND c.Contact_Number = n.Contact_Number
WHERE c.Record_Type IN (1, 2, 31, 32)
GO
|
|
|
|
|
Ooof, that would be nasty to maintain!
|
|
|
|
|
I just saw something similar the other day with additional complications.
The developer who inherited it was trying to figure out how to make it a little more maintainable by removing some conditions that were nonsensical and others that were just bad hard-coded values.
|
|
|
|
|
Old posting, but Reddit had a key/value design, then moved to what sounds like a few tables which are thing/data. So basically, instead of just one key/value table, it's many more key/value tables.
|
|
|
|
|
yes, it has limited documentation ....
diligent hands rule....
|
|
|
|
|
I will double-check the number of columns in the table. I have very limited documentation...
diligent hands rule....
|
|
|
|
|
Southmountain wrote: it has hundreds of fields
Presumably you mean 'columns'...
Southmountain wrote: what is the best way to get understanding of this table quickly?
It is unlikely there is a way to do it quickly. The number of columns suggests it is probably overloaded, so there are multiple uses. The best you might be able to do quickly is determine how the data is created in the first place. And that would only be true if it is just a batch load.
|
|
|
|
|
I found MS Access and Excel, with some SQL Server Management Studio, good enough for "data analysis".
Access and Excel can connect to SQL Server. You can then tap into their analytics and query ability.
There's also MS Power BI (Desktop), to top it off.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
following your ideas, I will try to load it into an Excel pivot table and play around with it...
diligent hands rule....
|
|
|
|
|
Having done this sort of thing in the past (and yes, it was for the banking industry), you are going to need someone with domain knowledge. Making an incorrect assumption about the relevance/relationship of a column can lead you down some nasty cul-de-sacs.
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP
|
|
|
|
|
Delete a column, see who complains and then get them to explain what it's for.
// TODO: Insert something here Top ten reasons why I'm lazy
1.
|
|
|
|
|
See if you can get an input screen (or a few) and some reports, and open/run them for a specific record. Next, try to match the data for that specific record to the fields on the input screens or reports. That will give you a good understanding of how some of the fields fit together.
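The matching step can be partly automated: take one record, take a value you can see on a screen or report, and ask which columns hold exactly that value. A minimal sketch with Python's built-in sqlite3 as a stand-in database; the table txn and its cryptic column names are invented for the example.

```python
import sqlite3

def columns_holding(conn, table, key_col, key, known_value):
    """List the columns of one row whose value equals something seen on a screen or report."""
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE "{key_col}" = ?', (key,))
    row = cur.fetchone()
    if row is None:
        return []
    names = [d[0] for d in cur.description]
    return [name for name, value in zip(names, row) if value == known_value]

# Hypothetical flat table with unhelpful column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER, c017 TEXT, c018 INTEGER, c102 INTEGER)")
conn.execute("INSERT INTO txn VALUES (7, 'SMITH', 1250, 1250)")
print(columns_holding(conn, "txn", "id", 7, 1250))
```

The comparison here is exact, so padded strings, scaled amounts or different date formats would need normalizing first. More than one hit, as in this example, is itself a hint that a value is duplicated across columns.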
|
|
|
|
|
Southmountain wrote: what is the best way to get understanding of this table quickly?
There isn't one. There are tools and scripts for most DBs that can create some type of analysis for you, but you really don't need to understand the fields, you need to understand the data.
We can all grouse and speculate about the "100s of fields", but let's assume there is a valid reason for them even though I'm hard pressed to come up with one.
What type of understanding are you trying to achieve? Data is data and the question is if and where it is used. I'd suspect there could be a lot of dropped columns in your future, but that requires a detailed look at the recordset objects in the code that is using the DB. Honestly, it's a flat table, so despite the crazy column count it should be clear to understand. If it has a bunch of relations, that could take a lot of caffeine or alcohol, or both.
What's your scope of work in relation to this monster? Crazy as it seems, the DB could be oddly efficient depending on the use of the data. You know, Select * (perish the thought!) from tablename where id=x is pretty simple, lol. If your task is to clean up and reduce the size of the database, that is one thing. If you're stuck with it, it is what it is and how the data is used is of the utmost importance.
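As an example of the kind of analysis script mentioned above, here is a small column profiler: for each column it counts the non-null values and the distinct values. Columns that are always null, or that hold a single constant, are the first candidates for "not really used". It is a sketch against Python's built-in sqlite3 with an invented table t, not something tuned for Teradata.

```python
import sqlite3

def profile_columns(conn, table):
    """Return {column: (non_null_count, distinct_count)} for every column in the table."""
    names = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    profile = {}
    for name in names:
        # COUNT(col) and COUNT(DISTINCT col) both ignore NULLs.
        non_null, distinct = conn.execute(
            f'SELECT COUNT("{name}"), COUNT(DISTINCT "{name}") FROM "{table}"'
        ).fetchone()
        profile[name] = (non_null, distinct)
    return profile

# Hypothetical data: legacy_flag is never filled in, note rarely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT, legacy_flag TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", None, None), (2, "A", None, "x"), (3, "B", None, None)],
)
for column, stats in profile_columns(conn, "t").items():
    print(column, stats)
```

This runs one query per column, so on a table with hundreds of columns and many rows it is probably wise to run it against a sample, such as a single day's load.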
|
|
|
|
|
Understand ‘quickly?’: probably not possible.
Step back. Look at the application and the interfaces that update the table. Depending on the database tech, there’ll be a way to search procedures for the table name. Study these procedures.
Then (or while doing the above) look at a subset of the data, such as the last day’s worth of records.
Good luck.
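Searching stored code for the table name depends on the database, as said. As a stand-in, this sketch searches the SQL text that SQLite keeps in sqlite_master for views and triggers; on SQL Server the equivalent text is, as far as I know, exposed through a catalog view such as sys.sql_modules. The objects big_table, audit, v_recent, v_other and trg_audit are invented for the example.

```python
import sqlite3

def objects_referencing(conn, table):
    """Return (type, name) of views and triggers whose SQL text mentions the table name."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('view', 'trigger') AND sql LIKE '%' || ? || '%' "
        "ORDER BY type, name",
        (table,),
    ).fetchall()

# Hypothetical schema so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE big_table (id INTEGER, amt REAL, loaded_on TEXT);
    CREATE TABLE audit (id INTEGER);
    CREATE VIEW v_recent AS SELECT id, amt FROM big_table;
    CREATE VIEW v_other AS SELECT id FROM audit;
    CREATE TRIGGER trg_audit AFTER INSERT ON big_table
    BEGIN
        INSERT INTO audit VALUES (NEW.id);
    END;
    """
)
print(objects_referencing(conn, "big_table"))
```

It is a plain substring match, so a table called big_table_old would also be reported; treat the result as a reading list, not as proof of a dependency.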
Time is the differentiation of eternity devised by man to measure the passage of human events.
- Manly P. Hall
Mark
Just another cog in the wheel
|
|
|
|