Luc <
[email protected]> wrote:
On Sat, 19 Nov 2022 23:07:09 -0000 (UTC), Rich wrote:
First question: Are you talking about an Sqlite db, or another
(Oracle/PostgreSQL/Mysql/etc.) db?
Strictly Sqlite. Nothing corporate. Very humble home desktop application.
Neither PostgreSQL nor Mysql imply 'corporate' -- both being 'free to
use'. Oracle, well, at the price for an oracle license, yes, it would
imply corporate.
One very common pitfall: Did you create indexes on the columns that
were part of the 'where' clauses used for selecting rows from the
database?
Without indexes, the database engine will be doing what is known as
'full table scans' - looking through every row in the table being
selected from for any rows that match the 'where' clause limits.
Full table scans, esp. if you repeat them for every select, produce
in the end an O(N^2) complexity factor.
Excuse me, did you say 'indexes' in the plural?
Yes, intentionally so.
Yes, you did. How many am I supposed to create then?
As many as are necessary for your needs (yes, I know, a vague answer,
but it /is/ the answer I can give, since you've revealed insufficient information about your actual design for me to provide more than that
level of 'general answer'.
It's an honest question. I suspect I have been overestimating the
efficiency of databases. Do I have to create one index per column?
The answer there depends upon what kinds of queries you plan to run.
For example, lets take a 'music database' (i.e., catalog of CD's, vinyl records, compressed audio files). You might have a table like (note,
all made up here on the fly):
create table music (artist text, album text, disk_num text,
song text, format text, shelf text, row integer,
column integer);
Where 'format' might be "CD" or "Vinyl" or "mp3" or "flac".
If all you ever did was retrieve all records from this table, you would
'need' zero indexes (note, I'm skipping the 'need' for a unique index
if you want to avoid duplicate rows).
But, lets say you often want to retrieve just your collection of Vinyl
entries. You'd run this query:
db eval {select * from music where format = 'Vinyl';}
And with on indexes, that would have to look at every row to decide
which rows had 'Vinyl' in the format column.
But, if you indexed the format column:
db eval {create index format_idx on music(format);}
Then the same query can utilize the index to (essentially) directly
retrieve only the rows with "Vinyl" as format, without ever looking at
any other rows.
If you commonly want to look for "music by artist A, and in flac
format" (note, made up query again), you would do:
db eval {select * from music where artist = 'Frank Sinatra' and
format = 'flac';}
Again, with no index this looks through the entire set of rows. But
with an index:
db eval {create index my_new_idx on music(artist, format);}
Then the same query can retrieve all of Frank's music, that is in flac
format, without looking at all the other rows (and it is the 'without
looking at the other rows' that provides the speedup, it literally does
not look at the other non-Sinatra rows).
So the answer to your question is: you need indexes that cover the
queries you are performing and that you want to speed up. But we don't
know enough details to recommend more than that.
My database has an index, but it's a 'unique' column which I thought
would be good enough because I admit I don't fully understand indices.
That index is not used in my WHERE queries.
If the unique column is never part of a where clause, it is never used
for a retrieval query.
So, 1) How many indices do I have to create?
The number necessary to cover your slow queries.
2) Should I use indices or views?
Given that your questions imply this is your initial foray into the
world of SQL queries, just stick to indexes for the moment. You can
level-up to views once you've become more proficient at the next level
down basics.
I did some reading at bed time
yesterday and found this page (among many others):
https://stackoverflow.com/questions/439056/is-a-view-faster-than-a-simple-query
You may skip to "Update 2" where he mentions Lithuania. He argues that indices may not be the cat's pajamas after all.
He's discussing a rather advanced 'view' there where the view looks to
be an actual materialized view containing only the subset of the
records to which the create view statement pertains. This is several
levels up in SQL knowledge from where your postings imply you are now,
and likely unnecessary for what you are trying to do. Second, I'm not
sure Sqlite has added support for materialized views, and if not, you
can't (yet) do with sqlite what this poster is discussing.
Anyway, do you think there could be some hazard or pitfall in my
way for doing this? What do the elder of the village say?
Yes, you could eventually encounter a dataset too large to "slurp"
into RAM all up front.
If hundreds of thousands of lines take up 80MB, how many lines does it
take to cause someone to run out of RAM with modern computers?
Note that 80MB on disk does not equate to 80MB of RAM if you hoover
everything into TCL lists or arrays. This is due to Tcl_Obj overhead
(see
https://www.tcl.tk/man/tcl/TclLib/Object.html for the C definition
of a Tcl_Obj). On a 64-bit machine, this is somewhere around at least
5*8bytes (40 bytes) of overhead for each Tcl_Obj. If you retrieve each
row as a single string, and keep it a single string, then your in Tcl
memory footprint will be the byte length of each string, plus 40 bytes
per string, so for 100,000 rows you'd have 4,000,000 bytes just in
Tcl_Obj overhead.
If, instead, you convert each row into a nested list, and you have
100,000 rows, each row having 8 columns, then you get somewhere around 100000*40 + 100000*8*40 or 36,000,000 bytes for the Tcl_Obj overhead (each
row is a list, 40 byte Tcl_Obj per row, each element of each row is
itself another Tcl_Obj, 40 bytes each). And that does not count the
space taken up by the actual data in each list element.
As for "how much does your dataset take" -- well on that point you'll
have to do some of the math yourself.
How does a database like Sqlite manage its own RAM use footprint?
You'd have to peruse the Sqlite source to know that with certainty.
But one way it does is by not (intentionally) hoovering everything into
ram up-front. Now, with sufficient spare cache, eventually much of the
file will be cached by the OS, giving the effect of "it is all in RAM
now" as far as disk reads go, but OS caching of the file is not Sqlite intentionally "hoovering it all in up front".
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)