Comments:" The Codex » Do Not Pass This Way Again "
URL:http://grimoire.ca/mysql/choose-something-else
Considering MySQL? Use something else. Already on MySQL? Migrate. For every successful project built on MySQL, you could uncover a history of time wasted mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense of accomplishment over the effort spent making MySQL behave.
Thesis: databases fill roles ranging from pure storage to complex and interesting data processing; MySQL is differently bad at both tasks. Real apps all fall somewhere between these poles, and suffer variably from both sets of MySQL flaws.
Much of this is inspired by the principles behind PHP: A Fractal of Bad Design. I suggest reading that article too -- it's got a lot of good thought in it even if you already know to stay well away from PHP. (If that article offends you, well, this page probably will too.)
Storage
Storage systems have four properties:
Take and store data they receive from applications. Keep that data safe against loss or accidental change. Provide stored data to applications on demand. Give administrators effective management tools.In a truly "pure" storage application, data-comprehension features (constraints and relationships, nontrivial functions and aggregates) would go totally unused. There is a time and a place for this: the return of "NoSQL" storage systems attests to that.
Pure storage systems tend to be closely coupled to their "main" application: consider most web/server app databases. "Secondary" clients tend to be read-only (reporting applications, monitoring) or to be utilities in service of the main application (migration tools, documentation tools). If you believe constraints, validity checks, and other comprehension features can be implemented in "the application", you are probably thinking of databases close to this pole.
Storing Data
MySQL has many edge cases which reduce the predictability of its behaviour when storing information. Most of these edge cases are documented, but violate the principle of least surprise (not to mention the expectations of users familiar with other SQL implementations).
- Implicit conversions (particularly to and from string types) can modify
MySQL's behaviour.
- Many implicit conversions are also silent (no warning, no diagnostic), by design, making it more likely developers are entirely unaware of them until one does something surprising.
- Conversions that violate basic constraints (range, length) of the output
type often coerce data rather than failing.
- Sometimes this raises a warning; does your app check for those?
- This behaviour is unlike many typed systems (but closely like PHP and remotely like Perl).
- Conversion behaviour depends on a per-connection configuration value
(
sql_mode
) that has a large constellation of possible states, making it harder to carry expectations from manual testing over to code or from tool to tool. - MySQL uses non-standard and rather unique interpretations of several common
character encodings, including UTF-8 and Latin-1. Implementation details of
these encodings within MySQL, such as the
utf8
encoding's MySQL-specific 3-byte limit, tend to leak out into client applications. Data that does not fit MySQL's understanding of the storage encoding will be transformed until it does, by truncation or replacement, by default.- Collation support is per-encoding, with one of the stranger default configurations: by default, the collation orders characters according to Swedish alphabetization rules, case-insensitively.
- Since it's the default, lots of folks who don't know the manual
inside-out and backwards observe MySQL's case-insensitive collation
behaviour (
'a' = 'A'
) and conclude that "MySQL is case-insensitive", complicating any effort to use a case-sensitive locale. - Both the encoding and the collation can vary, independently, bycolumn. Do you keep your schema definition open when you write queries to watch out for this sort of shit?
- The
TIMESTAMP
type tries to do something smart by storing values in a canonical timezone (UTC), but it's done with so few affordances that it's very hard to even tell that MySQL's done a right thing with your data.- And even after that, the result of
foo < '2012-04-01 09:00:00'
still depends on what time of year it is when you evaluate the query, unless you're very careful with your connection timezone. TIMESTAMP
is also special-cased in MySQL's schema definition handling, making it easy to accidentally create (or to accidentally fail to create) an auto-updating field when you didn't (did) want one.DATETIME
does not get the same timezone handlingTIMESTAMP
does. What? And you can't provide your own without resorting to hacks like extra columns.- Oh, did you want to use MySQL's timezone support? Too bad, none of
that data's loaded by default. You have to process the OS's
tzinfo
files into SQL with a separate tool and import that. If you ever want to update MySQL's timezone settings later, you need to take the server down just to make sure the changes apply.
- And even after that, the result of
Preserving Data
... against unexpected changes: like most disk-backed storage systems, MySQL is as reliable as the disks and filesystems its data lives on. MySQL makes very little effort to do its own storage validation and error correction, but this is a limitation shared with many, many other systems.
The implicit conversion rules that bite when storing data also bite when
asking MySQL to modify data - my favourite example being a fat-fingeredUPDATE
query where a mistyped =
(as -
, off by a single key) caused 90%
of the rows in the table to be affected, instead of one row, because of
implicit string-to-integer conversions.
... against loss: hoo boy. MySQL, out of the box, gives you three approaches to backups:
- Take "blind" filesystem backups with
tar
orrsync
. Unless you meticulously lock tables or make the database read-only for the duration, this produces a backup that requires crash recovery before it will be usable, and can produce an inconsistent database.- This can bite quite hard if you use InnoDB, as InnoDB crash recovery takes time proportional to both the number of InnoDB tables and the total size of InnoDB tables, with a large constant.
- Dump to SQL with
mysqldump
: slow, relatively large backups, and non-incremental. - Archive binary logs: fragile, complex, over-configurable, and configured badly by default. (Binary logging is also the basis of MySQL's replication system.)
If neither of these are sufficient, you're left with purchasing a backup tool from Oracle or from one of the third-party MySQL vendors.
Like many of MySQL's features, the binary logging feature istooconfigurable,
while still, somehow, defaulting to modes that are hazardous or surprising:
thedefaultbehaviour
is to log SQL statements, rather than logging their side effects. This has
lead to numerous bugs over the years; MySQL (now) makes an effort to make
common "non-deterministic" cases such as NOW()
and RANDOM()
act
deterministically but these have been addressed using ad-hoc solutions.
Restoring binary-log-based backups can easily lead to data that differs from
the original system, and by the time you've noticed the problem, it's too late
to do anything about it.
(Seriously. The binary log entries for each statement contain the "current" time on the master and the random seed at the start of the statement, just in case. If your non-deterministic query uses any other function, you're stillfucked by default.)
Additionally, a number of apparently-harmless features can lead to backups or replicas wandering out of sync with the original database, in the default configuration:
AUTO_INCREMENT
andUPDATE
statements.AUTO_INCREMENT
andINSERT
statements (sometimes). SURPRISE.- Triggers.
- User-defined (native) functions.
- Stored (procedural SQL) functions.
DELETE ... LIMIT
andUPDATE ... LIMIT
statements, though if you use these, you've misunderstood how SQL is supposed to work.INSERT ... ON DUPLICATE KEY UPDATE
statements.- Bulk-loading data with
LOAD DATA
statements. - Operations on floating-point values.
Retrieving Data
This mostly works as expected. Most of the ways MySQL will screw you happen when you store data, not when you retrieve it. However, there are a few things that implicitly transform stored data before returning it:
MySQL's surreal type conversion system works the same way during
SELECT
that it works during other operations, which can lead to queries matching unexpected rows:owen@scratch> CREATE TABLE account ( -> accountid INTEGER -> AUTO_INCREMENT -> PRIMARY KEY, -> discountid INTEGER -> ); Query OK, 0 rows affected (0.54 sec) owen@scratch> INSERT INTO account -> (discountid) -> VALUES -> (0), -> (1), -> (2); Query OK, 3 rows affected (0.03 sec) Records: 3 Duplicates: 0 Warnings: 0 owen@scratch> SELECT * -> FROM account -> WHERE discountid = 'banana'; +-----------+------------+ | accountid | discountid | +-----------+------------+ | 1 | 0 | +-----------+------------+ 1 row in set, 1 warning (0.05 sec)
Ok, unexpected, but there's at least a warning (do your apps check for those?) - let's see what it says:
owen@scratch> SHOW WARNINGS; +---------+------+--------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' | +---------+------+--------------------------------------------+ 1 row in set (0.03 sec)
I can count on one hand the number of
DOUBLE
columns in this example and still have five fingers left over.You might think this is an unreasonable example: maybe you should always make sure your argument types exactly match the field types, and the query should use
57
instead of'banana'
. (This does actually "fix" the problem.) It's unrealistic to expect every single user to runSHOW CREATE TABLE
before every single query, or to memorize the types of every column in your schema, though. This example derived from a technically-skilled but MySQL-ignorant tester examining MySQL data to verify some behavioural changes in an app.Actually, you don't even need a table for this:
SELECT 0 = 'banana'
returns1
. Did the PHP folks design MySQL's=
operator?This isn't affected by
sql_mode
, even though so many other things are.
TIMESTAMP
columns (and onlyTIMESTAMP
columns) can return apparently-differing values for the same stored value depending on per-connection configuration even during read-only operation. This is done silently and the default behaviour can change as a side effect of non-MySQL configuration changes in the underlying OS.- String-typed columns are transformed for encoding on output if the connection is not using the same encoding as the underlying storage, using the same rules as the transformation on input.
- Values that stricter
sql_mode
settings would reject during storage can still be returned during retrieval; it is impossible to predict in advance whether such data exists, since clients are free to setsql_mode
to any value at any time.
Efficiency
For purely store-and-retrieve applications, MySQL's query planner (which transforms the miniature program contained in each SQL statement into a tree of disk access and data manipulation steps) is sufficient, but only barely. Queries that retrieve data from one table, or from one table and a small number of one-to-maybe-one related tables, produce relatively efficient plans.
MySQL, however, offers a number of tuning options that can have dramatic and counterintuitive effects, and the documentation provides very little advice for choosing settings. Tuning relies on the administrator's personal experience, blog articles of varying quality, and consultants.
- The MySQL query cache defaults to a non-zero size in some commonly-installed configurations. However, the larger the cache, the slower writes proceed: invalidating cache entries that include the tables modified by a query means considering every entry in the cache. This cache also uses MySQL's LRU implementation, which has its own performance problems during eviction that get worse with larger cache sizes.
- Memory-management settings, including
key_buffer_size
andinnodb_buffer_pool_size
, have non-linear relationships with performance. The standard
advice advises making whichever value you care about more to a large value, but this can be counterproductive if the related data is larger than the pool can hold: MySQL is once again bad at discarding old buffer pages when the buffer is exhausted, leading to dramatic slowdowns when query load reaches a certain point.- This also affects filesystem tuning settings such as
table_open_cache
.
- This also affects filesystem tuning settings such as
- InnoDB, out of the box, comes configured to use one large (and automatically
growing) tablespace file for all tables, complicating backups and storage
management. This is fine for trivial databases, but MySQL provides no tools
(aside from
DROP TABLE
and reloading the data from an SQL dump) for transplanting a table to another tablespace, and provides no tools (aside from a filesystem-levelrm
, and reloading all InnoDB data from an SQL dump) for reclaiming empty space in a tablespace file. - MySQL itself provides very few tools to manage storage; tasks like storing large or infrequently-accessed tables and databases on dedicated filesystems must be done on the filesystem, with MySQL shut down.
Data Processing
Data processing encompasses tasks that require making decisions about data and tasks that derive new data from existing data. This is a huge range of topics:
- Deciding (and enforcing) application-specific validity rules.
- Summarizing and deriving data.
- Providing and maintaining alternate representations and structures.
- Hosting complex domain logic near the data it operates on.
The further towards data processing tasks applications move, the more their SQL resembles tiny programs sent to the data. MySQL is totally unprepared for programs, and expects SQL to retrieve or modify simple rows.
Validity
Good constraints are like assert
s: in an ideal world, you can't tell if they
work, because your code never violates them. Here in the real world,
constraint violations happen for all sorts of reasons, ranging from buggy code
to buggy human cognition. A good database gives you more places to describe
your expectations and more tools for detecting and preventing surprises.
MySQL, on the other hand, can't validate your data for you, beyond simple (and
fixed) type constraints:
As with the data you store in it, MySQL feels free to change your table definitions implicitly and silently. Many of these silent schema changes have important performance and feature-availability implications.
Foreign keys are ignored if you spell them certain, common, ways:
CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL REFERENCES foo_parent (id) -- , ... )
silently ignores the foreign key specification, while
CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL, FOREIGN KEY (parent) REFERENCES foo_parent (id) -- , ... )
preserves it.
Foreign keys, one of the most widely-used database validity checks, are an engine-specific feature, restricting their availability in combination with other engine-specific features. (For example, a table cannot have both foreign key constraints and full-text indexes, as of MySQL 5.5.)
- Configurations that violate assumptions about foreign keys, such as a foreign key pointing into a MyISAM or NDB table, do not cause warnings or any other diagnostics. The foreign key is simply discarded. SURPRISE. (MySQL is riddled with these sorts of surprises, and apologists lean very heavily on the "that's documented" excuse for its bad behaviour.)
- The MySQL parser recognizes
CHECK
clauses, which allow schema developers to make complex declarative assertions about tuples in the database, butdiscards them without warning. If you wantCHECK
-like constraints, you must implement them as triggers - but see below... - MySQL's comprehension of the
DEFAULT
clause is, uh, limited: only constants are permitted, except for the special case of at most oneTIMESTAMP
column per table and at most one sequence-derived column. Who designed this mess?- Furthermore, there's no way to say "no default" and raise an error when
an INSERT forgets to provide a value. The default
DEFAULT
is eitherNULL
or a zero-like constant (0
,''
, and so on). Even for types with no meaningful zero-like values (DATETIME
).
- Furthermore, there's no way to say "no default" and raise an error when
an INSERT forgets to provide a value. The default
- MySQL has no mechanism for introducing new types, which might otherwise provide a route to enforcing validity. Counting the number of special cases in MySQL's existing type system illustrates why that's probably unfixable.
I hope every client with write access to your data is absolutely perfect, because MySQL cannot help you if you make a mistake.
Summarizing and Deriving Data
SQL databases generally provide features for doing "interesting" things with sets of tuples, and MySQL is no exception. However, MySQL's limitations mean that actually processing data in the database is fraught with wasted money, brains, and time:
- Aggregate (
GROUP BY
) queries run up against limits in MySQL's query planner: a query with bothWHERE
andGROUP BY
clauses can only satisfy one constraint or the other with indexes, unless there's an index that covers all the relevant fields in both clauses, in the right order. (What this order is depends on the complexity of the query and on the distribution of the underlying data, but that's hardly MySQL-specific.)- If you have all three of
WHERE
,GROUP BY
, andORDER BY
in the same query, you're more or less fucked. Good luck designing a single index that satisfies all three.
- If you have all three of
- Even though MySQL allows database administrators to define normal functions in a procedural SQL dialect,custom aggregate functions can only be defined by native plugins. Good thing, too, because procedural SQL in MySQL is its own kind of awful - more on that below.
- Subqueries are often convenient and occasionally necessary for expressing
multi-step transformations on some underlying data. MySQL's query planner
has only one strategy for optimizing them: evaluate the innermost query as
written, into an in-memory table, then use a nested loop to satisfy joins or
IN
clauses. For large subquery results or interestingly nested subqueries, this is absurdly slow.- MySQL's query planner can't fold constraints from outer queries into subqueries.
- The generated in-memory table never has any indexes, ever, even when appropriate indexes are "obvious" from the surrounding query; you cannot even specify them.
- These limitations also affect views, which are evaluated as if they were subqueries. In combination with the lack of constraint folding in the planner, this makes filtering or aggregating over large views completely impractical.
- MySQL lacks common table expressions. Even if subquery efficiency problems get fixed, the inability to give meaningful names to subqueries makes them hard to read and comprehend.
- I hope you like
CREATE TEMPORARY TABLE AS SELECT
, because that's your only real alternative.
- Window functions do not exist at all in MySQL. This complicates many kinds of analysis, including time series analyses and ranking analyses.
- Even interesting joins run into trouble. MySQL's query planner has trouble
with a number of cases that can easily arise in well-normalized data:
- Joining and ordering by rows from multiple tables often forces MySQL to
dump the whole join to a temporary table, then sort it -- awful,
especially if you then use
LIMIT BY
to paginate the results. JOIN
clauses with non-trivial conditions, such as joins by range or joins by similarity, generally cause the planner to revert to table scans even if the same condition would be indexable outside of a join.- Joins with
WHERE
clauses that span both tables, where the rows selected by theWHERE
clause are outliers relative to the table statistics, often cause MySQL to access tables in suboptimal order.
- Joining and ordering by rows from multiple tables often forces MySQL to
dump the whole join to a temporary table, then sort it -- awful,
especially if you then use
- Ok, forget about interesting joins. Even interesting
WHERE
clauses can run into trouble: MySQL can't index deterministic functions of a row, either. While some deterministic functions can be eliminated from theWHERE
clause using simple algebra, many useful cases (whitespace-insensitive comparison, hash-based comparisons, and so on) can't.- You can fake these by storing the computed value in the row alongside the "real" value. This leaves your schema with some ugly data repetition and a chance for the two to fall out of sync, and clients must use the "computed" column explicitly.
- Oh, and they must maintain the "computed" version explicitly.
- Or you can use triggers. Ha. See above.
And now you know why MySQL advocates are such big fans of doing dataprocessing in "the client" or "the app".
Alternate Representations and Derived Tables
Many databases let schema designers and administrators abstract the underlying "physical" table structure from the presentation given to clients, or to some specific clients, for any of a number of reasons. MySQL tries to let you do this, too! And fumbles it quite badly.
- As mentioned above, non-trivial views are basically useless. Queries like
SELECT some columns FROM a_view WHERE id = 53
are evaluated in the stupidest -- and slowest -- possible way. Good luck hiding unusual partitioning arrangements or a permissions check in a view if you want any kind of performance. - The poor interactions between triggers and binary logging's default
configuration make it impractical to use triggers to maintain "materialized"
views to avoid the problems with "real" views.
- It also effectively means triggers can't be used to emulate
CHECK
constraints and other consistency features. - Code to maintain materialized views is also finicky and hard to get "right", especially if the view includes aggregates or interesting joins over its source data. I hope you enjoy debugging MySQL's procedural SQL…
- It also effectively means triggers can't be used to emulate
- For the relatively common case of wanting to abstract partitioned storage
away for clients, MySQL actually has a
tool for it! But
it comes with enough caveats to strangle a
horse:
- It's a separate table engine wrapping a "real" storage engine, which
means it has its own, separate support for engine-specific features:
transactions, foreign keys, and index types,
AUTO_INCREMENT
, and others. The syntax for configuring partitions makes selecting the wrong underlying engine entirely too easy, too. - Partitioned tables may not be the referrent of foreign keys: you can't have both enforced relationships and this kind of storage management.
- MySQL doesn't actually know how to store partitions on separate disks or
filesystems. You still need to reach underneath of MySQL do to actual
storage management.
- Partitioning an InnoDB table under the default InnoDB configuration stores all of the partitions in the global tablespace file anyways. Helpful! For per-table configurations, they still all end up together in the same file. Partitioning InnoDB tables is a waste of time for managing storage.
- TL,DR: MySQL's partition support is so finicky and limited that MySQL-based apps tend to opt for multiple MySQL servers ("sharding") instead.
- It's a separate table engine wrapping a "real" storage engine, which
means it has its own, separate support for engine-specific features:
transactions, foreign keys, and index types,
Hosting Logic In The Database
Yeah, yeah, the usual reaction to stored procedures and in-DB code is "eww, yuck!" for some not-terrible reasons, but hear me out on two points:
- Under the freestanding-database-server paradigm, there will usually be network latency between database clients and the database itself. There are two ways to minimize the impact of that: move the data to the code in bulk to minimize round-trips, or move the code to the data.
- Some database administration tasks are better implemented using in-database code than as freestanding clients: complex data migrations that can't be expressed as freestanding SQL queries, for example.
MySQL, as of version5.0
(released in 2003 -- remember that date, I'll come back to it), has support
for in-database code via a procedural SQL-like dialect, like many other SQL
databases. This includes server-side procedures (blocks of stored code that
are invoked outside of any other statements and return statement-like
results), functions (blocks of stored code that compute a result, used in any
expression context such as a SELECT
list or WHERE
clause), and triggers
(blocks of stored code that run whenever a row is created, modified, or
deleted).
Given the examples ofothercontemporaneousprocedurallanguages, MySQL's procedural dialect contains some very strange and unfortunate design choices:
- There is no language construct for looping over a query result. This seems like a pretty fundamental feature for a database-hosted language, but no.
- There is no language construct for looping while a condition holds. This seems like a pretty fundamental feature for an imperative language designed any time after about 1975, but no.
- There is no language construct for looping over a range.
There is, in fact, one language construct for looping: the unconditional loop. All other iteration control is done via conditional
LEAVE
statements, asBEGIN DECLARE c CURSOR FOR SELECT foo, bar, baz FROM some_table WHERE some_condition; DECLARE done INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; DECLARE c_foo INTEGER; DECLARE c_bar INTEGER; DECLARE c_baz INTEGER; OPEN c; process_some_table: LOOP FETCH c INTO c_foo, c_bar, c_baz; IF done THEN LEAVE process_some_table; END IF; -- do something with c_foo, c_bar, c_baz END LOOP; END;
The original "structured programming" revolution in the 1960s seems to have passed the MySQL team by.
Okay, I lied. There are two looping constructs: there's also the
REPEAT ... UNTIL condition END REPEAT
construct, analogous to C'sdo {} while (!condition);
loop. But you still can't loop over query results, and you can't run zero iterations of the loop's main body this way.- There is nothing resembling a modern exception system with automatic scoping
of handlers or declarative exception management. Error handling is entirely
via Visual Basic-style "on condition X, do Y" instructions, which remain in
effect for the rest of the program's execution.
- In the language shipped with MySQL 5.0, there wasn't a way to signal
errors, either: programmers had to resort to stunts like intentionally
issuing failing
queries,
instead. Later versions of the language addressed this with the
SIGNAL
statement: see, they can learn from better languages, eventually.
- In the language shipped with MySQL 5.0, there wasn't a way to signal
errors, either: programmers had to resort to stunts like intentionally
issuing failing
queries,
instead. Later versions of the language addressed this with the
- You can't escape to some other language, since MySQL doesn't have an extension mechanism for server-side languages or a good way to call out-of-process services during queries.
The net result is that developing MySQL stored programs is unpleasant, uncomfortable, and far more error-prone than it could have been.
Why Is MySQL The Way It Is?
MySQL's technology and history contain the seeds of all of these flaws.
Pluggable Storage Engines
Very early in MySQL's life, the MySQL dev team realized that MyISAM was not the only way to store data, and opted to support other storage backends within MySQL. This is basically an alright idea; while I personally prefer storage systems that focus their effort on making one backend work very well, supporting multiple backends and letting third-party developers write their own is a pretty good approach too.
Unfortunately, MySQL's storage backend interface puts a very low ceiling on the ways storage backends can make MySQL behave better.
MySQL's data access paths through table engines are very simple: MySQL asks the engine to open a table, asks the engine to iterate through the table returning rows, filters the rows itself (outside of the storage engine), then asks the engine to close the table. Alternately, MySQL asks the engine to open a table, asks the engine to retrieve rows in range or for a single value over a specific index, filters the rows itself, and asks the engine to close the table.
This simplistic interface frees table engines from having to worry about query optimization - in theory. Unfortunately, engine-specific features have a large impact on the performance of various query plans, but the channels back to the query planner provide very little granularity for estimating cost and prevent the planner from making good use of the engine in unusual cases. Conversely, the table engine system is totally isolated from the actual query, and can't make query-dependent performance choices "on its own". There's no third path; the query planner itself is not pluggable.
Similar consequences apply to type checking, support for new types, or even
something as "obvious" as multiple automatic TIMESTAMP
columns in the same
table.
Table manipulation -- creation, structural modification, and so on -- runs
into similar problems. MySQL itself parses each CREATE TABLE
statement, then
hands off a parsed representation to the table engine so that it can manage
storage. The parsed representation is lossy: there are plenty of forms MySQL's
parser recognizes that aren't representable in a TABLE
structure, preventing
engines from implementing, say, column or tuple CHECK
constraints without
MySQL's help.
The sheer number of table engines makes that help very slow in coming. Any change to the table engine interface means perturbing the code to each engine, making progress on new MySQL-level features that interact with storage such as better query planning or new SQL constructs necessarily slow to implement and slow to test.
Poor Priorities
Early on, the MySQL team focused on pure read performance and on "ease of use" (for new users with simple needs, as far as I can tell) over correctness and completeness, violating Knuth's laws of optimization. Many of these decisions locked MySQL into behaviours very early in its life that it still displays now. Features like implicit type conversions legitimately do help streamline development in very simple cases; experience with other languages unfortunately shows that the same behaviours sandbag development and help hide bugs in more sophisticated scenarios.
While the MySQL (and MariaDB, and Percona) teams have matured greatly, MySQL's
massive and, frequently, not terribly database-heavy userbase makes it very
hard to introduce breaking changes. At the same time, adding optional
breaking changes via server and client mode flags (such as sql_mode
)
increases the cognitive burden involved in understanding MySQL's behaviours --
especially when that behaviour can vary from client to client, or when the
server's configuration is out of your control (on a shared host).
A solution similar to Python's from __future__ import
pragmas for making
breaking changes opt-in some releases in advance of making them mandatory
might help, but MySQL doesn't have the kind of highly-invested, highly-skilled
user base that would make that effective -- and it still has all of the
problems of modal behaviour.
The Inmates are Running The Asylum
It wouldn't be so frustrating if I could assign poor faith to someone, but no, the MySQL folks are here to help.
Bad Arguments
Inevitably, someone's going to come along and tell me how wrong I am and how MySQL is just fine as a database system. These people are everywhere, and they mean well too, and they are almost all wrong. There are two good reasons to use MySQL:
Some earlier group wrote for it, and we haven't finished porting our code off of MySQL. We've considered all of these points, and many more, and decided that ___feature_x___ that MySQL offers is worth the hassle.Unfortunately, these aren't the reasons people do give, generally. The following are much more common:
- It's good enough. No it ain't. There are plenty of other equally-capable
data storage systems that don't come with MySQL's huge raft of edge cases
and quirks.
- We haven't run into these problems. Actually, a lot of these problems happen silently. Odds are, unless you write your queries and schema statements with the manual open and refer back to it constantly, or have been using MySQL since the 3.x era daily, at least some of these issues have bitten you. The ones that prevent you from using your database intelligently are very hard to notice in action.
- We already know how to use it. MySQL development and administration causes brain damage, folks, the same way PHP does. Where PHP teaches programmers that "array" is the only structure you need, MySQL teaches people that databases are awkward, slow, hard-to-tune monsters that require constant attention. That doesn't have to be true; there are comfortable, fast, and easily-tuned systems out there that don't require daily care and feeding or the love of a specialist.
- It's the only thing our host supports.Getabetterhost. It's
not like they're expensive or hard to find.
- We used it because it was there. Please hire some fucking software developers and go back to writing elevator pitches and flirting with Y Combinator.
- Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire
MCSEs, too, but you should be hiring for attitude and ability to learn, not
for specific skillsets, if you want to run a successful software project.
- It's popular. Sure, and nobody ever got fired for buying IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if we let popularity dictate what technology we use and improve we'll never get anywhere. Marketing software to geeks is easy - it's just that lots of high-quality projects don't bother.
- It's lightweight. So's SQLite 3 orH2. If you care about deployment footprint more than any other factor, MySQL is actually pretty clunky (and embedded MySQL has even bigger problems than freestanding MySQL).
- It's getting better, so we might as well stay on it.It's true, if you go by feature checklists and the manual, MySQL is improving "rapidly". 5.6 is due out soon and superficially looks to contain a number of good changes. I have two problems with this line of reasoning: Why wait? Other databases are good now, not eventually. MySQL has a history of providing the bare minimum to satisfy a feature checkbox without actually making the feature work well, work consistently, or work in combination with other features.