TechEarl

How to Store an Array in PostgreSQL

PostgreSQL has native array types: any base type can be an array, declared with []. How to insert with the curly-brace literal or ARRAY[...], query with @>, &&, and ANY(), index with GIN, and when an array beats a junction table or jsonb.

Ishan Karunaratne⏱️ 12 min readUpdated
Share thisCopied
How to store an array in PostgreSQL: native array column types declared with [], the curly-brace and ARRAY[...] literal forms, containment and overlap queries with @> and &&, ANY() membership, and a GIN index.

PostgreSQL stores arrays natively: any base type can be an array column, declared by appending [] to the type. tags text[], scores integer[], owner_ids uuid[] are all real columns, no JSON, no separate table. You put values in with the curly-brace literal '{red,green,blue}' or the ARRAY['red','green','blue'] constructor, and you query them with set operators: @> (contains), && (overlap), and = ANY(...) for membership. Below are the literal forms, the query operators, a GIN index that makes containment fast, a worked posts schema, and the harder question of when an array is the wrong tool and a junction table or jsonb is the right one.

Short answer: declare the column as text[] (or integer[], uuid[], whatever the element type is), insert with ARRAY['a','b'], and add a GIN index so WHERE tags @> ARRAY['x'] uses it. The one gotcha that bites everyone: PostgreSQL arrays are 1-indexed, so tags[1] is the first element, not tags[0]. Reach for an array only when the elements are a small bounded set of scalars that belongs to one row; the moment they become entities with their own attributes or foreign keys, use a child table instead. MySQL, for contrast, has no array type at all (the equivalent there is a JSON array or a junction table).

Declaring and inserting an array column

Append [] to any base type to make it an array. The size in the brackets is not enforced, so text[] and text[3] behave identically; just write text[].

sql
CREATE TABLE posts (
  id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  title text NOT NULL,
  tags  text[] NOT NULL DEFAULT '{}'
);

There are two ways to write the value. The array literal is a single-quoted string with curly braces: '{summer,sale}'. The ARRAY[...] constructor is a SQL expression: ARRAY['summer', 'sale']. They produce the same value, but the constructor is usually clearer because each element is a normal SQL literal (so quoting and escaping follow the rules you already know), whereas the curly-brace form has its own quoting rules for elements that contain commas, spaces, or braces.

sql
INSERT INTO posts (title, tags) VALUES
  ('Postgres arrays explained', ARRAY['postgres', 'sql', 'arrays']),
  ('A short note',              '{postgres,tips}');

Read an individual element by subscript. This is the gotcha worth repeating: arrays are one-based, so the first element is index 1.

sql
SELECT tags[1] FROM posts WHERE title = 'A short note';
-- postgres

array_length(tags, 1) gives the element count (the 1 is the dimension, since Postgres arrays can be multi-dimensional). unnest(tags) expands the array into one row per element, which is how you join an array against another table or aggregate over its values. The inverse, array_agg(...), collapses a set of rows back into a single array.

Querying arrays: membership, containment, overlap

Three operators cover almost everything you will want to ask of an array column. Write them inside SQL exactly as shown.

Membership: is one value in the array? Use = ANY(...):

sql
SELECT title FROM posts WHERE 'postgres' = ANY(tags);

ANY unpacks the array and returns true if any element matches the left-hand value. It reads backwards from most languages, but value = ANY(array) is the idiomatic "is this value in the array" test.

Containment: does the array include all of these values? Use @> (contains):

sql
SELECT title FROM posts WHERE tags @> ARRAY['postgres', 'sql'];

tags @> ARRAY['postgres','sql'] is true when every element on the right appears somewhere in tags. The mirror operator <@ (is contained by) flips it: ARRAY['postgres'] <@ tags asks whether the left array sits entirely inside the right. Per the PostgreSQL docs, @> means "does the first array contain the second, that is, does each element appearing in the second array equal some element of the first array".

Overlap: do they share any value? Use &&:

sql
SELECT title FROM posts WHERE tags && ARRAY['tips', 'tutorial'];

&& is true when the two arrays have at least one element in common. It is the "tagged with any of these" query, where @> is "tagged with all of these".

Indexing arrays with GIN

The membership and containment queries above work without an index, but they scan every row and test the operator against each array. For a table of any size, that gets slow. The fix is a GIN index (Generalized Inverted Index), which is built exactly for "which rows contain this element" lookups.

sql
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

PostgreSQL ships a GIN operator class for arrays that supports indexed queries using <@, @>, =, and &&, so the same index accelerates containment (@>, <@), overlap (&&), and whole-array equality lookups. For a single-value membership test, the operator the GIN index serves directly is @>, so write WHERE tags @> ARRAY['postgres'] rather than WHERE 'postgres' = ANY(tags) when you want the index to kick in; the = ANY(...) form is the readable spelling but the planner does not always route it through the array GIN index. A default B-tree index does not help these operators; it indexes whole arrays for ordering and equality of the entire array value, which is rarely what you are querying. For tag-style columns, GIN is the index you want.

When an array is the right tool (and when it isn't)

This is the part worth slowing down on, because arrays are convenient enough to be over-used, and an array standing in for a real relationship becomes an anti-pattern fast.

An array column is a good fit when the elements are a small, bounded list of scalars that belongs to one row and is queried as a set. Tags on a post. Role names on a user. A short list of allowed values, or feature flags, or category labels. The elements have no attributes of their own, you never need to point a foreign key at one, and you almost always read or filter them together with the row they hang off. For that shape, an array is simpler than a second table and the GIN index keeps lookups fast.

Reach for a junction (child) table the moment any of these is true: the elements are entities with their own columns (a tag that has a description, a slug, a created date), you need referential integrity (a foreign key from the element to a real tags table so you cannot store a typo), the list can grow unbounded, or the elements must be queried and shared independently of the parent row. The classic giveaway is wanting to "rename a tag everywhere" or "list the ten most-used tags". With an array you are rewriting strings across every row; with a junction table that is a one-row update and a GROUP BY. Real many-to-many relationships belong in a join table, full stop.

Reach for jsonb when the elements are not uniform scalars but heterogeneous documents, structured objects whose shape varies, or nested data you want to query by path. An array of strings is a text[]; a list of { "label": ..., "weight": ... } objects is jsonb. Don't smuggle structured records into a text[] by concatenating fields with a delimiter; that is a worse version of a jsonb array.

Comparison: array column vs junction table vs jsonb

AspectArray column (text[])Junction tablejsonb column
Element shapeUniform scalarsRows with their own columnsHeterogeneous / nested documents
Referential integrityNone (free text, can hold typos)Foreign keys enforce valid valuesNone (validate in app or with CHECK)
Query a single value= ANY(), @>JOIN + WHERE@>, ->, path operators
IndexingGIN (@>, &&, = ANY)B-tree on the FK columnsGIN on the jsonb
Best whenSmall bounded set of labels per rowReal many-to-many, shared entitiesVariable-shape structured data
Rename / reuse an elementRewrite across every rowOne-row updateRewrite across every row

A worked schema: posts with a tags array

Putting the pieces together. The stable, scalar list of tags lives in a text[], indexed with GIN, and queried by containment and membership.

sql
CREATE TABLE posts (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  title      text NOT NULL,
  tags       text[] NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

INSERT INTO posts (title, tags) VALUES
  ('Indexing arrays in Postgres', ARRAY['postgres', 'indexing', 'gin']),
  ('Tagging with text[]',         ARRAY['postgres', 'arrays']),
  ('A CSS aside',                 ARRAY['css', 'tips']);

Find posts tagged with all of a set (containment, uses the GIN index):

sql
SELECT title FROM posts WHERE tags @> ARRAY['postgres', 'indexing'];
-- Indexing arrays in Postgres

Find posts tagged with a single value (membership):

sql
SELECT title FROM posts WHERE 'postgres' = ANY(tags);
-- Indexing arrays in Postgres
-- Tagging with text[]

Find posts tagged with any of a set (overlap):

sql
SELECT title FROM posts WHERE tags && ARRAY['css', 'gin'];
-- Indexing arrays in Postgres
-- A CSS aside

That is the whole pattern: a typed array column, a GIN index, and the three set operators. If your tags later grow descriptions and slugs and you start wanting to rename them globally, that is your signal to migrate to a tags table plus a post_tags junction. Until then, the array is the leaner model.

Adding and removing elements in place

You will eventually want to add a tag to an existing row rather than rewrite the whole array. Three things do this. array_append(arr, x) adds an element to the end, array_remove(arr, x) strips out every element equal to x, and the || operator concatenates an element or another array onto the column.

sql
-- add a tag
UPDATE posts SET tags = array_append(tags, 'sql') WHERE id = 1;
-- same thing with the concatenation operator
UPDATE posts SET tags = tags || ARRAY['sql'] WHERE id = 1;
-- drop a tag (removes all occurrences)
UPDATE posts SET tags = array_remove(tags, 'gin') WHERE id = 1;

Two things to know. array_append does not deduplicate, so appending a tag the row already has stores it twice; if you need set semantics, guard with a WHERE NOT (tags @> ARRAY['sql']) on the update or de-dupe in the application. And every one of these rewrites the entire array value (Postgres arrays are stored as a single column value, not a set of rows), so the GIN index entry for that row is rebuilt on each update. For a column you churn constantly, that write cost is another nudge toward a junction table.

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsPostgreSQLArraysSchema DesignIndexingGINjsonbData Types

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

How to Store a bcrypt Password Hash in PostgreSQL

A bcrypt hash is a fixed 60-character string. In PostgreSQL the right column is text (varchar(255) is equivalent). Why text over char(60), app-side hashing, the pgcrypto crypt() option, and a worked users schema.

How to Store an MD5 Hash in PostgreSQL: bytea vs text

How to store an MD5 hash in PostgreSQL: the raw 16-byte bytea you should usually reach for, the readable text/char(32) hex string, and the uuid trick. Storage cost, encode()/decode() instead of HEX/UNHEX, and the pgcrypto digest() function.