Skip to content Skip to sidebar Skip to footer

Postgresql Data: Array To String Clarification

I am currently working on a task that will migrate a date from PostgreSQL to another PostgreSQL database. One field's data needs to be splitted into three columns (e.g. father_name

Solution 1:

While it is possible to turn an array into a set of columns you won't have a fixed set of columns. For example, if you split father_name into three pieces that's fine for John Wilkes Booth but what about Yarrow Hock? Or Beyoncé? Or Bernal Diaz Del Castillo? You need something more intelligent than just splitting on whitespace.

While you could write something in Postgresql, probably as a stored procedure, it's easier, though slower, to do the data transforms in Python. Since you have to run the data through Python anyway (or do something complicated to link the two databases), and since this is (hopefully) a one time thing, performance isn't critical.

I'm not very good at Python, but it would be something like this.

cur_t.execute("""SELECT studentnumber, father_name FROM something""")

for row in cur_t:
    father = parse_name(row['father_name'])
    student_id = fix_studentnumber(row['studentnumber'])

    cur_p.execute("""
        INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name)
        VALUES ('%s', '%s', '%s', '%s')
        """ % (student_id, father['first'], father['middle'], father['last'])
    )

Then you'd write parse_name and fix_studentnumber and any other necessary functions to clean up the data in Python. And you can unit test them.

Note: because accessing columns by number (ie. row[5]) is difficult to read and maintain you'll probably want to use conn_t.cursor(cursor_factory=psycopg2.extras.DictCursor) so you can access columns by name as I have above.


Solution 2:

why not do it directly n SQL:

vao@so=# create table so12(a text, b text, c text);
CREATE TABLE
vao@so=# with a(i) as (values('1,2,5'))
, s as (select string_to_array(i,',') ar from a)
insert into so12 select ar[1],ar[2],ar[3] from s;
INSERT 0 1
vao@so=# select * from so12;
┌───┬───┬───┐
│ a │ b │ c │
├───┼───┼───┤
│ 125 │
└───┴───┴───┘
(1 row)

update I missed the point that it happens in scope of several databases, thus you would need to use dblink or create a postgres_fdw foreign table. Both would still be much faster then selecting to array and then looping through rows with insert into .. values(..) statement


Post a Comment for "Postgresql Data: Array To String Clarification"