
Migrating data from SQL Server to Postgres fails on big tables, the process gets stuck #1623

Open
delpine72 opened this issue Nov 8, 2024 · 5 comments

@delpine72

Thanks for contributing to pgloader by reporting an
issue! Reporting an issue is the only way we can solve problems, fix bugs,
and improve both the software and its user experience in general.

The best bug reports follow these 3 simple steps:

  1. show what you did,
    I ran pgloader to migrate a database from SQL Server to Postgres.
    We have issues with the tables that have more than 2 million records: I cannot migrate the big tables, the process gets stuck.

  2. show the result you got,


[postgres@rbalvdpost03 Archive]$ pgloader --verbose auditRecordsMaterialized.load
2024-11-08T15:32:21.002000Z NOTICE Starting pgloader, log system is ready.
2024-11-08T15:32:21.010000Z LOG pgloader version "3.6.7~devel"
2024-11-08T15:32:21.157001Z LOG Migrating from #<MSSQL-CONNECTION mssql://sqlpostgres_user@Sqlserver.emea.com:1433/Core.Pack232Sandpit {10067E6D43}>
2024-11-08T15:32:21.157001Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres2@Postgreserver.emea.com:5432/Core.Pack232BigTable {10067E7043}>
Max connections reached, increase value of TDS_MAX_CONN
2024-11-08T15:32:21.538003Z NOTICE Prepare PostgreSQL database.
2024-11-08T15:32:21.678004Z NOTICE COPY audittrail.auditrecordsmaterialized with 0 rows estimated [3/4]
Max connections reached, increase value of TDS_MAX_CONN


The console shows this and does not advance any further.

On the SQL Server side I can see that the SELECT query for the table is in a suspended state with an async_network wait,

and on the Postgres server I can see the COPY command, but it is in idle status (see the pg_stat_activity sketch below).


My load file is this one:

load database
from mssql://sqlpostgres_user:xxxxxxxxx@PostgresServer.emea.com/Core.Pack232Sandpit
into postgresql://postgres2:xxxxxxx@SqlServer.emea.com/Core.Pack232BigTable

WITH include drop, truncate, create tables, create indexes, reset sequences, foreign keys

including only table names like 'AuditRecordsMaterialized' in schema 'AuditTrail'

SET work_mem to '256MB', maintenance_work_mem to '1024 MB';

  3. explain how the result is not what you expected.

The table is not migrated to the Postgres server; it has 6 million records. Other tables of the same database with fewer records have been migrated.

Any idea how I can tackle this?

Best Regards
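
A quick way to see what that idle COPY session is actually doing on the Postgres side is to query pg_stat_activity (a minimal sketch, assuming you can connect to the target database with psql; the ILIKE filter only narrows the output to COPY statements):

```
-- Show the state of any COPY session on the target Postgres server.
-- state, wait_event and xact_start help tell a genuinely running COPY
-- apart from one stuck in "idle in transaction (aborted)".
SELECT pid, usename, state, wait_event_type, wait_event,
       xact_start, query_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE query ILIKE 'copy %';
```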

In the case of pgloader, here's the information I will need to read in your
bug report. Having all of this is a big help, and often means the bug you
reported can be fixed very efficiently as soon as I get to it.

Please provide the following information:

```

pgloader version "3.6.7~devel"
compiled with SBCL 2.2.10-1.rhel9
```

  • did you test a fresh compile from the source tree? No

    Compiling pgloader from sources is documented in the
    README, it's
    easy to do, and if patches are to be made to fix your bug, you're going
    to have to build from sources to get the fix anyway…

  • did you search for other similar issues? No

  • how can I reproduce the bug? No

    Include a self-contained pgloader command file.

    If you're loading from a database, consider attaching a database dump to
    your issue. For MySQL, use mysqldump. For SQLite, just send over your
    source file, that's easy. Maybe not the one with your production data, of
    course, but the one with just the sample of data that allows me to
    reproduce your bug.

    When using a proprietary database system as a source, consider creating
    a sample database on some Cloud service or somewhere you can then give
    me access to, and see my email address on my GitHub profile to send me
    the credentials. Still open a public issue for tracking and as
    documentation for other users.

--
-- EDIT THIS FILE TO MATCH YOUR BUG REPORT
--

LOAD CSV

     FROM INLINE with encoding 'ascii'
     INTO postgresql:///pgloader
     TARGET TABLE jordane

     WITH truncate,
          fields terminated by '|',
          fields not enclosed,
          fields escaped by backslash-quote

      SET work_mem to '128MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ drop table if exists jordane; $$,
    $$ CREATE TABLE jordane
       (
         "NOM" character(20),
         "PRENOM" character(20)
       )
    $$;

BORDET|Jordane
BORDET|Audrey
LASTNAME|"opening quote
BONNIER|testprenombe~aucouptroplong
JOURDAIN|héhé¶
  • pgloader output you obtain
PASTE HERE THE OUTPUT OF THE PGLOADER COMMAND
  • data that is being loaded, if relevant
PASTE HERE THE DATA THAT HAS BEEN LOADED
  • How the data is different from what you expected, if relevant
@CIO-COSINE

I had a similar issue with the above, running the Docker image on Windows 10. I had tried many things, including allowing more resources to WSL2 and disabling swap; however, the only way I got it not to fail due to heap exhaustion on big tables was to include the following parameter in the WITH clause:

prefetch rows = 10000

This can be adjusted according to your data volume.
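
For illustration, here is roughly how that option would slot into the load file from the original report (a sketch only: the connection strings below are placeholders, and 10000 is just the starting value mentioned above, to be tuned):

```
load database
     from mssql://user:password@mssql-host/SourceDb
     into postgresql://user:password@pg-host/TargetDb

WITH include drop, truncate, create tables, create indexes,
     reset sequences, foreign keys,
     prefetch rows = 10000

including only table names like 'AuditRecordsMaterialized' in schema 'AuditTrail'

SET work_mem to '256MB', maintenance_work_mem to '1024MB';
```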

@delpine72
Author

Hi CIO, I really appreciate your post. I did as you said, increasing it to 50000, however this does not work. I don't know how many prefetch rows I have to set for 3,000,000 records.

At the beginning, when I run pgloader and check it in the console with the ps command, I can see it is working; the status is COPY. After some minutes, the status appears as idle in transaction (aborted).
I also see my CPU usage is very high:
[screenshot: CPU usage]
I don't know if I need more resources.
Thank you in advance
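
While it runs, a simple way to watch pgloader's CPU and memory from the shell is something like the following (a sketch; the -C filter assumes the process is actually named pgloader):

```
# Refresh pgloader's CPU and resident memory every 5 seconds.
watch -n 5 'ps -C pgloader -o pid,stat,%cpu,%mem,rss,etime,cmd'
```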

@delpine72
Author

This is my CPU:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

@CIO-COSINE


I am not an expert on this, but I will suggest some things to help your troubleshooting.

  1. Perhaps try with a lower value of prefetch rows? Find below the load options I used to make it work with my setup (I am loading directly from SQL Server; you can ignore the CAST and BEFORE/AFTER LOAD DO sections I have):
LOAD DATABASE
<omitted connection strings>
WITH
     include drop,
     create tables,
     create indexes,
     reset sequences,
     prefetch rows = 1000
SET 
     work_mem to '512MB',
     maintenance_work_mem to '1024MB'
CAST
     type datetime to timestamptz using zero-dates-to-null,
     type int when (= precision 10) to bigint,
     type nvarchar to text drop typemod using remove-null-characters
BEFORE LOAD DO
	 $$ CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; $$,
	 $$ CREATE EXTENSION IF NOT EXISTS "system_stats"; $$
AFTER LOAD DO
     $$ GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO postgres; $$;

With the above, I was able to load a database that had a table with 23,653,246 rows (and consumed about 4GB of space)

  2. As I am assuming you are running this with Docker on Windows: I found that sometimes the console log was not reliable or helpful enough for troubleshooting, or even for monitoring progress. While the process is running, check the following folder in the Docker container (the pgloader.log file will also help); it may give you some further hints on where the process is hanging:
    [screenshot: pgloader log folder inside the container]
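
If it helps, one way to follow that log from the host is something along these lines (a sketch only: the container name is a placeholder, and /tmp/pgloader/pgloader.log is pgloader's default log location, which may differ if --logfile was set):

```
# Find the running pgloader container, then follow its log file.
docker ps
docker exec -it <container-name> tail -f /tmp/pgloader/pgloader.log
```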

I hope this helps.

@delpine72
Author

Hi CIO,
I’m really grateful for your help; I will test this next Monday, fingers crossed. At the moment I don't have access to the logs, as I don't have root access right now. I will check them as soon as I can. Thanks again.
