八、注意事项

1.构造大批量数据

若需要构造大批量数据，原生python将耗费大量时间，请使用pypy执行datafaker。例如：

pypy -m datafaker hbase localhost:9090 PIGONE 50000 --meta hbase.txt

或者多线程执行, 8个线程产生数据，每次批量写入pg 2000条数据:

datafaker mysql postgresql+psycopg2://postgres:postgres@localhost/testpg pig_fnumbe_test 100000 --meta meta.txt --worker 8 --batch 2000

2.写入Hbase报错Broken pipe

是由于hbase设置的hbase.thrift.server.socket.read.timeout参数过小，默认为60秒因此在conf/hbase-site.xml中添加上配置即可：

<property>
         <name>hbase.thrift.server.socket.read.timeout</name>
         <value>600000</value>
         <description>eg:milisecond</description>
</property>

重启hbase, 重启thrift

3. 支持关系型数据库

例子中大部分是以mysql为例子展示的。只要支持sqlachemy的关系型数据库都是可以的，例如pg, oracle, tidb，redshift等等。但type类型都是rdb，例如： datafaker rdb postgresql+psycopg2://postgres:postgres@localhost/testpg pig_fnumbe_test 100000 --meta meta.txt --worker 8 --batch 2000

写入oracle

datafaker rdb oracle://root:root@127.0.0.1:1521/helowin stu 10 --meta meta.txt

sqlalchemy连接串必须为oracle:形式

4. 测试情况

操作系统	python版本	测试情况
Mac osx	python2.7/3.5+	通过
Linux	python2.7	通过
Windows10	python3.6	通过

5. 每隔一定时间一条条写入数据i

需要设置interval和batch参数，例如： datafaker rdb postgresql+psycopg2://postgres:postgres@localhost/testpg pig_fnumbe_test 100000 --meta meta.txt --interval 0.5 --batch 1

xx. 其他问题

如果需要写入其他数据库请给我提issue

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

注意事项.md

注意事项.md

八、注意事项

1.构造大批量数据

2.写入Hbase报错Broken pipe

3. 支持关系型数据库

4. 测试情况

5. 每隔一定时间一条条写入数据i

xx. 其他问题

Files

注意事项.md

Latest commit

History

注意事项.md

File metadata and controls

八、注意事项

1.构造大批量数据

2.写入Hbase报错Broken pipe

3. 支持关系型数据库

4. 测试情况

5. 每隔一定时间一条条写入数据i

xx. 其他问题