small edit on sql
jbcodeforce committed Nov 11, 2024
1 parent cf5972f commit 6f6b56e
Showing 3 changed files with 22 additions and 6 deletions.
1 change: 0 additions & 1 deletion docs/architecture/flink-sql.md
@@ -80,7 +80,6 @@ Note that the SQL Client executes each INSERT INTO statement as a separate Flink

In streaming, the "ORDER BY" statement must use a time attribute in ascending order as its primary sort key, while in batch processing it can be applied to any record field.
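A minimal sketch, assuming a hypothetical `Orders` table whose `order_time` column is the event-time attribute:

```sql
-- streaming and batch: sort on the event-time attribute, ascending
SELECT * FROM Orders ORDER BY order_time ASC;

-- batch only: sort on an arbitrary field
SELECT * FROM Orders ORDER BY amount DESC;
```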


### Data lifecycle

In a pure Kafka integration architecture, such as Confluent Cloud, the data lifecycle follows these steps:
21 changes: 19 additions & 2 deletions docs/coding/flink-sql.md
@@ -41,7 +41,7 @@ Use one of the following approaches:
USE `marketplace`;
SHOW TABLES;
SHOW JOBS;

DESCRIBE tablename;
```

* Write SQL statements and test them with the Java SQL runner. The class is in the [https://github.com/jbcodeforce/flink-studies/tree/master/flink-java/sql-runner](https://github.com/jbcodeforce/flink-studies/tree/master/flink-java/sql-runner) folder.
@@ -67,7 +67,7 @@ Data Definition Language (DDL) are statements to define metadata in Flink SQL by
);
```

???- info "how to join two tables on key within time and store in target table in SQL?"
???- info "how to join two tables on a key within a time window and store results in target table?"
```sql
create table Transactions (ts TIMESTAMP(3), tid BIGINT, amount INT);
create table Payments (ts TIMESTAMP(3), tid BIGINT, type STRING);
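-- A minimal sketch (the Matched sink table is hypothetical; assumes watermarks are
-- defined on ts so the interval predicate behaves as a streaming interval join):
create table Matched (ts TIMESTAMP(3), tid BIGINT, amount INT, type STRING);

insert into Matched
select t.ts, t.tid, t.amount, p.type
from Transactions t join Payments p
  on t.tid = p.tid
 and p.ts between t.ts and t.ts + INTERVAL '10' MINUTE;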
@@ -166,8 +166,19 @@ Data Definition Language (DDL) are statements to define metadata in Flink SQL by
alter table flight_schedules add(dt string);
```

???- question "Create a table as another by inserting record from another table with similar schema - select (CTAS)"
By using a primary key:

```sql
create table shoe_customer_keyed(
primary key(id) not enforced
) distributed by(id) into 1 buckets
as select id, first_name, last_name, email from shoe_customers;
```

???- question "How to generate data using [Flink Faker](https://github.com/knaufk/flink-faker)?"
Create a table whose records are generated by the `faker` connector using [DataFaker expressions](https://github.com/datafaker-net/datafaker).
Valid only with OSS Flink or on-premises deployments.

```sql
CREATE TABLE `bounded_pageviews` (
@@ -188,6 +199,12 @@ Data Definition Language (DDL) are statements to define metadata in Flink SQL by
```
This only works in a customized Flink SQL client that includes the flink-faker jar.

???- info "Generate data with dataGen for Flink OSS"
[Use DataGen to do in-memory data generation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/connectors/table/datagen/)
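A minimal sketch of a `datagen`-backed table (column names and rates are illustrative):

```sql
CREATE TABLE orders_gen (
  order_id BIGINT,
  price    DOUBLE,
  ts       TIMESTAMP(3)
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'fields.order_id.kind' = 'sequence',
  'fields.order_id.start' = '1',
  'fields.order_id.end' = '1000',
  'fields.price.min' = '1',
  'fields.price.max' = '100'
);
```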

???- question "How to generate test data to Confluent Cloud Flink?"
Use Kafka Connector with DataGen. Those connector exists with a lot of different pre-defined model. Also it is possible to define custom Avro schema and then use predicates to generate data. There is a [Produce sample data quick start tutorial from the Confluent Cloud home page](https://docs.confluent.io/cloud/current/connectors/cc-datagen-source.html). See also [this readme](2933https://github.com/jbcodeforce/flink-studies/tree/master/flink-sql/01-confluent-kafka-local-flink).

???- question "How to transfer the source timestamp to another table"
As `$rowtime` is the timestamp of the record in Kafka, it may be useful to propagate the source timestamp to the downstream topic.
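A minimal sketch on Confluent Cloud Flink, where `$rowtime` exposes the Kafka record timestamp (the `orders` and `enriched_orders` names are illustrative):

```sql
create table enriched_orders (
  order_id  BIGINT,
  source_ts TIMESTAMP_LTZ(3)
);

insert into enriched_orders
select order_id, $rowtime as source_ts
from orders;
```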

6 changes: 3 additions & 3 deletions flink-sql/01-confluent-kafka-local-flink/README.md
@@ -81,7 +81,7 @@ confluent kafka topic create pageviews --cluster <cluster-id>
confluent kafka cluster describe <cluster-id>
```

* Use the local same local docker compose with just task manager, job manager and SQL client containers
* Use the same local docker compose with just task manager, job manager and SQL client containers
* Create a table to connect to Kafka; change the attributes API_KEY, API_SECRETS and BOOTSTRAP_SERVER

```sql
@@ -104,7 +104,7 @@ CREATE TABLE pageviews_kafka (
);
```

* Add a table to generate records
* Add a table to generate records using the Flink SQL client

```sql
CREATE TABLE `pageviews` (
@@ -131,5 +131,5 @@ INSERT INTO pageviews_kafka SELECT * FROM pageviews;

### Problems

10/08/24 the module org.apache.kafka.common.security.plain.PlainLoginModule is missing in job and task managers. We need to add libraries some how verify if these are the good paths in the dockerfile of sql-client. This is not aligned with https://github.com/confluentinc/learn-apache-flink-101-exercises/blob/master/sql-client/Dockerfile
10/08/24 the module org.apache.kafka.common.security.plain.PlainLoginModule is missing in job and task managers. We need to add the libraries; verify that the paths in the sql-client Dockerfile are correct. This is not aligned with https://github.com/confluentinc/learn-apache-flink-101-exercises/blob/master/sql-client/Dockerfile
