Skip to content
Jakub Puchała edited this page Aug 7, 2018 · 14 revisions

Initial assumption

I implicitly assumes you know basics of SQL. If you're not familiar with it, please read about it for example at w3schools.com, then come back here. This description won't be so detailed to explain how to use SQL in general.

Important notes to read

  1. Until this document is not saying different, you can assume that every feature not listed here repeats SQL behavior.
  2. Column names and methods are case sensitive
  3. Keywords are not case sensitive.
  4. Queries are strongly typed. Types must much.
  5. From clause for join syntax must be aliased for parametrizable source. You must reference specific join table by using this alias.
  6. Every first call of a script will be much slower than consecutive calls. This overhead occurs due to SQL -> C# translation and compilation.

Select clause

Select statement gets datas from specific source. As a part of select query, you can use ColumnName, ComplexColumn.Property and also SomeMethod(...). Returning columns or expressions must be aliased when used with set operators or are part of common table expressions. For example:

with Dummy as (
   select 1 + 2 as 'EvaluatedExpression', SomeMethod(SomeColumn) as 'MyMethod', SomeColumn from #dummy.source()
)
select EvaluatedExpression, SomeColumn, MyMethod from Dummmy

in this particular case SomeMethod(SomeColumn) and 1 + 2 must be aliased because they represents complex expression. The other case here are set operators that columns must match on boths sides. Let's look below:

select ColumnName, 1 + 2 as 'EvaluatedExpression', SomeMethod(SomeColumn) as 'MyMethod' from #dummy.source()
union (ColumnName)
select OtherMethod(ColumnName) as ColumnName, 3 + 4 as 'EvaluatedExpression', SomeMethod2(SomeColumn) as 'MyMethod' from #dummy.source()

As you can see, set operators both requires the same name. What is not visible here but is also very important that the expressions on the both sides must return the same types.

From clause

You can see two types of different from form clauses. The first one is from #a.b(param1, ...) and the second one is SomeTableAlias. The basic usages will often use the first one sytnax, the second one you will use for Common Table Expressions. I will describe when and how to use them.

  1. While typing a query, you often see syntax like from #a.b(param1, ...). This syntax specify the external data source you will use. The #a part indicates a schema name and b(...) part is indicator how to instantiate row source that feeds evaluator with the datas. As you noted, source is parametrized which allows to configure what datas you would like to receive. For example, while querying operating system drive, probably you wouldn't want to iterate over whole disc rather that only a set of folders important for you. Parametrization gives that ability to flexibly construct what exactly would you like to query. Narrow down your searches to do it faster. This clause must be aliased while using it in connection with join syntax

  2. Use of Common Table Expressions brings to you new kind of data source that is already preprocessed. By using syntax with X as ( select * from ... ) you express that the result set of the select * from ... query will be accessible with the name X which became your brand new data source. You can use multiple table expression with the following syntax:

    with X as ( select * from ... ), Y as ( select * from ... where ... ) select * from X ...

    This syntax is much more flexible and allows you also to combine table expressions each other. Please note that this kind of data source may not be aliased in contrast to first one.

Where clause

This clause allows you to filter expressions to get only rows that match conditions. Here, you can use columns ColumnName, their properties ColumnName.Property, methods SomeMethod(...) and expressions like 3 + 1 + SomeMethod(..). You are allowed to use a method result as a parameter of other method unless the type returned by method call match outer method definition. To combine conditions you can use and and or syntax.

... where ColumnName = 'something' and SomeMethod(DifferentColumnName) > 7 ...

Group by and Having operators

Group by operator allows you to define columns or expressions accord with your rows will be grouped. You can define multiple groups. While grouping, your select clause should only contains grouping expressions or column names or aggregation methods. Using non aggregation methods are also allowed. Columns that aren't part of grouping operator are not allowed.

Parent Aggregation

You can compute aggregation for parent groups by placing amount of groups you would like to skip as a parameter of a method. For example, when your query groups by Country and City and you are summing it's population Sum(Population), then you would like to sum population for overall country at the same time. If so, then you should add also Sum(1, Population) which will computes the sum only for Country without taking account different cities across the country. You can combine this across a single select clause.

Aggregation Methods

You can use methods like Sum, SumIncome, SumOutcome, Count, AggregateValue and others to perform values aggregations on rows belonging to specific group. Broadly speaking, the rules from the SQL applies here so after you use group by ColumnName, then only what your select can return is ColumnName and aggregated values like Sum(SomeColumn). Please, note that * operator in aggregation operators are not supported.

select ColumnName, Count(ColumnName) from #schema.method() group by ColumnName

You can also group by values affected by some method. The only rule here is to repeat this part in your select clause (if you need it) bothly as a group indicator and as a grouping value. This is how it should looks like

select SomeMethod(ColumnName), Sum(SomeMethod(ColumnName)) from #schema.method() group by SomeMethod(ColumnName)

CTE

A common table expression is an expression that you can think of as it creates temporary named result set. Effectively clause with TempResultSetName as (... Inner Query ...) Outer Query calculates Inner Query and store it in scope of Outer Query to being able to use it here. Temporary result set can also be used as a source of another table expression. Please, note that recursive cte are not supported.

Usefull language specific features

  1. Complex object accessing - Is a feature you can use while you're column is of complex type. Use the query syntax ColumnName.SomeProperty to get value of SomeProperty from ColumnName object. ColumnName type must contain SomeProperty property. In case it does have valid property, column become of type that SomeProperty is. You can use chained call syntax like ColumnName.SomeProperty.DifferentProperty. Also, you are allowed to get values from arrays with following syntax ColumnName.Prop[0].
  2. Like / not like operators - You can use this operators to look for some specific patterns in query. You can use both, % and _ syntax to find values matches your pattern.
  3. RLike / not rlike operators - It is the same operator as like operator but instead of using wildcard, specify regular expression here.
  4. Contains operator - this operator is working like in operator currently.
  5. In operator - this operator is usefull when you have to access your column and multiple values are considered as valid. It looks like ColumnName in ('abc', 'cda')
  6. Is null / is not null operators - Use this operator to check whetever your column is null. You can use it bothly for values or expressed by reference columns. Please note that using it for values don't have sense and in fact, evaluator will get rid of such check. Usage is like ColumnName is null or ColumnName is not null

Set operators

Set operators are a bit different for Musoq than for operators used in DBMS. Those compares each other rows and calculates the score for the whole query. For Musoq, you have to specify which columns will take part in comparsion which looks like query1 union all (Column1, Column2, ...) query2. Columns must have the same name and order. If you would like to use different columns for comparsion, then you must alias such column to align it's name. This is how it should looks like select Column from #schema.method() union all (Column) select Column2 as Column from #schema.method() You musn't mix types returned by columns and count of returned values for both columns must be the same.

  1. Union all - Combine the result sets without taking an account of possible duplicates.
  2. Union - Combine the result sets and get rid of duplicated rows by selecting the first one.
  3. Except - Return distinct rows from the left query that doesn't appear in the right query.
  4. Intersect - Get only those rows that are both in left and right queries.

Joins support

Right now, the only supported operator is inner join. It's syntax is followed ...from #schema.method(...) a inner join #schema.method() b on a.SomeColumn = b.SomeOtherColumn. Parametrizable source (ie. #schema.method()) must be aliased, In memory doesn't have to.

How do I know what parametrs can I pass to my source?

I am currently trying to choose what syntax would be the most appropriate.

How do I know what columns can I use?

Each plugin can be queried for columns it has by typing desc #schema.method(...,params). Every single plugin supports it out of the box even if it's third party plugin. See example below.

desc #os.files('path/to/folder', 'false')

may shows

Name Index Type
Name 0 String
CreationTime 1 DateTime
FullName 2 String

Which means that the virtual table has 3 columns Name, CreationTime, FullName (Hint: real disk plugin has more columns).

Why must I type parameters for source?

Because the source is computed on the fly. There is no limitation for the plugin to provide different columns while passing different parameters.