The saga continues…
In Part I of this series, we met our initial cast of characters: the VARIANT datatype, the COPY command, and the strip_outer_element parameter used when loading XML content into a Snowflake table.
In Part II, the hero of the story emerged: the SQL function XMLGET(). We showed how to use this function to write standard SQL queries against an XML document stored in a variant column, and even how to reach into deeply nested XML elements.
In Part III, we met the villain: the repeating XML element array! We brought out the big…
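The nested-XMLGET() pattern recapped above can be sketched in a few lines of Snowflake SQL; the table name (xml_demo), column name (src), and element names below are hypothetical:

```sql
-- Assume xml_demo.src is a VARIANT holding something like:
--   <catalog><book><title>SQL Basics</title></book></catalog>
-- XMLGET() returns the first matching child element; the :"$" path
-- extracts the element's content, which we cast to VARCHAR.
SELECT XMLGET(XMLGET(src, 'book'), 'title'):"$"::VARCHAR AS title
FROM xml_demo;
```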
This is the third in a series of posts that cover tips and techniques for working with XML-based data in Snowflake. Part 1 of the series was a general introduction to Snowflake’s support for semi-structured data formats, and to using the COPY statement to load XML files into a Snowflake VARIANT column. Part 2 of the series introduced the SQL syntax for retrieving data out of a variant column containing XML, using the XMLGET() function. I showed how to walk the XML node hierarchy by “nesting” XMLGET() calls.
In this installment, I’ll show how to navigate…
In Part I of this series on working with XML data in Snowflake, I gave a little refresher on the basics of XML syntax, and covered the basics of semi-structured data in Snowflake. I showed how to create a table with a
VARIANT column, and how to stage an XML file for ingestion. We then looked at two flavors of the COPY statement, which loads the file into the Snowflake table, and showed the effect of changing the value of the strip_outer_element parameter.
In this installment, I’ll show some SQL statements you can use to get data out of the…
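The two COPY flavors mentioned above can be sketched as follows; the stage, file, and table names are hypothetical:

```sql
CREATE OR REPLACE TABLE xml_demo (src VARIANT);

-- Flavor 1: load the document as-is, so the outer element
-- (and everything inside it) lands as a single row.
COPY INTO xml_demo
  FROM @my_stage/books.xml
  FILE_FORMAT = (TYPE = XML);

-- Flavor 2: strip the outer element, so each of its child
-- elements becomes its own row in the table.
COPY INTO xml_demo
  FROM @my_stage/books.xml
  FILE_FORMAT = (TYPE = XML STRIP_OUTER_ELEMENT = TRUE);
```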
OK, well maybe not so secret, since they’re well-documented. Everybody knows about internal stages. But there are two additional types of internal stages that have special properties, and every Snowflake developer should understand how — and when — to use them.
We all know what Stages are for, right? They function as the “landing spot” for data files on their way into Snowflake. The typical pattern is for an ETL/ELT tool to extract data from the operational system of record and write out CSV, JSON, XML, or Parquet files to a cloud-based storage location. Snowflake then reads those…
I know there are developers out there who love working in the command line. I’ve heard stories... I don’t happen to be one of them. Give me a well-crafted, responsive UI with visual feedback and graphical buttons I can click on. Oh, and get off my lawn.
SnowSQL is Snowflake’s command-line utility. It’s available on all major operating systems. I use it on occasion because there are certain features (like the PUT command) that just aren’t available in the web UI. One thing I miss when using the command line is seeing my current “context”. …
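One workaround worth knowing: SnowSQL’s prompt can be customized to show your current context. The snippet below is an illustration based on SnowSQL’s prompt_format option; treat the exact tokens as an assumption and check the SnowSQL documentation before relying on them:

```
-- In an active SnowSQL session, show context in the prompt:
!set prompt_format=[user]@[warehouse] [database].[schema]>
```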
Any database can handle strongly-typed, or “structured” data. With structured data, every column is assigned a specific datatype (e.g., date, string, number, timestamp, etc.), and each column can only contain data that conforms to its specified datatype. This has been a key aspect of relational databases from, like, forever. But in today’s world, data comes in all shapes and sizes, including the “semi-structured” formats of JSON, XML, Avro, ORC, and Parquet. (We’ll leave the discussion of “unstructured” data to another post…)
Many organizations now depend on these “semi-structured” formats for critical business processes. The natural impedance that exists between the…
Snowpipe is the serverless ingestion engine for the Snowflake Data Cloud. This is the ingestion option that supports “near real-time” streaming instead of traditional batch-based ETL (extract/transform/load) style data loading. Snowpipe has several great advantages over batch ETL loading:
One of Snowflake’s key differentiating features is our native support for semi-structured data formats, including JSON, XML, Parquet, ORC, and Avro. Rather than serializing these formats into one long string and forcing the developer to parse them apart at query time, or requiring a complex pre-loading transformation, Snowflake allows them to be stored in their native format, directly in the database, using a special datatype called VARIANT. As semi-structured data is loaded into a Snowflake variant column, the columnarized metadata for the document is captured and stored: more “schema on load” than “schema on read”.
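Here’s the variant datatype in action, as a minimal sketch; the table and field names are hypothetical:

```sql
CREATE OR REPLACE TABLE events (payload VARIANT);

-- PARSE_JSON stores the document natively in the VARIANT column;
-- its metadata is columnarized as it loads.
INSERT INTO events
  SELECT PARSE_JSON('{"user": "pat", "items": [{"sku": "A1", "qty": 2}]}');

-- Path notation traverses the document with plain SQL at query time.
SELECT payload:user::VARCHAR     AS user_name,
       payload:items[0].qty::INT AS first_qty
FROM events;
```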
Visual Studio Code is a free, cross-platform Integrated Development Environment (IDE) available from Microsoft and compatible with Windows, macOS, and Linux. It’s the “baby brother” of their larger, more feature-rich Visual Studio IDE. The choice of IDE among developers is a fairly religious battle, but I’ve become very fond of VSCode and it’s my go-to for just about everything I do. One of VSCode’s great features is its simple extensibility. Rather than relying on Microsoft to add features to the IDE, which would delay release dates and bloat the base IDE package, they’ve built a simple extension mechanism called…
Principal Sales Engineer with Snowflake — US Public Sector.