Error when loading json model #170

Closed
BMJHayward opened this issue Nov 10, 2020 · 1 comment

Comments

@BMJHayward
Contributor

Steps to reproduce:

Train a model using, e.g., ImportanceCmd:
$ ./bin/variant-spark --local -- importance -if data/chr22_1000.vcf -ff data/chr22-labels.csv -fc 22_16051249 -rn 10 -rbs 10 -om target/ch22-model.java -sr 13 -v

Then load that model using, e.g., AnalyzeRFCmd:
$ ./bin/variant-spark --local -- analyze-rf -im target/ch22-model.json

Gives the following exception:

java.io.StreamCorruptedException: invalid stream header: 7B0A2020
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:900)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
    at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
    at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
    at au.csiro.variantspark.cli.AnalyzeRFCmd$$anonfun$1.apply(AnalyzeRFCmd.scala:81)
    at au.csiro.variantspark.cli.AnalyzeRFCmd$$anonfun$1.apply(AnalyzeRFCmd.scala:80)
    at au.csiro.pbdava.ssparkle.common.utils.LoanUtils$.withCloseable(LoanUtils.scala:18)
    at au.csiro.variantspark.cli.AnalyzeRFCmd.run(AnalyzeRFCmd.scala:80)
    at au.csiro.sparkle.common.args4j.ArgsApp.run(ArgsApp.java:46)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:9)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:18)
    at au.csiro.sparkle.cmd.MultiCmdApp.runCommandOrClass(MultiCmdApp.java:58)
    at au.csiro.sparkle.cmd.MultiCmdApp.run(MultiCmdApp.java:54)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:9)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:18)
    at au.csiro.pbdava.ssparkle.common.arg4j.AppRunner$.mains(AppRunner.scala:17)
    at au.csiro.variantspark.cli.VariantSparkApp$.main(VariantSparkApp.scala:26)
    at au.csiro.variantspark.cli.VariantSparkApp.main(VariantSparkApp.scala)

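This is because we can output trained models as JSON, but currently don't handle the JSON format for input models: AnalyzeRFCmd hands the file to Spark's Java deserializer, which rejects it. The header bytes 7B 0A 20 20 in the message are the ASCII characters '{', newline, space, space, i.e. the start of a pretty-printed JSON file.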
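The failure can be reproduced outside variant-spark. The following is a minimal standalone sketch in plain Java, not project code: the class and method names are made up for illustration, and the JSON content is a placeholder rather than a real model. It passes a small pretty-printed JSON document to java.io.ObjectInputStream, the class at the top of the trace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of variant-spark.
final class StreamHeaderDemo {

    // Tries to open the bytes as a Java serialization stream.
    // Returns the exception message if the stream is rejected, or null if the header is accepted.
    static String headerError(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return null;
        } catch (IOException e) {
            // StreamCorruptedException is a subclass of IOException.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Placeholder JSON; it starts with '{', '\n', ' ', ' ' = 7B 0A 20 20.
        byte[] json = "{\n  \"trees\": []\n}".getBytes(StandardCharsets.UTF_8);
        System.out.println(headerError(json));
    }
}
```

The ObjectInputStream constructor reads the first four bytes and compares them with the serialization magic number and version, so the check fails before any model data is read.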

I suggest creating a ModelInputArgs to mirror ModelOutputArgs, and adding support for reading regular JSON files as an instance of RandomForestModel.
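One way such an input path could choose a reader is to look at the first bytes of the file. The sketch below is only an illustration under that assumption: ModelFormatSniffer and its Format enum are hypothetical names, not existing variant-spark classes, and the actual ModelInputArgs might just as well take the format from a command-line option. It relies on the fact that Java serialization streams begin with the magic number 0xACED (java.io.ObjectStreamConstants.STREAM_MAGIC).

```java
import java.io.ObjectStreamConstants;

// Hypothetical helper; not part of variant-spark.
final class ModelFormatSniffer {

    enum Format { JAVA, JSON, UNKNOWN }

    // Guesses the model format from the leading bytes of the file.
    static Format sniff(byte[] head) {
        if (head.length >= 2) {
            int magic = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
            // STREAM_MAGIC is a short, so mask it to compare as an unsigned 16-bit value.
            if (magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)) {
                return Format.JAVA;
            }
        }
        for (byte b : head) {
            // Skip leading ASCII whitespace before the first JSON token.
            if (b == ' ' || b == '\n' || b == '\r' || b == '\t') {
                continue;
            }
            return (b == '{' || b == '[') ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }
}
```

With this check, the header from the exception (7B 0A 20 20) is classified as JSON, so a loader could dispatch to a JSON reader instead of failing inside ObjectInputStream.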

@rocreguant
Collaborator

As you mention, this error is caused by saving the model in Java format but trying to load a JSON file.
I'll close this issue since it's related to #149
